Rob E wrote:
runner felice wrote: 2014 ODDyssey Half Marathon, 12 weeks out from Lehigh Marathon
Mile Split Chip Time Gun Time Age Age Place Gender Gender Place Pace City State
241 MIKE ROSSI 328 01:47:03 01:47:08 12 M 179 PACE 08:10
For other research I've created a database that tracks people's race results. Using it, I did a quick analysis of the marathon times of people who had recently run a half marathon between 1:20 and 2:00. I limited the data to people who raced the half no more than 90 days before the marathon. I chose this window because it's close to the time between Mike's race at the ODDyssey Half and the Lehigh Valley Marathon.
The result of my statistical analysis is that a runner's marathon time is usually their half marathon time multiplied by 2.0477, plus 0.2319 hrs (14 minutes).
For Mike Rossi this means his predicted marathon time would be:
= 2.0477 x 1.78333 hrs + 0.2319 hrs
= 3.88 hrs, or about a 3:52.
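For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation. The multiplier and the 0.2319 hr offset are the fitted values quoted above; the function names are just mine for illustration, and the half time is rounded to 1:47:00 as in the calculation. Note that the 3:52 comes from rounding to 3.88 hrs first; carrying the unrounded value through lands a few seconds past 3:53.

```python
SLOPE = 2.0477          # fitted multiplier quoted in the post
INTERCEPT_HRS = 0.2319  # fitted offset in hours quoted in the post (about 14 minutes)


def hms_to_hours(hours, minutes, seconds=0):
    """Convert a race time to decimal hours."""
    return hours + minutes / 60 + seconds / 3600


def predict_marathon_hours(half_hours):
    """Apply the linear fit: marathon = slope * half + intercept."""
    return SLOPE * half_hours + INTERCEPT_HRS


def hours_to_hmm(hours):
    """Format decimal hours as H:MM, truncating to whole minutes."""
    total_minutes = int(hours * 60)
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


half = hms_to_hours(1, 47)                # 1:47 half, about 1.78333 hrs
predicted = predict_marathon_hours(half)  # about 3.88 hrs
print(round(predicted, 2), hours_to_hmm(round(predicted, 2)))
```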
What is more interesting is that this statistical analysis also says there is a 99% chance that his forecasted time would fall between 3:24 and 4:21. Obviously this is a huge forecasted range, which is partly because I wanted to limit the sample to runners with times similar to Mike's and to cases where the half marathon was only a couple of months before the full marathon. As the sample size decreases, the forecast range is likely to increase.
Regardless, the 3:11 time falls outside of this range. Statistically speaking, based on this relatively small sample of data (100 half marathon times between 1:20 and 2:00 hrs), there is only about a 1% chance that Mike could have run a 3:11 marathon after running a 1:47 half. If the sample size were larger, it would almost certainly improve the predictive power of the analysis, shrink the forecasted range and make it even less statistically probable that Mike could have run a 3:11.
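To show roughly how a range like this can be produced, here is a sketch of a prediction interval from a one-variable least-squares fit. This is not my actual database or code: the data below is synthetic (generated with a made-up noise level of 0.18 hrs), the function names are illustrative, and it uses a normal quantile in place of the t quantile, which is a reasonable approximation at around 100 data points.

```python
import math
import random
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_interval(xs, ys, x_new, level=0.99):
    """Return (low, center, high) for one new observation at x_new."""
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Standard error for a single new runner, not for the mean.
    se = s * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    center = intercept + slope * x_new
    return center - z * se, center, center + z * se


# Synthetic stand-in data: 100 half times between 1:20 and 2:00 (in hours).
rng = random.Random(0)
halves = [rng.uniform(1.3333, 2.0) for _ in range(100)]
fulls = [2.05 * h + 0.23 + rng.gauss(0, 0.18) for h in halves]

low, mid, high = prediction_interval(halves, fulls, 1.78333)
print(f"{low:.2f} to {high:.2f} hrs, centered on {mid:.2f}")
```

The `1 +` term inside the square root is what makes the range so wide: it reflects the scatter of individual runners around the line, which does not shrink no matter how much data is added, whereas the other two terms do.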
This analysis is super simple, which has obvious drawbacks but also has its benefits. For example, I interpret this as saying that 99% of the time people don't run 3:11 after running 1:47, regardless of conditions, injuries, setbacks etc. All of those factors could impact any individual time, but on average they should impact all times equally. I've omitted plenty of variables, but I've done so on both sides of the equation: half marathon time and full marathon time. Any oddities like good (bad) weather, fast (slow) course or good (bad) day should affect both times equally throughout the sample and should statistically cancel out.
Furthermore, this model has some selection bias, as I'd argue people are more likely to toe the line for the full marathon if they are in better shape. So we're likely including all of the runners who were in really good shape, as Mike claims he was, and excluding all the runners who were in such terrible shape that they dropped out or didn't start. Hypothetically speaking, if I could include those people who were in terrible shape, the statistical forecast for Mike's time would be even higher and would suggest that his 3:11 was even less probable.
Finally, of the marathon-half time pairings included in my sample, Mike's is the greatest outlier. His time of 3:11 is 41 minutes faster than his statistically forecasted time. There are two other runners who beat their predicted time by 30 minutes. Generously speaking, his time is a 1/100 occurrence because I have a sample of 100 runners. However, a quick test using the statistical distribution of each athlete's under-performance/over-performance suggests that Mike's over-performance is less than a 1/1,000,000 occurrence! In addition to all of the evidence against Mike, it's more than obvious that he's a cheat!