So we can expect VIA to exercise good judgement and DQ Mike Rossi. Yeah. VIA - what a joke of a race. Hope someone sets a WR there this year, perhaps a grandmasters runner goes under 2 with far less odds than these for Mikey.
mileage_man wrote:
Thanks, hmmmmmmm! That video was awesome. Succinct, clear, and damning.
Two minor quibbles - first, we want to use the data about the other 199 runners to draw conclusions about Mike. So Mike should be removed from the pool of runners that we are considering when we are estimating the probability of a runner getting missed at one of the checkpoints. According to this reasoning, if we stick with P(not photo'd, assuming you ran) = (# not photo'd)/(# who ran) - more on that in a second - the new numbers we get are:
P(Mike not photo'd, assuming Mike ran) = (20/199)*(0/199)*(64/199)*(81/199)*(4/199) = 0.
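For anyone who wants to check the arithmetic, here is a minimal Python sketch of that calculation. The counts are the ones quoted above; the variable names are mine, and multiplying the checkpoints together assumes they are independent, same as the original analysis does.

```python
from fractions import Fraction
from math import prod

# Runners other than Mike who were missed at each of the five checkpoints
# (counts taken from the numbers quoted in this post).
missed_others = [20, 0, 64, 81, 4]
n_others = 199  # the 200 runners minus Mike

# Plain frequency estimate per checkpoint: (# not photo'd) / (# who ran).
per_checkpoint = [Fraction(m, n_others) for m in missed_others]

# Assumption: checkpoints are independent, so the estimates multiply.
p_all_missed = prod(per_checkpoint)

print(p_all_missed)  # 0
```

The single zero count at the second checkpoint is enough to force the whole product to zero, no matter what the other four factors are.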
This brings us to quibble #2, which is that we're using the wrong way to estimate the probability of getting missed at each checkpoint. The method used treats the 200 runners as a population, whereas it is more appropriate to treat the 199 other runners as a sample from some larger distribution. As above, this is because we want to use the data from the other runners to evaluate the plausibility of the data we have about Mike, under the assumption that Mike actually ran (i.e. under the assumption that Mike Rossi is drawn from the same distribution as the 199 other runners). It's a minor difference, but it can matter a lot in cases like the mile 7 checkpoint where Mike was the only one missed. Anyway, the 'right' way to do it is using 'Laplace's rule of succession', which says that if you perform n trials and observe s successes, the probability of success on the next trial should be estimated as (s+1)/(n+2). See here for example:
http://www.cut-the-knot.org/Probability/RuleOfSuccession.shtml. This can be formally derived from the axioms of probability theory, assuming a uniform prior on the probability of success (i.e. every value between 0 and 1 is equally likely beforehand, which gives a 50% estimate before any data comes in). In any case, this means that the numbers end up as follows:
P(Mike not photo'd, assuming Mike ran) = (21/201)*(1/201)*(65/201)*(82/201)*(5/201) = 1.7 x 10^-6.
So the slightly tweaked numbers are about 1.71 in a million (instead of 1.75 in a million), or 1 in 586 thousand (instead of 1 in 572 thousand).
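Here is a rough Python sketch of the comparison, in case anyone wants to reproduce the two figures. The function name is mine, and the "original" line assumes the video's estimate counted Mike in both the misses and the field, i.e. (s+1)/200 per checkpoint, which is what the 1.75-in-a-million figure implies. Independence between checkpoints is assumed throughout.

```python
from math import prod

def rule_of_succession(s, n):
    """Laplace's estimate of the success probability after s successes in n trials."""
    return (s + 1) / (n + 2)

# Runners other than Mike who were missed at each checkpoint (from this post).
missed_others = [20, 0, 64, 81, 4]
n_others = 199

# Tweaked estimate: treat the 199 other runners as a sample.
p_laplace = prod(rule_of_succession(s, n_others) for s in missed_others)

# Assumed form of the original estimate: Mike included in the counts.
p_original = prod((s + 1) / (n_others + 1) for s in missed_others)

print(f"tweaked:  {p_laplace:.3g}, about 1 in {round(1 / p_laplace):,}")
print(f"original: {p_original:.3g}, about 1 in {round(1 / p_original):,}")
```

The two results differ by only a couple of percent, which is why the conclusion doesn't change either way.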
Thanks again for the link to the video, and sorry for having been behind the times. Obviously the quibbles I pointed out don't affect the conclusions of the analysis in a serious way, but I do feel that 1 in 586 thousand is the more defensible number (for all you probability purists out there).