stupid runner wrote:
Hi all. I recently started looking into some of the allegations against Mimi regarding data forgery to see if I could add an additional point of view to some of the accusations. I ended up putting this into a blog so that I could show my working, show my reasoning, and importantly show my code in case anybody spots any errors. It is really just touching the surface of things, but based on this I believe that Mimi's Strava data are genuine.
Thanks for the very thorough analysis, this is great! Please ignore anyone who is hurling insults. I really appreciate the effort you put into this. Remind me not to cheat on my taxes and try to sneak it by you.
I agree with your conclusion that the data is genuine. I'd more or less reached the same conclusion already, based on the apparent lack of technical skills on Mimi's team compared to the skills needed to construct a convincing fake, and on the strange bimodal cadence data, which would actually take *more* effort to fake than normal-looking cadence data. A couple of your findings removed any lingering doubts I had:
1. The bimodal cadence data is matched by bimodal pace data.
2. The data passed two clever tests for evidence of fakery (little blips in the sample intervals, and the 2nd digit distribution of cadence). The known fake data failed both tests.
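For anyone who wants to try the sample-interval idea themselves, here is a minimal sketch of how I'd look for those blips. I don't know exactly how the blog implemented it, so the function name `interval_counts` and the timestamps below are my own made-up illustration, not the blog's code or real Strava data. The idea is just to tally the gaps between consecutive samples and see whether anything other than the regular interval shows up.

```python
from collections import Counter


def interval_counts(timestamps):
    """Tally the gaps (in seconds) between consecutive samples."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return Counter(gaps)


# Made-up timestamps in seconds (hypothetical, not real Strava data).
recorded = [0, 1, 2, 3, 5, 6, 7, 8, 9, 11]

# Print each gap length and how often it occurs, most common first.
for gap, count in interval_counts(recorded).most_common():
    print(f"gap of {gap}s: {count} times")
```

A recording with occasional irregular gaps shows more than one gap length, while a series generated on a perfect grid shows exactly one.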
I agree with those who suggested Benford's Law doesn't apply here, but I still think your analysis of the 2nd digit distribution is valid and useful.
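The 2nd digit check is easy to reproduce in rough form. This is only a sketch under my own assumptions: cadence values are whole numbers of at least two digits, and `second_digit_shares` is a hypothetical helper name, not something from the blog. It reports what fraction of values have each second digit, so two data sets can be compared side by side without assuming Benford's Law applies.

```python
from collections import Counter


def second_digit_shares(values):
    """Return the share of values whose second digit is 0..9.

    Values with fewer than two digits are skipped.
    """
    digits = []
    for value in values:
        text = str(abs(int(value)))
        if len(text) >= 2:
            digits.append(int(text[1]))
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(10)}


# Made-up cadence readings (hypothetical, not real data).
cadence = [160, 162, 171, 85]
for digit, share in second_digit_shares(cadence).items():
    print(digit, round(share, 2))
```

Comparing the shares from a suspect file against a known-genuine one is the point here; the sketch makes no claim about what the "right" distribution should look like.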
FYI, there's something wrong with your graph of "Pace Comparison Between Data Sets" and the accompanying text. You said Sandra's normal pace was about 11.5 minutes/mile and Mimi's was split between 11 min/mile walking and 8.5 min/mile running. Those numbers can't be correct, because both women averaged around 14 min/mile overall. Maybe you meant minutes/km?
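A quick unit check makes the mismatch easier to see. This sketch assumes only the standard conversion of 1 mile = 1.609344 km; the function names are mine.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile


def min_per_km_to_min_per_mile(pace_min_per_km):
    """Convert a pace in minutes/km to minutes/mile."""
    return pace_min_per_km * KM_PER_MILE


def min_per_mile_to_min_per_km(pace_min_per_mile):
    """Convert a pace in minutes/mile to minutes/km."""
    return pace_min_per_mile / KM_PER_MILE


# The three figures from the graph text, read as minutes/km.
for pace in (8.5, 11.0, 11.5):
    print(f"{pace} min/km = {min_per_km_to_min_per_mile(pace):.1f} min/mile")

# The overall average quoted above, converted the other way.
print(f"14 min/mile = {min_per_mile_to_min_per_km(14):.1f} min/km")
```

Read as minutes/km, the graph figures come out to roughly 13.7, 17.7 and 18.5 min/mile, and the 14 min/mile overall average is about 8.7 min/km.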
If the data is genuine, and there's any remaining question about possible cheating, it could only be from another person running some segments instead of Mimi. We can't really prove or disprove that. However, the strange bimodal cadence is there from Day 1 of the data, which suggests it was probably Mimi running all of the segments.
In short, I believe it's very likely Mimi ran the whole thing exactly as she claimed.