This thread is for discussion of Mimi Anderson's run across America data.
Detailed Analysis of Evidence For Mimi Anderson's Run Across America
This is a split off of the thread on Mimi Anderson's run across America so that the discussion can be focused on analyzing her data in detail.
To recap, Sam Robson did a detailed analysis of a few days of Mimi's Strava data to try to show that it is authentic. However, his article rests on flawed assumptions. It applies Benford's Law incorrectly, and it smooths Mimi's cadence data on the assumption that Sandra's data must be smoothed. That is not the case, as the Garmin documentation shows: Smart Recording samples at varying intervals, it does not smooth.
I have begun a detailed analysis of the entire dataset for Mimi Anderson's Strava and have uploaded the GPX extract and an Excel file for each upload of her runs. You can find all this data in my Google Drive. The folder also includes GPX files and Excel exports for Sandra Vi (using a Garmin), Kris King (using various Suunto watches at varying recording intervals, ran across Britain), and a random assortment of other Suunto watch users.
This allows for very easy comparison of the data between a sizeable group of people to show any anomalies. There are folders containing the screen captures for the cadence histogram, line plot, speed histogram, and speed plot. You can view them in a thumbnail gallery on the Google Drive to easily see many days of data at once to spot patterns or issues.
I will also be uploading a written report in the near future that summarizes the timeline of events for Mimi's run across America as well as an overall assessment of the evidence for her run, whether it passes scrutiny or not.
https://drive.google.com/open?id=0B1F3plcU7fEmT2VPcWlEQnNtUVk
Ugh, Google Drive might not have applied the share permission correctly to the folders with the images. One moment.
Ok, the public sharing of the data files should be fixed now. Let me know if you don't see the image gallery of the cadence histograms inside the folders.
To continue on with the current investigation, here are the cadence histograms for all of Mimi's runs showing a very odd pattern of missing cadences:
https://drive.google.com/drive/folders/0B1F3plcU7fEmWnczWVYwQjV4aG8
Compare that with the run across Britain data from Kris King who used a number of Suunto watches that included recording at 1 second intervals as well as other intervals depending on his watch (you can verify this by looking at the Excel export for his data in my Google Drive):
https://drive.google.com/drive/folders/0B1F3plcU7fEmcGFkak9vejlOTkk
As well, here is a cadence histogram gallery for a random assortment of other Suunto users on Strava and none of them have the Swiss cheese cadence data problem Mimi has:
https://drive.google.com/drive/folders/0B1F3plcU7fEmaEItNFdQb0dsd0U
I cannot think of any reasonable explanation for Mimi's cadence data that fits the above evidence. Anyone have any more thoughts?
I also have some of the RaceDrone GPX files in case someone wants to analyze them:
https://drive.google.com/drive/folders/0B1F3plcU7fEmY0s2ZVU4QURlWFU
And here are Sandra Vi's GPX files and analysis:
https://drive.google.com/drive/folders/0B1F3plcU7fEmZHAyd2hCOUtBczg
Good stuff Scam. Thank you for moving the discussion here. The URC chaps have a place to make a UK-style defence.
1. I think you need to provide a clear description of how you acquired this data. I believe it was downloaded from Strava using some 3rd-party bookmarklet? How certain are you that data acquired this way is 100% the same as the original data that was uploaded, and isn't modified by Strava or the bookmarklet's code somehow?
2. The data is in tabular spreadsheet form, but I think GPX data is hierarchical XML. How did you convert it?
3. The data includes separate cadence numbers for rpm and steps/min. Are both of these present in the original data, or did you calculate one from the other? Which is the original data? I think you should only include columns that are in the original data, and not anything you calculated.
Sandy? wrote:
1. I think you need to provide a clear description of how you acquired this data. I believe it was downloaded from Strava using some 3rd-party bookmarklet? How certain are you that data acquired this way is 100% the same as the original data that was uploaded, and isn't modified by Strava or the bookmarklet's code somehow?
2. The data is in tabular spreadsheet form, but I think GPX data is hierarchical XML. How did you convert it?
3. The data includes separate cadence numbers for rpm and steps/min. Are both of these present in the original data, or did you calculate one from the other? Which is the original data? I think you should only include columns that are in the original data, and not anything you calculated.
1. I obtained the data the same way Sam Robson did for his analysis, using the Strava to GPX tool. You can actually fetch anybody's public Strava data by simply visiting a specially formatted URL.
https://mapstogpx.com/strava/
2. I converted the data using a custom tool I wrote. The distances are calculated using the Vincenty distance method from the GeoCalc library. GPX parsing is handled by the GPX-Parser library with a fix to allow it to read cadence and extensions. The charts are generated using jFreeChart.
3. The cadence in RPM is the original from the GPX files, the cadence in SPM is simply the cadence in RPM x 2. I may remove the SPM in the future.
4. I have found a bug in my analysis. It seems that the histograms generated by jFreeChart don't like the number 95 and move those counts to either 94 or 96. That is why we see a zero cadence count at 95 in many of the histograms. This doesn't affect other cadences. I need to figure out why jFreeChart is choking on the number 95. It is very odd and driving me nuts.
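For anyone wondering about point 2, here is a minimal sketch of how hierarchical GPX can be flattened into spreadsheet-style rows. It is written in Python with the standard library only, not with the custom tool or the GPX-Parser library mentioned above. It assumes cadence sits in an element named `cad` or `cadence` somewhere under each track point, which may not match every exporter. The sample GPX, the function names, and the column names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up two-point sample; real exports may use different extension layouts.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk><trkseg>
    <trkpt lat="34.0000" lon="-118.0000">
      <time>2017-01-01T14:00:00Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:cad>88</gpxtpx:cad>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
    <trkpt lat="34.0001" lon="-118.0001">
      <time>2017-01-01T14:00:01Z</time>
      <extensions><gpxtpx:TrackPointExtension>
        <gpxtpx:cad>90</gpxtpx:cad>
      </gpxtpx:TrackPointExtension></extensions>
    </trkpt>
  </trkseg></trk>
</gpx>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def gpx_to_rows(gpx_text):
    """Return one dict per track point: lat, lon, time, cadence_rpm."""
    root = ET.fromstring(gpx_text)
    rows = []
    for pt in root.iter():
        if local_name(pt.tag) != "trkpt":
            continue
        row = {
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "time": None,
            "cadence_rpm": None,  # stays None if the point has no cadence
        }
        for child in pt.iter():
            name = local_name(child.tag)
            if name == "time" and child.text:
                row["time"] = child.text.strip()
            elif name in ("cad", "cadence") and child.text:  # assumed names
                row["cadence_rpm"] = int(float(child.text))
        rows.append(row)
    return rows


rows = gpx_to_rows(SAMPLE_GPX)
for r in rows:
    print(r["time"], r["lat"], r["lon"], r["cadence_rpm"])
```

Only values read from the file end up in the rows. A derived column such as steps per minute would just be `cadence_rpm * 2`, per point 3.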
scam_watcheroo wrote:
4. I have found a bug in my analysis. It seems that the histograms generated by jFreeChart don't like the number 95 and move those counts to either 94 or 96. That is why we see a zero cadence count at 95 in many of the histograms. This doesn't affect other cadences. I need to figure out why jFreeChart is choking on the number 95. It is very odd and driving me nuts.
I found out why jFreeChart was choking on the 95 cadence in the histograms. Looks like I gave it one too few bins to use. I've upped the number and am re-running all the analysis for all the files now to correct it. I've also removed the cadence in SPM per your suggestion.
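A rough sketch of why the bin count matters for integer data like cadence. This is plain Python with a simple equal-width binning function, not jFreeChart, and it does not claim to reproduce the exact symptom at 95. It only shows that with one bin too few, two neighbouring integer values end up sharing a bin. The function names and the sample values are made up for illustration.

```python
def bin_counts(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        if idx >= n_bins:  # the top edge belongs to the last bin
            idx = n_bins - 1
        counts[idx] += 1
    return counts


def per_integer_counts(values):
    """One bin per integer value: needs (max - min + 1) bins."""
    lo, hi = min(values), max(values)
    return bin_counts(values, lo, hi + 1, hi - lo + 1)


sample = [93, 94, 95, 96, 97]  # made-up cadence values, one of each

too_few = bin_counts(sample, 93, 97, 4)  # max - min bins: one too few
correct = per_integer_counts(sample)     # max - min + 1 bins

print(too_few)  # 96 and 97 share the last bin
print(correct)  # each value has its own bin
```

Five distinct integers need five bins. Using `max - min` as the bin count gives four, so something has to double up.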
Now I can have the other thread all to myself, and my stupid subject line obsession.
Notice how I've posted there many times with this nickname, trying to project my stupid subject line.
Now please leave that thread to just me, and everyone else post on this one.
Let's Run is rejecting all my replies to this post with "No spam. Error 297812"
I downloaded one of my own runs from Strava, using that tool you mentioned, and compared it to the original Garmin file I have for the same run.
The cadence data matches. So it looks like cadence is preserved just fine using that tool you mentioned.
I then downloaded an example of one of Mimi's runs, using the same tool. I did my own data analysis and confirmed what you found: every third discrete cadence value is absent in the data. So this strange pattern looks real, and isn't caused by your methods or tools somehow.
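The check described above can be sketched roughly like this in Python. It lists the integer cadence values that never occur between the lowest and highest recorded value, then tests whether those gaps are evenly spaced. The data here is synthetic and built to have every third value missing. It is not taken from Mimi's files, and the function names are made up.

```python
def missing_values(cadences):
    """Integers between min and max cadence that never appear in the data."""
    present = set(cadences)
    return [v for v in range(min(present), max(present) + 1) if v not in present]


def constant_spacing(values):
    """Return the common gap between consecutive values, or None if uneven."""
    gaps = {b - a for a, b in zip(values, values[1:])}
    return gaps.pop() if len(gaps) == 1 else None


# Synthetic cadence samples with every third value deliberately left out.
synthetic = [v for v in range(60, 72) if v % 3 != 2]

missing = missing_values(synthetic)
spacing = constant_spacing(missing)

print("missing:", missing)
print("spacing:", spacing)
```

Running the same two functions on the cadence column of a real export would show whether the absent values follow a regular step or are scattered.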
Mimi's run data wrote:
Now I can have the other thread all to myself, and my stupid subject line obsession.
Notice how I've posted there many times with this nickname, trying to project my stupid subject line.
Now please leave that thread to just me, and everyone else post on this one.
Please stick to defending or defeating Mimi's data on this thread.
It was nice of Scam to make this thread so that he could post his own analysis in it.
URC people are interested in protecting Mimi's good name. The discussion about raw data and analysis is not for the faint of heart. Better know your math and analysis tools.
Sandy? wrote:
Let's Run is rejecting all my replies to this post with "No spam. Error 297812"
Same here. Brojos, some help?
I'm pretty sure that the data is downloaded correctly. I actually had to go through the source code for it to fix the start time it uses for the activities since it by default just uses the time from a week ago instead of parsing the descriptive time on the page. You can actually see the raw data for any public Strava activity by just going to a URL. For example, here is a direct link to one of Mimi's activities showing time and cadence:
Eric the web guy is responding to thread issues on this post.
I found this random Suunto runner on Strava, and his cadence data shows a somewhat similar pattern of missing cadence values as Mimi's data: https://www.strava.com/activities/1187449540