Mimi Anderson drops her attempt to run across America. How good is her data?


7 years ago 10/23/2017 3:59am EDT

re: scam_watcheroo

Thanks for splitting this off into a separate thread. I think it makes much more sense to discuss this away from Sandra's ongoing run.

scam_watcheroo wrote:
To recap, Sam Robson did a detailed analysis of a few days of Mimi's Strava data to try to show what he thinks is authentic data. However, there are major flaws in the assumptions made in his article that included incorrectly using Benford's Law and applying smoothing to Mimi's cadence data on the assumption that Sandra's data must be smoothed when that is in fact not the case as shown by Garmin documentation (Smart Recording samples at various intervals, it does not smooth).

Just to clarify, in the Benford's Law section, my entire point was that I assumed that it probably WOULD NOT hold for these data since they lie within a very narrow range (and this indeed proved to be the case). Nothing in my report suggested it even came close to holding. I was, however, then interested to see if there was anything suggestive about the non-significant digits (note that this is subtly different to Benford's Law which requires the digits to be at the same position from the left hand side of the number, whereas I was looking for the first digit from the right hand side) and there does appear to be a uniformish distribution. Thinking about it more, this is likely a result of the fact that Mimi's data is a mixture of two overlapping normalish distributions with a high standard deviation, whilst the faked data is taken from a single distribution (which importantly does not vary much from data set to data set).
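For anyone who wants to try the two digit checks described above, here is a minimal Python sketch. It is not the R code from the report; the function names and the sample readings are made up for illustration. It compares leading digits against the Benford proportions and separately tabulates the last digit.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit proportions under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_freq(values):
    """Observed proportion of each leading digit (1-9) among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def last_digit_freq(values):
    """Observed proportion of each final digit (0-9)."""
    counts = Counter(v % 10 for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(10)}


# Hypothetical cadence readings confined to a narrow band (80-99).
cadence = list(range(80, 100)) * 5

expected = benford_expected()
first = first_digit_freq(cadence)
last = last_digit_freq(cadence)

for d in range(1, 10):
    print(f"leading {d}: observed {first[d]:.3f}, Benford {expected[d]:.3f}")
for d in range(10):
    print(f"last digit {d}: observed {last[d]:.3f}")
```

With values confined to 80-99 the leading digit can only be 8 or 9, so Benford's Law cannot hold, while the last digit is free to spread evenly.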

The smoothing point is worth me looking into, however. Note that I did not smooth Mimi's data for my analyses as this would reduce the fidelity. I only looked at smoothing when looking at explanations for why the high 200+ spm cadence measurements were not seen in both data sets. My assumption was that they were smoothed out in Sandra's data, but this does not seem to be the case. It says on the Garmin website that the "Smart Recording records key points where the fitness device changes direction, speed, heart rate or elevation." Presumably it needs to have some way internally of determining what is a key change compared to what is a random error. I wonder whether these would skip unfeasibly high recordings like those seen in Mimi's? The ideal test would be to see data from Sandra in both Smart Recording and 1s capture mode for one day, but I think that she is probably a little busy right now!

Note that neither of these points affect my conclusions, and it seems to me that you agree that the data are genuine. I know that there are other anomalies that still need looking into as well.

One thing that is odd to me is that I do not see the missing cadence data that you have described. I am using the same tool as you to get the raw .gpx data, and then parsing the data in R as per my code (which is neither missing data nor adding in values for missing data). Yet I see zero missing values for the cadence (at least in the few files that I analysed). Any idea why this might be? Could something be happening when you port them over to Excel?

7 years ago 10/23/2017 7:41am EDT

re: AquaDyne

AquaDyne wrote:
scam_watcheroo wrote:
I think Mimi's Strava data has passed my scrutiny and it is fair to say now that the Strava data looks authentic.
Yeah. It'd take quite the hacker to figure out a cadence artifact particular to one model of the watch and reproduce that on spoofed data.
Glad we've gotten to the bottom of this and removed doubt about this data.
scam_watcheroo wrote:
Note that this just leaves the matter of verifying the uncut video of Mimi running to check that the Strava data matches what is seen in real life with the same cadence and same durations.
Huh? First you say it's authentic and now you insist on more proof? When will you stop raising the bar?
She's not trying to claim a record. She has nothing to prove to anyone. So far everything we've seen says her run, such as it was, was legit. Let's let her recover and wish her the best.

The data being genuine doesn't tell anything about the identity of the person(s) who ran all those miles.

Maybe someone could compare a run where we are certain that Mimi is running (e.g. because someone joined her), with some of her monster days, to check if the cadence range is the same.

Do we know why her later runs don't have heart rate? Did she stop using the watch with optical HR, or did she turn it off?

7 years ago 10/23/2017 8:02am EDT

re: stupid runner

stupid runner wrote:
One thing that is odd to me is that I do not see the missing cadence data that you have described. I am using the same tool as you to get the raw .gpx data, and then parsing the data in R as per my code (which is neither missing data nor adding in values for missing data). Yet I see zero missing values for the cadence (at least in the few files that I analysed). Any idea why this might be? Could something be happening when you port them over to Excel?

User Sandy? also did his own check of the cadence data and confirmed that there are missing cadence values like I show in my histograms, so I'm pretty sure my plots are correct. I also opened up the GPX files in Notepad++ and confirmed in the XML that the cadence values are indeed missing. There must be some filtering going on with R.

7 years ago 10/23/2017 8:18am EDT

re: scam_watcheroo

To add, I'm using the Apache POI library to write out the data to Excel after parsing it in with the GPX-Parser library. I just checked another of Mimi's GPX files in Notepad++ and those certain numbers for the cadence values are indeed missing. For example, values 91 and 92 exists in the XML but 92 does not (or only exists once). All the XML nodes are intact or else I would have blanks in the Excel cadence column.

What do you get if you simply plot a histogram in R of the raw cadence data with 1 rpm bins?
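For readers without R or Excel to hand, the same check can be sketched in Python using only the standard library. This is an illustration, not the tool either of us used: the element name `cadence` is an assumption (it matches the `extensions.cadence` column name mentioned in this thread, but exported files may use a different tag), and the sample GPX text is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def read_cadence(gpx_text, tag="cadence"):
    """Collect integer values from every element whose local name equals `tag`.

    The tag name is an assumption; pass a different one if the file differs.
    """
    root = ET.fromstring(gpx_text)
    values = []
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if local == tag and elem.text and elem.text.strip():
            values.append(int(elem.text))
    return values


def empty_bins(values):
    """Integers inside the observed range that never occur (1-unit bins)."""
    hist = Counter(values)
    return [v for v in range(min(values), max(values) + 1) if hist[v] == 0]


# Invented sample: seven track points, none of them with cadence 92.
points = "".join(
    f'<trkpt lat="0" lon="0"><extensions><cadence>{c}</cadence></extensions></trkpt>'
    for c in [90, 91, 91, 93, 93, 93, 94]
)
sample_gpx = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1">'
    f"<trk><trkseg>{points}</trkseg></trk></gpx>"
)

cadence = read_cadence(sample_gpx)
hist = Counter(cadence)
for v in range(min(cadence), max(cadence) + 1):
    print(f"{v:3d} {'#' * hist[v]}")
print("empty bins:", empty_bins(cadence))
```

Every track point has a cadence node, so no rows are blank; the hole only shows up once the values are counted one integer at a time.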

7 years ago 10/23/2017 8:42am EDT

re: scam_watcheroo

As a previous poster has said, I think it would be far more useful to look at the actual accelerometer data and algorithms used to calculate cadence from the force measurements. Not in terms of finding out whether the data was fabricated or not, but more to see if this sort of data could accurately inform a user about their running economy, injuries etc. Seems likely that her injury would've affected the raw data recorded and the algorithms probably don't account for someone missing their knee cartilage.

7 years ago 10/23/2017 9:02am EDT

re: scam_watcheroo

scam_watcheroo wrote:
To add, I'm using the Apache POI library to write out the data to Excel after parsing it in with the GPX-Parser library. I just checked another of Mimi's GPX files in Notepad++ and those certain numbers for the cadence values are indeed missing. For example, values 91 and 92 exists in the XML but 92 does not (or only exists once). All the XML nodes are intact or else I would have blanks in the Excel cadence column.
What do you get if you simply plot a histogram in R of the raw cadence data with 1 rpm bins?

Typos in my post. Meant to say 91 and 93 exist but 92 values do not or only appear once or twice.

7 years ago 10/23/2017 9:47am EDT

re: scam_watcheroo

scam_watcheroo wrote:
stupid runner wrote:
One thing that is odd to me is that I do not see the missing cadence data that you have described. I am using the same tool as you to get the raw .gpx data, and then parsing the data in R as per my code (which is neither missing data nor adding in values for missing data). Yet I see zero missing values for the cadence (at least in the few files that I analysed). Any idea why this might be? Could something be happening when you port them over to Excel?
User Sandy? also did his own check of the cadence data and confirmed that there are missing cadence values like I show in my histograms, so I'm pretty sure my plots are correct. I also opened up the GPX files in Notepad++ and confirmed in the XML that the cadence values are indeed missing. There must be some filtering going on with R.

Sam, in your R code, you are multiplying the raw cadence values by 2 as part of your parsing where it says:

## Convert cadence to steps per minute

gpx[["extensions.cadence"]]

7 years ago 10/23/2017 10:19am EDT

re: scam_watcheroo

My post got cut off above. Sam, in your analysis you have distribution plots but no histograms. The curves on the distribution plots are too smooth to represent jagged data; I think that is why you don't see the Swiss cheese bug. Plot a histogram of the raw cadence in spm and the issue should be visible.

7 years ago 10/23/2017 10:21am EDT

re: scam_watcheroo

^meant to say plot a histogram with the raw RPM cadence values, not the SPM values that are always even because it was multiplied by 2 from your parsing of the data.
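A small Python sketch of why the doubling matters, using invented readings rather than Mimi's actual data: once every value is multiplied by 2, all the odd bins are empty by construction, so the one genuinely missing value is buried among them.

```python
def empty_bins(values):
    """Integers inside the observed range that never occur."""
    present = set(values)
    return [v for v in range(min(values), max(values) + 1) if v not in present]


# Invented raw readings in RPM; 92 never occurs.
raw_rpm = [90, 91, 91, 93, 93, 94]

# The same readings after doubling to steps per minute.
spm = [2 * v for v in raw_rpm]

print("empty RPM bins:", empty_bins(raw_rpm))
print("empty SPM bins:", empty_bins(spm))
```

In RPM the only empty bin is 92. In SPM the empty bins include every odd number in the range plus 184, so the real gap is much harder to pick out by eye.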

7 years ago 10/23/2017 10:28am EDT

re: scam_watcheroo

Ah, sorry, I misunderstood. I thought you meant you had missing cells (rows) whilst mine are complete, so was confused how I could have filled these in with my parsing. I understand now.

7 years ago 10/23/2017 10:42am EDT

re: embedded electronics

embedded electronics wrote:
As a previous poster has said, I think it would be far more useful to look at the actual accelerometer data and algorithms used to calculate cadence from the force measurements. Not in terms of finding out whether the data was fabricated or not, but more to see if this sort of data could accurately inform a user about their running economy, injuries etc. Seems likely that her injury would've affected the raw data recorded and the algorithms probably don't account for someone missing their knee cartilage.

The only data we get is an integer value for "cadence" once per second. I think it would be tough to detect injuries just from that, except to note that lots of walking (lots of data points with a low cadence value) means the athlete is struggling and possibly injured. There's no low-level accelerometer data present that might be used for more sophisticated calculations. Maybe Suunto or Garmin could expose that data in a future firmware update, if somebody convinced them why it would be useful.

7 years ago 10/23/2017 11:10am EDT

re: Sandy?

Sandy? wrote:
embedded electronics wrote:
As a previous poster has said, I think it would be far more useful to look at the actual accelerometer data and algorithms used to calculate cadence from the force measurements. Not in terms of finding out whether the data was fabricated or not, but more to see if this sort of data could accurately inform a user about their running economy, injuries etc. Seems likely that her injury would've affected the raw data recorded and the algorithms probably don't account for someone missing their knee cartilage.
The only data we get is an integer value for "cadence" once per second. I think it would be tough to detect injuries just from that, except to note that lots of walking (lots of data points with a low cadence value) means the athlete is struggling and possibly injured. There's no low-level accelerometer data present that might be used for more sophisticated calculations. Maybe Suunto or Garmin could expose that data in a future firmware update, if somebody convinced them why it would be useful.

There would need to be a way to measure what each foot/leg is doing. A small device that talks to the GPS unit on the lace of each shoe might be able to do it. Now that could be really useful if it could help detect a problem--injury, stride issues, shoe problems, pronation, etc.

7 years ago 10/23/2017 6:09pm EDT

re: follower

follower wrote:
The data being genuine doesn't tell anything about the identity of the person(s) who ran all those miles.
Maybe someone could compare a run where we are certain that Mimi is running (e.g. because someone joined her), with some of her monster days, to check if the cadence range is the same.
Do we know why her later runs don't have heart rate? Did she stop using the watch with optical HR, or did she turn it off?

Best tool for this is probably Principal Component Analysis (PCA) of her stride variations over several splits. The leading eigenvectors (the principal components) will represent something of a signature that should be similar across days.
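Here is a rough sketch of that idea in plain Python, with the caveat that everything specific in it is assumed: the per-split features (mean cadence and cadence spread), the numbers, and the use of power iteration to get the leading component are all illustrative choices, not an established method for identifying a runner.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(rows, iterations=200):
    """Unit-length leading eigenvector of the covariance, by power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] + [0.5] * (d - 1)  # arbitrary, deliberately asymmetric start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def similarity(u, v):
    """Absolute cosine similarity; the sign of a component is arbitrary."""
    return abs(sum(a * b for a, b in zip(u, v)))


# Hypothetical splits: (mean cadence in RPM, standard deviation of cadence).
day_a = [(90, 2.0), (92, 2.6), (94, 2.9), (96, 3.5)]
day_b = [(89, 1.9), (91, 2.3), (93, 2.9), (95, 3.2)]

pc_a = leading_component(day_a)
pc_b = leading_component(day_b)
print("day A component:", pc_a)
print("day B component:", pc_b)
print("similarity:", similarity(pc_a, pc_b))
```

A similarity near 1 means the two days vary along the same direction in feature space. With real data the features should be put on a common scale first, and a match would be suggestive rather than proof of who was wearing the watch.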

Or we can just assume since she totally thrashed her knee, she ran a helluvalotta miles on it. Seems the most likely explanation to me.

7 years ago 10/23/2017 6:18pm EDT

re: AquaDyne

AquaDyne wrote:
Or we can just assume since she totally thrashed her knee, she ran a helluvalotta miles on it. Seems the most likely explanation to me.

Sadly, she probably took a lot of pain killers to keep running. She may have drunk a lot of black tea, or some energy boosting drugs, to deal with the lack of proper sleep.

7 years ago 10/23/2017 6:26pm EDT

re: scam_watcheroo

scam_watcheroo wrote:
I've uploaded all the Suunto Spartan Ultra/Sport watch data from random people onto my Google Drive along with their cadence histograms. It looks like the issue with Mimi's cadence data is indeed because of an extremely odd problem specific to all Suunto Spartan watches and not because of any doctoring. I don't see why Suunto would do this intentionally to one of their flagship products; it seems like a major bug.

I thought of a much simpler explanation for the roundoff issues. Round-up/down at 0.5 is really only relevant if you're dealing with actual fractions or already rounded values (like money). In digital computing, floating point numbers are binary and math isn't exact. Famously, if you add 0.1 and 0.2 on a computer, you'll get an answer other than 0.3.

So it's entirely possible/probable that the particular model of watch made the poor decision to have a calculation frequently (always?) result in half-decimals like 90.500001, 91.499999, and 92.500001.

Again, speculation, just a reasonable guess for what causes the "swiss cheese".
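To make the guess concrete, here is a short Python sketch. The three input values are the ones from the example above; the round-half-up rule is an assumption about what the watch might do, not something confirmed from the firmware.

```python
import math

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: equal within tolerance


def round_half_up(x):
    """Round to the nearest integer, halves upward (assumed rounding rule)."""
    return math.floor(x + 0.5)


# Values that should sit exactly on a half but carry a tiny numeric error.
computed = [90.500001, 91.499999, 92.500001]
recorded = [round_half_up(x) for x in computed]
print(recorded)  # [91, 91, 93]
```

Two inputs land on 91, one on 93, and nothing lands on 92, which is the kind of hole seen in the histograms.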

7 years ago 10/25/2017 4:22pm EDT

re: AquaDyne

AquaDyne wrote:
follower wrote:
The data being genuine doesn't tell anything about the identity of the person(s) who ran all those miles.
Maybe someone could compare a run where we are certain that Mimi is running (e.g. because someone joined her), with some of her monster days, to check if the cadence range is the same.
Do we know why her later runs don't have heart rate? Did she stop using the watch with optical HR, or did she turn it off?
Best tool for this is probably Principal Component Analysis (PCA) of her stride variations over several splits. The leading eigenvectors (the principal components) will represent something of a signature that should be similar across days.
Or we can just assume since she totally thrashed her knee, she ran a helluvalotta miles on it. Seems the most likely explanation to me.

Here's the thing though. There is absolutely no reason at this point to not release video of her running at a reasonable pace as she did for the first half. She ran many miles early on, some up significant elevation, and at considerable speed -- all while being filmed. Why on earth would they not release some of this footage? It would have rendered this entire discussion moot.

Mimi is clearly proud of her accomplishments, we know there was a film crew there, and the Strava data says she was kicking ass. So what gives? There is only one reason I can think of for them not releasing video. The defensiveness of the team and the absurd claims that we're asking for video footage of every single mile run further suggest to me that something is amiss. I don't think we're being unreasonable. If we had some decent video, none of this extended analysis of cadence and watches would even be necessary. At this point, I don't care as much since we know she won't get the record. However, as someone who was convinced of her legitimacy, I can no longer maintain that position nor defend her. It just does not add up.

7 years ago 10/25/2017 4:25pm EDT

re: Results

And I do hope she heals up well, is enjoying NYC, and can move on with her life. I'm not one of those people who think cheaters deserve to suffer for the rest of their lives. I simply value the simplicity and transparency of the sport. There is very little in life that is as pure as running.

7 years ago 10/30/2017 10:06am EDT

re: Results

Scam did you ever finish up your written report? I'm looking forward to reading it. Maybe someday a documentary will be released too.

I'm still curious about this attempt. It's interesting to read that Sandy stops 100 times a day or thereabouts. I was confused why Mimi was doing that (it seemed inefficient) but I suppose that is just the way these runs are properly done.

7 years ago 10/30/2017 10:46am EDT

re: Results

I didn't get it written up since I've been analyzing some of the details and reviewing some of the short videos. The improvements I've made to my scripts will allow me to do a better, faster analysis in any future cases. I'm implementing some interpolation for smoothing data that has variable time intervals so I can do a proper output for Sandra. I also want to figure out why my elevation gains for Sandra make sense but seem way too high for Mimi even though both are calculated the same way. I suspect it is due to the poor-resolution elevation data coming from Strava's database, combined with there being more sample points in Mimi's data than in Sandra's.
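A minimal Python sketch of both ideas, under stated assumptions: these are not the actual scripts, the function names are made up, the sample numbers are invented, and the 2 m threshold is an arbitrary illustrative choice. The first function puts irregularly timed samples on a regular grid by linear interpolation; the second sums climbs and shows how dense, noisy samples inflate the total unless small wiggles are ignored.

```python
def resample_linear(times, values, step=1.0):
    """Linearly interpolate samples onto a regular grid.

    Assumes `times` is strictly increasing and has at least two entries.
    """
    out = []
    count = int((times[-1] - times[0]) // step)
    i = 0
    for k in range(count + 1):
        t = times[0] + k * step
        while times[i + 1] < t:
            i += 1
        frac = (t - times[i]) / (times[i + 1] - times[i])
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out


def elevation_gain(elev, threshold=0.0):
    """Total climb, ignoring moves smaller than `threshold` metres.

    With threshold 0 this is simply the sum of all positive differences.
    """
    gain = 0.0
    ref = elev[0]
    for e in elev[1:]:
        delta = e - ref
        if delta >= threshold:
            gain += delta
            ref = e
        elif delta <= -threshold:
            ref = e
    return gain


# Irregular samples (seconds, metres) resampled to one point per second.
grid = resample_linear([0, 3, 4, 8], [100, 103, 103, 99])
print(grid)

# A flat course whose elevation source flips between 100 m and 101 m.
dense = [100, 101, 100, 101, 100, 101, 100]
sparse = dense[::3]  # the same course recorded with fewer points

print(elevation_gain(dense))                 # 3.0
print(elevation_gain(sparse))                # 1.0
print(elevation_gain(dense, threshold=2.0))  # 0.0
```

The course is flat in both cases, yet the denser recording reports three times the climb, which is consistent with the suspicion about sample counts above.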

I'm more confused than anything at this point. It seems like the Strava data and that short montage video from Scrumptious do provide support that Mimi did her runs and was doing 190 spm (at least in those short videos). But then why did everything go so horribly wrong with a highly experienced crew that has multiple world records and many ultras under their belt?

7 years ago 10/30/2017 10:54am EDT

re: scam_watcheroo

scam_watcheroo wrote:
I didn't get it written up since I've been analyzing some of the details and reviewing some of the short videos. The improvements I've made to my scripts will allow me to do a better, faster analysis in any future cases. I'm implementing some interpolation for smoothing data that has variable time intervals so I can do a proper output for Sandra. I also want to figure out why my elevation gains for Sandra make sense but seem way too high for Mimi even though both are calculated the same way. I suspect it is due to the poor-resolution elevation data coming from Strava's database, combined with there being more sample points in Mimi's data than in Sandra's.
I'm more confused than anything at this point. It seems like the Strava data and that short montage video from Scrumptious do provide support that Mimi did her runs and was doing 190 spm (at least in those short videos). But then why did everything go so horribly wrong with a highly experienced crew that has multiple world records and many ultras under their belt?

Well to me, if you believe she ran the miles she claims, then it's easy to explain why everything went so horribly wrong. As we know, her knee was already damaged at the start. She ran to the point where it likely will never even be that good again.
