Is anyone using big data to evaluate best practices for running, training or racing?

Training Advice/Discussion

Page 1 of 2

6 years ago 04/23/2018 12:00pm EDT

It seems as though running as a sport is riddled with quantitative data from thousands of practices and races per day. Will finding methods to compile and analyze this data be worthwhile?

6 years ago 04/23/2018 12:05pm EDT

re: Datum

I question how much training is really quantified. Even some of the quantification, like heart rate, is problematic. So much can influence HR that it is a dicey metric to use.

I am not confident power will be useful for running either.

Pace? Well that has problems too (wind, hills, change in temperature, etc).

6 years ago 04/23/2018 12:07pm EDT

re: Luv2Run

Luv2Run wrote:
I question how much training is really quantified. Even some of the quantification, like heart rate, is problematic. So much can influence HR that it is a dicey metric to use.
I am not confident power will be useful for running either.
Pace? Well that has problems too (wind, hills, change in temperature, etc).

A sophisticated enough model could take this into consideration. Whether or not anything useful results is another question.

6 years ago 04/23/2018 1:27pm EDT

re: Apophis99

I don't know for a fact, but I would suspect that they do. Big data is big money. Look at Facebook/Google: for the first few years, you would barely know they were collecting all of your data. Now they own a recording of your entire life and they sell it 500 times a day.

Strava/Garmin/whoever else are no different. They're not going to store all that data for you if they can't monetize it, and definitely not for free. With sites like that, you're not the customer, you're the product.

I just read the Strava terms of service, and the provision is in there.

"You grant us a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any Content that you post on or in connection with the Services."

6 years ago 04/23/2018 3:33pm EDT

re: hank jr

I just read the Strava terms of service, and the provision is in there.

okay, let's assume you're right.

what are they going to do with it?

suppose Strava and Garmin and all the other folks sold all their data to Nike, for example. what exactly can Nike do with that data that makes them money?

genuine question.

cheers.

6 years ago 04/23/2018 3:35pm EDT

re: Cottonshirt

Cottonshirt wrote:
I just read the Strava terms of service, and the provision is in there.
okay, let's assume you're right.
what are they going to do with it?
suppose Strava and Garmin and all the other folks sold all their data to Nike, for example. what exactly can Nike do with that data that makes them money?
genuine question.
cheers.

Knowing where to advertise is just one example.

6 years ago 04/23/2018 3:37pm EDT

re: Luv2Run

Luv2Run wrote:
I question how much training is really quantified. Even some of the quantification, like heart rate, is problematic. So much can influence HR that it is a dicey metric to use.
I am not confident power will be useful for running either.
Pace? Well that has problems too (wind, hills, change in temperature, etc).

You really don't know how data models work, do you?

6 years ago 04/23/2018 3:39pm EDT

re: Datum

Datum wrote:
It seems as though running as a sport is riddled with quantitative data from thousands of practices and races per day.

How would a 5:30 runner benefit from the run a 10:00 hobbyjogger did? Lots of data out there. Not all of it is useful in simple ways.

6 years ago 04/23/2018 3:44pm EDT

re: hank jr

hank jr wrote:
I don't know for a fact, but I would suspect that they do. Big data is big money. Look at Facebook/Google: for the first few years, you would barely know they were collecting all of your data. Now they own a recording of your entire life and they sell it 500 times a day.
Strava/Garmin/whoever else are no different. They're not going to store all that data for you if they can't monetize it, and definitely not for free. With sites like that, you're not the customer, you're the product.
I just read the Strava terms of service, and the provision is in there.
"You grant us a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any Content that you post on or in connection with the Services."

Collecting data to sell widgets isn’t the same as being able to analyze/correlate/apply it to develop a “perfect” training plan. Harder still would be tailoring it to a different athlete. Doing so would be more work and pay less than pimping out fake Strava runs to sell Hokas.

6 years ago 04/23/2018 4:34pm EDT

re: Apophis99

I expect that the sports science community will eventually start using things like Strava data. What passes for science at the moment (6 week studies of small, heterogeneous populations) isn't particularly useful for devising a complete approach to training. Looking at what real runners are doing and achieving could potentially be much more powerful.

As for the comment about 10:00 milers not being useful for helping faster runners train, that's exactly why using big data is so valuable. With a few keystrokes, you could pull together a dataset of, for example, runners in their early 30s who had run, by the time they were 24, 5k PRs between 15:00 and 15:30 and who had annual mileage totals of 3000-4000. You could then look at how their training and results diverged in the following decade. I'm just making up some numbers, but my point is that when you analyze training, there are far too many variables to ever control in a lab setting. Strava and Garmin have so much data, however, that you can control for a lot more.
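To make the "few keystrokes" idea concrete, here is a minimal Python sketch of that kind of cohort filter. Everything in it is an assumption for illustration: the `Runner` fields, the 30-34 band standing in for "early 30s", and the sample records are made up and do not reflect any real Strava or Garmin schema.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    # Hypothetical fields; a real training-log export would look different.
    name: str
    age: int
    best_5k_by_24_s: int  # best 5k run by age 24, in seconds
    annual_miles: int


def select_cohort(runners, min_age=30, max_age=34,
                  pr_lo_s=15 * 60, pr_hi_s=15 * 60 + 30,
                  miles_lo=3000, miles_hi=4000):
    """Keep runners matching the age band, 5k PR window, and mileage range."""
    return [
        r for r in runners
        if min_age <= r.age <= max_age
        and pr_lo_s <= r.best_5k_by_24_s <= pr_hi_s
        and miles_lo <= r.annual_miles <= miles_hi
    ]


# Made-up sample records.
runners = [
    Runner("a", 31, 905, 3500),   # 15:05, in range
    Runner("b", 33, 925, 3200),   # 15:25, in range
    Runner("c", 32, 1100, 3600),  # 18:20, PR too slow
    Runner("d", 45, 910, 3900),   # too old for the band
    Runner("e", 30, 915, 1500),   # mileage too low
]

cohort = select_cohort(runners)
print([r.name for r in cohort])
```

The filter is the easy part; following what the selected runners did over the next decade is where the messy data becomes the real problem.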

6 years ago 04/23/2018 4:47pm EDT

re: Apophis99

I built the 5k prediction model on Run Augur using training log data I collected. When I originally set out to collect training log data, I had a much grander vision: to analyze the data and build a tool or model that could help a runner improve their training, sort of like what the OP is suggesting. The training log data I have access to is incredibly noisy and messy. I gathered data for thousands of athletes, but lots of the data turned out to be not super useful for a number of reasons. What I was left with was a moderate-sized dataset but not a "big" dataset. Because of this, I chose to answer a silly and simple question: correlating interval workouts with 5k performances. The problem is not super complex, but creating the dataset for even this simple question involved a lot of headaches because of the idiosyncratic way that people record their running data. I'd love to do more with the data I have on hand but haven't had the time to dive back into it.

Getting to the point: I think that if you had access to the entirety of the Strava dataset, you could start to cut through the noise and build some pretty cool tools. I don't think you're going to create some magical tool that creates the perfect training plan, but you could essentially take the traditional online training log and build features and tools on top of that to provide feedback. From a research perspective, you could use the dataset to test common training theories with real-world data. For example, you might analyze whether increasing mileage by 5% is the best threshold, test whether continuous tempos are better than cruise intervals, or explore factors associated with injury risk.
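As a small illustration of the first step in testing a mileage-increase rule, here is a Python sketch that computes week-over-week percentage changes and flags the weeks that exceed a threshold. The function names, the 5% default, and the weekly totals are all hypothetical; linking the flagged weeks to later performance or injury would be a separate and much harder analysis.

```python
def weekly_increases(weekly_miles):
    """Percent change from each week to the next; None when the prior week was 0."""
    changes = []
    for prev, cur in zip(weekly_miles, weekly_miles[1:]):
        if prev == 0:
            changes.append(None)  # percent change is undefined after a zero week
        else:
            changes.append((cur - prev) / prev * 100)
    return changes


def weeks_over_threshold(weekly_miles, threshold_pct=5.0):
    """Indices of weeks whose mileage rose more than threshold_pct over the prior week."""
    return [
        i + 1
        for i, pct in enumerate(weekly_increases(weekly_miles))
        if pct is not None and pct > threshold_pct
    ]


# Made-up weekly totals, including a week off.
weeks = [40, 41, 41, 50, 0, 30]
print(weeks_over_threshold(weeks))
```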

Note: I'm not trying to shamelessly plug my website but wanted to share my experiences working with running data. Cheers!

http://www.runaugur.com/

6 years ago 04/23/2018 5:30pm EDT

re: 800 dude

800 dude wrote:
I expect that the sports science community will eventually start using things like Strava data. What passes for science at the moment (6 week studies of small, heterogeneous populations) isn't particularly useful for devising a complete approach to training. Looking at what real runners are doing and achieving could potentially be much more powerful.
As for the comment about 10:00 milers not being useful for helping faster runners train, that's exactly why using big data is so valuable. With a few keystrokes, you could pull together a dataset of, for example, runners in their early 30s who had run, by the time they were 24, 5k PRs between 15:00 and 15:30 and who had annual mileage totals of 3000-4000. You could then look at how their training and results diverged in the following decade. I'm just making up some numbers, but my point is that when you analyze training, there are far too many variables to ever control in a lab setting. Strava and Garmin have so much data, however, that you can control for a lot more.

I agree with most of your statement. but let's not forget the advantage of experiments: observational data (in this case, the Strava data) has the disadvantage of making it very hard, if not impossible, to distinguish correlation from causation. in an experiment, you can randomly assign different training plans to runners. in the Strava data, runner A might be doing plan X and runner B plan Y. now if runner A is performing better, we don't know if it's because plan X works better or because runner A is just a better runner. sure, some things like weight might be in the data and you can control for them, but you never know what is missing that could be important (some things, like personality traits, are very hard to measure). in a randomized experiment, you don't have this problem because, well, the training plan is assigned randomly.
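A toy simulation can show the problem. In the Python sketch below, both plans have exactly zero real effect, but an unmeasured "talent" variable drives both the result and the choice of plan. All names and numbers are invented for illustration; this is not a model of real training data.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Compare plan X vs plan Y when the plan itself has no effect at all."""
    rng = random.Random(seed)
    observed = {"X": [], "Y": []}    # runners pick their own plan
    randomized = {"X": [], "Y": []}  # plan assigned by coin flip
    for _ in range(n):
        talent = rng.gauss(0, 1)           # hidden, never recorded in the log
        result = talent + rng.gauss(0, 1)  # the plan contributes nothing
        # Self-selection: more talented runners drift towards plan X.
        chosen = "X" if talent + rng.gauss(0, 1) > 0 else "Y"
        observed[chosen].append(result)
        # Random assignment ignores talent entirely.
        assigned = "X" if rng.random() < 0.5 else "Y"
        randomized[assigned].append(result)
    obs_gap = mean(observed["X"]) - mean(observed["Y"])
    rct_gap = mean(randomized["X"]) - mean(randomized["Y"])
    return obs_gap, rct_gap


obs_gap, rct_gap = simulate()
print(f"observational gap: {obs_gap:.2f}, randomized gap: {rct_gap:.2f}")
```

In the self-selected groups plan X looks clearly better even though it does nothing, while the randomized comparison stays close to zero.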

6 years ago 04/23/2018 5:55pm EDT

re: Datum

Check out the book called The Secret of Running and their website. Also, the Stryd and "run with power" community has some decent quantitative tools.

6 years ago 04/23/2018 7:40pm EDT

re: Datum

Good question.

At least in public (no idea what happens at places like NOP), coaching and "science" seem to be dominated by a few names and received wisdom based on what "works." Watching March Madness, every 5 minutes the Google Cloud big data commercial was on, telling us about all of these correlations they were going to find. Baseball, of course, has been obsessed with stats for every situation. On the other hand, I've seen very limited quantitative data regarding running. The sport seems relatively small-time (you can publish a Master's thesis, for example), with a mix of tradition and fads.

6 years ago 04/23/2018 9:06pm EDT

re: CKidd

Run then recover

Keep repeating.....

6 years ago 04/23/2018 9:37pm EDT

re: try this?

try this? wrote:
Check out the book called The Secret of Running and their website. Also, the Stryd and "run with power" community has some decent quantitative tools.

https://www.outsideonline.com/2276656/what-running-power-anyway

TL;DC ('click'). Stryd is making advances, but it is still in the territory of smoke and mirrors. Is it really feasible to do for running what happened in cycling?

6 years ago 04/23/2018 10:09pm EDT

re: Run Augur

You and 800 guy have the right idea on this. I dabble in this sort of stuff as a quasi-hobby, but wouldn’t have time to do anything meaningful with it as it’s not my profession. Imagine having funding to apply a real team of people on this.

All I can say is self-entered data would really ruin your models. I think you’d need to restrict what you draw on to true data that hasn’t been edited.

If you had access to what I’ve seen on RunningAhead you’d probably get a better sample. The runners I keep in touch with span the gamut of skill, commitment and ability. I know the noise is minimized there.

6 years ago 04/24/2018 7:34am EDT

re: pop_pop!_v2.2.1

pop_pop!_v2.2.1 wrote:
Datum wrote:
It seems as though running as a sport is riddled with quantitative data from thousands of practices and races per day.
How would a 5:30 runner benefit from the run a 10:00 hobbyjogger did? Lots of data out there. Not all of it is useful in simple ways.

They may not, but the 10 minute runner can benefit a lot from knowing what a 5:30 runner does.

I honestly think that running is far simpler than we make it out to be. We may not get anything useful out of it, but I bet they will try, and we can certainly expect them to try to monetize it; that's the whole point of their business.

6 years ago 04/24/2018 7:42am EDT

re: Apophis99

Apophis99 wrote:
hank jr wrote:
I don't know for a fact, but I would suspect that they do. Big data is big money. Look at Facebook/Google: for the first few years, you would barely know they were collecting all of your data. Now they own a recording of your entire life and they sell it 500 times a day.
Strava/Garmin/whoever else are no different. They're not going to store all that data for you if they can't monetize it, and definitely not for free. With sites like that, you're not the customer, you're the product.
I just read the Strava terms of service, and the provision is in there.
"You grant us a non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any Content that you post on or in connection with the Services."
Collecting data to sell widgets isn’t the same as being able to analyze/correlate/apply it to develop a “perfect” training plan. Harder still would be tailoring it to a different athlete. Doing so would be more work and pay less than pimping out fake Strava runs to sell Hokas.

Agree completely. The data will be used to make money first, and any training advances will be secondary. I do think they will at least try though, if they can sell the results.

6 years ago 04/24/2018 8:52am EDT

re: Apophis99

Apophis99 wrote:
You and 800 guy have the right idea on this. I dabble in this sort of stuff as a quasi-hobby, but wouldn’t have time to do anything meaningful with it as it’s not my profession. Imagine having funding to apply a real team of people on this.
All I can say is self-entered data would really ruin your models. I think you’d need to restrict what you draw on to true data that hasn’t been edited.
If you had access to what I’ve seen on RunningAhead you’d probably get a better sample. The runners I keep in touch with span the gamut of skill, commitment and ability. I know the noise is minimized there.

This is exactly right. I started with very unstructured data where all information was recorded in a general comment field. That was a nightmare to clean and work with. A lot of work was needed upfront with not enough return. I later moved to a set of data that at least had structured fields. That was easier to parse and clean. Daily mileage and paces were especially easy to work with. Intervals were slightly challenging, but mostly with respect to measuring/quantifying the rest periods. What I found to be the least reliable aspect of running log data was the longitudinal sample. For example, you can't distinguish between a 0 mileage day and a day with missing information. Many people are inconsistent in how religiously they update their logs. People also tend to disappear and reappear months or years later. It's hard to know if those periods without information are down times/breaks or simply times where they stopped recording their runs.

All the issues aside, I do think there is far more potential in running log data than in experiments. I do not have a degree in exercise physiology, but in the handful of papers I've reviewed, the sample size of the experiment ranges from 10 to 50 athletes. That setting is great for precisely answering very specific questions, but the results generally lack external validity.
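One partial workaround can be sketched in Python, assuming the log stores an explicit 0 for a logged rest day and simply has no entry for an unlogged day (many logs do not make that distinction, which is the whole problem). The function names, the 14-day cutoff, and the sample log are arbitrary choices for illustration, and a flagged gap still cannot tell you whether the runner took a break or just stopped logging.

```python
from datetime import date, timedelta


def daily_series(log):
    """Expand a {date: miles} log into one (day, miles or None) row per calendar day."""
    days = sorted(log)
    series = []
    day = days[0]
    while day <= days[-1]:
        series.append((day, log.get(day)))  # None marks a day with no entry at all
        day += timedelta(days=1)
    return series


def long_gaps(series, min_days=14):
    """Return (first_missing_day, last_missing_day, length) for long runs of missing days."""
    gaps = []
    start = None
    for day, miles in series:
        if miles is None:
            if start is None:
                start = day
        elif start is not None:
            length = (day - start).days
            if length >= min_days:
                gaps.append((start, day - timedelta(days=1), length))
            start = None
    return gaps


# Made-up log: Jan 2 is an explicit rest day, Jan 3 was never logged.
log = {
    date(2018, 1, 1): 5.0,
    date(2018, 1, 2): 0.0,
    date(2018, 1, 4): 8.0,
    date(2018, 2, 1): 6.0,
}
series = daily_series(log)
print(long_gaps(series))
```

Short gaps are left alone here; whether to treat them as rest days or drop them is a judgment call that depends on the question being asked.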
