Hi Duncan,
>Compare to RMSE of 0.306 for LMC
you mean 0.0306?
(referring to table 2, bottom right?)
Dear ventolin,
> eh ???
> i said that looks wrong for algorithm, but i'm maths-interest not code, but try to force myself to learn some chi-square
Please stop trolling Duncan, he didn't do anything to you.
Regarding chi-square, I assume you mean Pearson's chi-squared test?
The Wikipedia page on that is very good, but here is how I often explain it intuitively to students (here used as a test for independence):
if you have a table of counts, say

              therapy   placebo
cured            65        42
not cured        21        30

(statisticians call this a contingency table)
the chi-squared test tells you whether it is plausible to believe that the proportions in the rows/columns are different - in the example, whether the proposed therapy cures a higher fraction of patients than the placebo.
Just looking at the fractions and comparing them is not enough, since the therapy, if ineffective, could by pure randomness cure a higher or lower fraction of people than the placebo.
Pearson's idea is to look at a number which quantifies the imbalance between rows/columns, the chi-squared statistic.
If therapy and placebo were no different, then this number would stay small in most cases. So if it is big, you can argue there is imbalance and thus difference.
In the example, the chi-squared statistic which measures imbalance is 4.6 - you would get a larger imbalance only in around 3 out of 100 cases for the same number of patients, if therapy and placebo were similarly effective.
With the kind of inductive reasoning which is called the Pearson-Neyman paradigm, you may conclude that therapy is different from placebo since this is a small number.
The math is on Wikipedia - as for the coding, which you seem to be less interested in: in the free statistics software R, type
exampletable = matrix(c(65,21,42,30),nrow = 2)
chisq.test(exampletable)
which will give you the numbers above. X-squared is the quantifier of imbalance, p is the "3 out of 100" above (given as a fraction).
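For anyone without R at hand, the same numbers can be reproduced by hand. The following is a minimal Python sketch, not the R implementation: the function name chi_squared_2x2 is made up for illustration, and it assumes (as R's chisq.test does by default for 2x2 tables) that Yates' continuity correction is applied, which is what gives 4.6 rather than the uncorrected 5.3.

```python
import math

# Counts from the example table: rows = cured / not cured, columns = therapy / placebo.
table = [[65, 42],
         [21, 30]]


def chi_squared_2x2(table, yates=True):
    """Pearson chi-squared statistic and p-value for a 2x2 table of counts.

    Hypothetical helper for illustration. With yates=True, each |observed - expected|
    is reduced by 0.5 (continuity correction) before squaring.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    # For one degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = chi_squared_2x2(table)
print(f"X-squared = {stat:.2f}, p = {p_value:.3f}")
```

The statistic is the "imbalance" number and the p-value is the "3 out of 100" from the explanation above.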
Sorry, it's called the Neyman-Pearson paradigm.
When you go deep into advanced statistics, honestly, I am no longer able to follow.
But training means working with the athletes, knowing their lives, seeing the environment where they train and live, knowing the small injuries they may have, and seeing how strong their group is. NONE of these points can be included in any algorithm, for the simple reason that the statistician or mathematician making the calculation doesn't know them.
So I don't need any algorithm to tell you that Kimetto was very far from being able to run 2:01:30, even if, of course, he could have run A LITTLE BIT FASTER than his WR.
Looking at the last WRs, every time we can see room for improvement if....
Makau ran 2 km in under 5'30" (if I remember well, between km 26 and 28) to attack Gebre, and this is not the best way to run a WR, so with even pace his 2:03:38 could probably have been something very close to 2:03.
Wilson Kipsang bettered this record in 2013, but the weather conditions were worse than two years before (Florence Kiplagat won in both 2011 and 2013, and she told me 2011 was very much better, and I myself noticed the difference just walking from the hotel to the finish of the marathon), so arguably he could already have run under 2:03.
However, we can NEVER say that better conditions can give an improvement of 1% on a WR in long distances.
You must understand that long distances can NEVER gain a specific advantage from external factors; the only advantage is... not to have a disadvantage! So perfect weather conditions and perfect pace are not advantages, but the normal situation we look for when we attempt a WR.
In technical events, instead, we can find some advantage at the limit of the rules: for example, 2 m/s of tail wind for sprints and horizontal jumps, or any type of wind for throws (I was in Neubrandenburg in 1988 when Reinsch bettered the discus WR, and there was a very strong headwind, helping the performance by not less than 5 meters), so sometimes a WR can fall outside every statistical calculation (Beamon's 8.90 in Mexico, for example).
And in this case physiology has an important role, and it doesn't respond to statistical parameters.
For example, are we sure that a totally even pace can produce better performances than a correct "negative split"? Most of the records in long distances (athletics and swimming) were set with negative splits (the records of Bekele in the 5000m and 10000m, Shaheen in the steeplechase, Tadese in the HM, Kimetto and Wilson Kipsang in the marathon, Florence Kiplagat in the HM, Paula Radcliffe in the marathon, Wang Junxia in the 3000m and 10000m, Dibaba in the 5000m, all the records in swimming): why do we have to think that even pace can produce better results? And if we want to support this idea, is it not OPPOSITE to the statistical analysis, when we see that 90% of the best performances in long distances are run with negative splits (the last 5000m of Ayana in Shanghai is another clear demonstration)?
About the marathon, again I explain that this is an event OUT of the range of the other events, because it is the only one run UNDER THE THRESHOLD LEVEL.
For this reason we can find a trend for calculating the potential of athletes from 800m to HM, but we have to exclude the marathon, because the physiological requirements are different.
Using statistics to reach conclusions about physiology and methodology is very dangerous, and most of the time it produces false results.
Many years ago, for example, a group of mathematicians analyzed the average age of all the Olympic finalists in the men's 100m and 200m from 1948 to 1972 and found it was about 21 years (I don't remember exactly now). Their conclusion was that, after that age, the specific qualities for sprinting start to decline.
Of course, they didn't know anything about athletics history, otherwise they would have known, for example, that the 3rd runner for the US in the 4x100m relay at the Munich Olympics was Mel Pender, already 37 years old.
The real reason the average was 21 was that in all those years athletics was a sport without money: the best sprinters were all students at American universities and, on finishing their studies, they had to quit athletics to find a job. Now athletics is a profession, and the best sprinters reach their peak at around 30.
Another example is a group of mathematicians who studied the evolution of the men's and women's marathon, comparing the two trends, and arrived at the conclusion that, around the period we live in, women would overtake men. This was because they didn't consider that, in 1988, the men's marathon already had a long history, while the women's was at the beginning of its evolution, and this fact could produce faster improvements.
An improvement of 1% in athletics, when we speak about WRs or, in any case, top performances, is very much: it means, for example, moving from 6 meters in the pole vault to 6.06, or in the 1500m from 3'26" to 3'24", or in the 10000m from 26'17" to 26', and it is really very rare that this can happen.
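The 1% figures quoted here are easy to check. A minimal Python sketch, with helper names (improve_time, improve_mark, mmss) made up for illustration: for a running event a 1% improvement shortens the time, for a jump it raises the mark.

```python
def improve_time(seconds, fraction=0.01):
    """Time after improving by the given fraction (faster means a smaller time)."""
    return seconds * (1.0 - fraction)


def improve_mark(metres, fraction=0.01):
    """Height or distance after improving by the given fraction (better means larger)."""
    return metres * (1.0 + fraction)


def mmss(seconds):
    """Format seconds in the minutes'seconds" notation used in the post."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}'{secs:05.2f}\""


# Starting marks taken from the post: 6.00 m, 3'26" (206 s), 26'17" (1577 s).
print("pole vault:", f"{improve_mark(6.00):.2f} m")
print("1500m:     ", mmss(improve_time(206.0)))
print("10000m:    ", mmss(improve_time(1577.0)))
```

The exact values come out a touch different from the rounded ones in the post (just under 3'24" and just over 26'01"), which does not change the argument.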
With statistics, the range of validity is much wider than that, which makes it useless for any record where there are precise measurements (swimming too).
Today Paul Tergat, who shocked the world by running under 2:05 for the first time in 2003, is no longer among the best 30 performers of all time, and his PB is in the range of 1% from the current WR.
This is not only a limit of statistics, but also a limit of the studies on physiology. We can't find ANY study on top athletes and the effects of their training, so we use data coming from completely different subjects, extending their validity to the best in the world.
This is a mistake, because the best are not part of the same family: to become top champions, they already have greater natural qualities, they train much harder because they can gain advantages from their performances that normal people can't, and they organize their lives around sport in a way normal people can't: so all the preliminary conditions are different.
This is also the reason for the widespread idea that all the best are doped: BECAUSE THERE IS NO REAL RESEARCH ON THE EFFECTS OF TRAINING ON THE BEST ATHLETES. And if we don't know the effects of training, how can we know the combined effect of training and doping?
Dear Renato,
> For this reason, while I think it possible to identify the theoretical possibilities of an athlete (always becoming reality only with proper training)
Yes, most definitely, I agree. As one would assume that all athletes in the database have received proper training (in whichever sense of the word), any statements derived from it will hold only for other athletes who have received proper training.
No reliable statements can be made about athletes who receive a different kind of training.
> I don't think it is possible to find any formula for predicting the potential in the marathon, because this event requires some specific physiological quality NOT required by any other event in athletics.
I think it is possible, sorry to contradict here. I conclude this from two things:
(1) there is an algorithm that actually makes predictions which are on average only 3% off (= the Purdy scheme), this is better than guessing (our table 6).
This is statistical evidence in the quantitative sense, derived by inductive reasoning using the statistical syllogism.
(2) You say that special qualities separate a marathoner from a short-distance runner (Renato: their max speed is very different and, on the other side, their cost of running changes on an individual basis when the distance becomes longer).
These qualities become apparent on distances different from the Marathon (Renato: In the middle, these two athletes can run the same time, but their correlation with longer or shorter distances doesn't follow the same trend).
Therefore, by observing performances on distances different from the Marathon, one can make statements about the Marathon qualities.
Thus, predicting Marathon performance from other performances (in the sense of making plausible statements) is possible.
This is evidence in the qualitative sense, derived by deductive reasoning from your expert knowledge.
Please correct me if you think I got either statement or the conclusion wrong.
> This is a mistake, because the best are not part of the same family: to become top champions, they already have greater natural qualities, they train much harder because they can gain advantages from their performances that normal people can't, and they organize their lives around sport in a way normal people can't: so all the preliminary conditions are different.
Agreed - it may be problematic to generalize from the more normal athletes to the world elite, I did acknowledge that a couple of times.
(an express cautionary statement of this kind is now in the working version of our paper)
The correlation matrices (though not very sophisticated) I showed indeed seem to tell you that something interesting may be going on when you approach the top, and things start to behave differently. I am not sure how or whether models like ours are capable of picking up the subtleties; one would just need more data to see what happens.
Anyway, the preliminary conclusion I would draw from this is that systematic research on the high end may prove quite interesting.
Since you know something about probability, please tell me why you agree with a formula that predicts someone shaving a kilometer off the WR marathon time?
Is that "probable", or is it not fully explaining the entire situation? That is where the debate starts. There is at least empirical evidence of what a 26:22/12:39 athlete can produce in the marathon.
How do you test the probability of something that has never happened? There is no valid way except to debate what we know. Excuse my lack of knowledge of a complex mathematical formula built on 150,000 hobby joggers, but I am interested in talking about Bekele.
And my point about the 3 hour guy - I am saying there's data that supports what a 5k or 10k should be for a 3 hour guy. There is zero data on what it takes to run 2:00:32.
The prediction is just that, a prediction. 2:00:32 is far out at the moment. With respect to the two organizers of the study, is it worth considering weighting the performances of homogeneous subjects more heavily?
Dear Franz,
maybe it's possible, but I don't know in which way.
From our observation, for example, we often have athletes with similar morphology, running the same times in the 1500m, 5000m, 10000m and HM, and when they move to the marathon the difference becomes very large (for example, more than 5 minutes, 2:05 against 2:10 or 2:12).
And they have the same age, the same athletic history, the same coach.
So the question is: WHY?
Because there is a set of qualities at the base of every performance with some common denominator, and some other quality outside the set.
We must think of endurance athletes as working with two different engines:
1) Biomechanical engine (and it's easy to compare this type of engine in two different athletes)
2) Bioenergetic engine (and this is more difficult to compare).
The bioenergetic engine works in different ways depending on whether we look at performances where the lactic component is more evident, or where it is very small (or doesn't exist, as in the 100 km race).
So we say that, STARTING FROM SHORTER DISTANCES, when we want to project the value of an athlete to longer (in this case, the LONGEST) distances, one of the points is the ability to adapt the bioenergetic engine to a different energy requirement (1), and to adapt the biomechanical engine to the correct technical action, in order to lower the cost of running.
From this point of view, Eliud Kipchoge is a good example: his technical action is perfect for running the marathon, using the right combination of stride length and frequency. This is not the same for Kenenisa, whose strides are too long (compared with the frequency), or for Mo Farah (for the same reason).
This is one of the reasons why an athlete who is very fast over short distances needs more time to reach a performance of similar level in the marathon.
So we can say that:
1) A high percentage of fast runners over 5000m / 10000m DON'T HAVE the bioenergetic qualities to run the marathon at the same level as their shorter distances (so they are not predictable with any algorithm based on their PBs at shorter distances, even looking at their future career, because they can't change their physiology)
2) A high percentage of fast runners over 5000m / 10000m DON'T HAVE the biomechanical adaptation for the right technical action at the speed and duration of a marathon (so they are not predictable for SHORT-TERM performances in the marathon, but in some cases they can learn to change their running technique, and in that case we can find correlations between what they were able to do over short distances and their marathon potential, 5-6 years apart).
Renato Canova wrote:
This is also the reason for the widespread idea that all the best are doped: BECAUSE THERE IS NO REAL RESEARCH ON THE EFFECTS OF TRAINING ON THE BEST ATHLETES. And if we don't know the effects of training, how can we know the combined effect of training and doping?
Also why none of the drug tests are reliable.
Ok. We both agree that he could run faster.
I think he could have run between 2:01:30 - 2:01:55. What do you think he could have run that day?
I looked at the splits here:
http://www.letsrun.com/news/2014/09/20300-barrier-marathon-gone-dennis-kimetto-runs-20257-break-world-record-berlin/. It says that one of the kilometers was run in 2:39! One of Kimetto's 5k splits was 14:09! Just based on intuition alone, I'd say that is worth at least 30 seconds.
Of course, this is all assuming that those leader splits are the same as Kimetto's. I did not watch the race. Was he with the leaders for the entire race?
I agree. That is why the algorithm does not use "statistics" or "mathematics". Instead, it is based on physiology and physics.
If Kimetto, Emmanuel Mutai and the other runners had really run the km between 21 and 22 in 2:39, this could have led to a final difference of 1 minute compared with the real time. But this NEVER happened: in the Letsrun table with all the splits, the time you read at 22 km (2:39) comes on top of the time between 21 km and the HM mark: the 2:39 split refers to a distance of about 900m (the distance between the HM mark and the next km mark), so the real km pace was about 2:56, perfectly in line with the pace used until 30 km.
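The arithmetic behind this correction can be sketched in a few lines of Python. This is only a check under stated assumptions: the half-marathon mark is taken at the standard 21.0975 km, so the segment length is derived rather than taken from the post and the figures may differ by a second or two from those quoted; the variable names are made up for illustration.

```python
HALF_MARATHON_KM = 21.0975          # standard half-marathon distance (assumption of this sketch)
NEXT_KM_MARK = 22.0

split_seconds = 2 * 60 + 39         # the 2:39 split shown at 22 km in the table

# The split covers only the stretch from the HM mark to the 22 km mark.
segment_km = NEXT_KM_MARK - HALF_MARATHON_KM

# Equivalent pace for a full kilometre at the same speed.
pace_seconds_per_km = split_seconds / segment_km

minutes, seconds = divmod(round(pace_seconds_per_km), 60)
print(f"segment: {segment_km * 1000:.1f} m, equivalent pace: {minutes}:{seconds:02d} per km")
```

Either way the point stands: the 2:39 is not a full kilometre, and the equivalent pace is in the mid 2:50s, not a surge.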
I repeat: with everything perfect, Kimetto could have run maybe 10 seconds faster. Weather conditions were perfect, the pace was optimal, and he had a strong competitor (Emmanuel Mutai) until the end. I don't see any big room for improvement, as I saw, for example, in the two previous WRs of Patrick Makau and Wilson Kipsang.
eh ???
what is mathematically impossible ???
why ???
if an elite guy like souleiman, who looked in 1'43-flat shape in doha & earlier in year ran a tactical 13'20, probably worth close to 13'00 in a tt, wants to know what his potential 1500 is now, are you going to tell him
" no can do
it's mathematically impossible !!!"
offer your 2 point formula
Sorry, I did indeed mean 0.0306.
Dear Ventolin,
I must admit that I'm slightly in the dark as to your two point method.
By two point I understand that you take 2 times and predict a third.
Is this correct?
Also, you are claiming that the formula:
Tnew = (T1/D1+ (T2/D2 - T1/D1)*log(Dnew/D1)/log(D2/D1))*Dnew;
is not correct?
My point about "mathematical possibility" is that it is not possible to derive the formula from the times that you quote without additional information, e.g. whether it is a linear predictor in log-coordinates.
The information you gave does not sufficiently constrain the possible solutions.
Perhaps you have a blog entry or paper where the details are described?
I tested the above formula and it seems to perform poorly, thus it
would be interesting to test the correct formula.
Dear Ventolin,
I checked if the predictions you give hold for the quoted formula.
They do indeed, up to a small error.
100m | 200m | 400m | 800m | 1500m | Mile | 5km | 10km | Half-Marathon | Marathon |
9.8 | 21.6 | 47.4 | 1 min, 43.0 | 3 mins, 27.1 | 3 mins, 43.9 | 13 mins, 0.0 | 27 mins, 43.1 | 1 hour, 2 mins, 22.9 | 2 hours, 12 mins, 0.6 |
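For anyone who wants to repeat the check, here is a minimal Python sketch of the quoted formula (pace interpolated linearly in log-distance between two known performances). The function names are made up for illustration, and the two input performances, 800m in 1:43.0 and 5km in 13:00.0, are an assumption inferred from the table and from the example discussed earlier in the thread.

```python
import math


def predict_time(d1, t1, d2, t2, d_new):
    """Two-point prediction: pace (s/m) is linear in log(distance).

    Direct transcription of the quoted formula
    Tnew = (T1/D1 + (T2/D2 - T1/D1) * log(Dnew/D1) / log(D2/D1)) * Dnew
    with distances in metres and times in seconds.
    """
    pace1 = t1 / d1
    pace2 = t2 / d2
    fraction = math.log(d_new / d1) / math.log(d2 / d1)
    return (pace1 + (pace2 - pace1) * fraction) * d_new


def format_time(seconds):
    """Format seconds as [h:]m:ss.s, rounding to tenths first."""
    tenths = round(seconds * 10)
    minutes, tenths = divmod(tenths, 600)
    hours, minutes = divmod(minutes, 60)
    secs = tenths / 10
    if hours:
        return f"{hours}:{minutes:02d}:{secs:04.1f}"
    if minutes:
        return f"{minutes}:{secs:04.1f}"
    return f"{secs:.1f}"


# Assumed inputs: 800m in 1:43.0 and 5km in 13:00.0.
D1, T1 = 800.0, 103.0
D2, T2 = 5000.0, 780.0

events = {"400m": 400.0, "1500m": 1500.0, "Mile": 1609.344,
          "10km": 10000.0, "Half-Marathon": 21097.5, "Marathon": 42195.0}

for name, distance in events.items():
    print(f"{name:>13}: {format_time(predict_time(D1, T1, D2, T2, distance))}")
```

With these inputs the 400m to Marathon columns of the table come out the same to the tenth of a second. Whether the formula is valid outside 400m to 10km on the track is a separate question, which is what the rest of the discussion is about.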
i posted link to webpage few pages ago
http://www.jundo.co.uk/
clearly you didn't read the blurb
that is not format of the formula i recognise, but you got correct results with it
eh ?
read the site
it gives times for 400 - 10k on the track
input/output involving 100/200 doesn't work as those races are not run at "even pace"
neither are road races because of varying courses
see website
only a website to point you to
it is meant for 400 - 10k on track
it however works very well for road races for radcliffe in her '02 shape, running 4'01 time trial in practice - it's in her autobiography ( better than her official 4'05pb ), 8'22pb ( which shouda been nearer 8'20 as slow pace to 2k ), 14'31 ( which was nonsense - she jogged to 600m & ran last 4400m of race at 14'24 pace with no help ) & solo 30'01 solo in poor weather with uneven pace ( worth nearer 29'50 in good conditions ) & a 2"17+
generally however, it isn't for road races
Duncan Blythe wrote:
Dear Ventolin,
I checked if the predictions you give hold for the quoted formula.
They do indeed, up to a small error.
100m | 200m | 400m | 800m | 1500m | Mile | 5km | 10km | Half-Marathon | Marathon |
9.8 | 21.6 | 47.4 | 1 min, 43.0 | 3 mins, 27.1 | 3 mins, 43.9 | 13 mins, 0.0 | 27 mins, 43.1 | 1 hour, 2 mins, 22.9 | 2 hours, 12 mins, 0.6 |
like i said
website says for 400 - 10k on track
however works well for radcliffe in '02 for road
ventolin^3 wrote:
it however works very well for road races for radcliffe
but it didn't work so well for Asbel and Aman 3 weeks ago.
Dear Renato,
> maybe it's possible, but I don't know in which way.
if you allow for a few minutes average error, it is possible with the various prediction models, say Purdy, or the one we propose in our paper which is a bit more accurate.
Though with current technology, we only know that we can predict around the same year as the other performances - you probably were referring to developments after longer time spans.
> From our observation, for example, we often have athletes with similar morphology, running the same times in the 1500m, 5000m, 10000m and HM, and when they move to the marathon the difference becomes very large (for example, more than 5 minutes, 2:05 against 2:10 or 2:12).
> And they have the same age, the same athletic history, the same coach.
We observe this as well - the average error of prediction is around 3 min with the best we can do. Possibly one cannot do better - or one cannot do better by taking only other performances into account, but maybe one can with physiological parameters? Or maybe one can do better anyway?
In any case, this is an interesting topic for future studies.
> 1) A high percentage...
This sounds plausible, also given the examples you mention.
We cannot say anything about long-term predictions over 5-6 years at the moment, so I also cannot comment on your hypothesized biomechanical/bioenergetic model.
This is because our study only looked at "snapshots" in a one-year window. Looking at the effect of training over the years is perhaps not possible from the dataset we have, because there are no explicit records of what kind of training the athletes received.
It would be quite interesting to check whether one can see in the data what you describe as the two engines, and whether one could pick up at an early stage which of the two types (1) or (2) the athletes belong to (interesting question: how early?).
Though that would require following a somewhat reasonable number of athletes over the years, including records on their training (we currently do not have such data, sadly, nor an opportunity to obtain such data). And, of course, it would involve future research.
Anyway, I guess we now have discussed a lot and have reached a common denominator of some kind?
Or is there anything where our opinions still differ greatly?
At least, I for my part have gained a number of new insights, thanks for your explanations.
As I see it, the general discussion has also ebbed quite a bit, and I think most arguments have been stated and talked about. There seems to be only ventolin left trolling Duncan.
[Duncan, please do not feed the troll.
Ventolin, we are not trying to sell anything, our code will be freely available. We'll leave you to roam the letsrun forums soon enough. I even explained to you the chi-squared test you wanted to learn, for free. Please be nice to your fellow men.]
I'm going to read carefully through everything again and write some thanks to everybody (thanks to everybody!), and a short summary of the discussed issues relevant for our research. Just in case anybody will ever look this thread up.
I guess we can then let this thread with the somewhat ill-fitted title be archived in peace.
(except if it somehow restarts, one never knows, but I can't quite imagine that)
What do you think, Renato?
Ventolin doesn't need feeding, he's self-excited