Dear Renato,
Thanks for your comments!
Please let me answer the points you raise separately.
> It's like to study a dog Terranova or Bull dog, and after to transport the same statistical and physiological data on a levrier which competes for winning a race.
Indeed the very elite athletes are “at the boundary” of the 150,000 subjects – that is, they seem to be very special. We already remarked on this in finding (IV.ii) of the pre-print. The point you raise is very valid, and also overlaps, partly, with what I said in my reaction to your previous post.
We do very well for terranovae and bull dogs, and part of our reasoning in the paper is that levriers behave similarly. While this is plausible by inductive analogy, it need not be true, and it has to be checked in the future.
So we added some caveats which will appear in the final version, among them this one:
“N.B.: We would like to stress that such predictions need to be taken with much caution, as they are only insofar correct as our model extends, from the general population of British runners (who successfully participated in official events), to the very extremes of human performance.”
Though one thing to point out is that our framework is, in principle, applicable to levriers as well – we just need enough of them. They might be rare, but with a bit of effort one might get hold of a high enough number…
About mathematics, deduction and induction:
>Mathematics are Inductive Science :
>Physiology, instead, is a DEDUCTIVE SCIENCE :
I would disagree here, and not draw a separating line in either case.
Mathematics allows for deductive arguments (axioms -> logical reasoning -> necessary conclusions), often found in the pure fields, and inductive ones (observation -> experiment -> generalization), on which a large part of statistics relies, including inference, the statistical syllogism, and predictions in the sense of our paper.
We have used both kinds of arguments in our work – the argument that the model is rank three is inductive, from the prediction experiment; while the statement that this implies three coefficients per athlete is deductive, from theoretical properties of “matrix rank” (determinant rank = column/row rank).
The predictions about the elite athletes are again inductive, thus not necessarily true in the sense of logical stringency, but only plausible under generalizability assumptions.
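To illustrate the deductive step mentioned above (rank three implies three coefficients per athlete), here is a minimal sketch in plain Python. It is not the code from the paper: the component vectors, the coefficients and the helper `matrix_rank` are made up for illustration. It builds a small athletes-by-events table from three numbers per athlete and checks that the table has rank three.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # column adds no new independent direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

# Hypothetical "components": three vectors over five events (made-up numbers).
components = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 9.0, 16.0],
]

# Three made-up coefficients per athlete, four athletes.
coefficients = [
    [2.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Each athlete's row is the coefficient-weighted sum of the three components.
table = [
    [sum(a * comp[j] for a, comp in zip(athlete, components))
     for j in range(len(components[0]))]
    for athlete in coefficients
]

print(matrix_rank(table))
```

The converse direction is the one used in the argument: a table of rank three can always be written as such a product, so each row (athlete) is described by three numbers.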
In physiology and medicine you also have both kinds of arguments. Deductive: knowledge about chemistry, physics -> necessarily implied knowledge about physiological processes, or inductive: lab experiment/clinical trial -> working hypothesis about physiology, therapy, training etc.
To reiterate, I would really not want to draw a boundary along lines that are, very often, more a social phenomenon than a scientific necessity.
About correlation:
> At the max level, there is no correlation between 10000m and Marathon
I computed the (Spearman rank) correlation matrix of (male) athletes in their best year, here it is:
          perf.1  perf.2  perf.3  perf.4  perf.5  perf.6  perf.7  perf.8  perf.9 perf.10
perf.1     1.000   0.935   0.785   0.541   0.208   0.628   0.248   0.323   0.241   0.122
perf.2     0.935   1.000   0.849   0.649   0.358   0.720   0.383   0.382   0.247   0.188
perf.3     0.785   0.849   1.000   0.777   0.559   0.806   0.559   0.495   0.415   0.373
perf.4     0.541   0.649   0.777   1.000   0.919   0.927   0.735   0.697   0.750   0.645
perf.5     0.208   0.358   0.559   0.919   1.000   0.966   0.855   0.824   0.846   0.808
perf.6     0.628   0.720   0.806   0.927   0.966   1.000   0.923   0.921   0.905   0.864
perf.7     0.248   0.383   0.559   0.735   0.855   0.923   1.000   0.960   0.949   0.889
perf.8     0.323   0.382   0.495   0.697   0.824   0.921   0.960   1.000   0.964   0.920
perf.9     0.241   0.247   0.415   0.750   0.846   0.905   0.949   0.964   1.000   0.931
perf.10    0.122   0.188   0.373   0.645   0.808   0.864   0.889   0.920   0.931   1.000
The events are ordered as in the paper. That is, perf.1, perf.2,… perf.10 are 100m, 200m, 400m, 800m, 1500m, the Mile, 5km, 10km, Half-Marathon, Marathon. So perf.8 = 10km, perf.10 = Marathon.
How this is to be read, for the non-statistician: the entries are between -1 and 1. The closer an entry in the row/column of two events A,B is to 1, the more it means: “if an athlete is better at event A, he will be better at event B”. This is why for A=B you have a 1. The closer to -1, the more it means “ … better at A, … worse at B”. (for more about the math, look on Wikipedia for “correlation”)
Note that all entries above are positive – so being better at one event makes you, on average, better at any other – if you are a random person among the 85,498 male athletes I looked at for this analysis, that is (females and athletes with only one attempted event are not in here, for technical reasons).
Also, the correlation between 10km and Marathon is 0.92, so quite high. Naively, a contradiction to your claims – but as said these are mostly “normal” athletes (bull dogs and terranovae).
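For readers who want to see how a single entry of such a matrix can be obtained, here is a minimal sketch in plain Python. It is not the code I used for the tables: the function names, the `min_pairs` threshold and the example times are assumptions made for illustration. It computes the Spearman rank correlation between two events using only athletes who attempted both, and returns `None` when there are too few such athletes.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y, min_pairs=3):
    """Spearman correlation over athletes with both performances present.

    None in x or y marks an event that was not attempted. Returns None when
    fewer than min_pairs athletes attempted both events (min_pairs is an
    illustrative threshold, not the one used for the tables).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_pairs:
        return None
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    n = len(pairs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # all ranks tied: correlation undefined
    return cov / sqrt(vx * vy)

# Made-up best times in seconds for five athletes; None = not attempted.
times_10km = [1800.0, 1950.0, 2100.0, None, 2400.0]
times_marathon = [9000.0, 9600.0, 9500.0, 11000.0, 12500.0]

rho = spearman(times_10km, times_marathon)
print(round(rho, 3))  # 0.8 for this toy data
```

Because only the ranks matter, it makes no difference whether performances are given as times (lower is better) or as speeds, as long as the same convention is used for both events.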
Now here is the surprise (well, for me at least): I restricted the data to people who achieved a top-percentile performance in any event, i.e., the one-in-a-hundred performance elite – roughly as close to the very top as the data set allows one to look without running into serious problems in estimating these numbers (2643 events of the 1064 very best athletes).
          perf.1  perf.2  perf.3  perf.4  perf.5  perf.6  perf.7  perf.8  perf.9 perf.10
perf.1     1.000   0.296  -0.084      NA      NA      NA      NA      NA      NA      NA
perf.2     0.296   1.000   0.119  -0.855      NA      NA      NA      NA      NA      NA
perf.3    -0.084   0.119   1.000   0.110  -0.537  -0.357  -0.484  -0.881      NA      NA
perf.4        NA  -0.855   0.110   1.000   0.337   0.527  -0.069  -0.021   0.117  -0.214
perf.5        NA      NA  -0.537   0.337   1.000   0.819   0.517   0.349   0.388   0.415
perf.6        NA      NA  -0.357   0.527   0.819   1.000   0.476   0.309   0.288      NA
perf.7        NA      NA  -0.484  -0.069   0.517   0.476   1.000   0.687   0.451   0.433
perf.8        NA      NA  -0.881  -0.021   0.349   0.309   0.687   1.000   0.575   0.517
perf.9        NA      NA      NA   0.117   0.388   0.288   0.451   0.575   1.000   0.710
perf.10       NA      NA      NA  -0.214   0.415      NA   0.433   0.517   0.710   1.000
(NA here has no direct relation to Batman, it just means that there were not enough athletes who attempted both the row and column events for a reliable estimate).
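The restriction to the performance elite can be sketched as follows. This is one possible reading of "top percentile in any event", not necessarily the exact rule behind the table above; the function name, the data layout and the toy numbers are made up for illustration.

```python
def top_percentile_athletes(results, fraction=0.01):
    """Select athletes who are among the best `fraction` in at least one event.

    results: {athlete: {event: best_time_in_seconds}}, lower time is better.
    The layout and the cutoff rule are illustrative assumptions.
    """
    # Collect all recorded times per event.
    by_event = {}
    for performances in results.values():
        for event, t in performances.items():
            by_event.setdefault(event, []).append(t)

    # Per event, the cutoff is the time of the k-th best performance.
    cutoffs = {}
    for event, times in by_event.items():
        times.sort()
        k = max(1, int(len(times) * fraction))  # keep at least one performance
        cutoffs[event] = times[k - 1]

    return {
        athlete
        for athlete, performances in results.items()
        if any(t <= cutoffs[event] for event, t in performances.items())
    }

# Toy data: four athletes, two events. A larger fraction than 0.01 is used
# only because the example is tiny.
results = {
    "A": {"10km": 1800.0, "Marathon": 9000.0},
    "B": {"10km": 1950.0},
    "C": {"10km": 2100.0, "Marathon": 8900.0},
    "D": {"10km": 2400.0, "Marathon": 12500.0},
}

elite = top_percentile_athletes(results, fraction=0.25)
print(sorted(elite))  # ['A', 'C']
```

After such a restriction far fewer athletes have attempted any given pair of events, which is why some pairs no longer allow a reliable estimate.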
What one sees is that the correlations drop a lot, some of them even become negative – even though from what we know, the data of the top percentile athletes are much “cleaner”. Correlation between 10 km and Marathon is around 0.5. So the general trend does seem to be in qualitative accordance with what you claim.
This is very interesting and indicates that something curious happens at the boundaries - and that we can pick it up. It also makes me hope we can perhaps get more levriers and/or data on terranovae training to become levriers…