The whole race was an outlier. There is a suggestion that the course was 200 meters short, which is nuts, and the splits are off. Comparative times are a decent method, but with the high school speed ratings Bill had a lot more data to work with and was able to come up with course pars, as with the Beyer speed rating. It worked especially well for the runners and courses he was familiar with in and around NY. Do these runners race enough, and do these courses get raced enough, to do that? And then the comparisons will easily get blown up by race tactics. A sit-and-kick race like Piane results in a lower aggregate margin of victory, even though the gap between first and second was very significant and was the most telling information from the race. Piane is also interesting because, if we are going with a course par approach, Tuohy broke a record set by Kelati and Kurgat by 12 seconds. So would a rating based on an aggregate margin really work? I think I also pointed out, again reflecting the outlier/screwed-up nature of the SEC race, that Chelangat's initial rating did not correspond with the empirical data from her race history.
I appreciate the effort but I do not envy your task.