Who Gets In If We Use The Butler Projections Instead Of The Rankings?

2014 NCAA Cross Country Championships

By LetsRun.com
November 13, 2014

As loyal LetsRun.com readers, we’re 100% certain that over the last two days, you’ve read which men’s and women’s teams will be going to the NCAA Cross-Country Championships if Friday’s regional results play out according to the USTFCCCA’s Regional Rankings (if not, read them now: Women’s Preview: Who’s Going (And Not Going) To NCAAs?, Men’s Preview: Who’s On Track To Qualify For The Big Dance?). But as any sports fan knows, things don’t always go according to plan.

In this article, we project who is going to NCAAs based on the Butler Projections, a projection system similar to chess’ Elo ratings. The Butler Projections are the brainchild of University of Missouri – Kansas City assistant coach James Butler, and the full explanation of the system can be found on his website. As Butler writes,

The system compares the expected outcome of a race based on the rankings and the actual outcome of the race based on finishing time. It then adjusts the rankings to more closely match the result.

All the runners start with a ranking of 1000. As the computer starts comparing performances, those that do better than expected gain points while those that underperform lose points. Once the computer has run through the entire season, the resulting rankings are used and the season is run through several thousand more times.

Eventually, the rankings converge on specific values, that is to say they no longer increase or decrease as the results are run through. These final values are what is used to rank the individuals with team scores derived from the individual rankings.
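To make the idea concrete, here is a minimal sketch of an Elo-style loop of the kind Butler describes. It is not his code: the expected-score formula is borrowed from chess Elo, the constants `K` and `SCALE`, the choice to apply all adjustments at the end of each pass, and the three-runner toy season are all our assumptions for illustration.

```python
# Hypothetical sketch of an Elo-style rating loop; not Butler's implementation.
START = 1000.0   # every runner starts at 1000 (from Butler's description)
K = 2.0          # assumed step size
SCALE = 400.0    # assumed scale, as in chess Elo


def expected(r_a, r_b):
    """Chess-style expected score of runner a against runner b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def one_pass(ratings, meets):
    """Run through the season once; return the largest rating change."""
    deltas = {runner: 0.0 for runner in ratings}
    for results in meets:                 # results: {runner: time in seconds}
        runners = list(results)
        for i, a in enumerate(runners):
            for b in runners[i + 1:]:
                if results[a] == results[b]:
                    actual = 0.5
                else:
                    actual = 1.0 if results[a] < results[b] else 0.0
                change = K * (actual - expected(ratings[a], ratings[b]))
                deltas[a] += change       # better than expected: gain points
                deltas[b] -= change       # worse than expected: lose points
    for runner, delta in deltas.items():
        ratings[runner] += delta
    return max(abs(d) for d in deltas.values())


def rate(meets, passes=2000):
    """Start everyone at START and replay the season `passes` times."""
    ratings = {runner: START for results in meets for runner in results}
    last_change = 0.0
    for _ in range(passes):
        last_change = one_pass(ratings, meets)
    return ratings, last_change


# Toy season (made-up times): A and B trade wins, C beats A once.
meets = [
    {"A": 1800.0, "B": 1805.0, "C": 1830.0},
    {"A": 1795.0, "B": 1801.0, "C": 1822.0},
    {"B": 1790.0, "C": 1799.0, "A": 1812.0},
]
ratings, last_change = rate(meets)
for runner, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(runner, round(rating, 1))
print("largest change in final pass:", last_change)
```

In this toy season A and B each win four of their six pairings and settle at the same rating, with C well below. After enough passes the ratings stop moving, which is the convergence Butler describes.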

We’ve fed the Butler Projections’ predicted regional results into the computer program created by former Duke runner Bo Waggoner to predict which teams would end up qualifying for nationals. Those results can be found below. We’ve also compared them to the original results based on the USTFCCCA Regional Rankings. Neither projection is perfect, but it will be interesting to look back and see whether the coaches or the computers are more accurate in predicting which teams end up going to nationals.

Men’s Races

Automatic qualifiers (differences in bold)

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 1 | Wisconsin | Wisconsin |
| 2 | Michigan | Michigan |
| 3 | Villanova | Villanova |
| 4 | Georgetown | Georgetown |
| 5 | Oklahoma St. | Oklahoma St. |
| 6 | Tulsa | Minnesota |
| 7 | Colorado | Colorado |
| 8 | NAU | New Mexico |
| 9 | Syracuse | Syracuse |
| 10 | Iona | Providence |
| 11 | Florida St. | Florida St. |
| 12 | Mississippi | Mississippi |
| 13 | Arkansas | Arkansas |
| 14 | Texas | Texas |
| 15 | Furman | Furman |
| 16 | NC State | NC State |
| 17 | Oregon | Portland |
| 18 | Portland | Oregon |

At-large qualifiers

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 19 | Stanford | Iona |
| 20 | North Carolina | NAU |
| 21 | Virginia | Stanford |
| 22 | Washington | UC Santa Barbara |
| 23 | UCLA | Washington |
| 24 | New Mexico | UCLA |
| 25 | BYU | BYU |
| 26 | Air Force | Colorado St. |
| 27 | Colorado St. | Southern Utah |
| 28 | Southern Utah | Indiana |
| 29 | Providence | Michigan St. |
| 30 | E. Kentucky | Navy |
| 31 | Oklahoma | Penn St. |

First teams out

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 32 | Indiana | North Carolina |
| 33 | Michigan St. | E. Kentucky |
| 34 | Iowa St. | Virginia |

Click here for the USTFCCCA’s full Regional Rankings; click here for the Butler Projections for each region.

The systems agree on 25 of the 31 teams. The USTFCCCA rankings have Tulsa, North Carolina, Virginia, Air Force, Eastern Kentucky and Oklahoma qualifying, while the Butler Projections have Minnesota, UC Santa Barbara, Indiana, Michigan St., Navy and Penn St. instead.
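For readers who want to check that count, a short sketch like the one below does it with set operations. The team lists are copied from the men's table above; the variable names are ours.

```python
# The 31 men's qualifiers under each system, copied from the table above.
ustfccca = {
    "Wisconsin", "Michigan", "Villanova", "Georgetown", "Oklahoma St.",
    "Tulsa", "Colorado", "NAU", "Syracuse", "Iona", "Florida St.",
    "Mississippi", "Arkansas", "Texas", "Furman", "NC State", "Oregon",
    "Portland", "Stanford", "North Carolina", "Virginia", "Washington",
    "UCLA", "New Mexico", "BYU", "Air Force", "Colorado St.",
    "Southern Utah", "Providence", "E. Kentucky", "Oklahoma",
}
butler = {
    "Wisconsin", "Michigan", "Villanova", "Georgetown", "Oklahoma St.",
    "Minnesota", "Colorado", "New Mexico", "Syracuse", "Providence",
    "Florida St.", "Mississippi", "Arkansas", "Texas", "Furman", "NC State",
    "Portland", "Oregon", "Iona", "NAU", "Stanford", "UC Santa Barbara",
    "Washington", "UCLA", "BYU", "Colorado St.", "Southern Utah", "Indiana",
    "Michigan St.", "Navy", "Penn St.",
}

both = ustfccca & butler                  # teams in under either system
only_ustfccca = sorted(ustfccca - butler) # in only via the coaches' rankings
only_butler = sorted(butler - ustfccca)   # in only via the Butler Projections

print(len(both), "teams in common")
print("USTFCCCA only:", ", ".join(only_ustfccca))
print("Butler only:", ", ".join(only_butler))
```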

The differences arise from the Midwest and Southeast. Butler has Minnesota (USTFCCCA #7 in the Midwest) getting second in that region. That in and of itself doesn’t throw a ton of things off, but add in that the Butler Projections have Illinois (USTFCCCA #6 in the Midwest) getting third and Iowa St. fourth and that ends up blocking Tulsa and Oklahoma. Tulsa and Oklahoma can’t push Illinois into NCAAs, and Iowa State doesn’t have enough points to push Illinois, creating a logjam.

There’s a similar situation in the Southeast, where Virginia (which ends up with six points) is projected to finish behind North Carolina and E. Kentucky, blocking the Cavaliers. Those blockages in the Midwest and Southeast open up room for other teams to get in from the Great Lakes (Indiana & Michigan St.) and Mid-Atlantic (Navy & Penn St. – as the Butler Projections have Navy beating both Penn State and Princeton).

There are problems with the Butler Projections, though. For example, in the Northeast, Butler has Syracuse putting seven in the top eight and Iona losing to Providence even though the Gaels crushed them at Wisconsin. The system seems to punish the Gaels for running in only one big race (Wisconsin). Amazingly, the Butler Projections say that Iona won’t put anyone in the top 19 (Butler has its top finisher in 20th), which is absurd.

It also seems like a stretch for Minnesota and Illinois to take second and third at the Midwest Regional after they were seventh and sixth, respectively, at Big 10s.

The main problem with the Butler Projections is that there simply isn’t enough data from the regular season, especially for teams like Iona that compete in a weak conference. That can also lead to inflated individual rankings. Here are the top individuals according to the Butler Projections:

~~1. Stanley Kebenei, Arkansas 1145~~
~~2. Ricky Brown, Bethune-Cookman 1128~~
~~3. Craig Lutz, Texas 1122~~
~~4. Gabe Gonzales, Arkansas 1118~~
~~5. Tyler Udland, Florida State 1114~~

Edward Cheserek is at 1081 — still good enough for first in the West Region, but not as high as he should be. Ricky Brown, who was third at the MEAC Championships in 26:58, over a minute behind the winner, probably isn’t the second-best runner in the NCAA. How did that happen?

Update: We received an email from James Butler, who explained that it’s not really feasible to compare individual rankings across regions. Here’s what he said:

In order to produce a regional projection I have the computer only look at the teams within that region so each region’s ranking is an island alone from the others. This isn’t ideal but I have to do it for the sake of computational time. Ideally, I could just enter the results of every meet in tfrrs, select every team in the country, compute the rankings and then derive regional rankings from that. These would be the most accurate. The problem is the computational time would be close to weeks if not months. Computing each region is still usually 2-4 hours per gender depending on the region’s size. To give an example of the amount of computations done, in one 200 person meet the computer compares each runner to the other 199. It then does this 10,000 times. 200*199*10000 = 398,000,000. That’s just for 1 meet.
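Butler's arithmetic is easy to reproduce. The sketch below simply restates his example; the function name is ours, and it counts ordered pairs (each runner against each of the others), as his figure does.

```python
def pairwise_comparisons(field_size, passes):
    """Each runner is compared with every other runner, once per pass."""
    return field_size * (field_size - 1) * passes


# Butler's example: one 200-person meet, run through 10,000 times.
total = pairwise_comparisons(200, 10_000)
print(f"{total:,}")  # 398,000,000
```

Because the count grows with the square of the field size, pooling every team in the country into one run gets expensive quickly, which is the reason Butler gives for treating each region as an island.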

Of course, the USTFCCCA Regional Rankings aren’t going to completely hold up on Friday. If you assume most teams run five major meets per season (first weekend of October, Wisconsin/Pre-Nats, Conference, Regionals, NCAAs), then 40% of the season has yet to be completed. It’s nearly impossible for the Butler Projections, which rely on head-to-head matchups to determine their rankings, to be totally accurate: there isn’t a lot of data, and momentum, which is key in a sport like cross country, is left out as well.

The reason Elo ratings work well in chess is that players play a series of matches over multiple years, making their ratings more and more accurate as their careers proceed. The highest-ranked player in chess’ FIDE World Rankings (based on an Elo system) has a rating of 2863; the highest-ranked NCAA cross country runner under Butler’s system is at just 1145. The closer the top competitor is to the 1000 starting value, the less data the system has. Clearly, Butler’s system is at a major disadvantage compared to international chess because he’s only analyzing data from a single season. Given more time (and more information), the cream would separate and the top runners would climb into the 2000s. Unfortunately, each athlete only has one more race from which to gather data (Regionals) before the season’s final race. The nice thing about the Butler Projections is that they become more accurate as the season goes along, so they should do a better job predicting NCAAs than they do Regionals.

Women’s Races

Automatic qualifiers (differences in bold)

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 1 | Michigan St. | Michigan St. |
| 2 | Wisconsin | Wisconsin |
| 3 | Georgetown | Georgetown |
| 4 | West Virginia | West Virginia |
| 5 | Iowa St. | Iowa St. |
| 6 | Minnesota | Minnesota |
| 7 | New Mexico | Colorado |
| 8 | Colorado | New Mexico |
| 9 | Iona | Syracuse |
| 10 | Syracuse | Dartmouth |
| 11 | Florida St. | Florida St. |
| 12 | Vanderbilt | Vanderbilt |
| 13 | Arkansas | Arkansas |
| 14 | Baylor | Baylor |
| 15 | North Carolina | North Carolina |
| 16 | Virginia | NC State |
| 17 | Oregon | Oregon |
| 18 | Stanford | Stanford |

At-large qualifiers

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 19 | Michigan | Virginia |
| 20 | Ohio St. | Michigan |
| 21 | NC State | Boise St. |
| 22 | Washington | Portland |
| 23 | Arizona St. | Washington |
| 24 | Boise St. | Toledo |
| 25 | UCLA | Ohio St. |
| 26 | Toledo | BYU |
| 27 | Dartmouth | Penn St. |
| 28 | Boston College | Princeton |
| 29 | Virginia Tech | Lamar |
| 30 | BYU | SMU |
| 31 | Notre Dame | Iona |

First teams out

| # | USTFCCCA | Butler Projections |
| --- | --- | --- |
| 32 | Providence | Providence |
| 33 | Villanova | Boston College |
| 34 | Princeton | Texas A&M |

Click here for the USTFCCCA’s full Regional Rankings; click here for the Butler Projections for each region.

Again, the over/underperformance of some schools in the Butler Projections accounts for the difference in teams selected toward the end.

UCLA and Arizona St. (seventh and fourth in the USTFCCCA Regional Rankings) finish just eighth and ninth at the West Regional in the Butler Projections and get blocked by Loyola Marymount and UC Davis, neither of which has any points (it’s worth pointing out that even if they both beat one of those schools, they couldn’t push the other in because Washington would have already pushed Portland in the same region).

Likewise, BC gets left out under the Butler Projections because it has to wait for Iona to accumulate enough points to get in (Providence, projected to finish fourth, won’t have enough to push Iona). Butler has Notre Dame finishing eighth in the Great Lakes, too far back to make use of the six points the Irish would end up with. It could be worse; Arizona St. is predicted to finish with eight points and miss out.

The beneficiaries of the results in the West, Northeast and Great Lakes are the squads from the Mid-Atlantic (Penn St. & Princeton) and South Central Regions (Lamar and SMU). Under the USTFCCCA projection, neither is expected to send even one at-large team; under the Butler Projections, chaos in the other regions allows them to send two at-large teams each.

The top women’s individuals according to Butler look a lot more accurate than the men’s, though a top five without the likes of Iowa State’s Crystal Nelson and Arizona State’s Shelby Houlihan feels incomplete.

~~1. Kate Avery, Iona 1212~~
~~2. Dominique Scott, Arkansas 1200~~
~~3. Grace Heymsfield, Arkansas 1193~~
~~4. Liv Westphal, Boston College 1188~~
~~5. Rachel Johnson, Baylor 1173~~
