TL;DR: The international drug testing agencies lack scientific transparency. Most of us probably assume that drug testing is reliable, but in fact there is very little public information about how accurate these tests actually are. What are the false positive rates? We simply don't know. As supporters of the sport we should all demand much better.
In the last month we saw first the sad news that Peter Bol had tested positive for EPO, and then last week the stunning announcement that his B-sample was negative (actually "atypical", whatever that means). While Peter Bol has been formally cleared, his reputation may never fully recover.
Given the high stakes, we should certainly hope that drug testing is accurate. But is it? I would argue that we have almost no idea. There is a huge lack of scientific transparency from the testing agencies. What is the accuracy of any given test? What is the probability of a positive test in an innocent athlete? As far as I can tell, very little in the public domain allows us to assess any of this.
To think about these questions, it's important to understand what drug testing aims to do. Drug testing is much like a diagnostic test in medicine, and we should interpret it in much the same way. First, the test needs to have analytical accuracy. A test reports that molecule x is present at concentration y: How accurate is that? Second, we need to interpret the analytical finding: Is molecule x at concentration y actually strong evidence that the athlete has consumed banned substances?
I said that drug testing is like diagnostic testing in medicine, but there is a huge difference: if you want to deploy a new diagnostic test in healthcare, you need formal approval from an external agency (in the US it's the FDA). Approval involves a full scientific evaluation of both criteria: does the test measure what it says it does, and how reliable is that measurement for diagnosis? Despite the high stakes in international drug testing, there seems to be little or nothing in the public domain that would allow us to have faith that the PED tests work well.
Not only are the stakes high for individual athletes, but the system is presented with a huge multiple testing problem. In 2019, WADA tested 278,000 different samples (across all sports). Each sample is tested for many controlled substances. Even a tiny false positive rate per test -- let's say 1 error in 10,000 -- will result in many false positives.
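To make this concrete, here is a back-of-envelope sketch. The sample count is WADA's 2019 figure quoted above; the number of substances screened per sample and the per-test false positive rate are purely illustrative assumptions, not official figures.

```python
# Back-of-envelope multiple-testing arithmetic.
# Assumptions (illustrative only):
samples = 278_000            # WADA samples in 2019 (from the text)
substances_per_sample = 10   # assumed number of substances screened per sample
fp_rate = 1 / 10_000         # hypothetical per-test false positive rate

tests = samples * substances_per_sample
expected_fp = tests * fp_rate
print(f"{tests:,} individual tests -> ~{expected_fp:.0f} expected false positives per year")
```

Even under these conservative assumptions the system would generate a few hundred false positives per year. The true per-test rates are unknown to the public, which is exactly the problem.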
What would I like to see? Ideally there would be peer-reviewed papers, or at minimum publicly available white papers, published by the drug testing agencies, documenting the analytical accuracy and positive predictive values of their own tests. Instead, the peer-reviewed literature comes mostly from academia, typically with small sample sizes and often with poor-quality data analysis, and it does little to establish the accuracy of the tests.
I'll briefly cover two recent examples as illustrations of the main challenges:
Last month, Peter Bol's A sample tested positive for EPO. As far as I can tell from the limited public information, the test detects synthetic EPO, which carries certain post-translational modifications (PTMs) not present in naturally occurring human EPO. In this case the focus should be on the analytical accuracy of the test since, to my knowledge, there is unlikely to be an innocent explanation for the presence of synthetic EPO. Unfortunately, a recently published paper on EPO testing suggests a very high false positive rate for detection of synthetic EPO: around 5% (PMID: 31232530). It's hard to imagine that the testing agency actually has such a high false positive rate, as this would imply thousands of false positive A samples per year; but the point is that we have no idea what their false positive rate is! Even assuming a lower false positive rate of 1% per test would yield a joint rate of 1 in 10,000 for a false positive on both the A and B samples, under the best-case scenario that the two readings err independently. Even these best-case assumptions imply a rate that is unacceptably high considering that as many as 300,000 tests are performed each year.
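The A/B arithmetic above can be checked in a few lines. The 1% per-sample rate is the assumed best case from the text, and independence of the A and B readings is itself an optimistic assumption, since both samples come from the same draw and are often analyzed by the same lab.

```python
# Joint false positive probability for A and B samples, assuming
# (optimistically) that the two readings err independently.
fp_single = 0.01             # assumed 1% false positive rate per sample
fp_both = fp_single ** 2     # 0.0001, i.e. 1 in 10,000 if independent

tests_per_year = 300_000     # rough annual test count from the text
expected_double_fp = tests_per_year * fp_both
print(f"Joint false positive rate: ~1 in {1 / fp_both:.0f}")
print(f"Expected A+B false positives per year: ~{expected_double_fp:.0f}")
```

Of course, not every sample is tested for EPO, so this overstates the EPO-specific count; the point is the order of magnitude that even a best-case joint rate implies.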
As a second example, in 2021 Shelby Houlihan tested positive for high levels of a naturally occurring steroid, nandrolone. Interpretation of high nandrolone is much more difficult than detection of synthetic EPO, because we all have nandrolone in our bodies -- the key question is whether the levels are too high for an innocent explanation. Houlihan's reading was reported at 5 ng/mL which is 2.5x the WADA threshold of 2 ng/mL. How was this threshold decided? This does not seem to be documented in publicly available sources, as far as I can tell.
The peer-reviewed literature is modestly helpful on this: a 2008 study of 1202 female volunteers in England found 13 women over 1 ng/mL, and two women over 2 ng/mL. There is evidence that some of the highest recordings may have been due to use of steroidal contraceptives. There is also evidence that nandrolone spikes briefly during the menstrual cycle. These data show clearly that Houlihan's nandrolone reading was unusually high (assuming that the reading itself was accurate), but also show that nandrolone levels vary greatly in women, and may be high for reasons unrelated to use of performance enhancing drugs. Furthermore, there have been repeated concerns about nandrolone contamination in the US food system. I am not aware of any large-scale nandrolone survey data in the US that would even allow us to estimate how unusual a reading of 5 ng/mL is among US women. I would guess that the English data are consistent with Houlihan's result being around a 1 in 10,000 type of reading, but remember that even this type of significance level will produce false positives somewhere in the testing system every year.
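One way to see how little the English survey can tell us about a 5 ng/mL reading: assuming none of the 1202 volunteers reached that level (which the reported summary suggests, though it does not state it outright), the classical "rule of three" gives only an upper bound on the population rate. A quick sketch:

```python
# Rule of three: if 0 of n subjects exceed a threshold, the approximate
# 95% upper confidence bound on the population rate is 3/n.
n = 1202                 # women in the 2008 English survey
upper_bound = 3 / n      # ~0.0025
print(f"95% upper bound: about 1 in {1 / upper_bound:.0f}")
```

So the survey only bounds the rate of a 5 ng/mL reading at roughly 1 in 400. Any rarer figure, including my 1-in-10,000 guess, is an extrapolation beyond what these data can establish.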
In summary, we simply lack the evidence to evaluate the accuracy of these tests. It is unclear whether the testing agencies have high quality unpublished data to back up their claims, or if -- as I strongly suspect -- they simply lack convincing data. The academic peer-reviewed literature is of variable quality, and not uniformly supportive of the tests as they stand.
At present we are simply asked to trust that the testing agencies are good at their jobs, but frankly everything we have learned about governance in sports in the last few years would argue against simply taking their word for it.