The words "suggestions", "may be", and "plausibility" came from the passages you quoted.
The flaw in this critique can be traced to a false assumption made when constructing the "A-model": that a new fudge factor "p" is needed to recover the true prevalence. In fact, "p" is introduced only to reproduce the UQM estimates.
To summarize, the "P-model" (a subset of SSC) calls non-compliance "cheating". The "A-model" describes two categories of "cheating": it calls non-compliance "cheating" if it is deliberate and "cognitive limitation" if it is accidental, and then assumes the P-model does not account for "cognitive limitation". That assumption is false. The "P-model" accounts for all non-compliance, whether you call all of it "cheating" or split it up into several named sub-mechanisms of "cheating".
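A minimal Python sketch of the point above, under loudly hypothetical assumptions: this is not the actual P-model or A-model. It assumes an SSC-style design with four innocuous items, each answered "yes" with probability 0.5, plus one sensitive item, and it assumes that every non-compliant respondent, deliberate or accidental, simply leaves the sensitive item out of the reported count. Under those assumptions, splitting non-compliance into two named sub-mechanisms gives exactly the same response distribution as one lumped non-compliance rate.

```python
from math import comb, isclose

def count_pmf(components, n_innocuous=4, q=0.5):
    """Distribution of the reported count (0..n_innocuous+1).

    Hypothetical model: `components` is a list of (weight, p_sensitive) pairs,
    i.e. a group making up `weight` of the sample whose sensitive item is
    added to the count with probability `p_sensitive`.
    """
    # Binomial distribution of "yes" answers to the innocuous items.
    base = [comb(n_innocuous, k) * q**k * (1 - q) ** (n_innocuous - k)
            for k in range(n_innocuous + 1)]
    pmf = [0.0] * (n_innocuous + 2)
    for weight, p_yes in components:
        for k, pk in enumerate(base):
            pmf[k] += weight * pk * (1 - p_yes)      # sensitive item not counted
            pmf[k + 1] += weight * pk * p_yes        # sensitive item counted
    return pmf

prevalence = 0.55

# One lumped non-compliance rate of 20% (assumed: non-compliers never count the item).
lumped = count_pmf([(0.80, prevalence), (0.20, 0.0)])

# The same 20% split into "deliberate" (12%) and "accidental" (8%) groups
# that, by assumption, answer in the same way.
split = count_pmf([(0.80, prevalence), (0.12, 0.0), (0.08, 0.0)])

same = all(isclose(a, b) for a, b in zip(lumped, split))
# Naive estimate: mean count minus the expected number of innocuous "yes" answers.
naive = sum(k * p for k, p in enumerate(lumped)) - 4 * 0.5
print(same, round(naive, 2))  # True 0.44
```

The naive count estimate comes out at 0.44 rather than the assumed 0.55, which is why a non-compliance term is needed at all; what the non-compliant groups are called does not change the fit. Only if the sub-mechanisms were assumed to answer differently would the distributions stop coinciding.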
The problem with their hypothetical exercise (1000 athletes, 55% prevalence) is that they ran the "P-model" on output generated by the "A-model" with the new fudge factor "p". The result also omits the percentage of non-compliance, i.e. "cheating". The fit is poor because non-compliance is distributed differently in the "A-model" than in the "P-model".
Regarding "supplements", no supplement data was collected at Daegu. That data comes from the Pan Arab Games, a kind of Arab Olympics that includes many sports besides running.
Of course, SSC also has limitations. While the "P-model" implements just one non-compliance hypothesis, SSC actually evaluates many hypotheses in parallel and chooses the best-fitting one. The set of hypotheses SSC used for these events was limited and could be extended. Our UQM researchers here picked a new non-compliance hypothesis that matches none of them; hence the poor fit of the P-model to A-model data. Both UQM and SSC results are in doubt when non-compliance is high.
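SSC's actual hypothesis set and fit criterion are not spelled out here, so the following is only a hypothetical sketch of the "evaluate several non-compliance hypotheses, keep the best-fitting one" idea. It assumes the same SSC-style design (four innocuous items at probability 0.5), two made-up hypotheses ("omit": non-compliers leave out the sensitive item; "report_zero": non-compliers report a count of zero), and maximum likelihood over a grid of prevalence and non-compliance values as the fit criterion.

```python
from math import comb, log

N_INNOCUOUS, Q = 4, 0.5  # assumed SSC-style design, not taken from the events discussed

def innocuous_pmf(n=N_INNOCUOUS, q=Q):
    """Binomial distribution of 'yes' answers to the innocuous items."""
    return [comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]

def shift(base, p_yes):
    """Add the sensitive item to the count with probability p_yes."""
    out = [0.0] * (len(base) + 1)
    for k, pk in enumerate(base):
        out[k] += pk * (1 - p_yes)
        out[k + 1] += pk * p_yes
    return out

# Hypothetical non-compliance hypotheses (illustrative, not SSC's own set).
def omit(prev, c):
    """Non-compliers leave the sensitive item out of their count."""
    return shift(innocuous_pmf(), (1 - c) * prev)

def report_zero(prev, c):
    """Non-compliers report a count of zero."""
    honest = shift(innocuous_pmf(), prev)
    return [(1 - c) * p + (c if k == 0 else 0.0) for k, p in enumerate(honest)]

HYPOTHESES = {"omit": omit, "report_zero": report_zero}

def log_likelihood(counts, pmf):
    """Multinomial log-likelihood (up to a constant) of the observed counts."""
    total = 0.0
    for n_k, p_k in zip(counts, pmf):
        if n_k:
            if p_k <= 0.0:
                return float("-inf")  # observed a count the model says is impossible
            total += n_k * log(p_k)
    return total

def best_fit(counts, steps=101):
    """Grid-search every hypothesis; return (log-lik, name, prevalence, non-compliance)."""
    grid = [i / (steps - 1) for i in range(steps)]
    best = (float("-inf"), None, None, None)
    for name, hypothesis in HYPOTHESES.items():
        for prev in grid:
            for c in grid:
                ll = log_likelihood(counts, hypothesis(prev, c))
                if ll > best[0]:
                    best = (ll, name, prev, c)
    return best

# Simulated survey of 1000 respondents, set to the expected counts under "report_zero".
counts = [round(1000 * p) for p in report_zero(0.55, 0.20)]
_, name, prev, c = best_fit(counts)
print(name, prev, c)
```

With these assumed inputs the search should select "report_zero", with estimates close to 0.55 and 0.20. Restricting the search to "omit" alone on the same data is loosely analogous to fitting a one-hypothesis model to data generated under a different non-compliance mechanism: it cannot reproduce the excess of zero counts, so the fit is poor.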
Whether or not we take SSC seriously, this critique does nothing to show that the UQM results are accurate, or that they are superior to SSC.