A bit technical, but it explains how the computer models used by Ferguson's team were "garbage in, garbage out" from the get-go.
Code Review of Ferguson’s Model
.....
Clearly the documentation wants us to think that given a starting seed, the model will always produce the same results. Investigation reveals the truth – the code produces critically different results even for identical starting seeds and parameters.
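To make the claim concrete, here is a minimal sketch in C++ (a toy, not Imperial’s code) of the property the documentation promises: if every source of randomness is driven by one fixed seed, a stochastic simulation should print exactly the same figure on every run.

```cpp
#include <cstdio>
#include <random>

// Toy "model": with all randomness drawn from a single seeded generator,
// the output is fully determined by the seed.
unsigned run_model(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> infect(0.0, 1.0);
    unsigned infections = 0;
    for (int step = 0; step < 1000; ++step)
        if (infect(rng) < 0.1) ++infections;
    return infections;
}

int main() {
    // Same seed, same parameters: these two lines must print the same number.
    std::printf("run 1: %u\n", run_model(42));
    std::printf("run 2: %u\n", run_model(42));
}
```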
....
Imperial advised Edinburgh that the problem goes away if you run in single-threaded mode, as they do, meaning they suggest using only a single CPU core rather than the many cores any video game would successfully use. For a simulation of a country, using only a single CPU core is obviously a dire problem – that’s as far from supercomputing as you can get. Nonetheless that’s how Imperial use the code: they know it breaks when they try to run it faster. It’s clear from reading the code that in 2014 Imperial tried to make the code use multiple CPUs to speed it up, but never made it work reliably. This sort of programming is known to be difficult and usually requires senior, experienced engineers to get good results. Results that randomly change from run to run are a common consequence of thread safety bugs, or more colloquially, Heisenbugs.
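For readers unfamiliar with this class of bug, here is a minimal sketch (again a toy, not Imperial’s code) of how a thread safety bug produces exactly the symptom described: the seeds never change, yet an unsynchronised total updated from several threads can come out differently on each run, depending on how the operating system happens to interleave them.

```cpp
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

int main() {
    long total = 0;  // shared by all threads with no synchronisation: a data race

    auto worker = [&total](unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> draw(0, 9);
        for (int i = 0; i < 100000; ++i)
            total += draw(rng);  // racy read-modify-write on the shared total
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < 4; ++t)
        threads.emplace_back(worker, 1234 + t);  // per-thread seeds are fixed
    for (auto& th : threads)
        th.join();

    // Seeds and parameters are identical every time, yet this figure can
    // differ from run to run because of the race above.
    std::printf("total = %ld\n", total);
}
```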
But Edinburgh come back and report that even in single-threaded mode they still see the problem, so Imperial’s understanding of the issue is wrong. Finally Imperial admit there’s a bug, by referencing a code change they’ve made that fixes it. The explanation given is: “It looks like historically the second pair of seeds had been used at this point, to make the runs identical regardless of how the network was made, but that this had been changed when seed-resetting was implemented”. In other words, in the process of changing the model they made it non-replicable and never noticed.
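The quoted explanation describes the historical behaviour: a second set of seeds was applied after the network was constructed, so that the rest of the run consumed the same random stream regardless of how the network was made. Here is a minimal sketch of that idea (made-up seeds and a toy model, not the actual code or fix):

```cpp
#include <cstdio>
#include <random>

std::mt19937 rng;  // hypothetical global generator, as in many legacy simulation codes

// Network construction may consume a different amount of randomness
// depending on whether the network is generated or loaded from a file.
void build_network(bool from_file) {
    if (!from_file) {
        std::uniform_int_distribution<int> pick(0, 99);
        for (int i = 0; i < 1000; ++i)
            pick(rng);  // draw random edges
    }
}

double run_epidemic() {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int infected = 0;
    for (int i = 0; i < 1000; ++i)
        if (u(rng) < 0.1) ++infected;
    return infected / 1000.0;
}

int main() {
    for (bool from_file : {false, true}) {
        rng.seed(1234);          // first seed drives network setup
        build_network(from_file);
        rng.seed(5678);          // re-seed here, so the epidemic stage is
                                 // identical regardless of how the network was made
        std::printf("attack rate = %f\n", run_epidemic());
    }
}
```

Drop the second rng.seed call and the two runs diverge, which is exactly the kind of silent regression described above.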
Why didn’t they notice? Because their code is so deeply riddled with similar bugs and they struggled so much to fix them, that they got into the habit of simply averaging the results of multiple runs to cover it up … and eventually this behaviour became normalised within the team.
.....
Imperial are trying to have their cake and eat it. Reports of random results are dismissed with responses like “that’s not a problem, just run it a lot of times and take the average”, but at the same time, they’re fixing such bugs when they find them. They know their code can’t withstand scrutiny, so they hid it until professionals had a chance to fix it, but the damage from over a decade of amateur hobby programming is so extensive that even Microsoft were unable to make it run right.
No tests. In the discussion of the fix for the first bug, Imperial state that the code used to be deterministic in that place, but that they broke it without noticing when changing the code.
......
The Imperial code doesn’t seem to have working regression tests. They tried, but the extent of the random behaviour in their code left them defeated. On 4th April they said: “However, we haven’t had the time to work out a scalable and maintainable way of running the regression test in a way that allows a small amount of variation, but doesn’t let the figures drift over time.”
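One conventional way to meet both requirements in that quote (sketched here with made-up figures, not anything from the Imperial repository) is to compare each run against a frozen baseline with a tolerance: small run-to-run variation passes, but because the baseline never moves, the figures cannot quietly drift over time.

```cpp
#include <cmath>
#include <cstdio>

// Frozen reference figure, committed to the repository alongside the code.
const double kBaseline  = 123456.0;
// Permit a small amount of stochastic run-to-run variation (here 1%).
const double kTolerance = 0.01;

bool regression_check(double result) {
    return std::fabs(result - kBaseline) <= kTolerance * kBaseline;
}

int main() {
    // Stand-in value; a real test would run the model with a fixed seed and
    // feed its headline output through the same check.
    double result = 123900.0;
    std::printf("%s\n", regression_check(result) ? "PASS" : "FAIL");
}
```

The baseline only changes through a deliberate, reviewed commit, which is what prevents gradual drift.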
Beyond the apparently unsalvageable nature of this specific codebase, testing model predictions faces a fundamental problem: the authors don’t know what the “correct” answer is until long after the fact, and by then the code has changed again anyway, changing the set of bugs in it. So it’s unclear what regression tests really mean for models like this, even if they had some that worked.
.....
Continuing development. Despite being aware of the severe problems in their code that they “haven’t had time” to fix, the Imperial team continue to add new features; for instance, the model attempts to simulate the impact of digital contact tracing apps.
Adding new features to a codebase with this many quality problems will only compound them. If I saw this in a company I was consulting for, I’d immediately advise them to halt new feature development until thorough regression testing was in place and code quality had been improved.
Conclusions.
All papers based on this code should be retracted immediately. Imperial’s modelling efforts should be reset with a new team that isn’t under Professor Ferguson, and which has a commitment to replicable results with published code from day one.
On a personal level, I’d actually go further and suggest that all academic epidemiology be defunded. This sort of work is best done by the insurance sector. Insurers employ modellers and data scientists, but they also employ managers whose job is to decide whether a model is accurate enough for real-world use, and professional software engineers to ensure model software is properly tested, understandable and so on. Academic efforts don’t have these people, and the results speak for themselves.
https://lockdownsceptics.org/code-review-of-fergusons-model/