Welcome to our new website!
June 1, 2020

The code that launched a worldwide lockdown and the bugs that made it problematic

We are in some weird times in the world right now. States and countries are in various stages of re-opening and venturing out from the corona pandemic - and just when things started to seem to calm down some - the tragic killing of George Floyd took place on May 25th. Now - I am not going to talk in depth about this - the situation is still unfolding - but I do want to say this on behalf of myself - and while I did not run this by Josh or Aaron, I doubt they would argue with it.

How George Floyd died was tragic, and all those involved should face appropriate repercussions. His was a needless death, and justice needs to be served against every person involved in it. That being said - there are many aspects of this which may not have been revealed to the public yet - but the pressure for justice should continue. However, justice does not involve the destruction of property. I'm glad to see that a lot of the protests are being done peacefully - they may cause inconvenience, but for the most part they are peaceful and within the law. There are always going to be bad apples that make the rest of the group look bad - but in most of these groups the disruptors are definitely in the minority. Some of the groups that are violent, that are causing property damage, that are harming and attacking others - these groups are not protesting in the spirit of George Floyd and are taking advantage of the social disturbance going on right now. There will always be those who take advantage of a situation - and they should not be lumped in with those protesting peacefully.

Now with that out of the way let's get on with the topic at hand.

It's now the end of May as I am recording this - and life relating to corona is starting to get back to normal for a lot of people. Businesses are opening back up - even if at reduced capacity - but they are still opening. People are still wearing masks around stores when they go out, but the number seems to be going down on almost a daily basis. As life starts to return to normal, I started to dig into something that had caught my attention a little while ago. At the start of the pandemic, we heard about computer models indicating that if nothing was done, up to 2.2 million people in the United States and up to half a million people in the United Kingdom could die from the coronavirus. Again - this is if nothing was done. This led to leaders around the world forming the policies that were implemented. I will admit - at the start, when we had no information on how the virus spread/infected/killed/etc., implementing things such as closing down large groups, social gatherings, etc. was the prudent move. However, we continued to hear new model results based on the changes we made - and the models continued to show large numbers of deaths even with social distancing. That was interesting to me - how the projected deaths remained high even when we took the recommended precautions - but I just chalked it up to models normally needing refinement over time. Then the news about Neil broke (and not the news regarding his affair - that does not impact this) - the news about the code he created to model the deaths from various pandemics, and how bad that code was. Now that caused me to cock my head some and say "hmmmm..."

For those who are unfamiliar with my background - I am a scripter and IT automation person by trade. 99.99% of my professional life I write code in PowerShell for mostly Windows-based systems. I have spoken at conferences and user groups on the topic of PowerShell, and have passed community exams stating I know what I'm doing. I'm not saying this to toot my own horn, but to lay out my credentials. It is fair to ask for the bona fides of someone who is criticizing something, and I'm also doing this to intercept people who would try to dismiss my opinion on this because "he don't know what he is talking about." Even though I am not a full-time "real" programmer - I use proper programming techniques for all the scripts I write - and while I am not an expert in C or Fortran, I know enough to read the code and comment on its structure as well as on general coding practices. I was also taught in college how to read the overall syntax of a program, even if I don't know the ins and outs of the language itself.

So the code.

The code used to generate the model has been used multiple times in the past for various diseases - with bad results each time. Let's take a look at some of the things this code has gotten wrong in the past:

During the 2001 outbreak of Foot and Mouth Disease, his model was used to justify the culling of at least six million sheep and cattle. The model also predicted up to 150,000 human deaths - the actual death toll was less than 200.

In 2002, it was predicted that up to 50,000 people in the UK would die from eating beef infected with Mad Cow Disease. There were 117 deaths.

In 2005, it was predicted that up to 150 million people would die from the bird flu. In the end, only 282 people died worldwide between 2003 and 2009.

In 2009, the "reasonable worst case scenario" for swine flu deaths in the UK was around 65,000; there were only 457 deaths.

One would reasonably expect the models and code to be further refined and updated as time went on. The early models can easily be excused - think of them as version 1.0 of the code - but from 2001 to 2009 there should have been bug fixes that made the model more reliable. There weren't. Why is that?

Well, one issue with the code - which is documented on the official GitHub site - is the fact that the code is stochastic. What is stochastic, you ask?

"Randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely."

In other words - as written, it is impossible to run unit tests against this code. What is a unit test? A unit test, in the simplest of terms, verifies that for input A, output B is returned every time. So if I provide the code with the number "1", after it finishes doing what it does I should get the value "6" each time. This is considered a standard with all code, showing that new bugs have not been introduced since the last update was done. Unit tests are run on individual segments as well as on the whole code base. In my line of work, before code can be promoted to production it must pass all the unit tests. If the unit tests fail, the code is rejected and needs to be worked on again. One of the documented issues with this code is that if the same inputs are given, each run will produce different results - meaning it fails one of the most basic pieces of quality control. This is a big red flag for me. Now thankfully, since the code went live on GitHub, a switch has been added to allow the code to produce the same results with the same input - but reading through the issue log, this does not work 100% of the time and still has some bugs that need to be worked out.
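To illustrate what testable stochastic code looks like, here is a minimal sketch in Python (the model itself is in C, and `toy_model` is a made-up stand-in, not code from the actual repository). The point is that seeding the random number generator explicitly makes a "random" simulation reproducible, and therefore unit-testable:

```python
import random

def toy_model(seed: int, n: int) -> int:
    """A made-up stand-in for a stochastic simulation: the sum of n
    random draws. Seeding the generator makes each run reproducible."""
    rng = random.Random(seed)  # fixed seed -> identical draw sequence
    return sum(rng.randint(0, 99) for _ in range(n))

# Same seed + same inputs = same output, every run. A unit test can
# now catch regressions instead of drowning in random noise.
run_a = toy_model(seed=42, n=1000)
run_b = toy_model(seed=42, n=1000)
assert run_a == run_b
```

The switch added to the released code serves the same purpose: pinning the seed so that two runs with identical inputs can actually be compared.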

The next issue with the code is that it is hardware dependent. It does not need to be run on one specific computer (that would be a horrible design), but it does have to be run on a single-core system. Back when the code was originally written (13 years ago) this made sense - dual core was still relatively new, and making code run across multiple cores was a tricky endeavor. In the past few years, however, almost all major languages have made it easier to write programs that run across multiple cores. The fact that the code returns different results depending on the number of cores raises the concern that results may not be valid if the code was not run in exactly the same configuration. In fact, even with the update to allow better testing, the documentation calls out that in order to get the same results each time, not only do you need to use the same inputs (which is to be expected) - you must also use the exact same number of threads, otherwise the output cannot be compared. Strike 2.
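Why would the thread count change the answer at all? One common mechanism (an illustration of the general problem, not a diagnosis of this specific codebase) is that floating-point addition is not associative: when a sum is split into per-thread partial sums, different thread counts group the additions differently, and the rounding comes out differently. A small Python sketch of such a "parallel" reduction:

```python
def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums, then combine - mimicking a
    reduction split across n_chunks threads."""
    size = len(values) // n_chunks
    chunks = [values[i * size:(i + 1) * size] for i in range(n_chunks - 1)]
    chunks.append(values[(n_chunks - 1) * size:])  # last chunk takes the tail
    return sum(sum(chunk) for chunk in chunks)

data = [1.0, 1.0, 1e100, -1e100]
print(chunked_sum(data, 1))  # 0.0 - sequentially, the huge value absorbs the 1.0s
print(chunked_sum(data, 2))  # 2.0 - grouped differently, rounded differently
```

Requiring an identical thread count to reproduce a result, as the updated code does, is effectively an admission that the grouping of operations is part of the program's input.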

The next issue is more a personal opinion on coding style - but it is based on accepted best practices. The original code (which no one outside of a few places has access to) was a single file roughly 15,000 lines long. All of the code was contained in that one file, making it hard to navigate. Proper coding technique says that each function should be contained in its own file. This makes the code easier to debug, update, and test. Since the uproar over the code first started, the code was sent to Microsoft, where they spent about a month refactoring it to clean it up. For those of you who are not coders - refactor literally means to restructure. It took even Microsoft (and supposedly other companies) a good month to restructure the code to try to make it easier to update.

The third strike relates directly to the code itself. While no one is providing direct answers, the code itself suggests that it was automatically translated from Fortran to C. Now, there is nothing wrong with moving code from one language to another, but when you use a computer to automatically translate code between programming languages you have to be careful - the translation, while it accelerates the work, is not foolproof and may introduce logic bugs into the code that need to be corrected manually. The fact that there are still artifacts in the code which point to the translation from Fortran makes me wonder whether all the necessary elements were updated appropriately. One example of this is found in the issues log for the code on GitHub: there is an issue where the code uses an outdated method that returns unexpected data, resulting in rounding errors.
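As a hypothetical example of the class of bug this kind of translation can introduce (this is illustrative only - it is not taken from the actual codebase): Fortran's NINT intrinsic rounds to the nearest integer, while a plain C cast truncates toward zero. A translator that maps one onto the other silently shifts results by one. Sketched in Python:

```python
import math

def fortran_nint(x: float) -> int:
    """Fortran's NINT: round to nearest integer, halves away from zero."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def naive_c_cast(x: float) -> int:
    """What a careless automatic translation might emit instead."""
    return int(x)  # Python's int() truncates toward zero, like a C (int) cast

print(fortran_nint(2.7), naive_c_cast(2.7))    # 3 vs 2 - off by one
print(fortran_nint(-2.7), naive_c_cast(-2.7))  # -3 vs -2
```

An off-by-one in a rounding routine buried somewhere in a 15,000-line file is exactly the kind of bug that only unit tests on the individual pieces would catch.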

Thankfully, the updated version of the code was finally released on GitHub, where people are able to audit and contribute to it - so going forward I have a lot more hope in the code getting more accurate and giving better results. But the fact that we have so much policy based on code like this is scary - to say the least.

In fact, I'm not alone in thinking this. David Richards - co-founder of WanDisco - was recently interviewed about the code quality. David is qualified to speak on this issue, since WanDisco is a company that specializes in writing code for distributed systems. According to David, the code was a "buggy mess that looks more like a bowl of angel hair pasta than a finely tuned piece of programming," and he added, "In our commercial reality we would fire anyone for developing code like this and any business that relied on it to produce software for sale would likely go bust." David also brings up the concern that testing would be difficult on the original code: "Testing allows for guarantees. It is what you do on a conveyor belt in a car factory. Each and every component is tested for integrity in order to pass strict quality controls."

Another issue with the code was its base assumptions. The original code was written to model flu viruses, not coronaviruses. Even though the code can model the spread of a coronavirus, it does so based on the assumptions of flu viruses. The assumptions had not been updated to handle coronaviruses, as was confirmed by Neil. The basic assumptions of the models are inherently different - and the fact that this was not disclosed when the original predictions were made is frightening.

Now, I am not saying that we should throw out the code base entirely and start over. The fact that the code is now on GitHub and can be audited, updated, and changed in public is a good thing. More people are able to review the code, provide updates and feedback, and push back on the issues with testing and reproducibility of the results. Now that the code is being developed in an open-source framework, it has the possibility to only get more accurate as time goes on - and this is a good thing. But the fact that people had to start questioning things before we were allowed to see the code on which so many policies were based - that is scary. It is also a lesson in why we should always at least ask questions about the data used to make policies. I'm not saying be skeptical and conspiratorial - but when data is used to upend millions of people's lives, we should at least take a look at how the model was generated, the assumptions used, and the quality of the code that produced it.


  1. https://www.spectator.co.uk/article/forget-ferguson-s-personal-failures-it-s-his-science-that-needs-scrutiny
  2. https://www.armstrongeconomics.com/world-news/corruption/i-have-reviewed-fergusons-code/
  3. https://www.nationalreview.com/corner/professor-lockdown-modeler-resigns-in-disgrace/
  4. https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/
  5. https://www.breitbart.com/tech/2020/05/11/epidemiologist-slams-disgraced-u-k-scientists-coronavirus-model-grave-scientific-misconduct/
  6. https://lockdownsceptics.org/code-review-of-fergusons-model/ - this one needs to be taken with caution
  7. https://twitter.com/ID_AA_Carmack/status/1254872369556074496
  8. https://github.com/mrc-ide/covid-sim
  9. https://www.westernjournal.com/software-developers-tear-apart-infamous-model-started-lockdowns/
  10. https://www.dailymail.co.uk/news/article-8327641/Coronavirus-modelling-Professor-Neil-Ferguson-branded-mess-experts.html