Since the start of the COVID-19 pandemic, we have been inundated with information. Charts and numbers are shown across the various media outlets. Surprisingly, I hadn’t yet seen anything compelling that offered a rough guide to the future of the virus in my country.
I felt this information would be useful to many people, including my partner and me, who need to predict the future user loads of our online exam hosting platform. Cancelled classes and exams left a lot of teachers and trainers scrambling to move their tests and exams online. I needed some way to predict future volumes to position our infrastructure accordingly.
If I’m honest with myself, however, I mostly wanted a way to feel some personal control over a terrible global situation I had no control over.
Not knowing anything about statistics, I started working on a model that would predict the spread of the pandemic. I had the free time since I had Friday off work to accompany my wife to Vancouver to do a presentation. That was cancelled of course.
I didn’t plan to share this with anyone. However, once I saw how accurate the predictions were, I shared it with some friends. Some of them suggested I share it with a wider audience and get some feedback, which led to this blog post (something else I don’t know how to do).
First, I had to come up with a hypothesis. What I came up with was nothing original, albeit debated in the media.
What’s happening in Italy can be used to accurately predict what will happen in the US and Canada.
Wait a minute, you might say. Italy is a very different country from the US and Canada. Different total population, different population density, different healthcare system, etc. Isn’t it rather presumptuous to assume we can use Italy as a model for what will happen in other countries?
I propose that, while those points are true, they are just external factors to something that is constant between all countries. It’s a single virus infecting a single species, humans. The virus does not care about political boundaries, and countries are nothing more than a measurable environmental influence on a predictable process.
The concept is simple. If we know the spread rates of the virus in one country, we can use them to predict the growth in other countries. All of the political and societal differences between the countries can be rolled up into one number. Let’s call that number the coefficient. We adjust the first country’s rate of spread by the coefficient of the country we want to apply it to, and we have our prediction.
This part is important. To compare countries we need to define “day 1” so we can make a common comparison. I chose the first day on which a country has 100 or more confirmed cases as day 1. This means each country’s calendar dates are shifted in the charts so that the countries line up by day number. This will become obvious when you see the chart.
Day 1 = the first day there are 100 or more confirmed cases
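To make the day 1 rule concrete, here is a minimal Python sketch. The function name `align_to_day_one` and the numbers are made up for illustration; they are not taken from my spreadsheet.

```python
from typing import List


def align_to_day_one(daily_counts: List[int], threshold: int = 100) -> List[int]:
    """Return the series starting at the first day with `threshold` or more cases.

    Index 0 of the result is "day 1" in the sense used in this post.
    """
    for index, count in enumerate(daily_counts):
        if count >= threshold:
            return daily_counts[index:]
    return []  # the country has not reached the threshold yet


# Illustrative numbers only, not real case data.
example_counts = [12, 40, 85, 130, 210, 340]
print(align_to_day_one(example_counts))  # [130, 210, 340]
```

Once every country is trimmed this way, position 0 in each list is day 1, so the lists can be compared side by side regardless of calendar date.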
I chose to use confirmed cases as my metric. As we know, this is not a measure of actual cases, only of those who were tested and found positive. It’s also pertinent to note that each country could choose to test differently. As you will see, this does not matter much, since these differences are largely captured by the coefficient.
Metric = Confirmed infected cases
This one is simple. For the model country, get the count of confirmed cases for each day. The rate of spread for a given day is that day’s confirmed case count divided by the previous day’s count.
Rate = Day B count / Day (B – 1) count
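As a small illustration of the rate formula, the sketch below computes the day-over-day rate for a list of cumulative confirmed counts. The function name `daily_rates` and the sample numbers are hypothetical.

```python
from typing import List


def daily_rates(counts: List[int]) -> List[float]:
    """Rate for day B = count on day B / count on day B - 1.

    The first day has no previous day, so the result is one item shorter
    than the input.
    """
    return [counts[i] / counts[i - 1] for i in range(1, len(counts))]


# Illustrative numbers only, not real case data.
print(daily_rates([100, 150, 300]))  # [1.5, 2.0]
```

A rate of 1.5 means the confirmed count grew by 50% over the previous day; a rate of 1.0 would mean no new confirmed cases.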
The coefficient can be calculated in many different ways. Population density and other factors could be used to predict it. However, since the US and Canada had already been infected, getting the coefficient was much easier than that. I simply took the difference between Italy’s rate of increase on DAY-X and the target country’s rate of increase on DAY-X.
Coefficient = DAY-X of model country – DAY-X of target country
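Here is a rough Python sketch of how the coefficient could be turned into a prediction. The coefficient follows the formula above. The projection step is an assumption on my part for illustration: it takes the target country’s rate on a future day to be the model country’s rate on the same day number minus the coefficient, and multiplies the previous count by that rate. The function names and all numbers are made up.

```python
from typing import List


def coefficient(model_rate: float, target_rate: float) -> float:
    """Coefficient = DAY-X rate of model country - DAY-X rate of target country."""
    return model_rate - target_rate


def predict_counts(start_count: float, model_rates: List[float], coeff: float) -> List[int]:
    """Project the target country's confirmed counts forward.

    ASSUMPTION: the target country's rate on each future day is the model
    country's rate for that day number minus the coefficient.
    """
    predictions = []
    current = start_count
    for rate in model_rates:
        current = current * (rate - coeff)
        predictions.append(round(current))
    return predictions


# Illustrative numbers only, not real case data.
coeff = coefficient(model_rate=1.5, target_rate=1.25)
print(coeff)  # 0.25
# The model country's rates for the two days after DAY-X.
print(predict_counts(1000, [1.5, 1.75], coeff))  # [1250, 1875]
```

Because the coefficient is a single number taken from one day, any error in it compounds with every projected day, which is one reason to recalculate it as new actual counts arrive.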
I am overwhelmed by how accurate the predictions are, and even more concerned about what they show for the future. Each new day, the predictions were at least 95% accurate against the actual confirmed cases. In fact, on some days they were almost 99% accurate.
Here is the data. Bold is the actual confirmed cases count, and the “prediction” is what was predicted for that day. Empty cells are days still in the future; the predictions for those days appear to the right.
Here are the countries cropped out. The country column on the left is actual, the column on the far right is predicted.
USA (column F is the actual results, column N is the predicted). The coloured value on the bottom row is the calculated coefficient.
Canada (column K is the actual results, column N is the predicted). The coloured value on the bottom row is the calculated coefficient.
Here is the link to the Google Sheet where the screenshot came from:
The USA will have 1 million people infected around April 5th
Canada will have 10,000 people infected around April 2nd
I am surprised by the accuracy of such a crude and amateur model, and I’m quite worried about what it’s showing for our future.
My biggest hope is that our flattening measures are more effective than Italy’s, and therefore the predictions will become less and less accurate.
What are your thoughts? Will our isolation measures be more effective than Italy’s? Will the confirmed case counts start to “flatten”? My model certainly does not show any flattening.