I tried to predict the COVID-19 spread and the results are chillingly accurate


 

Since the COVID-19 pandemic began, we have been inundated with information. Media outlets show chart after chart and number after number. Surprisingly, I still hadn’t seen anything compelling that offered even a rough guide to the future of the virus in my country.

 

I felt this information would be useful to many people, including my partner and me, since we need to predict the future user load on our online exam hosting platform. Cancelled classes and exams left a lot of teachers and trainers scrambling to move their tests and exams online, and I needed some way to predict future volumes so we could position our infrastructure accordingly.

 

If I’m honest with myself, however, I mostly wanted a way to feel some personal control over a terrible global situation I had no control over.

 

Not knowing anything about statistics, I started working on a model that would predict the spread of the pandemic. I had the free time since I had Friday off work to accompany my wife to Vancouver to do a presentation. That was cancelled of course.

 

I didn’t plan to share this with anyone. However, once I saw how accurate the predictions were, I shared it with some friends. Some of them suggested I share it with a wider audience and get some feedback, which led to this blog post (something else I don’t know how to do).

 

Hypothesis 

First, I had to come up with a hypothesis. What I came up with was nothing original, albeit debated in the media.

 

What’s happening in Italy can be used to accurately predict what will happen in the US and Canada

 

Wait a minute, you might say. Italy is a very different country from the US and Canada. Different total population, different population density, different healthcare system, etc. Isn’t it rather presumptuous to assume we can use Italy as a model for what will happen in other countries?

 

I propose that, while those points are true, they are just external factors to something that is constant between all countries. It’s a single virus infecting a single species, humans. The virus does not care about political boundaries, and countries are nothing more than a measurable environmental influence on a predictable process.

 

How it works

The concept is simple. If we know the day-over-day spread rates of the virus in one country, we can use them to predict the growth in other countries. All of the political and societal differences between the countries can be rolled up into one number. Let’s call that number the coefficient. We adjust the first country’s rate of spread by the coefficient of the country we want to apply it to, and we have our prediction.

 

The Criteria 

  1. A common starting point between countries
  2. The count of infected folks for each day in the model country
  3. The day over day rate of the spread in the model country
  4. The coefficient

 

Criterion 1 – Day 1

This part is important. To compare countries we need to define “day 1” so that every country is measured from a common starting point. I chose the first day a country has 100 or more confirmed cases as day 1. This means the actual calendar date gets offset in the charts, which will become obvious when you see them.

 

Day 1 = the first day there are 100 or more confirmed cases
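
To make the day 1 rule concrete, here is a minimal Python sketch. The function names and the sample numbers are my own illustration (the counts are made up, not real data); the only thing taken from the rule above is the threshold of 100 confirmed cases.

```python
from typing import List, Optional, Sequence


def day_one_index(cumulative_cases: Sequence[int], threshold: int = 100) -> Optional[int]:
    """Return the index of the first day with at least `threshold` confirmed cases."""
    for i, count in enumerate(cumulative_cases):
        if count >= threshold:
            return i
    return None  # the country has not reached the threshold yet


def align_to_day_one(cumulative_cases: Sequence[int], threshold: int = 100) -> List[int]:
    """Drop every day before day 1 so index 0 of the result is day 1."""
    start = day_one_index(cumulative_cases, threshold)
    return [] if start is None else list(cumulative_cases[start:])


# Made-up cumulative counts, for illustration only.
example_counts = [12, 40, 95, 130, 210, 340]
aligned = align_to_day_one(example_counts)
print(aligned)  # [130, 210, 340]
```

Once every country is aligned this way, "day 5" means the same thing for each of them regardless of the calendar date.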

 

Criterion 2 – Infected Count

I chose to use confirmed cases as my metric. As we know, this is not a measure of actual cases, only of those who have been tested and were positive. It’s also pertinent to note that each country may test differently. As you will see, this does not matter much, since these differences are largely captured by the coefficient.

 

Metric = Confirmed infected cases

 

Criterion 3 – Rate

This one is simple. For the model country, get the count of confirmed cases for each day. The rate of spread for a given day is that day’s confirmed case count divided by the previous day’s count.

 

Rate = Day B count / Day (B – 1) count
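
As a rough sketch of that calculation in Python (the function name and the sample counts are illustrative, not taken from my spreadsheet):

```python
from typing import List, Sequence


def daily_rates(counts: Sequence[float]) -> List[float]:
    """rates[i] is the count on day i+1 divided by the count on day i.

    Assumes the series starts at day 1 (100+ cases), so no count is zero.
    """
    return [today / yesterday for yesterday, today in zip(counts, counts[1:])]


# Made-up cumulative counts, for illustration only.
example_counts = [100, 150, 180]
example_rates = daily_rates(example_counts)
print(example_rates)
```

A rate of 1.5 means the confirmed count grew by 50% over the previous day; a rate of exactly 1 would mean no new confirmed cases.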

 

Criterion 4 – The coefficient

The coefficient can be calculated in many different ways. Population density and other factors could be used to predict it. However, since the US and Canada already had confirmed cases, getting the coefficient was much easier than that. I simply took the rate of increase of Italy for DAY-X and subtracted that from the rate of increase for DAY-X of the target country.

 

Coefficient = DAY-X rate of target country – DAY-X rate of model country
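
Here is a minimal sketch of the coefficient in Python. The sign convention is an assumption on my part: the sketch takes the target country’s rate minus the model country’s rate, so that adding the coefficient to the model’s rate gives back the target’s rate on that day. Flip the subtraction if you prefer the opposite convention. The names and numbers are illustrative only.

```python
from typing import Sequence


def coefficient(model_rates: Sequence[float], target_rates: Sequence[float], day: int) -> float:
    """Difference between the target's and the model's rate on the same aligned day.

    Assumption: coefficient = target rate - model rate (see the note above).
    """
    return target_rates[day] - model_rates[day]


# Made-up day-over-day rates, aligned so index 0 is the first rate after day 1.
model_rates = [1.50, 1.40, 1.30]
target_rates = [1.60, 1.45, 1.38]

coeff = coefficient(model_rates, target_rates, day=2)
print(round(coeff, 2))  # 0.08
```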

 

The results

I am overwhelmed by how accurate the predictions are, and even more concerned about what they show for the future. Each new day, the actual confirmed case count has matched the predicted value with at least 95% accuracy. In fact, on some days the accuracy was almost 99%.
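
For anyone who wants to check a prediction themselves, one simple way to put a number on accuracy is one minus the relative error. This is a sketch of that common definition, and treating it as the accuracy measure here is an assumption; the numbers are made up.

```python
def accuracy(actual: float, predicted: float) -> float:
    """1 minus the relative error of the prediction, measured against the actual count."""
    return 1 - abs(actual - predicted) / actual


# Made-up values, for illustration only.
example_accuracy = accuracy(actual=1000, predicted=960)
print(f"{example_accuracy:.0%}")  # 96%
```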

 

Confirmed and predicted charts

Here is the data. Bold values are the actual confirmed case counts, and the “prediction” column is what was predicted for each day. Empty cells are days still in the future; their predictions appear in the column to the right.

Image of Google Sheets showing results

Here are the countries cropped out. The country column on the left is actual, the column on the far right is predicted.

USA (column F is the actual results, column N is the predicted). The coloured value on the bottom row is the calculated coefficient.

USA Prediction

 

Canada (column K is the actual results, column N is the predicted). The coloured value on the bottom row is the calculated coefficient.

Canada Prediction

 

  • Note that for the days we have counts for (yesterday and earlier, back to the day of 100 confirmed cases), the predicted value is based on the previous day’s actual value. For days in the future, the previous day’s predicted value is used. This allows the accuracy of the prediction to improve each day as we get more data.
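
The rolling prediction described in the note above can be sketched in Python as follows. Two things are assumptions for the sake of the example: the coefficient is added to the model country’s rate (one plausible reading of how the adjustment is applied), and all names and numbers are illustrative rather than taken from the spreadsheet.

```python
from typing import List, Optional, Sequence


def predict_counts(
    model_rates: Sequence[float],
    coeff: float,
    actual: Sequence[float],
    horizon: int,
) -> List[Optional[float]]:
    """Predict the target country's counts for aligned days 0..horizon-1.

    model_rates[d] is the model country's rate going from day d to day d+1.
    actual holds the target's known counts, aligned so index 0 is day 1.
    Index 0 of the result is None because day 1 has no earlier day to build on.
    """
    if not actual:
        raise ValueError("need at least the day-1 count of the target country")
    if horizon - 1 > len(model_rates):
        raise ValueError("not enough model rates for this horizon")

    predictions: List[Optional[float]] = [None]
    for day in range(1, horizon):
        # Assumption: the coefficient is an additive adjustment to the rate.
        rate = model_rates[day - 1] + coeff
        # Use yesterday's actual count when we have it, else yesterday's prediction.
        previous = actual[day - 1] if day - 1 < len(actual) else predictions[day - 1]
        predictions.append(previous * rate)
    return predictions


# Made-up inputs, for illustration only.
example_predictions = predict_counts(
    model_rates=[1.5, 1.4, 1.3, 1.2],
    coeff=0.1,
    actual=[100, 160, 240],
    horizon=5,
)
print(example_predictions)
```

In this example the first three predictions build on actual counts, and the last one builds on the prediction before it, which is why errors can compound the further out you look.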

 

Here is the link to the Google Sheet where the screenshot came from:

 

https://docs.google.com/spreadsheets/d/1HOnrRrCwDiQ5AZ641zwRe_btGgDzjjhn3E2oHun7nQ8/edit?usp=sharing

 

Predicted big milestones

The USA will have 1 million people infected around April 5th

Canada will have 10,000 people infected around April 2nd
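
Milestone dates like these come from finding the first predicted day that crosses a threshold and converting that day number back to a calendar date. A small Python sketch, where the day 1 date and the predicted counts are made-up placeholders and not the values behind the dates above:

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def milestone_date(
    day_one: date, predicted_counts: Sequence[float], threshold: float
) -> Optional[date]:
    """Return the calendar date of the first predicted count at or above `threshold`.

    predicted_counts[0] corresponds to day 1 (the date passed as `day_one`).
    """
    for offset, count in enumerate(predicted_counts):
        if count >= threshold:
            return day_one + timedelta(days=offset)
    return None  # the threshold is not reached within the predicted range


# Hypothetical day 1 date and made-up predicted counts, for illustration only.
example_milestone = milestone_date(date(2020, 3, 1), [100, 400, 900, 1600], threshold=1000)
print(example_milestone)  # 2020-03-04
```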

 

Conclusion

I am surprised by the accuracy of such a crude and amateur model, and I’m quite worried about what it’s showing for our future.

 

My biggest hope is that our flattening measures are more effective than Italy’s, and therefore the predictions will become less and less accurate.

 

What are your thoughts? Will our isolation measures be more effective than Italy’s? Will the predicted confirmed infected values start to “flatten”? My model certainly does not show any flattening.
