Data inundation: is it useful to compare countries?

Grace Dean discusses why we shouldn't always take statistics at face value, especially when comparing global COVID-19 rates.

Grace Dean
9th May 2020

Back in March, I made the massive mistake of signing up to the World Health Organisation's media mailing list.

Alongside reminders of their press conferences and publicity releases explaining their recent achievements, I also receive their Daily Situation Report on COVID-19. In fact, one has just pinged through as I'm writing this article. As well as highlighting recent action taken by WHO to curb the spread of the virus, the reports contain information on the number of confirmed cases in table, map and now graph form too. "Where is the mistake here?" you may ask. "You're only receiving the information you knowingly signed up for." What I didn't expect when I signed up, however, was for the Situation Report to arrive only ever in the evening. Tonight's was remarkably early by WHO standards - often they sneak into my inbox at 10pm or later. Somehow I just can't go to bed without opening that email and reading through the latest statistics; since lockdown started it has become an indispensable part of my daily routine, and sometimes it's the last thing I read before I go to sleep. However much these numbers may scare me, I find myself comparing them every single day. As a world, we have become obsessed with data - and rightly so when we have so much information at our fingertips.

So many statistics are thrown at us related to the coronavirus - how many countries, regions and cities have been affected? How many confirmed cases, deaths and recoveries have there been? How much PPE does each country have, and how much does it need? What proportion of the population are elderly or vulnerable? How much has footfall fallen and traffic decreased? How many days does it take for the virus to reach its peak? And how can Priti Patel possibly think that the UK has carried out "three hundred thousand thirty four, nine hundred and seventy four thousand tests"?

And so we become inundated with data, and we get a little bit lost. What's even trickier is that we don't really know what any of it means. So we start to compare, and we look from one country to another. When we do so, it becomes very clear: the UK has Europe's highest COVID-19 death count, and the second highest in the world. This is confirmed by the data provided by WHO. However, raw comparisons don't paint the true picture; what we really need is some context.

This issue has been beautifully addressed in Private Eye, that wonderfully satirical source of wisdom. The Eye explains how Italy's high initial death rate back in early April was in great part due to the demographics of the region where the virus first hit. As my friend from Verona will enthusiastically testify, Lombardy and Veneto, Italy's worst-hit regions, have elderly populations, many of whom smoke, and high levels of air pollution. Residents of this area of northern Italy were therefore much more susceptible to developing a severe case of the virus, a major contributor to the higher death rate there. At the end of March, Italy had a death rate of 11%, whereas for Germany that figure was just 1%. Many of the earlier cases in Germany were in younger people returning from skiing holidays - which in itself requires a certain degree of physical health - who were therefore much less likely to develop severe illness.

One thing that the WHO daily updates have taught me is that errors do sometimes slip through. On 4 May, its official communications reported that the United States had seen -1696 deaths in the past 24 hours. Does WHO believe in zombies, ghosts or reincarnation? I think not. About a month before, WHO had also reported a daily death count of zero. On the very final page of the 18-page update, WHO explained that the negative death increase was because the data had been "retro-adjusted by national authorities" - somewhat reminiscent of accruals and deferred payments in accounting. What exactly this means isn't clear to me, but it could be that some earlier deaths had been either incorrectly counted or counted twice.
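WHO's report doesn't spell out how its daily figure is calculated, but assuming it is simply the difference between one day's cumulative total and the previous day's, a downward revision of the running total is enough to produce a negative "daily" death count. A minimal Python sketch, using made-up numbers rather than WHO data:

```python
# Hypothetical cumulative death totals, one per daily report (NOT real data).
# In the third report the national authority revises its running total downwards.
cumulative = [1000, 1100, 1050, 1150]


def daily_increments(totals: list[int]) -> list[int]:
    """Subtract each cumulative total from the one reported after it."""
    return [today - yesterday for yesterday, today in zip(totals, totals[1:])]


print(daily_increments(cumulative))  # [100, -50, 100]
```

Nobody "came back to life": the -50 only reflects the revision to the running total, which is the kind of retro-adjustment the footnote describes.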

Similar discrepancies were noticed when China's death toll suddenly spiked in mid-April, long after the virus's peak in the country. Wuhan increased its then total death toll by 50% - or 1290 deaths - after the reporting method changed to include deaths outside hospitals. A similar rapid rise in confirmed cases was seen after China expanded its definition of the virus to include those with milder symptoms, causing the country's number of confirmed cases to increase by over 15,000 on 13 February. Taking into account how late this change in definition was, Hong Kong researchers estimate that China's death toll from the virus may actually be four times higher than what has officially been recorded. In Spain, meanwhile, 8000 extra deaths were added to the COVID-19 total after initially being mistaken for seasonal flu.

Countries can also distort the figures without necessarily doctoring them, by distinguishing between those who died "with" and "of" the disease. Whereas in Italy almost all deaths of people who tested positive for the virus are attributed to it, some countries are much more hesitant to do this, labelling the coronavirus as more of a secondary cause of death after any chronic illnesses the victim may have had. This is also why it is worth separating the “case fatality rate”, deaths as a share of confirmed cases, from the “infection fatality rate”, deaths as a share of everyone who has been infected, including those never tested. Both depend on which deaths are counted, and the case fatality rate also depends on how many infections are ever confirmed.

These are only the reporting discrepancies that we are aware of. Ecuador has reported a COVID-19 death toll of around 1700, but the real figure is expected to be much higher - the current chaos in its healthcare system has made it almost impossible to report accurate figures. This is similarly the case in many developing nations, especially those with large rural populations, where citizens may be buried without ever being recorded as having the virus.

Naturally, more densely populated regions are much more prone to higher concentrations of the virus. This explains why hotspots have predominantly emerged in cities: Wuhan, New York, London. However, many more factors explain the great variations in both infection and death rates between countries; Sweden's comparatively low death rate is one much-discussed example.

A comparison of the confirmed cases of and deaths from COVID-19 using data reported by WHO on 9 May 2020

Something that cannot be overlooked here is the prevalence of testing. WHO statistics are damning in this respect: at the time of writing the UK has 211,368 confirmed cases and a total of 31,241 deaths, whereas Germany has 168,551 cases but just 7369 deaths. On these figures, 4.4% of Germany's confirmed cases have died, compared with roughly 15% in the UK. This apparently lower death rate isn't necessarily because Germans are less vulnerable to developing severe cases of the disease, or because they receive better healthcare - though both may be true - but because Germany has tested its citizens much more robustly, so its confirmed cases include many more mild infections. Because of restrictions on who exactly can be tested in the UK, often only those with severe cases of the virus are formally diagnosed, and inter-country comparisons simply can't be made when the available data is as incomplete, and indeed incomparable, as this. So few of those with the virus are actually being tested and added to government statistics - and this is not just the case in the UK. A major problem this brings is that it can be difficult for policymakers to identify hotspots of the virus, and thus lockdown measures risk being lifted too soon, purely because we have incomplete data.
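The German and UK percentages are simple ratios of deaths to confirmed cases, sometimes called a naive case fatality rate. A minimal Python sketch using only the WHO totals quoted here reproduces them. The second function adds a purely hypothetical `detected_share` parameter, an assumed fraction of all infections that ever get confirmed (not a WHO or government figure), to show why testing only severe cases inflates the rate. It also assumes every death is counted, which, as the revisions described earlier show, is not guaranteed.

```python
# WHO totals quoted in this article, as reported on 9 May 2020.
REPORTED = {
    "UK": {"cases": 211_368, "deaths": 31_241},
    "Germany": {"cases": 168_551, "deaths": 7_369},
}


def naive_cfr(cases: int, deaths: int) -> float:
    """Deaths as a percentage of confirmed cases."""
    return 100 * deaths / cases


def rate_over_all_infections(cases: int, deaths: int, detected_share: float) -> float:
    """Deaths as a percentage of estimated total infections.

    Hypothetical: assumes confirmed cases are only `detected_share` of all
    infections and that every death has been counted.
    """
    estimated_infections = cases / detected_share
    return 100 * deaths / estimated_infections


for country, figures in REPORTED.items():
    print(f"{country}: {naive_cfr(**figures):.1f}%")
# UK: 14.8%
# Germany: 4.4%

# Purely illustrative: if only one UK infection in ten were ever confirmed,
# the same deaths would be spread over ten times as many infections.
print(f"{rate_over_all_infections(**REPORTED['UK'], detected_share=0.1):.1f}%")
# 1.5%
```

The point is not the illustrative 1.5%, which rests on an invented detection share, but that the same death count yields very different rates depending on how many infections are ever confirmed.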

We consume a lot of data related to the virus, and we don't know what to believe. In this age of information we might expect reliable statistics to be right at our fingertips, but we're being given too much information to make any sense of it. As WHO say themselves in their daily situation reports:

"Caution must be taken when interpreting all data presented. Differences are to be expected between information products published by WHO, national public health authorities, and other sources using different inclusion criteria and different data cut-off times. While steps are taken to ensure accuracy and reliability, all data are subject to continuous verification and change. Case detection, definitions, testing strategies, reporting practice, and lag times differ between countries/territories/areas. These factors, amongst others, influence the counts presented with variable underestimation of true case and death counts, and variable delays to reflecting these data at global level."

In conclusion, data just doesn't make any sense without context, especially when you don't know whether that data is actually correct. Ultimately these comparisons are at present almost futile anyway. Focus on curbing the spread of coronavirus in your own country before you worry about how your nation's performance compares to others.

AUTHOR: Grace Dean
Editor-in-Chief of the Courier 2019/20, News Editor 2018/19, writer since 2016 and German & Business graduate. I've written for all of our sections, but particularly enjoy writing breaking news and data-based investigative pieces. Best known in the office for making tea and blasting out James Blunt. Twitter: @graceldean
