Data Divisions: Projections, Uncertainty, and Unknowns

April 15, 2020

This is the third in a series of original articles on the COVID-19 pandemic by the Center for Inquiry as part of its Coronavirus Resource Center, created to help the public address the crisis with evidence-based information. Please check back periodically for updates and new information. 

There’s nothing quite like an international emergency—say, a global pandemic—to lay bare the gap between scientific models and the real world, between projections and speculations and what’s really going on in cities and hospitals around the world. 

A previous article discussed varieties of information about COVID-19, including information that’s true; information that’s false; information that’s trivially true (true but unhelpful); and speculation, opinion, and conjecture. Here we take a closer look at the role of uncertainty in uncertain times. 

Dueling Projections and Predictions

The record of wrong predictions about the coronavirus is long and grows by the hour. Around Valentine’s Day, the director of policy and emergency preparedness for the New Orleans health department, Sarah Babcock, said that Mardi Gras celebrations two weeks later should proceed, predicting that “The chance of us getting someone with coronavirus is low.” That projection was wrong, dead wrong: a month later the city would have one of the worst outbreaks of COVID-19 in the country, with correspondingly high death rates. Other projections have overestimated the scale of infections, hospitalizations, and/or deaths. 

It’s certainly true that many, if not most, news headlines about the virus are scary and alarmist; and that many, if not most, projections and predictions about COVID-19 are wrong to a greater or lesser degree. There’s a plague of binary thinking, and it’s circulating in many forms. One was addressed in the previous article: that of whether people are underreacting or overreacting to the virus threat. A related claim involves a quasi-conspiracy that news media and public health officials are deliberately inflating COVID-19 statistics. Some say it’s being done to make President Trump look incompetent at handling the pandemic; others say it’s being done on Trump’s behalf to justify coming draconian measures including Big Brother tracking. 

Many have suggested that media manipulation is to blame, claiming that numbers are being skewed by those with social or political agendas. There’s undoubtedly a grain of truth to that—after all, information has been weaponized for millennia—but there are more parsimonious (and less partisan) explanations for much of it, rooted in critical thinking and media literacy.

The Media Factors

In many cases, it’s not experts and researchers who skew information but instead the news media that report on them. News and social media, by their nature, highlight the aberrant extremes. Propelled by human nature and algorithms, they selectively show the worst in society—the mass murders, the dangers, the cruelty, the outrages, and the disasters—and rarely profile the good. This is understandable, as bad things are inherently more newsworthy than good things.

To take one example, social media was recently flooded with photos of empty store shelves due to hoarding, and newscasts depicted long lines at supermarkets. They’re real enough—but are they representative? Photos of fully stocked markets and calm shopping aren’t newsworthy or share-worthy, so they’re rarely seen (until recently, when they in turn became unusual). The same happens when news media cover natural disasters; journalists (understandably) photograph and film the dozens of homes that were flooded or wrenched apart by a tornado, not the tens or hundreds of thousands of neighboring homes that were unscathed. This isn’t some conspiracy by the news media to emphasize the bad; it’s just the nature of journalism. But it often leads to a public that overestimates how terrible the world—and those in it—really is, and it fuels fear and panic. 

Another problem is news stories (whether about dire predictions or promising new drugs or trends) that are reported and shared without sufficient context. An article in Health News Review discussed the problem of journalists stripping out important caveats: “Steven Woloshin, MD, co-director of the Center for Medicine and Media at The Dartmouth Institute, said journalists should view preprints [rough drafts of journal studies that have not been published or peer-reviewed] as ‘a big red flag’ about the quality of evidence, similar to an animal study that doesn’t apply to humans or a clinical trial that lacks a control group. ‘I’m not saying the public doesn’t have the right to know this stuff,’ Woloshin said. ‘But these things are by definition preliminary. The bar should be really high’ for reporting them. In some cases, preprints have shown to be completely bogus … . Readers might not heed caveats about ‘early’ or ‘preliminary’ evidence, Woloshin said. ‘The problem is, once it gets out into the public it’s dangerous because people will assume it’s true or reliable.’”

One notable example of an unvetted COVID-19 news story circulating widely “sprung from a study that ran in a journal. The malaria medicine hydroxychloroquine, touted by President Trump as a potential ‘cure,’ gained traction based in part on a shaky study of just 42 patients in France. The study’s authors concluded that the drug, when used in combination with an antibiotic, decreased patients’ levels of the virus. However, the findings were deemed unreliable due to numerous methodological flaws. Patients were not randomized, and six who received the treatment were inappropriately dropped from the study.” Recently, a Brazilian study of the drug was stopped when some patients developed heart problems. 

Uncertainties in Models and Testing

In addition to media biases toward sensationalism and simplicity, experts and researchers often have limited information to work with, especially when making predictions. There are many sources of error in the epidemiological data about COVID-19. Models are only as good as the information that goes into them; as the saying goes: Garbage In, Garbage Out. This is not to suggest that all the data is garbage, of course; it’s more a case of Incomplete Data In, Incomplete Data Out. As a recent article noted, “Models aren’t perfect. They can generate inaccurate predictions. They can generate highly uncertain predictions when the science is uncertain. And some models can be genuinely bad, producing useless and poorly supported predictions … .” But as to the complaint that the outbreak hasn’t been as bad as some earlier models predicted, “earlier projections showed what would happen if we didn’t adopt a strong response, while new projections show where our current path sends us. The downward revision doesn’t mean the models were bad; it means we did something.”
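To make that concrete, here is a minimal, purely illustrative sketch—not any agency’s actual model, and not real COVID-19 data—of why projections can swing so widely. It uses a toy exponential-growth projection in which the daily growth rate is uncertain (the assumed range of rates is invented for the example) and an assumed intervention lowers the rate partway through.

```python
# A toy illustration (assumed numbers, not real COVID-19 data) of how
# uncertain inputs and changed behavior swing a projection.

def project_cases(initial_cases, daily_growth, days,
                  intervention_day=None, reduced_growth=None):
    """Project cumulative cases under simple exponential growth.

    If intervention_day is given, the growth rate drops to reduced_growth
    from that day onward (a crude stand-in for distancing measures).
    """
    cases = float(initial_cases)
    for day in range(1, days + 1):
        rate = daily_growth
        if intervention_day is not None and day >= intervention_day:
            rate = reduced_growth
        cases *= 1 + rate
    return round(cases)

# Uncertain input: a plausible-looking range of daily growth rates (assumed).
for growth in (0.15, 0.25, 0.35):
    no_action = project_cases(1000, growth, days=30)
    with_action = project_cases(1000, growth, days=30,
                                intervention_day=10, reduced_growth=0.05)
    print(f"growth {growth:.0%}/day: "
          f"no intervention ~{no_action:,}, "
          f"intervention on day 10 ~{with_action:,}")
```

In this sketch, the far lower numbers in the intervention scenario don’t mean the no-intervention projection was “bad”; they reflect a change in behavior—the same point the quoted article makes about real models being revised downward.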

One example of the uncertainty of data is the number of COVID-19 deaths in New York City, one of the hardest-hit places. According to The New York Times, “the official death count numbers presented each day by the state are based on hospital data. Our most conservative understanding right now is that patients who have tested positive for the virus and die in hospitals are reflected in the state’s official death count.” 

All well and good, but “The city has a different measure: Any patient who has had a positive coronavirus test and then later dies—whether at home or in a hospital—is being counted as a coronavirus death, said Dr. Oxiris Barbot, the commissioner of the city’s Department of Health. A staggering number of people are dying at home with presumed cases of coronavirus, and it does not appear that the state has a clear mechanism for factoring those victims into official death tallies. Paramedics are not performing coronavirus tests on those they pronounce dead. Recent Fire Department policy says that death determinations on emergency calls should be made on scene rather than having paramedics take patients to nearby hospitals, where, in theory, health care workers could conduct post-mortem testing. We also don’t really know how each of the city’s dozens of hospitals and medical facilities are counting their dead. For example, if a patient who is presumed to have coronavirus is admitted to the hospital, but dies there before they can be tested, it is unclear how they might factor into the formal death tally. There aren’t really any mechanisms in place for having an immediate, efficient method to calculate the death toll during a pandemic. Normal procedures are usually abandoned quickly in such a crisis.”

People who die at home without having been tested, of course, won’t show up in the official numbers: “Counting the dead after most disasters—a plane crash, a hurricane, a gas explosion, a terror attack or a mass shooting, for example—is not complex. A virus raises a whole host of more complicated issues, according to Michael A.L. Balboni, who about a decade ago served as the head of the state’s public safety office. ‘A virus presents a unique set of circumstances for a cause of death, especially if the target is the elderly, because of the presence of comorbidities,’ he said—multiple conditions. For example, a person with COVID-19 may end up dying of a heart attack. ‘As the number of decedents increase,’ Mr. Balboni said, ‘so does the inaccuracy of determining a cause of death.’”

So while it might seem inconceivably Dickensian (or suspicious) to some that in 2020 quantifying something as seemingly straightforward as death is complicated, this is not evidence of deception or of anyone “fudging the numbers” but instead reflects an ordinary and predictable lack of uniform criteria and reporting standards. The international situation is even more uncertain; different countries have different guidelines, making comparisons difficult. Not all countries have the same criteria for who should be tested, for example, or even have adequate numbers of tests available. 

In fact, there’s evidence suggesting that, if anything, the official numbers likely undercount the true number of infections. Analysis of sewage in one metropolitan area in Massachusetts that officially has fewer than 500 confirmed cases suggests there may be far more undetected cases. 

Incomplete Testing

Some people have complained that everyone should be tested, suggesting that only the rich are being tested for the virus. There is a national shortage of tests, and in fact many members of the public are being tested (about 1 percent of the population so far), but such complaints miss a larger point: Testing is of limited value to individuals.  

Testing should be done in a coordinated way, starting not with the general public but with the most seriously ill. Those patients should be quarantined until the tests come back, and if the result is positive, further measures should be taken, including tracking down the people with whom that patient may have come in contact; in Wuhan, for example, contacts were asked to check their temperature twice a day and stay at home for two weeks. 

But testing people who may be perfectly healthy is a waste of very limited resources and testing kits; most of the world’s population has no symptoms of COVID-19. Screening the asymptomatic public is neither practical nor possible. Furthermore, though scientists are working on tests that yield faster and more accurate results, those available so far have taken days to return results. Because many people who carry the virus show no symptoms (or mild symptoms that mimic colds or even seasonal allergies), it’s entirely possible that a person could have been infected between the time they took the test and the time they got a negative result back. So it may have been true that a few days, or a week, earlier they hadn’t been infected, but they are infected now and don’t know it because they are asymptomatic or presymptomatic. The point is not that the tests are flawed or that people should be afraid, but that testing, by itself, is of little value to the patient because of these uncertainties. If anything, it could provide a false sense of security and put others at risk. 
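As a rough back-of-the-envelope illustration of that last point, consider the sketch below. Every number in it is an assumption chosen for the example—an assumed false-negative rate, an assumed daily exposure risk, an assumed turnaround time—not a measured property of any real test. The point is simply that a negative result received days after the swab leaves meaningful room for error.

```python
# Illustrative only: assumed rates, not measured values for any real test.

false_negative_rate = 0.20   # assumed: the test misses 20% of true infections
daily_infection_risk = 0.02  # assumed: 2% chance per day of a new exposure
days_waiting = 5             # assumed turnaround time for the result

# Probability of picking up the virus while waiting for the result.
p_infected_while_waiting = 1 - (1 - daily_infection_risk) ** days_waiting

print(f"Chance the test missed an existing infection: {false_negative_rate:.0%}")
print(f"Chance of a new infection during the {days_waiting}-day wait: "
      f"{p_infected_while_waiting:.0%}")
```

Even with modest assumed rates like these, the room for a misleading negative is far from negligible—which is exactly why a negative result shouldn’t change anyone’s precautions.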

As Dr. Paul Offit noted in a recent interview, testing for the virus is mainly of use to epidemiologists. “From the individual level, it doesn’t matter that much. If I have a respiratory infection, stay home. I don’t need to find out whether I have COVID-19 or not. Stay home. If somebody gets their test and they find out they have influenza, they’ll be relieved, as compared to if they have COVID-19, where they’re going to assume they’re going to die no matter how old they are.” 

If you’re ill, on a practical level—unless you’re very sick or at increased risk, as mentioned above—it doesn’t really matter whether you have COVID-19 or not because a) there’s nothing you can do about it except wait it out, like any cold or flu; and b) you should take steps to protect others anyway. People should assume that they are infected and act as they would for any communicable disease: isolate, get rest, avoid unnecessary contact with others, wash hands, don’t touch your face, and so on. 

Certainty and the Unknown Knowns

As noted, the fact that our knowledge is incomplete doesn’t mean that we don’t know anything about the virus; quite the contrary, we have a pretty good handle on the basics including how it spreads, what it does to the body, and how the average person can minimize their risk. 

Humans crave certainty and binary answers, but science can’t offer them. The truth is that we simply don’t know what will happen or how bad it will get. For many aspects of COVID-19, we don’t have enough information to make accurate predictions. In a New York Times interview, one victim of the disease reflected on the measures being taken to stop the spread of the disease: “We could look back at this time in four months and say, ‘We did the right thing’—or we could say, ‘That was silly … or we might never know.’” 

There are simply too many variables, too many factors involved. Even hindsight won’t be 20/20; it will instead be seen by many through a partisan prism. We can never know the alternative history or what would have happened; it’s like the concern over the “Y2K bug” two decades ago. Was it all over nothing? We don’t know, because steps were taken to address the problem. 

But uncertainty has been largely ignored by pundits and social media “experts” alike who routinely discuss and debate statistics while glossing over—or entirely ignoring—the fact that much of it is speculation and guesswork, unanchored by any hard data. It’s like hotly arguing over what exact time a great-aunt’s birthday party should be on July 4, when all she knows is that she was born sometime during the summer. 

So, if we don’t know, why do people think they know or act as if they know? 

Part of this is explained by what in psychology is known as the Dunning-Kruger effect: “in many areas of life, incompetent people do not recognize—scratch that, cannot recognize—just how incompetent they are … . Logic itself almost demands this lack of self-insight: For poor performers to recognize their ineptitude would require them to possess the very expertise they lack. To know how skilled or unskilled you are at using the rules of grammar, for instance, you must have a good working knowledge of those rules, an impossibility among the incompetent. Poor performers—and we are all poor performers at some things—fail to see the flaws in their thinking or the answers they lack.” 

Most people don’t know enough about epidemiology, statistics, or research design to have a good idea of how valid disease data and projections are. And of course, there’s no reason they would have any expertise in those fields, any more than the average person would be expected to have expertise in dentistry or theater. But the difference is that many people feel confident enough in their grasp of the data—or, often, confident enough in someone else’s grasp of the data, as reported via their preferred news source—to comment on it and endorse it (and often argue about it).  

Psychology of Uncertainty

Another factor is that people are uncomfortable admitting when they don’t know something or don’t have enough information to make a decision. If you’ve taken any standardized multiple-choice tests, you probably remember that some of the questions offered a tricky option, usually after three or four possibly correct specific answers. This is some version of “The answer cannot be determined from the information given.” This response (usually Option D) is designed in part to thwart guessing and to see when test-takers recognize that the question is insoluble or the premise incomplete. 

The principle applies widely in the real world. It’s difficult for many people—and especially experts, skeptics, and scientists—to admit they don’t know the answer to a question. Even if it’s outside our expertise, we often feel as if not knowing (or even not having a defensible opinion) is a sign of ignorance or failure. Real experts freely admit uncertainty about the data; Dr. Anthony Fauci has been candid about what he knows and what he doesn’t, responding for example when asked how many people could be carriers, “It’s somewhere between 25 and 50%. And trust me, that is an estimate. I don’t have any scientific data yet to say that. You know when we’ll get the scientific data? When we get those antibody tests out there.” 

Yet there are many examples in our everyday lives when we simply don’t have enough information to reach a logical or valid conclusion about a given question, and often we don’t recognize that fact. We routinely make decisions based on incomplete information, and unlike on standardized tests, in the real world of messy complexities there are not always clear-cut objectively verifiable answers to settle the matter. 

This is especially true online and in the context of a pandemic. Few people bother to chime in on social media discussions or threads to say that there’s not enough information given in the original post to reach a valid conclusion. People blithely share information and opinions without having the slightest clue as to whether it’s true or not. But recognizing that we don’t have enough information to reach a valid conclusion demonstrates a deeper and nuanced understanding of the issue. Noting that a premise needs more evidence or information to complete a logical argument and reach a valid conclusion is a form of critical thinking.

One element of conspiracy thinking is that those who disagree are either stupid (that is, gullible “sheeple” who believe and parrot everything they see in the news—usually specifically the “mainstream media” or “MSM”) or simply lying (experts and journalists across various media platforms who know the truth but are intentionally misleading the public for political or economic gain). This “If You Disagree with Me, Are You Stupid or Dishonest?” worldview has little room for uncertainty or charity and misunderstands the situation. 

The appropriate position to take on most coronavirus predictions is one of agnosticism. It’s not that epidemiologists and other health officials have all the data they need to make good decisions and projections about public health and are instead carefully considering ways to fake data to deceive the public and journalists. It’s that they don’t have all the data they need to make better predictions, and as more information comes in, the projections will get more accurate. The solution is not to vilify or demonize doctors and epidemiologists but instead to understand the limitations of science and the biases of news and social media.