Recently, two types of data analyses about vaccine effectiveness have flooded the airwaves.


One type is being termed "re-analysis" of data from randomized controlled trials (RCTs). Such analyses used to be called "post-hoc", or "not pre-specified", or "exploratory." Serious analysts do engage in this type of analysis but we treat the findings as purely suggestive, requiring confirmation by another study. It's bad practice to treat them as capable of overturning or correcting analyses of RCTs by the original research team. Beware when the analyst is not arranging further studies to confirm such findings.

Another currently popular type of analysis is labeled "real-world studies." Technically called "observational studies," or "causal studies", these are worth the effort. As noted in my previous post, such studies should not be expected to correct or overturn findings from an RCT either. They are most useful in situations when we cannot run an experiment, e.g. if the US Postal Service wants to measure the impact of a price hike, it cannot randomly apply increases to some citizens and not others.

When we can run either RCTs or observational studies, the "real-world studies" benefit from larger sample sizes, especially for subgroups, and a more realistic context. It's a double-edged sword though. When the investigators cannot control the context, there will be more variability in the study population ("statistical noise"). The observed difference between test and control may be explained by biases, both known and unknown ("confounding").

Imagine an observational study that aims to estimate the real-world effectiveness of the Moderna vaccine. The simplest first analysis is to compare the case rate of the vaccinated people to the case rate of the unvaccinated people. This is hopelessly flawed because in a real-world study, we must not assume "all else equal." People who have received the vaccines at this point are apples and oranges to those who haven't. We already know about racial inequities but the differences go beyond that: geographical, cultural, economic, religious, access to technology, access to transportation, age, comorbidities, occupation, etc. may all explain some of the observed difference between the vaccinated and the not vaccinated.

The point of observational study methods is to statistically adjust the data. In Numbersense (link), I spend quite a few chapters explaining statistical adjustments (e.g. marketing and economic data). Critics complain that "massaging the data" is a horrible thing to do but for those of us invested in these methods, it is the only reasonable thing to do. Not adjusting the data when biases are present is to wilfully publish biased and therefore incorrect results.

In this post, I will walk you through an example of an observational study, conducted by the Mayo Clinic in partnership with nference (link). I chose this study because I'm impressed by its quality, and satisfied with the level of transparency into how they massaged the data.

What a real-world study tells us

Let's start with their headline finding:

They provide a "preliminary assessment of real-world vaccination efficacy in 62,138 individuals from the Mayo Clinic... between December 1st 2020 and February 8th 2021... Administration of two COVID-19 vaccine doses was 88.7% effective in preventing SARS-CoV-2 infection (68.4-97.1%) with onset at least 36 days after the first dose."

To a statistician, this is a heavily loaded statement, with every word chosen precisely, its meaning to be revealed in the rest of the paper. Here are the footnotes to save you time:

  1. "preliminary": This is an interim analysis in which the individuals under analysis have not all reached the end of the desired observation window. The length of follow-up differs by individual.

  2. "real-world": it's an observational study in which individuals did not get vaccinated at random, and so we cannot analyze this dataset using methods for analyzing RCTs. What determines one's vaccination status is self-selection plus government prioritization rules.

  3. "62,138 individuals": we did not generalize our finding beyond the analysis set of patients in our hospital system. You are on your own if you should extrapolate our finding outside this group.

  4. "December 1st 2020 and February 8th 2021": see point #3. Ditto for generalizing across time.

  5. "administration" [of two doses]: the 88.7% number was arrived at after removing a few cases from the vaccine arm because those people have not received their second shots despite being past the target time for them. Without removing those cases, the effectiveness was 83%.

  6. "onset at least 36 days after the first dose": as with other research teams, they made their own decision as to when to start counting cases. 36 days after the first dose is roughly 2 weeks after the second dose of Pfizer, and 1 week after the second dose of Moderna. (This just happens to flip the official case-counting windows from their respective RCTs, in which Pfizer started counting 1 week after the second dose, and Moderna, 2 weeks.)

How a real-world study assembles the control group

The key to any observational study is how they define the control group (i.e. the unvaccinated group).

They start with a vaccinated group. These are patients in the Mayo Clinic system who have received at least the first shot of either Pfizer or Moderna during the study period, which was 12/1/2020 to 2/8/2021. Each of these patients has a vaccination date, PCR test dates, and a record of whether each test came back positive or negative. For the moment, we will concern ourselves with just the vaccination date (Day 0, day of the first shot), and treat any positive test result after Day 0 as indicating an infection.

To form the control group, they start with all patients in the system who have not been vaccinated as of 2/8/2021 (subject to exclusions which I'll talk about later).

The researchers collect demographic data and zip codes on all patients in both groups. They then assign "propensity scores" to each individual. These scores project the probability that an individual of given zip code, age, sex, race, ethnicity, and number of prior PCR tests has taken a first shot by 2/8/2021. For example, the model that generates propensity scores might say someone in zip code 55443, age 25, male, white, non-Hispanic and 1 prior test has a 20% chance of having been vaccinated (and 80% chance of not).
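To make the idea of a propensity score concrete, here is a minimal sketch in Python. It is not the model used by the Mayo-nference team. As a simplifying assumption, it estimates the score as the observed share of vaccinated people within each covariate cell, whereas a real study would typically fit a regression model over many more variables. The records and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (zip_code, age, sex, prior_tests, vaccinated)
records = [
    ("55443", 25, "M", 1, True),
    ("55443", 25, "M", 1, False),
    ("55443", 25, "M", 1, False),
    ("55443", 25, "M", 1, False),
    ("55443", 25, "M", 1, False),
    ("55443", 80, "F", 2, True),
    ("55443", 80, "F", 2, True),
    ("55443", 80, "F", 2, False),
]

def propensity_by_cell(rows):
    """Share of vaccinated people within each covariate cell (a crude stand-in for a model)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [vaccinated, total]
    for *cell, vaccinated in rows:
        key = tuple(cell)
        counts[key][0] += int(vaccinated)
        counts[key][1] += 1
    return {key: vax / total for key, (vax, total) in counts.items()}

scores = propensity_by_cell(records)
for cell, score in scores.items():
    print(cell, round(score, 2))
```

In this toy data, the 25-year-old men with one prior test get a score of 0.2, mirroring the 20% example above. Cell-based estimates break down quickly as the number of variables grows, which is one reason a fitted model is the usual choice.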

The originators of the propensity-scoring method demonstrated that we can form well-balanced control groups in the following way: within each zip code, take each vaccinated individual and find an unvaccinated individual that has the same propensity score. Within this basic framework, there are many variants. What does "same score" mean? How many controls should be matched to a single vaccinated person? (The Mayo Clinic does 1-to-1 matches.) Which individual is selected if several viable candidates are available? (They use a "greedy" method.)
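The matching step can be sketched as follows. This is one possible greedy 1-to-1 variant, not necessarily the one in the paper. The tolerance for "same score" (the `caliper` of 0.05), the rule of taking the closest available control, and all the people in the example are assumptions for illustration.

```python
def greedy_match(vaccinated, unvaccinated, caliper=0.05):
    """Greedy 1-to-1 matching on propensity score within zip code.

    Each person is a tuple (person_id, zip_code, score).
    Returns (pairs, unmatched_vaccinated_ids).
    """
    available = list(unvaccinated)
    pairs, unmatched = [], []
    for vid, vzip, vscore in vaccinated:
        # Controls in the same zip code whose score is within the caliper
        candidates = [
            c for c in available
            if c[1] == vzip and abs(c[2] - vscore) <= caliper
        ]
        if not candidates:
            unmatched.append(vid)  # no viable control: this person is dropped
            continue
        best = min(candidates, key=lambda c: abs(c[2] - vscore))
        available.remove(best)  # each control is used at most once
        pairs.append((vid, best[0]))
    return pairs, unmatched

# Hypothetical people
vaccinated = [("v1", "55443", 0.20), ("v2", "55443", 0.22), ("v3", "55901", 0.60)]
unvaccinated = [("u1", "55443", 0.21), ("u2", "55443", 0.30), ("u3", "55901", 0.90)]

pairs, unmatched = greedy_match(vaccinated, unvaccinated)
print(pairs, unmatched)
```

"Greedy" means that each vaccinated person takes the best control available at that moment, with no backtracking. Here v1 takes u1, which leaves v2 without a close enough control, and v3 has no viable control in its zip code at all. Both end up dropped from the analysis set.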

What is the point of such adjustments?

The following excerpt of Table 1 from the paper illustrates the core idea of "covariate balance."


Each of the vaccinated and unvaccinated groups consists of 31,069 people because of 1-to-1 matching. The distributions of age groups, sex, race, etc. are all well-balanced between the two groups. Both groups are approximately 63% female, 21% 75 years or older, etc. (Sidenote: I flagged the age groups because I was surprised that only 23% of the vaccinations went to 75 years and older, equal to the proportion that went to 18-25.)

This balance condition emulates the randomization in an RCT. But don't mistakenly think they are equivalent. If vaccination were truly randomized, we could expect balance on any variable, known or unknown to the investigators. The balance for propensity scores holds for variables used in the adjustment - precisely because we made it happen - but there is no guarantee that there will be balance for any variables not used.
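One common way to put a number on covariate balance is the standardized difference between the two groups. The sketch below applies it to a single binary covariate. The shares are hypothetical, and the 0.1 cutoff mentioned afterwards is a widely used rule of thumb rather than something taken from the paper.

```python
from math import sqrt

def standardized_difference(p_treated, p_control):
    """Standardized difference between two proportions (binary covariate)."""
    pooled_var = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    return (p_treated - p_control) / sqrt(pooled_var)

# Hypothetical shares of women: vaccinated vs. matched controls,
# and vaccinated vs. an unmatched pool of unvaccinated patients
after_matching = standardized_difference(0.63, 0.64)
before_matching = standardized_difference(0.63, 0.50)
print(round(after_matching, 3), round(before_matching, 3))
```

An absolute value below about 0.1 is usually read as adequate balance. Note that this check can only be run on variables that were measured; it says nothing about the ones that were not.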

So, it's very important to make sure researchers use as many variables as possible, and include all relevant ones. This is one of those nuggets of advice that is easier to give than to follow. We're making an assumption that we know all the factors that matter. But we soldier on.

One of the curious variables used by the Mayo-nference team is the number of prior PCR tests (any time before 2/8/2021). They assert that this variable is a good proxy for the level of exposure to the coronavirus.

Matching the case-counting window

Now, we get to a tricky problem that I have discussed before. We have pairs of patients, one vaccinated and one unvaccinated, which we will treat as if randomized. The vaccinated one has a Day 0, which is the day of the first shot. What about the unvaccinated person? Establishing Day 0 is really important - given the desire of these investigators to pinpoint when an infection occurred.

In the Mayo Clinic study, they assign to the unvaccinated person a (pretend) "study enrollment date", which is Day 0 of the paired vaccinated individual. Now, they can start counting cases after Day 0 in both groups. This then leads to a vaccine effectiveness (VE) number.
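Here is a minimal sketch of the Day 0 assignment for one matched pair. The dates, the dictionary layout, and the helper name are all hypothetical. The only idea taken from the study is that the control inherits the first-shot date of the paired vaccinated person.

```python
from datetime import date

# Hypothetical matched pair
pair = {
    "vaccinated": {
        "first_shot": date(2020, 12, 20),
        "positive_tests": [date(2021, 1, 30)],
    },
    "control": {"positive_tests": [date(2020, 12, 28)]},
}

def days_to_first_positive(day0, positive_tests):
    """Days from Day 0 to the first positive test after Day 0, or None if there is none."""
    after = [(t - day0).days for t in positive_tests if t > day0]
    return min(after) if after else None

# The control's pretend enrollment date is the vaccinated person's Day 0
day0 = pair["vaccinated"]["first_shot"]
vax_day = days_to_first_positive(day0, pair["vaccinated"]["positive_tests"])
ctl_day = days_to_first_positive(day0, pair["control"]["positive_tests"])
print(vax_day, ctl_day)
```

In this made-up pair, the vaccinated person tests positive on Day 41 and the control on Day 8. Under a rule that counts cases from Day 36 onwards, only the vaccinated person's infection would be counted.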

As the above quote suggests, their headline number ignores all cases that occur prior to Day 36, meaning 2D+14 (2D = 2nd dose) for Pfizer and 2D+7 for Moderna, and Day 36 from the study enrollment date for the unvaccinated. With this window, and before the further adjustment discussed below, the VE is 83%.

Like every other team doing re-analyses or real-world studies, these investigators computed VE for all reasonable case-counting windows, and selected the one with the maximum VE. They show their homework in Table 3. Here is an excerpt:


The top part of the table shows calculations of VE for cumulative case-counting windows of the type Day X to end of study while the bottom part of the table shows calculations of VE for incremental weekly windows of the type Day X - Day X+7.
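The two kinds of windows can be sketched as below. This is a simplification: it assumes equal group sizes and equal follow-up, so that VE reduces to one minus the ratio of case counts, whereas the paper has to deal with follow-up that differs by individual. The infection days are invented, and treating a weekly window as Day X up to but not including Day X+7 is my own convention.

```python
def ve_from_counts(cases_vax, cases_ctl):
    """VE as 1 minus the case ratio; assumes equal group sizes and follow-up."""
    if cases_ctl == 0:
        return None  # undefined without any control cases
    return 1 - cases_vax / cases_ctl

def count_in_window(infection_days, start, end=None):
    """Count infections with start <= day < end (no upper bound if end is None)."""
    return sum(
        1 for d in infection_days
        if d >= start and (end is None or d < end)
    )

# Hypothetical days since Day 0 on which each infection was detected
vax_days = [3, 5, 9, 12, 16, 40]
ctl_days = [2, 4, 6, 8, 10, 13, 15, 18, 22, 25, 30, 37, 41, 45, 50]

# Cumulative windows: Day X to end of study
cumulative = {
    x: ve_from_counts(count_in_window(vax_days, x), count_in_window(ctl_days, x))
    for x in (1, 15, 36)
}
# Incremental weekly windows: Day X to Day X+7
weekly = {
    x: ve_from_counts(
        count_in_window(vax_days, x, x + 7), count_in_window(ctl_days, x, x + 7)
    )
    for x in (1, 8, 15)
}
print(cumulative)
print(weekly)
```

Even in this toy example, the VE moves around depending on which window is chosen, which is why the choice of case-counting window deserves scrutiny.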

Note that there are two rows for "Day 36 onwards". The second row has VE 89%, slightly higher than 83% (but of course, the margins of error are plus/minus 15%). I'll let the authors explain their reasoning:

Starting 36 days ... vaccine efficacy of 83.4%... Importantly, we found that two of the six infections in the vaccinated cohort on or after day 36 occurred in individuals who had received only one vaccine dose, even though all vaccinated individuals should have received two doses by this time point... Among the properly vaccinated individuals... efficacy of 88.7%.

This style of reasoning is now state of the art. They are not alone in using it. Every recent study I've read is doing the same. I've said a lot about this issue in a prior post (link) so I won't repeat this here. The one addition is that this final adjustment only applies to the vaccinated group because the unvaccinated group has no such thing as missing a second shot (unlike in the RCT, when the control group got placebo shots).
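The effect of removing the two cases can be illustrated with the usual formula, VE = 1 - (incidence rate in the vaccinated / incidence rate in the unvaccinated). In the sketch below, only the six and four vaccinated cases come from the quote above. The unvaccinated case count and the person-days are hypothetical, picked so that the results land near the published figures; they are not the study's actual numbers. The sketch also ignores the small loss of follow-up time from dropping two people.

```python
def vaccine_effectiveness(cases_vax, persondays_vax, cases_ctl, persondays_ctl):
    """VE = 1 - (incidence rate in vaccinated / incidence rate in unvaccinated)."""
    rate_vax = cases_vax / persondays_vax
    rate_ctl = cases_ctl / persondays_ctl
    return 1 - rate_vax / rate_ctl

# Hypothetical denominators and control cases (NOT from the paper)
persondays = 10_000
cases_ctl = 36

before = vaccine_effectiveness(6, persondays, cases_ctl, persondays)  # all six cases
after = vaccine_effectiveness(4, persondays, cases_ctl, persondays)   # two cases removed
print(round(before, 3), round(after, 3))
```

With counts this small, moving two cases shifts the VE by several percentage points, while nothing comparable can be done on the unvaccinated side.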

Can this study's finding be generalized outside the study population?

Before ending this post, I want to address one other issue - why the research team was very careful in not generalizing their finding outside the analysis set of people.

There are two challenges to such a generalization: the vaccinated group in the analysis set has to be representative of all vaccinated Mayo Clinic patients; and then the Mayo Clinic vaccinated patients have to be representative of the wider vaccinated population.

(Note that because of matching, the unvaccinated group does not represent the general population. It mirrors the vaccinated group in the analysis set. That is the point.)

The analysis set is actually not representative because of a series of exclusions. They exclude:

  • anyone who did not take at least one PCR test prior to February 8, 2021
  • anyone who has not previously granted proper research authorization
  • anyone under 18
  • anyone who tested positive before December 1, 2020
  • anyone living in zip codes with fewer than 25 people vaccinated at one of Mayo's facilities

In addition, for the vaccinated patients, they exclude:

  • anyone who tested positive prior to their first shot (the data appear to show that people are given shots despite prior infections)
  • anyone who got their first shot on the last day of the study period, thus having no follow-up period

The zip-code exclusion is particularly instructive. This creates biases. For example, less populous zip codes may be excluded. Also, it's likely that zip codes that have lower exposure may have fewer people interested in getting vaccinations.

It's very possible that sparsely-populated zip codes pose obstacles to finding matches. This is what is called incurable imbalance. If we can't find a matched control, we have to drop the vaccinated case. Dropping such cases makes the vaccinated group in the analysis set deviate from the vaccinated group in the system. (But it improves the balance of the people who are included.)

One of the key pieces of information you're looking for in an observational study is how many cases were dropped for lack of matches. This affects whether you can generalize the result. Sometimes, lack of matches is masquerading as an exclusion, as in the example above. If those small zip codes are excluded up top, they aren't looking for a match later but the effect is the same; we would have dropped them for lack of matches.


If you have come this far with me, I'm very impressed. This post ran twice the length of the usual one so I'm going to move the rest into a future post. I will be addressing several concerns I have with this analysis for those who want to jump in the deep end. I believe this paper will stand the test of time as one of the better efforts at real-world studies.