Death is permanent… unless you are an Avenger!

I have a habit of “productive procrastinating”: tricking myself into believing I am somehow being productive while procrastinating. This past weekend, I devoured 10 Marvel movies in 2 days after ignoring the Marvel Cinematic Universe for about a decade. As a result of a Marvel high and wasted time, I put together this flawed framework and performed a statistical analysis.

I went on to Dataset Search, an excellent platform for data nerds looking for datasets. There was a plethora of Avengers/Marvel datasets available, but FiveThirtyEight’s Avenger dataset stood out.

The dataset was perfectly curated: it contained all the Avengers’ full names and information about their deaths. I thought I could expand on FiveThirtyEight’s idea, as explained in the README, and come up with interesting insights of my own through visualizations.

It is important to note that “death” includes instances of faking one’s death. What truly counts as a “death” is when both the readers and the other characters think the character is really dead, not whether or not the character is still breathing. I also believe this dataset is based on the Marvel comics.

Exploratory Analysis

The first phase was exploring the data set. The raw data I obtained after importing into RStudio contained many columns. I wanted to focus on columns that contained the characters’ names, deaths, and number of appearances.

I realized there were 5 death columns: Death1, Death2, Death3, Death4, Death5. They were categorical, and each one showed whether or not the character died that time.

Here are some examples:

Example 1: Wanda Maximoff, the beloved Scarlet Witch who moves objects with her thoughts, died once. So column Death1 has a Yes entry, while every other Death column is empty.

Example 2: Our hammer-wielding superhero Thor, on the other hand, died once, was resurrected, and then died again. So both Death1 and Death2 have a Yes entry.

I knew I had to restructure the way death was recorded to make it easier to work with, which brings me to the next step: data cleaning.

Data Cleaning

I ran a loop that created a new column and gave a character one point for each time they died. So characters who died once, like Wanda, would get 1 point, while Thor would get 2 points. Characters who never died, like T’Challa (truly the perfect character), get 0 points.

I’m not sure if there was a data processing mix-up, but after the previous step I found that some rows had no name entries, so I removed them.
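The two cleaning steps above can be sketched roughly as follows. The original analysis was done in R; this is a minimal Python sketch, not the author's code. It assumes a death is marked with a "YES" entry (matched case-insensitively) and uses a few hand-written, illustrative rows instead of the real dataset. The names `rows`, `count_deaths`, and `cleaned` are hypothetical.

```python
# Column names as described in the article.
DEATH_COLUMNS = ["Death1", "Death2", "Death3", "Death4", "Death5"]

# Illustrative rows only (assumption): not copied from the real dataset.
rows = [
    {"Name": "Wanda Maximoff", "Death1": "YES"},
    {"Name": "Thor", "Death1": "YES", "Death2": "YES"},
    {"Name": "T'Challa", "Death1": "NO"},
    {"Name": "", "Death1": "YES"},  # a row with no name entry
]


def count_deaths(row):
    """Give one point for every death column marked YES (assumed marker)."""
    return sum(
        1
        for col in DEATH_COLUMNS
        if (row.get(col) or "").strip().upper() == "YES"
    )


# Build the new "Deaths" column and drop rows that have no name.
cleaned = [
    {"Name": row["Name"], "Deaths": count_deaths(row)}
    for row in rows
    if (row.get("Name") or "").strip()
]

for record in cleaned:
    print(record["Name"], record["Deaths"])
```

Collapsing the five categorical columns into a single count makes the later steps (death rate, group comparison) a matter of checking whether the count is zero.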

Visualizations

I used the ggplot2 library to build the visualizations, which is relatively straightforward if you know R.

A lot of the Avengers have died at least once but only a few characters have died more than three times.

Now, let’s take a look at this pie chart, which will give you a better sense of how “dangerous” it is to be an Avenger.

Being an Avenger comes with a ~61% death rate. I’m not sure about the specifics, but the odds seem pretty high to me. Almost every other job or activity on Earth has a lower mortality rate.
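For readers who want to see the arithmetic behind the two charts, here is a small Python sketch (again, not the author's R code). It assumes each character already has a death count, and it uses toy counts chosen only for illustration, so the numbers it prints are not the article's results. `death_rate` and `toy_counts` are hypothetical names.

```python
from collections import Counter


def death_rate(death_counts):
    """Share of characters who died at least once."""
    if not death_counts:
        raise ValueError("need at least one character")
    died = sum(1 for n in death_counts if n > 0)
    return died / len(death_counts)


# Toy death counts (assumption): one entry per character.
toy_counts = [1, 2, 0, 0]

# How many characters died 0 times, 1 time, 2 times, ... (bar chart input).
distribution = Counter(toy_counts)

# Fraction who died at least once (pie chart input).
rate = death_rate(toy_counts)

print(dict(distribution))
print(f"{rate:.0%}")
```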

But let’s look at the Avengers’ rate of return. There were many cases, even in the movies, when we thought a character had died only to have them return at the end of the movie or in the next one (I’m looking at you, Nick Fury).

Fewer than 50% of the characters who “died” stayed dead. The rate of revival is high, but given the high rate of death in the first place, this makes sense. If every Avenger who died stayed dead, Marvel’s comic writers would have to constantly churn out new characters.

I also investigated whether the number of appearances was related to whether a character died.

Welch’s two-sample t-test shows no significant difference in the number of appearances between Avengers who have died at least once and those who never have.

It’s safe to say that as an Avenger, dying (or having everyone think you’re dead when you’re still very much alive) is part of the job. And death does not discriminate by number of appearances.
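As a rough illustration of what the test computes, the sketch below works out Welch's t statistic and the Welch-Satterthwaite degrees of freedom in plain Python. This is not the author's code; in R, `t.test` runs the Welch variant by default. The two samples are made-up numbers, not appearance counts from the dataset, and `welch_t` is a hypothetical helper name. Turning the statistic into a p-value needs the t distribution, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Per-group sample variance divided by group size.
    var_a = variance(sample_a) / n_a
    var_b = variance(sample_b) / n_b
    # Difference in means over the unpooled standard error.
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Made-up samples (assumption), standing in for the two groups' appearances.
died_at_least_once = [1, 2, 3, 4, 5]
never_died = [2, 3, 4, 5, 6]

t_stat, df = welch_t(died_at_least_once, never_died)
print(t_stat, df)
```

Unlike Student's t-test, the Welch version does not pool the two variances, which makes it the safer choice when the two groups differ in size or spread.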

Fun findings: As someone who wasn’t interested in the MCU before, I always thought Captain America or Iron Man was the face of the Avengers. So I was quite surprised to find that Spider-Man had the highest number of appearances, meaning he appeared most frequently in the comics.

But I guess there’s that.

Thanks for reading! Here’s the link to the code on GitHub if you’re interested.


As an Avenger, Dying Is Part of The Job was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.