An Introduction and Tutorial on Choropleth Maps

Image by Miroslava Chrienova from Pixabay

Introduction

I am a huge advocate for data visualizations because I believe they are the most effective way to illustrate and explain complex information, especially numerical data, in a simple and digestible manner. Also, when done properly, visualizing data can help reduce bias in how the data is interpreted.

One of my favorite types of visualization is the animated choropleth map. Given that there's a pandemic going on right now, I thought this would be a good time to demonstrate how powerful animated choropleth maps can be.

In this article, you’ll learn the following:

  1. What choropleth maps are
  2. When and why they are most effective to use
  3. Python code to create your own static and animated choropleth maps

Choropleth Maps

A choropleth map is a type of thematic map where areas or regions are shaded in proportion to the value of a given variable.
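To make "shaded in proportion" concrete, here is a minimal sketch of one common approach, linear normalization: each value is rescaled to a fraction between the smallest and largest values, and that fraction picks a color between a light and a dark endpoint. The function name value_to_hex, the endpoint colors, and the region values are illustrative assumptions; plotting libraries such as Plotly use their own colorscales and interpolation.

```python
def value_to_hex(value, vmin, vmax, low=(255, 245, 240), high=(103, 0, 13)):
    """Linearly map value from [vmin, vmax] onto a light-to-dark gradient; return a hex color."""
    # Normalize to t in [0, 1]; a constant range maps everything to the lightest shade
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside [vmin, vmax]
    # Interpolate each RGB channel between the two endpoint colors
    rgb = tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical values for three regions
values = {"Region A": 0, "Region B": 50, "Region C": 100}
vmin, vmax = min(values.values()), max(values.values())
shades = {name: value_to_hex(v, vmin, vmax) for name, v in values.items()}
print(shades)
```

The smallest value gets the lightest color and the largest gets the darkest, so a reader can rank regions by eye without reading any numbers.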

Static choropleth maps are most useful when you want to compare a variable across regions at a single point in time. For example, if you wanted to compare the crime rate of each state in the US at a given moment, you could visualize it with a static choropleth.

An animated or dynamic choropleth map is similar to a static choropleth map, except that it lets you compare a variable by region over time. Time adds a third dimension of information, and that is what makes these visualizations so interesting and powerful.
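Conceptually, an animated choropleth needs data in "long" format, with one row per region per time step, and each time step becomes one frame of the animation. The sketch below illustrates that grouping with pandas on a small hypothetical table (regions "A" and "B" and their values are placeholders). It only illustrates the idea; Plotly Express builds its frames itself when you pass animation_frame.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, region) pair
records = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Region": ["A", "B", "A", "B"],
    "Value": [1, 4, 3, 6],
})

# Each distinct date becomes one frame; inside a frame, regions are compared
# exactly as they would be on a static choropleth
frames = {
    date: group.set_index("Region")["Value"].to_dict()
    for date, group in records.groupby("Date")
}

for date, snapshot in frames.items():
    print(date, snapshot)
```

Each frame on its own is just a static choropleth; playing the frames in date order is what adds the time dimension.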

Visualizing the Coronavirus Pandemic

The data that I used to create the following visualizations is the Novel Corona Virus 2019 dataset from Kaggle, which can be found here. The dataset is a composite of multiple sources including the World Health Organization, the National Health Commission of the People's Republic of China, and the United States Centers for Disease Control and Prevention.

Note: If any one of these sources fails to provide accurate and timely data, the visualizations may be skewed or inaccurate.

Static Choropleth

Below is a static choropleth of the total number of confirmed cases of the coronavirus by country as of March 28, 2020. You can see that the countries with the highest numbers of cases include the US, China, and Italy, along with a few other European countries.

Static Choropleth of Confirmed Cases of the Coronavirus

The code to create this is as follows:

# Import libraries
import pandas as pd
import plotly.express as px  # used for the animated map below
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)  # render Plotly figures inline in a Jupyter/Kaggle notebook

# Read data
df = pd.read_csv("../input/novel-corona-virus-2019-dataset/covid_19_data.csv")

# Rename columns
df = df.rename(columns={'Country/Region': 'Country', 'ObservationDate': 'Date'})

# Convert MM/DD/YYYY strings to ISO YYYY-MM-DD strings, which sort chronologically
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y').dt.strftime('%Y-%m-%d')

# Manipulate dataframe: keep data up to the date in the title, total the case
# counts for each country and date, then keep each country's most recent row
df_countries = df[df['Date'] <= '2020-03-28']
df_countries = (
    df_countries.groupby(['Country', 'Date'])[['Confirmed', 'Deaths', 'Recovered']]
    .sum()
    .reset_index()
    .sort_values('Date', ascending=False)
)
df_countries = df_countries.drop_duplicates(subset=['Country'])
df_countries = df_countries[df_countries['Confirmed'] > 0]

# Create the choropleth
fig = go.Figure(data=go.Choropleth(
    locations=df_countries['Country'],  # country names to place on the map
    locationmode='country names',       # interpret locations as country names
    z=df_countries['Confirmed'],        # value that sets each country's color
    colorscale='Reds',
    marker_line_color='black',          # country border color
    marker_line_width=0.5,
))
fig.update_layout(
    title_text='Confirmed Cases as of March 28, 2020',
    title_x=0.5,                        # center the title
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular',
    ),
)
fig.show()

Notice that all you're really doing is pointing a few parameters at columns in your dataset: locations takes the country names and z takes the values that determine each country's color, while locationmode='country names' tells Plotly to interpret the locations as country names. The rest of the code only changes the appearance of the map.
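The data manipulation step is easy to skim past: it sums the rows for each country and date into one total, sorts the newest dates first, and keeps only the first row it sees for each country. On a tiny hypothetical table shaped like the Kaggle file (countries "X", "Y", "Z" and their counts are placeholders), the same chain of pandas calls behaves like this:

```python
import pandas as pd

# Hypothetical rows: several provinces per country per date, as in the Kaggle file
df = pd.DataFrame({
    "Country": ["X", "X", "X", "Y", "Z"],
    "Date": ["2020-03-27", "2020-03-27", "2020-03-28", "2020-03-28", "2020-03-28"],
    "Confirmed": [10, 5, 20, 3, 0],
})

latest = (
    df.groupby(["Country", "Date"])[["Confirmed"]].sum()  # province rows -> country totals
    .reset_index()
    .sort_values("Date", ascending=False)                 # newest dates first
    .drop_duplicates(subset=["Country"])                  # keep the first (newest) row per country
)
latest = latest[latest["Confirmed"] > 0]                  # drop countries with no cases
print(latest)
```

Country X ends up with its March 28 total rather than the sum of its March 27 provinces, and country Z disappears because it has no confirmed cases. Note that this relies on the dates sorting chronologically, which holds for ISO YYYY-MM-DD strings.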

Animated Choropleth Map

An animated choropleth map is far more engaging than a static one. Here, we're looking at the total number of confirmed cases of the coronavirus by country over time. You can see that China had by far the most cases until very recently.

The code to create this is as follows:

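# Manipulate the original dataframe: keep rows with at least one confirmed case
df_countrydate = df[df['Confirmed'] > 0].copy()
# Normalize dates to ISO YYYY-MM-DD strings so the frames play in chronological order
df_countrydate['Date'] = pd.to_datetime(df_countrydate['Date']).dt.strftime('%Y-%m-%d')
# Total the case counts for each date and country
df_countrydate = (
    df_countrydate.groupby(['Date', 'Country'])[['Confirmed', 'Deaths', 'Recovered']]
    .sum()
    .reset_index()
)

# Create the visualization: one animation frame per date
fig = px.choropleth(
    df_countrydate,
    locations="Country",
    locationmode="country names",
    color="Confirmed",
    hover_name="Country",
    animation_frame="Date",
)
fig.update_layout(
    title_text='Global Spread of Coronavirus',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
    ),
)

fig.show()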
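Confirmed counts on a given date can differ by several orders of magnitude between countries, so on a linear color scale a handful of countries can take up most of the range while the rest look nearly identical. One optional refinement, not part of the original code, is to color by the base-10 logarithm of the count and to use a single color range for the whole animation. A minimal sketch, assuming zero-case rows have already been dropped as in the code above; the helper name add_log_color and the column name LogConfirmed are hypothetical:

```python
import numpy as np
import pandas as pd


def add_log_color(frame, value_col="Confirmed", out_col="LogConfirmed"):
    """Return a copy of frame with a log10 color column, plus that column's overall (min, max)."""
    out = frame.copy()
    # log10 spreads out counts that span several orders of magnitude.
    # Assumes zero counts were already filtered out, since log10(0) is -inf.
    out[out_col] = np.log10(out[value_col])
    # A single range across all dates ties each shade to the same count in every frame
    color_range = (float(out[out_col].min()), float(out[out_col].max()))
    return out, color_range


# Hypothetical long-format data in the same shape as df_countrydate
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Country": ["A", "B", "A", "B"],
    "Confirmed": [1, 10, 100, 1000],
})
toy_log, color_range = add_log_color(toy)
print(color_range)
```

To use it, you could pass the returned dataframe to px.choropleth with color="LogConfirmed" and range_color=color_range; range_color is an existing Plotly Express argument that pins both ends of the continuous color scale. The colorbar would then read in log10 units (a value of 2 means 100 cases), so it is worth labelling it accordingly.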

Thanks for the read!

If you like my work and want to support me…

  • Be one of the FIRST to subscribe to my new YouTube channel here! While there aren’t any videos yet, I’ll be sharing lots of amazing content like this but in video form.
  • Follow me on LinkedIn here.
  • Sign up on my email list here.
