When two-dimensional scatters aren’t sufficient, turn to the ternary chart
The vast majority of data visualisation deals with data presented in two spatial dimensions. Take a line chart, a bar or a column chart, or an x-y scatter. Each one takes two variables — GDP and time, for example — and plots them in a two-dimensional space. But there is a chart in the toolkit that breaks that mould: a three-dimensional scatter called a “ternary” plot. In recent years these plots have mostly been confined to specialist publications (examples can be seen here, here and here). This is the story of how we used the ternary chart to map the Brexit conundrum for an article for The Economist.
In February 2019 we published an article in the “Graphic detail” section of the newspaper about Brexit. For the piece we looked at which demographic variables affect an individual’s attitude towards Britain’s future relationship with the European Union: whether they wanted a deal; favoured “no deal”; or wanted to remain in the EU altogether.
How did our tri-chart come together? Firstly, we owe a debt of gratitude to YouGov, the polling organisation that provided us with data for 90,000 unique responses to a single question:
“The UK is currently scheduled to leave the EU on 29 March 2019. UK and EU negotiators recently completed an agreement on the terms of the UK’s exit from the EU, but this proposed deal has yet to be confirmed by the UK House of Commons. From most preferred to least preferred, how would you rank the following three options?”
No Deal: leave the EU without a withdrawal agreement.
Proposed Deal: leave the EU under the terms of the negotiated agreement.
Remain: stop the exit process and remain in the EU.
That question was asked over a 12-day period from the end of November to the beginning of December 2018. Along with each anonymous person’s response, YouGov provided us with detailed demographic variables: the respondent’s age, sex, education status, household income, region, previous voting history and so on. We spent some time importing the codebooks and cleaning the data, to turn “pid=4” into “party=Labour”, for example. Here’s a snippet of our code that demonstrates that process, written in R, an open-source statistical-programming language.
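As an illustration of that kind of recoding (in Python rather than the R we used, and not our actual code), a lookup table can translate coded columns into readable ones. Only the “pid=4” to “party=Labour” mapping comes from the example above; the other codes, labels and the `recode` helper are hypothetical.

```python
# Hypothetical codebook: only pid=4 -> Labour comes from the article;
# the other codes and labels are invented for illustration.
CODEBOOK = {
    "pid": ("party", {1: "Conservative", 4: "Labour", 5: "Liberal Democrat"}),
}


def recode(row, codebook=CODEBOOK):
    """Return a copy of row with coded columns renamed and values labelled."""
    clean = {}
    for column, value in row.items():
        if column in codebook:
            new_name, labels = codebook[column]
            # Keep unknown codes visible rather than silently dropping them.
            clean[new_name] = labels.get(value, f"unknown({value})")
        else:
            clean[column] = value
    return clean


raw = {"pid": 4, "age": 52}
print(recode(raw))  # {'party': 'Labour', 'age': 52}
```

Keeping unrecognised codes as “unknown(...)” makes gaps in the codebook easy to spot during cleaning.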
With the data in a clean and understandable format we set about creating a statistical model that attempted to explain which demographic factors set people apart in their Brexit preferences. Although we had 18 different variables at our disposal, we found that just a handful mattered: previous voting history (in general elections), sex, age, education, income and political interest. We found that a three-way multinomial regression model predicted two-thirds of individuals’ responses correctly.
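To give a sense of how a three-way model of this kind produces a prediction, here is a minimal Python sketch of the final step only: turning one score per option into three probabilities that sum to one (a softmax), then picking the most likely option. The scores are hypothetical; in a fitted model each would be built from a respondent’s demographic variables, and this is not our actual model code.

```python
import math

OPTIONS = ("no deal", "deal", "remain")


def predict_probabilities(scores):
    """Convert one linear score per option into probabilities (softmax)."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_choice(scores):
    """Return the option with the highest predicted probability."""
    probs = predict_probabilities(scores)
    return OPTIONS[probs.index(max(probs))]


# Hypothetical scores for one respondent (invented for illustration).
scores = [0.2, 0.5, 1.4]
probs = predict_probabilities(scores)
print([round(p, 2) for p in probs], predict_choice(scores))
```

A prediction counts as correct when the chosen option matches the respondent’s stated first preference; the share of such matches is the kind of accuracy figure quoted above.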
For the print version of our article we summarised the model’s predicted probabilities by presenting a profile of the most likely deal, no-deal, and remain supporters. This illustrated just how demographically distinct people are when they hold opposing political views.
For the online presentation we could be much more ambitious: we had the opportunity to show the distribution of Brexit views in Britain. To do so we opted for an interactive ternary plot. We worried that at first sight a 3D scatter might appear bewildering to readers, but on balance it gave us the opportunity to portray a far more detailed profile of the country.
In truth, the ternary plot isn’t that difficult to understand. It works much like a typical two-dimensional scatter chart, except that each side of the triangle allows you to visualise a new dimension of the data. In this instance, each side refers to the probability that a given individual would support a particular Brexit position: no deal, deal or remain. Because the three probabilities must sum to 100%, any two of them determine the third, which is why all three fit on a flat triangle. We considered using a static version of the ternary chart for print, but quickly realised that without the ability to play around with the data it would be off-putting for some. Online, with clear signposting, we were confident that it would be a good charting solution.
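The geometry can be sketched in a few lines of Python. Assuming an equilateral triangle with “no deal” at the bottom-left corner, “deal” at the bottom-right and “remain” at the top (an arbitrary choice for illustration, not necessarily the layout of our chart), each point is a weighted average of the three corners.

```python
import math


def ternary_to_xy(no_deal, deal, remain):
    """Map three non-negative shares to (x, y) inside a unit-sided triangle."""
    total = no_deal + deal + remain
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    # Normalise so the three shares sum to one.
    a, b, c = no_deal / total, deal / total, remain / total
    # Corners: no deal (0, 0), deal (1, 0), remain (0.5, sqrt(3)/2).
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y


print(ternary_to_xy(1, 0, 0))  # (0.0, 0.0): certain "no deal" sits on its corner
print(ternary_to_xy(1, 1, 1))  # evenly split: the centre of the triangle
```

A voter who is certain of one option lands on that option’s corner, while a voter split evenly three ways lands in the centre.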
To begin with, we made some initial drafts of the chart using a purpose-built open-source package in R, “ggtern”. Plotting every single survey response, 90,000 points in total, was overwhelming. We needed another solution. In order to whittle down the data, we first created a set of 675,000 hypothetical voters, one for each combination of sex, education, income, and so on. We then attached the sample weights to each of these voters so that we knew, for each theoretical observation, how prevalent their demographic characteristics were.
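The idea of enumerating every combination and attaching weights can be sketched as follows in Python. The variables and levels below are a hypothetical toy set (12 combinations rather than 675,000), as are the respondents, their weights and the helper names; this is not our actual code.

```python
from itertools import product

# Hypothetical levels, invented for illustration.
LEVELS = {
    "sex": ["female", "male"],
    "education": ["no degree", "degree"],
    "age_band": ["18-34", "35-54", "55+"],
}


def all_profiles(levels):
    """Build one hypothetical voter per combination of levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def attach_weights(profiles, respondents):
    """Give each profile the summed survey weight of matching respondents."""
    names = list(profiles[0])
    totals = {}
    for r in respondents:
        key = tuple(r[n] for n in names)
        totals[key] = totals.get(key, 0.0) + r["weight"]
    return [
        dict(p, weight=totals.get(tuple(p[n] for n in names), 0.0))
        for p in profiles
    ]


respondents = [
    {"sex": "female", "education": "degree", "age_band": "18-34", "weight": 1.2},
    {"sex": "female", "education": "degree", "age_band": "18-34", "weight": 0.8},
    {"sex": "male", "education": "no degree", "age_band": "55+", "weight": 1.0},
]
weighted = attach_weights(all_profiles(LEVELS), respondents)
observed = [p for p in weighted if p["weight"] > 0]
print(len(weighted), "profiles,", len(observed), "seen in the survey")
```

Profiles that no respondent matches keep a weight of zero, which is how the hypothetical voters who never appear in the survey can be set aside.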
Remarkably, just 29,000 of our 675,000 theoretical voters actually existed. The remainder were profiles of voters that didn’t exist, or at least didn’t exist in a survey of 90,000 adults. From those 29,000 profiles we took a weighted random sample of 2,500. Because the weighting favoured the most prevalent profiles, we were able to represent some 25% of the British electorate with just 2,500 observations of individuals. Plotting this number of points gave us the right balance between exploration and clarity.
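A weighted sample of this kind can be sketched in Python as below. It assumes sampling without replacement, which is one plausible reading rather than a documented detail of our method, and uses the Efraimidis-Spirakis technique: each profile draws a random key raised to the power of one over its weight, and the profiles with the largest keys are kept. The profiles and weights are invented for illustration.

```python
import random


def weighted_sample(profiles, k, seed=0):
    """Pick k distinct profiles, favouring those with larger weights."""
    rng = random.Random(seed)
    # Profiles with zero weight never appeared in the survey; skip them.
    observed = [p for p in profiles if p["weight"] > 0]
    if k > len(observed):
        raise ValueError("cannot sample more profiles than were observed")
    # Efraimidis-Spirakis: key = u ** (1 / weight); keep the k largest keys.
    keyed = [
        (rng.random() ** (1.0 / p["weight"]), i)
        for i, p in enumerate(observed)
    ]
    keyed.sort(reverse=True)
    return [observed[i] for _, i in keyed[:k]]


# Hypothetical profiles: every fifth one has zero weight (never observed).
profiles = [{"id": i, "weight": 0.0 if i % 5 == 0 else float(i)} for i in range(100)]
sample = weighted_sample(profiles, k=10)
print(sorted(p["id"] for p in sample))
```

Heavier profiles are more likely to be drawn, so a small sample can still account for a disproportionately large share of the total weight.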
The response to our experimental and ambitious visualisation has been wholly warm. And in the months since publication other news publications have followed suit (for example, here and here). We hope that the more people are aware of the power of this kind of chart, the more frequently it will be used in the future. Please feel free to post your favourite examples of the ternary chart in the comments below.
NB: We have made the data underlying our chart available via our GitHub repository.