Pacific Rim and Data Visualization

Pacific Rim is one of my favorite movies, but its data visualization is awful. Let's talk about my data-presentation philosophy!

In the film, giant monsters ("Kaiju", from the Japanese: think "Godzilla") have begun appearing from a portal deep under the Pacific Ocean. After a few false starts, humanity concludes that the best counter is to build giant robots to fight them. Awesome. Midway through, a scientist reveals that he's analyzed Kaiju sightings and they are coming more frequently: soon, there will be a "double event" where two Kaiju transit the portal simultaneously.

To make his point, the scientist (Gottlieb) shows us multiple chalkboards filled with calculations, pointing at them occasionally. But...he's measuring how frequently something happens! That's "basic statistics in Excel" work (because best-fit lines are a pain to calculate by hand and trivial with a computer). And in presenting that information, a graph would be far more effective and convincing because his audience could see the trend for themselves. So let's talk about a few graphs.

NOTE for all of the below charts: from the various sources I've found (mostly Fandom.com), the canon attacks of Kaiju do NOT match the pattern described. I've constructed some data using an exponential trend to illustrate my point; if you're writing a dissertation on canon Kaiju attacks in Pacific Rim for some reason, please don't use my data.
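For anyone who wants to build similar made-up data, here is a minimal sketch of one way to do it. It is not necessarily how the charts below were produced. The function name `synthetic_attacks`, the day of the month for the first attack, the 180-day starting gap, and the `shrink` factor of 0.85 are all illustrative assumptions, not canon.

```python
from datetime import date, timedelta


def synthetic_attacks(first=date(2013, 8, 1), first_gap_days=180.0,
                      shrink=0.85, count=20):
    """Return `count` made-up attack dates whose gaps shrink geometrically.

    Every parameter here is a hypothetical knob for illustration only.
    """
    dates = [first]
    gap = first_gap_days
    for _ in range(count - 1):
        # Round to whole days and never let the gap drop below one day.
        dates.append(dates[-1] + timedelta(days=max(1, round(gap))))
        gap *= shrink
    return dates


attacks = synthetic_attacks()
gaps = [(later - earlier).days for earlier, later in zip(attacks, attacks[1:])]
```

Each gap is 85% of the previous one, so the attacks bunch up faster and faster, which is the shape the scientist describes in the film.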

"Kaiju per Day"

I'm starting with my favorite because it has the strongest message. The Kaiju attacks are increasing at an exponential rate and we want to impress our audience with how the trend is exploding, so having a line that disappears off the top of the chart is the most effective. Here is a "Kaiju per Day" chart that starts with 0 emergences (before the movie starts), then creeps up after the first attack in August 2013. To create that frequency data, I grouped the attacks into six-month intervals (otherwise the data would be very lumpy, depending on whether an attack happened on a given day or not). I picked six months because canonically the second attack happened just under six months after the first (February 2014), so every period should have at least one attack.

Chart 1 - Kaiju per Day

So this chart clearly shows "We start off slowly, but watch that line go towards vertical - we're about to be in big trouble!"
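The six-month grouping behind a chart like this can be sketched as follows. This is a hedged illustration: it assumes calendar half-years (January-June, July-December) as the buckets, which may not match the intervals actually used for the chart, and the helper names `half_year` and `kaiju_per_day` and the sample dates are made up.

```python
from collections import Counter
from datetime import date


def half_year(d):
    """Label a date by calendar half-year, e.g. (2013, 2) for Jul-Dec 2013."""
    return (d.year, 1 if d.month <= 6 else 2)


def kaiju_per_day(attack_dates):
    """Map each half-year that saw an attack to its attacks-per-day rate."""
    counts = Counter(half_year(d) for d in attack_dates)
    rates = {}
    for (year, half), n in sorted(counts.items()):
        start = date(year, 1 if half == 1 else 7, 1)
        end = date(year, 7, 1) if half == 1 else date(year + 1, 1, 1)
        rates[(year, half)] = n / (end - start).days
    return rates


# Made-up dates for illustration only (not canon).
sample = [date(2013, 8, 1), date(2014, 2, 1), date(2014, 6, 15),
          date(2014, 9, 1), date(2014, 11, 20), date(2014, 12, 30)]
rates = kaiju_per_day(sample)
```

Buckets with no attacks are simply absent from the result; a real chart would fill those in with 0 so the line starts flat before the first attack.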

"Intervals between Kaiju"

This is one of the most obvious ways to depict the data; indeed, back in 2014 Kyle Hill in Discover Magazine used this format. I don't like it because although it shows the same base data, the line goes down rather than up. It makes the audience do the math of "lower numbers mean more monsters", so the emotional impact and the message are lost. Also, as the "time between Kaiju" approaches 0, it becomes impossible for the audience to see whether it's still decreasing or leveling off.

Chart 2 - Days between Kaiju
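To see why this chart carries the same information as "Kaiju per Day" but points the other way, here is a small sketch. The function names `days_between` and `attacks_per_day` and the sample dates are assumptions for illustration, not canon data.

```python
from datetime import date


def days_between(attack_dates):
    """Return the gap in days between each pair of consecutive attacks."""
    ordered = sorted(attack_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]


def attacks_per_day(gap_days):
    """Turn each gap into a rate; a gap of 0 (a double event) is infinite."""
    return [1 / g if g > 0 else float("inf") for g in gap_days]


# Made-up dates for illustration only.
sample = [date(2013, 8, 1), date(2014, 2, 1), date(2014, 6, 1), date(2014, 8, 1)]
gaps = days_between(sample)
rates = attacks_per_day(gaps)
```

The gaps shrink while the rates grow: one list is just the reciprocal of the other, so the choice between them is purely about which direction tells the story better.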

Log Scale

Either of the above two charts could also use logarithmic scales: rather than linear like we're used to, they count by orders of magnitude. So rather than the axis of the chart counting "1, 2, 3", it counts "1, 10, 100". Log scales are very helpful when numbers get very big or very small quickly, or when big and small values need to share the same chart. The problem is that they require a more-trained eye to read, and like the "intervals between Kaiju" chart they lose some emotional impact because they require more processing by the person reading them.

Chart 3 - Kaiju per Day (log scale)

Chart 4 - Days between Kaiju (log scale)
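A quick way to see why the log versions feel tamer: on a log axis, an exponential trend turns into a straight line. The sketch below uses a made-up series that doubles every period (an assumption for illustration, not the chart data) and compares the step sizes on each kind of axis.

```python
import math

# Made-up series that doubles each period (illustration only).
values = [1, 2, 4, 8, 16, 32, 64]

# On a linear axis, each step is bigger than the last: the line "explodes".
linear_steps = [b - a for a, b in zip(values, values[1:])]

# On a log axis, every step is the same height: the line is straight.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]
```

The linear steps grow from 1 to 32, while every log step is about 0.3 (that is, log10 of 2). The trend is still there, but the reader has to know that a straight line on a log chart means "doubling", which is exactly the extra processing that costs emotional impact.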

xkcd makes the same point in comic 1162, "Log Scale": log scales are for quitters who can't find enough paper to make their point properly.

Pie Charts

Lots of people love pie charts. They're colorful and when the data includes only one or two points they can show the relationship well. Their problem is that the human eye/brain is bad at comparing angles, especially in slices that don't include a vertical or horizontal line as one of their edges. So if the main pie slice is just about a half or quarter, they show up fine. But if the viewer needs to distinguish whether slice 5 is bigger or smaller than slice 6 and they're each 10%-20% of the pie, it's almost impossible.

In addition, those pie charts often take up a lot of space on a page or screen relative to how much information they present. In those situations, I prefer to use a bar chart for each category with the vertical axis scaled to the grand total. This makes it extremely simple to see the relationship between bars, their absolute values, and their percentages of the total. We are very good at estimating "this column is 1/3 as tall as this other column."

Pie charts aren't the best fit for this data either, because pies aren't meant for time-series data. In this case, I'm showing how future Kaiju attacks dwarf those to date: by 2027, more than three times as many attacks will have happened in the most recent three years as in the 12 years before them.

Chart 5 - Kaiju Attacks by Period (pie - worse)

Chart 6 - Kaiju Attacks by Period (columns - better)
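The "bars scaled to the grand total" idea can be sketched in a few lines. This is only an illustration: the counts and period labels are hypothetical, the helper names `shares` and `text_bars` are made up, and the bars are drawn sideways as text rather than as the vertical columns in the chart.

```python
def shares(counts):
    """Return each category's fraction of the grand total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def text_bars(counts, width=40):
    """Draw one bar per category; `width` characters equals the grand total."""
    total = sum(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * round(width * n / total)
        lines.append(f"{label:<12}{bar} {n} ({n / total:.0%})")
    return lines


# Hypothetical counts, chosen only so the later period is about 3x the earlier.
counts = {"2013-2024": 10, "2025-2027": 32}
fractions = shares(counts)
lines = text_bars(counts)
print("\n".join(lines))
```

Because the full width stands for the grand total, each bar shows its absolute value, its share of the whole, and its size relative to the other bar all at once, with no angles to compare.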

Conclusions

I hope you had fun with this, and maybe will avoid pie charts next time someone asks you to visualize data with a chart. Identical information can be presented in very different ways, and whether those presentations land with emotional impact dictates whether or not your message gets through.

Let me know if you have any favorite or pet-peeve charts!
