Why data storytelling is a critical data science skill

In many of my blog posts, I emphasize the importance of data visualization. But after you learn data visualization, you need to learn how to do “data storytelling.”

Before talking about data storytelling, let’s talk about why single, independent visualizations are commonly inadequate to generate and communicate insights.

In several of my recent blog posts, and in many older ones, I did a “quick analysis” of a topic. For example, I created a visualization of the “Best states for business,” in which I visualized survey data about (… wait for it) the best states for business. As I pointed out in that article, my initial goal was to gain some insight for myself: I wanted to learn which states are the best for business.


Map of the best states for business

A few months ago, I also created a visualization of possible locations for Amazon’s new headquarters.


Map of possible Amazon HQ cities

In that blog post, I plotted the population of each city along with the percentage of the population with a bachelor’s degree. I did this to get a sense of which cities might be good locations for Amazon’s new HQ. It’s a good visualization, but a very simple analysis.

I want to be clear that small, simple visualizations like these don’t give the full picture. The two examples above are both good visualizations, but neither completely answers the question that prompted it.

When I post visualizations like these on the Sharp Sight blog, I’m typically using them to demonstrate a particular technique or to illustrate a particular point.

Having said that, you need to understand that a single, one-off visualization almost never completely answers a question. A simple visualization like these also probably wouldn’t pass as a full analysis in a real-world business environment.

In a real-world environment (for example, on a data science team at a business), individual visualizations need to be viewed in the context of a larger analysis. These larger analyses usually consist of many visualizations and a data-driven argument that leads the viewer to a particular conclusion.

That is to say: one visualization is almost never enough.

One visualization is almost never enough … you need data storytelling

One data visualization is almost never enough to answer a question completely.

More often than not, you’ll need to create several data visualizations. You’ll start from a high level and “drill in” to the data.

This is actually very similar to exploratory data analysis (AKA data exploration or EDA). When you perform exploratory data analysis, you typically start from the “top” and dig into a dataset. In practice, this means you create high-level visualizations first and then drill into the data with more detailed ones. When you find something interesting, you drill down further with more charts and graphs.
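To make the “drill down” idea concrete, here is a minimal sketch in Python (my choice for illustration; the post itself contains no code). It uses a small, made-up table of revenue by region and city: the first step computes the high-level summary you would plot as one bar per region, and the second step drills into the top region, city by city. The data and the helper name `totals_by` are hypothetical. In a real analysis, each step would be a chart rather than a printed list.

```python
from collections import defaultdict

# Hypothetical toy records: (region, city, revenue). Not real data.
RECORDS = [
    ("West", "Seattle", 120),
    ("West", "Portland", 80),
    ("West", "San Jose", 150),
    ("East", "Boston", 90),
    ("East", "New York", 110),
    ("South", "Austin", 70),
    ("South", "Atlanta", 60),
]


def totals_by(records, key_index, where=None):
    """Sum revenue grouped by the field at key_index, optionally filtering rows first."""
    totals = defaultdict(int)
    for record in records:
        if where is None or where(record):
            totals[record[key_index]] += record[2]
    # Sort largest first, the way you would order bars in a sorted bar chart.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Step 1: the high-level view (one bar per region).
by_region = totals_by(RECORDS, key_index=0)
top_region = by_region[0][0]

# Step 2: drill into the most interesting feature (one bar per city in that region).
by_city = totals_by(RECORDS, key_index=1, where=lambda r: r[0] == top_region)

if __name__ == "__main__":
    print("By region:", by_region)
    print(f"Cities in {top_region}:", by_city)
```

Each drill-down step reuses the same summary function with a narrower filter, which mirrors how an analysis moves from one overview chart to a series of more detailed ones.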

Data storytelling is like data exploration: you need to use multiple charts

During exploratory data analysis, you use multiple data visualizations to discover insights. You are using visualizations as tools to generate insight for yourself first and foremost. You build insight in your own mind step by step, chart by chart, visualization by visualization.

The critical point here is that you commonly need many charts to do this. To discover insights, you start with high-level charts and then drill down by using more charts. At first, you’ll use charts to find interesting features in the data. Next, you use new charts to explore and explain those features. Then you find more interesting features, create more charts to dig deeper, and so on. In exploratory data analysis, you need to make multiple charts to really understand the data and find out what’s interesting.

When you present your results, you will need multiple charts

The same thing applies when you present your analytical results to other people. One chart isn’t enough. Commonly, you need to start at the top with a high-level chart and then “drill in,” showing your audience interesting details in the data. Then you continue to drill in with new charts that explain or otherwise highlight those details.

A solid, detailed analysis might be dozens of pages long with dozens of visualizations. For example, when I worked at Apple, we had a monthly analysis that was over 120 pages long, with almost as many charts, graphs, and visualizations. When you create an analysis for business partners, you’re going to use a lot of charts.

In short, one chart is not enough. One chart doesn’t tell a full story. You need to learn how to use multiple charts so you can “tell a story” about your data and convince your partners of a particular point or set of points.

Managers and recruiters want you to tell stories with data

Being able to use data to convince business partners of a point is a critical data skill. That’s why “storytelling with data” shows up as a requirement in many data science job ads.



Hiring managers and data science recruiters constantly use the term “storytelling with data.” What they mean is that they want you to be a data communicator: someone who can uncover insights and communicate them using data-driven tools like charts, graphs, and visualizations.

To be able to do this – to be able to communicate with data and do data storytelling – you typically need to use many different charts. Just like with data exploration, you need to start at the top and “drill down” with more detailed charts that explain important features and walk the viewer through a set of insights that you’ve found.

Like I mentioned earlier, when I worked at Apple, we had a monthly presentation that was around 120 pages long. That sounds like a lot, but it was broken down into sections, and in each section we started at a high level and then “drilled down” into relevant details in a methodical way. To do this, we used multiple charts and graphs. I’ll also point out that we carefully selected these charts and visualizations to highlight relevant information and tell the right “story” with the data.

Critically, though, you need to understand that a more complete analysis typically requires a lot of charts and graphs. When you’re analyzing data, presenting an analysis to your partners, and storytelling with data, one chart probably won’t be enough.

You need to build arguments with your data

Another way to think about this is that you’re building an argument with data. In this way, data storytelling is sort of like an argumentative essay or like litigation.

Data storytelling is like building a legal argument

I’ve actually heard litigation described as “competitive storytelling with provable facts.” The idea of storytelling with provable, data-driven facts seems to be a good way of thinking about data storytelling. When you do data storytelling, you’re sort of building an argument with charts and graphs.

Commonly, when you perform an analysis, you’ll do so in the context of some strategic decision that’s being made: a partner makes a request that’s prompted by a strategic need. For example, a business partner might request an analysis to find the “best performing market.” Even more generally, a business partner might ask you to find “the best opportunity,” which is vague enough to be almost impossible to answer, but important enough that your business partners will demand an answer anyway.

In these cases, you’ll need to build an argument. You’ll need to answer their question by discovering a compelling story in the data using top-down exploratory data analysis techniques. Then, once you find such a story, you’ll need to present it to your partners through a series of charts that lead them to a conclusion that is well supported by the data. Once again, finding the story in the data takes many charts, and presenting that story to your partners takes multiple charts and graphs. When you build a data-driven argument, which is what data storytelling is, you’ll need to use multiple charts.

Data storytelling is like writing an argumentative essay

Another, and perhaps simpler, analogy is the argumentative essay. In high school or college, most people had to write papers where they started with a particular thesis and then defended and supported that thesis with a set of facts. In the simplest version, the five-paragraph essay, the author states the thesis in the first paragraph, supports and defends it in three “body” paragraphs, and ends with a conclusion paragraph that restates the thesis, summarizes the main supporting evidence, and tries to lead the reader to the final conclusion.

At its best, “data storytelling” is a lot like an argumentative essay. Ideally, you’ll state a position at the beginning of an analysis, and then use charts, graphs, and visualizations as evidence to support that position. In the 120-page analysis at Apple that I mentioned earlier, we commonly opened the presentation with key findings and strategic recommendations, and then supported those findings and recommendations with the charts and graphs that followed.

The critical point here, though, is that you typically need many charts, graphs, tables, statistics, and visualizations to make your argument and support your position.

Data storytelling is a next step after you master data visualization

For almost four years, I’ve been telling readers of this blog to master data visualization first. There are several reasons for this recommendation, but one of them is that data visualization gives you the basic tools for data storytelling.

As I’ve already noted, data storytelling requires you to use many charts in a sequence. Commonly, you start with high-level visualizations and then “drill down” to more detailed information. At every step, you use a well-chosen data visualization to highlight a point or lead your audience toward a particular, well-supported conclusion.

However, this requires that you can actually build those individual charts. You need to be able to create charts and graphs, polish them, and use them properly as tools for generating insights. Because these individual charts are the building blocks of any data story, you need to master data visualization skills first.

But once you do, data storytelling is the logical next step. Once you know the basic tools, you can start stringing them together into larger presentations.

Before you start data storytelling, make sure you’ve mastered data visualization

Having said that, you need a full set of visual tools in your data toolkit before you try to do data storytelling.

Data visualizations are a lot like tools that can be used in different ways. Some visualizations are good for particular analytical problems or particular communication problems.

Ideally, you need to master the basic tools of visualization and you need to understand how to use them. The basics that you need are:

  • The scatterplot
  • The bar chart
  • The line chart
  • The histogram
  • The small multiple

These five charts are the core tools you’ll need to tell stories with data. A few others, like the box plot, are also useful, but these five are the core.
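The small multiple is probably the least familiar chart on this list: it repeats the same simple chart once per group, with every panel on the same scale, so the viewer can compare groups at a glance. As a rough, library-free sketch (assuming Python and made-up signup numbers; the names `SIGNUPS`, `text_bar_panel`, and `small_multiple` are hypothetical), the code below renders one text bar chart per region on a shared scale. In practice you would use a plotting library, for example ggplot2’s `facet_wrap()` in R.

```python
# Hypothetical monthly signups per region (toy numbers, not real data).
SIGNUPS = {
    "North": [3, 5, 8, 6],
    "South": [2, 2, 4, 9],
    "West": [7, 6, 5, 4],
}


def text_bar_panel(title, values, scale_max, width=20):
    """Render one panel: a horizontal text bar per month, scaled to scale_max."""
    lines = [title]
    for month, value in enumerate(values, start=1):
        bar = "#" * round(value / scale_max * width)
        lines.append(f"  m{month} {bar} {value}")
    return "\n".join(lines)


def small_multiple(data, width=20):
    """Build one panel per group, all sharing the same maximum so bars are comparable."""
    scale_max = max(max(values) for values in data.values())
    return [text_bar_panel(name, values, scale_max, width) for name, values in data.items()]


if __name__ == "__main__":
    print("\n\n".join(small_multiple(SIGNUPS)))
```

The important design choice is that the scale is computed once across all groups rather than per panel; if each panel were scaled to its own maximum, a small group would look just as large as a big one.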

Data visualization and data storytelling help you create large amounts of business value

The reason that these visualization tools are “core” tools is that they are very useful and valuable.

The five visualizations that I just mentioned are the primary tools that a junior data scientist (who, in many cases, is really a glorified data analyst) can use to find insights and “tell stories” with data. Don’t be fooled by their simplicity. With just these five visualizations, you can discover valuable insights in data, communicate those insights to business partners, and create large amounts of value. You need to have these tools in your toolkit, and you need to know how to use them properly.

Ideally, you need to learn to use these tools before you move on to advanced topics. Beginners always want to move on to the most advanced topics. Beginners think they need the sexiest, coolest, most advanced tools to create value. For example, a beginner recently contacted me saying that they were struggling with machine learning. When I pressed them and asked a few questions, they told me that they hadn’t mastered basic tools like the bar chart and the scatterplot. This person, like many other beginning data science students, thinks that they need to know advanced tools to create value and get a data science job.

No.

If you’ve mastered the basic tools and you know how to use them step by step to “build an argument” or “tell a story,” you can find insights that could be worth millions of dollars.

This is why I stress over and over that you should absolutely master the basics first. If you’ve moved on to machine learning, artificial intelligence, deep learning, or other topics, but you haven’t mastered the basics, then you’re doing it wrong. Don’t be that guy. Don’t be the guy that says “I’m studying machine learning” but can’t create a simple scatterplot (which you’ll need for ML anyway). Master. The. Basics. You should know the 5 core visualizations backwards and forwards.

How and why to use data storytelling

So let’s summarize my recommendations.

To tell stories with data, one chart or visualization typically isn’t enough. Commonly, you’ll need to create multiple charts, graphs, and visualizations to identify insights and communicate those insights to other people. You’ll need to use charts as tools to “tell a story” or “defend” a particular argumentative position.

If you want to get good at this, master basic tools first. Master the bar, line, scatter, histogram, and small multiple.

Once you’ve done this, you’ll have a good foundation for advanced topics like advanced data visualization, geospatial visualization (i.e., maps), deep-dive analytics, and machine learning.

Master the tools of data storytelling

Data storytelling is important, and it requires you to know critical data science tools.

I want to show you how to master those tools faster than you thought possible.

Sign up for our email list, and you’ll get the tips, tricks and strategies to rapidly master the tools of data science.

When you sign up, you’ll get weekly tutorials delivered to your inbox.

Moreover, if you sign up now, you’ll get access to our FREE Data Science Crash Course.

In the Data Science Crash Course, you’ll learn:

  • a step-by-step data science learning plan
  • the 1 programming language you need to learn
  • 3 essential data visualizations
  • how to do data manipulation in R
  • how to get started with machine learning
  • the difference between machine learning and statistics


Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.
