
Here at Sharp Sight, we have a particular philosophy about learning data science: master the fundamentals.

Data science is just like any other skill: if you want to be great at what you do, it pays to master foundational skills. If you look at true masters in any field – art, music, sports – you will consistently see that elite performers focus relentlessly on fundamentals.

But don’t just take my word for it. Listen to many of the most successful people in the world. Elite performers routinely talk about the importance of mastering the basics.

“Get the fundamentals down and the level of everything you do will rise.”
– Michael Jordan

If you want to be a great data scientist, data analyst, or analytics professional, you need to know the core data techniques backwards and forwards.

One of those techniques is the bar chart.

It’s a simple technique. To many people, it feels too simple. It’s not sexy like artificial intelligence.

But on your path to learning advanced techniques, you need to master the fundamentals like the bar chart.

In this post, I’ll tell you everything you need to know about creating a bar chart in `R`’s `ggplot2` using `geom_bar()`.

How to make a simple bar chart with geom_bar

Later in the post, I’ll show you several variations of the bar chart. I’ll also explain how ggplot2 works.

But before we do that, let’s just create a simple bar chart.

In the following example, we’ll work with the `Arthritis` dataset from the `vcd` package. If you don’t have `vcd` installed yet, make sure you install it.

Ok, here’s the code:

```
library(tidyverse)
library(forcats)
library(vcd)  # provides the Arthritis dataset

ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()
```

If you don’t entirely understand how this works, keep reading. I’ll explain everything.

The ultimate guide to geom_bar and ggplot bar charts

If you’re new to `R` and especially if you’re new to using `ggplot2`, even creating something simple like a ggplot bar chart can seem very confusing. I get it. The `ggplot2` system can seem a little arcane in the beginning.

However, once you understand how `ggplot` works, creating bar charts is easy. In fact, once you understand `ggplot2`, lots of visualizations become easy … even data visualizations that seem very complex.

Having said that, let’s first review how the `ggplot2` data visualization system works.

Quick review: how the ggplot2 visualization system works

`ggplot2` is very systematic. Once you get the hang of it, everything “just works.”

Critically though, you need to understand the “building blocks” of the `ggplot` system. `ggplot2` is very modular in how it works. The system has many little functions that do one thing.

This is in contrast to the base `R` system for plotting. Many of the base `R` plotting tools are single functions, but with lots of little parameters.

ggplot2 is different because each little functional “building block” typically does one thing and one thing only. Because of this, you can start by making a simple chart (like a very simple bar chart) and then iteratively build the plot, step by step.

I’ll show you that in a moment, but first let’s review some of the most important functions in the ggplot2 system.

The ggplot() function

First, let’s talk about the `ggplot()` function.

The `ggplot()` function is probably the most important piece of the `ggplot2` plotting system. You’ll use it essentially every time you create a data visualization.

Its purpose is simple: it initiates a plot. `ggplot()` basically indicates that we’re going to plot something.

… what exactly we plot will depend on the other functions we call and how we structure the rest of the syntax.
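To make that concrete, here is a minimal sketch. It uses a tiny made-up dataframe called `toy` (my own placeholder, not part of the example in this post) so you can see what `ggplot()` does by itself before any geom is added.

```r
library(ggplot2)

# Hypothetical toy data, used only for illustration
toy <- data.frame(group = c("a", "a", "b"))

# ggplot() on its own sets up the plot but adds no layers
p_empty <- ggplot(data = toy, aes(x = group))

# Adding a geom is what actually draws something
p_bars <- p_empty + geom_bar()

print(p_empty)  # an empty panel: no bars yet
print(p_bars)   # a bar chart of counts per group
```

Notice that the plot is stored in a variable and extended with `+`. That is the same "building block" idea, just spread over two steps.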

Let’s talk about those other syntactical pieces.

The data parameter

Let’s take another look at our code example for a simple bar chart:

```
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()
```

Do you see the `data =` parameter inside of the `ggplot()` function? This is the part of the syntax where we indicate what data we will plot.

In this particular example, we are telling `ggplot` that we will be plotting data in the `Arthritis` dataframe.

Let’s quickly examine `Arthritis`, just so we can see what’s in it. We’ll use two functions to inspect the data:

• `head()`
• `glimpse()`

```
#=============
# INSPECT DATA
#=============

head(Arthritis)
#   ID Treatment  Sex Age Improved
# 1 57   Treated Male  27     Some
# 2 46   Treated Male  29     None
# 3 77   Treated Male  30     None
# 4 17   Treated Male  32   Marked
# 5 36   Treated Male  46   Marked
# 6 23   Treated Male  58   Marked

glimpse(Arthritis)
# Observations: 84
# Variables: 5
# $ ID         57, 46, 77, 17, 36, 23, 75, 39, 33, 55, 30, 5, 63, 83, 66, 40, 6, 7, 72...
# $ Treatment  Treated, Treated, Treated, Treated, Treated, Treated, Treated, Treated...
# $ Sex        Male, Male, Male, Male, Male, Male, Male, Male, Male, Male, Male, Male...
# $ Age        27, 29, 30, 32, 46, 58, 59, 59, 63, 63, 64, 64, 69, 70, 23, 32, 37, 41,...
# $ Improved   Some, None, None, Marked, Marked, Marked, None, Marked, None, None, Non...
```

`head()` and `glimpse()` are both tools for “looking” at a dataset. As you can see, the output is a little different, but it’s similar. They both show you what’s in the data. I won’t go into extended detail about data inspection here, but I will tell you that you need to get into the habit of inspecting your data.

Ok, back to ggplot and the ggplot bar chart.

Remember: ggplot requires that your data is in a dataframe

When we use the `data =` parameter, we’re just telling the `ggplot()` function what dataset we will be working with.

Importantly, `ggplot()` expects a dataframe. If your data is in a different format – like a matrix, array, or a vector – `ggplot()` won’t work.

In this case, our dataset `Arthritis` is a dataframe.

There’s also a special kind of dataframe that `ggplot2` understands called a `tibble`. I won’t explain dataframes and `tibbles` here, but understand that they are similar.
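If your data starts out in another format, one simple approach is to convert it to a dataframe first. The sketch below uses a small made-up matrix `m` (a placeholder of mine, not data from this post). Exactly how `ggplot()` reacts to non-dataframe input may depend on your `ggplot2` version, so converting up front is the safe habit.

```r
library(ggplot2)

# Hypothetical example data stored as a character matrix
m <- matrix(c("Male", "Female", "Female"), ncol = 1,
            dimnames = list(NULL, "Sex"))

# Convert to a dataframe before handing it to ggplot()
df <- as.data.frame(m)

is.data.frame(m)   # FALSE: still a matrix
is.data.frame(df)  # TRUE: ready for ggplot()

ggplot(data = df, aes(x = Sex)) +
  geom_bar()
```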

Ok, now let’s move on to the next piece in the syntax, the `aes()` function.

The aes() function

Let’s take another look at our code example for a simple bar chart:

```
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()
```

Notice the `aes()` function inside of `ggplot()`. The code reads `aes(x = Sex)`.

Remember that `Sex` is a variable in our dataframe. What the `aes()` function is doing is “mapping” the `Sex` variable to the x-axis.

The whole purpose of the `aes()` function is to “map” variables in our dataframe (which we identify with the `data` parameter) to the aesthetic attributes of the objects in the plot. The `aes()` function “connects” the variables in the dataset to the objects that we draw in the visualization.
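One detail that helps here: a mapping declared inside `ggplot()` is passed down to the geoms you add, while a mapping declared inside a geom applies to that geom only. The sketch below builds the same bar chart both ways and compares the bar heights with `layer_data()`, a `ggplot2` helper that returns the data a layer actually draws. The variable names `p_global` and `p_local` are just my labels.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# Mapping declared in ggplot(): inherited by the geoms added afterwards
p_global <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()

# Same mapping declared inside the geom: applies to this geom only
p_local <- ggplot(data = Arthritis) +
  geom_bar(aes(x = Sex))

# Both plots compute the same bar heights
layer_data(p_global)$count
layer_data(p_local)$count
```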

ggplot2 ‘geoms’

It’s important to point out here that the “objects” that we draw are called “geoms” in `ggplot2`.

If we are drawing points (e.g., for a scatterplot) we will draw point “geoms.” If we create a line chart, we will draw line “geoms.” And if we draw bars to create a bar chart, we will draw bar geoms. If it sounds complicated, don’t overthink it. Geoms are just things that we draw, like points, lines, and bars.

And remember: as I noted above, the `aes()` function “maps” variables to the aesthetic attributes of the things that we draw. The `aes()` function maps variables to the aesthetics of geoms, like color, shape, x-axis position, and y-axis position.

Recap: how to make a simple bar chart with geom_bar

Now that I’ve explained some of the details of the `ggplot2` system, let’s take a second look at the code we used to create our simple bar chart.

```
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()
```

Here’s what this code is doing:

• `ggplot()` initiates plotting.
• The `data=` parameter specifies that we will plot data in the `Arthritis` dataframe.
• Inside of the `aes()` function, we are specifying that we want to map the `Sex` variable to the x-axis. `Sex` is a categorical variable, so the different categories will be placed on the x-axis.
• `geom_bar()` indicates that we will draw “bar” geoms … i.e., we’ll make a bar chart.

It’s really straightforward once you understand how the individual pieces all work.
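One thing the code doesn’t say out loud is where the bar heights come from. We only mapped `Sex` to the x-axis, so `geom_bar()` counts the rows in each category for us. Here is a rough sketch of doing that counting by hand with `dplyr::count()` and then drawing the result with `geom_col()`, which expects the heights to already be in a column. The names `sex_counts`, `p_bar`, and `p_col` are mine.

```r
library(ggplot2)
library(dplyr)
library(vcd)  # provides the Arthritis dataset

# Count the rows in each category by hand
sex_counts <- count(Arthritis, Sex)
sex_counts

# geom_bar() does the counting for us ...
p_bar <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar()

# ... while geom_col() takes the heights from the y variable
p_col <- ggplot(data = sex_counts, aes(x = Sex, y = n)) +
  geom_col()

p_bar
p_col
```

The two charts should look the same. If your data is already summarized, `geom_col()` is usually the more natural choice.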

Note: when you’re learning data science, break everything down

Take note of how I explained the ggplot2 bar chart: I broke everything down into little pieces.

As you’re learning data science, you need to do this. Data science tools can be complicated. It is a huge help if you break everything down into small learnable pieces. Break everything down, learn and understand each small piece, and then put things “back together” to understand the greater whole.

How to modify a simple ggplot bar chart

Now that you’ve learned how to make a simple ggplot bar chart using geom_bar, let’s modify that bar chart.

We’ll make two minor modifications:

• Change the interior color of the bars
• Change the “line” color of the bars

How to change the interior color of the bars

Changing the interior color of the bars is relatively simple, but it’s a little tricky.

It’s tricky because the parameter we need to use to change the interior color is not 100% intuitive.

To change the interior color, you need to use the `fill` aesthetic, not the `color` aesthetic.

There is a reason for this (which you’ll learn about below) but you just need to remember that the `fill` aesthetic controls the interior color of the bars.

Let’s take a look. Here, we’ll change the interior color of the bars to “cyan”:

```
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar(fill = 'cyan')
```

Here, we have set the `fill` aesthetic to the color ‘cyan.’

A couple of notes on this:

1. The syntax to accomplish this, `fill = 'cyan'`, appears inside of geom_bar. This is because we are modifying the bar geoms of our plot.

This is a bit of a moot point in this plot, because we only have one type of geom. But if we were creating a more complicated plot with multiple geoms, this would matter. If our plot had multiple geoms, we would need to modify the color of the geoms independently … to do this, we must modify the `fill` aesthetic inside of the geom.

2. We don’t need to use the `aes()` function in this case. Here, the modification of the `fill` aesthetic is happening inside of geom_bar, but outside of the `aes()` function. This is because we are setting the `fill` aesthetic to a constant value. When we set an aesthetic to a constant value, we don’t need to use the `aes()` function. We only need to use the `aes()` function if we need to map a variable in our dataframe to an aesthetic attribute.
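To see why the second note matters, here is a small sketch of what happens if you put the constant inside `aes()` by mistake. In that case `ggplot2` treats the string "cyan" as data (a categorical variable with a single level) and picks a fill color from its default palette instead. The names `p_set` and `p_mapped` are just my labels.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# Setting: fill is a constant, placed outside aes()
p_set <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar(fill = "cyan")

# Mapping by mistake: inside aes(), "cyan" is treated as data,
# so the bars get a default palette color and a legend appears
p_mapped <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar(aes(fill = "cyan"))

# Compare the fill colors that are actually drawn
unique(layer_data(p_set)$fill)
unique(layer_data(p_mapped)$fill)
```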

Now that you’ve learned how to change the interior color of the bars, let’s talk about how to change the border color of the bars.

How to change the border color of the bars

Changing the border color is also simple. To change the border color of the bars, you need to use the `color` aesthetic.

```
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar(color = 'red')
```

Here, we have set the `color` aesthetic to the value ‘red.’

Again, this can be a little confusing. Remember that the `color` aesthetic controls the border color, not the interior color.

On more than one occasion, I have mistakenly used the `color` aesthetic when I meant to use the `fill` aesthetic and vice versa. This is just one of those things that you have to memorize and use repeatedly until you completely remember the difference.
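A quick way to make the difference stick is to set both at once. This is a minimal sketch combining the two examples above. `p_both` is just a name I picked.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# fill controls the interior of the bars, color controls the border
p_both <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar(fill = "cyan", color = "red")

p_both
```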

Bar chart variations

Now that I’ve shown you how to make a simple bar chart, and shown you how to make some simple color modifications, let’s take a look at some variations of the bar chart.

We’ll look at the:

• horizontal bar chart
• stacked bar chart
• 100% stacked bar chart

Horizontal bar chart

In `ggplot2`, it’s very easy to create a horizontal bar chart. To do this, just add the code `coord_flip()` after your bar chart code (and don’t forget the `+` sign).

```
# HORIZONTAL BAR CHART
ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar() +
  coord_flip()
```

`coord_flip()` does exactly what it sounds like it does: it “flips” the coordinates.

In this case, it is taking the x-axis and “flipping” it to the position typically occupied by the y-axis (and vice versa).

By the way, this is one of the reasons that I love `ggplot2`. In many cases, making modifications to a chart can be accomplished by calling a new function in addition to the original code. The different functions of `ggplot2` work like LEGO building blocks that you can “stack up.” This enables you to build plots iteratively, building your code one line at a time.
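As an aside, if you are running `ggplot2` version 3.3.0 or later (an assumption about your setup), there is a second route to the same chart: map the categorical variable to `y` instead of `x`, and `geom_bar()` should draw the bars horizontally on its own. Here is a sketch comparing the two. `p_flip` and `p_y` are my own names.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# Approach from this post: build a vertical chart, then flip the coordinates
p_flip <- ggplot(data = Arthritis, aes(x = Sex)) +
  geom_bar() +
  coord_flip()

# Alternative (assumes ggplot2 >= 3.3.0): map the category to y directly
p_y <- ggplot(data = Arthritis, aes(y = Sex)) +
  geom_bar()

p_flip
p_y
```

Both approaches are fine. `coord_flip()` works across older versions, which is one reason to know it.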

Stacked bar chart

Now, let’s make a stacked bar chart.

The stacked bar is a pretty simple variation of the basic bar chart.

To create it, just map a categorical variable to the `fill` aesthetic.

```
library(vcd)

ggplot(data = Arthritis, aes(x = Treatment)) +
  geom_bar(aes(fill = Improved))
```

This is almost exactly the same as code for a simple bar chart. We’ve specified our dataframe with the `data` parameter, and we’ve mapped a categorical variable, `Treatment`, to the x-axis.

But notice that inside of geom_bar, we have another call to the `aes()` function. There, inside of the `aes()` function, we’ve mapped another categorical variable, `Improved`, to the `fill` aesthetic.

This “fills in” each bar according to the number of records in each category of `Improved`.

Note that this does not change the total length of the bar. It’s just dividing up the bar according to a second categorical variable. This enables us to see the proportion of observations that belong to a particular category, within the original category.
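If you want to check the numbers behind the stacked bars, a cross-tabulation is a handy companion. The sketch below prints the counts for every `Treatment` / `Improved` combination and then confirms that the top of each stacked bar still equals the total number of records in that `Treatment` group. The names `p_stacked`, `d`, and `bar_tops` are mine.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# Counts for every Treatment / Improved combination
table(Arthritis$Treatment, Arthritis$Improved)

p_stacked <- ggplot(data = Arthritis, aes(x = Treatment)) +
  geom_bar(aes(fill = Improved))

# layer_data() returns one row per colored segment;
# ymax is the top edge of that segment
d <- layer_data(p_stacked)

# The highest segment in each bar marks the bar's total height
bar_tops <- tapply(d$ymax, as.integer(d$x), max)
bar_tops

# Compare with the number of records per Treatment group
table(Arthritis$Treatment)
```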

100% stacked bar chart

Now let’s do a variation of the stacked bar. Here, we’ll create a 100% stacked bar.

The 100% stacked bar is very similar to the stacked bar, but instead of the length of the bar corresponding to the number of records, every bar is stretched to the same full height. Each colored segment then shows a proportion rather than a count.

```
ggplot(data = Arthritis, aes(x = Treatment)) +
  geom_bar(aes(fill = Improved), position = "fill")
```

Here, you can see that the length of each bar extends along the entire length of the y-axis. The bars “fill in” the plot area.

Syntactically, we are doing this with a small snippet of code: `position = "fill"`.

Notice though that this is otherwise identical to the code for the stacked bar chart. Again, this is one of the benefits of using `ggplot2`. One simple modification enables you to change the chart from a stacked bar chart to a 100% stacked bar chart.
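Because the bars now show proportions, it can help to look at the proportions directly and to label the axis accordingly. The sketch below is one way to do that, not the only one. It uses `prop.table()` from base R for the row-wise shares, and `scales::percent` (from the `scales` package, which is installed alongside `ggplot2`) to format the y-axis. The axis title and the names `counts` and `p_fill` are my own choices.

```r
library(ggplot2)
library(vcd)  # provides the Arthritis dataset

# Row-wise proportions: share of each Improved level within a Treatment group
counts <- table(Arthritis$Treatment, Arthritis$Improved)
prop.table(counts, margin = 1)

# Same 100% stacked bar, with the y-axis labeled as percentages
p_fill <- ggplot(data = Arthritis, aes(x = Treatment)) +
  geom_bar(aes(fill = Improved), position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Proportion of records")

p_fill
```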