Purpose

Use the bar chart when you want to compare different categories on some numerical metric.

Uses:

Visualizing frequencies or counts
Visualizing categorical data

Code (basic R bar chart)

# Load ggplot2 graphics package
library(ggplot2)

# Create dummy dataset
df.dummy_data <- data.frame(category_var = c("A","B","C","D","E"),
                          numeric_var = c(5,2,9,4,5)
                          )

# Plot dataset with ggplot2
ggplot(data=df.dummy_data, aes(x=category_var, y=numeric_var)) +
  geom_bar(stat="identity")

 
 

Results

[Figure: r-bar-chart-basic, the bar chart produced by the code above]
 

Explanation

Along with the scatterplot and line chart, I consider the bar chart to be one of the “big 3” visualizations: the 3 foundational visualizations that you need to master.

And just like the code for the scatterplot and line chart, this R bar chart code (made with ggplot2) has a deep structure behind it.

Essentially, we create a plot using ggplot2 in a few key steps:
1. Specify the dataset
2. Create a relationship between the dataset and the “aesthetic attributes” of the things we want to plot
3. Add a geom layer that draws the data
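
The three steps above can be sketched as separate pieces of code. This is only a minimal sketch: it assumes the same dummy dataset from the main example, and the object names p_base and p_bars are made up here for illustration.

```r
library(ggplot2)

# Same dummy dataset as in the main example
df.dummy_data <- data.frame(category_var = c("A","B","C","D","E"),
                            numeric_var = c(5,2,9,4,5))

# Steps 1 and 2: specify the dataset and map variables to aesthetics.
# Printed on its own, this object shows axes but no bars.
p_base <- ggplot(data=df.dummy_data, aes(x=category_var, y=numeric_var))

# Step 3: add a geom layer that actually draws the bars
p_bars <- p_base + geom_bar(stat="identity")

print(p_bars)
```

Storing the base plot in a variable is optional; it just makes it easier to see that the data and aesthetic mapping are set up before any bars are drawn.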

Let’s take a look at this line by line.

(Note: we’re not going to review the dataset creation process in this tutorial. We’ll cover that in a separate post.)

ggplot(data=df.dummy_data, aes(x=category_var, y=numeric_var)) 



Here in the first line, we’re taking care of the first two steps: specifying the data we’re going to plot and creating a relationship between the dataset and the “aesthetic attributes.”

Using the data= parameter, we specify the dataset we want to plot.

Next, we call the aes() function. This is the function that actually creates the relationship between our dataset and the “aesthetic attributes” of the things we’ll plot. In this case, we’re specifying that we want to put our categorical variable, “category_var,” on the x-axis and our numeric variable, “numeric_var,” on the y-axis.
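
To see what aes() does on its own, you can call it outside of ggplot() and inspect the result. This is a small sketch; the variable name mapping is just an illustrative choice.

```r
library(ggplot2)

# aes() only records which variable goes with which aesthetic.
# It does not touch the data or draw anything by itself.
mapping <- aes(x=category_var, y=numeric_var)

# The aesthetics that were mapped
names(mapping)

# Shows each aesthetic next to the variable assigned to it
print(mapping)
```

Because the mapping is just a description, ggplot2 only looks up category_var and numeric_var in the dataset when the plot is actually built.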

Next we specify what we want to plot with the following code:

  geom_bar(stat="identity")



To be clear, this is the code that actually does the plotting.

Note as well that inside of geom_bar(), we have the following snippet of code: stat="identity". Don’t worry about this too much; all it does is tell ggplot to plot the exact values of our variable “numeric_var.” We need to specify this because by default, geom_bar() counts the number of cases in each category and plots those counts. Here, we don’t want that. We want to plot the values of “numeric_var,” and we specify that with stat="identity".
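
To make the difference concrete, here is a hedged sketch that builds the chart both ways with the same dummy dataset. It also uses geom_col(), which ggplot2 provides as a shorthand for bars whose heights come straight from the data. The object names p_count, p_identity, and p_col are illustrative only.

```r
library(ggplot2)

df.dummy_data <- data.frame(category_var = c("A","B","C","D","E"),
                            numeric_var = c(5,2,9,4,5))

# Default behavior: bar height = number of rows in each category.
# Each category appears once here, so every bar has height 1.
p_count <- ggplot(data=df.dummy_data, aes(x=category_var)) +
  geom_bar()

# stat="identity": bar height = the value stored in numeric_var
p_identity <- ggplot(data=df.dummy_data, aes(x=category_var, y=numeric_var)) +
  geom_bar(stat="identity")

# geom_col() draws the same bars without needing the stat argument
p_col <- ggplot(data=df.dummy_data, aes(x=category_var, y=numeric_var)) +
  geom_col()

# layer_data() returns the values ggplot2 computed for a layer
layer_data(p_count)$count
layer_data(p_identity)$y
```

With this dataset, the counted version is not very useful because every category occurs exactly once. The default makes more sense when the data has one row per observation and you want the bars to show frequencies.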

Side note: R vs Excel

As an aside, you might be thinking that this doesn’t do much that you can’t do in Excel. While I generally agree with that in the simplest of cases, there are a lot of advantages to building such a bar chart in R: greater workflow automation, the ggplot2 theme system, which dramatically simplifies chart formatting, and the ability to perform all of your end-to-end analytics tasks in one tool, from data wrangling to visualization to machine learning. There’s a bit of a learning curve here, but the advantages of learning this tool are vast and will become apparent as you continue to learn beyond the basics.

Related Visualizations

Stacked Bar
Dodged Bar
Filled Bar (AKA, 100% Bar)
Histogram