# The ultimate guide to the ggplot boxplot

This tutorial will explain how to create a ggplot boxplot.

It explains the syntax, and shows clear, step-by-step examples of how to create a boxplot in R using ggplot2.

The tutorial has three parts: a quick review of boxplots, an introduction to the ggplot boxplot syntax, and a set of worked examples. If you need something specific, you can skip ahead to the appropriate section.

If you have the time, though, you should probably read the whole tutorial. It will make more sense if you do.

## A Quick Review of Boxplots

Before we look at the syntax for the ggplot boxplot, let’s quickly review what boxplots are and how they’re structured.

#### Boxplots visualize summary statistics for your data

Boxplots are a type of data visualization that shows summary statistics for your data.

More specifically, boxplots visualize what we call the “five number summary.” The five number summary is a set of values that includes:

• the minimum
• the first quartile (25th percentile)
• the median
• the third quartile (75th percentile)
• the maximum
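As a minimal sketch, the five number summary can be computed in base R with `quantile()`. The vector `x` below is hypothetical data chosen only for illustration, and note that different quartile definitions can give slightly different values for Q1 and Q3.

```r
# Hypothetical sample data, used only for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Minimum, Q1, median, Q3, and maximum (quantile() default method)
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)
# For this data the five values are 1, 3, 5, 7, 9
```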

When we plot these statistics in the form of a boxplot, it looks something like this:

[Figure: example boxplot with the box, whiskers, and outliers]

Take a look specifically at the structure. The different parts of the box and the two ends of the “whiskers” visualize our five number summary.

##### The Box

The box itself forms the core of the boxplot.

One side of the box represents the 25th percentile of our data (this is also called “the 1st quartile”, or Q1). The other end of the box represents the 75th percentile of our data (this is also called “the 3rd quartile”, or Q3).

Notice as well that there’s a line drawn in the interior of the box (the dotted line, in the above example). That line represents the median of the data (AKA, the “second quartile” or Q2).

So the box itself shows us the 25th percentile, the median, and the 75th percentile. All by itself, this gives us a lot of information about how the data are distributed.

Additionally, the width of the box gives us some information. The box spans from the 25th percentile to the 75th percentile. This distance is commonly known as the “interquartile range,” or IQR for short.

##### The Whiskers

Notice that on either side of the box, there are some lines that extend beyond the box. We typically call these the “whiskers.”

These whisker lines show the location of the minimum value on one side, and the maximum value on the other.

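Typically, these minimum and maximum values are calculated according to a formula rather than taken as the raw extremes of the data. Commonly, the lower limit is calculated as Q1 – 1.5*IQR and the upper limit is calculated as Q3 + 1.5*IQR. Each whisker then extends to the most extreme data point that still falls within those limits (this is the default behavior of `geom_boxplot()`).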

So in addition to showing the interquartile range, the boxplot also shows us minima and maxima. This can help us understand the high and low ranges for the data.
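Here is a rough sketch of that calculation in base R. It assumes the common 1.5*IQR convention, with whiskers drawn to the most extreme data points inside the limits (the `geom_boxplot()` default). The vector `x` is hypothetical data.

```r
# Hypothetical sample data, used only for illustration
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Quartiles and interquartile range
q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Calculated limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR
lower_limit <- q1 - 1.5 * iqr   # -2 for this data
upper_limit <- q3 + 1.5 * iqr   # 14 for this data

# Whisker ends: the most extreme data points inside the limits
lower_whisker <- min(x[x >= lower_limit])   # 2
upper_whisker <- max(x[x <= upper_limit])   # 9
```

Notice that the value 30 lies beyond the upper limit, so the upper whisker stops at 9.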

##### The Outliers

Finally, in the simple example above, you might notice some dots that exist beyond one of the whiskers.

These points represent outliers.

Remember, as noted in the section above, the “minimum” and “maximum” values in the boxplot are commonly calculated values.

Any outliers that we plot are simply values that are more extreme than those calculated minima and maxima (i.e., beyond 1.5*IQR from either end of the box).

These outliers show us the extreme values that might exist in the data.
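As an illustrative sketch (again with a hypothetical vector `x` and the 1.5*IQR rule), the outliers are simply the values that fall outside the calculated limits:

```r
# Hypothetical sample data, used only for illustration
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Keep the values beyond 1.5*IQR from either end of the box
outliers <- x[x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr]
print(outliers)
# For this data, the only outlier is 30
```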

So that’s the basic structure of a boxplot.

Now that we’ve reviewed the parts of a boxplot, let’s look at how to create one with ggplot2.

## An Introduction to the ggplot Boxplot

In the next few sections, I’ll explain the syntax, and then I’ll show you clear examples of how to create a simple boxplot and several variations of it.

### Syntax of the ggplot Boxplot

Let’s take a look at the syntax.

The syntax is relatively straightforward, as long as you already know how ggplot2 works. (To learn more about the ggplot2 visualization system, check out our guide to ggplot2 for beginners.)

To plot a boxplot, you’ll call the `ggplot()` function. Inside the function, you’ll set the `data` parameter and the `x` and `y` parameters (which are typically set inside the `aes()` function). Finally, you’ll add the `geom_boxplot()` function.
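Put together, the general pattern looks roughly like the sketch below. The dataframe `mydataframe` and its columns `group` and `value` are hypothetical names invented for this illustration.

```r
library(ggplot2)

# Hypothetical dataframe: one categorical and one numeric column
mydataframe <- data.frame(
  group = rep(c('a', 'b'), each = 5),
  value = c(1, 2, 3, 4, 5, 3, 4, 5, 6, 7)
)

# data:          the dataframe to plot
# aes(x, y):     which variables map to the x-axis and y-axis
# geom_boxplot:  draw the mapped data as boxplots
p <- ggplot(data = mydataframe, aes(x = group, y = value)) +
  geom_boxplot()

print(p)
```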

Let’s talk about each of these.

#### The data parameter

The `data` parameter enables us to specify the dataframe that we want to plot.

Remember that ggplot2 is primarily set up to work with R dataframes, so we specify the dataframe with this parameter. For example, if your dataframe is named `mydataframe`, then you’ll set the syntax to `data = mydataframe`.

You’ll see examples of how this works in the examples section.

#### The x and y parameters

The `x` and `y` parameters enable you to specify the variables that you want to map to the x-axis and y-axis, respectively.

Note that these parameters are called inside of the `aes()` function. Remember that in the ggplot2 system, the `aes()` function specifies how we map variables to aesthetic attributes of the plot.

(Again, to learn more about the `aes()` function, check out our guide to ggplot2 for beginners.)

#### The boxplot “geom”

Finally, we have the syntax `geom_boxplot()`. This syntax tells ggplot that we want to create a boxplot from our data, and from the variable mappings that we’ve set with the `aes` function.

If you’re confused about this, you need to understand what geoms are. Once again, to understand “geoms” and how they fit into the ggplot2 system, please see our guide to ggplot2 for beginners.

The ggplot system also has other parameters that you can manipulate, like:

• `color`
• `fill`
• `alpha` (i.e., opacity)

And others.
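As a quick sketch of how these parameters are typically set, the code below passes all three to `geom_boxplot()`. It uses the `msleep` dataframe that ships with ggplot2, and the specific color names and alpha value are arbitrary choices for illustration.

```r
library(ggplot2)

# color sets the border, fill sets the interior, alpha sets the opacity
p <- ggplot(data = msleep, aes(x = vore, y = sleep_total)) +
  geom_boxplot(color = 'navy', fill = 'skyblue', alpha = 0.5)

print(p)
```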

I’ll show you some simple modifications that you can make in the upcoming examples.

## Examples: How to make boxplots with ggplot2

The boxplot is very easy to make using ggplot2. We’ll take a look at a few variations.

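Examples:

• Example 1: simple ggplot boxplot
• Example 2: ggplot boxplot by category
• Example 3: make a horizontal boxplot by category
• Example 4: change the box color
• Example 5: add a title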

But before we actually make our boxplots, we’ll need to run some code.

#### Preliminary code

In order to run our examples, we need to load the `tidyverse` package. We should also look at the data we’re going to plot.

First, we’ll load the `tidyverse` package. Loading `tidyverse` also loads the `ggplot2` package, as well as several other important R packages like `dplyr` and `tidyr`.

```
# LOAD TIDYVERSE PACKAGE
library(tidyverse)
```

##### Inspect data

In these examples, we’ll be working with the `msleep` dataframe.

This dataset, which is included with `ggplot2`, contains data on the sleep patterns of different mammals.

We can take a look with the `glimpse()` function.

```
# INSPECT DATA
msleep %>% glimpse()
```

OUT:

```
Rows: 83
Columns: 11
$ name         <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Greater short-tai…
$ genus        <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos", "Bradypus…
$ vore         <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi", "carni", N…
$ order        <chr> "Carnivora", "Primates", "Rodentia", "Soricomorpha", "Artiodac…
$ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", NA, "domestica…
$ sleep_total  <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1, 3.0, 5.3, 9…
$ sleep_rem    <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6, 0.8, 0.7, …
$ sleep_cycle  <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.3833333, NA, 0.…
$ awake        <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9, 21.0, 18.7, …
$ brainwt      <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.07000, 0.0982…
$ bodywt       <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.490, 0.045, 14…
```

Notice that there are several categorical variables, as well as numeric variables.

This makes it very well suited for visualization with a boxplot.
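Before plotting, it can help to check which categories a variable contains. The sketch below counts the values of `vore` (the categorical variable used in the later examples) with base R’s `table()`; the variable name `vore_counts` is just an illustrative choice.

```r
library(ggplot2)  # provides the msleep dataframe

# Count how many animals fall into each diet category, including NA
vore_counts <- table(msleep$vore, useNA = 'ifany')
print(vore_counts)

# Numeric summary of the variable we will plot
print(summary(msleep$sleep_total))
```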

Let’s look at some examples.

### EXAMPLE 1: simple ggplot boxplot

First, we’ll create a very simple boxplot.

Here, we’ll map a single numeric variable, `sleep_total`, to the x-axis.

```
# PLOT BOXPLOT
ggplot(data = msleep, aes(x = sleep_total)) +
  geom_boxplot()
```

And here’s what it looks like:

[Figure: horizontal boxplot of `sleep_total`]

##### Explanation

Here, we’ve mapped a single numeric variable to the `x` parameter, `sleep_total`.

When we create a boxplot with this mapping, ggplot outputs a horizontal boxplot of that numeric variable.
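For comparison, here is a small sketch of the vertical version: mapping the same variable to `y` instead of `x` should stand the box upright. The variable name `p_vertical` is just an illustrative choice.

```r
library(ggplot2)

# Map the numeric variable to y to get a vertical boxplot
p_vertical <- ggplot(data = msleep, aes(y = sleep_total)) +
  geom_boxplot()

print(p_vertical)
```

The line inside the box sits at the median of `sleep_total`, which you can check against `median(msleep$sleep_total)`.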

### EXAMPLE 2: ggplot boxplot by category

Next, we’ll create a boxplot that’s broken out by a categorical variable.

Let’s run the code, and then I’ll explain.

```
# PLOT BOXPLOT
ggplot(data = msleep, aes(x = vore, y = sleep_total)) +
  geom_boxplot()
```

And here’s what it looks like:

[Figure: vertical boxplots of `sleep_total`, one per `vore` category]

##### Explanation

Here, we mapped the categorical variable `vore` to the `x` parameter and the numeric variable `sleep_total` to the `y` parameter.

Notice that the orientation of the boxplot depends on what variable you map to which axis!

As you can see, since `vore` is a categorical variable, ggplot creates a separate boxplot for each category.

This is very useful for comparing data distributions across categories in your data.
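To see the numbers behind each box, one option is to compute the quartiles per category with `dplyr`. This is only a sketch; the column names in `sleep_by_vore` are illustrative choices.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataframe

# Quartiles of sleep_total for each vore category (NA is its own group)
sleep_by_vore <- msleep %>%
  group_by(vore) %>%
  summarise(
    q1 = quantile(sleep_total, 0.25),
    median = median(sleep_total),
    q3 = quantile(sleep_total, 0.75),
    n = n()
  )

print(sleep_by_vore)
```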

### EXAMPLE 3: make a horizontal boxplot by category

Next, we’ll create a horizontal boxplot.

This will be the same as the boxplot in example 2, except the orientation will be different.

```
# PLOT BOXPLOT
ggplot(data = msleep, aes(x = sleep_total, y = vore)) +
  geom_boxplot()
```

OUT:

[Figure: horizontal boxplots of `sleep_total`, one per `vore` category]

##### Explanation

Again, this is the same boxplot that we had in example 2, except it’s flipped on its side.

Notice again that the orientation of the boxplot depends on which variables are mapped to the `x` and `y` parameters.
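As an aside, another way to get a similar horizontal plot is to keep the mapping from example 2 and add `coord_flip()`, which swaps the axes. This is a sketch of an alternative technique, not the approach used in the example above; `p_flipped` is an illustrative name.

```r
library(ggplot2)

# Same mapping as example 2, with the axes swapped by coord_flip()
p_flipped <- ggplot(data = msleep, aes(x = vore, y = sleep_total)) +
  geom_boxplot() +
  coord_flip()

print(p_flipped)
```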

### EXAMPLE 4: Change the box color

Next we’ll change the color of the boxes.

To do this, we actually need to use the `fill` parameter.

The `fill` parameter controls the color of the interior of the boxes, but the `color` parameter actually controls the border color.

```
ggplot(data = msleep, aes(x = sleep_total, y = vore)) +
  geom_boxplot(fill = 'red')
```

OUT:

[Figure: horizontal boxplots of `sleep_total` by `vore`, with red boxes]

##### Explanation

Here, we changed the box color to `red` by setting `fill = 'red'`.

Notice that we did this inside the `geom_boxplot()` function. This tells ggplot2 that we’re specifically changing the fill color of the boxes. (This comes in handy if we have a layered plot with more than one geom type.)
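To illustrate the point about layered plots, here is a sketch with two geoms. It assumes we want the raw data points drawn over the boxes; the point color is an arbitrary choice. Each setting applies only to the geom it is passed to.

```r
library(ggplot2)

p_layered <- ggplot(data = msleep, aes(x = sleep_total, y = vore)) +
  geom_boxplot(fill = 'red') +   # fill applies to the boxes only
  geom_point(color = 'blue')     # color applies to the points only

print(p_layered)
```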

### EXAMPLE 5: Add a title

Let’s do one more thing: add a title to the plot.

```
ggplot(data = msleep, aes(x = sleep_total, y = vore)) +
  geom_boxplot(fill = 'red') +
  labs(title = 'On average, insectivores sleep more than other diet groups')
```

OUT:

[Figure: the red horizontal boxplots from example 4, with a title]

##### Explanation

Here, we added a title using the `labs()` function.

Titles and axis labels are relatively easy, but there are some important details that you might need to know. For more information on titles and axis labels, check out our tutorial on ggplot titles.
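As a brief sketch, the same `labs()` call can also set the axis labels through its `x` and `y` arguments. The label text below is an illustrative choice (in `msleep`, `sleep_total` is measured in hours and `vore` records diet type).

```r
library(ggplot2)

p_labeled <- ggplot(data = msleep, aes(x = sleep_total, y = vore)) +
  geom_boxplot(fill = 'red') +
  labs(
    title = 'Total sleep by diet type',
    x = 'Total sleep (hours)',
    y = 'Diet type'
  )

print(p_labeled)
```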
