The ggplot histogram is very easy to make.

But like many things in `ggplot2`, it can seem a little complicated at first. In this article, we’ll show you exactly how to make a simple ggplot histogram, show you how to modify it, explain how it can be used, and more.

Let’s jump in.

## Building a simple ggplot histogram

In order to build a histogram using `ggplot2`, you need to know how the ggplot system works. It’s not terribly hard once you get the hang of it, but it can be a little confusing to beginners.

Before we get into it, let’s load the `tidyverse` package, which includes `ggplot2`. We’ll also inspect `txhousing`, a dataset that ships with `ggplot2` and the one we’ll be using.

```
#--------------
# LOAD PACKAGES
#--------------
library(tidyverse)  # attaches ggplot2 and dplyr (for %>% and glimpse())
library(ggplot2)    # redundant after library(tidyverse), but harmless

#--------
# INSPECT
#--------
txhousing %>% glimpse()

# Observations: 8,602
# Variables: 9
# $ city       "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abilene", "Abil...
# $ year       2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,...
# $ month      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
# $ sales      72, 98, 130, 98, 141, 156, 152, 131, 104, 101, 100, 92, 75, 112, 118, 1...
# $ volume     5380000, 6505000, 9285000, 9730000, 10590000, 13910000, 12635000, 10710...
# $ median     71400, 58700, 58100, 68600, 67300, 66900, 73500, 75000, 64500, 59300, 7...
# $ listings   701, 746, 784, 785, 794, 780, 742, 765, 771, 764, 721, 658, 779, 700, 7...
# $ inventory  6.3, 6.6, 6.8, 6.9, 6.8, 6.6, 6.2, 6.4, 6.5, 6.6, 6.2, 5.7, 6.8, 6.0, 6...
# $ date       2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2...
```
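Before plotting, it can also help to look at the one variable we’re about to chart. The snippet below is a minimal sketch using base R’s `summary()` and `is.na()`; the name `n_missing` is just an illustrative variable. If the column contains missing values, `geom_histogram()` will drop them and print a warning, so it’s useful to know about them in advance.

```r
library(ggplot2)  # txhousing ships with ggplot2

# Quick numeric summary of the median sale price (includes an NA count, if any)
summary(txhousing$median)

# Count the missing values explicitly
n_missing <- sum(is.na(txhousing$median))
n_missing
```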

Now, let’s make a simple ggplot histogram:

```
#-----
# PLOT
#-----

# BASIC HISTOGRAM
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()
```

This histogram is pretty simple to create if you know how ggplot works.

But on the assumption that you’re a little unfamiliar with ggplot, let’s quickly review how the `ggplot2` system works.

### Quick review: how the ggplot2 system works

I love `ggplot2`.

Part of the reason is that it’s extremely systematic. Once you know how the `ggplot2` system works, you can create almost any visualization with relative ease. Histograms are just a very simple example.

#### The `ggplot()` function

The `ggplot()` function essentially initiates ggplot plotting. It tells `R` that we’ll be using the `ggplot2` library to build a plot or data visualization.

#### The `aes()` function

The `aes()` function specifies our variable mappings.

If you haven’t done this before, then “variable mapping” might not immediately make sense. It’s relatively straightforward though.

Let’s take a look at our histogram code again to try to make this more clear.

```
# BASIC HISTOGRAM
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()
```

Notice that inside of the `aes()` we have the expression `x = median`.

What are we doing there?

We are “mapping” the `median` variable to the x axis. Notice again that this expression appears inside of the `aes()` function. Why? Because it is a variable mapping. All mappings from datasets to “aesthetic attributes” like the x-axis occur inside of the `aes()` function.

This can get a lot more complicated. For example, with a scatterplot, you’ll map a variable to the x axis and another variable to the y axis. It can get even more complicated with advanced visualization techniques, but the basics are straightforward. A dataset has variables. A visualization has aesthetic attributes like the x axis, y axis, color, shape, etc. We need to “connect” the variables to the aesthetic attributes. This can be accomplished with the `aes()` function.
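To make the scatterplot case concrete, here is a minimal sketch that maps two variables at once. The choice of `listings` and `sales` is arbitrary (any two numeric columns from `txhousing` would do), and `p_scatter` is just an illustrative name.

```r
library(ggplot2)

# Map one variable to the x axis and another to the y axis
p_scatter <- ggplot(data = txhousing, aes(x = listings, y = sales)) +
  geom_point()

p_scatter
```

Both mappings live inside the same `aes()` call, exactly as the single `x = median` mapping does for the histogram.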

### A step-by-step breakdown of a ggplot histogram

Ok.

The `ggplot()` function initiates plotting. The `aes()` function specifies how we want to “map” or “connect” variables in our dataset to the aesthetic attributes of the shapes we plot.

With that knowledge in mind, let’s revisit our ggplot histogram and break it down.

```
#-----
# PLOT
#-----

# BASIC HISTOGRAM
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()
```

What are we doing here?

`ggplot()` indicates that we’re going to plot something. The `data =` parameter indicates that we’ll plot data from the `txhousing` dataset. Inside of the `aes()` function, we’re specifying that we want to put the “`median`” variable on the x axis. Finally, `geom_histogram()` indicates that we are going to plot a histogram.

(I won’t go over “geoms” in detail here. Suffice it to say, there are many different geoms in `ggplot2` that plot different types of things.)
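As a small illustration of how the pieces fit together, the sketch below stores the `ggplot()` call on its own and then adds two different geoms to it. The variable names (`base_plot`, `p_hist`, `p_freq`) are illustrative, and `geom_freqpoly()` is used only as an example of a second geom that accepts the same mapping.

```r
library(ggplot2)

# Base plot: data plus variable mappings, but nothing drawn yet
base_plot <- ggplot(data = txhousing, aes(x = median))

# Same data and mapping, two different geoms
p_hist <- base_plot + geom_histogram(bins = 30)  # bars
p_freq <- base_plot + geom_freqpoly(bins = 30)   # lines through the bin counts

p_hist
p_freq
```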

## How to modify the ggplot histogram

Now that we’ve created a simple histogram with `ggplot2`, let’s make some simple modifications.

### Change the bar colors

The first modification we’ll make is to change the color of the bars.

This is very simple to do. Changing the bar colors for a ggplot histogram is essentially the same as changing the color of the bars in a ggplot bar chart.

We will take the simple ggplot histogram that we just made, and we’re going to add a little piece of code inside of the call to `geom_histogram()`. Inside of `geom_histogram()`, we will add the code `fill = 'red'`. This will effectively change the interior fill color of all of the histogram bars.

```
# ADD COLOR
# - fill in bars
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(fill = 'red')
```

As an aside, I recommend that you learn ggplot and `R` like this. Start with a simple technique. Learn it. Master it. Then systematically make small changes (and master how to make those changes). Start simple and expand your skill outward.

### Change the border colors

Next, we’ll change the color of the borders of the histogram bars.

This is very similar to changing the `fill` color, but instead of using the `fill =` parameter we will use the `color =` parameter.

```
# ADD COLOR
# - color the edges of the bars
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(color = 'red')
```

You’ll notice that this histogram is basically the same as the original except the borders are colored red.
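The two parameters can also be combined in a single call. The sketch below sets both `fill =` and `color =`; the specific colors (`'steelblue'` and `'white'`) and the name `p_both` are arbitrary choices for illustration.

```r
library(ggplot2)

# Set the interior fill and the border color in one call
p_both <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(fill = 'steelblue', color = 'white', bins = 30)

p_both
```

A light border against a darker fill tends to make the individual bars easier to tell apart.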

### Change the number of histogram bins

Now, let’s change the number of histogram bins.

By default, `ggplot2` will use 30 bins for the histogram, and it prints a message suggesting that you pick a better value yourself.

However, we can manually change the number of bins. This can be useful depending on how the data are distributed. If there is a lot of variability in the data we can use a larger number of bins to see some of that variation. Or, we can use a smaller number of bins to “smooth out” the variability.

Either way, changing the number of bins is extremely easy to do.

We will simply use the `bins =` parameter to change the number of bins.

#### Use fewer histogram bins

First, here’s a look at using fewer bins. Here, we’ll use 10 bins.

```
# USE FEWER BINS
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(bins = 10)
```

#### Use more histogram bins

Next, we’ll use more bins. We’ll increase the number of bins to 100:

```
# USE MORE BINS
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(bins = 100)
```

Again, which one you use depends on your objectives. In this case, I think the default of 30 bins works well.
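An alternative to choosing the number of bins is to choose their width with the `binwidth =` parameter of `geom_histogram()`. The sketch below uses a width of 10,000, which is an arbitrary value picked for illustration (the `median` variable is a sale price, so each bar then covers a 10,000-wide price range).

```r
library(ggplot2)

# Control the width of each bin instead of the number of bins
p_bw <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(binwidth = 10000)

p_bw
```

Setting the width directly can make the bars easier to interpret, because each one corresponds to a round range of values.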

## Bonus: how to make a “small multiple” histogram

As I already said, I love `ggplot2`. It makes things easy.

A great example of this is the small multiple chart. Personally, I think the small multiple chart (AKA, the trellis chart) is wildly under-used. It’s extremely useful for a variety of data science and data analysis tasks. But you rarely see them because they are difficult to create in other software.

`ggplot2` makes the small multiple easy to create. To create a small multiple in ggplot, we’ll just add a piece of code that will “break out” the chart based on a categorical variable.

Here, we will use the code `facet_wrap(~city)` to make a small version of the chart for each value of the `city` variable.

```
# SMALL MULTIPLE HISTOGRAM
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram() +
  facet_wrap(~city)
```

There’s a lot of data here and a lot of detail. It will be easier to see if you run the code on your own computer and increase the size of the chart. (Try it …)

What’s great about the small multiple is that it lets you see a lot of information in a very small space.

In this chart, we can see individual histograms for each city. This might be very useful if you were doing an analysis on cities and how they are different.
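If the full chart is too dense to read, one option is to facet only a handful of cities. The sketch below simply takes the first four city names that appear in the data, which is an arbitrary choice for illustration; `some_cities`, `tx_subset`, and `p_facet` are illustrative names.

```r
library(ggplot2)

# Pick the first four city names in the data (arbitrary, for illustration)
some_cities <- head(unique(txhousing$city), 4)

# Keep only the rows for those cities
tx_subset <- subset(txhousing, city %in% some_cities)

# One small histogram per remaining city
p_facet <- ggplot(data = tx_subset, aes(x = median)) +
  geom_histogram(bins = 30) +
  facet_wrap(~city)

p_facet
```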

## Bonus: how to make a density plot

The density plot is a smoothed variation of the histogram: instead of the y axis showing the number of observations in each bin, it shows a kernel density estimate of the data.

In `ggplot2`, the density plot is actually very easy to create. Just take the code for the basic ggplot histogram that we used above and swap out `geom_histogram()` with `geom_density()`.

```
# DENSITY PLOT
ggplot(data = txhousing, aes(x = median)) +
  geom_density()
```

Again, `ggplot2` makes things like this easy to do. Once you know the basics, changing a histogram to a density plot is as easy as changing one line of code.
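The two geoms can also be layered on one chart. Because a histogram shows counts and a density curve shows density, the sketch below rescales the histogram’s y axis with `after_stat(density)` so that both layers share the same scale. This assumes a reasonably recent `ggplot2` (`after_stat()` was added in version 3.3.0); the colors and the name `p_overlay` are arbitrary.

```r
library(ggplot2)

# Histogram on the density scale, with a density curve drawn on top
p_overlay <- ggplot(data = txhousing, aes(x = median)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = 'grey80') +
  geom_density(color = 'red')

p_overlay
```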

## Applications and uses of a ggplot histogram

We typically use histograms to examine the density of a variable or how a variable is distributed.

Moreover, there are several reasons that we might want this information.

As a data scientist, many times you may need your data to be distributed in a particular way. For example, the standard inference procedures for linear regression assume that the model’s errors are approximately normally distributed (the variables themselves do not need to be normal). Therefore, a data scientist might examine the distributions of the variables before modeling, and of the residuals after fitting, to spot strong skew or other departures from normality. To do this, a data scientist will commonly use a histogram.
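As a rough sketch of that kind of check, the code below fits a simple regression and plots a histogram of its residuals. The model `sales ~ listings` is an arbitrary choice made purely for illustration, not a recommended model for this dataset, and `fit`, `resid_df`, and `p_resid` are illustrative names.

```r
library(ggplot2)

# Fit a simple linear regression (rows with missing values are dropped by lm())
fit <- lm(sales ~ listings, data = txhousing)

# Put the residuals in a data frame so ggplot can use them
resid_df <- data.frame(resid = residuals(fit))

# Histogram of the residuals: look for strong skew or heavy tails
p_resid <- ggplot(data = resid_df, aes(x = resid)) +
  geom_histogram(bins = 30)

p_resid
```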

Histograms can also be used for outlier detection, detection of skewness, and detection of other features that may be important for particular data science tasks.

Moreover, histograms are often useful simply for high level exploratory data analysis. A full explanation of EDA and how to use histograms for EDA is beyond the scope of this post. However, to put it simply, we can use histograms to examine variables and look for “insights” or interesting features in the data.

That’s just about everything that you need to know about the ggplot histogram.

But, if you want to get a job as a data scientist, you’ll need to know a lot more.

Don’t try to learn it alone.

Here at Sharp Sight, we’re committed to helping you master data science as fast as possible.
