The histogram groups data into bins and plots the binned data as bars.
Histograms are used to show the distribution of a numeric variable: the height of each bar is the number of records that fall in the corresponding bin.
(read more at Wikipedia)
Code: Histogram in R
library(ggplot2)

set.seed(11)
df.histogram_dummy <- data.frame(x_var = rnorm(2000))

# DEFAULT BINWIDTH (range/30)
ggplot(data=df.histogram_dummy, aes(x=x_var)) +
  geom_histogram()
The code to create a histogram in R is very straightforward.
The fundamentals of building a histogram in R (using ggplot2) are basically the same as those for building any data visualization. We:
1. Specify our dataset
2. Create a “mapping” (i.e., a relationship) between variables in our dataset and aesthetic attributes in our plot
3. Specify the exact geometric objects we want to plot
(This deep underlying structure is essentially the same for all visualizations. We’ve seen a similar process for building a bar chart, scatterplot, and line chart. This deep structure will also remain as we create more sophisticated visualizations.)
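To make the three steps concrete, here is a minimal sketch that builds the plot in two stages. It reuses the dummy data from above; the object names p_base and p_hist are just illustrative, and the binwidth of 0.5 is an arbitrary choice for the sketch.

```r
library(ggplot2)

set.seed(11)
df.histogram_dummy <- data.frame(x_var = rnorm(2000))

# Steps 1 and 2: specify the dataset and map x_var to the x aesthetic.
# At this point the plot has axes but no geometric objects yet.
p_base <- ggplot(data = df.histogram_dummy, mapping = aes(x = x_var))

# Step 3: add the geometric object we want to draw (histogram bars).
p_hist <- p_base + geom_histogram(binwidth = 0.5)

print(p_hist)
```

Splitting the call this way is not required, but it shows that the dataset and mapping are defined once and the geom is layered on top of them.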
Let’s take a look at the code:
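Here we start by calling the ggplot() function and passing our dataset to it with data=df.histogram_dummy.
Next, with the aes() function, we map the variable x_var to the x aesthetic (the horizontal axis).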
The typical process of creating a histogram requires that we “bin” the data; that is, we divide the numeric variable into intervals (analysts frequently call these “bins”, or sometimes “buckets”). After creating the intervals, we count the number of records that fall within each one.
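The binning step can be sketched by hand in base R. This is only an illustration of the idea (divide into intervals, then count), not what ggplot2 runs internally; the names binwidth, breaks, and bin_counts are made up for this example, and the interval edges are simply rounded out to whole numbers.

```r
set.seed(11)
x_var <- rnorm(2000)

# Choose an interval width and build the interval edges so that
# they cover the full range of the data.
binwidth <- 0.5
breaks <- seq(floor(min(x_var)), ceiling(max(x_var)), by = binwidth)

# Assign every record to an interval. Intervals are closed on the right,
# and include.lowest = TRUE keeps a value equal to the first edge.
bins <- cut(x_var, breaks = breaks, include.lowest = TRUE)

# Count the number of records that fall within each interval.
bin_counts <- table(bins)
print(bin_counts)
```

The counts in bin_counts are what a histogram would draw as bar heights for these particular interval edges.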
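Let’s take a look at geom_histogram().
Here, this line tells the ggplot2 plotting system to draw a histogram of the variable we mapped to x.
Again, this automatically bins our data for us: by default, when we use geom_histogram() without specifying a bin width, ggplot2 divides the range of the data into 30 bins.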
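As a rough check on that default, we can compute range/30 ourselves and state the bin count explicitly. This sketch assumes a reasonably recent version of ggplot2, where the default is 30 bins and geom_histogram() accepts a bins argument; approx_binwidth and p_default are illustrative names.

```r
library(ggplot2)

set.seed(11)
df.histogram_dummy <- data.frame(x_var = rnorm(2000))

# Approximate width of each bin when the data range is split into 30 pieces.
approx_binwidth <- diff(range(df.histogram_dummy$x_var)) / 30
print(approx_binwidth)

# Stating the default bin count explicitly; this also avoids the message
# ggplot2 prints when it has to pick the number of bins for us.
p_default <- ggplot(data = df.histogram_dummy, aes(x = x_var)) +
  geom_histogram(bins = 30)

print(p_default)
```

The computed width is only approximate, because ggplot2 also decides where the bin edges are aligned.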
That said, we can specify the bin widths manually.
Here, we set binwidth to .5 (using the binwidth= parameter of geom_histogram()):
# Wider binwidth
ggplot(data=df.histogram_dummy, aes(x=x_var)) +
  geom_histogram(binwidth=.5)
We can also narrow the binwidth, which gives us a slightly different view of the data distribution.
# Narrower binwidth
ggplot(data=df.histogram_dummy, aes(x=x_var)) +
  geom_histogram(binwidth=.1)
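To get a feel for how much the binwidth changes the picture, we can estimate how many bars each setting produces. This is a back-of-the-envelope sketch in base R: approx_n_bins is a hypothetical helper, and the real plot may contain one bin more or fewer, depending on how ggplot2 aligns the bin edges.

```r
set.seed(11)
x_var <- rnorm(2000)

# Rough number of bins needed to cover the data range at a given width.
approx_n_bins <- function(x, binwidth) {
  ceiling(diff(range(x)) / binwidth)
}

n_wide   <- approx_n_bins(x_var, 0.5)  # wider bins, fewer bars
n_narrow <- approx_n_bins(x_var, 0.1)  # narrower bins, more bars

print(c(wide = n_wide, narrow = n_narrow))
```

Fewer, wider bins give a smoother but coarser view; more, narrower bins show finer detail but also more noise.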