How to Make a Plotly Histogram

In this tutorial, I’ll show you how to make a Plotly histogram with the px.histogram function.

I’ll explain the syntax of px.histogram and I’ll also show you clear, step-by-step examples of how to make histograms with Plotly express. I’ll show you a simple histogram, as well as a few variations.

The tutorial has several sections. If you need something specific, you can click on any of the following links. The links will take you to specific locations in the tutorial.

Ok. Let’s get to it.

A Quick Introduction to Histograms

Before we look at the syntax, let’s quickly review histograms.

Histograms

When you explore or analyze data, you need to look at your variables.

There are different ways to do this, depending on the variable type.

When you’re inspecting numeric variables, one of the most common methods of inspection us the histogram. Histograms show you how the numeric variable is distributed.

Typically, in a histogram, we map a numeric variable to the x-axis.

Then, the x-axis is divided up into sections, which we call “bins.” There might be a bin from 0 to 20, then another bin from 20 to 40, and so on.

Then, in the final step, we count the number of observations in each bin and plot a bar for each bin. The length of the bin corresponds to the count of the observations in that bin. So histograms are one way to look at the density of the data for different values of the variable. It’s a way to plot the distribution of the variable.

Plotly Histograms

Now that we’ve reviewed histograms generally, let’s discuss how to create Plotly histograms.

Plotly, as you probably know, is a data visualization toolkit for Python.

Plotly has a powerful API for creating complex visualizations.

But it also has a simplified toolkit called Plotly Express, which you can use to quickly create a variety of simple visualizations like bar charts, line charts, scatterplots, and more.

Plotly express has a specialized function for creating histograms, the px.histogram() function. This function is a simple, yet flexible way to create histograms in Python (while also giving you access to the powerful underlying syntax of the Plotly system).

The syntax of px.histogram

Now that we’ve reviewed histograms generally, let’s look at the syntax for a Plotly express histogram.

The syntax for a histogram made with px.histogram is very simple.

Typically, we call the function as px.histogram(). Keep in mind that this assumes that you’ve imported Plotly Express with the alias px. You can do that with the code import plotly.express as px.

Inside the parenthesis, you can use the data_frame parameter to specify a DataFrame (optional). And you use the x parameter to specify the numeric variable that you want to plot.

There are also a few additional parameters that you can use to modify your histograms.

With that in mind, let’s look at a few.

The parameters of px.histogram

The bad news is that the px.histogram function has several dozen parameters.

The good news is that you really only need to learn a few of them. Whenever you’re learning Python functions, you should try to apply the 80/20 rule, and identify the most commonly used parameters.

The parameters that I recommend you learn are:

• data_frame
• x
• nbins
• color_discrete_sequence

Let’s take at each of these.

data_frame (optional)

The data_frame parameter allows you to specify a Python DataFrame that you want to plot.

This parameter is optional.

If you use it, then you’re telling px.histogram that you want to plot a variable in the DataFrame that you specify.

x

You use the x parameter to specify the specific numeric variable that you want to plot.

If you use the data_frame parameter to specify a DataFrame, then the argument to this parameter will be the name one of the columns in that DataFrame. If you specify a DataFrame column, then you’ll pass the name of that column as a string. For example, if you want to plot the mycolumn variable in a DataFrame, you’ll use the syntax x = 'mycolumn'.

Alternatively, you can also choose to plot a numeric variable that exists outside of a DataFrame. This could be data in a Python list or a Numpy array. If you do this, then you can skip the quotation marks around the name. (For the most part, the quotation marks are only required when you plot a DataFrame column.)

color_discrete_sequence (optional)

The color_discrete_sequence parameter changes the color of your histogram. It changes the color of the bars.

The argument you to this parameter can be a “named color,” like ‘red‘, ‘orange‘, or ‘blue‘. (Python has a long list of named colors.)

You can also use a hex color. Hexadecimal colors are a little complicated for beginners, so in the interest of space and simplicity, I’m not going to explain them here (but they’re useful to learn, eventually).

nbins

The nbins parameter controls the number of bins in the histogram (i.e., the number of bars).

The argument to this parameter should be an integer (i.e., the number of bins you want). For example, if you set nbins = 50, the function will create a histogram with 50 bars (i.e., bins).

I’ll point out here that sometimes, you will specify a value for nbins, but the function will create a histogram with slightly fewer bins. You sometimes need to play with this to get a histogram with a roughly appropriate number of bins.

I’ll show you how to change the number of bins in example 4.

Examples: how to make histograms with Plotly Express

Now that we’ve looked at the syntax, let’s look at some examples of how to create histograms with Plotly Express.

Examples:

Run this code first

Before we look at the examples, you’ll need to run some preliminary code to import some packages and to create our data.

Import packages

First, we’ll import some Python packages.

import numpy as np
import pandas as pd
import plotly.express as px

We’re going to use Numpy to create some normally distributed data that we can plot.

We’ll use Pandas to turn that data into a DataFrame.

And we’ll use Plotly Express to create our histograms.

Create dataset

Next, let’s create our DataSet.

We’re going to do this in two steps:

• create normally distributed data with the Numpy random normal function
• combine the normally distributed variables into a DataFrame

First, we’ll use the Numpy Random Normal to create two normally distributed variables. Here, I’m calling those variables normal_data_a and normal_data_b.

np.random.seed(33)
normal_data_a = np.random.normal(size = 500, loc = 100, scale = 10)
normal_data_b = np.random.normal(size = 700, loc = 75, scale = 5)

We’re creating two normally distributed datasets, because I want the dataset to have two separate peaks.

So now that we have our two normally distributed variables, we’ll turn each of them into a dataframe. Notice that as we do this, we’re using the Pandas assign function to create a new categorical variable called group.

df_normal_a = pd.DataFrame(data = normal_data_a, columns=['score']).assign(group = 'Group A')
df_normal_b = pd.DataFrame(data = normal_data_b, columns=['score']).assign(group = 'Group B')

score_data = pd.concat([df_normal_a, df_normal_b])

Here, we’re also using the Pandas concat function to combine these two different DataFrames together into one single dataframe called score_data.

Let’s look at the final score_data DataFrame with a print statement:

print(score_data)

OUT:

score    group
0    96.811465  Group A
1    83.970194  Group A
2    84.647821  Group A
3    94.295991  Group A
4    97.832717  Group A
..         ...      ...
695  84.728235  Group B
696  75.675768  Group B
697  78.171504  Group B
698  69.243985  Group B
699  75.073327  Group B

If you look a the data, you can see that this DataFrame has two variables: score and group. We’re going to use both in our histograms.

Set Plotly Image Rendering

One last thing before we plot our data.

If you’re using Plotly in an Integrated Development Environment like Spyder or PyCharm, you’ll need to run some extra code to get the visualizations to display inside the IDE:

If you’re using an IDE you can run the following code:

import plotly.io as pio
pio.renderers.default = 'svg'

Having said that, if you’re using a Jupyter notebook, that code shouldn’t be necessary.

Ok. All of our setup should be complete. We’re ready to make some histograms.

EXAMPLE 1: Create a simple Plotly histogram

Here, we’ll just use px.histogram to plot the data in the score variable.

Here’s the code:

px.histogram(data_frame = score_data
,x = 'score'
)

And here’s the output: Explanation

This is a very simple histogram plotted with the px.histogram function.

Inside the function call, we specified the DataFrame to plot with the code data = score_data. And we specified the exact column to plot x = 'score'.

Notice that the name of the column, 'score', is presented as a string. When we plot a DataFrame column, the name of the column must be passed to the x parameter this way.

Visually, you can see that the histogram has about 50 bins. We can actually change that, which we’ll do in one of the other examples.

Also, notice that the default color is a light blue. This is fine most of the time, but there may be situations where you want to change the color for aesthetic or design reasons.

Let’s look at how to do that.

EXAMPLE 2: Change the bar color

In this example, we’ll look at how to change the color of the histogram bars.

As I just mentioned, the default color is a sort of medium blue.

Here, we’ll to change the bar color to “darkred.” To do this, we’ll set the color parameter to color = 'darkred'.

px.histogram(data_frame = score_data
, x = 'score'
,color_discrete_sequence = ['darkred']
)

OUT: Explanation

This histogram is almost exactly the same as the histogram in example 1.

The only difference is that we’ve changed the color of the bars to the color 'darkred'.

To do that, we set the color_discrete_sequence parameter to color_discrete_sequence = ['darkred'].

Notice that the name of the color is in quotations, inside of a list. (With this parameter, you can actually specify multiple different colors, if you have multiple categories.)

As I mentioned previously, in the syntax section, you can use a variety of named colors when you use this parameter, like red, green, dark red, etc. You can also use hexadecimal colors. Try a few out!

EXAMPLE 3: Change the number of bins

Now, let’s change the number of bins in the histogram.

Here, we’re going create a histogram with 20 bins.

px.histogram(data_frame = score_data
,x = 'score'
,nbins = 20
)

OUT: Explanation

Here, we’ve created a histogram with 20 bins.

To do this, we’ve used the nbins parameter, which we set to nbins = 20.

Keep in mind that when you analyze your data, it can be useful to look at different histograms with different numbers of bins. A large number of bins tends to show lots of detail, whereas a small number of bins smooths over the details to show the rough shape of the distribution. Neither one is “right.” How you look at your data depends on what you’re looking for or what you’re trying to do.

So you need to try different numbers of bins and evaluate the resulting histogram based on your analytical goals.

Ultimately, there’s a bit of an art to choosing the right number of bins. You’ll need to practice using this technique to know the best choice.

EXAMPLE 4: Create a histogram with multiple categories

Next, let’s create a histogram with multiple categories.

Remember that when we created our dataset, we created a categorical variable called group. This variable has two values: Group A and Group B.

We can use this categorical variable to create a histogram with multiple categories.

So in this example, we’re going to plot our data and break it out into different groups.

To do this, we’ll use the color parameter:

px.histogram(data_frame = score_data
,x = 'score'
,color = 'group'
)

OUT: Explanation

As you can see, the resulting visualization has two histograms: one for Group A and one for Group B. Both of these histograms appear in the same visualization, but they have different colors.

To do this, we set the color parameter to color = 'group'.

Remember that the group variable is contains categorical data. It has two values: Group A and Group B.

So when we set color = 'group', Plotly Express actually breaks the data out into two histograms … one for each category, where each histogram has a different color.

This is a great technique to use if your data has different categories, and you want to visualize those categories in the same plot.

EXAMPLE 5: Create a histogram with multiple categories, and change the colors

Finally, let’s create a histogram with multiple categories, and change the colors.

This will be sort of like example 4, where we made a multi-category histogram, and sort of like example 2, where we changed the color.

Let’s take a look, and then I’ll explain.

px.histogram(data_frame = score_data
,x = 'score'
,color = 'group'
,color_discrete_sequence = ['navy','darkorange']
)

OUT: Explanation

So what happened here?

In this example, we called the px.histogram function to create a Plotly histogram.

We used the data_frame parameter to specify the DataFrame, and we used the x parameter to specify the exact column.

We used color = 'group' to indicate that we want to make a multi-category histogram, where the different categories of the group variable correspond to the different colors.

And finally, we used the code color_discrete_sequence = ['navy','darkorange'] to set the exact colors used for the different categories.

This example sort of combines several of the techniques we used in previous examples.

Do you still have questions about making a Plotly histogram?

In this blog post, I’ve shown you how create a Plotly histogram using px.histogram(). But to really master Python data visualization with Plotly, there’s a lot more to learn.

That said, if you’re serious about learning Plotly and if you want to master data visualization in Python, you should join our premium online course, Plotly Mastery.

Plotly Mastery is an online course that will teach you everything you need to know about Python data visualization with the Plotly package.

Inside the course, you’ll learn:

• how to create essential plots like bar charts, line charts, scatterplots, and more
• techniques for creating multivariate data visualization