How to Make a Seaborn Scatter Plot

This tutorial will show you how to make a Seaborn scatter plot.

It will explain the syntax of the sns.scatterplot function, including some important parameters.

It will also show you clear, step-by-step examples of how to create a scatter plot in Seaborn.

If you need help with something specific, you can click on one of the links below. These links will take you to the appropriate section in the tutorial.

Table of Contents:

But, if you’re new to Seaborn or new to data science in Python, it would be best if you read the whole tutorial.

Ok. Let’s get to it.

A quick overview of Seaborn

Just in case you’re new to Seaborn, I want to give you a quick overview.

(If you already know about Seaborn and data visualization in Python, you can skip this section and go to the Intro to the Seaborn scatter plot.)

Seaborn is a data visualization toolkit for Python

Seaborn is a package for the Python programming language.

Specifically, Seaborn is a data visualization toolkit for Python.

Python has a variety of data visualization packages, including Matplotlib, Matplotlib’s Pyplot, Bokeh, Altair, and many others.

Frankly, there’s almost too many Python visualization packages to keep track of.

Even among a variety of options, Seaborn is one of the best. I recommend it to most of my data science students.

Seaborn is one of the best options for data visualization in Python

Seaborn is probably the best option for statistical data visualization in Python, as of 2019.

To be fair, Seaborn is not quite as good as R’s ggplot2, but it’s still good.

Seaborn has a few advantages over other visualization toolkits. The most important, is that the syntax is relatively simple. It’s not perfect, but it’s fairly easy to learn and use.

In particular, Seaborn has easy-to-use functions for creating plots like scatterplots, line charts, bar charts, box plots, etc.

Second, Seaborn has been designed to work well with DataFrames. Many other data visualization options for Python – Matplotlib in particular – were designed before Pandas DataFrames became popular data structures in Python. Because of this, many of the other visualization tools in Python are hard to use with DataFrames. The syntax is often clumsy or complicated.

For the most part though, Seaborn was designed with Pandas DataFrames in mind. Considering that Pandas is almost essential for data science in Python today, it helps to have a toolkit that works well with Pandas and with DataFrame structures.

If you need to do data visualization in Python, particularly with Pandas DataFrames, I recommend Seaborn.

A quick introduction to the Seaborn scatter plot

As I mentioned earlier, Seaborn has tools that can create many essential data visualizations: bar charts, line charts, boxplots, heatmaps, etc.

But one of the most essential data visualizations is the scatter plot.

Arguably, scatter plots are one of the top 5 most important data visualizations. As a data scientist, you’re very likely to use them all the time. Any time you need to plot two numeric variables at the same time, a scatterplot is probably the right tool.

Creating scatterplots in Seaborn is easy.

To create a scatterplot in Seaborn, you can use the seaborn.scatterplot() function (AKA, sns.scatterplot) .

Let’s take a look at the syntax.

The syntax of sns.scatterplot

The syntax for creating a Seaborn scatterplot is fairly straightforward.

In the simplest case, you can call the function, provide the name of the DataFrame, and then the variables you want to put on the x and y axis. You can pass the DataFrame name to the data parameter, and pass the variables to the x parameter and y parameter. (I’ll show you a clear example of this in the examples section.)

An image that shows the basic syntax for a Seaborn scatter plot.

Typically, when we call the function, we call it as sns.scatterplot(). This is because we typically import the Seaborn package with the import statement import seaborn as sns. This enables us to reference Seaborn with the alias sns. This is the common convention when using Seaborn, and it’s the convention that we’ll be using as we move forward in the tutorial.

As noted above, you can create a simple scatterplot with only 3 parameters. Having said that, the sns.scatterplot function has quite a few other parameters that you can use to modify the behavior of the function.

The parameters of sns.scatterplot

The sns.scatterplot() function has roughly two dozen parameters that you can use to carefully manipulate the output scatterplot.

In the interest of brevity, this tutorial won’t explain all of them. However, we will cover a few important parameters for sns.scatterplot:

  • data
  • x
  • y
  • edgecolor
  • color
  • alpha
  • hue

These are the most important parameters for creating basic scatterplots. Let’s take a look at each of them.

data (required)

The data parameter enables you to specify the Pandas DataFrame that contains the variables that you want to plot.

The data in your DataFrame should be in so-called “tidy” form. Tidy data is data organized such that each variable is in its own column and every record has its own row.

x

The x parameter enables you to specify the variable that will be mapped to the x axis.

This variable must be numeric.

You can specify a variable that is in the DataFrame being passed to the function via the data parameter. You can also specify a variable directly (independent of data).

y

The y parameter enables you to specify the variable that will be mapped to the y axis.

This variable must be numeric.

Similar to the x parameter, for the y parameter you can specify a variable that is in the DataFrame being passed to the function via the data parameter. You can also specify a variable directly (independent of data).

color

The color parameter specifies the color of the interior of the points.

By default, the color is a sort of medium blue, but you can change it to a wide variety of colors.

Keep in mind that this is not one of the parameters listed in the official Seaborn scatterplot documentation. That’s because this is a parameter from the Pyplot scatter function.

edgecolor

The edgecolor parameter enables you to specify the color of the edges of the points.

Like the color parameter, you won’t find the edgecolor parameter in the documentation for the Seaborn scatter plot. Again, that’s because this is a plt.scatter parameter that can be used within the Seaborn scatter plot function.

alpha

The alpha parameter enables you to modify the opacity of the points … how opaque they are.

The alpha opacity scale is from 0 to 1, with 1 being fully opaque and 0 being fully transparent. 1 is the default.

hue

The hue parameter will enable you to change the color of the points according to some variable.

Unlike the x and y parameters – which require a numeric – variable, the hue parameter is more flexible.

You can provide a categorical variable to the hue parameter, in which case, it will change the color of the points according to the different categories.

You can also pass a numeric variable to the hue parameter, in which case, it will vary the color of the points along a gradient according to numeric values of the hue variable.

Examples: how to make a a scatterplot in Seaborn

Ok. Now that you’ve learned about the syntax and parameters of sns.scatterplot function, let’s take a look at some examples of how to create a scatter plot with Seaborn.

Examples:

Run this code first

Before you run any of the examples, you’ll need to run some preliminary code.

You’ll need to import the right packages and create the DataFrame that we’ll be plotting.

Import packages

First, we’ll import a few packages.

We’ll need Seaborn, obviously.

We’ll also need Pandas and Numpy to help create our DataFrame.

That being the case, run this code:

import pandas as pd
import seaborn as sns
import numpy as np
Create data

Now that we have our packages imported, we need to create a DataFrame.

This will be a fairly simple DataFrame with two normally distributed numeric variables and one categorical variable. For clarity and simplicity, I’ll call these variables x_var, y_var, and categorical_var.

np.random.seed(0)
x_var = np.random.normal(size = 6000)
y_var = np.random.normal(size = 6000)

norm_data = pd.DataFrame({'x_var':x_var
                  ,'y_var':y_var}
                  )

norm_data = norm_data.assign(category_var = np.where(x_var > 1, "Category A","Category B"))

To create x_var and y_var, we’ve used the Numpy random normal function. Notice that we also used Numpy random seed to set the seed for the random number generator. (If you don’t understand those functions yet, click on the links and read the tutorials.)

We used the pd.DataFrame function to create the DataFrame.

And finally, we added the category_var variable by using the Numpy where function inside of the Pandas assign method. The Pandas assign method enables us to add variables to a Pandas DataFrame.

Let’s quickly print out the first few rows of data and take a look:

norm_data.head()

OUT:

      x_var     y_var category_var
0  1.764052  2.042536   Category A
1  0.400157 -0.919461   Category B
2  0.978738  0.114670   Category B
3  2.240893 -0.137424   Category A
4  1.867558  1.365527   Category A

The DataFrame has 6000 rows of data, but the first 5 rows gives you a good idea of what the data looks like.

Set Seaborn formatting

Finally, one last quick thing.

You need to run the following formatting code:

sns.set()

This code will set the back ground formatting for our plots to make them look a little better compared to the matplotlib defaults.

Ok … now we’re ready for some examples.

EXAMPLE 1: Create a simple scatter plot

First, let’s just create a simple scatterplot.

To do this, we’ll call the sns.scatterplot() function.

Inside of the parenthesis, we’re providing arguments to three parameters: data, x, and y.

To the data parameter, we’re passing the name of the DataFrame, norm_data.

Then we’re passing the names of the variables x_var and y_var to the x parameter and y parameter respectively. This just tells sns.scatterplot to put x_var on the x-axis and y_var on the y-axis.

Let’s run the code and take a look at the plot:

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                )

OUTPUT:

An image of a basic Seaborn scatter plot.

This is pretty simple.

The sns.scatterplot function put x_var on the x-axis and y_var on the y-axis.

The function drew a single point for every row of data at the locations specified by x_var and y_var.

That said, there are some problems with this chart.

There is a serious problem with overplotting; there are too many points overlapping each other.

Also, the white edges around the points are distracting.

We’ll fix some of those issues in the upcoming examples.

EXAMPLE 2: Change the edge color of the points

Here, we’ll change the edge color of the points.

As you saw in the last example, by default, the points have white edges. To be blunt, this doesn’t look very good. And, it causes visual issues when you have a lot of points.

To fix this, we’re going to remove the edges. To do this, we’ll set the edgecolor parameter to edgecolor = 'none'.

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,edgecolor = 'none'
                )

OUT:

A seaborn scatterplot with the edge color changed (i.e., the edge color removed).

As you can see, by setting edgecolor = 'none', we’ve removed the edges of the points.

You could also set the edge color to a specific color (e.g., ‘red‘, ‘green‘, etc). Having said that, I actually recommend removing the edges. For the most part, giving your scatterplot points a different edge color than the interior is bad design. There are a few cases where it’s appropriate, but those are really more intermediate uses of the technique.

Again, typically, I recommend that you remove the edges.

(Note that the above scatterplot has a severe problem with overplotting. I’ll show you how to fix that in example 4.)

EXAMPLE 3: Change the interior color of all of the points

Next, let’s change the interior color of the points.

To do this, we’ll use the color parameter. Here, we’re going to set the color to the value 'darkred'. But you can use any “named color” that Python recognizes. You can also use hexidecimal colors, if you know how to use them.

Let’s take a look:

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,color = 'darkred'
                )

OUT:

A seaborn scatter plot where the color of the points is changed to 'darkred'.

Here, we’ve changed the color of the points to the color "darkred".

Again, you can use any of the colors recognized by Python, as well as hexidecimal colors.

EXAMPLE 4: Make the points more transparent to mitigate overplotting

As you probably noticed, the scatter plots in the previous examples had serious problems with overplotting. There are almost too many points in the dataset, and when you plot them, they overlap.

This is problematic, because it can obscure or cover up important visual information that we might want to see.

One way to fix overplotting is by making the points more transparent.

In the Seaborn scatter plot function, you can modify the transparency/opacity of the points with the alpha parameter.

Here, we’re going to set alpha = .1.

Remember, the scale for alpha is between 0 and 1, with 1 being fully opaque and 0 being fully transparent.

By setting alpha = .1, we’re making the points only 10% of full opacity. At this setting, they’re almost transparent!

Let’s take a look:

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,edgecolor = 'none'
                ,alpha = .1
                )

OUT:

A seaborn scatter plot where the points are more transparent to mitigate overplotting.

As you can see, the points are quite a bit more transparent.

This actually helps us see more of the structure of the data. The points are normally distributed. When we make the points more transparent like this, you can actually see how the points cluster near the center (i.e., the mean) of the distribution.

Modifying the alpha parameter is a very useful technique when you have a lot of data and have issues with overplotting.

EXAMPLE 5: Change the color according to a categorical variable

In this example, we’ll go back to modifying the color of the points.

Here, we’re going to modify the color of the points according to a categorical variable.

To do this, we’ll map our categorical variable, category_var, to the hue parameter.

This will change the color of the points to different colors depending on the different values of category_var.

Let’s take a look:

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,hue = 'category_var'
                )

OUT:

A Seaborn scatter plot with a categorical variable mapped to the hue parameter.

As you can see, the points that are in ‘Category A‘ are blue, and the points that are in ‘Category B‘ are orange.

Using the hue parameter is very useful when you have an additional categorical variable that you want to use to further analyze your scatterplot data.

EXAMPLE 6: A few examples that combine multiple techniques

Finally, here are some examples that combine several of the techniques from the previous examples.

Notice that we’re modifying the color, edgecolor, and opacity (i.e., alpha) all at the same time.

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,alpha = .1
                ,edgecolor = 'none'
                ,color = 'darkred'
                )

OUT:

An example that combines several techniques for a Seaborn scatterplot.

As you can see, we’ve changed the color to darkred, removed the edge color, and made the points more transparent. This chart is a lot better looking than the simple example from example 1 and also shows more of the structure in the data.

Let’s take a look at another:

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,hue = 'category_var'
                ,edgecolor = 'none'
                ,alpha = .2
                )

OUT:

An example of a Seaborn scatterplot.

This final example shows the normal distribution of the data (which is easier to see because we made the points more transparent). You can also see the differences between different categories.

Of course, we’ve shown these examples using “dummy” data, but you can use these techniques to great effect when you analyze your own real-world data.

EXAMPLE 7: Add a title to your scatterplot

Let me show you one more thing.

Here, I’ll show you how to add a title to your plot.

To add a title to your plot, you can use the plt.title function from Matplotlib’s Pyplot submodule.

(Note that if you’re working in an IDE, you need to highlight and run sns.scatterplot and plt.title at the same time.)

import matplotlib.pyplot as plt

sns.scatterplot(data = norm_data
                ,x = 'x_var'
                ,y = 'y_var'
                ,alpha = .1
                ,edgecolor = 'none'
                ,color = 'darkred'
                )
plt.title('A scatterplot of x_var vs. y_var')

OUT:

A seaborn scatter plot with a title.

This final scatter plot has a title.

There’s definitely more that we could do here. There’s quite a bit more formatting and some other modifications that we could make, but this is pretty good for a straightforward scatter plot.

Leave your other questions in the comments below

Do you have questions about creating scatter plots with Seaborn?

Is there something that we didn’t cover here that you need to understand?

Write your question in the comments section at the bottom of the page.

Join our course to learn more about Seaborn

The examples you’ve seen in this tutorial should be enough to get you started, but if you’re serious about learning Seaborn, you should enroll in our premium course called Seaborn Mastery.

There’s a lot more to learn about Seaborn, and Seaborn Mastery will teach you everything, including:

  • How to create essential data visualizations in Python
  • How to add titles and axis labels
  • Techniques for formatting your charts
  • How to create multi-variate visualizations
  • How to think about data visualization in Python
  • and more …

Moreover, it will help you completely master the syntax within a few weeks. You’ll discover how to become “fluent” in writing Seaborn code.

Find out more here:

Learn More About Seaborn Mastery

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight.   Prior to founding the company, Josh worked as a Data Scientist at Apple.   He has a degree in Physics from Cornell University.   For more daily data science advice, follow Josh on LinkedIn.

Leave a Comment