{"id":3921,"date":"2019-03-11T14:00:50","date_gmt":"2019-03-11T19:00:50","guid":{"rendered":"https:\/\/www.sharpsightlabs.com\/?p=3921"},"modified":"2024-02-06T15:16:35","modified_gmt":"2024-02-06T21:16:35","slug":"matplotlib-scatter-plot","status":"publish","type":"post","link":"https:\/\/www.sharpsightlabs.com\/blog\/matplotlib-scatter-plot\/","title":{"rendered":"How to make a matplotlib scatter plot"},"content":{"rendered":"
In this tutorial, I’ll show you how to make a matplotlib scatter plot.<\/p>\n
The scatter plot is a relatively simple tool, but it’s also essential for doing data analysis and data science.<\/p>\n
So if you want to do data science in Python, you really need to know how to create a scatter plot in matplotlib. You should know how to do this with your eyes closed.<\/p>\n
This tutorial will show you how to make a matplotlib scatter plot, and it will show you how to modify your scatter plots too.<\/p>\n
The tutorial is designed to be read top to bottom, particularly if you’re new to Python and want the details of how to make a scatter plot.<\/p>\n
Having said that, if you just need quick help with something, you can click on one of the following links. These links will bring you to the appropriate section in the tutorial.<\/p>\n
Again though, if you’re a relative beginner and you have the time, I recommend that you read the full tutorial. Everything will make more sense that way.<\/p>\n
Ok. Before I show you how to make a scatter plot with matplotlib, let me quickly explain what matplotlib is.<\/p>\n
<\/a><\/p>\n Matplotlib is a data visualization module for the Python<\/a> programming language. It provides Python users with a toolkit for creating data visualizations.<\/p>\n Some of those data visualizations can be extremely complex. You can use matplotlib to create complex visualizations, because the syntax is very detailed. This makes the syntax very adaptable for different visualization problems.<\/p>\n On the other hand, the complex syntax of matplotlib can make it more complicated to quickly create simple data visualizations.<\/p>\n This is where pyplot comes in.<\/p>\n When you start working with matplotlib, you might read about pyplot.<\/p>\n What is pyplot?<\/p>\n To put it simply, pyplot is part of matplotlib. Pyplot is a sub-module of the larger matplotlib module.<\/p>\n Specifically, pyplot provides a set of functions for creating simple visualizations. For example, pyplot has simple functions for creating simple plots like histograms<\/a>, bar charts, and scatter plots.<\/p>\n Ultimately, the tools from pyplot give you a simpler interface into matplotlib. It makes visualization easier for some relatively standard plot types.<\/p>\n As I mentioned, one of those plots that you can create with pyplot is the scatter plot.<\/p>\n Let’s take a look at the syntax.<\/p>\n <\/a><\/p>\n Creating a scatter plot with matplotlib is relatively easy.<\/p>\n To do this, we’re going to use the pyplot function For the most part, the syntax is relatively easy to understand. Let’s take a look.<\/p>\n <\/p>\n First of all, notice the name of the function. Here, we’re calling the function as To create a scatter plot with matplotlib though, you obviously can’t just call the function. 
You need to use the parameters of the function to tell it exactly what to plot, and how to plot it.<\/p>\n With that in mind, let’s take a look at the parameters of the plt.scatter function.<\/p>\n Pyplot’s plt.scatter function has a variety of parameters that you can manipulate … nearly a dozen.<\/p>\n The large number of parameters can make using the function a little complicated though. <\/p>\n So in the interest of simplicity, we’re only going to discuss five of them: Let’s talk about each of them.<\/p>\n The Essentially, they are the x and y axis positions of the points you want to plot. <\/p>\n The data that you pass to each of these should be in an “array like” format. In Python, structures with “array like” formats include things like lists, tuples, and NumPy arrays. <\/p>\n Commonly, you’ll find that people pass data to these parameters in the form of a Python list. For example, you might set In this tutorial though, we’ll work with NumPy arrays<\/a>. You’ll see this later in the examples section, but essentially, we’ll pass values to the The There are several ways to manipulate this parameter.<\/p>\n First, you can set the You can also set the It’s also possible to create a color mapping for your points, such that the color of the points varies according to some variable. Unfortunately, this is somewhat complicated for a beginner. So in the interest of simplicity, I won’t explain it here. 
If you’re really interested in complex visualization with more visually appealing colors, I strongly recommend using R’s ggplot2 system instead<\/a>.<\/p>\n The The default value is controlled by the We’re not going to work extensively with the s parameter, but I’ll show you a simple example of how it works in the examples below.<\/p>\n Finally, the This must be a value between 0 and 1 (inclusive), where 1 is fully opaque and 0 is fully transparent.<\/p>\n <\/a><\/p>\n Now that you understand the syntax and the parameters of the plt.scatter function, let’s work through some examples.<\/p>\n One last thing though before you try to run the examples.<\/p>\n … you’ll need to run some code to get these examples to work properly.<\/p>\n First, you’ll need to import a few modules into your working environment. The following code will import matplotlib, numpy and pyplot.<\/p>\n Also, you need to create some data.<\/p>\n We’re essentially going to create two vectors of data. <\/p>\n We’ll create the first, The second variable, You’ll see what the data looks like in a minute. The whole point of this tutorial is that we’re going to plot it! But essentially, when we plot them together, they will look like highly correlated linear data.<\/p>\n Ok, now that we have our data, let’s plot it.<\/p>\n We’ll start off by making a very simple scatter plot.<\/p>\n To do this, we’re going to call And here is the output:<\/p>\n <\/p>\n Let me explain a few things about the code and the output.<\/p>\n First, notice the code. We mapped You can see that this directly translates into how the points are plotted. For any given point in the scatter plot, the x axis value comes from the I also want to note that you don’t need to explicitly type the parameters This code works the same as At this point, I need to point out that a default matplotlib scatter plot is a little plain looking. It’s a little unrefined. 
<\/p>\n An unrefined chart is fine if you’re doing exploratory data analysis for personal consumption. But if you need to create a chart and show it to anyone important \u2013 like a management team in a business \u2013 this chart is unrefined. It lacks polish. <\/p>\n Like it or not, that lack of polish will reflect a little poorly on you. You can deny it all you want, but it can be very useful to learn how to polish your charts and make them look more professional. I’ll show you how in an example further down in this tutorial.<\/p>\n Although there is a lot we would need to do to make the basic scatter plot look better, changing the color of the points is a simple way to improve the aesthetics of the chart.<\/p>\n Let me show you how.<\/p>\n As noted earlier in this tutorial, you can modify the color of the points by manipulating the There are actually several different ways to modify the The two primary ways to do this are to set the parameter to a “named color” or to set the parameter to a “hex color.”<\/p>\n Here in this tutorial, I’ll show you how to set the color of the points to a “named color.” Hex colors are a little more complicated, so I’m not going to explain them here.<\/p>\n We can change the color of the points in our scatter plot by setting the What are named colors? This is very simple. Named colors are colors like “red,” “green,” and “blue.” Python has a pretty long list of named colors<\/a>. I recommend that you become familiar with a few of them, so you have a few that you can use regularly in your plots.<\/p>\n When you know what color you want to use for your points, provide that color as the argument to the A quick introduction to matplotlib<\/h2>\n
What is pyplot<\/h3>\n
The syntax of the matplotlib scatter plot<\/h2>\n
plt.scatter()<\/code>. <\/p>\n
plt.scatter()<\/code>. Keep in mind that we’re using the syntax
plt<\/code> to refer to pyplot. Essentially, this code assumes that you’ve imported pyplot with the code
import matplotlib.pyplot as plt<\/code>. For more information on that, see the examples below.<\/p>\n
The parameters of plt.scatter<\/h3>\n
x<\/code> and
y<\/code>,
c<\/code>,
s<\/code>, and
alpha<\/code>.<\/p>\n
x and y<\/h5>\n
x<\/code> and
y<\/code> parameters of plt.scatter are very similar, so we’ll talk about them together.<\/p>\n
x = [1,2,3,4,5]<\/code>.<\/p>\n
x<\/code> and
y<\/code> parameters in the form of two NumPy arrays.<\/p>\n
c<\/h5>\n
c<\/code> parameter controls the color of the points.<\/p>\n
c<\/code> parameter to a “named color.” Named colors are colors like “red,” “green,” “blue,” and so on. Matplotlib supports a large number of named colors, so if you want something specific, take a look at the options<\/a> and use one in your plot.<\/p>\n
c<\/code> parameter using a hexadecimal color. For example, you can set
c = \"#CC0000\"<\/code> to set the color of the points to a sort of “fire engine red” color. Using hex colors is great, because they can give you very fine-grained control over the colors in your visualization. On the other hand, hexadecimal colors can be a little bit complicated for beginners. That being the case, we’re not going to really cover hex colors in this tutorial. <\/p>\n
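To make the two ways of setting the color concrete, here is a minimal sketch. It assumes the same kind of noisy linear data used in the examples of this tutorial; the names named_points and hex_points, and the use of the non-interactive "Agg" backend, are my own choices for illustration. The to_hex helper from matplotlib.colors is there only to check which color was actually applied.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np

# Noisy linear data, built the same way as in the examples section
np.random.seed(42)
x_var = np.arange(0, 50)
y_var = x_var + np.random.normal(size=50, loc=0, scale=10)

# 1) Set the color with a named color
named_points = plt.scatter(x_var, y_var, c="red")

# 2) Set the color with a hexadecimal color (shifted up so both sets stay visible)
hex_points = plt.scatter(x_var, y_var + 40, c="#CC0000")

# Read back the face color matplotlib stored for each set of points
named_as_hex = to_hex(named_points.get_facecolor()[0])
hex_as_hex = to_hex(hex_points.get_facecolor()[0])
print(named_as_hex, hex_as_hex)
```

Both calls take the color as a plain string, so switching from a named color to a hex color is only a matter of changing the value you pass to c.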
s<\/h5>\n
s<\/code> parameter controls the size of the points.<\/p>\n
lines.markersize<\/code> value in the
rcParams<\/code> settings (the default is that value squared).<\/p>\n
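As a rough illustration of the s parameter and its default, the sketch below compares a default scatter plot with one that uses larger points. The data and the names default_points and large_points are assumptions made for this example, and s = 120 is an arbitrary value. In current matplotlib versions, the default point size is the lines.markersize setting squared.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Noisy linear data, built the same way as in the examples section
np.random.seed(42)
x_var = np.arange(0, 50)
y_var = x_var + np.random.normal(size=50, loc=0, scale=10)

# The default point size comes from rcParams: lines.markersize, squared
default_size = plt.rcParams["lines.markersize"] ** 2

# A scatter plot with the default size ...
default_points = plt.scatter(x_var, y_var)

# ... and one with larger points, set explicitly with s
large_points = plt.scatter(x_var, y_var, s=120)

print(default_size, default_points.get_sizes(), large_points.get_sizes())
```

Keep in mind that s is measured in points squared, so doubling s does not double the width of each marker.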
alpha<\/h5>\n
alpha<\/code> parameter controls the opacity of the points. <\/p>\n
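Here is a small sketch of alpha in use. The data and the name faded_points are assumptions for illustration, and 0.3 is just an arbitrary value between 0 and 1.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Noisy linear data, built the same way as in the examples section
np.random.seed(42)
x_var = np.arange(0, 50)
y_var = x_var + np.random.normal(size=50, loc=0, scale=10)

# Make the points mostly transparent
faded_points = plt.scatter(x_var, y_var, alpha=0.3)

# The collection returned by plt.scatter remembers the alpha it was given
print(faded_points.get_alpha())
```

Lower values are handy when many points overlap, because the denser regions of the plot show up darker.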
Examples: how to make a scatter plot in matplotlib<\/h2>\n
Run this code before you get started<\/h5>\n
Import modules<\/h6>\n
\r\nimport matplotlib\r\nimport numpy as np\r\nimport matplotlib.pyplot as plt\r\n<\/pre>\n
Create dataset<\/h6>\n
x_var<\/code>, by using the np.arange function<\/a>. This data,
x_var<\/code>, essentially contains the integer values from 0 to 49.<\/p>\n
y_var<\/code>, is the same value of x_var with a little random noise added in with the np.random.normal function<\/a>. <\/p>\n
\r\n# CREATE DATA\r\nnp.random.seed(42)\r\n\r\nx_var = np.arange(0, 50)\r\ny_var = x_var + np.random.normal(size = 50, loc = 0, scale = 10)\r\n<\/pre>\n
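If you want to check the data before plotting it, a quick inspection like the following sketch can help. It rebuilds the same two variables and then looks at their shapes and at how strongly they are correlated. The name correlation is my own, and np.corrcoef is just one of several ways to measure a linear relationship.

```python
import numpy as np

# Rebuild the dataset exactly as above
np.random.seed(42)
x_var = np.arange(0, 50)
y_var = x_var + np.random.normal(size=50, loc=0, scale=10)

# Both variables should hold 50 values
print(x_var.shape, y_var.shape)

# Peek at the first few values of each variable
print(x_var[:5])
print(y_var[:5])

# Pearson correlation between the two variables
correlation = np.corrcoef(x_var, y_var)[0, 1]
print(correlation)
```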
How to make a simple scatter plot with matplotlib<\/h3>\n
plt.scatter()<\/code> and set
x = x_var<\/code> and
y = y_var<\/code>. <\/p>\n
\r\n# PLOT A SIMPLE SCATTERPLOT\r\nplt.scatter(x = x_var, y = y_var)\r\n<\/pre>\n
x_var<\/code> to the x axis and we mapped
y_var<\/code> to the y axis. <\/p>\n
x_var<\/code> variable, and the y axis value comes from the
y_var<\/code> variable. Said differently, the locations of the points are contained in the variables
x_var<\/code> and
y_var<\/code>. <\/p>\n
x<\/code> and
y<\/code>. For example, you could remove
x =<\/code> and
y =<\/code> from the code, and it would still work. Like this:<\/p>\n
\r\nplt.scatter(x_var, y_var)\r\n<\/pre>\n
plt.scatter(x = x_var, y = y_var)<\/code>. They are operationally identical. If you remove
x =<\/code> and
y =<\/code> from the code, Python still knows that you are passing
x_var<\/code> and
y_var<\/code> to the
x<\/code> and
y<\/code> parameters. It essentially knows that the first variable should be mapped to
x<\/code> and the second should be mapped to
y<\/code>. This is known as defining argument values by position<\/em>. It’s very common to see that in code, so I want you to understand it.<\/p>\n
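One way to convince yourself that the two calling styles are equivalent is to compare the point locations they produce. This is only a sketch: the names by_keyword, by_position and same_points are assumptions for illustration, and get_offsets is used here simply to read back the x and y positions that matplotlib stored.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Noisy linear data, built the same way as in the examples section
np.random.seed(42)
x_var = np.arange(0, 50)
y_var = x_var + np.random.normal(size=50, loc=0, scale=10)

# Pass the data by keyword ...
by_keyword = plt.scatter(x=x_var, y=y_var)

# ... and by position
by_position = plt.scatter(x_var, y_var)

# Compare the stored (x, y) locations of the two sets of points
same_points = np.array_equal(
    np.asarray(by_keyword.get_offsets()),
    np.asarray(by_position.get_offsets()),
)
print(same_points)
```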
A basic matplotlib scatter plot is a little “ugly”<\/h5>\n
Change the color of the points<\/h3>\n
c<\/code> parameter.<\/p>\n
Multiple ways to set the color of points<\/h5>\n
c<\/code> parameter to change the color of the points. <\/p>\n
c<\/code> parameter to a “named color.”<\/p>\n
c<\/code> parameter.<\/p>\n