This tutorial will cover the NumPy random normal function (AKA, np.random.normal).
If you’re doing any sort of statistics or data science in Python, you’ll often need to work with random numbers. And in particular, you’ll often need to work with normally distributed numbers.
The NumPy random normal function generates a sample of numbers drawn from the normal distribution, otherwise called the Gaussian distribution.
This tutorial will show you how the function works and how to use it.
If you’re a little unfamiliar with NumPy, I suggest that you read the whole tutorial. However, if you just need some help with something specific, you can skip ahead to the appropriate section.
A quick introduction to NumPy
If you’re a real beginner with NumPy, you might not be entirely familiar with what it does.
With that in mind, let’s briefly review what NumPy is.
NumPy is a module for the Python programming language that’s used for data science and scientific computing.
Specifically, NumPy performs data manipulation on numerical data. It enables you to collect numeric data into a data structure, called the NumPy array. It also enables you to perform various computations and manipulations on NumPy arrays.
Essentially, NumPy is a package for working with numeric data in Python.
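As a small illustration of those two ideas, here is a minimal sketch that collects a few numbers into a NumPy array and then computes on it. The variable names and the values are made up for this example.

```python
import numpy as np

# Collect numeric data into a NumPy array (the values are arbitrary).
heights = np.array([1.5, 1.6, 1.7, 1.8])

# Perform computations on the whole array at once.
heights_cm = heights * 100   # element-wise multiplication
average = heights.mean()     # mean of all elements

print(heights_cm)
print(average)
```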
NumPy random normal generates normally distributed numbers
So NumPy is a package for working with numerical data. Where does np.random.normal fit in?
As I mentioned previously, NumPy has a variety of tools for working with numerical data. In most cases, NumPy’s tools enable you to do one of two things: create numerical data (structured as a NumPy array), or perform some calculation on a NumPy array.
The NumPy random normal function enables you to create a NumPy array that contains normally distributed data.
Hopefully you’re familiar with normally distributed data, but just as a refresher, here’s what it looks like when we plot it in a histogram:
Normally distributed data is shaped sort of like a bell, so it’s often called the “bell curve.”
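If you’d like to see that bell shape without a plotting library, here is a rough sketch that counts values into bins with np.histogram and prints a sideways text histogram. The seed, the number of bins, and the bin range are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the sketch is repeatable
values = np.random.normal(size = 10000)

# Count how many values fall into each of 8 equal-width bins on [-4, 4].
counts, bin_edges = np.histogram(values, bins = 8, range = (-4, 4))

# Print a rough sideways histogram: one '#' per 100 values in the bin.
for left_edge, count in zip(bin_edges[:-1], counts):
    print(f"{left_edge:5.1f} | {'#' * (int(count) // 100)}")
```

The bins near 0 should hold far more values than the bins out in the tails, which is the bell shape turned on its side.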
Now that I’ve explained what the np.random.normal function does at a high level, let’s take a look at the syntax.
The syntax of numpy random normal
The syntax of the NumPy random normal function is fairly straightforward.
Note that in the following illustration and throughout this blog post, we will assume that you’ve imported NumPy with the following code: import numpy as np. That code will enable you to refer to NumPy as np.
Let’s take a quick look at the syntax.
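Written out as code, a call looks roughly like the sketch below. Here I’ve spelled out every parameter at its default value, which you normally wouldn’t need to do.

```python
import numpy as np

# Call with every parameter written out at its default value.
# With size = None, the function returns a single number instead of an array.
value = np.random.normal(loc = 0.0, scale = 1.0, size = None)
print(value)
```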
Let me explain this. Typically, we will call the function with the name np.random.normal(). As I mentioned earlier, this assumes that we’ve imported NumPy with the code import numpy as np.
Inside of the function, you’ll notice 3 parameters: loc, scale, and size.
Let’s talk about each of those parameters.
The parameters of the np.random.normal function
The np.random.normal function has three primary parameters that control the output: loc, scale, and size.
I’ll explain each of those parameters separately.
The loc parameter controls the mean of the distribution.
This parameter defaults to 0, so if you don’t use this parameter to specify the mean of the distribution, the mean will be at 0.
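To make that concrete, here is a quick sketch that compares the default loc with loc = 10. The seed, the sample size, and the value 10 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

centered_at_0 = np.random.normal(size = 10000)              # loc defaults to 0
centered_at_10 = np.random.normal(loc = 10, size = 10000)   # mean shifted to 10

print(centered_at_0.mean())    # close to 0
print(centered_at_10.mean())   # close to 10
```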
The scale parameter controls the standard deviation of the normal distribution.
By default, the scale parameter is set to 1.
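Here is a similar sketch for scale, comparing the default with scale = 0.5. Again, the seed, the sample size, and the value 0.5 are just assumptions for the example.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatability

default_spread = np.random.normal(size = 10000)               # scale defaults to 1
narrow_spread = np.random.normal(scale = 0.5, size = 10000)   # tighter around the mean

print(default_spread.std())   # close to 1
print(narrow_spread.std())    # close to 0.5
```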
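The size parameter controls the size and shape of the output.
Remember that the output will be a NumPy array. NumPy arrays can be 1-dimensional, 2-dimensional, or higher-dimensional (i.e., 3 or more dimensions). (If you leave size unset, the function returns a single number rather than an array.)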
This might be confusing if you’re not really familiar with NumPy arrays. To learn more about NumPy array structure, I recommend that you read our tutorial on NumPy arrays.
Having said that, here’s a quick explanation.
The argument that you provide to the size parameter will dictate the size and shape of the output array.
If you provide a single integer, x, np.random.normal will provide x random normal values in a 1-dimensional NumPy array.
You can also specify a more complex output.
For example, if you specify size = (2, 3), np.random.normal will produce a NumPy array with 2 rows and 3 columns. It will be filled with numbers drawn from a normal distribution.
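The two cases above can be sketched like this. I’m checking the shape attribute of each output rather than the values, since the values are random.

```python
import numpy as np

# A single integer gives a 1-dimensional array with that many values.
one_dim = np.random.normal(size = 4)

# A tuple gives an array with that shape: here, 2 rows and 3 columns.
two_dim = np.random.normal(size = (2, 3))

print(one_dim.shape)   # (4,)
print(two_dim.shape)   # (2, 3)
```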
Keep in mind that you can create output arrays with more than 2 dimensions, but in the interest of simplicity, I will leave that to another tutorial.
The np.random.randn function
There’s another function that’s similar to np.random.normal. It’s called np.random.randn.
Just like np.random.normal, the np.random.randn function produces numbers that are drawn from a normal distribution.
The major difference is that np.random.randn is like a special case of np.random.normal. np.random.randn operates like np.random.normal with loc = 0 and scale = 1.
So this code:
np.random.seed(1)
np.random.normal(loc = 0, scale = 1, size = (3,3))
Operates effectively the same as this code:
np.random.seed(1)
np.random.randn(3, 3)
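If you want to check that for yourself, here is a small sketch that runs both calls with the same seed and compares the results. The variable names are mine, and I’m using np.allclose for the comparison to allow for floating point rounding.

```python
import numpy as np

# Draw a 3x3 array with np.random.normal, using the default mean and spread.
np.random.seed(1)
from_normal = np.random.normal(loc = 0, scale = 1, size = (3, 3))

# Reset the seed and draw a 3x3 array with np.random.randn.
np.random.seed(1)
from_randn = np.random.randn(3, 3)

# Compare the two arrays element by element.
print(np.allclose(from_normal, from_randn))
```

Notice that np.random.randn takes the dimensions as separate arguments, while np.random.normal takes them as a tuple passed to size.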
Examples: how to use the numpy random normal function
Now that I’ve shown you the syntax of the numpy random normal function, let’s take a look at some examples of how it works.
Run this code before you run the examples
Before you work with any of the following examples, make sure that you run the following code:
import numpy as np
I briefly explained this code at the beginning of the tutorial, but it’s important for the following examples, so I’ll explain it again.
import numpy as np essentially imports the NumPy module into your working environment and enables you to call the functions from NumPy. If you don’t use the import statement to import NumPy, NumPy’s functions will be unavailable.
Moreover, by importing NumPy as np, we’re giving the NumPy module a “nickname” of sorts. So we’ll be able to refer to NumPy as np when we call the NumPy functions.
You probably understand this if you’ve worked with Python modules before, but if you’re really a beginner, it might be a little confusing. So, I wanted to quickly explain it.
Ok, now let’s work with some examples.
Draw a single number from the normal distribution
First, let’s take a look at a very simple example.
Here, we’re going to use np.random.normal to generate a single observation from the normal distribution.
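A minimal sketch of this, assuming we ask for one value by setting size = 1 and leave the other parameters alone:

```python
import numpy as np

# size = 1 asks for exactly one draw, returned in a 1-element NumPy array.
single_value = np.random.normal(size = 1)
print(single_value)
```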
This code will generate a single number drawn from the normal distribution with a mean of 0 and a standard deviation of 1.
Essentially, this code works the same as np.random.normal(size = 1, loc = 0, scale = 1). Remember, if we don’t specify values for the loc and scale parameters, they will default to loc = 0 and scale = 1.
Draw 5 numbers from the normal distribution
Now, let’s draw 5 numbers from the normal distribution.
This code will look almost exactly the same as the code in the previous example.
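A minimal sketch, assuming we pass the value by keyword as size = 5. (A bare positional argument would be taken as loc, which is the first parameter, so naming size avoids that mistake.)

```python
import numpy as np

# size = 5 asks for five draws, returned in a 1-dimensional NumPy array.
five_values = np.random.normal(size = 5)
print(five_values)
```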
Here, the value 5 is the value that’s being passed to the size parameter. It essentially indicates that we want to produce a NumPy array of 5 values, drawn from the normal distribution.
Note as well that because we have not explicitly specified values for loc and scale, they will default to loc = 0 and scale = 1.
Create a 2-dimensional NumPy array of normally distributed values
Now, we’ll create a 2-dimensional array of normally distributed values.
To do this, we need to provide a tuple of values to the size parameter.
np.random.seed(1)
np.random.normal(size = (2, 3))
Which produces the output:
array([[ 1.62434536, -0.61175641, -0.52817175],
       [-1.07296862,  0.86540763, -2.3015387 ]])
So we’ve used the size parameter with the syntax size = (2, 3). This has generated a 2-dimensional NumPy array with 6 values. This output array has 2 rows and 3 columns.
To be clear, you can use the size parameter to create arrays with even higher dimensional shapes.
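For instance, here is a quick sketch with a 3-element tuple. The shape (2, 3, 4) is an arbitrary choice for illustration.

```python
import numpy as np

# A 3-element tuple produces a 3-dimensional array.
three_dim = np.random.normal(size = (2, 3, 4))

print(three_dim.ndim)    # 3
print(three_dim.shape)   # (2, 3, 4)
```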
Generate normally distributed values with a specific mean
Now, let’s generate normally distributed values with a specific mean.
To do this, we’ll use the loc parameter. Recall from earlier in the tutorial that the loc parameter controls the mean of the distribution from which we draw the numbers with np.random.normal.
Here, we’re going to set the mean of the data to 50 with the syntax loc = 50.
np.random.seed(42)
np.random.normal(size = 1000, loc = 50)
The full array of values is too large to show here, but here are the first several values of the output:
array([ 50.49671415, 49.8617357 , 50.64768854, 51.52302986, 49.76584663, 49.76586304, 51.57921282, 50.76743473, 49.53052561, 50.54256004, 49.53658231, 49.53427025 ...
You can see at a glance that these values are roughly centered around 50. If you were to calculate the average using the numpy mean function, you would see that the mean of the observations is very close to 50 (though not exactly 50, because this is a random sample).
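Here is a short sketch of that check, using np.mean on the same 1000 values. The variable name is mine.

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(size = 1000, loc = 50)

# The sample mean should land near the loc value of 50.
sample_mean = np.mean(values)
print(sample_mean)
```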
Generate normally distributed values with a specific standard deviation
Next, we’ll generate an array of values with a specific standard deviation.
As noted earlier in the blog post, we can modify the standard deviation by using the scale parameter.
In this example, we’ll generate 1000 values with a standard deviation of 100.
np.random.seed(42)
np.random.normal(size = 1000, scale = 100)
And here is a truncated output that shows the first few values:
array([ 4.96714153e+01, -1.38264301e+01, 6.47688538e+01, 1.52302986e+02, -2.34153375e+01, -2.34136957e+01, 1.57921282e+02, 7.67434729e+01, -4.69474386e+01 ...
Notice that we set size = 1000, so the code will generate 1000 values. I’ve only shown the first few values for the sake of brevity.
It’s a little difficult to see how the data are distributed here, but we can use the std() method to calculate the standard deviation:
np.random.seed(42)
np.random.normal(size = 1000, scale = 100).std()
The result is close to 100. It won’t be exactly 100, because it’s computed from a finite random sample.
Notice that in this example, we have not used the loc parameter. Remember that by default, the loc parameter is set to loc = 0, so by default, this data is centered around 0. We could modify the loc parameter here as well, but for the sake of simplicity, I’ve left it at the default.
How to use the loc and scale parameter in np.random.normal
Let’s do one more example to put all of the pieces together.
Here, we’ll create an array of values with a mean of 50 and a standard deviation of 100.
np.random.seed(42)
np.random.normal(size = 1000, loc = 50, scale = 100)
I won’t show the output of this operation. I’ll leave it for you to run yourself.
Let’s quickly discuss the code. If you’ve read the previous examples in this tutorial, you should understand this.
We’re defining the mean of the data with the loc parameter. The mean of the data is set to 50 with loc = 50.
We’re defining the standard deviation of the data with the scale parameter. We’ve done that with the code scale = 100.
The code size = 1000 indicates that we’re creating a NumPy array with 1000 values.
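If you’d like to confirm that all three parameters did their job, here is a rough sketch that checks the shape, the mean, and the standard deviation of the output. The variable name is mine, and the sample statistics will only be approximately 50 and 100, since this is a random sample.

```python
import numpy as np

np.random.seed(42)
values = np.random.normal(size = 1000, loc = 50, scale = 100)

print(values.shape)    # (1000,)
print(values.mean())   # roughly 50
print(values.std())    # roughly 100
```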
If you want to learn data science in Python, learn NumPy
That’s it. You can use the NumPy random normal function to create normally distributed data in Python.
If you really want to master data science and analytics in Python though, you need to learn more about NumPy. Here, we’ve covered the np.random.normal function, but NumPy has a large range of other functions. The np.random.normal function is just one piece of a much larger toolkit for data manipulation in Python.
Here at Sharp Sight, we regularly post tutorials about a variety of data science topics. In particular, we regularly publish tutorials about NumPy.