This tutorial explains how to use the Numpy standard deviation function (AKA, np.std).
At a high level, the Numpy standard deviation function is simple. It calculates the standard deviation of the values in a Numpy array.
But the details of exactly how the function works are a little complex and require some explanation.
So this tutorial will explain the syntax of np.std() and show you clear, step-by-step examples of how the function works.
The tutorial is organized into sections. You can click on any of the following links, which will take you to the appropriate section.
Table of Contents:
- A very quick review of Numpy
- Introduction to Numpy standard deviation
- The syntax of np.std
- Numpy standard deviation examples
- Numpy standard deviation FAQ
Having said that, if you’re relatively new to Numpy, you might want to read the whole tutorial.
A quick review of Numpy
Let’s just start off with a veeeery quick review of Numpy.
What is Numpy?
Numpy is a toolkit for working with numeric data
To put it simply, Numpy is a toolkit for working with numeric data.
First, Numpy has a set of tools for creating a data structure called a Numpy array.
You can think of a Numpy array as a row-and-column grid of numbers. Numpy arrays can be 1-dimensional, 2-dimensional, or even n-dimensional.
A 2D array looks something like this:
For simplicity’s sake, in this tutorial, we’ll stick to 1-dimensional or 2-dimensional arrays.
There are a variety of ways to create different types of arrays with different kinds of numbers. A few tools for creating Numpy arrays include Numpy arange, Numpy zeros, Numpy ones, Numpy tile, and other functions.
Regardless of how you create your Numpy arrays, at a high level, they are simply arrays of numbers.
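To make that concrete, here is a small sketch of a few ways to create arrays. The variable names are just for illustration, and the values are arbitrary.

```python
import numpy as np

# A 1D array built from a Python list
array_1d = np.array([1, 2, 3, 4])

# A 2D array (2 rows, 3 columns) built from a nested list
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6]])

# A few other common ways to create arrays
evenly_spaced = np.arange(5)   # [0 1 2 3 4]
zeros = np.zeros((2, 3))       # 2x3 array filled with 0.0
ones = np.ones((2, 3))         # 2x3 array filled with 1.0
tiled = np.tile([1, 2], 3)     # [1 2 1 2 1 2]

print(array_2d)
print(array_2d.ndim, array_2d.shape)  # 2 (2, 3)
```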
Numpy provides tools for manipulating Numpy arrays
Numpy not only provides tools for creating Numpy arrays; it also provides tools for working with them.
Some of the most important of these Numpy tools are Numpy functions for performing calculations.
There’s a whole set of Numpy functions for doing things like:
- computing the sum of a Numpy array
- calculating the maximum
- calculating the exponential of the numbers in an array
- computing the value x to some power, for every value in a Numpy array
… and a variety of other computations.
The Numpy standard deviation function is a lot like these other Numpy tools. It simply performs a computation (the standard deviation) on a group of numbers in a Numpy array.
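As a quick illustration, here is a sketch that runs the computations listed above on one small array, with np.std alongside them. The array values are arbitrary.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

total = np.sum(values)          # sum of the values: 10
largest = np.max(values)        # maximum value: 4
exponentials = np.exp(values)   # e raised to each value
squares = np.power(values, 2)   # each value squared: [ 1  4  9 16]
spread = np.std(values)         # standard deviation of the same values

print(total, largest, squares, spread)
```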
A quick introduction to Numpy standard deviation
At a very high level, standard deviation is a measure of the spread of a dataset. In particular, it is a measure of how far the datapoints are from the mean of the data.
Let’s briefly review the basic calculation.
Standard deviation is calculated as the square root of the variance.
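So if we have a dataset with $N$ numbers, the variance will be:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{1}$$
And the standard deviation will just be the square root of the variance:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \tag{2}$$
Where:
$x_i$ = the individual values in the dataset
$N$ = the number of values in the dataset
$\mu$ = the mean of the values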
Most of the time, calculating standard deviation by hand is a little challenging, because you need to compute the mean, the deviations of each datapoint from the mean, then the square of the deviations, etc. Frankly, it’s a little tedious.
However, if you’re working in Python, you can use the Numpy standard deviation function to perform the calculation for you.
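To see what the function saves you from, here is a sketch of the by-hand steps (mean, deviations, squared deviations, variance, square root) next to a single call to np.std. The intermediate variable names are my own.

```python
import numpy as np

values = np.array([12, 14, 99, 72, 42, 55])

# Step 1: the mean of the values
mean = values.sum() / values.size

# Step 2: the deviation of each value from the mean
deviations = values - mean

# Step 3: the squared deviations
squared_deviations = deviations ** 2

# Step 4: the variance is the average squared deviation
variance = squared_deviations.sum() / values.size

# Step 5: the standard deviation is the square root of the variance
std_by_hand = np.sqrt(variance)

print(std_by_hand)
print(np.std(values))  # same result, in one call
```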
A quick note if you’re new to statistics
Because this blog post is about using the numpy.std() function, I don’t want to get too deep into the weeds about how the calculation is performed by hand. This tutorial is really about how we use the function. So, if you need a quick review of what standard deviation is, you can watch this video.
Ok. Having quickly reviewed what standard deviation is, let’s look at the syntax for np.std.
The syntax of np.std
The syntax of the Numpy standard deviation function is fairly simple.
I’ll explain it in just a second, but first, I want to make one quick note about Numpy syntax.
A quick note: the exact syntax depends on how you import Numpy
Typically, when we write Numpy syntax, we use the alias “np”. That’s the common convention among most data scientists.
To set that alias, you need to import Numpy like this:
import numpy as np
If we import Numpy with this alias, we can call the Numpy standard deviation function as np.std().
Ok, that being said, let’s take a closer look at the syntax.
np.std syntax
At a high level, the syntax for np.std looks something like this:
As I mentioned earlier, assuming that we’ve imported Numpy with the alias “np”, we call the function with the syntax np.std().
Then inside the parentheses, there are several parameters that allow you to control exactly how the function works.
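As a rough sketch, here is the same call written two ways: once with only the input array, and once with the other parameters written out. The values shown for those parameters are the defaults as I understand them (keepdims is effectively False for a plain array), so treat the long form as an illustration rather than the official signature.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Shortest form: only the input array
short_form = np.std(values)

# The same call with the other parameters written out explicitly
long_form = np.std(a=values, axis=None, dtype=None, out=None, ddof=0, keepdims=False)

print(short_form)  # 2.0
print(long_form)   # 2.0
```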
Let’s take a look at those parameters.
The parameters of numpy.std
There are a few important parameters you should know:
- a
- axis
- dtype
- ddof
- keepdims
- out
Let’s take a look at each of them.
a (required)
The a parameter specifies the array of values over which you want to calculate the standard deviation.
Said differently, this enables you to specify the input array to the function.
Appropriate inputs include Numpy arrays, but also “array-like” objects such as Python lists.
Importantly, you must provide an input to this parameter. An input is required.
Having said that, the parameter itself can be implicit or explicit. What I mean by that is that you can directly type the parameter a=, OR you can leave the parameter name out of your syntax and just type the name of your input array.
I’ll show you examples of this in example 1.
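Here is a minimal sketch of the “array-like” point: a plain Python list and a Numpy array built from it should give the same answer, with or without the a= keyword. The variable names are hypothetical.

```python
import numpy as np

numbers_list = [1, 2, 3, 4]             # a plain Python list
numbers_array = np.array(numbers_list)  # the same numbers as a Numpy array

from_list = np.std(numbers_list)        # list passed positionally
from_array = np.std(numbers_array)      # array passed positionally
from_keyword = np.std(a=numbers_list)   # list passed with the explicit a= keyword

print(from_list, from_array, from_keyword)
```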
axis
The axis parameter enables you to specify an axis along which the standard deviation will be computed.
To understand this, you really need to understand axes.
Numpy arrays have axes.
You can think of an “axis” like a direction along the array.
In a 2-dimensional array, there will be 2 axes: axis-0 and axis-1.
In a 2D array, axis-0 points downward along the rows, and axis-1 points horizontally along the columns.
You can visualize the axes of a 2D array like this:
Using the axis parameter, you can compute the standard deviation in a particular direction along the array.
This is best illustrated with examples, so I’ll show you an example in example 2.
(For a full explanation of Numpy array axes, see our tutorial called Numpy axes explained.)
dtype (optional)
The dtype parameter enables you to specify the data type that you want to use when np.std computes the standard deviation.
If the data in the input array are integers, then this will default to float64.
Otherwise, if the data in the input array are floats, then this will default to the same float type as the input array.
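A small sketch of those defaults, using arbitrary values: we check the data type of the result for an integer input, for a float32 input, and for a float32 input where we explicitly ask for float64.

```python
import numpy as np

int_values = np.array([1, 2, 3, 4])                        # integer input
float32_values = np.array([1, 2, 3, 4], dtype=np.float32)  # float32 input

int_result = np.std(int_values)
float32_result = np.std(float32_values)
forced_result = np.std(float32_values, dtype=np.float64)

print(int_result.dtype)      # float64
print(float32_result.dtype)  # float32
print(forced_result.dtype)   # float64
```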
ddof (optional)
This enables you to specify the “delta degrees of freedom” for the calculation: the value that is subtracted from $N$ in the divisor.
To understand this, you need to look at equation 2 again.
In this equation, the first term is $\frac{1}{N}$.
Remember: $N$ is the number of values in the array or dataset.
But if we’re thinking in statistical terms, there’s actually a difference between computing a population standard deviation vs a sample standard deviation.
If we compute a population standard deviation, we use the term $\frac{1}{N}$ in our equation.
However, when we compute the standard deviation on a sample of data (a sample of datapoints), then we need to modify the equation so that the leading term is $\frac{1}{N - 1}$. In that case, the equation for a sample standard deviation becomes:
$$\sigma_{\text{sample}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \mu)^2} \tag{3}$$
How do we implement this with np.std?
We can do this with the ddof parameter, by setting ddof = 1.
And in fact, we can set the ddof term more generally. When we use ddof, it will modify the standard deviation calculation to become:
$$\sigma = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} (x_i - \mu)^2} \tag{4}$$
To be honest, this is a little technical. If you need to learn more about this, you should watch this video at Khan academy about degrees of freedom, and population vs sample standard deviation.
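If it helps, here is a sketch that writes the ddof version of the formula out by hand and compares it with np.std for a few ddof values. The helper std_with_ddof is a hypothetical function I wrote for illustration. It is not part of Numpy.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

def std_with_ddof(x, ddof=0):
    """Hypothetical helper: standard deviation with an N - ddof divisor."""
    n = x.size
    mean = x.sum() / n
    sum_of_squares = ((x - mean) ** 2).sum()
    return np.sqrt(sum_of_squares / (n - ddof))

# Compare the by-hand version with np.std for several ddof values
for ddof in (0, 1, 2):
    print(ddof, std_with_ddof(values, ddof), np.std(values, ddof=ddof))
```

With ddof = 0 the divisor is N, which gives the population standard deviation. With ddof = 1 the divisor is N - 1, which gives the sample standard deviation.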
out (optional)
The out parameter enables you to specify an alternative array in which to put the output.
It should have the same shape as the expected output.
keepdims (optional)
The keepdims parameter can be used to “keep” the original number of dimensions. When you set keepdims = True, the output will have the same number of dimensions as the input.
Remember: when we compute the standard deviation, the computation will “collapse” the number of dimensions.
For example, if we pass in a 2-dimensional array, then by default, np.std will output a single number: a scalar value.
But if we want the output to be a number within a 2D array (i.e., an output array with the same number of dimensions as the input), then we can set keepdims = True.
To be honest, some of these parameters are a little abstract, and I think they will make a lot more sense with examples.
Let’s take a look at some examples.
Examples of how to use Numpy standard deviation
Here, we’ll work through a few examples. We’ll start simple and then increase the complexity.
Examples:
- Calculate standard deviation of a 1-dimensional array
- Calculate the standard deviation of a 2-dimensional array
- Use np.std to compute the standard deviations of the columns
- Use np.std to compute the standard deviations of the rows
- Change the degrees of freedom
- Use the keepdims parameter in np.std
Run this code first
Before you run any of the example code, you need to import Numpy.
To do this, you can run the following code:
import numpy as np
This will import Numpy with the alias “np”.
EXAMPLE 1: Calculate standard deviation of a 1 dimensional array
Here, we’ll start simple.
We’re going to calculate the standard deviation of a 1-dimensional Numpy array.
Create 1D array
First, we’ll just create our 1D array:
array_1d = np.array([12, 14, 99, 72, 42, 55])
Calculate standard dev
Now, we’ll calculate the standard deviation of those numbers.
np.std(array_1d)
OUT:
30.84369195367723
So what happened here?
The np.std function just computed the standard deviation of the numbers [12, 14, 99, 72, 42, 55] using equation 2 that we saw earlier. Each number is one of the $x_i$ in that equation.
One quick note
In the above example, we did not explicitly use the a= parameter. That is because np.std understands that when we provide an argument to the function like in the code np.std(array_1d), the input should be passed to the a parameter.
Alternatively, you can also explicitly use the a= parameter:
np.std(a = array_1d)
OUT:
30.84369195367723
EXAMPLE 2: Calculate the standard deviation of a 2-dimensional array
Ok. Now, let’s look at an example with a 2-dimensional array.
Create 2-dimensional array
Here, we’re going to create a 2D array, using the np.random.randint function.
np.random.seed(22)
array_2d = np.random.randint(20, size = (3, 4))
This array has 3 rows and 4 columns.
Let’s print it out, so we can see it.
print(array_2d)
OUT:
[[ 4 12  0  4]
 [ 6 11  8  4]
 [18 14 13  7]]
This is just a 2D array that contains 12 random integers from 0 up to (but not including) 20.
Compute standard deviation with np.std
Okay, let’s compute the standard deviation.
np.std(array_2d)
OUT:
5.007633062524539
Here, numpy.std() is just computing the standard deviation of all 12 integers.
The standard deviation is 5.007633062524539.
EXAMPLE 3: Compute the standard deviation of the columns
Now, we’re going to compute the standard deviation of the columns.
To do this, we need to use the axis parameter. (You learned about the axis parameter in the section about the parameters of numpy.std.)
Specifically, we need to set axis = 0.
Why?
As I mentioned in the explanation of the axis parameter earlier, Numpy arrays have axes.
In a two dimensional array, axis-0 is the axis that points downwards.
When we use numpy.std with axis = 0, that will compute the standard deviations downward in the axis-0 direction.
Let’s take a look at an example so you can see what I mean.
Create 2-dimensional array
First, we’ll create a 2D array, using the np.random.randint function.
(This is the same array that we created in example 2, so if you already created it, you shouldn’t need to create it again.)
np.random.seed(22)
array_2d = np.random.randint(20, size = (3, 4))
Let’s print it out, so we can see it.
print(array_2d)
OUT:
[[ 4 12  0  4]
 [ 6 11  8  4]
 [18 14 13  7]]
This is just a 2D array that contains integers from 0 up to (but not including) 20.
Use np.std to compute standard deviation of the columns
Now, we’ll set axis = 0 inside of np.std to compute the standard deviations of the columns.
np.std(array_2d, axis = 0)
OUT:
array([6.18241233, 1.24721913, 5.35412613, 1.41421356])
Explanation
What’s going on here?
When we use np.std with axis = 0, Numpy will compute the standard deviation downward in the axis-0 direction. Remember, as I mentioned above, axis-0 points downward.
This has the effect of computing the standard deviation of each column of the Numpy array.
Now, let’s do a similar example with the row standard deviations.
EXAMPLE 4: Use np.std to compute the standard deviations of the rows
Now, we’re going to use np.std to compute the standard deviations horizontally along a 2D numpy array.
Remember what I said earlier: numpy arrays have axes. The axes are like directions along the Numpy array. In a 2D array, axis-1 points horizontally, like this:
So, if we want to compute the standard deviations horizontally, we can set axis = 1. This has the effect of computing the row standard deviations.
Let’s take a look.
Create 2-dimensional array
To run this example, we’ll again need a 2D Numpy array, so we’ll create a 2D array using the np.random.randint function.
(This is the same array that we created in example 2, so if you already created it, you shouldn’t need to create it again.)
np.random.seed(22)
array_2d = np.random.randint(20, size = (3, 4))
Let’s print it out, so we can see it.
print(array_2d)
OUT:
[[ 4 12  0  4]
 [ 6 11  8  4]
 [18 14 13  7]]
This is just a 2D array that contains integers from 0 up to (but not including) 20.
Use np.std to compute standard deviation of the rows
Now, we’ll use np.std with axis = 1 to compute the standard deviations of the rows.
np.std(array_2d, axis = 1)
OUT:
array([4.35889894, 2.58602011, 3.93700394])
Explanation
If you understood example 3, this new example should make sense.
When we use np.std and set axis = 1, Numpy will compute the standard deviations horizontally along axis-1.
Effectively, when we use Numpy standard deviation with axis = 1, the function computes the standard deviation of the rows.
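If you want to convince yourself of what the axis parameter is doing, here is a sketch that computes each column and each row separately and compares the results with axis = 0 and axis = 1. To keep it self-contained, I typed the array values in directly instead of generating them with the random seed.

```python
import numpy as np

array_2d = np.array([[ 4, 12,  0,  4],
                     [ 6, 11,  8,  4],
                     [18, 14, 13,  7]])

column_stds = np.std(array_2d, axis=0)  # one value per column, shape (4,)
row_stds = np.std(array_2d, axis=1)     # one value per row, shape (3,)

# Compute each column and each row one at a time
columns_one_by_one = np.array([np.std(array_2d[:, j]) for j in range(array_2d.shape[1])])
rows_one_by_one = np.array([np.std(array_2d[i, :]) for i in range(array_2d.shape[0])])

print(np.allclose(column_stds, columns_one_by_one))  # True
print(np.allclose(row_stds, rows_one_by_one))        # True
```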
EXAMPLE 5: Change the degrees of freedom
Now, let’s change the degrees of freedom.
Here in this example, we’re going to create a large array of numbers, take a sample from that array, and compute the standard deviation on that sample.
First, let’s create our arrays.
Create Numpy array
First, we’ll just create a Numpy array of values drawn from a normal distribution with a mean of 0 and a standard deviation of 10.
To do this, we’ll use the Numpy random normal function. Note that we’re using the Numpy random seed function to set the seed for the random number generator. For more information on this, read our tutorial about np.random.seed.
np.random.seed(22)
population_array = np.random.normal(size = 100, loc = 0, scale = 10)
Ok. Now we have a Numpy array, population_array, that has 100 elements drawn from a distribution with a mean of 0 and a standard deviation of 10. (Because the values are random draws, the mean and standard deviation of the array itself will only be approximately 0 and 10.)
Create sample
Now, we’ll use Numpy random choice to take a random sample from the Numpy array, population_array.
np.random.seed(22)
sample_array = np.random.choice(population_array, size = 10)
This new array, sample_array, is a random sample of 10 elements from population_array.
We’ll use sample_array when we calculate our standard deviation using the ddof parameter.
Calculate the standard deviation of the sample
Now, we’ll calculate the standard deviation of the sample.
Specifically, we’re going to use the Numpy standard deviation function with the ddof parameter set to ddof = 1.
np.std(sample_array, ddof = 1)
OUT:
10.703405562234051
Explanation
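Here, we’ve calculated:
$$\sigma = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} (x_i - \mu)^2}$$
And when we set ddof = 1, the equation evaluates to:
$$\sigma_{\text{sample}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \mu)^2}$$
To be clear, when you calculate the standard deviation of a sample, you will set ddof = 1.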
To be honest, the details about why are a little technical (and beyond the scope of this post), so for more information about calculating a sample standard deviation, I recommend that you watch this video.
Keep in mind that in some other instances, you can set ddof to values other than 1 or 0. If you don’t use the ddof parameter at all, it will default to 0.
No matter what value you select, the Numpy standard deviation function will compute the standard deviation with the equation:
$$\sigma = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} (x_i - \mu)^2}$$
EXAMPLE 6: Use the keepdims parameter in np.std
Ok. Finally, we’ll do one last example.
Here, we’re going to set the keepdims parameter to keepdims = True.
Create 2-dimensional array
First, we’ll create a 2D array, using the np.random.randint function.
(This is the same array that we created in example 2, so if you already created it, you shouldn’t need to create it again.)
np.random.seed(22)
array_2d = np.random.randint(20, size = (3, 4))
Let’s print it out:
print(array_2d)
OUT:
[[ 4 12  0  4]
 [ 6 11  8  4]
 [18 14 13  7]]
Check the dimensions
Now, let’s take a look at the dimensions of this array.
array_2d.ndim
OUT:
2
This is a 2D array, just like we intended.
Compute the standard deviation, and check the dimensions
Ok. Now, we’re going to compute the standard deviation, and check the dimensions of the output.
output = np.std(array_2d)
Let’s quickly print the output:
print(output)
OUT:
5.007633062524539
So the standard deviation is 5.007633062524539.
Now, what are the dimensions of the output?
output.ndim
OUT:
0
The output has 0 dimensions (it’s a scalar value).
Why?
When np.std computes the standard deviation, it’s computing a summary statistic. In this case, the function is taking a large number of values and collapsing them down to a single metric.
So the input was 2-dimensional, but the output is 0-dimensional.
What if we want to change that?
What if we want the output to technically have 2-dimensions?
We can do that with the keepdims parameter.
Keep the original dimensions when we use np.std
Here, we’ll set keepdims = True to make the output have the same number of dimensions as the input.
output_2d = np.std(array_2d, keepdims = True)
Now, let’s look at the output:
print(output_2d)
OUT:
[[5.00763306]]
Notice that the output, the standard deviation, is still 5.00763306. But the result is enclosed inside of double brackets.
Let’s inspect output_2d and take a closer look.
type(output_2d)
OUT:
numpy.ndarray
So, output_2d is a Numpy array, not a scalar value.
Let’s check the dimensions:
output_2d.ndim
OUT:
2
This Numpy array, output_2d, has 2 dimensions.
This is the same number of dimensions as the input.
What happened?
When we set keepdims = True, that caused the np.std function to produce an output with the same number of dimensions as the input. Even though there are not any rows and columns in the output, the output output_2d has 2 dimensions.
So, in case you ever need your output to have the same number of dimensions as your input, you can set keepdims = True.
(This also works when you use the axis parameter … try it!)
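Here is a short sketch of keepdims combined with the axis parameter. I typed the array values in directly so the code stands on its own. The last step, dividing each row by its own standard deviation, is just one possible use that I’m adding for illustration.

```python
import numpy as np

array_2d = np.array([[ 4, 12,  0,  4],
                     [ 6, 11,  8,  4],
                     [18, 14, 13,  7]])

# keepdims = True keeps the collapsed axis as an axis of length 1
column_stds = np.std(array_2d, axis=0, keepdims=True)
row_stds = np.std(array_2d, axis=1, keepdims=True)

print(column_stds.shape)  # (1, 4)
print(row_stds.shape)     # (3, 1)

# Because row_stds has shape (3, 1), it lines up with the rows of array_2d,
# so each row can be divided by its own standard deviation
scaled_rows = array_2d / row_stds
print(scaled_rows.shape)  # (3, 4)
```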
Frequently asked questions about Numpy standard deviation
Now that you’ve learned about Numpy standard deviation and seen some examples, let’s review some frequently asked questions about np.std.
Frequently asked questions:
Question 1: Why does Numpy std() give a different result than MATLAB std() or another programming language?
The simple reason is that MATLAB calculates the standard deviation according to the following:
$$\sigma_{\text{sample}} = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \mu)^2}$$
(Many other tools use the same equation.)
However, Numpy calculates with the following:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
Notice the subtle difference between the $\frac{1}{N - 1}$ vs the $\frac{1}{N}$.
To fix this, you can use the ddof parameter in Numpy.
If you use np.std with the ddof parameter set to ddof = 1, you should get the same answer as MATLAB.
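As a quick sketch of the difference, here are both versions computed on the same small, arbitrary array. The first call uses Numpy’s default (divide by N), and the second uses ddof = 1 (divide by N - 1), which is the convention described above for MATLAB.

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

numpy_default = np.std(values)          # divides by N
sample_version = np.std(values, ddof=1) # divides by N - 1

print(numpy_default)   # 2.0
print(sample_version)  # about 2.138
```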