Numpy Random Seed, Explained

In this tutorial, I’ll explain how to use the NumPy random seed function, which is also called np.random.seed or numpy.random.seed.

The function itself is extremely easy to use.

However, the reason that we need to use it is a little complicated. To understand why we need to use NumPy random seed, you actually need to know a little bit about pseudo-random numbers.

That being the case, this tutorial will first explain the basics of pseudo-random numbers, and will then move on to the syntax of numpy.random.seed itself.

Contents

The tutorial is divided up into several different sections.


However, I strongly recommend that you read the whole tutorial.

As I said earlier, numpy.random.seed is very easy to use, but it’s not that easy to understand. Understanding why we use it requires some background. That being the case, it’s much better if you actually read the tutorial.

Ok … let’s get to it.

NumPy random seed is for pseudo-random numbers in Python

So what exactly is NumPy random seed?

NumPy random seed is simply a function that sets the random seed of the NumPy pseudo-random number generator. It provides an essential input that enables NumPy to generate pseudo-random numbers for random processes.

Does that make sense? Probably not.

Unless you have a background in computing and probability, what I just wrote is probably a little confusing.

Honestly, in order to understand “seeding a random number generator” you need to know a little bit about pseudo-random numbers.

That being the case, let me give you a quick introduction to them …

A quick introduction to pseudo-random numbers

Here, I want to give you a very quick overview of pseudo-random numbers and why we need them.

Once you understand pseudo-random numbers, numpy.random.seed will make more sense.

WTF is a pseudo-random number?

At the risk of being a bit of a smart-ass, I think the name “pseudo-random number” is fairly self-explanatory, and it gives us some insight into what pseudo-random numbers actually are.

Let’s just break down the name a little.

A pseudo-random number is a number. A number that’s sort-of random. Pseudo-random.

So essentially, a pseudo-random number is a number that’s almost random, but not really random.

It might sound like I’m being a bit sarcastic here, but that’s essentially what they are. Pseudo-random numbers are numbers that appear to be random, but are not actually random.

In the interest of clarity though, let’s see if we can get a definition that’s a little more precise.

A proper definition of pseudo-random numbers

According to the encyclopedia at Wolfram Mathworld, a pseudo-random number is:

… a computer-generated random number.

The definition goes on to explain that ….

The prefix pseudo- is used to distinguish this type of number from a “truly” random number generated by a random physical process such as radioactive decay.

A separate article at random.org notes that pseudo-random numbers “appear random, but they are really predetermined”.

Got that? Pseudo-random numbers are computer generated numbers that appear random, but are actually predetermined.

I think that these definitions help quite a bit, and they are a great starting point for understanding why we need them.

Why we need pseudo-random numbers

I swear to god, I’m going to bring this back to NumPy soon.

But, we still need to understand why pseudo-random numbers are required.

Really. Just bear with me. This will make sense soon.

A problem: computers are deterministic, not random

There’s a fundamental problem when using computers to simulate or work with random processes.

Computers are completely deterministic, not random.

Setting aside some rare exceptions, computers are deterministic by their very design. To quote an article at MIT’s School of Engineering “if you ask the same question you’ll get the same answer every time.”

Another way of saying this is that if you give a computer a certain input, it will precisely follow instructions to produce an output.

An image that shows how a computer transforms an input into an output.

… And if you later give a computer the same input, it will produce the same output.

If the input is the same, then the output will be the same.

THAT’S HOW COMPUTERS WORK.

The behavior of computers is deterministic

Essentially, the behavior of computers is NOT random.

This introduces a problem: how can you use a non-random machine to produce random numbers?

Pseudo-random numbers are generated by algorithms

Computers solve the problem of generating “random” numbers the same way that they solve essentially everything: with an algorithm.

Computer scientists have created a set of algorithms for creating pseudo-random numbers, called “pseudo-random number generators.”

These algorithms can be executed on a computer.

As such, they are completely deterministic. However, the numbers that they produce have properties that approximate the properties of random numbers.
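To make this concrete, here is a minimal sketch of a toy pseudo-random number generator. It is a linear congruential generator, which is a classic textbook algorithm and NOT the algorithm NumPy uses. The function name toy_lcg and the constants are just illustrative choices.

```python
# A toy linear congruential generator (LCG).
# Illustrative only: this is NOT the algorithm NumPy uses internally.
def toy_lcg(seed, n):
    """Return n pseudo-random integers in [0, 2**31) starting from seed."""
    a, c, m = 1103515245, 12345, 2**31  # illustrative LCG constants
    state = seed
    values = []
    for _ in range(n):
        # Each new state depends only on the previous state
        state = (a * state + c) % m
        values.append(state)
    return values

first_run = toy_lcg(seed=42, n=5)
second_run = toy_lcg(seed=42, n=5)
other_seed = toy_lcg(seed=43, n=5)

print(first_run == second_run)  # True: same seed, same numbers
print(first_run == other_seed)  # False: different seed, different numbers
```

Notice that there is nothing random in the code at all. It is just arithmetic, and the only thing that changes the output is the starting value.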

Pseudo-random numbers appear to be random

That is to say, the numbers generated by pseudo-random number generators appear to be random.

Even though the numbers are completely determined by the algorithm, when you examine them, there is typically no discernible pattern.

For example, here we’ll create some pseudo-random numbers with the NumPy randint function:

import numpy as np

np.random.seed(1)
np.random.randint(low = 1, high = 10, size = 50)

OUT:

[6, 9, 6, 1, 1, 2, 8, 7, 3, 5, 6, 3, 5, 3, 5, 8, 8, 
2, 8, 1, 7, 8, 7, 2, 1, 2, 9, 9, 4, 9, 8, 4, 7, 6, 
2, 4, 5, 9, 2, 5, 1, 4, 3, 1, 5, 3, 8, 8, 9, 7]

See any pattern here?

Me neither.

I can assure you though, that these numbers are not random, and are in fact completely determined by the algorithm. If you run the same code again, you’ll get the exact same numbers.

Pseudo-random numbers can be re-created exactly

Importantly, because pseudo-random number generators are deterministic, they are also repeatable.

What I mean is that if you run the algorithm with the same input, it will produce the same output.

So you can use pseudo-random number generators to create and then re-create the exact same set of pseudo-random numbers.

Let me show you.

Generate pseudo-random integers

Here, we’ll create an array of 5 pseudo-random integers between 0 and 9 using numpy.random.randint.

(And notice that we’re using np.random.seed here)

np.random.seed(0)
np.random.randint(10, size = 5)

This produces the following output:

array([5, 0, 3, 3, 7])

Simple. The algorithm produced an array with the values [5, 0, 3, 3, 7].

Generate pseudo-random integers again

Ok.

Now, let’s run the same code again.

… and notice that we’re using np.random.seed in exactly the same way …

np.random.seed(0)
np.random.randint(10, size = 5)

OUTPUT:

array([5, 0, 3, 3, 7])

Well take a look at that …

The. numbers. are. the. same.

We ran the exact same code, and it produced the exact same output.

I will repeat what I said earlier: pseudo-random number generators produce numbers that look random, but are 100% determined.

Determined how though?

Remember what I wrote earlier: computers and algorithms process inputs into outputs. The outputs of computers depend on the inputs.

So just like any output produced by a computer, pseudo-random numbers are dependent on the input.

THIS is where numpy.random.seed comes in …

The numpy.random.seed function provides the input (i.e., the seed) to the algorithm that generates pseudo-random numbers in NumPy.

How and why we use NumPy random seed

Ok, you got this far.

You’re ready now.

Now you can learn about NumPy random seed.

numpy.random.seed provides an input to the pseudo-random number generator

What I wrote in the previous section is critical.

The “random” numbers generated by NumPy are not exactly random. They are pseudo-random … they approximate random numbers, but are 100% determined by the input and the pseudo-random number algorithm.

The np.random.seed function provides an input for the pseudo-random number generator in NumPy.

That’s all the function does!

It allows you to provide a “seed” value to NumPy’s random number generator.

We use numpy.random.seed in conjunction with other numpy functions

Importantly, numpy.random.seed doesn’t exactly work all on its own.

The numpy.random.seed function works in conjunction with other functions from NumPy.

Specifically, numpy.random.seed works with other functions from the numpy.random namespace.

So for example, you might use numpy.random.seed along with numpy.random.randint. This will enable you to create random integers with NumPy.

You can also use numpy.random.seed with numpy.random.normal to create normally distributed numbers.

… or you can use it with numpy.random.choice to generate a random sample from an input.

In fact, there are several dozen NumPy random functions that enable you to generate random numbers, random samples, and samples from specific probability distributions.

I’ll show you a few examples of some of these functions in the examples section of this tutorial.
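Before we get there, here is a quick sketch of one seed driving two different numpy.random functions in a row. The variable names and the values passed to loc, scale, and the color list are made up for illustration.

```python
import numpy as np

# Seed once, then call two different numpy.random functions
np.random.seed(42)
heights = np.random.normal(loc=170, scale=10, size=3)        # normally distributed values
picked = np.random.choice(["red", "green", "blue"], size=2)  # sample from a list

# Seed again with the same value and repeat the same calls in the same order
np.random.seed(42)
heights_again = np.random.normal(loc=170, scale=10, size=3)
picked_again = np.random.choice(["red", "green", "blue"], size=2)

print(np.array_equal(heights, heights_again))  # True
print(np.array_equal(picked, picked_again))    # True
```

The seed controls the whole sequence of calls, so the order of the calls matters too. If you swap the two function calls, you should not expect the same values.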

NumPy random seed is deterministic

Remember what I said earlier in this tutorial …. pseudo-random number generators are completely deterministic. They operate by algorithm.

What this means is that if you provide the same seed, you will get the same output.

And if you change the seed, you will get a different output.

The output that you get depends on the input that you give it.

I’ll show you examples of this behavior in the examples section.
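In the meantime, here is a minimal sketch of the idea. It uses the same np.random.randint call three times and only changes the seed. The variable names a, b, and c are just placeholders.

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(10, size=5)

np.random.seed(0)                  # same seed as before
b = np.random.randint(10, size=5)

np.random.seed(1)                  # different seed
c = np.random.randint(10, size=5)

print(np.array_equal(a, b))  # True: same seed, same output
print(np.array_equal(a, c))  # False: different seed, different output
```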

NumPy random seed makes your code repeatable

The important thing about using a seed for a pseudo-random number generator is that it makes the code repeatable.

Remember what I said earlier?

… pseudo-random number generators operate by a deterministic process.

If you give a pseudo-random number generator the same input, you’ll get the same output.

This can actually be a good thing!

There are times when you really want your “random” processes to be repeatable.

Code that has well defined, repeatable outputs is good for testing.

Essentially, we use NumPy random seed when we need to generate pseudo-random numbers in a repeatable way.
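Here is a rough sketch of what that looks like in a test. The function noisy_mean is hypothetical. It stands in for any piece of your own code that uses numpy.random internally.

```python
import numpy as np

def noisy_mean(n):
    """Hypothetical function under test: the mean of n standard-normal draws."""
    return np.random.normal(size=n).mean()

# Seed, then record the value
np.random.seed(123)
expected = noisy_mean(1000)

# Seed with the same value, then run the function again
np.random.seed(123)
actual = noisy_mean(1000)

# Because the seed is the same, the "random" result is exactly repeatable
assert actual == expected
print("results match")
```

Without the two calls to np.random.seed, the assertion would almost certainly fail, because each call to noisy_mean would draw different numbers.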

NumPy random seed makes your code easier to share

The fact that np.random.seed makes your code repeatable also makes it easier to share.

Take for example the tutorials that I post here at Sharp Sight.

I post detailed tutorials about how to perform various data science tasks, and I show how code works, step by step.

When I do this, it’s important that people who read the tutorials and run the code get the same result. If a student reads the tutorial, and copy-and-pastes the code exactly, I want them to get the exact same result. This just helps them check their work! If they type in the code exactly as I show it in a tutorial, getting the exact same result gives them confidence that they ran the code properly.

Again, in order to get repeatable results when we are using “random” functions in NumPy, we need to use numpy.random.seed.

Ok … now that you understand what NumPy random seed is (and why we use it), let’s take a look at the actual syntax.

The syntax of NumPy random seed

The syntax of NumPy random seed is extremely simple.

There’s essentially only one parameter, and that is the seed value.

An image that shows the syntax of NumPy random seed.

So essentially, to use the function, you just call the function by name and then pass in a “seed” value inside the parenthesis.

Note that in this syntax explanation, I’m using the abbreviation “np” to refer to NumPy. This is a common convention, but it requires you to import NumPy with the code “import numpy as np.” I’ll explain more about this soon in the examples section.
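As a quick sketch, here is the call written out. The name seed_value is just a placeholder for whatever integer you pick. The second half shows that a negative number is rejected.

```python
import numpy as np

seed_value = 42                       # placeholder; any integer from 0 to 2**32 - 1
result = np.random.seed(seed_value)   # sets the state of NumPy's global generator
print(result)                         # None: the function doesn't return anything

# A seed outside the valid range raises an error
try:
    np.random.seed(-1)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```

So the function is called for its side effect. It changes the state of the generator, and the functions you call afterward pick up from there.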

Examples of numpy.random.seed

Let’s take a look at some examples of how and when we use numpy.random.seed.

Before we look at the examples though, you’ll have to run some code.

Run this code first

To get the following examples to run properly, you’ll need to import NumPy with the appropriate “nickname.”

You can do that by executing the following code:

import numpy as np

Running this code will enable us to use the alias np in our syntax to refer to numpy.

This is a common convention in NumPy. When you read NumPy code, it is extremely common to see NumPy referred to as np. If you’re a beginner you might not realize that you need to import NumPy with the code import numpy as np, otherwise the examples won’t work properly!

Now that we’ve imported NumPy properly, let’s start with a simple example. We’ll generate a single random number between 0 and 1 using NumPy random random.

Generate a random number with numpy.random.random

Here, we’re going to use NumPy to generate a random number between zero and one. To do this, we’re going to use the NumPy random random function (AKA, np.random.random).

Ok, here’s the code:

np.random.seed(0)
np.random.random()

OUTPUT:

0.5488135039273248

Note that the output is a float. It’s a decimal number between 0 and 1.

For the record, we can essentially treat this number as a probability. We can think of the np.random.random function as a tool for generating probabilities.

Rerun the code

Now that I’ve shown you how to use np.random.random, let’s just run it again with the same seed.

Here, I just want to show you what happens when you use np.random.seed before running np.random.random.

np.random.seed(0)
np.random.random()

OUTPUT:

0.5488135039273248

Notice that the number is exactly the same as the first time we ran the code.

Essentially, if you execute a NumPy function with the same seed, you’ll get the same result.
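One detail that is easy to miss: the seed sets the starting point of a sequence, so calling the function twice after one seed gives two different numbers. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

np.random.seed(0)
first = np.random.random()    # first number in the sequence
second = np.random.random()   # the generator has moved on to the next number

np.random.seed(0)             # resetting the seed rewinds the sequence
first_again = np.random.random()

print(first == first_again)   # True
print(first == second)        # False
```

This is why the examples in this tutorial call np.random.seed immediately before the function they want to reproduce.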

Generate a random integer with numpy.random.randint

Next, we’re going to use np.random.seed to set the number generator before using NumPy random randint.

Essentially, we’re going to use NumPy to generate 5 random integers between 0 and 99.

np.random.seed(74)
np.random.randint(low = 0, high = 100, size = 5)

OUTPUT:

array([30, 91,  9, 73, 62])

This is pretty simple.

NumPy random seed sets the seed for the pseudo-random number generator, and then NumPy random randint selects 5 numbers between 0 and 99.

Run the code again

Let’s just run the code so you can see that it reproduces the same output if you have the same seed.

np.random.seed(74)
np.random.randint(low = 0, high = 100, size = 5)

OUTPUT:

array([30, 91,  9, 73, 62])

Once again, as you can see, the code produces the same integers if we use the same seed. As noted previously in the tutorial, NumPy random randint doesn’t exactly produce “random” integers. It produces pseudo-random integers that are completely determined by numpy.random.seed.

Select a random sample from an input array

It’s also common to use the NumPy random seed function when you’re doing random sampling.

Specifically, if you need to generate a reproducible random sample from an input array, you’ll need to use numpy.random.seed.

Let’s take a look.

Here, we’re going to use numpy.random.seed before we use numpy.random.choice. The NumPy random choice function will then create a random sample from a list of elements.

np.random.seed(0)
np.random.choice(a = [1,2,3,4,5,6], size = 5)

OUTPUT:

array([5, 6, 1, 4, 4])

As you can see, we’ve basically generated a random sample from the list of input elements … the numbers 1 to 6.

In the output, you can see that some of the numbers are repeated. This is because np.random.choice is using random sampling with replacement. For more information about how to create random samples, you should read our tutorial about np.random.choice.
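If you don’t want repeats, np.random.choice has a replace parameter. Here is a short sketch that compares the two modes with the same seed. The variable names are just for illustration.

```python
import numpy as np

np.random.seed(0)
with_replacement = np.random.choice(a=[1, 2, 3, 4, 5, 6], size=5)

np.random.seed(0)
# replace=False means each element can be selected at most once
without_replacement = np.random.choice(a=[1, 2, 3, 4, 5, 6], size=5, replace=False)

print(with_replacement)
print(without_replacement)
print(len(set(without_replacement)) == 5)  # True: no repeated values
```

Both versions are still reproducible. Setting the seed works the same way regardless of which sampling mode you use.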

Rerun the code

Let’s quickly re-run the code.

I want to re-run the code just so you can see, once again, that the primary reason we use NumPy random seed is to create results that are completely repeatable.

Ok, here is the exact same code that we just ran (with the same seed).

np.random.seed(0)
np.random.choice(a = [1,2,3,4,5,6], size = 5)

OUTPUT:

array([5, 6, 1, 4, 4])

Once again, we used the same seed, and this produced the same output.

Frequently asked questions about np.random.seed

Now that we’ve taken a look at some examples of using NumPy random seed to set a random seed in Python, I want to address some frequently asked questions.

What does np.random.seed(0) do?

Dude. I just wrote 2000 words explaining what the np.random.seed function does … which basically explains what np.random.seed(0) does.

Ok, ok … I get it. You’re probably in a hurry and just want a quick answer.

I’ll summarize.

We use np.random.seed when we need to generate random numbers or mimic random processes in NumPy.

Computers are generally deterministic, so it’s very difficult to create truly “random” numbers on a computer. Computers get around this by using pseudo-random number generators.

These pseudo-random number generators are algorithms that produce numbers that appear random, but are not really random.

In order to work properly, pseudo-random number generators require a starting input. We call this starting input a “seed.”

The code np.random.seed(0) enables you to provide a seed (i.e., the starting input) for NumPy’s pseudo-random number generator.

NumPy then uses the seed and the pseudo-random number generator in conjunction with other functions from the numpy.random namespace to produce certain types of random outputs.

Ultimately, creating pseudo-random numbers this way leads to repeatable output, which is good for testing and code sharing.

Having said all of that, to really understand numpy.random.seed, you need to have some understanding of pseudo-random number generators.

… so if what I just wrote doesn’t make sense, please return to the top of the page and read the f*#^ing tutorial.

What number should I use in numpy random seed?

Basically, it doesn’t matter.

You can use numpy.random.seed(0), or numpy.random.seed(42), or any other integer between 0 and 2**32 - 1.

For the most part, the number that you use inside of the function doesn’t really make a difference.

You just need to understand that using different seeds will cause NumPy to produce different pseudo-random numbers. The output of a numpy.random function will depend on the seed that you use.

Here’s a quick example. We’re going to use NumPy random seed in conjunction with NumPy random randint to create a set of integers between 0 and 98. (We pass 99 to randint, and the upper bound is excluded.)

In the first example, we’ll set the seed value to 0.

np.random.seed(0)
np.random.randint(99, size = 5)

Which produces the following output:

array([44, 47, 64, 67, 67])

Basically, np.random.randint generated an array of 5 integers between 0 and 98. Note that if you run this code again with the exact same seed (i.e. 0), you’ll get the same integers from np.random.randint.

Next, let’s run the code with a different seed.

np.random.seed(1)
np.random.randint(99, size = 5)

OUTPUT:

array([37, 12, 72,  9, 75])

Here, the code for np.random.randint is exactly the same … we only changed the seed value. Here, the seed is 1.

With a different seed, NumPy random randint created a different set of integers. Everything else is the same. The code for np.random.randint is the same. But with a different seed, it produces a different output.

I want you to understand that the output of a numpy.random function ultimately depends on the value of np.random.seed, but the choice of seed value is sort of arbitrary.

Do I always need to use numpy random seed?

The short answer is, no.

If you use a function from the numpy.random namespace (like np.random.randint, np.random.normal, etc.) without calling NumPy random seed first, NumPy will still seed its generator in the background. It will generate a seed value from a source of randomness provided by your operating system (like /dev/urandom on a Unix or Linux machine).

So essentially, if you don’t set a seed with numpy.random.seed, NumPy will set one for you.

However, this has a disadvantage!

If you don’t explicitly set a seed, your code will not have repeatable outputs. NumPy will generate a seed on its own, but that seed might change moment to moment. This will make your outputs different every time you run it.

So to summarize: you don’t absolutely have to use numpy.random.seed, but you should use it if you want your code to have repeatable outputs.
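You can see this behavior inside a single script. In the sketch below, calling np.random.seed(None) asks NumPy to re-seed itself from the operating system, which mimics what happens when a fresh program starts without an explicit seed. The variable names are just for illustration.

```python
import numpy as np

# seed=None tells NumPy to pull a fresh seed from the operating system
np.random.seed(None)
unseeded_a = np.random.random(5)

np.random.seed(None)
unseeded_b = np.random.random(5)

# With an explicit seed, the two draws match
np.random.seed(7)
seeded_a = np.random.random(5)

np.random.seed(7)
seeded_b = np.random.random(5)

print(np.array_equal(unseeded_a, unseeded_b))  # False (a match is astronomically unlikely)
print(np.array_equal(seeded_a, seeded_b))      # True
```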

What’s the difference between np.random.seed and np.random.RandomState?

Ok.

We’re really getting into the weeds here.

Essentially, numpy.random.seed sets a seed value for the global RandomState instance that all of the numpy.random functions share.

On the other hand, np.random.RandomState creates a separate, independent instance of RandomState and does not affect the global one.

Confused?

That’s okay …. this answer is a little technical and it requires you to know a little about how NumPy is structured on the back end. It also requires you to know a little bit about programming concepts like “global variables.” If you’re a relative data science beginner, the details that you need to know might be over your head.

The important thing is that NumPy random seed is probably sufficient if you’re just using NumPy for some data science or scientific computing.

However, if you’re building software systems that need to be secure, NumPy random seed is probably not the right tool. NumPy’s pseudo-random number generators are not designed for security or cryptography.

To summarize, np.random.seed is probably fine if you’re just doing simple analytics, data science, and scientific computing. You should learn more about RandomState if you want a generator that is independent of the global one, and you should look beyond numpy.random entirely in systems where security is a consideration.
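To make the global-versus-local distinction a little more concrete, here is a minimal sketch. The name local_rng and the seed 99 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                       # seed the global generator
local_rng = np.random.RandomState(99)   # a separate generator with its own seed

# Draw a lot of numbers from the local generator ...
local_values = local_rng.randint(10, size=1000)

# ... and the global generator is unaffected by those draws
global_values = np.random.randint(10, size=5)

print(global_values)  # [5 0 3 3 7]
```

The global draw is the same array we got earlier in the tutorial with np.random.seed(0), even though local_rng produced 1000 numbers in between. The two generators keep track of their own state.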

Applications of np.random.seed

Now that I’ve explained the basics of NumPy random seed, I want to tell you a few applications …

Here’s where you might see the np.random.seed function.

Probability and statistics

It’s possible to do probability and statistics using NumPy.

Almost by definition, probability involves uncertainty and randomness. As such, if you use Python and NumPy to model probabilistic processes, you’ll need to use np.random.seed to generate pseudo-random numbers (or a similar tool in Python).

Random sampling

More specifically, if you’re doing random sampling with NumPy, you’ll need to use numpy.random.seed.

NumPy has a variety of functions for performing random sampling, including numpy random random, numpy random normal, and numpy random choice.

In almost every case, when you use one of these functions, you’ll need to use it in conjunction with numpy random seed if you want to create reproducible outputs.

Monte Carlo Methods

Monte Carlo methods are a class of computational methods that rely on repeatedly drawing random samples.

I won’t go into the details here, since Monte Carlo methods are a little complicated, and beyond the scope of this post.

Essentially though, Monte Carlo methods are a powerful computational tool used in science and engineering. In fact, Monte Carlo methods were initially used at the Manhattan Project!

Monte Carlo methods require random numbers. In most cases, when these methods are used, they actually use pseudo-random numbers instead of true random numbers.
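Here is a very small sketch of the idea: a classic Monte Carlo estimate of pi. It throws pseudo-random points into a unit square and counts how many land inside the quarter circle. The sample size n and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)       # makes the estimate reproducible
n = 100_000             # number of random points

x = np.random.random(n)  # x coordinates in [0, 1)
y = np.random.random(n)  # y coordinates in [0, 1)

# A point is inside the quarter circle if its distance from the origin is at most 1
inside = (x**2 + y**2) <= 1.0

# The fraction of points inside approximates pi / 4
pi_estimate = 4 * inside.mean()
print(pi_estimate)
```

Because the seed is set, you will get the same estimate every time you run this. Change the seed and the estimate will shift slightly, but it should stay close to pi.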

Machine learning

Interested in machine learning?

Great … it’s a powerful toolset, and it will be extremely important in the 21st century.

Broadly speaking, pseudo-random numbers are important in machine learning.

Performing simple tasks like splitting datasets into training and test sets requires random sampling. In turn, random sampling almost always requires pseudo-random numbers.

So if you’re doing machine learning in Python, you’ll almost certainly need to use NumPy random seed …
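As a rough sketch, here is a reproducible 80/20 split done with plain NumPy. The array data is a stand-in for a real dataset, and this is just one simple way to do a split, not the way any particular machine learning library does it.

```python
import numpy as np

data = np.arange(10)                      # stand-in for a dataset with 10 rows

np.random.seed(0)
shuffled = np.random.permutation(data)    # shuffled copy; data itself is unchanged
train, test = shuffled[:8], shuffled[8:]  # 80% training, 20% test

# Same seed, same shuffle, same split
np.random.seed(0)
shuffled_again = np.random.permutation(data)

print(np.array_equal(shuffled, shuffled_again))  # True
print(len(train), len(test))                     # 8 2
```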

Deep learning

More specifically, you’ll also probably use pseudo-random numbers if you want to do deep learning.

For example, if you want to do deep learning in Python, you’ll often need to split datasets into training and test sets (just like with other machine learning techniques). Again, this requires pseudo-random numbers.

… so when people do deep learning in Python, you’ll frequently see at least a few uses of numpy.random.seed.

Learn numpy to learn data science in Python

I’ve really only touched on a few applications of numpy.random.seed in Python. There are many more.

Speaking generally, if you want to use NumPy, you really need to know this little function.

But even though we focused on NumPy random seed in this tutorial, there are many other NumPy functions that you probably need to learn …

If you want to learn how to do data science in Python, NumPy is very important …

To learn more NumPy, sign up for our email list

If you want to learn NumPy and data science in Python, then sign up for our email list.

Here at Sharp Sight, we teach data science.

… and we regularly post FREE data science tutorials just like this one.

If you want to get our free tutorials delivered directly to your email inbox, then sign up now.

If you sign up for our email list, you’ll get tutorials about:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • scikit-learn
  • Machine learning
  • Deep learning
  • … and more

We also teach data science in R, so if you sign up, you’ll get tutorials for both languages.

So if you want to learn more data science for FREE, sign up now.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.

91 thoughts on “Numpy Random Seed, Explained”

  1. This is the first tutorial I read from this site. Definitely, the author goal is teaching not to show knowledge to others. I hope other tutorials of this site would be clear and nice as this one. Deeply appreciated.

  2. Thank you very much for this tutorial, I have been trying to understand this from many sources, this is the best explanation I read and understood properly.
    Thanks again!

  3. This was the only one place when I found the straight explanation to np.random.seed(). I was sooo confused about the use of that function, but you clarified it so well. Excellent.

  4. Hi, your tutorial was great, but I still have a question. What does numpy exactly do with the seed we give it to produce the results it does? How does giving a different seed give a different output?

  5. I liked ur way of explaining, maybe I will post this article for everyone on linked in thank very much for such good explanation on such topic, thank u again.

  6. Read to the “WTF … “, my mind “Hm…. interesting”. Read to the “F*#king”, my mind “OK, you got me”.
    p/s: greate content, easy to understand

  7. Long-winded but got the job done. Got really annoyed reading a whole lot of useless trash just to get to the relevant parts.

  8. Great F******g tutorial.

    I had one question tho. Is the computer generating the value for np.random.seed the only true random aspect? Like, is the function /urandom related to date/time or something similar?

  9. Very well written.
    Finding ways to get `true` random numbers brought me here.
    You should have added words a two about the `random.SystemRandom()` class.

  10. I actually never understood what np.random.seed does. But now I actually get it. Thanks for this article, it’s the best.

  11. I just started with numpy and met this seed thing and it scare me and I started because i had know fuckin idea what it was but thanks to you, I am good to continue

    • Thank you! I try to explain things as clearly as possible, but also try to keep a sense of humor (if only for my own amusement).

  12. that was good but also a lil stupid you wrote 1000000 words and didn’t explain whats the difference between np.random.seed(89) and np.random.seed(7777777) :)

  13. Awesome! Simple explanations for complex issues.
    Many thanks for your tutorials!

    There is one more question is left in my mind:
    what does it mean when we see the input np.random.seed(seed)?
