Over and over again, I emphasize the importance of data manipulation.
Data manipulation is the foundation for almost everything else in data science and machine learning.
So to be a great data scientist, one of your first goals should be to master data manipulation tools.
And if you’re doing data manipulation in Python, that means that you need to learn Numpy.
Numpy is easy in many ways, but many people still get stuck on it.
So in this blog post, I’ll give you 7 tips to help you learn Numpy.
1: Use the 80/20 rule
Numpy is a very powerful toolkit.
One of the reasons that it’s so powerful is the vast number of tools and techniques within the Numpy package.
I don’t know the exact number, but there are probably well over 100 functions, methods, and tools within the Numpy system.
But this poses a problem.
There are a lot of things to potentially learn.
You could easily fall down the Numpy rabbit hole and try to learn everything.
This is a bad strategy.
When you actually use Numpy, a small number of tools account for the vast majority of your code.
This is an example of the 80/20 rule. You’ve probably heard of this before.
According to the 80/20 rule, 80 percent of the outputs result from 20 percent of the inputs.
The 80/20 rule applies to all sorts of things, and for the most part, it also applies to Numpy.
Even though there are 100+ functions and methods in Numpy, you'll probably use only a couple dozen of them regularly.
You should focus relentlessly on learning and mastering those commonly used techniques.
(I’ll point out most of them in the upcoming sections.)
2: Learn How Arrays Are Structured
So, now let’s talk about some things specific to Numpy.
One of the first and most important things you need to know is how Numpy arrays are structured.
Let’s start with the basics:
Numpy arrays are data structures that hold numeric data. They can have one dimension, two dimensions, or more.
So they are very similar to vectors and matrices in mathematics and linear algebra.
Importantly, you always need to be mindful of the shape of the array you're working with (i.e., the number of rows, columns, and so on).
That's because some Numpy functions behave differently depending on the shapes of the arrays you give them. For example, Numpy multiply and Numpy divide operate element by element when the two arrays have the same shape. When the shapes differ, Numpy tries to "broadcast" the smaller array across the larger one, and it raises an error if the shapes are incompatible.
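Here is a minimal sketch of that difference. The arrays a, b, and row are made-up examples; the point is only how np.multiply and np.divide react to same-shaped versus differently shaped inputs.

```python
import numpy as np

# Two arrays with the same shape (2, 3): multiply works element by element
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])
same_shape_product = np.multiply(a, b)
# [[ 10  40  90]
#  [160 250 360]]

# A (2, 3) array and a (3,) array: the 1D array is broadcast across each row
row = np.array([1, 2, 3])
broadcast_quotient = np.divide(a, row)
# [[1.  1.  1. ]
#  [4.  2.5 2. ]]

# Shapes (2, 3) and (2,) are incompatible, so Numpy raises a ValueError
try:
    np.multiply(a, np.array([1, 2]))
except ValueError as err:
    print("Shapes do not broadcast:", err)
```

The same rules apply to the + and * operators, so it pays to check shapes before combining arrays.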
Having said this, you can check the shape and size of an array with its shape and size attributes.
For example, if you have an array named my_array, you can use my_array.shape to check the shape and my_array.size to check the total number of elements.
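A quick sketch of those attributes on a small, hypothetical my_array. The ndim attribute (the number of dimensions) is included as well, since it answers the "1, 2, or more dimensions?" question directly.

```python
import numpy as np

# A hypothetical 2D array with 2 rows and 3 columns
my_array = np.array([[1, 2, 3],
                     [4, 5, 6]])

print(my_array.shape)  # (2, 3): 2 rows, 3 columns
print(my_array.size)   # 6: total number of elements
print(my_array.ndim)   # 2: number of dimensions
```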
3: Learn Numpy Axes
You also need to learn about Numpy axes.
Axes in a Numpy array are very similar to the axes in a Cartesian (x/y/z) coordinate system. Remember that in a Cartesian coordinate system, we can identify a point in space by its location along the x-axis and its location along the y-axis.
Much like the axes in a Cartesian coordinate system, Numpy axes are directions. You can think of them as directions along the edges of the array.
For example, in a 2D array, axis-0 points downward and axis-1 points across.
This is important for a few reasons.
First, these axes enable you to locate individual cells in the array. To do this, you need to know which axis references the rows, which references the columns, etc.
Second, many Numpy functions can operate in a specific direction (i.e., along a specific axis). For example, Numpy sum can operate along specific axes to compute row sums and column sums.
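Both uses of axes can be sketched on a small example array (the name sales and its values are hypothetical). Indexing takes one position per axis, and np.sum takes an axis parameter that says which direction to collapse.

```python
import numpy as np

sales = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Locating cells: the first index moves along axis 0 (rows),
# the second along axis 1 (columns)
cell = sales[1, 2]           # row 1, column 2 -> 6
first_row = sales[0, :]      # [1 2 3]
last_column = sales[:, 2]    # [3 6]

# Operating along an axis
column_sums = np.sum(sales, axis=0)  # sum downward along axis 0: [5 7 9]
row_sums = np.sum(sales, axis=1)     # sum across along axis 1: [ 6 15]
total = np.sum(sales)                # no axis: every element -> 21
```

A common point of confusion: axis=0 produces column sums, because the sum runs down axis 0 and collapses it.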
I can’t stress this enough: Numpy axes come up over and over again. They are one of the most important things in the Numpy system.
To learn more about Numpy axes, you should read our full tutorial that explains how axes work.
4: Learn the Important Array Creation Methods
Once you understand the essentials of array structure and array axes, you need to know how to create arrays.
There are many, many ways to create Numpy arrays, but the ones that I use most often are:
- the Numpy array function
- Numpy arange
- Numpy linspace
- Numpy ones and Numpy zeros
- a few of the Numpy random functions like Numpy random normal and Numpy random choice
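Here is a brief sketch of each of the creation methods listed above. The argument values are arbitrary examples, and np.random.seed is used only so that the random draws are reproducible from run to run.

```python
import numpy as np

np.random.seed(42)  # make the random examples reproducible

from_list = np.array([1, 2, 3])        # from a Python list
counting = np.arange(0, 10, 2)         # start, stop (excluded), step: [0 2 4 6 8]
evenly_spaced = np.linspace(0, 1, 5)   # 5 evenly spaced points, stop included
all_ones = np.ones((2, 3))             # 2x3 array filled with 1.0
all_zeros = np.zeros((3, 2))           # 3x2 array filled with 0.0

# 4 draws from a normal distribution with mean 0 and standard deviation 1
normal_draws = np.random.normal(loc=0, scale=1, size=4)

# 5 random picks from a list (sampling with replacement by default)
picks = np.random.choice([10, 20, 30], size=5)
```

Note the difference between arange and linspace: arange takes a step size and excludes the stop value, while linspace takes a number of points and includes the stop value by default.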
To be clear, and as I mentioned previously, Numpy has a lot of array creation methods. There are functions for creating arrays of random numbers, including numbers drawn from many different probability distributions. I could go on and on.
But in the beginning, you need to focus on the most commonly used techniques. Remember: 80/20.
5: Learn How to Reshape Arrays
Once you have an array with some numbers in it, you'll often need to reshape it in some way.
For example, you may have a 1-dimensional array that you want to reshape to 2 dimensions, or vice versa.
This is fairly easy to do once you know how, but it comes up over and over again, so you should learn it early.
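A minimal sketch of going from 1D to 2D and back with the reshape method. The array values are arbitrary; the key rule is that the new shape must hold exactly the same number of elements as the old one.

```python
import numpy as np

flat = np.arange(6)            # [0 1 2 3 4 5], shape (6,)
grid = flat.reshape(2, 3)      # [[0 1 2]
                               #  [3 4 5]], shape (2, 3)
column = flat.reshape(-1, 1)   # -1 tells Numpy to infer that dimension: shape (6, 1)
back_to_1d = grid.reshape(-1)  # flatten back to shape (6,)

# 6 elements cannot fill a 4x2 array, so this raises a ValueError
try:
    flat.reshape(4, 2)
except ValueError as err:
    print("Cannot reshape:", err)
```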
To learn more about reshaping Numpy arrays, read this tutorial about Numpy reshape.
6: Learn Numpy Aggregation Techniques
Next, you should learn some array aggregation techniques.
The most common and most important are:
You’ll commonly use these techniques when you need to analyze the numbers in your array by computing summary statistics.
I'll point out that Numpy axes are particularly important when you use these aggregation functions. That's because you will often need to compute column statistics and row statistics (e.g., row means or column means).
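Here is a short sketch of row and column statistics. The choice of np.mean, np.max, and np.min is my own selection of common aggregations, not necessarily the exact list intended above, and the scores array is a made-up example.

```python
import numpy as np

# Hypothetical scores: 2 students (rows) x 3 tests (columns)
scores = np.array([[70, 80, 90],
                   [60, 75, 90]])

overall_mean = np.mean(scores)          # mean of all 6 values: 77.5
column_means = np.mean(scores, axis=0)  # mean of each test: [65.  77.5 90. ]
row_means = np.mean(scores, axis=1)     # mean of each student: [80. 75.]
row_max = np.max(scores, axis=1)        # best score per student: [90 90]
column_min = np.min(scores, axis=0)     # lowest score per test: [60 75 90]
```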
7: Learn Numpy Array Math
Finally, learn some essential array math.
At minimum, you should know how to add and subtract arrays (using np.add and np.subtract).
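A minimal sketch of element-wise addition and subtraction with two small example arrays. It also shows that the + and - operators give the same results as np.add and np.subtract.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

total = np.add(x, y)            # [11 22 33]
difference = np.subtract(y, x)  # [ 9 18 27]

# The + and - operators perform the same element-wise math
assert np.array_equal(total, x + y)
assert np.array_equal(difference, y - x)
```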
You may also want to learn about dot products with Numpy and other linear algebra operations. These can be useful for machine learning and deep learning. Having said that, they are slightly more advanced, and you may be able to avoid them in the beginning.
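If you do want a first look, here is a small sketch of np.dot on example vectors and a matrix. For 2D arrays, the @ operator performs the same matrix multiplication.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
dot_uv = np.dot(u, v)  # 1*4 + 2*5 + 3*6 = 32

m = np.array([[1, 0],
              [0, 2]])
w = np.array([3, 4])
mw = np.dot(m, w)      # matrix-vector product: [3 8]
same_mw = m @ w        # the @ operator gives the same result
```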
Leave Your Questions in The Comments
Do you have any questions about what I’ve written here? Do you have other suggestions or things you’re struggling with?
If so, leave your questions and suggestions in the comments section near the bottom of the page.
There’s more to learn, but start with the essentials
In this blog post, I’ve tried to distill Numpy down to the most important things, so you can learn Numpy more quickly. I’ve tried to explain the most important, most commonly used techniques and concepts that you should focus on first.
But, if you really want to master Numpy, there’s a lot more to learn.
So if you’re serious about mastering Numpy, and serious about data science in Python, you should consider joining our premium course called Numpy Mastery.
Numpy Mastery will teach you everything you need to know about Numpy, including:
- How to create Numpy arrays
- How to reshape, split, and combine your Numpy arrays
- What the “Numpy random seed” function does
- How to use the Numpy random functions
- How to perform mathematical operations on Numpy arrays
- and more …
Moreover, this course will show you a practice system that will help you master the syntax within a few weeks and memorize all of the Numpy syntax you learn. If you have trouble remembering Numpy syntax, this is the course you've been looking for.
Find out more here: