The post How to use the NumPy linspace function appeared first on Sharp Sight.

The NumPy linspace function is somewhat similar to the NumPy arange function, in that it creates sequences of evenly spaced numbers structured as a NumPy array.

There are some differences though. Moreover, some people find the linspace function to be a little tricky to use.

It’s not that hard to understand, but you really need to learn how it works.

This tutorial will explain how the NumPy linspace function works. It will explain the syntax, and it will also show you concrete examples of the function so you can see it in action.

Near the bottom of the post, this will also explain a little more about how np.linspace differs from np.arange.

Ok, first things first. Let’s look a little more closely at what the np.linspace function does and how it works.

The NumPy linspace function creates sequences of evenly spaced values within a defined interval.

Essentially, you specify a starting point and an ending point of an interval, and then specify the total number of breakpoints you want within that interval (*including* the start and end points). The np.linspace function will return a sequence of evenly spaced values on that interval.

To illustrate this, here’s a quick example. (We’ll look at more examples later, but this is a quick one just to show you what np.linspace does.)

np.linspace(start = 0, stop = 100, num = 5)

This code produces a NumPy array (an `ndarray` object) that looks like the following:

array([  0.,  25.,  50.,  75., 100.])

So what’s going on here?

Remember: the NumPy linspace function produces *evenly spaced observations* within a defined interval.

We specified that interval with the `start` and `stop` parameters. In particular, this interval starts at 0 and ends at 100.

We also specified that we wanted 5 observations within that range. So, the linspace function returned an `ndarray` with 5 evenly spaced elements. The first element is 0. The last element is 100. The remaining 3 elements are evenly spaced between 0 and 100.

As should be expected, the output array is consistent with the arguments we’ve used in the syntax.
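The spacing between neighbouring values follows directly from the arguments. With the default `endpoint = True`, `num` points create `num - 1` equal gaps, so each gap is `(stop - start) / (num - 1)`. Here’s a quick sketch that checks this for the example above, assuming NumPy is installed. The variable names are just for illustration.

```python
import numpy as np

start, stop, num = 0, 100, 5

values = np.linspace(start=start, stop=stop, num=num)

# With endpoint=True (the default), num points create num - 1 equal gaps
expected_step = (stop - start) / (num - 1)

print(values)            # [  0.  25.  50.  75. 100.]
print(np.diff(values))   # [25. 25. 25. 25.]
print(expected_step)     # 25.0
```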

Having said that, let’s look a little more closely at the syntax of the np.linspace function so you can understand how it works a little more clearly.

The syntax of NumPy linspace is very straightforward.

Obviously, when using the function, the first thing you need to do is call the function itself. To do this, you use the code `np.linspace` (assuming that you’ve imported NumPy as `np`).

Inside the call to `np.linspace`, you’ll typically supply 3 parameters: `start`, `stop`, and `num`. These are the 3 parameters that you’ll use most frequently with the linspace function. There are also a few other optional parameters that you can use.

Let’s talk about the parameters of np.linspace:

There are several parameters that help you control the `linspace` function: `start`, `stop`, `num`, `endpoint`, and `dtype`.


**start**

The `start` parameter is the beginning of the range of numbers.

So if you set `start = 0`, the first number in the new `ndarray` will be 0.

Keep in mind that this parameter is required.

**stop**

The `stop` parameter is the stopping point of the range of numbers.

In most cases, this will be the last value in the range of numbers. Having said that, if you set `endpoint = False`, this value will *not* be included in the output array. (See the examples below to understand how this works.)

**num** (optional)

The `num` parameter controls how many total items will appear in the output array. For example, if `num = 5`, then there will be 5 total items in the output array. If `num = 10`, then there will be 10 total items in the output array, and so on.

This parameter is optional. If you don’t provide a value for `num`, then np.linspace will use `num = 50` as a default.

**endpoint** (optional)

The `endpoint` parameter controls whether or not the `stop` value is included in the output array. If `endpoint = True`, then the value of the `stop` parameter will be included as the last item in the `ndarray`.

If `endpoint = False`, then the value of the `stop` parameter will **not** be included.

By default, `endpoint` is `True`.

**dtype** (optional)

Just like in many other NumPy functions, with `np.linspace`, the `dtype` parameter controls the data type of the items in the output array. If you don’t specify a data type, NumPy will *infer* it from `start` and `stop`. The inferred type is never an integer type, so np.linspace returns floats by default even when `start` and `stop` are integers.

If you do explicitly use this parameter, however, you can use any of the available data types from NumPy and base Python.
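A quick way to confirm the defaults described above is to call np.linspace with only `start` and `stop` and inspect the result. This is a small sketch. `float64` is NumPy’s standard default float type.

```python
import numpy as np

# Only start and stop: num defaults to 50 and endpoint defaults to True
default_values = np.linspace(start=0, stop=10)

print(default_values.size)    # 50
print(default_values[-1])     # 10.0 (the stop value is included)

# Integer start/stop still give floats when dtype is not specified
print(default_values.dtype)   # float64
```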

Keep in mind that you won’t use all of these parameters every time that you use the np.linspace function. Several of these parameters are optional.

Moreover, `start`, `stop`, and `num` are much more commonly used than `endpoint` and `dtype`.

Also keep in mind that you don’t need to explicitly use the parameter names. You can write code *without* the parameter names themselves; you can add the arguments as “positional arguments” to the function.

Here’s an example:

np.linspace(0, 100, 5)

This code is functionally identical to the code we used in our previous examples: `np.linspace(start = 0, stop = 100, num = 5)`.

The main difference is that we did not explicitly use the `start`, `stop`, and `num` parameters. Instead, we provided arguments to those parameters by *position*. When you don’t use the parameter names explicitly, Python knows that the first number (0) is supposed to be the `start` of the interval. It knows that 100 is supposed to be the `stop`. And it knows that the third number (5) corresponds to the `num` parameter. Again, when you don’t explicitly use the parameter names, Python assigns the argument values to parameters *strictly by position*: which value appears first, second, third, etc.

You’ll see people do this frequently in their code. People will commonly exclude the parameter names in their code and use positional arguments instead. Although I realize that it’s a little faster to write code with positional arguments, I think that it’s clearer to actually use the parameter names. As a best practice, you should probably use them.

Now that you’ve learned how the syntax works, and you’ve learned about each of the parameters, let’s work through a few concrete examples.

A quick example

np.linspace(start = 0, stop = 1, num = 11)

Which produces the output array:

array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

An example like this would be useful if you’re working with percents in some way. For example, if you were plotting percentages or plotting “accuracy” metrics for a machine learning classifier, you might use this code to construct part of your plot. Explaining how to do that is beyond the scope of this post, so I’ll leave a deeper explanation of that for a future blog post.

A very similar example is creating a range of values from 0 to 100, in breaks of 10.

np.linspace(start = 0, stop = 100, num = 11)

The code for this is almost identical to the prior example, except we’re creating values from 0 to 100.

Since it’s somewhat common to work with data with a range from 0 to 100, a code snippet like this might be useful.
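If you want to double-check these two sequences, you can compare them against values built with np.arange. This is just a sanity check. `np.allclose` is used for the 0-to-1 case because values such as 0.1 aren’t exactly representable as floats, so tiny rounding differences are possible.

```python
import numpy as np

percents = np.linspace(start=0, stop=1, num=11)
tens = np.linspace(start=0, stop=100, num=11)

# 0.0, 0.1, ..., 1.0 (compared with a tolerance because of float rounding)
print(np.allclose(percents, np.arange(11) / 10))     # True

# 0, 10, 20, ..., 100
print(np.array_equal(tens, np.arange(0, 101, 10)))   # True
```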

As mentioned earlier in this blog post, the `endpoint` parameter controls whether or not the `stop` value is included in the output array.

By default (if you don’t set any value for `endpoint`), this parameter will have the default value of `True`. That means that the value of the `stop` parameter *will* be included in the output array (as the final value).

However, if you set `endpoint = False`, then the value of the `stop` parameter will *not* be included.

Here’s an example.

In the following code, `stop` is set to 5.

np.linspace(start = 1, stop = 5, num = 4, endpoint = False)

But because we’re also setting `endpoint = False`, 5 will *not* be included as the final value.

Instead, the output `ndarray` contains 4 evenly spaced values (i.e., `num = 4`), starting at 1, up to but *excluding* 5:

array([ 1., 2., 3., 4.])

Personally, I find `endpoint = False` a little unintuitive, so I don’t use it often. But if you have a reason to use it, this is how to do it.

As mentioned earlier, the NumPy linspace function infers the data type from the other input arguments. Unless you set `dtype`, the output is an array of floats.

If you want to manually specify the data type, you can use the `dtype` parameter.

This is very straightforward. Using the `dtype` parameter with np.linspace works the same way as specifying the data type with np.array, np.arange, and other NumPy functions.

Essentially, you use the `dtype` parameter and indicate the exact Python or NumPy data type that you want for the output array:

np.linspace(start = 0, stop = 100, num = 5, dtype = int)

In this case, when we set `dtype = int`, the linspace function produces an `ndarray` object with *integers* instead of floats: `array([  0,  25,  50,  75, 100])`.

Again, Python and NumPy have a variety of available data types, and you can specify any of these with the `dtype` parameter.

If you’re familiar with NumPy, you might have noticed that np.linspace is rather similar to the np.arange function.

The essential difference between NumPy linspace and NumPy arange is how you describe the sequence. With linspace, you control the precise end value and the number of values, and the spacing is computed for you. With arange, you control the increment (`step`) between values directly, and the `stop` value itself is excluded.

To be clear, if you use them carefully, both linspace and arange can be used to create evenly spaced sequences. To a large extent, these are two similar tools for creating sequences, and which one you use will be a matter of preference. I personally find np.arange to be more intuitive, so I tend to prefer arange over linspace. Again though, this will mostly be a matter of preference, so try them both and see which you prefer.
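To make the difference concrete, here’s one way to build the same sequence (0, 2, 4, 6, 8, 10) with each function. With linspace, you state how many values you want, and the end value is included. With arange, you state the step, and because `stop` is excluded, it has to be set past 10.

```python
import numpy as np

# linspace: start, stop (included) and the number of values
by_count = np.linspace(start=0, stop=10, num=6)

# arange: start, stop (excluded) and the step size
by_step = np.arange(start=0, stop=11, step=2)

print(by_count)   # [ 0.  2.  4.  6.  8. 10.]
print(by_step)    # [ 0  2  4  6  8 10]
print(np.array_equal(by_count, by_step))   # True (values match; dtypes differ)
```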

Here at Sharp Sight, we teach data science. We want to help you master data science as fast as possible.

If you sign up for our email list, you’ll receive Python data science tutorials delivered to your inbox.

You’ll get free tutorials on:

- NumPy
- Pandas
- Base Python
- Scikit learn
- Machine learning
- Deep learning
- … and more

… delivered to your inbox every week.

Want to learn data science in Python? Sign up now.


The post How to use the NumPy arange function appeared first on Sharp Sight.

If you’re learning data science in Python, the NumPy toolkit is important. The NumPy arange function is particularly important because it’s very common; you’ll see the np.arange function in a lot of data science code.

Having said that, this tutorial will show you how to use the NumPy arange function in Python.

It will explain how the syntax works. It will also show you some working examples of the np.arange function, so you can play with it and see how it operates.

The NumPy arange function returns evenly spaced numeric values within an interval, stored as a NumPy array (i.e., an `ndarray` object).

That might sound a little complicated, so let’s look at a quick example.

We can call the `arange()` function like this:

numpy.arange(5)

Which will produce a NumPy array like this:

array([0, 1, 2, 3, 4])

What happened here?

The np.arange function produced a sequence of 5 evenly spaced values from 0 to 4, stored as an `ndarray` object (i.e., a NumPy array).

Having said that, what’s actually going on here is a little more complicated, so to fully understand the np.arange function, we need to examine the syntax. Once we look at the syntax, I’ll show you more complicated examples which will make everything more clear.

The syntax for NumPy arange is pretty straightforward.

Like essentially all of the NumPy functions, you call the function name and then there are a set of parameters that enable you to specify the exact behavior of the function.

Assuming that you’ve imported NumPy into your environment as `np`, you call the function with `np.arange()`.

Then inside of the `arange()` function, there are 4 parameters that you can modify:

- `start`
- `stop`
- `step`
- `dtype`

Let’s take a look at each of those parameters, so you know what each one does.

**start** (optional)

The `start` parameter indicates the beginning value of the range.

This parameter is *optional*, so if you omit it, it will automatically default to 0.

**stop** (required)

The `stop` parameter indicates the end of the range. Keep in mind that, like Python’s built-in `range()` and list slicing, this value will *not* be included in the resulting range. (The examples below will explain and clarify this point.) So essentially, the sequence of values will extend up to but excluding the `stop` value.

Additionally, this parameter is *required*, so you need to provide a `stop` value.

**step** (optional)

The `step` parameter specifies the spacing between values in the sequence.

This parameter is optional. If you don’t specify a `step` value, by default the `step` value will be 1.

**dtype** (optional)

The `dtype` parameter specifies the data type.

Python and NumPy have a variety of data types that can be used here.

Having said that, if you don’t specify a data type, it will be inferred based on the other arguments to the function.

Now, let’s work through some examples of how to use the NumPy arange function.

Before you start working through these examples though, make sure that you’ve imported NumPy into your environment:

# IMPORT NUMPY
import numpy as np

Ok, let’s get started.

First, let’s use a simple case.

Here, we’re going to create a simple NumPy array with 5 values.

np.arange(stop = 5)

Which produces something like the following array:

array([0, 1, 2, 3, 4])

Notice a few things.

First, we didn’t specify a `start` value. Because of this, the sequence starts at “0.”

Second, when we used the code `stop = 5`, the “5” serves as the `stop` position. This causes NumPy to create a sequence of values starting from 0 (the `start` value) up to but *excluding* this stop value.

Next, notice the spacing between values. The values are increasing in “steps” of 1. This is because we did *not* specify a value for the `step` parameter. If we don’t specify a value for the `step` parameter, it defaults to 1.

Finally, the data type is integer. We didn’t specify a data type with the `dtype` parameter, so NumPy has inferred the data type from the other arguments to the function.

One last note about this example. In this example, we’ve explicitly used the `stop =` parameter. It’s possible to remove the parameter name itself, and just leave the argument, like this:

np.arange(5)

Here, the value “5” is treated as a positional argument to the `stop` parameter. Python “knows” that the value “5” serves as the `stop` point of the range. Having said that, I think it’s much clearer to explicitly use the parameter names. It’s clearer if you just type `np.arange(stop = 5)`.

Now that you’ve seen a basic example, let’s look at something a little more complicated.

Here, we’re going to create a range of values from 0 up to (but excluding) 8, in increments of 2.

To do this, we will use a `start` position of 0 and a `stop` position of 8. To increment in steps of 2, we’ll set the `step` parameter to 2.

np.arange(start = 0, stop = 8, step = 2)

The code creates an `ndarray` object like this:

array([0, 2, 4, 6])

In other words, it produces the sequence `0, 2, 4, 6`, stored as a NumPy array.

Let’s take a step back and analyze how this worked. The output range begins at 0. This is because we set `start = 0`.

The output range then continues from 0 in steps of 2: `2, 4, 6`.

The range *stops* at 6. Why? We set the `stop` parameter to 8. Remember though, numpy.arange() will create a sequence up to but *excluding* the `stop` value. So once arange() gets to 6, the function can’t go any further. If it attempts to increment by the step value of 2, it will produce the value of 8, which should be excluded, according to the syntax `stop = 8`. Again, np.arange will produce values up to but *excluding* the `stop` value.

As noted above, you can also specify the data type of the output array by using the `dtype` parameter.

Here’s an example:

np.arange(start = 1, stop = 5, dtype = 'float')

Which produces:

array([1., 2., 3., 4.])

First of all, notice the decimal point after each number. This indicates that these are floats instead of integers.

How did we create this? This is very straightforward, if you’ve understood the prior examples.

We’ve called the np.arange function starting from 1 and stopping at 5. And we’ve set the datatype to float by using the syntax `dtype = 'float'`.

Keep in mind that we used floats here, but we could have used one of several different data types. Python and NumPy have a couple dozen different data types. These are all available when manipulating the `dtype` parameter.

It’s also possible to create a 2-dimensional NumPy array with numpy.arange(), but you need to use it in conjunction with the NumPy reshape method.

Ultimately, we’re going to create a 3 by 3 array with the values from 1 to 9.

But before we do that, there’s an “intermediate” step that you need to understand.

First, let’s just create a 1-dimensional array with the values from 1 to 9:

np.arange(start = 1, stop = 10, step = 1)

This code creates a NumPy array like this:

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Notice how the code worked. The sequence of values starts at 1, which is the `start` value. The final value is 9. That’s because we set the `stop` parameter equal to 10; the range of values will be up to but *excluding* the `stop` value. Because we’re incrementing in steps of 1, the last value in the range will be 9.

Ok. Let’s take this a step further to turn this into a 3 by 3 array.

Here, we’re going to use the reshape method to re-shape this 1-dimensional array into a 2-dimensional array.

np.arange(start = 1, stop = 10, step = 1).reshape((3,3))

Notice that this is just a modification of the code we used a moment ago. We’re using `np.arange(start = 1, stop = 10, step = 1)` to create a sequence of numbers from 1 to 9.

Then we’re calling the `reshape()` method to re-shape the sequence of numbers. We’re reshaping it from a 1-dimensional NumPy array to a 2-dimensional array with 3 rows and 3 columns.

In the last few examples, I’ve given you an overview of how the NumPy arange function works.

It’s pretty straightforward once you understand the syntax, and it’s not that hard to learn.

Having said that, make sure that you study and practice this syntax. Like all programming techniques, the key to mastery is repeated practice. Use the simple examples that I’ve shown above and try to recall the code. Practice and review this code until you know how the syntax works from memory.

If you have more questions about the NumPy arange function, leave your questions in the comments below.



The post How to use the NumPy zeros function appeared first on Sharp Sight.

If you’re doing data science in Python, you need to be able to work with numerical data. One of the primary tools for working with numerical data is the NumPy array.

In some cases, you will have data that you can import into Python. In those cases, you can use the NumPy `array()` function to transform the data into a NumPy array.

But in other cases, you might be starting from scratch. You might not have any data yet, or you might otherwise need to create an empty NumPy array.

To do this, you can create an array that contains only zeros using the NumPy `zeros()` function.

In this blog post, I’ll explain how to use the NumPy zeros function. I’ll explain the syntax of the function, some of the parameters that help you control the function, and I’ll show you some examples of how it works.

However, before we dive into the syntax of NumPy zeros, we’re going to quickly review NumPy arrays. To understand the NumPy zeros function, it helps to have a good understanding of NumPy arrays.


A NumPy array is basically like a container that holds numeric data that’s all of the same data type. I’m simplifying things a little, but that’s the essence of them.

We can create a very simple NumPy array as follows:

import numpy as np
np.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])


Here, we’ve used the NumPy array function to create a 2-dimensional array with 2 rows and 6 columns. Notice as well that all of the data are integers. Again, in a NumPy array, all of the data must be of the same data type.

Keep in mind that NumPy arrays can be quite a bit more complicated as well. It’s possible to construct 3-dimensional arrays and N-dimensional arrays. For the sake of clarity though, we’ll work with 1 and 2-dimensional arrays in this post.

There will be times when you’ll need to create an empty array. For example, you may be performing some computations and you need a structure to hold the results of those computations. In such a case, you might want to create an “empty” array, that is, an array filled with zeros.

One way of doing this is with the NumPy `array()` function. You can create an empty NumPy array by passing in a Python list with all zeros:

np.array([[0,0,0],[0,0,0]])

The problem with this though is that it may not always be efficient. It’s a little cumbersome. And it would be *very* cumbersome if you needed to create a very large array or an array with high dimensions. For example, here’s some code to create an array with 3 rows and 30 columns that contains all zeros:

np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] ,[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] ,[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]])

Honestly, this is a little ridiculous (and would be even more ridiculous if you made a larger array).

Hypothetically, you could also create a 1-dimensional array with the right number of zeros, and then reshape it to the correct shape using the NumPy reshape method.

np.zeros(90).reshape((3,30))

To be honest though, that method is also cumbersome and a little prone to errors.

There must be a better way, right?

Yes, there is.

Enter, the NumPy zeros function.

The NumPy zeros function enables you to create NumPy arrays that contain only zeros.

Importantly, this function enables you to specify the exact *dimensions* of the array. It also enables you to specify the exact *data type*.

The syntax works like this:

You basically call the function with the code `numpy.zeros()`.

Then inside the `zeros()` function, there is a set of arguments. The first positional argument is a tuple of values that specifies the dimensions of the new array. For a 1-dimensional array, a single integer also works.

Next, there’s an argument that enables you to specify the data type. If you don’t specify a data type, `np.zeros()` will use floats by default.

The syntax for using the zeros function is pretty straightforward, but it’s always easier to understand code when you have a few examples to work with. That being the case, let’s take a look at some examples.

Let’s first take a look at a very simple example.

Here, we’re just going to create a 1-dimensional NumPy array with 5 zeros.

np.zeros(5)

Which creates a NumPy array that looks something like this:

array([0., 0., 0., 0., 0.])

This is very simple. There are 5 elements in the array. They are zeros. Moreover, they are all floating point numbers. Remember, the elements of a NumPy array must all be of the same data type, and if we don’t specify the data type, the function will create floats by default.
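You can confirm the default float type and the 1-dimensional shape by inspecting the array’s attributes:

```python
import numpy as np

zeros_1d = np.zeros(5)

print(zeros_1d)         # [0. 0. 0. 0. 0.]
print(zeros_1d.dtype)   # float64
print(zeros_1d.shape)   # (5,)
```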

That leads us to the next example of how to use the NumPy zeros function: creating a NumPy array with a specific data type.

Here, we’ll create a NumPy array with all zeros *of a specific data type*.

This is very easy to do. We’re going to call the `zeros()` function just like we’ve done previously in this blog post.

But inside the function, we’re going to use the `dtype` parameter to specify the data type of elements in the output array.

Here, we’re going to create an array with *integers*, so we’ll use the code `dtype = int`.

np.zeros(3, dtype = int)

Which produces a NumPy array that looks like this:

array([0, 0, 0])

Notice a few things. First of all, all of the elements of this array are *integers*.

You can examine the data type with the following code:

np.zeros(3, dtype = int).dtype

This code retrieves the data type attribute from the resulting NumPy array. On most 64-bit systems, the data type is `int64` (the default integer size can vary by platform):

dtype('int64')

This is just one example. Python and NumPy support a couple dozen different datatypes, so you can use this technique to create a variety of NumPy arrays with zeros of specific data types (floats, complex numbers, etc).
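Here’s a small sketch with a few of those data types. The dictionary keys are just labels for this example. Note that for booleans, “zero” means `False`.

```python
import numpy as np

zero_arrays = {
    "float32": np.zeros(3, dtype=np.float32),
    "int32": np.zeros(3, dtype=np.int32),
    "complex": np.zeros(3, dtype=complex),   # complex128 under the hood
    "bool": np.zeros(3, dtype=bool),         # zeros become False
}

for name, arr in zero_arrays.items():
    print(name, arr.dtype, arr)
```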

Next, let’s talk about creating arrays with a specific *shape*.

We can do this with the shape parameter:

np.zeros(shape = (2, 3))

Which produces an array like the following:

array([[0., 0., 0.],
       [0., 0., 0.]])

Technically though, you don’t need to explicitly call out the `shape =` parameter; you can simply specify the shape with a tuple of values, and Python will infer that it refers to the shape (i.e., `shape` is a “positional argument”).

np.zeros((2, 3))

You’ll see people do this a lot. They’ll create NumPy arrays with a specific shape, but won’t explicitly include the `shape =` parameter in the syntax. I think that’s a bad practice though. Be explicit. I think it’s better to type `shape = (2, 3)` and explicitly use the `shape` parameter. It’s just clearer.

Finally, let’s put all of those pieces together and create a NumPy zeros array with a specific shape and a specific data type:

np.zeros(shape = (3, 5), dtype = 'int')

Which will produce a NumPy array that looks something like this:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

What did we do here?

We used the NumPy zeros function to create a 3 by 5 NumPy array that contains all zeros. Integers, to be specific. 3 rows, 5 columns.

To do this, we used the `shape` parameter to specify the shape, and we used the `dtype` parameter to specify the data type.

If you’re feeling a little ambitious, you can use this tool to create much larger arrays and arrays with different shapes.

np.zeros(shape = (3, 3, 5), dtype = 'int')

Which produces a 3-dimensional array of integer zeros: 3 blocks, each with 3 rows and 5 columns.
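Rather than printing the whole thing, you can inspect a few attributes to confirm the structure. Indexing the first position pulls out one 3-by-5 block.

```python
import numpy as np

cube = np.zeros(shape=(3, 3, 5), dtype='int')

print(cube.ndim)    # 3
print(cube.shape)   # (3, 3, 5)
print(cube.size)    # 45, i.e. 3 * 3 * 5
print(cube[0])      # the first 3-by-5 block of zeros
```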

I recommend that you play with the syntax a little, but make sure that you practice the syntax with simple cases first. Several of the examples earlier in this post are perfect for daily practice. Try to recall the syntax. Try to remember it and practice it regularly until you have it memorized.



The post How to use Numpy reshape appeared first on Sharp Sight.

When working with NumPy arrays, you’re going to need to be able to perform basic data manipulation.

In particular, you may need to change the “shape” of the data; you may need to change how the data are arranged in the NumPy array.

To do this, you can use the **NumPy reshape method**.

In this blog post, I’ll explain the NumPy reshape method. I’ll give you a quick overview of the `shape` attribute of NumPy arrays, explain how the syntax works, and show you a couple of examples of the NumPy reshape method.

Ok. Let’s get started.

Before we dive into the syntax of `reshape()`, I’ll quickly review some of the basics of NumPy arrays.


NumPy arrays are a structure in Python that hold numerical values that are all of the same type.

For example, consider a NumPy array that contains five values: `88`, `19`, `46`, `74`, `94`. All of these values have the same data type (in this case, they are integers).

If you’re not terribly familiar with them, you should read our quick introduction to NumPy arrays. That blog post will give you a solid foundation on arrays and how they work.

An important thing about NumPy arrays is that they have a *shape*.

So if we create a simple NumPy array, the array will have a `shape` attribute that we can reference.

I’m going to show you more syntax later in this blog post, but here I’ll quickly give you an example so you can see what the `shape` attribute is.

import numpy as np
simple_array = np.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
print(simple_array.shape)

OUTPUT:

(2, 6)

Notice what I did here.

I created an array with 2 rows and 6 columns.

Then I referenced the `shape` attribute with the code `simple_array.shape`. Python displayed the `shape` attribute as a tuple of values: `(2, 6)`. Python always returns the `shape` as a tuple.

So what exactly is the `shape`?

The `shape` attribute tells us how many elements are along each dimension. Said differently, the `shape` attribute essentially tells us how the values are laid out inside of the NumPy array.

That might not make sense yet, so it’s helpful to look at a few more examples. We’ll look at some toy examples to try to illustrate the `shape` attribute and how it relates to the reshape method.

Let’s create and examine another 2-dimensional array:

array_3x3 = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(array_3x3)

OUTPUT:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

This array has 3 elements along the first dimension and 3 elements along the second dimension. Technically speaking, this particular array has a `shape` of `(3,3)`. (Remember that Python displays the `shape` attribute as a tuple of values.)

Because this is two-dimensional, you can think of the `shape` in terms of rows and columns. There are 3 rows and 3 columns. If we’re thinking in 2 dimensions, that’s basically what the `shape` tells us … the number of rows and columns.

Let’s take a look at another example.

array_2x6 = np.array([[1,2,3,4,5,6],[7,8,9,10,11,12]])
print(array_2x6.shape)

OUTPUT:

(2, 6)

This array has 2 rows and 6 columns. It is a 2 by 6 array. A little more technically, the `shape` is `(2,6)`.

Ok. One more example.

array_6x2 = np.array([[1,2],[3,4],[5,6],[7,8],[9,10],[11,12]])
print(array_6x2.shape)

OUTPUT:

(6, 2)

Here, this array has 6 rows and 2 columns. Its shape is `(6, 2)`.

Do you get it?

The `shape` of a NumPy array tells us the number of elements along the dimensions of the array.

If we’re working with 2-dimensional arrays, the `shape` basically tells us the number of rows and columns. And if we’re working with higher dimensional arrays, the `shape` tells us the number of elements along all of the dimensions: dimension 1, dimension 2, dimension 3, etc.

Now that you understand the `shape` attribute of NumPy arrays, let’s talk about the NumPy reshape method.

NumPy reshape enables us to *change* the shape of a NumPy array.

For example, if we have a 2 by 6 array, we can use `reshape()` to re-shape the data into a 6 by 2 array.

In other words, the NumPy reshape method helps us reconfigure the data in a NumPy array. It enables us to change a NumPy array from one shape to a new shape. It “re-shapes” the data.
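Here’s a sketch of that 2 by 6 to 6 by 2 example. Notice that reshape keeps the values in their original row-by-row order. It doesn’t swap rows and columns the way a transpose (`.T`) would.

```python
import numpy as np

array_2x6 = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
array_6x2 = array_2x6.reshape((6, 2))

print(array_6x2)
# [[ 1  2]
#  [ 3  4]
#  [ 5  6]
#  [ 7  8]
#  [ 9 10]
#  [11 12]]

# Not the same as the transpose, which pairs 1 with 7, 2 with 8, and so on
print(np.array_equal(array_6x2, array_2x6.T))   # False
```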

Let’s take a closer look at how `reshape()` works.

When we use the `reshape()` method, we need to have an existing NumPy array.

We then use the Python “dot” notation to call the method.

Inside of the call to `reshape()`, we need to provide a tuple of values that specify the shape of the new array.

Keep in mind that the `reshape()` method doesn’t operate directly on the original NumPy array. It produces a *new* array. What that means is that you need to save the output in some way. Typically, that means that you’ll use the equal sign to assign the new array to a new name (which I’ve called “`new_array`” in the image above).
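Here is a quick way to check that the original array keeps its shape. The names `original_array` and `new_array` are hypothetical. One caveat worth knowing: when it can, NumPy returns a *view* that shares the same underlying data, so assigning new values into `new_array` can also change the values in `original_array`. `np.shares_memory()` lets you check this.

```python
import numpy as np

original_array = np.array([1, 2, 3, 4, 5, 6])

# reshape() returns a new array object; original_array keeps its shape
new_array = original_array.reshape((2, 3))

print(original_array.shape)  # (6,)
print(new_array.shape)       # (2, 3)

# For a simple contiguous array like this one, the result is a view
print(np.shares_memory(original_array, new_array))  # True
```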

Also keep in mind that the new array must have the same number of elements as the original array.
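A minimal sketch of what happens when the element counts don’t match (the name `twelve_elements` is just for illustration). NumPy raises a `ValueError` rather than padding or dropping values.

```python
import numpy as np

twelve_elements = np.arange(1, 13)  # 12 elements: 1 through 12

# 12 elements can be arranged as 3 rows x 4 columns ...
print(twelve_elements.reshape((3, 4)).shape)  # (3, 4)

# ... but not as 5 x 3, which would need 15 elements
try:
    twelve_elements.reshape((5, 3))
except ValueError as err:
    print("ValueError:", err)
```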

Ok. Now that I’ve shown you the syntax for `reshape()`, let’s quickly work through a simple example.

First, I’ll create a NumPy array. To use the reshape method, you need to have an existing NumPy array.

import numpy as np
simple_array = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
print(simple_array.shape)

OUT:

(12,)

Here, we’ve created a simple NumPy array with 12 elements called `simple_array`. When we retrieve the `shape` attribute, you can see that the shape is `(12,)`.

That might be a little confusing, so let me explain.

`simple_array` is a 1-dimensional array, not a 2-dimensional array. Therefore, we can’t think of this in terms of rows and columns, per se. There is strictly 1 dimension, and all 12 of the elements are aligned along that dimension. That’s why the `shape` attribute shows up as a single value within a tuple: `(12,)`.
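To see the difference between a 1-dimensional array and a 2-dimensional array with a single row, you can compare their `shape` and `ndim` attributes. This is a small sketch; `one_row` is a hypothetical name.

```python
import numpy as np

simple_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Same 12 values, but arranged as 1 row and 12 columns (2 dimensions)
one_row = simple_array.reshape((1, 12))

print(simple_array.shape, simple_array.ndim)  # (12,) 1
print(one_row.shape, one_row.ndim)            # (1, 12) 2
```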

Ok. Now that we have an array, let’s reshape it.

new_array_2x6 = simple_array.reshape((2,6))
print(new_array_2x6)

When we use the print function to print out the array, you can see that the data have been re-shaped into a format with 2 rows and 6 columns:

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]

We can also directly examine the shape by retrieving the `shape` attribute:

print(new_array_2x6.shape)

Which gives us the following tuple of values:

(2, 6)

You can see that the shape of `new_array_2x6` is `(2, 6)`.

So what did we do here?

We used the reshape method to re-shape the data in `simple_array` into a new array with a new shape.

The original array had a shape of `(12,)`, but the new array – which we’ve called `new_array_2x6` – has a shape of `(2, 6)`. It has 2 rows and 6 columns.

Also, notice that the new array has the same number of elements as the original array. The original array, `simple_array`, had 12 total elements, all arranged in a single dimension. The new array, `new_array_2x6`, also has 12 elements. *This is required*. When you use the reshape method, it needs to produce an output array with the same number of elements as the original array. That is, the new array needs to have the same `size` as the original array.

Let’s work through another example by reshaping our data once more.

We’re going to take the array that we just created, `new_array_2x6`, and re-shape it into a NumPy array with a different shape.

We’re going to create an array with 6 rows and 2 columns.

new_array_6x2 = new_array_2x6.reshape((6,2))
print(new_array_6x2)

And here’s the output of the `print()` function:

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]

What did we do here?

We re-shaped the array `new_array_2x6`. That array had 2 rows and 6 columns.

This *new* array – `new_array_6x2` – has 6 rows and 2 columns. The values have been rearranged into a new shape, so to speak.

The data is the same. The individual observations are the same. The *number* of observations is the same. The primary difference is that the data in the new array are laid out in a new form. `new_array_6x2` has a new shape.
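Note that reshaping is not the same as swapping rows and columns. `reshape()` keeps the values in their original row-by-row order and just “re-wraps” them, while the transpose (`.T`) actually swaps rows and columns. This sketch compares the first column of each result; `reshaped` and `transposed` are hypothetical names.

```python
import numpy as np

new_array_2x6 = np.array([[1, 2, 3, 4, 5, 6],
                          [7, 8, 9, 10, 11, 12]])

# reshape() keeps the row-by-row order of the values
reshaped = new_array_2x6.reshape((6, 2))

# .T (transpose) swaps rows and columns instead
transposed = new_array_2x6.T

# Both have shape (6, 2), but the values are arranged differently
print(reshaped[:, 0].tolist())    # [1, 3, 5, 7, 9, 11]
print(transposed[:, 0].tolist())  # [1, 2, 3, 4, 5, 6]
```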

In this blog post, I’ve shown you examples of how to reshape data in 1 and 2 dimensions. But it’s possible to reshape data in more dimensions as well (e.g., reshaping data in 3 dimensions, etc.).

In the interest of simplicity, I won’t cover those topics here (although I might address them in a future blog post).

If you’re just getting started with NumPy and data manipulation in Python, I strongly recommend that you work with very simple cases. Work on and *master* reshaping data in 1 and 2 dimensions first. That will help you build mastery of the basic syntax. It will also help build your intuition about how the toolkit works.

Once you’ve mastered the basics, you can increase the complexity to higher dimensions and larger datasets.

Are you still confused about how to use the NumPy reshape method?

Leave your questions in the comments below.

The post How to use Numpy reshape appeared first on Sharp Sight.

The post A quick introduction to the NumPy array appeared first on Sharp Sight.

That being the case, if you want to learn data science in Python, you’ll need to learn how to work with NumPy arrays.

In this blog post, I’ll explain the essentials of NumPy arrays, including:

- What NumPy arrays are
- How to create a NumPy array
- Attributes of NumPy arrays
- Retrieving individual values from NumPy arrays
- Slicing NumPy arrays
- Creating and working with 2-dimensional NumPy arrays

Let’s get started.

A NumPy array is a collection of elements that have the same data type.

You can think of it like a container that has several compartments that hold data, as long as the data is of the same data type.

Visually, we can represent a simple NumPy array sort of like this:

Let’s break this down.

We have a set of integers: 88, 19, 46, 74, 94. These values are all integers; they are all of the same type. You can see that these values are stored in “compartments” of a larger structure. The overall structure is the NumPy array.

Very quickly, I’ll explain a little more about some of the properties of a NumPy array.

As I mentioned above, NumPy arrays must contain data all of the same type.

That means that if your NumPy array contains integers, *all* of the values must be integers. If it contains floating point numbers, *all* of the values must be floats.
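You can see this rule in action when you mix types in the input list. In this sketch (the names `ints` and `mixed` are illustrative), NumPy doesn’t store a mix of integers and floats; it converts all of the values to a single common type, here a float type.

```python
import numpy as np

# All integers -> an integer array
ints = np.array([88, 19, 46])

# One float in the list -> every value is stored as a float
mixed = np.array([88, 19.5, 46])

print(ints.dtype.kind)   # i  (a signed integer type)
print(mixed.dtype)       # float64
print(mixed.tolist())    # [88.0, 19.5, 46.0]
```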

I won’t write extensively about data types and NumPy data types here. There is a section below in this blog post about how to create a NumPy array of a particular type. Having said that, if you want to learn a lot more about the various data types that are available in NumPy, then (as the saying goes) read the f*cking manual.

Each of the compartments inside of a NumPy array has an “address.” We call that address an “index.”

If you’re familiar with computing in general, and Python specifically, you’re probably familiar with indexes. Many data structures in Python have indexes, and the indexes of a NumPy array essentially work the same.

If you’re not familiar with indexes though, let me explain. Again, an index is sort of like an address. These indexes enable you to reference a specific value. We call this indexing.

Just like other Python structures that have indexes, the indexes of a NumPy array begin at zero:

So if you want to reference the value in the very first location, you need to reference location “0”. In the example shown here, the value at index 0 is `88`.

I’ll explain how exactly to use these indexes syntactically, but to do that, I want to give you working examples. To give you working examples, I’ll need to explain how to actually create NumPy arrays in Python.

There are a lot of ways to create a NumPy array. Really. A lot. Off the top of my head, I can think of at least a half dozen techniques and functions that will create a NumPy array. In fact, the purpose of many of the functions in the NumPy package is to create a NumPy array of one kind or another.

But this blog post is intended to be a *quick* introduction to NumPy arrays. That being the case, I don’t want to show you every possible way to make a NumPy array. Let’s keep this simple. I’ll show you a few very basic ways to do it.

In particular, I’ll show you how to use the NumPy `array()` function.

To use the NumPy `array()` function, you call the function and pass in a Python list as the argument.

Let’s take a look at some examples. We’ll start by creating a 1-dimensional NumPy array.

Creating a 1-dimensional NumPy array is easy.

You call the function with the syntax `np.array()`. Keep in mind that before you call `np.array()`, you need to import the NumPy package with the code `import numpy as np`.

When you call the `array()` function, you’ll need to provide a list of elements as the argument to the function.

# import NumPy
import numpy as np
# create a NumPy array from a list of 3 integers
np.array([1,2,3])

This isn’t complicated, but let’s break it down.

We’ve called the `np.array()` function. The argument to the function is a list of three integers: `[1,2,3]`. It produces a NumPy array of those three integers.

Note that you can also create NumPy arrays with other data types, besides integers. I’ll explain how to do that a little later in this blog post.

You can also create 2-dimensional arrays.

To do this using the `np.array()` function, you need to pass in a list of lists.

# 2-d array
np.array([[1,2,3],[4,5,6]])

Pay attention to what we’re doing here, syntactically.

Inside of the call to `np.array()`, there is a list of two lists: `[[1,2,3],[4,5,6]]`. The first list is `[1,2,3]` and the second list is `[4,5,6]`. Those two lists are contained *inside* of a larger list; a list of lists. Then that list of lists is passed to the array function, which creates a 2-dimensional NumPy array.

This might be a little confusing if you’re just getting started with Python and NumPy. In that case, I highly recommend that you review Python lists.

There are also other ways to create a 2-d NumPy array. For example, you can use the `array()` function to create a 1-dimensional NumPy array, and then use the `reshape()` method to reshape the 1-dimensional NumPy array into a 2-dimensional NumPy array.

# 2-d array
np.array([1,2,3,4,5,6]).reshape([2,3])

For right now, I don’t want to get too “in the weeds” explaining `reshape()`, so I’ll leave this as it is. I just want you to understand that there are a few ways to create 2-dimensional NumPy arrays.
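As a quick sanity check, the two approaches above produce the same 2 by 3 array. The names `from_lists` and `from_reshape` are just illustrative; `np.array_equal()` compares shape and values.

```python
import numpy as np

# Method 1: pass a list of lists
from_lists = np.array([[1, 2, 3], [4, 5, 6]])

# Method 2: build a 1-D array, then reshape it into 2 rows x 3 columns
from_reshape = np.array([1, 2, 3, 4, 5, 6]).reshape([2, 3])

print(from_lists.shape, from_reshape.shape)      # (2, 3) (2, 3)
print(np.array_equal(from_lists, from_reshape))  # True
```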

I’ll write more about how to create and work with 2-dimensional NumPy arrays in a future blog post.

It’s also possible to create 3-dimensional NumPy arrays and N-dimensional NumPy arrays. However, in the interest of simplicity, I’m not going to explain how to create those in this blog post. I’ll address N-dimensional NumPy arrays in a future blog post.

Using the NumPy `array()` function, we can also create NumPy arrays with specific data types. Remember that in a NumPy array, all of the elements must be of the same type.

To do this, we need to use the `dtype` parameter inside of the `array()` function.

Here are a couple of examples:

**integer**

To create a NumPy array with integers, we can use the code `dtype = 'int'`.

np.array([1,2,3], dtype = 'int')

**float**

Similarly, to create a NumPy array with floating point numbers, we can use the code `dtype = 'float'`.

np.array([1,2,3], dtype = 'float')

These are just a couple of examples. Keep in mind that NumPy supports almost 2 dozen data types … many more than what I’ve shown you here.

Having said that, a full explanation of Python data types and NumPy data types is beyond the scope of this post. Just understand that you can specify the data type using the `dtype` parameter.
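Here is a short sketch that checks the resulting types for the two examples above (`int_array` and `float_array` are illustrative names). Notice that the same input list `[1,2,3]` becomes floats when you ask for `'float'`. The exact size of the default integer type can vary by platform, so the sketch only checks that it is an integer kind.

```python
import numpy as np

int_array = np.array([1, 2, 3], dtype='int')
float_array = np.array([1, 2, 3], dtype='float')

print(int_array.dtype.kind)   # i  (a signed integer type; its size is platform-dependent)
print(float_array.dtype)      # float64
print(float_array.tolist())   # [1.0, 2.0, 3.0]
```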

For more information on data types in NumPy, consult the documentation about the NumPy types that are available.

I want to point out one common mistake that many beginners make when they try to create a NumPy array with the `np.array()` function.

As I mentioned above, when you create a NumPy array with `np.array()`, you need to provide a *list of values*.

Many beginners forget to do this and simply provide the values directly to the `np.array()` function, without enclosing them inside of a list. If you attempt to do that it will cause an error:

np.array([1,2,3,4,5])  # This works!
np.array(1,2,3,4,5)    # This will cause an error

In the two examples above, pay close attention to the syntax. The top example works properly because the integers are contained inside of a Python list. The second example causes an error because the integers are passed directly to `np.array()`, without enclosing them in a list.

Having said that, pay attention! Make sure that when you use `np.array()`, you’re passing the values as a list.
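If you want to see the error for yourself without stopping your script, you can wrap the bad call in a `try` block. This is a sketch: in the NumPy versions I’m aware of, the error is a `TypeError`, because the extra values are treated as extra positional arguments rather than as data.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

good_array = np.array(values)  # values wrapped in a list: works
print(good_array.shape)        # (5,)

try:
    np.array(1, 2, 3, 4, 5)    # values passed as separate arguments: fails
except TypeError as err:
    print("TypeError:", err)
```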

Again, if you’re confused about this or don’t understand Python lists, I strongly recommend that you go back and review lists and other basic “built-in types” in Python.

NumPy arrays have a set of attributes that you can access. These attributes include things like the array’s size, shape, number of dimensions, and data type.

Here’s an abbreviated list of attributes of NumPy arrays:

| Attribute | What it records |
|---|---|
| `shape` | The dimensions of the NumPy array |
| `size` | The total number of elements in the NumPy array |
| `ndim` | The number of dimensions of the array |
| `dtype` | The data type of the elements in the array |
| `itemsize` | The length of a single array element in bytes |
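As a compact preview of the table, here are all five attributes on a small 2-dimensional array of floats. The name `example` is illustrative. Each `float64` element takes 8 bytes, which is what `itemsize` reports.

```python
import numpy as np

# A 2 x 3 array of 64-bit floats
example = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(example.shape)     # (2, 3)
print(example.size)      # 6
print(example.ndim)      # 2
print(example.dtype)     # float64
print(example.itemsize)  # 8  (bytes per float64 element)
```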

I want to show you a few of these. To illustrate them, let’s make a NumPy array and then investigate a few of its attributes.

Here, we’ll once again create a simple NumPy array using `np.random.randint()`.

np.random.seed(72)
simple_array = np.random.randint(low = 0, high = 100, size=5)

`simple_array` is a NumPy array, and like all NumPy arrays, it has attributes.

You can access those attributes by using a dot after the name of the array, followed by the attribute you want to retrieve.

Here are some examples:

`ndim` is the number of dimensions.

simple_array.ndim

Which produces the output:

1

What this means is that `simple_array` is a 1-dimensional array.

The `shape` attribute tells us the number of elements along each dimension.

simple_array.shape

With the output:

(5,)

What this is telling us is that `simple_array` has 5 elements along the first axis. (And that’s the only information provided, because `simple_array` is 1-dimensional.)

The `size` attribute tells you the total number of elements in a NumPy array.

simple_array.size

With the output:

5

This is telling us that `simple_array` has 5 total elements.

`dtype` tells you the type of data stored in the NumPy array.

Let’s take a look. We can access the `dtype` attribute like this:

simple_array.dtype

Which produces the output:

dtype('int64')

This is telling us that `simple_array` contains *integers* (64-bit integers, in this case; the default integer size can differ by platform).

Also remember: NumPy arrays contain data that are all of the same type.

Although we constructed `simple_array` to contain integers, we could have created an array with floats or other numeric data types.

For example, we can create a NumPy array with decimal values (i.e., floats):

array_float = np.array([1.99,2.99,3.99])
array_float.dtype

Which gives the output:

dtype('float64')

When we construct the array with the above input values, you can see that `array_float` contains data of the `float64` datatype (i.e., numbers with decimals).

Now that I’ve explained attributes, let’s examine how to index NumPy arrays.

Indexing is very important for accessing and retrieving the elements of a NumPy array.

Recall what I wrote at the beginning of the blog post:

A NumPy array is like a container with many compartments. Each of the compartments inside of a NumPy array has an “address.” We call that address an “index.”

Notice again that the index of the first value is 0.

We can use the index to retrieve specific values in the NumPy array. Let’s take a look at how to do that.

First, let’s create a NumPy array using the function `np.random.randint()`.

np.random.seed(72)
simple_array = np.random.randint(low = 0, high = 100, size=5)

You can print out the array with the following code:

print(simple_array)

And you can see that the array has 5 integers.

[88 19 46 74 94]

For the sake of clarity though, here’s a visual representation of `simple_array`. Looking at this will help you understand array indexing:

In this visual representation, you can see the values stored in the array, `88, 19, 46, 74, 94`. But I’ve also shown you the index values associated with each of those elements.

These indexes enable us to retrieve values in specific locations.

Let’s take a look at how to do that.

The simplest form of indexing is retrieving a single value from the array.

To retrieve a single value from a particular location in the NumPy array, you need to provide the “index” of that location.

Syntactically, you need to use bracket notation and provide the index inside of the brackets.

Let me show you an example. Above, we created the NumPy array `simple_array`.

To get the value at index 1 from `simple_array`, you can use the following syntax:

# Retrieve the value at index 1
simple_array[1]

Which returns the value `19`.

Visually though, we can represent this indexing action like this:

Essentially, we’re using a particular index (i.e., the “address” of a particular location in the array) to retrieve the value stored at that location.

So the code `simple_array[1]` is basically saying, “give me the value that’s at index location 1.” The result is `19` … `19` is the value at that index.

NumPy also supports negative index values. Using a negative index allows you to retrieve or reference locations starting from the *end* of the array.

Here’s an example:

simple_array[-1]

This retrieves the value at the very end of the array.

We could also retrieve this value by using the index `4` (both will work). But sometimes you won’t know exactly how long the array is. This is a convenient way to reference items at the end of a NumPy array.
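Here is a small sketch comparing negative and positive indexes. To keep it independent of the random seed, it hard-codes the values shown above into an array with the same name.

```python
import numpy as np

# Hard-coded copy of the values shown above (not generated with the random seed)
simple_array = np.array([88, 19, 46, 74, 94])

print(simple_array[-1])  # 94  (the last element)
print(simple_array[4])   # 94  (the same element, using a positive index)
print(simple_array[-2])  # 74  (the second-to-last element)
```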

I just showed you simple examples of array indexing, but array indexing can be quite complex.

It’s actually possible to retrieve *multiple* elements from a NumPy array.

To do this, we still use bracket notation, but we can use a colon to specify a range of values. Here’s an example:

simple_array[2:4]

This code is saying, “retrieve the values stored from index 2, up to but *excluding* index 4.”
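Slices follow the same rules as slices of ordinary Python lists, including the option to leave out the start or the stop. This sketch again hard-codes the values from `simple_array` so the output doesn’t depend on the random seed.

```python
import numpy as np

simple_array = np.array([88, 19, 46, 74, 94])

print(simple_array[2:4].tolist())  # [46, 74]  (indexes 2 and 3; index 4 is excluded)
print(simple_array[:2].tolist())   # [88, 19]  (no start means "from index 0")
print(simple_array[3:].tolist())   # [74, 94]  (no stop means "through the end")
```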

Visually, we can represent this as follows:

Now that you’ve learned how to use indexes in 1-dimensional NumPy arrays, let’s review how to use indexes in 2-dimensional NumPy arrays.

Working with 2-d NumPy arrays is very similar to working with 1-d arrays. The major difference (with regard to indexes) is that 2-d arrays have 2 indexes, a row index and a column index.

To retrieve a value from a 2-d array, you need to provide the specific row and column indexes.

Here’s an example. We’ll create a 2-d NumPy array, and then we’ll retrieve a value.

np.random.seed(72)
square_array = np.random.randint(low = 0, high = 100, size = 25).reshape([5,5])
square_array[2,1]

Here, we’re essentially retrieving the value at row index 2 and column index 1. The value at that position is `45`.

This is fairly straightforward. The major challenge is that you need to remember that the row index is first and the column index is second.
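To make the row-first, column-second order easy to check, this sketch uses a predictable 5 by 5 array holding the numbers 0 through 24 instead of the random array above (the name `grid` is hypothetical). Swapping the two indexes picks out a different element.

```python
import numpy as np

# A predictable 5 x 5 array: row 0 holds 0-4, row 1 holds 5-9, and so on
grid = np.arange(25).reshape((5, 5))

print(grid[2, 1])  # 11  (row index 2, column index 1)
print(grid[1, 2])  # 7   (row index 1, column index 2)
```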

Finally, let’s review how to retrieve slices from 2-d NumPy arrays. Slicing 2-d arrays is very similar to slicing 1-d arrays. The major difference is that you need to provide 2 ranges, one for the rows and one for the columns.

np.random.seed(72)
square_array = np.random.randint(low = 0, high = 100, size = 25).reshape([5,5])
square_array[1:3,1:4]

Let’s break this down.

We’ve again created a 5×5 square NumPy array called `square_array`.

Then, we took a slice of that array. The slice included the rows from index 1 up-to-and-excluding index 3. It also included the columns from index 1 up-to-and-excluding index 4.
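Using the same predictable 0-to-24 grid as a stand-in for the random array, you can check exactly which values the slice `[1:3, 1:4]` picks out: 2 rows and 3 columns. The names `grid` and `sub` are hypothetical.

```python
import numpy as np

grid = np.arange(25).reshape((5, 5))

# Rows 1 and 2 (3 is excluded), columns 1, 2, and 3 (4 is excluded)
sub = grid[1:3, 1:4]

print(sub.shape)     # (2, 3)
print(sub.tolist())  # [[6, 7, 8], [11, 12, 13]]
```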

This might seem a little confusing if you’re a true beginner. In that case, I recommend working with 1-d arrays first, until you get the hang of them. Then, start working with relatively small 2-d NumPy arrays until you build your intuition about how indexing works with 2-d arrays.

If you’re a beginner or you don’t have a lot of experience with NumPy arrays, this might seem a little overwhelming. It’s not that complicated, but there’s a lot here and it will take a while to learn and master.

That said, I want to know if you’re still confused about something.

What questions do you still have about NumPy arrays?

Leave your questions and challenges in the comments below …

