How to Use the Pandas Astype Function in Python

In this tutorial, I’ll explain how to use the Pandas astype function to modify the datatype of Pandas dataframe columns and Pandas objects.

I’ll explain what the technique does, explain the syntax, and show you step-by-step examples.

Let’s start with a quick introduction to what the astype technique does.

A Quick Introduction to Pandas Astype

The Pandas astype method modifies the datatype of Pandas objects.

Frequently, we use this tool to modify the datatype of the columns of a dataframe.

But there are some other ways to use it, which I’ll cover in the examples section.

Most commonly, we use this technique when we clean a dataset in Python, although there are potentially other uses for this tool during other parts of the data science workflow.

A simple image that shows how the Pandas astype method changes the datatype of Pandas dataframes or Series objects.

You can use the Pandas astype technique a few different ways.

We can use this tool to change the datatype of:

  • a Pandas Series
  • a single column of a Pandas dataframe
  • multiple columns of dataframe

I’ll show you examples of each of these in the examples section.

But first, let’s take a look at the syntax.

The Syntax of Pandas Astype

Here, we’ll look at the syntax of the Pandas astype() technique.

Since you can use the astype technique on both dataframes and Series objects, we’ll look at the syntax for each: first for a Series, and then for dataframe columns.

A quick note

Before we look at the syntax, I just want to remind you that everything else in this syntax section assumes that you’ve already imported Pandas, and that you already have a Pandas dataframe or Series.

You can import Pandas like this:

import pandas as pd

And there are a variety of ways to create or import dataframes. To learn more about dataframes, you can read our tutorial on Python dataframes.

syntax: use astype on a Series

First, let’s look at how to use astype on a Pandas Series.

To call the method for a Series, just type the name of the series, and then use “dot syntax” to call the astype() method.

An image that explains how to use the Pandas astype method on a Pandas Series.

Inside the parentheses, you provide the name of the data type. When you give the name as a string, it should be enclosed inside quotation marks. For example, astype('int16').
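As a minimal sketch of this syntax, here is the call on a small made-up Series. The name my_series and its values are assumptions for illustration only, and are not part of the dataset used later in this tutorial.

```python
import pandas as pd

# Hypothetical Series, used only to illustrate the syntax
my_series = pd.Series([1, 2, 3])

# Pass the name of the new datatype as a string
my_series_int16 = my_series.astype('int16')

# Inspect the datatype of the returned Series
print(my_series_int16.dtype)
```

The values themselves are unchanged here; only the datatype used to store them is different.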

syntax: use astype on dataframe column(s)

Next, let’s look at how to use astype on a Pandas dataframe.

First, you’ll type the name of the dataframe, and then use “dot syntax” to call the astype() method.

But inside the parentheses, instead of providing a single value as the argument, we’ll provide a dictionary.

An image that shows the syntax for how to use the Pandas astype method on a Python dataframe.

Inside the dictionary, you provide the column name, and the datatype you want to use for the column.

An image that shows the dictionary argument to the astype function, which lets you change the datatype of a specific dataframe column.

You can use this to change the datatype of a single dataframe column.

But you can also use this to change the datatype of multiple columns. If you want to operate on multiple columns, simply provide several column/datatype pairs, separated by commas, like this:

your_dataframe.astype({'column1':'datatype', 'column2':'datatype'})

This syntax enables you to operate on several dataframe columns at the same time.

This is very useful for data cleaning when you have several variables that need to be modified.
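To make the dictionary syntax concrete, here is a short sketch. The dataframe your_dataframe and its columns column1 and column2 are made up for this illustration, and the columns are stored with the object datatype so that there is something to convert.

```python
import pandas as pd

# Hypothetical dataframe: numbers stored as text in object columns
your_dataframe = pd.DataFrame({'column1': ['1', '2', '3'],
                               'column2': ['1.5', '2.5', '3.5']},
                              dtype='object')

# One column/datatype pair for each column you want to convert
converted = your_dataframe.astype({'column1': 'int64',
                                   'column2': 'float64'})

print(converted.dtypes)
```

Any column that you leave out of the dictionary keeps its existing datatype.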

The parameters of Pandas astype

Now, let’s look at some of the parameters of the astype function:

  • dtype
  • copy
  • errors

dtype (required)

The dtype parameter enables you to specify the datatype.

If you provide a single datatype, it will try to apply the datatype to the whole object. If you’re operating on a Series, it will convert that series to the particular datatype. But if you’re operating on a dataframe, it will try to convert every variable to that datatype (which can cause errors).

Alternatively, if you’re operating on a dataframe, you can provide a dictionary as the argument to this parameter. As shown in the syntax section, if you use this style of syntax, you can provide a dictionary of column/datatype pairs.
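The difference between the two styles can be sketched as follows. The dataframe numbers and its columns a and b are hypothetical and exist only for this comparison.

```python
import pandas as pd

# Hypothetical all-numeric dataframe
numbers = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# A single datatype is applied to every column
all_float = numbers.astype('float64')
print(all_float.dtypes)

# A dictionary converts only the columns that it names
only_a = numbers.astype({'a': 'float64'})
print(only_a.dtypes)
```

The single-datatype form works here because every column is numeric. With a text column in the dataframe, the same call would fail, which is why the dictionary form is usually the safer choice for mixed data.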

copy

The copy parameter controls whether the astype method always returns a copy of the data.

By default, this parameter is set to copy = True. What that means is that by default, the method will return a new object (i.e., a new dataframe) with its own copy of the data, and leave the original object unchanged.

Be careful if you set copy = False. The method still returns a new object, but that object may share its underlying data with the original, so later changes to one can propagate to the other. If you do this, you need to make sure that your code works exactly as expected.

errors

The errors parameter specifies whether the technique will raise an exception if you try to apply an inappropriate or invalid datatype to the object.

The two possible arguments to this are:

  • raise (which will raise an exception if there is an issue)
  • ignore (which will suppress exceptions if there is an issue. If there is an error, the original object will be returned)

By default, this is set to errors = 'raise'.
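Here is a small sketch of the default behavior. The Series mixed is a made-up example that contains one value that cannot be read as an integer, and the flag conversion_failed is just a helper name for this illustration.

```python
import pandas as pd

# Hypothetical Series with one value that is not a valid integer
mixed = pd.Series(['10', '20', 'n/a'], dtype='object')

try:
    mixed.astype('int64')
    conversion_failed = False
except ValueError as err:
    # With the default setting, the invalid value triggers an exception
    conversion_failed = True
    print('Conversion failed:', err)
```

Catching the exception like this lets you decide what to do with the bad values, rather than having them pass through silently.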

The output of Pandas astype

By default, the astype method will return a new object.

If you’re operating on a Series, it will return a Series (with the modified datatype).

If you’re operating on a dataframe, it will return a dataframe (with the modified datatype(s)).

Examples: How to change the datatype in a Pandas dataframe

Now that we’ve looked at the syntax for the Pandas astype method, let’s look at some examples.

Run this code first

Before you run the examples, you’ll need to:

  • import necessary packages
  • get the example dataframe

Let’s do each of those.

Import necessary packages

First, we need to import Pandas.

You can do that with the following code:

import pandas as pd

Create example dataframe

Next, we need to create the example dataset that we’re going to work with.

We’ll build the dataset with the Pandas DataFrame function, by providing it with a dictionary of column names and lists with the data that we want the variables to contain:

sales_data = pd.DataFrame({"name":["William","Emma","Sofia","Markus","Edward","Thomas","Ethan","Olivia","Arun","Anika","Paulo"]
                           ,"region":["East","North","East","South","West","West","South","West","West","East","South"]
                           ,"sales":[50000,52000,90000,34000,42000,72000,49000,55000,67000,65000,67000]
                           ,"expenses":[42000,43000,50000,44000,38000,39000,42000,60000,39000,44000,45000]
                           }
                          ,dtype = 'object'
                          )

We’ll also create a Pandas series object, that contains only the expenses variable from the dataframe:

expenses_variable = sales_data.expenses

Let’s print out the dataframe so we can see the contents:

print(sales_data)

OUT:

       name region  sales expenses
0   William   East  50000    42000
1      Emma  North  52000    43000
2     Sofia   East  90000    50000
3    Markus  South  34000    44000
4    Edward   West  42000    38000
5    Thomas   West  72000    39000
6     Ethan  South  49000    42000
7    Olivia   West  55000    60000
8      Arun   West  67000    39000
9     Anika   East  65000    44000
10    Paulo  South  67000    45000

You can see 4 variables:

  • name
  • region
  • sales
  • expenses

So the dataframe contains dummy sales and expense data for several salespeople, in four different regions.

Examine data types

Let’s quickly look at the data types:

sales_data.dtypes

OUT:

name        object
region      object
sales       object
expenses    object
dtype: object

Right now, all of the variables have the object datatype.

We’ll want to change most of these.

EXAMPLE 1: Change Datatype of a Pandas Series

Let’s start ultra simple.

Here, we’re going to use astype() on a Pandas series.

In the section where we created our dataframe, we also separated out one of the variables into a Series called expenses_variable.

Check initial datatype

First, let’s check the original datatype:

expenses_variable.dtype

OUT:

dtype('O')

This is telling us that the Series has the ‘object’ datatype.

Change datatype

Now, we’ll use Pandas astype to change the datatype to int32.

expenses_variable.astype('int32')

OUT:

0     42000
1     43000
2     50000
3     44000
4     38000
5     39000
6     42000
7     60000
8     39000
9     44000
10    45000
Name: expenses, dtype: int32

Explanation

If you look carefully at the bottom of the output, you’ll see that the datatype of the output is dtype: int32.

Keep in mind: this has not changed the expenses_variable directly. The expenses_variable Series itself still has the ‘object’ datatype, because astype() returned a new Series that we only displayed in the console and did not save.

If we wanted to change the data permanently, we’d need to reassign the output back to the original variable name with the code:

expenses_variable = expenses_variable.astype('int32')

EXAMPLE 2: Change the datatype of one column in a dataframe

Now, let’s operate on a column inside of a dataframe.

This will be a little bit different than example 1, where we operated on a Pandas Series.

Here, we’re going to operate on a dataframe, and to do that, the syntax will be slightly different.

Check initial datatype

First, let’s check the original datatypes:

sales_data.dtypes

OUT:

name        object
region      object
sales       object
expenses    object
dtype: object

You’ll notice that all of the columns have the object datatype.

Change datatype

Here, we’re going to use Pandas astype to change the data type of the sales column.

(We’re going to also use the .dtypes attribute to look at the data types of the output.)

sales_data.astype({'sales':'int32'}).dtypes

OUT:

name        object
region      object
sales        int32
expenses    object
dtype: object

Explanation

Notice that in the output, the datatype of sales has been changed to int32.

To do this, we called astype, and provided a dictionary as the argument.

The dictionary contains the name of the column on the left side (the key), and the new datatype on the right side (the value).

When we use astype to operate on a full dataframe, we can use this syntax style, where we provide a dictionary with pairs for the column and datatype in the form {'column':'data-type'}.
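If you only need to convert one column, another common pattern is to select that column as a Series, convert it, and assign it back. The sketch below uses sales_sample, a shortened stand-in for sales_data with only the first three rows, so that it can run on its own.

```python
import pandas as pd

# Shortened stand-in for the tutorial's sales_data (first three rows only)
sales_sample = pd.DataFrame({"name": ["William", "Emma", "Sofia"],
                             "region": ["East", "North", "East"],
                             "sales": [50000, 52000, 90000],
                             "expenses": [42000, 43000, 50000]},
                            dtype='object')

# Select the column as a Series, convert it, and assign it back
sales_sample['sales'] = sales_sample['sales'].astype('int32')

print(sales_sample.dtypes)
```

Unlike the dictionary call above, this version changes the dataframe itself, because the converted column is assigned back into it.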

EXAMPLE 3: Change the datatypes of multiple variables in a dataframe

Now, let’s modify the datatype of multiple columns in a dataframe.

The approach is very similar to how we modified one column in example 2 (we’ll use another dictionary).

Check initial datatype

Again, before we perform the operation, let’s check the original datatypes:

sales_data.dtypes

OUT:

name        object
region      object
sales       object
expenses    object
dtype: object

Once again, notice that all of the columns have the object datatype.

Change datatype

Now, we’ll change the datatype of multiple columns.

To do this, we’ll use a dictionary with the name of the variable and the datatype as the 'key':'value' pairs of our dictionary.

(We’ll also call the .dtypes attribute after the astype operation, so we can see the new datatypes)

sales_data.astype({'region':'category'
                   ,'sales':'int32'
                   ,'expenses':'int32'
                   }
                  ).dtypes

OUT:

name          object
region      category
sales          int32
expenses       int32
dtype: object

Explanation

Notice in the output that the datatypes of 3 columns have been changed:

  • region was changed to category
  • sales was changed to int32
  • expenses was also changed to int32

How did we do it?

We called the astype method and inside the parenthesis, we provided a dictionary.

The dictionary contained several key/value pairs of the form 'column':'datatype'.

That’s really it … you just need to provide a dictionary with the column names and the new data types.
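To see what the new datatypes give you, here is a brief sketch that saves the converted dataframe and then uses it. It relies on sales_sample, a shortened stand-in for sales_data with only the first four rows, and the names converted and profit are chosen just for this illustration.

```python
import pandas as pd

# Shortened stand-in for the tutorial's sales_data (first four rows only)
sales_sample = pd.DataFrame({"name": ["William", "Emma", "Sofia", "Markus"],
                             "region": ["East", "North", "East", "South"],
                             "sales": [50000, 52000, 90000, 34000],
                             "expenses": [42000, 43000, 50000, 44000]},
                            dtype='object')

# Convert three columns at once and save the result
converted = sales_sample.astype({'region': 'category',
                                 'sales': 'int32',
                                 'expenses': 'int32'})

# A category column exposes its distinct values through the .cat accessor
print(converted['region'].cat.categories.tolist())

# Integer columns support ordinary numeric operations
profit = converted['sales'] - converted['expenses']
print(profit.tolist())
```

Saving the output under a new name keeps the original dataframe available in case you need to check the conversion.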

Frequently asked questions about Pandas astype

Now that we’ve looked at some examples, let’s look at some common questions about the astype() technique.

Question 1: I used astype, but my dataframe is unchanged. Why?

If you use the astype method, you might notice that your original dataframe remains unchanged after you call the method.

For example, in example 2, we used the following code:

sales_data.astype({'sales':'int32'})

But if you check the datatypes of sales_data after you run the code, you’ll realize that the datatype of sales is unchanged.

That’s because when we run the astype() method, it produces a new dataframe, and leaves the original dataframe unchanged.

This is how most Pandas methods work.

By default, the output of the method is sent to the console. We can see the output in the console, but to save it, we need to store it with a name.

For example, you could store the output like this:

sales_data_updated = sales_data.astype({'sales':'int32'})

You can name the new output whatever you want. You could even name it with the original name sales_data.

Just be careful. If you reassign the output of astype to the original variable name, it will overwrite your original dataset. Make sure that you check your code so it works properly before you overwrite an input dataframe.

Leave your other questions in the comments below

Do you have any other questions about the Pandas astype method?

Is there something else that you need to know that I haven’t covered here?

If so, leave your question in the comments section below.

Discover how to become ‘fluent’ in Pandas

This tutorial showed you how to use the Pandas astype method, but if you want to master data wrangling with Pandas, there’s a lot more to learn.

So if you want to master data wrangling in Python, and become ‘fluent’ in Pandas, then you should join our course, Pandas Mastery.

Pandas Mastery is our online course that will teach you these critical data manipulation tools, show you how to memorize the syntax, and show you how to put it all together.

You can find out more here:

Learn More About Pandas Mastery

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.
