Pandas Map, Explained

This tutorial will explain how to use the Pandas map method to replace values in a Pandas Series.

The tutorial will explain what the method does, how the syntax works, and will show you a few step-by-step examples of how to use it.

A Quick Introduction to Pandas Map

Let’s start with a quick review of the Pandas map method.

In Python, the Pandas map method substitutes the values in a Pandas series object according to a mapping that we define.

Confused?

Let me try to explain with a quick visual example.

Let’s say that we have a Pandas series with several values … for example, the abbreviated days of the week.

And let’s say that we want to transform those input values to different values. In this case, let’s say that we want to transform the abbreviated day names to the full names.

An image that shows how the Pandas map transforms the values in a Pandas series to corresponding new values.

You can use the Pandas map function to do this.

Pandas map takes the values of an input Pandas series and assigns new values based on a “mapping”: a set of corresponding old values and new values.

It’s really just a way to recode values in a Pandas series.
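Here is a minimal sketch of that idea in code, using a few abbreviated day names. The variable names (day_abbreviations, abbr_to_full_name, full_day_names) are just illustrative names that I've made up for this sketch.

```python
import pandas as pd

# A few abbreviated day names, like the ones in the visual example
day_abbreviations = pd.Series(['Mon', 'Tue', 'Wed'])

# Illustrative mapping: old value (abbreviation) -> new value (full name)
abbr_to_full_name = {'Mon': 'Monday', 'Tue': 'Tuesday', 'Wed': 'Wednesday'}

# map looks up each value of the series in the mapping
full_day_names = day_abbreviations.map(abbr_to_full_name)
print(full_day_names.tolist())  # ['Monday', 'Tuesday', 'Wednesday']
```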

That’s conceptually how it works.

It will be easier to understand after you see some examples. I’ll show you some concrete examples in the examples section.

But first, we need to look at the syntax.

The Syntax of Pandas Map

Here, we’ll look at the syntax of Pandas map.

A quick note

Just a quick reminder.

The syntax that I explain here will assume that you’ve imported Pandas.

You can import Pandas with this code:

import pandas as pd

It will also assume that you already have a Pandas series object. We’ll create some series objects in the examples section … just be aware that a pre-existing series object is necessary for the syntax to work.

Pandas Map syntax

The syntax of Pandas map is fairly simple.

You type the name of the series object that you want to operate on.

Then you type .map() to call the method.

An explanation of the syntax of the Pandas map method.

The first argument to the method is the data mapping that connects the old input values to the new values you want to output.

There’s also an optional parameter called na_action.

Let’s quickly review these.

The Parameters of Pandas map

As noted above, there are only two inputs to the map method:

  • data_mapping
  • na_action

data_mapping (required)

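The so-called “data mapping” is an object that shows how the old values (i.e., the values in the series that you’re operating on) should be transformed. It shows the correspondence between the old values and the new values. Note that data_mapping is just a descriptive label that I’m using for the first argument. It is not a keyword that you type, so pass the mapping object as the first positional argument.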

This mapping object can take several forms:

  • a Pandas series
  • a Python dictionary
  • a function

Personally, I most commonly use dictionaries.

If you use a dictionary for this mapping object, the “key” of each dictionary entry should be the old value, and the “value” of each dictionary entry should be the new value.
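To make the three forms concrete, here is a small sketch that produces the same result with a dictionary, a Pandas series, and a function. The data and the names (sizes, size_dict, size_series, expand_size) are hypothetical and only for illustration. When the mapping is a series, the index of that series plays the role of the dictionary keys.

```python
import pandas as pd

sizes = pd.Series(['S', 'M', 'L'])

# 1. A dictionary: keys are the old values, values are the new values
size_dict = {'S': 'Small', 'M': 'Medium', 'L': 'Large'}
from_dict = sizes.map(size_dict)

# 2. A Pandas series: the index holds the old values, the data holds the new values
size_series = pd.Series(['Small', 'Medium', 'Large'], index=['S', 'M', 'L'])
from_series = sizes.map(size_series)

# 3. A function: called once for each value in the series
def expand_size(code):
    return size_dict[code]

from_function = sizes.map(expand_size)

print(from_dict.tolist())      # ['Small', 'Medium', 'Large']
print(from_series.tolist())    # ['Small', 'Medium', 'Large']
print(from_function.tolist())  # ['Small', 'Medium', 'Large']
```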

I’ll show you some examples of this in the examples section.

na_action (optional)

The na_action parameter controls how Pandas map handles NaN values.

By default, this is set to na_action = None.

With this default setting, NaN values in the input are passed to the mapping like any other value. When the mapping is a function, that means the function is applied to the NaN values as well. This can cause some strange or otherwise unwanted output (for example, NaN being formatted as the string "nan"). I’ll show you an example of this in example 3.

To avoid this, you can set na_action = 'ignore'. With this setting, the function is not applied to NaN values, and they are propagated into the output as NaN. I’ll show you this parameter setting in example 4.
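As a quick sketch of why this parameter matters, consider a function that only works on strings. The series and the helper function (codes, to_upper) are hypothetical names for this illustration. With the default setting, the function is also called on the NaN value and fails. With na_action='ignore', the NaN value is skipped.

```python
import numpy as np
import pandas as pd

codes = pd.Series(['a', 'b', np.nan])

def to_upper(value):
    # Works for strings, but a missing value has no .upper() method
    return value.upper()

# Default (na_action=None): to_upper is also called on the missing value
try:
    codes.map(to_upper)
    default_call_failed = False
except AttributeError:
    default_call_failed = True

print(default_call_failed)  # True

# na_action='ignore': the missing value is skipped and stays missing
upper_codes = codes.map(to_upper, na_action='ignore')
print(upper_codes.isna().tolist())  # [False, False, True]
```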

Examples of How to Use Pandas Map

Now that you understand what Pandas map does, and now that you’ve seen how the syntax works, let’s take a look at some examples.

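Run this code first to import Pandas and NumPy

Before you run the examples, you need to import Pandas. Examples 3 and 4 also use NumPy to create a NaN value, so import that as well.

import pandas as pd
import numpy as np

Remember: this enables us to call Pandas tools with the prefix ‘pd’ and NumPy tools with the prefix ‘np’.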

EXAMPLE 1: Map the values in a Pandas series to new values, using a dictionary of mappings

In this example, I’ll show you how to use Pandas map to recode the values in a Pandas series.

Specifically, we’re going to transform month abbreviations to the full month names.

This will require several steps:

  • create a Pandas series
  • create the mapping from abbreviations to full names
  • use the map method to substitute the full names for the abbreviations

Let’s do each of those.

Create Pandas Series

First, we’ll just create a Pandas series object using the pd.Series() function.

months = pd.Series(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])

Here, we’ve stored the series with the name months.

The series contains the abbreviations for the months of the year.

Let’s quickly print it, so you can see the contents:

print(months)

OUT:

0     Jan
1     Feb
2     Mar
3     Apr
4     May
5     Jun
6     Jul
7     Aug
8     Sep
9     Oct
10    Nov
11    Dec
dtype: object

Create Mapping

Next, we’re going to create a mapping that relates the month abbreviations to the full month names.

To do this, we’re going to use a dictionary. The ‘keys’ of the dictionary will be the abbreviations, and the ‘values’ will be the corresponding full month names.

mapping_month_abbr_to_name = {'Jan':'January'
                              ,'Feb':'February'
                              ,'Mar':'March'
                              ,'Apr':'April'
                              ,'May':'May'
                              ,'Jun':'June'
                              ,'Jul':'July'
                              ,'Aug':'August'
                              ,'Sep':'September'
                              ,'Oct':'October'
                              ,'Nov':'November'
                              ,'Dec':'December'}

Essentially, every item in the dictionary specifies the connection between the “old value” and “new value.”

Here, I’ve called the mapping mapping_month_abbr_to_name.

(Note: always use clear, easy-to-understand object names. Write clean code!)

Map Series Values to New Values

Now that we have our Pandas series and our mapping, we’re going to use the Pandas map() method to map the old values to the new values.

months.map(mapping_month_abbr_to_name)

OUT:

0       January
1      February
2         March
3         April
4           May
5          June
6          July
7        August
8     September
9       October
10     November
11     December
dtype: object

Explanation

So what happened here?

The map() method substitutes old values in a Series object with new values.

The connection between old and new values is defined by our mapping, which we created here with a dictionary.

Note: the values in the original series will still be the same

One more note: if you check the months object after you run the mapping code above, you’ll notice that the values are unchanged.

That’s because Pandas map does not modify the original series. It returns a new series, which is simply displayed in the console unless we store it.

To capture the new values that have been created by Pandas map, you should assign the output to a variable name.

You can even store it with the same variable name, like this:

months = months.map(mapping_month_abbr_to_name)

But be careful when you do this. It will overwrite your data!
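If you'd rather not overwrite your data, one option is to store the output under a different name. Here is a minimal sketch with a shortened version of the months data. The name month_names is just an illustrative choice.

```python
import pandas as pd

months = pd.Series(['Jan', 'Feb', 'Mar'])
mapping_month_abbr_to_name = {'Jan': 'January', 'Feb': 'February', 'Mar': 'March'}

# Store the mapped values under a new name, leaving months untouched
month_names = months.map(mapping_month_abbr_to_name)

print(months.tolist())       # ['Jan', 'Feb', 'Mar']
print(month_names.tolist())  # ['January', 'February', 'March']
```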

EXAMPLE 2: Use Pandas Map with Some Missing Mappings (i.e., NaN output values)

Next, I want to show you an example where we use Pandas map, but there are some values in the original data for which there are not mappings.

If we don’t have a new, mapped value in our mapping, then the .map() method will output the value NaN.

Let’s take a look.

Create a Series Object

First, let’s create a Pandas series object that we can operate on.

Here, I’m going to create a series of names.

names = pd.Series(['Will','Josh','Kat', 'Sofia'])

Notice that most of these names are “short form” names: Will is often short for William. Josh is short for Joshua. And so on.

This will be important, as we can use .map() to replace some of these names.

Create Mapping

Now, we’ll create a mapping that maps from the short form of the name to the long form of the name.

Remember: we can use a dictionary for this. The “key” of the dictionary entry is the old value, and the “value” of the dictionary is the new output that we want to produce with our mapping.

names_short_to_long = {'Will':'William'
                       ,'Josh':'Joshua'
                       ,'Kat':'Katherine'
                       }

Notice that in this mapping, we have mappings for three of the names (Will, Josh, and Kat), but we do not have a mapping for the fourth, Sofia.

This will be relevant.

Map Old Values to New Values

Next, we’ll use Pandas map() to map the short-form names to the long-form names.

names.map(names_short_to_long)

OUT:

0      William
1       Joshua
2    Katherine
3          NaN
dtype: object

Notice that three of the names have been successfully mapped to the new values, but one of the outputs is NaN.

Explanation

The important thing to note in this example is that one of the output values was NaN.

Why?

The mapping that we created (the dictionary of old names to new names that we called names_short_to_long) was missing a mapping for the name ‘Sofia’.

If we use Pandas map, any input value that’s missing a mapped value will be replaced with NaN.

That may be fine. But if it isn’t, and your output has unexpected NaN values, you may need to go back and add the missing mappings to your dictionary.
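Here is a sketch of two things you might do in that situation: find out which input values had no mapping, and fall back to the original value where the mapping produced NaN. The fallback uses the Pandas fillna method, which is my own suggestion rather than part of the map method. The names unmapped_names and long_names_with_fallback are illustrative. Keep in mind that if the input series already contained NaN values, those would show up as unmapped too.

```python
import pandas as pd

names = pd.Series(['Will', 'Josh', 'Kat', 'Sofia'])
names_short_to_long = {'Will': 'William', 'Josh': 'Joshua', 'Kat': 'Katherine'}

long_names = names.map(names_short_to_long)

# Which input values had no entry in the mapping?
unmapped_names = names[long_names.isna()]
print(unmapped_names.tolist())  # ['Sofia']

# One possible fallback: keep the original value where the output is NaN
long_names_with_fallback = long_names.fillna(names)
print(long_names_with_fallback.tolist())  # ['William', 'Joshua', 'Katherine', 'Sofia']
```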

EXAMPLE 3: Map the values in a Pandas series to new values, using a function

Next, we’ll use the Pandas map method to apply a Python function.

Specifically, we’re going to apply the format function to modify the string values in our Series.

Create Series (days of the week)

First, we’ll create a new Pandas series object that contains the days of the week.

days = pd.Series(['Monday'
                  ,'Tuesday'
                  ,'Wednesday'
                  ,'Thursday'
                  ,'Friday'
                  ,'Saturday'
                  ,'Sunday'
                  ,np.nan])

Notice that one of the values is NaN. That will cause a small problem which we’ll fix in the next example.

Use Pandas Map with a Function

Now, we’ll use Pandas map with a function: the format function.

days.map('{} is a day of the week.'.format)

OUT:

0       Monday is a day of the week.
1      Tuesday is a day of the week.
2    Wednesday is a day of the week.
3     Thursday is a day of the week.
4       Friday is a day of the week.
5     Saturday is a day of the week.
6       Sunday is a day of the week.
7          nan is a day of the week.
dtype: object

Explanation

Notice that the map method has applied the format function to modify the string values (here, it’s turning them into full sentences).

Also note that one of the input values was np.nan.

Here, map has applied the format function to that NaN value, as if it were a proper string value.

That may not be what you want, so we’ll fix that in the final example.
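The format function is only one option. As a rough sketch, map can also take a function that you define yourself, or a built-in like len. The helper abbreviate_day below is a hypothetical function that I've written just for this illustration.

```python
import pandas as pd

days = pd.Series(['Monday', 'Tuesday', 'Wednesday'])

def abbreviate_day(day_name):
    # Hypothetical helper: keep only the first three letters
    return day_name[:3]

# A user-defined function
day_abbreviations = days.map(abbreviate_day)
print(day_abbreviations.tolist())  # ['Mon', 'Tue', 'Wed']

# A built-in function
day_name_lengths = days.map(len)
print(day_name_lengths.tolist())  # [6, 7, 9]
```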

EXAMPLE 4: Ignore NaN Values When You Use Map With a Function

In example 3, we used the map method with a function, to modify our input values using that function.

As I noted in that example, if there are any NaN values in the input series, map will treat those values as valid values by default, which can lead to some inappropriate outputs.

You can fix that behavior using the na_action parameter.

Let’s take a look.

Create Series (days of the week)

First, we’ll create a Pandas series object that contains the days of the week.

This is the same Pandas series that we created in example 3, so if you already created it there, this will just be a duplicate.

days = pd.Series(['Monday'
                  ,'Tuesday'
                  ,'Wednesday'
                  ,'Thursday'
                  ,'Friday'
                  ,'Saturday'
                  ,'Sunday'
                  ,np.nan])

Notice that the last value in the series is np.nan. This will be important for our example.

Apply Function, and Ignore NaN Values

You should have noticed in example 3 that by default, Pandas map will operate on NaN values. In example 3, this led to a strange output for one of the values: "nan is a day of the week."

We want map to ignore that NaN input, and produce a proper NaN value as the corresponding output.

To do that, we can set na_action='ignore'.

days.map('{} is a day of the week.'.format, na_action = 'ignore')

OUT:

0       Monday is a day of the week.
1      Tuesday is a day of the week.
2    Wednesday is a day of the week.
3     Thursday is a day of the week.
4       Friday is a day of the week.
5     Saturday is a day of the week.
6       Sunday is a day of the week.
7                                NaN
dtype: object

Explanation

Notice that in the output for this example, the final output value is NaN.

This is in contrast to example 3, where the output was "nan is a day of the week."

The difference is that here, we used na_action = 'ignore' to force Pandas map to ignore the np.nan input value. Instead of attempting to process it like a string, it treated it as a proper NaN value, and propagated the NaN value to the output.
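If you want to confirm the difference yourself, one way is to check the outputs with the Pandas isna method. This is a small sketch with a shortened series. The names sentence, default_output, and ignore_output are illustrative.

```python
import numpy as np
import pandas as pd

days = pd.Series(['Saturday', 'Sunday', np.nan])
sentence = '{} is a day of the week.'.format

default_output = days.map(sentence)
ignore_output = days.map(sentence, na_action='ignore')

# By default, the NaN input becomes an ordinary string, so nothing is missing
print(default_output.isna().tolist())  # [False, False, False]

# With na_action='ignore', the last output is a real missing value
print(ignore_output.isna().tolist())   # [False, False, True]
```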

Frequently Asked Questions About Pandas Map

Now that you’ve learned about the Pandas map method and seen some examples, let’s review some frequently asked questions.

Question 1: After I ran the map method, my Series is still the same … why?

Remember: most Pandas methods do not modify the data object directly. They do not modify the object “in place.”

Rather, most Pandas methods produce a new object.

By default, if you’re working in an IDE or similar environment, that new object will be sent to the console. So you’ll see the new output data, but it won’t be saved by default.

The way to get around this is to use the equals sign to assign the output to a variable name.

So let’s re-do the operation from example 1, but at the end, we’ll assign the output of Pandas map back to the variable name months.

months = pd.Series(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])

mapping_month_abbr_to_name = {'Jan':'January'
                              ,'Feb':'February'
                              ,'Mar':'March'
                              ,'Apr':'April'
                              ,'May':'May'
                              ,'Jun':'June'
                              ,'Jul':'July'
                              ,'Aug':'August'
                              ,'Sep':'September'
                              ,'Oct':'October'
                              ,'Nov':'November'
                              ,'Dec':'December'}

months = months.map(mapping_month_abbr_to_name)

Do you see that last line? We’re just taking the output of Pandas map, and using the equals sign to assign the output to the name months. That’s how to store the new transformed values.

Question 2: Why aren’t the NaN values in my data processing correctly?

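By default, Pandas map passes NaN values to your mapping just like any other value.

This is particularly noticeable if you’re using map with a function, because the function is applied to the NaN values too. In example 3, this turned NaN into the string "nan".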

To get around this, you may need to set na_action = 'ignore'.

For more information on this, see example 4.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University.
