In this tutorial, I’ll explain how to use the Pandas rename method to rename columns in a Python dataframe.
I’ll explain what the technique does, how the syntax works, and I’ll show you clear examples of how to use it.
If you need something specific, you can click on any of the following links.
Table of Contents:
Ok. Let’s start with a quick introduction to the rename method.
A quick introduction to Pandas rename
The Pandas rename method is fairly straight-forward: it enables you to rename the columns or rename the row labels of a Python dataframe.
This technique is most often used to rename the columns of a dataframe (i.e., the variable names).
But again, it can also rename the row labels (i.e., the labels in the dataframe index).
I’ll show you examples of both of these in the examples section.
But first, let’s take a look at the syntax.
The syntax of Pandas rename
Ok, now that I’ve explained what the Pandas rename method does, let’s look at the syntax.
Here, I’ll show you the syntax for how to rename Pandas columns, and also how to rename Pandas row labels.
A quick note
Everything that I’m about to describe assumes that you’ve imported Pandas and that you already have a Pandas dataframe created.
You can import pandas with the following code:
import pandas as pd
And if you need a refresher on Pandas dataframes and how to create them, you can read our tutorial on Pandas dataframes.
Syntax to rename Pandas columns
Ok, let’s start with the syntax to rename columns. (The syntax for renaming columns and renaming rows labels is almost identical, but let’s just take it one step at a time.)
When we use the rename method, we actually start with our dataframe. You type the name of the dataframe, and then .rename()
to call the method.
Inside the parenthesis, you’ll use the columns
parameter, which enables you to specify the columns that you want to change.
How to use the columns parameter
Let’s look carefully at how to use the columns parameter.
When you change column names using the rename method, you need to present the old column name and new column name inside of a Python dictionary.
So if the old variable name is old_var
and the new variable name is new_var
, you would present to the columns
parameter as key/value pairs, inside of a dictionary: columns = {'old_var':'new_var'}
.
If you have multiple columns that you want to change, you simply provide multiple old name/new name dictionary pairs, separated by commas.
To do this properly, you really need to understand Python dictionaries, so if you need a refresher, then read about how dictionaries are structured.
Syntax to rename Pandas row labels
Now, let’s look at the syntax for renaming row labels (i.e., the labels in the dataframe index).
You’ll notice that the syntax is almost exactly the same as the syntax for changing the column names.
The major difference is that instead of using the columns
parameter, when you want to change the row labels, you need to use the index
parameter instead.
Beyond that you’ll still use a dictionary with old name/new name pairs as the argument to the index
parameter.
The parameters of Pandas rename
Now, let’s take a look at the parameters of the rename method.
The most important parameters that you should know are:
columns
index
inplace
The rename method also has additional parameters – axis
, copy
, levels
, and errors
– but these are more advanced and less commonly used, so I won’t cover them here.
columns
The columns
parameter enables you to specify the column names you want to change, and what to change them to.
As described above, the argument to this parameter can be a dictionary, but it can also be a function.
When you provide a dictionary, it the values should be structured as old name/new name pairs, like this: {'old_var':'new_var'}
.
index
The index
parameter is very similar to the columns
parameter, except it operates on row index labels instead of column names.
So the index
parameter enables you to specify the row labels you want to change, and what to change them to.
As described above, the argument to this parameter can be a dictionary, but it can also be a function.
When you provide a dictionary, it the values should be structured as old name/new name pairs, like this: {'old_var':'new_var'}
.
inplace
The inplace parameter enables you to force the rename method to directly modify the dataframe that’s being operated on.
By default, inplace
is set to inplace = False
. This causes the rename method to produce a new dataframe. In this case, the original dataframe is left unchanged.
If you set inplace = True
, the rename method will directly alter the original dataframe, and overwrite the data directly. Be careful with this, and make sure that your code is doing exactly what you want it to.
The output of Pandas rename
By default, the rename method will output a new Python dataframe, with new column names or row labels. As noted above, this means that by default, rename will leave the original dataframe unchanged.
If you set inplace = True
, rename won’t produce any new output. In this case, rename will instead directly modify and overwrite the dataframe that’s being operated on.
Examples: How to rename Pandas columns and Pandas row labels
Now that we’ve looked at the syntax, let’s look at some examples.
Examples:
- Rename one dataframe column
- Rename multiple dataframe columns
- Change row labels
- Change the column names and row labels ‘in place’
However, before you run the examples, you’ll need to run some preliminary code to import the modules we need, and to create the dataset we’ll be operating on.
Import modules
First, let’s import Pandas.
You can do that with the following code:
#=============== # IMPORT MODULES #=============== import pandas as pd
Notice that we’re importing Pandas with the alias pd
. This makes it possible to refer to Pandas as pd
in our code, which is the common convention among Python data scientists.
Create DataFrame
Next, we’ll create a dataframe that we can operate on.
We’ll create our dataframe in three steps:
- create a Python dictionary
- create a Pandas dataframe from the dictionary
- specify the columns and row labels (i.e., the index)
Let’s start by creating a Python dictionary. As you can see, this dictionary contains economic data for several countries.
#========================== # CREATE DICTIONARY OF DATA #========================== country_data_dict = { 'country_code':['USA', 'CHN', 'JPN', 'GER', 'UK', 'IND'] ,'country':['USA', 'China', 'Japan', 'Germany', 'UK', 'India'] ,'continent':['North America','Asia','Asia','Europe','Europe','Asia'] ,'gross_domestic_product':[19390604, 12237700, 4872137, 3677439, 2622434, 2597491] ,'pop':[322179605, 1403500365, 127748513, 81914672, 65788574, 1324171354] }
Next, we’ll create a DataFrame from the dictionary:
#================================= # CREATE DATAFRAME FROM DICTIONARY #================================= country_data = pd.DataFrame(country_data_dict, columns = ['country_code','country', 'continent', 'gross_domestic_product', 'pop'])
Note that in this step, we’re setting the column names using the columns
parameter inside of pd.DataFrame().
Finally, we’ll set the row labels (i.e., the index). By default, Pandas uses a numeric integer index that starts at 0.
We’re going to change that default index to character data.
Specifically, we’re going to use the values in the country_code
variable as our new row labels.
To do this, we’ll use the Pandas set_index method:
country_data = country_data.set_index('country_code')
Notice that we’re assigning the output of set_index()
to the original dataframe name, country_data
, using the equal sign. This is because by default, the output of set_index()
is a new dataframe object by default. By default, set_index()
does not modify the DataFrame in place.
Ok. Now that we have our dataframe, let’s print it out to look at the contents:
print(country_data)
OUT:
country continent gross_domestic_product pop country_code USA USA North America 19390604 322179605 CHN China Asia 12237700 1403500365 JPN Japan Asia 4872137 127748513 GER Germany Europe 3677439 81914672 UK UK Europe 2622434 65788574 IND India Asia 2597491 1324171354
This dataframe has economic data for six countries, organized in a row-and-column structure.
The dataframe has four columns: country
, continent
, gross_domestic_product
, and pop
.
Additionally, notice that the “country_code
” variable is set aside off to the left. That’s because we’ve set the country_code
column as the index. The values of country_code
will now function as the row labels of the dataframe.
Now that we have this dataframe, we’ll be able to use the rename()
method to rename the columns and the row labels.
EXAMPLE 1: Rename one dataframe column
First, let’s start simple.
Here, we’re going to rename a single column name.
Specifically, we’re going to rename the gross_domestic_product
variable to GDP
.
Let’s run the code, and then I’ll explain.
country_data.rename(columns = {'gross_domestic_product':'GDP'})
OUT:
country continent GDP pop country_code USA USA North America 19390604 322179605 CHN China Asia 12237700 1403500365 JPN Japan Asia 4872137 127748513 GER Germany Europe 3677439 81914672 UK UK Europe 2622434 65788574 IND India Asia 2597491 1324171354
Explanation
Notice in the output that gross_domestic_product
has been renamed to GDP
.
How did we do it?
We typed the name of the dataframe, country_data
, and then used so-called “dot syntax” to call the rename()
method.
Inside the parenthesis, we’re using the columns
parameter to specify the columns we want to rename, and the new name that we want to use. This ‘old name’/’new name’ pair is presented as a dictionary in the form {'old name':'new name'}
.
So we have the synax columns = {'gross_domestic_product':'GDP'}
, which is basically saying change the column name 'gross_domestic_product'
to 'GDP'
.
Remember: the original dataframe is unchanged
One more thing to point out here: when we run this code, the original dataframe will remain unchanged.
That’s because by default, the Pandas rename method produces a new dataframe as an output and leaves the original unchanged. And by default, this output will be sent directly to the console. So when we run our code like this, we’ll see the new dataframe with the new name in the console, but the original dataframe will be left the same.
If you want to save the output, you can use the assignment operator like this:
country_data_new = country_data.rename(columns = {'gross_domestic_product':'GDP'})
Here, I’ve given the output dataframe the new name country_data_new
.
We can call it anything we like. We could even call it country_data
. Just be careful if you do country_data_new = country_data.rename(...)
, it will overwrite your original dataset. Make sure your code works perfectly before you do this!
EXAMPLE 2: Rename multiple columns
Next, let’s make things a little more complicated.
Here, we’ll rename multiple columns at the same time.
The way to do this is very similar to the code in example 1, except here, we’ll provide more old name/new name pairs in our dictionary.
Specifically, we’ll rename gross_domestic_product
to GDP
, and we’ll rename pop
to population
.
Let’s take a look.
country_data.rename(columns = {'gross_domestic_product':'GDP', 'pop': 'population'})
OUT:
country continent GDP population country_code USA USA North America 19390604 322179605 CHN China Asia 12237700 1403500365 JPN Japan Asia 4872137 127748513 GER Germany Europe 3677439 81914672 UK UK Europe 2622434 65788574 IND India Asia 2597491 1324171354
Explanation
This should make sense if you understood example 1.
Here, we’re calling the rename()
method using dot syntax.
Inside the parenthesis, we have the code columns = {'gross_domestic_product':'GDP', 'pop': 'population'}
.
Look inside the dictionary (i.e., inside the curly brackets). Here, we have two old name/new name pairs. These are organized as key/value pairs, just as we normally have inside of a dictionary.
This should be simple to understand, as long as you understand how dictionaries are structured. If you don’t make sure to review Python dictionaries.
EXAMPLE 3: Rename row labels
Now, let’s rename some of the row labels.
Specifically, we’re going to rename the labels GER
and UK
.
To do this, we’re going to use the index
parameter.
Let’s take a look.
country_data.rename(index = {'GER':'DEU','UK':'GBR'})
OUT:
country continent gross_domestic_product pop country_code USA USA North America 19390604 322179605 CHN China Asia 12237700 1403500365 JPN Japan Asia 4872137 127748513 DEU Germany Europe 3677439 81914672 GBR UK Europe 2622434 65788574 IND India Asia 2597491 1324171354
Explanation
Here, we’re renaming GER
to DEU
and we’re renaming UK
to GBR
.
To do this, we called the rename method and used the code index = {'GER':'DEU','UK':'GBR'}
inside the parenthesis.
The index
parameter enables us to specify the row labels that we want to change. And we’re using the dictionary as the argument, which contains the old value/new value pairs.
EXAMPLE 4: Change the column names and row labels ‘in place’
Finally, let’s change some of the columns and row labels ‘in place’.
As I mentioned in example 1, and in the syntax section, by default, the rename method leaves the original dataframe unchanged. That’s because by default, the inplace
parameter is set to inplace = False
. This causes the rename method to produce a new dataframe as the output, while leaving the original dataframe unchanged.
But sometimes, we actually want to modify the original dataframe directly.
To do this we can set inplace = True
.
Create copy of dataframe
Before we run our code, we’re actually going to make a copy of our data.
The reason is that we’re going to directly overwrite a dataframe. This can be dangerous if you get it wrong, so we’re actually going to work with a copy of the original.
country_data_copy = country_data.copy()
Now, we have a dataframe, country_data_copy
, which contains the same data as the original.
Rename columns and labels in place
Now, we’re going to directly rename the columns and row labels of country_data_copy
.
To do this, we’ll use rename with the inplace
parameter as follows:
country_data_copy.rename(index = {'GER':'DEU','UK':'GBR'} ,columns = {'gross_domestic_product':'GDP', 'pop': 'population'} ,inplace = True )
And now, let’s print out the data, so we can see it.
print(country_data_copy)
OUT:
country continent GDP population country_code USA USA North America 19390604 322179605 CHN China Asia 12237700 1403500365 JPN Japan Asia 4872137 127748513 DEU Germany Europe 3677439 81914672 GBR UK Europe 2622434 65788574 IND India Asia 2597491 1324171354
Explanation
As you can see in the output the row labels and column names have been changed directly in country_data_copy
.
We simply did this by setting inplace = True
.
Again: be careful with this. When you do this, you’ll directly modify and overwrite your data. Check your code and double check again to make sure that your code works correctly before using inplace = True
.
Frequently asked questions about Pandas rename
Let’s quickly cover a common question about the Pandas rename technique.
Frequently asked questions:
Question 1: Why is my dataframe unchanged after I use the rename function?
Remember: by default, the Pandas rename function creates a new dataframe as an output, but leaves the original dataframe unchanged. (I mentioned this in the syntax section.)
The reason is that by default, the inplace parameter is set to inplace = False
. Again, this leaves the original dataframe unchanged, and simply produces a new dataframe as the output.
If you want to directly modify your original dataframe, you need to set inplace = False
. I show how to do this in example 4.
Leave your other questions in the comments below
Do you have any other questions about the Pandas rename method?
Is there something that you’re struggling with that I haven’t covered here?
If so, leave your questions in the comments section near the bottom of the page.
For more Pandas tutorials, sign up for our email list
This tutorial should have given you a good idea of how to rename columns in Python using the Pandas rename method.
But to really understand data manipulation in Python, you’ll need to know quite a few more techniques.
Moreover, to understand data science more broadly, there’s really a lot more to learn.
Having said that, if you want to learn data manipulation with Pandas, and data science in Python, then sign up for our email newsletter.
When you sign up, you’ll get free tutorials on:
- NumPy
- Pandas
- Base Python
- Scikit learn
- Machine learning
- Deep learning
- … and more.
We publish data science tutorials every week, and when you sign up for our email list, we’ll deliver those tutorials directly to your inbox.
When I want to explore the ‘pop’ column from DataFrame, I can not use the attribute syntax, that is it returns the whole DataFrame. Why?
Give me a working example and I can possibly answer. Don’t make me spend time trying to figure out what you mean.
Hello Joshua
I found that “pop” is a pandas dataframe method and could not get and return the values of pop feature. And so I understood that in such situation must use dict like value accessing to get the values of a feature or rename the column name.
of course, I understood that when the feature name is a valid python identifier, we can use attribute syntax (dot notation) to get values of that feature.