How to use the Pandas sort_values method

This tutorial will show you how to use the Pandas sort_values method to sort a DataFrame by column.

The sort_values method is fairly straightforward to use, but this tutorial will explain everything step by step.

I’ll explain what the sort_values method does, explain the syntax, and then provide clear, step-by-step examples of how the method works.

If you’re new to Pandas and not really sure how to do data manipulation in Python, I recommend reading the whole tutorial.

Ok. Let’s take a high-level look at sort_values.

A quick introduction to the Pandas sort_values method

The sort_values method is a data manipulation tool from the Pandas package.

If you’re not completely familiar with it, the Pandas package is a data manipulation toolkit for the Python programming language.

Pandas has a few dozen tools for manipulating data. There are tools for renaming variables, subsetting rows of data, selecting DataFrame columns, and a variety of other data manipulation tasks. Pandas is really an integrated toolkit for cleaning and shaping data.

One of the most common data manipulation tasks is sorting data, and Pandas has a tool for that as well.

sort_values sorts a DataFrame by the values in its columns

The sort_values method is a Pandas method for sorting the rows of a DataFrame according to the values in one or more of its columns.

That’s really all it does!

But there are a few details about how the method works that you should know about.

To really understand the details of the sort_values method, you need to understand the syntax.

Let’s take a look.

The syntax of Pandas sort_values

The syntax of the Pandas sort_values method is fairly straightforward.

When you call the method, you first need to type the name of your DataFrame. That means that before you call the method, you first need an actual DataFrame. For more information about DataFrames, I recommend our tutorial about Pandas DataFrames.

[Image: the syntax of the Pandas sort_values method]

After you type the name of the DataFrame, you need to type a dot (“.”) and then the name of the method: sort_values.

Then, inside the parentheses, you provide arguments to the method’s parameters. These let you specify exactly how to sort the data.
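Here’s a minimal sketch of that general form. The small DataFrame `df` and its columns are hypothetical and are only there to illustrate the call. All three of the parameters covered below are written out as keyword arguments:

```python
import pandas as pd

# A small, hypothetical DataFrame used only to illustrate the call syntax
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"],
                   "temp": [4, 19, 31]})

# DataFrame name, a dot, the method name, then the arguments inside the parentheses
sorted_df = df.sort_values(by="temp", ascending=False, inplace=False)
print(sorted_df)
```

In everyday code you usually leave out any parameter that is at its default value. I’ve written all three out here only to show where they go.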

Let’s look at the parameters of sort_values more carefully.

The parameters of sort_values

The sort_values method actually has several parameters, but we’re going to focus on three:

  • by
  • ascending
  • inplace

These three parameters are the most common, and they are the ones you will use most often.

The method also has several other parameters, including axis, kind, and na_position. These are less common, so we’re not going to look at them in detail (although I will mention axis in the FAQ section toward the end).

Ok … I’m going to quickly explain the by, ascending, and inplace parameters.

by

The by parameter specifies the column or columns to sort by.

You can provide a name of an individual column or a list of names.

When you provide a single column name, you need to provide the column name in the form of a Python string (i.e., the column name needs to be enclosed inside of quotation marks).

When you provide several column names, you need to organize them inside of a list.

I’ll show you examples of both of these in the examples section of the tutorial.

Note that this is required. You must provide an argument to this parameter.
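Here’s a quick sketch of both forms of the by argument. It uses a hypothetical `df` with made-up `team` and `score` columns:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"team": ["B", "A", "B", "A"],
                   "score": [7, 9, 3, 5]})

# A single column name, passed as a Python string
by_score = df.sort_values(by="score")

# Several column names, passed as a Python list:
# sorted by team first, then by score within each team
by_team_then_score = df.sort_values(by=["team", "score"])

print(by_score)
print(by_team_then_score)
```

Because by is the first parameter, you will often see it passed positionally, as in df.sort_values("score"). That is the style the examples later in this tutorial use.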

ascending

The ascending parameter lets you specify whether to sort the DataFrame in ascending or descending order.

If you set ascending = True, then sort_values will sort the data in ascending order (i.e., from lowest to highest).

If you set ascending = False, then sort_values will sort the data in descending order (i.e., from highest to lowest).

This parameter is not required. If you don’t use this parameter, it will default to ascending = True and sort your DataFrame in ascending order.
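To see that the default really is ascending, you can compare a call that leaves ascending out with one that sets it explicitly. This is a small sketch on a hypothetical `df`:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"item": ["pen", "ink", "pad"], "price": [3, 8, 5]})

default_order = df.sort_values("price")                  # ascending=True by default
explicit_asc = df.sort_values("price", ascending=True)   # same as the default
descending = df.sort_values("price", ascending=False)    # highest to lowest

print(default_order.equals(explicit_asc))  # True
print(descending)
```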

inplace

This parameter specifies whether or not you want to sort the DataFrame “in place.”

By default, this parameter is set to inplace = False.

When inplace = False, the method will create a new DataFrame as the output, and leave the original DataFrame unchanged.

As a side note, that’s actually a good thing, because that way you won’t accidentally overwrite your data!

However, if you set inplace = True, the Pandas sort_values method will directly sort the original DataFrame. Be careful! This means that when you set inplace = True, sort_values will overwrite your original dataset!
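Here’s a small sketch of the difference, using a hypothetical three-row `df`. With the default inplace = False, the original row order survives. With inplace = True, it doesn’t:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Cy", "Ann", "Bo"], "age": [40, 25, 33]})

# inplace=False (the default): a sorted copy is produced, df keeps its original order
df.sort_values("age")
print(df["name"].tolist())   # ['Cy', 'Ann', 'Bo']

# inplace=True: df itself is re-ordered, and the old row order is gone
df.sort_values("age", inplace=True)
print(df["name"].tolist())   # ['Ann', 'Bo', 'Cy']
```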

The output of sort_values

As I just mentioned, the sort_values method outputs a new Pandas DataFrame by default.

However, if you set inplace = True, the sort_values method will directly sort the original DataFrame and return None instead of a new DataFrame.
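Because of that None return value, one mistake to watch for is assigning the result of an in-place sort to a variable. The sketch below uses a hypothetical `df` to show both return types and the common alternative of reassigning the returned DataFrame:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Cy", "Ann", "Bo"], "age": [40, 25, 33]})

# Default behaviour: a new DataFrame is returned
new_df = df.sort_values("age")
print(type(new_df))          # <class 'pandas.core.frame.DataFrame'>

# With inplace=True the method returns None, so don't assign its result
returned = df.copy().sort_values("age", inplace=True)
print(returned)              # None

# A common alternative to inplace=True: reassign the returned DataFrame
df = df.sort_values("age")
print(df["name"].tolist())   # ['Ann', 'Bo', 'Cy']
```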

Examples: how to sort a DataFrame by column with sort_values

Now that you’ve learned the syntax of Pandas sort_values, let’s take a look at some examples.

Run this code first

Before you run the example code, you need to make sure that you do two things.

You need to import Pandas and you need to create the DataFrame that we’re going to use.

Import Pandas

You can run this code to import pandas:

import pandas as pd

This imports the Pandas package with the alias pd.

Create dataframe

Next, let’s create a simple DataFrame.

Here, we’re going to create a small DataFrame that contains some dummy sales data.

To do this, we’re just going to call the pd.DataFrame() function and provide the data that we want to turn into a DataFrame.

sales_data = pd.DataFrame({"name":["William","Emma","Sofia","Markus","Edward","Thomas","Ethan","Olivia","Arun","Anika","Paulo"],
                           "region":["East","North","East","South","West","West","South","West","West","East","South"],
                           "sales":[50000,52000,90000,34000,42000,72000,49000,55000,67000,65000,67000],
                           "expenses":[42000,43000,50000,44000,38000,39000,42000,60000,39000,44000,45000]})

And let’s print it out so you can see the contents:

print(sales_data)

OUT:

       name region  sales  expenses
0   William   East  50000     42000
1      Emma  North  52000     43000
2     Sofia   East  90000     50000
3    Markus  South  34000     44000
4    Edward   West  42000     38000
5    Thomas   West  72000     39000
6     Ethan  South  49000     42000
7    Olivia   West  55000     60000
8      Arun   West  67000     39000
9     Anika   East  65000     44000
10    Paulo  South  67000     45000

The DataFrame contains 11 rows and four columns. The columns are name, region, sales, and expenses.
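If you want to double-check the size and column names before sorting, the DataFrame’s shape and columns attributes will confirm them. Here’s a quick sketch that rebuilds sales_data so it runs on its own:

```python
import pandas as pd

# Same data as the sales_data DataFrame created above
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", "North", "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000,
              49000, 55000, 67000, 65000, 67000],
    "expenses": [42000, 43000, 50000, 44000, 38000, 39000,
                 42000, 60000, 39000, 44000, 45000],
})

print(sales_data.shape)           # (11, 4)
print(list(sales_data.columns))   # ['name', 'region', 'sales', 'expenses']
```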

We’ll be able to use several of these variables as our sorting variables.

For more information about how to create DataFrames like this, you can read our Pandas DataFrame tutorial.

EXAMPLE 1: Sort dataframe by a single column

For our first example, we’re going to start simple.

Here, we’re going to sort by a single variable, the sales variable.

To do this, we’ll simply call sort_values() by typing the name of the DataFrame and then calling the method using “dot” notation. Inside of the method, we specify our sort variable by simply typing the variable name as a string (i.e., enclosed in quotation marks).

Here’s the code:

sales_data.sort_values('sales')

And here is the output when you run the code:

       name region  sales  expenses
3    Markus  South  34000     44000
4    Edward   West  42000     38000
6     Ethan  South  49000     42000
0   William   East  50000     42000
1      Emma  North  52000     43000
7    Olivia   West  55000     60000
9     Anika   East  65000     44000
8      Arun   West  67000     39000
10    Paulo  South  67000     45000
5    Thomas   West  72000     39000
2     Sofia   East  90000     50000

Notice that the output DataFrame is sorted by the sales variable, from the lowest value to the highest value.

Remember: by default, the ascending parameter is set to ascending = True. Because we did not manually set the ascending parameter, the method defaulted to ascending = True, so the output is sorted by sales in ascending order.
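Also notice that the sorted output keeps the original row labels (3, 4, 6, and so on) on the left. If you’d rather have the rows renumbered from 0, one option is to chain reset_index(drop=True) onto the sort. Here’s a sketch of that:

```python
import pandas as pd

# Same data as the sales_data DataFrame created above
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", "North", "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000,
              49000, 55000, 67000, 65000, 67000],
    "expenses": [42000, 43000, 50000, 44000, 38000, 39000,
                 42000, 60000, 39000, 44000, 45000],
})

sorted_sales = sales_data.sort_values('sales')
print(sorted_sales.index.tolist()[:3])   # [3, 4, 6]  (original row labels kept)

# Drop the old labels and number the sorted rows 0, 1, 2, ...
renumbered = sorted_sales.reset_index(drop=True)
print(renumbered.index.tolist()[:3])     # [0, 1, 2]
print(renumbered.loc[0, 'name'])         # Markus
```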

EXAMPLE 2: sort a DataFrame by multiple variables

Next, let’s make things a little more complicated.

Here, we’re going to sort our DataFrame by multiple variables.

The syntax is almost exactly the same as the code for the previous example.

However, instead of providing a single variable to the method, we’re going to provide a list of variables.

Let’s take a look at the syntax and run it, and then I’ll explain more.

sales_data.sort_values(['region','sales'])

And here is the output DataFrame:

       name region  sales  expenses
0   William   East  50000     42000
9     Anika   East  65000     44000
2     Sofia   East  90000     50000
1      Emma  North  52000     43000
3    Markus  South  34000     44000
6     Ethan  South  49000     42000
10    Paulo  South  67000     45000
4    Edward   West  42000     38000
7    Olivia   West  55000     60000
8      Arun   West  67000     39000
5    Thomas   West  72000     39000

Notice that the output is sorted by two variables. It’s sorted alphabetically by region, and then within each region, the rows are sorted by sales.

This output corresponds to the syntax. When we called the method, we provided a list of two variables as the input. The method call was .sort_values(['region','sales']). Notice that inside of the method call, we have a Python list that contains the two variable names.

The output is simply sorted by those two variables.

Keep in mind that here, we sorted by two variables, but you can sort by three or more in exactly the same way (try it yourself). Just provide the sorting variables to the method in the form of a Python list. The DataFrame is sorted by the first variable, then by the second variable within ties on the first, and so on.
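As one possible answer to the “try it yourself”, here’s a sketch that sorts by three variables: region, then expenses, then sales. The third variable only matters where the first two are tied. In the West region, Thomas and Arun both have 39000 in expenses, so sales decides their order:

```python
import pandas as pd

# Same data as the sales_data DataFrame created above
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", "North", "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000,
              49000, 55000, 67000, 65000, 67000],
    "expenses": [42000, 43000, 50000, 44000, 38000, 39000,
                 42000, 60000, 39000, 44000, 45000],
})

# Sort by region, then expenses within each region, then sales to break ties
three_way = sales_data.sort_values(['region', 'expenses', 'sales'])
print(three_way)
```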

EXAMPLE 3: sort a Pandas DataFrame in descending order

Now, let’s sort our DataFrame in descending order.

Remember that by default, the ascending parameter is set to ascending = True. That automatically sorts the data in ascending order.

To change that and sort in descending order, we’re going to set ascending = False.

Here’s the code:

sales_data.sort_values('sales', ascending = False)

And here’s the output:

       name region  sales  expenses
2     Sofia   East  90000     50000
5    Thomas   West  72000     39000
8      Arun   West  67000     39000
10    Paulo  South  67000     45000
9     Anika   East  65000     44000
7    Olivia   West  55000     60000
1      Emma  North  52000     43000
0   William   East  50000     42000
6     Ethan  South  49000     42000
4    Edward   West  42000     38000
3    Markus  South  34000     44000

Notice that in this case, the output is sorted by sales in descending order … i.e., from highest to lowest.
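A common use of a descending sort is pulling out the top rows with head(). Here’s a sketch. Note that Arun and Paulo are tied at 67000. If the order of tied rows matters to you, you can add a second sort variable (here, name) to make it explicit:

```python
import pandas as pd

# Same data as the sales_data DataFrame created above
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", "North", "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000,
              49000, 55000, 67000, 65000, 67000],
    "expenses": [42000, 43000, 50000, 44000, 38000, 39000,
                 42000, 60000, 39000, 44000, 45000],
})

# The two highest sellers
top_two = sales_data.sort_values('sales', ascending=False).head(2)
print(top_two)

# Sales high-to-low, with ties broken alphabetically by name
top_four = sales_data.sort_values(['sales', 'name'], ascending=[False, True]).head(4)
print(top_four)
```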

EXAMPLE 4: sort a Pandas DataFrame in place

Finally, we’re going to sort our data “in place”.

That might be a little confusing, so let me quickly explain.

Notice that whenever we’ve run our sort code in the previous examples, the code creates a new DataFrame as an output. If you’re working in Spyder or a similar Python IDE, the output is just sent right to the console window.

However, if you print out the original DataFrame with print(sales_data) after running the above examples, you’ll find that the original DataFrame (sales_data) is unchanged. It’s still unsorted.

Syntactically, this is because the inplace parameter is set to inplace = False by default. If we do not reference the inplace parameter in our code, it just defaults to False. That means that the method does not modify the original DataFrame (i.e., sales_data) directly. It does not modify the original DataFrame “in place”. Instead, when inplace is set to False, it returns a new, sorted DataFrame.

What if we want to modify the original DataFrame though?

Easy.

We just set inplace = True.

Let’s take a look.

Here, before we modify our DataFrame “in place”, we’re going to create a copy.

This is just so we don’t overwrite the original DataFrame. (I want to leave the original DataFrame intact for you, so we’ll just create a copy that we can modify in place.)

sales_data_copy = sales_data.copy()

And now, here is the code to modify the data in place:

sales_data_copy.sort_values('sales', inplace = True)

After you run the code, there should not be any direct output. This is because the sort_values method has modified sales_data_copy directly and returned None.

That being the case, to see the sorted DataFrame, we need to print it out.

We can do that with the following code:

print(sales_data_copy)

OUT:

       name region  sales  expenses
3    Markus  South  34000     44000
4    Edward   West  42000     38000
6     Ethan  South  49000     42000
0   William   East  50000     42000
1      Emma  North  52000     43000
7    Olivia   West  55000     60000
9     Anika   East  65000     44000
8      Arun   West  67000     39000
10    Paulo  South  67000     45000
5    Thomas   West  72000     39000
2     Sofia   East  90000     50000

Notice that the sales_data_copy DataFrame is sorted on sales from low to high.

Because we sorted the data in place, that change to sales_data_copy is permanent.

At the same time though, the original sales_data DataFrame is still unchanged.
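The .copy() step above matters. A plain assignment like `alias = original` does not copy a DataFrame. It just gives the same object a second name, so an in-place sort through either name changes both. Here’s a sketch with hypothetical `original`, `alias`, `source`, and `independent` DataFrames:

```python
import pandas as pd

# Plain assignment: both names refer to the same DataFrame object
original = pd.DataFrame({"name": ["Cy", "Ann", "Bo"], "sales": [300, 100, 200]})
alias = original
alias.sort_values("sales", inplace=True)
print(original["name"].tolist())   # ['Ann', 'Bo', 'Cy']  (original was re-ordered too)

# .copy(): an independent DataFrame, so the source keeps its order
source = pd.DataFrame({"name": ["Cy", "Ann", "Bo"], "sales": [300, 100, 200]})
independent = source.copy()
independent.sort_values("sales", inplace=True)
print(source["name"].tolist())     # ['Cy', 'Ann', 'Bo']
```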

Frequently asked questions about Pandas sort_values

Ok … let’s take a look at a couple of questions about the sort_values method.

Question 1: Is it possible to sort the data along axis 1?

So far in this tutorial, we’ve been using the sort_values method to sort the rows of data.

Technically, sorting the rows is equivalent to saying that we’re sorting axis 0 of the DataFrame. (This is a bit technical … you need to understand Pandas axes to understand this.)

I haven’t mentioned, though, that it’s also possible to sort the columns. If you set the axis parameter to axis = 1, sort_values will reorder the columns of the DataFrame instead of the rows.

In that case, the by parameter refers to a row (by its index label), and the columns are arranged according to the values in that row.

Having said that, it’s extremely rare to have to do this.

If you have a typical DataFrame with rows and columns, you should almost never need to set axis = 1.
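If you do need it, here’s a minimal sketch. The DataFrame `df` is hypothetical and all-numeric, because sorting along a row only makes sense when the values in that row can be compared with each other. That usually isn’t true for a row of sales_data, which mixes text and numbers:

```python
import pandas as pd

# Hypothetical all-numeric DataFrame
df = pd.DataFrame({"b": [3, 1], "a": [1, 2], "c": [2, 0]})

# axis=1: reorder the columns by the values in the row labelled 0
by_row_zero = df.sort_values(by=0, axis=1)
print(by_row_zero.columns.tolist())   # ['a', 'c', 'b']
```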

Question 2: How can you sort by ascending order and descending order for different variables?

What if you want to sort by one variable in ascending order and sort by a second variable in descending order. Is that possible?

Yes.

You can just pass a list of boolean values to the ascending parameter, with one value for each variable in your list of sort variables.

Try it!

You can run this code on the sales_data DataFrame we created above:

sales_data.sort_values(['region','sales'], ascending = [False, True])
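Here’s a self-contained version of that code that you can use to check your result. Region is sorted in descending order (West, South, North, East), and within each region, sales are sorted in ascending order:

```python
import pandas as pd

# Same data as the sales_data DataFrame created above
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", "North", "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, 34000, 42000, 72000,
              49000, 55000, 67000, 65000, 67000],
    "expenses": [42000, 43000, 50000, 44000, 38000, 39000,
                 42000, 60000, 39000, 44000, 45000],
})

# region: descending (False); sales: ascending (True)
mixed = sales_data.sort_values(['region', 'sales'], ascending=[False, True])
print(mixed)
```

The ascending list needs one boolean for each variable in the sort list. Pandas raises an error if the two lists have different lengths.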

Leave your other questions in the comments below

Do you have more questions about the Pandas sort_values method?

Leave your questions in the comments section below.

Join our course to learn more about Pandas

If you’re serious about learning Pandas, you should enroll in our premium Pandas course called Pandas Mastery.

Pandas Mastery will teach you everything you need to know about Pandas, including:

  • How to subset your Python data
  • Data aggregation with Pandas
  • How to reshape your data
  • and more …

Moreover, it will help you completely master the syntax within a few weeks. You’ll discover how to become “fluent” in writing Pandas code to manipulate your data.

Find out more here:

Learn More About Pandas Mastery

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.
