In this tutorial I’ll show you how to use the Pandas unique technique to get unique values from Pandas data.
I’ll explain the syntax, including the two different forms of Pandas unique: the pd.unique() function and the unique() method.
If you’re here for something specific, you can click on any of the links below, and it will take you to the appropriate section of the tutorial.
Table of Contents:
- Introduction to Pandas Unique
- The syntax of Pandas Unique
- Example of the Pandas Unique technique
- Pandas Unique FAQ
But, if you read everything from start to finish, it will probably make more sense.
Having said that, let’s get started.
A Quick Introduction to Pandas Unique
The Pandas Unique technique identifies the unique values in Pandas Series objects and in other one-dimensional, array-like objects.
If you’re somewhat new to Pandas, that might not make sense, so let me quickly explain.
Pandas is a Data Manipulation Toolkit
Just a quick review for people who are new to Pandas: Pandas is a data manipulation toolkit for Python.
We use Pandas to retrieve, clean, subset, and reshape data in Python.
Pandas Tools Work on DataFrames and Series Objects
There are two main data structures in Pandas.
First, there is the Pandas dataframe, which is a row-and-column data structure. A dataframe is sort of like an Excel spreadsheet, in the sense that it has rows and columns.
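To make the row-and-column idea concrete, here is a minimal sketch of a small dataframe. The variable name, column names, and values are all made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented only to illustrate the row-and-column layout
sales_data = pd.DataFrame({
    'name': ['William', 'Emma', 'Sofia'],
    'region': ['East', 'North', 'East'],
    'sales': [50000, 52000, 90000],
})

# Printing the dataframe shows labeled columns and indexed rows
print(sales_data)

# shape reports (number of rows, number of columns)
print(sales_data.shape)
```

Each key in the dictionary becomes a column, and each position in the lists becomes a row.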
The second major Pandas data structure is the Pandas Series.
A Pandas Series is like a single column of data.
It’s important to understand that we typically encounter and work with Pandas Series objects as part of a dataframe. When you retrieve or operate on a single column from a dataframe, it’s very frequently returned as a Series object.
Having said that, Series objects can also exist independently.
This matters for the Pandas unique technique: we sometimes use it on a lone Series, but more often we use it on a Series that is part of a dataframe.
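The sketch below illustrates both situations: a Series that comes from a dataframe column, and a Series that exists on its own. The names and values are hypothetical.

```python
import pandas as pd

# Hypothetical dataframe, made up for illustration
pets = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat'],
    'age': [3, 5, 2],
})

# Retrieving a single column from a dataframe returns a Series
animal_column = pets['animal']
print(type(animal_column))

# A Series can also be created independently of any dataframe
lone_series = pd.Series(['cat', 'dog', 'cat'])
print(type(lone_series))
```

Both objects are Series, so the unique technique works the same way on either one.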
Pandas Unique Identifies Unique Values
With all that being said, let’s return to the Pandas Unique technique.
The Pandas Unique technique identifies the unique values of a Pandas Series.
So if we have a Pandas Series (either alone or as part of a Pandas dataframe), we can use the pd.unique() technique to identify the unique values.
At a high level, that’s all the unique() technique does, but there are a few important details.
With that in mind, let’s look at the syntax so you can get a clearer understanding of how the technique works.
The Syntax of Pandas Unique
Ok. Let’s take a look at the syntax.
The syntax is fairly simple and straightforward, but there are a few important details.
Notably, there are actually two different ways to use the unique() technique. You can use unique() as a Pandas function, but you can also use it as a method.
We’ll take a look at the syntax of each independently.
A quick note
One quick note: going forward, I’m going to assume that you’ve imported the Pandas library with the alias ‘pd’.
You can do that with the following code:
import pandas as pd
The syntax of pd.unique
Ok. Let’s start by taking a look at the pd.unique function.
When we use the unique function, we call it as pd.unique().
Inside the parentheses, we provide the name of the Series that we want to operate on. Keep in mind that this can be an actual Series, but the function will also work if you provide an “array like” object, such as a Python list.
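As a rough sketch of the function syntax, the code below passes a Series and a NumPy array to pd.unique(). The variable names and values are hypothetical, and a NumPy array is used here as just one example of an array-like input.

```python
import numpy as np
import pandas as pd

# Hypothetical input data, made up for illustration
colors = pd.Series(['red', 'blue', 'red', 'green'])
numbers = np.array([3, 1, 3, 2, 1])

# The object to de-duplicate goes inside the parentheses
unique_colors = pd.unique(colors)
unique_numbers = pd.unique(numbers)

print(unique_colors)
print(unique_numbers)
```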
The syntax of the Pandas Unique Method
In the previous section, we looked at how to call the unique() function.
Here, I’ll explain how to use unique as a method. (Remember, a method is like a function that’s associated with an object.)
When you use the method version, you start by typing the name of the Series object that you want to work with.
Next, you type a “dot,” and then the name of the method, unique().
When we use the Pandas unique method, we can use it on a lone Series object that exists on its own, outside of a dataframe.
Having said that, it’s probably more common to use unique() on dataframe columns.
Let’s take a look at how to do that.
How to use unique on a dataframe column
As I’ve already mentioned, dataframe columns are essentially Pandas Series objects. If you want to use the unique() method on a dataframe column, you can do so as follows:
Type the name of the dataframe, then use “dot syntax” and type the name of the column. Then use dot syntax again to call the unique() method.
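Here is a minimal sketch of that pattern, using a hypothetical dataframe called orders. It also shows bracket notation, which is not covered above but selects the same column.

```python
import pandas as pd

# Hypothetical dataframe, made up for illustration
orders = pd.DataFrame({
    'city': ['Austin', 'Boston', 'Austin', 'Denver'],
    'amount': [20, 35, 15, 35],
})

# Dot syntax: dataframe name, then column name, then unique()
cities_dot = orders.city.unique()

# Bracket notation retrieves the same column as a Series
cities_bracket = orders['city'].unique()

print(cities_dot)
print(cities_bracket)
```

Bracket notation is the safer choice when a column name contains spaces or matches the name of an existing dataframe attribute, because dot syntax will not reach the column in those cases.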
It’s actually really easy to use, but I’ll show you specific examples in the examples section.
The Output of Unique()
Whether we use the function form or the method form, the output is the same.
In most cases, the unique() technique produces a NumPy array with the unique values.
Moreover, keep in mind that the unique values are returned in the order that they appear in the input series. So they are not sorted in the output.
[Note that “In case of an extension-array backed Series, a new ExtensionArray of that type with just the unique values is returned. This includes categorical, period, datetime with timezone, interval, sparse, integerNA.” See official documentation for Pandas unique.]
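The following sketch illustrates these output details with made-up data: the return type for ordinary numeric data, the order of the results, and the categorical case mentioned in the note. Sorting with np.sort() is an optional extra step, not something unique() does for you.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, made up for illustration
scores = pd.Series([30, 10, 30, 20, 10])

unique_scores = scores.unique()
print(type(unique_scores))  # a NumPy array for ordinary numeric data
print(unique_scores)        # values in order of first appearance

# If sorted output is needed, sort the result explicitly
sorted_scores = np.sort(unique_scores)
print(sorted_scores)

# For a categorical Series, the result is a Categorical rather than a NumPy array
sizes = pd.Series(['M', 'S', 'M', 'L'], dtype='category')
unique_sizes = sizes.unique()
print(type(unique_sizes))
```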
Examples: How to Identify Unique Values in Pandas
Ok. Now that you’ve learned about the syntax, let’s look at some concrete examples.
You can click on any of the following links, and it will take you directly to the example.
Examples:
- Identify the unique values of a list
- Get unique values from Pandas Series using the unique function
- Get unique values from Pandas Series using unique method
- Identify the unique values of a dataframe column
Run this code first
Two quick pieces of setup, before you run the examples.
You need to import Pandas, and retrieve a dataset.
Import Pandas and Seaborn
First you need to import Pandas and Seaborn with the following code.
import pandas as pd
import seaborn as sns
We will use Seaborn to retrieve a dataset.
Retrieve Titanic Dataframe
Next, we’ll retrieve the titanic dataframe.
We can do this with the sns.load_dataset() function as follows:
titanic = sns.load_dataset('titanic')
We won’t use this dataframe for all of the examples, but we will use it for one of them.
EXAMPLE 1: Identify the Unique Values of a List
First, we’ll start simple.
Here, instead of working with more complex data structures, we’ll just work with a simple Python list.
First, let’s just create a simple Python list with 7 values.
letter_list = ['A','B','B','C','E','D','E']
The list, letter_list, contains several capital letters. Notice that two of the letters, B and E, are repeated.
Now, let’s identify the unique values:
pd.unique(letter_list)
OUT:
array(['A', 'B', 'C', 'E', 'D'], dtype=object)
Explanation
Here, the input was a simple Python list that contains several letters. Some of the letters were repeated.
The output is a NumPy array that contains the unique values that were in the input. In other words, the output array contains the same values, but with all of the duplicates removed.
Furthermore, notice the order. The items in the output are not sorted. Instead, the items in the output appear in the same order that they originally appeared in the input.
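One caveat, which depends on your Pandas version and is not covered by the example itself: some newer Pandas releases emit a deprecation warning when pd.unique() receives a plain Python list. If you see that, wrapping the list in a Series first should sidestep it, as in this sketch.

```python
import pandas as pd

letter_list = ['A', 'B', 'B', 'C', 'E', 'D', 'E']

# Wrap the list in a Series before passing it to pd.unique()
unique_letters = pd.unique(pd.Series(letter_list))

# Convert to a plain list for easy viewing
print(list(unique_letters))
```

The unique values and their order are the same as in the example above.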
EXAMPLE 2: Get unique values from Pandas Series using unique function
Next, let’s get the unique values from a Pandas Series.
Here, we’ll again use the unique() function to do this.
First though, let’s quickly create a Series object:
animals = pd.Series(['cat', 'dog', 'cat', 'bear', 'bear', 'bear', 'badger'])
And now, let’s identify the unique values:
pd.unique(animals)
OUT:
array(['cat', 'dog', 'bear', 'badger'], dtype=object)
Explanation
Again, this is fairly simple.
Here, we’re calling the pd.unique() function to get the unique values.
The input to the function is the animals Series (a Pandas Series object).
The output is a NumPy array.
Notice again that the items in the output are de-duped … the duplicates are removed.
Moreover, they appear in the exact same order as they appeared in the input. They are unsorted.
EXAMPLE 3: Get unique values from Pandas Series using unique method
Next, let’s use the unique() method to get unique values.
In the previous example, we used the unique function to compute the unique values. Here, we’re going to use the method. (If you’re confused about this, review the explanation of the function version and the method version in the syntax section.)
First, we can create our Series object (this is the same Series as the previous example).
animals = pd.Series(['cat', 'dog', 'cat', 'bear', 'bear', 'bear', 'badger'])
Next, let’s use the method syntax to retrieve the unique values.
animals.unique()
OUT:
array(['cat', 'dog', 'bear', 'badger'], dtype=object)
Explanation
Here, we’ve used the method syntax to retrieve the unique values that are contained in a Pandas series.
To do this, we typed the name of the Series object, animals.
Then, we used so-called “dot syntax” to call the unique() method. When we use the unique() technique this way, it simply identifies the unique values that are contained in the associated Series object. As an output, it produces a NumPy array with the unique values.
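As a quick check that the two forms agree, this sketch runs the function and the method on the same animals Series and compares the results. The variable names from_function and from_method are just illustrative.

```python
import pandas as pd

animals = pd.Series(['cat', 'dog', 'cat', 'bear', 'bear', 'bear', 'badger'])

# Function form and method form, applied to the same Series
from_function = pd.unique(animals)
from_method = animals.unique()

# Compare as plain lists: same values, same order
print(list(from_function) == list(from_method))
```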
EXAMPLE 4: Identify the Unique Values of a DataFrame Column
Finally, let’s do one more example.
Here, we’ll identify the unique values of a dataframe column.
Specifically, we’ll identify the unique values of the embark_town variable in the titanic dataset.
First, let’s get the titanic dataframe using sns.load_dataset().
titanic = sns.load_dataset('titanic')
Next, we can retrieve the unique values of the embark_town column by using the method syntax as follows:
titanic.embark_town.unique()
OUT:
array(['Southampton', 'Cherbourg', 'Queenstown', nan], dtype=object)
Explanation
Here, we’re using the method syntax to identify the unique values of a dataframe column.
I explained this in the syntax section, but let me quickly repeat, for clarity.
When we get the unique values of a column, we need to type the name of the dataframe, then the name of the column, and then unique(). Keep in mind that these must be separated by ‘dots.’
So in this example, titanic is the name of the dataframe.
embark_town is the name of the column. Remember, when we call it with the code titanic.embark_town, it’s actually a Series object. That’s why we can use the method syntax.
Finally, we call the method with .unique().
The output is a NumPy array with the unique values that had been in the titanic.embark_town column. Notice that the output includes nan, because the column contains missing values.
This is very useful when you’re analyzing or working with dataframes, because you can quickly see every distinct value that a column contains.
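If you would rather leave the missing value out of the result, one option is to drop missing values before calling unique(). The sketch below uses a small hypothetical dataframe, passengers, as a stand-in for titanic so that it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic data, with one missing value
passengers = pd.DataFrame({
    'embark_town': ['Southampton', 'Cherbourg', np.nan,
                    'Southampton', 'Queenstown'],
})

# By default, a missing value shows up as one of the unique values
with_missing = passengers.embark_town.unique()
print(with_missing)

# Dropping missing values first leaves only the real town names
without_missing = passengers.embark_town.dropna().unique()
print(without_missing)
```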