In [15]: df = pd.DataFrame({"A":[1,1,2,3,1,1],"B":[5,4,3,4,6,7]})
In [21]: df
Out[21]:
   A  B
0  1  5
1  1  4
2  2  3
3  3  4
4  1  6
5  1  7
To get the unique values in columns A and B:
In [22]: df["A"].unique()
Out[22]: array([1, 2, 3])
In [23]: df["B"].unique()
Out[23]: array([5, 4, 3, 6, 7])
To get the unique values in column A as a list (note that unique() can be called
in two slightly different ways: as the Series method df['A'].unique() or as the function pd.unique(df['A'])):
In [24]: pd.unique(df['A']).tolist()
Out[24]: [1, 2, 3]
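As a minimal sketch of the two forms, the snippet below rebuilds the same DataFrame so it runs on its own. The variable names are only illustrative. Both forms return the values in order of first appearance rather than sorted.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 1, 1], "B": [5, 4, 3, 4, 6, 7]})

# Form 1: unique() called as a method on the Series
via_method = df["A"].unique().tolist()

# Form 2: the top-level pandas function applied to the Series
via_function = pd.unique(df["A"]).tolist()

print(via_method)    # [1, 2, 3]
print(via_function)  # [1, 2, 3]

# Values come back in order of first appearance, so column B is not sorted
b_unique = df["B"].unique().tolist()
print(b_unique)  # [5, 4, 3, 6, 7]
```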
Here is a more complex example. Say we want to find the unique values from column 'B' where 'A' is equal to 1.
First, let's introduce a duplicate so you can see how it works. Let's replace the 6 in row 4, column 'B' with a 4 (the index labels are integers, so the row label is 4, not the string '4'):
In [24]: df.loc[4, 'B'] = 4
    ...: df
Out[24]:
   A  B
0  1  5
1  1  4
2  2  3
3  3  4
4  1  4
5  1  7
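The assignment can be reproduced as a standalone sketch. It assumes the default integer index (0 to 5), so `.loc` is given the integer label 4 for the row.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 1, 1], "B": [5, 4, 3, 4, 6, 7]})

# .loc selects by label; the default index labels are the integers 0..5,
# so row 4 is addressed with the integer 4
df.loc[4, "B"] = 4

print(df["B"].tolist())  # [5, 4, 3, 4, 4, 7]
```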
Now select the data:
In [25]: pd.unique(df[df['A'] == 1 ]['B']).tolist()
Out[25]: [5, 4, 7]
This can be broken down by looking at the inner expression first:
df['A'] == 1
This compares each value in column A with 1 and returns a boolean Series: True where the value equals 1, False otherwise. We then use that Series as a mask to keep only the matching rows of the DataFrame (the outer selection), and take column 'B' from the result.
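The breakdown can be sketched step by step. The names `mask`, `b_where_a_is_1`, `result` and `result_loc` are illustrative only, and the DataFrame is rebuilt with the duplicate already in place. The last line shows an equivalent single `.loc` selection, which some people prefer to the chained form.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 1, 1], "B": [5, 4, 3, 4, 4, 7]})

# Step 1: boolean Series, True where column A equals 1
mask = df["A"] == 1
print(mask.tolist())  # [True, True, False, False, True, True]

# Step 2: keep only the rows where the mask is True, then take column B
b_where_a_is_1 = df[mask]["B"]
print(b_where_a_is_1.tolist())  # [5, 4, 4, 7]

# Step 3: drop the repeated values
result = pd.unique(b_where_a_is_1).tolist()
print(result)  # [5, 4, 7]

# Equivalent selection in one .loc call (row mask, column label)
result_loc = df.loc[mask, "B"].unique().tolist()
print(result_loc)  # [5, 4, 7]
```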
For comparison, here is the list without unique. It contains every value in column 'B' where column 'A' is 1, duplicates included:
In [26]: df[df['A'] == 1]['B'].tolist()
Out[26]: [5, 4, 4, 7]