pandas Duplicated data Counting and getting unique elements


Example

Number of unique elements in a series:

In [1]: id_numbers = pd.Series([111, 112, 112, 114, 115, 118, 114, 118, 112])
In [2]: id_numbers.nunique()
Out[2]: 5

Get unique elements in a series:

In [3]: id_numbers.unique()
Out[3]: array([111, 112, 114, 115, 118], dtype=int64)

In [4]: df = pd.DataFrame({'Group': list('ABAABABAAB'), 
                           'ID': [1, 1, 2, 3, 3, 2, 1, 2, 1, 3]})

In [5]: df
Out[5]: 
  Group  ID
0     A   1
1     B   1
2     A   2
3     A   3
4     B   3
5     A   2
6     B   1
7     A   2
8     A   1
9     B   3

Number of unique elements in each group:

In [6]: df.groupby('Group')['ID'].nunique()
Out[6]: 
Group
A    3
B    2
Name: ID, dtype: int64

Get of unique elements in each group:

In [7]: df.groupby('Group')['ID'].unique()
Out[7]: 
Group
A    [1, 2, 3]
B       [1, 3]
Name: ID, dtype: object