pandas Grouping Data Grouping numbers


Example

For the following DataFrame:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'Age': np.random.randint(20, 70, 100), 
                   'Sex': np.random.choice(['Male', 'Female'], 100), 
                   'number_of_foo': np.random.randint(1, 20, 100)})
df.head()
# Output: 

#    Age     Sex  number_of_foo
# 0   64  Female             14
# 1   67  Female             14
# 2   20  Female             12
# 3   23    Male             17
# 4   23  Female             15

Group Age into three categories (or bins). Bins can be given as

  • an integer n indicating the number of bins—in this case the dataframe's data is divided into n intervals of equal size
  • a sequence of integers denoting the endpoint of the left-open intervals in which the data is divided into—for instance bins=[19, 40, 65, np.inf] creates three age groups (19, 40], (40, 65], and (65, np.inf].

Pandas assigns automatically the string versions of the intervals as label. It is also possible to define own labels by defining a labels parameter as a list of strings.

pd.cut(df['Age'], bins=4)
# this creates four age groups: (19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]
Name: Age, dtype: category
Categories (4, object): [(19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]]

pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
# this creates three age groups: (19, 40], (40, 65] and (65, infinity)
Name: Age, dtype: category
Categories (3, object): [(19, 40] < (40, 65] < (65, inf]]

Use it in groupby to get the mean number of foo:

age_groups = pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
df.groupby(age_groups)['number_of_foo'].mean()
# Output: 
# Age
# (19, 40]     9.880000
# (40, 65]     9.452381
# (65, inf]    9.250000
# Name: number_of_foo, dtype: float64

Cross tabulate age groups and gender:

pd.crosstab(age_groups, df['Sex'])
# Output: 
# Sex        Female  Male
# Age
# (19, 40]       22    28
# (40, 65]       18    24
# (65, inf]       3     5