pandas Tutorial => Grouping numbers

Example

For the following DataFrame:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'Age': np.random.randint(20, 70, 100), 
                   'Sex': np.random.choice(['Male', 'Female'], 100), 
                   'number_of_foo': np.random.randint(1, 20, 100)})
df.head()
# Output: 

#    Age     Sex  number_of_foo
# 0   64  Female             14
# 1   67  Female             14
# 2   20  Female             12
# 3   23    Male             17
# 4   23  Female             15

Group Age into categories (bins) with pd.cut. Bins can be given as:

- an integer n, the number of bins: the range of the data is divided into n intervals of equal width
- a sequence of bin edges: the intervals are left-open (right-closed); for instance, bins=[19, 40, 65, np.inf] creates the three age groups (19, 40], (40, 65] and (65, inf].

By default, pandas labels each bin with its interval, e.g. (19, 40]. To use your own labels, pass a list of strings, one per bin, as the labels parameter.
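A minimal sketch of the labels parameter. The Series values and the label names 'young', 'middle' and 'senior' are hypothetical, chosen only for illustration; the bin edges are the ones used in this example:

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the edges give the intervals (19, 40], (40, 65], (65, inf]
ages = pd.Series([20, 35, 41, 66, 70])

# One custom (hypothetical) label per interval, in interval order
age_labels = pd.cut(ages, bins=[19, 40, 65, np.inf],
                    labels=['young', 'middle', 'senior'])

print(age_labels.tolist())  # ['young', 'young', 'middle', 'senior', 'senior']
```

The list must contain exactly one label per interval (one fewer than the number of edges); otherwise pd.cut raises a ValueError.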

pd.cut(df['Age'], bins=4)
# this creates four equal-width age groups: (19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]
# Output (last lines):
# Name: Age, dtype: category
# Categories (4, object): [(19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]]
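The left edge 19.951 looks odd. For an integer bins value, pandas spaces the edges evenly between the minimum and the maximum, then lowers the first edge by 0.1% of the range so that the minimum (20 here) falls inside the left-open first interval. A quick check with retbins=True uses a hypothetical two-value Series with the same minimum (20) and maximum (69) as the Age column above:

```python
import numpy as np
import pandas as pd

# Hypothetical Series spanning the same range as df['Age'] (20 to 69)
ages = pd.Series([20, 69])

# retbins=True also returns the computed bin edges
_, edges = pd.cut(ages, bins=4, retbins=True)

print(edges)           # edges approx. 19.951, 32.25, 44.5, 56.75, 69.0
print(np.diff(edges))  # first width approx. 12.299, the others 12.25
```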

pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
# this creates three age groups: (19, 40], (40, 65] and (65, inf]
# Output (last lines):
# Name: Age, dtype: category
# Categories (3, object): [(19, 40] < (40, 65] < (65, inf]]
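The intervals are left-open, so a value equal to the first edge (19) falls outside every bin and becomes NaN. Passing include_lowest=True makes the first interval include it. A minimal check on hypothetical values:

```python
import numpy as np
import pandas as pd

values = pd.Series([19, 40, 41, 70])  # hypothetical values on and near the edges
edges = [19, 40, 65, np.inf]

default = pd.cut(values, bins=edges)
print(default.isna().tolist())  # [True, False, False, False]: 19 is not in (19, 40]

lowest = pd.cut(values, bins=edges, include_lowest=True)
print(lowest.isna().tolist())   # [False, False, False, False]
```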

Use the age groups in groupby to get the mean number_of_foo per group:

age_groups = pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
df.groupby(age_groups)['number_of_foo'].mean()
# Output: 
# Age
# (19, 40]     9.880000
# (40, 65]     9.452381
# (65, inf]    9.250000
# Name: number_of_foo, dtype: float64
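The same pattern works on a small, made-up DataFrame, where the per-group results can be checked by hand. agg(['mean', 'count']) adds the group size next to the mean. observed=False keeps every bin, even empty ones, and is the long-standing default. Passing it explicitly avoids the deprecation warning that recent pandas versions emit when you group by a categorical without it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two people aged 20-40, one aged 41-65, one over 65
small = pd.DataFrame({'Age': [25, 35, 50, 70],
                      'number_of_foo': [4, 6, 10, 3]})

groups = pd.cut(small['Age'], bins=[19, 40, 65, np.inf])

# Mean and group size of number_of_foo per age interval
summary = (small.groupby(groups, observed=False)['number_of_foo']
                .agg(['mean', 'count']))
print(summary)
```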

Cross-tabulate age groups and sex:

pd.crosstab(age_groups, df['Sex'])
# Output: 
# Sex        Female  Male
# Age
# (19, 40]       22    28
# (40, 65]       18    24
# (65, inf]       3     5
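pd.crosstab can also turn the counts into shares. With normalize='index', each row is divided by its total, which gives the proportion of each sex within an age group. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one or two people per age group
small = pd.DataFrame({'Age': [25, 35, 50, 70],
                      'Sex': ['Male', 'Female', 'Female', 'Male']})

groups = pd.cut(small['Age'], bins=[19, 40, 65, np.inf])

# Each row sums to 1: share of Female/Male within the age group
shares = pd.crosstab(groups, small['Sex'], normalize='index')
print(shares)
```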
