pandas Reshaping and pivoting Cross Tabulation

Help us to keep this website almost Ad Free! It takes only 10 seconds of your time:
> Step 1: Go view our video on YouTube: EF Core Bulk Insert
> Step 2: And Like the video. BONUS: You can also share it!

Example

import pandas as pd
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'F'], 
               'Age': [20, 19, 17, 35, 22, 22, 12, 15, 17, 22],
               'Heart Disease': ['Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'N', 'Y']})

df

  Age Heart Disease Sex
0   20             Y   M
1   19             N   M
2   17             Y   F
3   35             N   M
4   22             N   F
5   22             Y   F
6   12             N   M
7   15             Y   M
8   17             N   F
9   22             Y   F

pd.crosstab(df['Sex'], df['Heart Disease'])

Hearth Disease  N  Y
Sex                 
F               2  3
M               3  2

Using dot notation:

pd.crosstab(df.Sex, df.Age)

Age  12  15  17  19  20  22  35
Sex                            
F     0   0   2   0   0   3   0
M     1   1   0   1   1   0   1

Getting transpose of DF:

pd.crosstab(df.Sex, df.Age).T

Sex  F  M
Age      
12   0  1
15   0  1
17   2  0
19   0  1
20   0  1
22   3  0
35   0  1

Getting margins or cumulatives:

pd.crosstab(df['Sex'], df['Heart Disease'], margins=True)

Heart Disease  N  Y  All
Sex                     
F              2  3    5
M              3  2    5
All            5  5   10

Getting transpose of cumulative:

pd.crosstab(df['Sex'], df['Age'], margins=True).T


Sex  F  M  All
Age           
12   0  1    1
15   0  1    1
17   2  0    2
19   0  1    1
20   0  1    1
22   3  0    3
35   0  1    1
All  5  5   10

Getting percentages :

pd.crosstab(df["Sex"],df['Heart Disease']).apply(lambda r: r/len(df), axis=1)

Heart Disease    N    Y
Sex                    
F              0.2  0.3
M              0.3  0.2

Getting cumulative and multiplying by 100:

df2 = pd.crosstab(df["Age"],df['Sex'], margins=True ).apply(lambda r: r/len(df)*100, axis=1)

df2

Sex     F     M    All
Age                   
12    0.0  10.0   10.0
15    0.0  10.0   10.0
17   20.0   0.0   20.0
19    0.0  10.0   10.0
20    0.0  10.0   10.0
22   30.0   0.0   30.0
35    0.0  10.0   10.0
All  50.0  50.0  100.0

Removing a column from DF (one way):

df2[["F","M"]]

Sex     F     M
Age            
12    0.0  10.0
15    0.0  10.0
17   20.0   0.0
19    0.0  10.0
20    0.0  10.0
22   30.0   0.0
35    0.0  10.0
All  50.0  50.0


Got any pandas Question?