pandas Cross Tabulation


Example

import pandas as pd
df = pd.DataFrame({'Sex': ['M', 'M', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'F'], 
               'Age': [20, 19, 17, 35, 22, 22, 12, 15, 17, 22],
               'Heart Disease': ['Y', 'N', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'N', 'Y']})

df

  Age Heart Disease Sex
0   20             Y   M
1   19             N   M
2   17             Y   F
3   35             N   M
4   22             N   F
5   22             Y   F
6   12             N   M
7   15             Y   M
8   17             N   F
9   22             Y   F

pd.crosstab(df['Sex'], df['Heart Disease'])

Hearth Disease  N  Y
Sex                 
F               2  3
M               3  2

Using dot notation:

pd.crosstab(df.Sex, df.Age)

Age  12  15  17  19  20  22  35
Sex                            
F     0   0   2   0   0   3   0
M     1   1   0   1   1   0   1

Getting transpose of DF:

pd.crosstab(df.Sex, df.Age).T

Sex  F  M
Age      
12   0  1
15   0  1
17   2  0
19   0  1
20   0  1
22   3  0
35   0  1

Getting margins or cumulatives:

pd.crosstab(df['Sex'], df['Heart Disease'], margins=True)

Heart Disease  N  Y  All
Sex                     
F              2  3    5
M              3  2    5
All            5  5   10

Getting transpose of cumulative:

pd.crosstab(df['Sex'], df['Age'], margins=True).T


Sex  F  M  All
Age           
12   0  1    1
15   0  1    1
17   2  0    2
19   0  1    1
20   0  1    1
22   3  0    3
35   0  1    1
All  5  5   10

Getting percentages :

pd.crosstab(df["Sex"],df['Heart Disease']).apply(lambda r: r/len(df), axis=1)

Heart Disease    N    Y
Sex                    
F              0.2  0.3
M              0.3  0.2

Getting cumulative and multiplying by 100:

df2 = pd.crosstab(df["Age"],df['Sex'], margins=True ).apply(lambda r: r/len(df)*100, axis=1)

df2

Sex     F     M    All
Age                   
12    0.0  10.0   10.0
15    0.0  10.0   10.0
17   20.0   0.0   20.0
19    0.0  10.0   10.0
20    0.0  10.0   10.0
22   30.0   0.0   30.0
35    0.0  10.0   10.0
All  50.0  50.0  100.0

Removing a column from DF (one way):

df2[["F","M"]]

Sex     F     M
Age            
12    0.0  10.0
15    0.0  10.0
17   20.0   0.0
19    0.0  10.0
20    0.0  10.0
22   30.0   0.0
35    0.0  10.0
All  50.0  50.0