pandas Grouping Data using transform to get group-level statistics while preserving the original dataframe


Example

example:

df = pd.DataFrame({'group1' :  ['A', 'A', 'A', 'A',
                               'B', 'B', 'B', 'B'],
                   'group2' :  ['C', 'C', 'C', 'D',
                               'E', 'E', 'F', 'F'],
                   'B'      :  ['one', np.NaN, np.NaN, np.NaN,
                                np.NaN, 'two', np.NaN, np.NaN],
                   'C'      :  [np.NaN, 1, np.NaN, np.NaN,
                               np.NaN, np.NaN, np.NaN, 4]})           

 df
Out[34]: 
     B    C group1 group2
0  one  NaN      A      C
1  NaN  1.0      A      C
2  NaN  NaN      A      C
3  NaN  NaN      A      D
4  NaN  NaN      B      E
5  two  NaN      B      E
6  NaN  NaN      B      F
7  NaN  4.0      B      F

I want to get the count of non-missing observations of B for each combination of group1 and group2. groupby.transform is a very powerful function that does exactly that.

df['count_B']=df.groupby(['group1','group2']).B.transform('count')                        

df
Out[36]: 
     B    C group1 group2  count_B
0  one  NaN      A      C        1
1  NaN  1.0      A      C        1
2  NaN  NaN      A      C        1
3  NaN  NaN      A      D        0
4  NaN  NaN      B      E        1
5  two  NaN      B      E        1
6  NaN  NaN      B      F        0
7  NaN  4.0      B      F        0