pandas Getting started with pandas Descriptive statistics


Example

Descriptive statistics (mean, standard deviation, number of observations, minimum, maximum, and quartiles) of numerical columns can be calculated using the .describe() method, which returns a pandas dataframe of descriptive statistics.

In [1]: df = pd.DataFrame({'A': [1, 2, 1, 4, 3, 5, 2, 3, 4, 1], 
                           'B': [12, 14, 11, 16, 18, 18, 22, 13, 21, 17], 
                           'C': ['a', 'a', 'b', 'a', 'b', 'c', 'b', 'a', 'b', 'a']})

In [2]: df
Out[2]: 
   A   B  C
0  1  12  a
1  2  14  a
2  1  11  b
3  4  16  a
4  3  18  b
5  5  18  c
6  2  22  b
7  3  13  a
8  4  21  b
9  1  17  a

In [3]: df.describe()
Out[3]:
               A          B
count  10.000000  10.000000
mean    2.600000  16.200000
std     1.429841   3.705851
min     1.000000  11.000000
25%     1.250000  13.250000
50%     2.500000  16.500000
75%     3.750000  18.000000
max     5.000000  22.000000

Note that since C is not a numerical column, it is excluded from the output.

In [4]: df['C'].describe()
Out[4]:
count     10
unique     3
freq       5
Name: C, dtype: object

In this case the method summarizes categorical data by number of observations, number of unique elements, mode, and frequency of the mode.