pandas Tutorial => Descriptive statistics

Example

Descriptive statistics (mean, standard deviation, number of non-null observations, minimum, maximum, and quartiles) of numerical columns can be calculated with the .describe() method, which returns a pandas DataFrame with one row per statistic.

In [1]: df = pd.DataFrame({'A': [1, 2, 1, 4, 3, 5, 2, 3, 4, 1], 
                           'B': [12, 14, 11, 16, 18, 18, 22, 13, 21, 17], 
                           'C': ['a', 'a', 'b', 'a', 'b', 'c', 'b', 'a', 'b', 'a']})

In [2]: df
Out[2]: 
   A   B  C
0  1  12  a
1  2  14  a
2  1  11  b
3  4  16  a
4  3  18  b
5  5  18  c
6  2  22  b
7  3  13  a
8  4  21  b
9  1  17  a

In [3]: df.describe()
Out[3]:
               A          B
count  10.000000  10.000000
mean    2.600000  16.200000
std     1.429841   3.705851
min     1.000000  11.000000
25%     1.250000  13.250000
50%     2.500000  16.500000
75%     3.750000  18.000000
max     5.000000  22.000000

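Each row of the table above corresponds to an ordinary reduction method. The sketch below, assuming the same df as in the session, rebuilds the table by calling those methods one at a time. It shows that std is the sample standard deviation (ddof=1) and that the quartiles come from linear interpolation, which are the pandas defaults. The names num and manual are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 4, 3, 5, 2, 3, 4, 1],
                   'B': [12, 14, 11, 16, 18, 18, 22, 13, 21, 17],
                   'C': ['a', 'a', 'b', 'a', 'b', 'c', 'b', 'a', 'b', 'a']})

num = df[['A', 'B']]  # the numeric columns that describe() summarizes

# One row per statistic, in the same order as describe() uses.
manual = pd.DataFrame(
    [num.count(),          # non-null observations
     num.mean(),
     num.std(),            # sample standard deviation (ddof=1 by default)
     num.min(),
     num.quantile(0.25),   # linear interpolation between data points by default
     num.median(),         # the 50% quantile
     num.quantile(0.75),
     num.max()],
    index=['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'],
)

print(manual)
# Largest absolute difference from the built-in summary (expected to be ~0)
print((manual - df.describe()).abs().max().max())
```

For A, the lower quartile lies a quarter of the way between the 3rd and 4th sorted values (1 and 2), which is why describe() reports 1.25 rather than a value that occurs in the data.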
Note that by default .describe() summarizes only numeric columns, so C, which holds strings, is excluded from the output above.
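To summarize every column in one table, .describe() accepts include='all'. A minimal sketch, assuming the same df as above; statistics that do not apply to a column (for example the mean of a string column) appear as NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 4, 3, 5, 2, 3, 4, 1],
                   'B': [12, 14, 11, 16, 18, 18, 22, 13, 21, 17],
                   'C': ['a', 'a', 'b', 'a', 'b', 'c', 'b', 'a', 'b', 'a']})

# include='all' summarizes numeric and non-numeric columns together;
# cells whose statistic does not apply to that column's dtype are NaN.
summary_all = df.describe(include='all')
print(summary_all)
```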

In [4]: df['C'].describe()
Out[4]:
count     10
unique     3
top        a
freq       5
Name: C, dtype: object

In this case the method summarizes categorical data by the number of non-null observations (count), the number of unique elements (unique), the mode, i.e. the most frequent value (top), and the frequency of the mode (freq).
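The same four figures can be computed directly, which can help when only one of them is needed. This is a sketch assuming the values of column C from the example; the variable names are illustrative.

```python
import pandas as pd

c = pd.Series(['a', 'a', 'b', 'a', 'b', 'c', 'b', 'a', 'b', 'a'], name='C')

counts = c.value_counts()   # frequency of each value, most frequent first
count = c.count()           # number of non-null observations
unique = c.nunique()        # number of distinct values
top = counts.index[0]       # most frequent value (the mode)
freq = counts.iloc[0]       # how often the mode occurs

print(count, unique, top, freq)
```

Here 'a' occurs 5 times, 'b' 4 times and 'c' once. If several values tie for the highest count, the pandas documentation states that the top and freq results of .describe() are chosen arbitrarily from among the tied values, so they should not be relied on in that case.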
