pandas MultiIndex Setting and sorting a MultiIndex


Example

This example shows how to use column data to set a MultiIndex in a pandas.DataFrame.

In [1]: df = pd.DataFrame([['one', 'A', 100], ['two', 'A', 101], ['three', 'A', 102],
   ...:                    ['one', 'B', 103], ['two', 'B', 104], ['three', 'B', 105]],
   ...:                   columns=['c1', 'c2', 'c3'])


In [2]: df
Out[2]: 
      c1 c2   c3
0    one  A  100
1    two  A  101
2  three  A  102
3    one  B  103
4    two  B  104
5  three  B  105



In [3]: df.set_index(['c1', 'c2'])
Out[3]: 
           c3
c1    c2     
one   A   100
two   A   101
three A   102
one   B   103
two   B   104
three B   105

You can sort the index right after you set it:

In [4]: df.set_index(['c1', 'c2']).sort_index()
Out[4]: 
           c3
c1    c2     
one   A   100
      B   103
three A   102
      B   105
two   A   101
      B   104

Having a sorted index, will result in slightly more efficient lookups on the first level:

In [5]: df_01 = df.set_index(['c1', 'c2'])

In [6]: %timeit df_01.loc['one']
1000 loops, best of 3: 607 µs per loop


In [7]: df_02 = df.set_index(['c1', 'c2']).sort_index()

In [8]: %timeit df_02.loc['one']
1000 loops, best of 3: 413 µs per loop

After the index has been set, you can perform lookups for specific records or groups of records:

In [9]: df_indexed = df.set_index(['c1', 'c2']).sort_index()

In [10]: df_indexed.loc['one']
Out[10]: 
     c3
c2     
A   100
B   103


In [11]: df_indexed.loc['one', 'A']
Out[11]: 
c3    100
Name: (one, A), dtype: int64


In [12]: df_indexed.xs((slice(None), 'A'))
Out[12]: 
        c3
c1        
one    100
three  102
two    101