Tutorial by Examples | RIP Tutorial

Select column by label

    # Create a sample DF
    df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))

    # Show DF
    df

              A         B         C
    0 -0.467542  0.469146 -0.861848
    1 -0.823205 -0.167087 -0.759942
    2 -1.508202  1.361894 -0.166701
    3  0.394143 -0.287349 -0.978102
    4 -0.160431  1.054736 -0.785250
    ...
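
The snippet above is cut off before the selection itself. As a minimal sketch of selecting columns by label, assuming a frame shaped like the one above (the seeded generator and the variable names are illustrative, not from the original):

```python
import numpy as np
import pandas as pd

# Illustrative data: 5 rows, 3 columns labelled A, B, C (seed chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=list("ABC"))

col_a = df["A"]            # a single label returns a Series
cols_ac = df[["A", "C"]]   # a list of labels returns a DataFrame
col_b = df.loc[:, "B"]     # the same selection expressed through .loc
```

A single label gives a Series, while a list of labels always gives a DataFrame, even when the list has one element.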

pandas • Indexing and selecting data

Select by position

The iloc (short for integer location) indexer selects the rows of a dataframe by their integer position, so dataframes can be sliced just like Python lists.

    df = pd.DataFrame([[11, 22], [33, 44], [55, 66]], index=list("abc"))
    df
    # Out:
    # 0...
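
A short sketch of position-based selection on the same three-row frame; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[11, 22], [33, 44], [55, 66]], index=list("abc"))

first_row = df.iloc[0]        # first row as a Series
first_two = df.iloc[0:2]      # rows at positions 0 and 1; the stop is excluded
last_cell = df.iloc[-1, 1]    # last row, second column
picked = df.iloc[[0, 2], 0]   # rows at positions 0 and 2 of the first column
```

As with list slicing, the stop position is excluded, so `df.iloc[0:2]` returns rows `a` and `b` only.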

Slicing with labels

When using labels, both the start and the stop are included in the results.

    import pandas as pd
    import numpy as np
    np.random.seed(5)
    df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns=list("ABCDE"),
                      index=["R" + str(i) for i in range(5)])
    ...
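
To make the inclusive stop concrete, here is a minimal sketch that uses deterministic values (`np.arange`) instead of the random data above, so the shapes are easy to check:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame: values 0..24, same labels.
df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("ABCDE"),
                  index=["R" + str(i) for i in range(5)])

by_label = df.loc["R1":"R3", "B":"D"]   # label slice: both ends included
by_position = df.iloc[1:3, 1:3]         # position slice: stop excluded
```

The label slice covers three rows and three columns, while the position slice with the same start and stop numbers covers only two of each.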

Mixed position and label based selection

DataFrame:

    import pandas as pd
    import numpy as np
    np.random.seed(5)
    df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns=list("ABCDE"),
                      index=["R" + str(i) for i in range(5)])
    df

    Out[12]:
         A   B   C   D   E
    R0  99  78  61  16  73...
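
The snippet is truncated before the selection, so the accessor it used is not visible here. The older `.ix` accessor was removed in pandas 1.0; one way to mix positions and labels with current pandas is to convert one kind of key into the other first. A sketch, using deterministic values instead of the random frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("ABCDE"),
                  index=["R" + str(i) for i in range(5)])

# Rows by position, columns by label: turn positions into labels, then use .loc.
via_loc = df.loc[df.index[[0, 2]], ["A", "C"]]

# The same selection: turn column labels into positions, then use .iloc.
via_iloc = df.iloc[[0, 2], df.columns.get_indexer(["A", "C"])]
```

Both expressions select rows `R0` and `R2` of columns `A` and `C`.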

Boolean indexing

One can select rows and columns of a dataframe using boolean arrays.

    import pandas as pd
    import numpy as np
    np.random.seed(5)
    df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns=list("ABCDE"),
                      index=["R" + str(i) for i in range(5)])
    print (...
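
A minimal sketch of boolean selection on rows and on columns, again with deterministic values so the result can be checked by hand (the conditions are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5), columns=list("ABCDE"),
                  index=["R" + str(i) for i in range(5)])

# Rows where column A is greater than 5.
rows = df[df["A"] > 5]

# Combine conditions with & and |; each condition needs its own parentheses.
rows_both = df[(df["A"] > 5) & (df["B"] < 20)]

# Columns whose value in row R0 is even, selected through .loc.
even_cols = df.loc[:, df.loc["R0"] % 2 == 0]
```

Python's `and` and `or` do not work element-wise on a Series, which is why the bitwise operators are used here.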

Filtering columns (selecting "interesting", dropping unneeded, using RegEx, etc.)

Generate a sample DF:

    In [39]: df = pd.DataFrame(np.random.randint(0, 10, size=(5, 6)), columns=['a10','a20','a25','b','c','d'])

    In [40]: df
    Out[40]:
       a10  a20  a25  b  c  d
    0    2    3    7  5  4  7
    1    3    1    5  7  2  6
    2    7    4    9  0  8  7
    3    5    8    8  9  6  8
    4    8    1 ...
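
The filtering steps themselves are cut off above. A sketch of common ways to keep or drop columns by name, using the same column labels with placeholder values:

```python
import numpy as np
import pandas as pd

# Same column labels as the sample DF; the values are placeholders.
df = pd.DataFrame(np.arange(30).reshape(5, 6),
                  columns=["a10", "a20", "a25", "b", "c", "d"])

keep_listed = df[["a10", "b"]]                # keep an explicit list of columns
keep_like = df.filter(like="a2")              # names containing the substring "a2"
keep_regex = df.filter(regex=r"^a\d+$")       # names matching a regular expression
without_bc = df.drop(columns=["b", "c"])      # drop columns that are not needed
```

`DataFrame.filter` matches on labels only and never looks at the cell values.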

Filtering / selecting rows using `.query()` method

    import numpy as np
    import pandas as pd

Generate a random DF:

    df = pd.DataFrame(np.random.randint(0,10,size=(10, 3)), columns=list('ABC'))

    In [16]: print(df)
       A  B  C
    0  4  1  4
    1  0  2  0
    2  7  8  8
    3  2  1  9
    4  7  3  8
    5  4  0  7
    6  1  5  5
    7  6  7  8
    8  6  7  3
    9  6  4  5

select rows where value...
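
The query examples are truncated above. As a hedged sketch, the frame below reproduces the printed values explicitly, and the two conditions are illustrative rather than the original ones:

```python
import pandas as pd

# The values printed above, entered explicitly so the result is reproducible.
df = pd.DataFrame(
    [[4, 1, 4], [0, 2, 0], [7, 8, 8], [2, 1, 9], [7, 3, 8],
     [4, 0, 7], [1, 5, 5], [6, 7, 8], [6, 7, 3], [6, 4, 5]],
    columns=list("ABC"),
)

# Compare two columns inside the query string.
a_gt_b = df.query("A > B")

# Refer to a local Python variable with the @ prefix.
threshold = 5
filtered = df.query("A > @threshold and C == 8")

# The equivalent boolean-mask form of the second query.
same = df[(df["A"] > threshold) & (df["C"] == 8)]
```

`.query()` tends to read more cleanly than chained boolean masks when several conditions are combined.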

Path Dependent Slicing

It may become necessary to traverse the elements of a series or the rows of a dataframe such that the next element or row depends on the previously selected one. This is called path dependency. Consider the following time series s with irregular frequency.

    #starting pytho...
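
The original example is cut off, so the following is only a sketch of the idea under an assumed rule: keep a timestamp only if it is at least a given gap after the previously kept one. The helper `path_dependent_index`, the 10-second gap, and the sample timestamps are all hypothetical.

```python
import pandas as pd

def path_dependent_index(index, min_gap):
    """Yield each timestamp that is at least `min_gap` after the last one yielded."""
    last = None
    for ts in index:
        if last is None or ts - last >= min_gap:
            last = ts   # the next decision depends on this choice
            yield ts

# Hypothetical irregular time series.
idx = pd.to_datetime([
    "2020-01-01 00:00:00", "2020-01-01 00:00:04", "2020-01-01 00:00:11",
    "2020-01-01 00:00:15", "2020-01-01 00:00:22", "2020-01-01 00:00:40",
])
s = pd.Series(range(len(idx)), index=idx)

kept = list(path_dependent_index(s.index, pd.Timedelta(seconds=10)))
selected = s.loc[kept]
```

A generator suits this case because each decision needs the previously selected element, which a single vectorized comparison cannot express directly.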

Get the first/last n rows of a dataframe

To view the first or last few records of a dataframe, use the head and tail methods.

To return the first n rows, use DataFrame.head([n]):

    df.head(n)

To return the last n rows, use DataFrame.tail([n]):

    df.tail(n)

Without the argument n, these methods return 5 rows. Note that the slice ...
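
A small runnable sketch of both methods on an illustrative ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

first_three = df.head(3)     # rows at positions 0, 1, 2
last_two = df.tail(2)        # rows at positions 8, 9
default_head = df.head()     # n defaults to 5
same_as_head = df.iloc[:3]   # position slice giving the same rows as head(3)
```

Both methods return a new DataFrame and leave `df` unchanged.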

Select distinct rows across dataframe

Let

    df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6]})
    df
    # Output:
    #   col_1  col_2
    # 0     A      3
    # 1     B      4
    # 2     A      3
    # 3     B      5
    # 4     C      6

To get the distinct values in col_1 you can use Series.unique()

    df['col_1'].unique()
    # Output: ...
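
Beyond distinct values in one column, distinct rows can be selected with `DataFrame.drop_duplicates`. A sketch on the same frame; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"col_1": ["A", "B", "A", "B", "C"],
                   "col_2": [3, 4, 3, 5, 6]})

distinct_values = list(df["col_1"].unique())    # distinct values of one column
distinct_rows = df.drop_duplicates()            # rows distinct across all columns
first_per_key = df.drop_duplicates(subset="col_1")               # first row per col_1 value
last_per_key = df.drop_duplicates(subset="col_1", keep="last")   # last row per col_1 value
```

Row 2 repeats row 0 exactly, so it is the only row removed when all columns are compared.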

Filter out rows with missing data (NaN, None, NaT)

If you have a dataframe with missing data (NaN, pd.NaT, None) you can filter out incomplete rows

    df = pd.DataFrame([[0,1,2,3],
                       [None,5,None,pd.NaT],
                       [8,None,10,None],
                       [11,12,13,pd.NaT]], columns=list('ABCD'))
    df
    # Output:
    # A B ...
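
The filtering calls are truncated above. A minimal sketch with `DataFrame.dropna` and `Series.notna`, using a separate illustrative frame (not the one above) whose datetime column makes the NaT handling explicit:

```python
import numpy as np
import pandas as pd

# Illustrative frame: NaN in the numeric columns, NaT in the datetime column.
df = pd.DataFrame({
    "A": [0, np.nan, 8, 11],
    "B": [1, 5, np.nan, 12],
    "C": [2, np.nan, 10, 13],
    "D": pd.to_datetime(["2020-01-01", None, None, "2020-01-04"]),
})

complete = df.dropna()                      # drop rows with any missing value
ac_present = df.dropna(subset=["A", "C"])   # only require A and C to be present
not_empty = df.dropna(how="all")            # drop rows where every value is missing
b_present = df[df["B"].notna()]             # boolean mask on a single column
```

`dropna` returns a new DataFrame by default; the original frame keeps its incomplete rows.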
