The iloc (short for integer location) method allows to select the rows of a dataframe based on their position index. This way one can slice dataframes just like one does with Python's list slicing.
df = pd.DataFrame([[11, 22], [33, 44], [55, 66]], index=list("abc"))
df
# Out:
# 0...
When using labels, both the start and the stop are included in the results.
import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns = list("ABCDE"),
index = ["R" + str(i) for i in range(5)])
...
DataFrame:
import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns = list("ABCDE"),
index = ["R" + str(i) for i in range(5)])
df
Out[12]:
A B C D E
R0 99 78 61 16 73...
One can select rows and columns of a dataframe using boolean arrays.
import pandas as pd
import numpy as np
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(5, 5)), columns = list("ABCDE"),
index = ["R" + str(i) for i in range(5)])
print (...
It may become necessary to traverse the elements of a series or the rows of a dataframe in a way that the next element or next row is dependent on the previously selected element or row. This is called path dependency.
Consider the following time series s with irregular frequency.
#starting pytho...
To view the first or last few records of a dataframe, you can use the methods head and tail
To return the first n rows use DataFrame.head([n])
df.head(n)
To return the last n rows use DataFrame.tail([n])
df.tail(n)
Without the argument n, these functions return 5 rows.
Note that the slice ...
Let
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6]})
df
# Output:
# col_1 col_2
# 0 A 3
# 1 B 4
# 2 A 3
# 3 B 5
# 4 C 6
To get the distinct values in col_1 you can use Series.unique()
df['col_1'].unique()
# Output:
...
If you have a dataframe with missing data (NaN, pd.NaT, None) you can filter out incomplete rows
df = pd.DataFrame([[0,1,2,3],
[None,5,None,pd.NaT],
[8,None,10,None],
[11,12,13,pd.NaT]],columns=list('ABCD'))
df
# Output:
# A B ...