Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6]})
df
# Output:
# col_1 col_2
# 0 A 3
# 1 B 4
# 2 A 3
# 3 B 5
# 4 C 6
To get the distinct values in col_1, you can use Series.unique():
df['col_1'].unique()
# Output:
# array(['A', 'B', 'C'], dtype=object)
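If you need the number of distinct values or how often each one occurs, rather than the values themselves, Series.nunique() and Series.value_counts() are closely related. A minimal sketch on the same data (the variable names n_distinct and counts are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['A', 'B', 'A', 'B', 'C'], 'col_2': [3, 4, 3, 5, 6]})

# Number of distinct values in col_1
n_distinct = df['col_1'].nunique()

# Number of occurrences of each distinct value in col_1
counts = df['col_1'].value_counts()

print(n_distinct)
print(counts.to_dict())
```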
But Series.unique() works only on a single column.
To mimic SQL's SELECT DISTINCT col_1, col_2, you can use DataFrame.drop_duplicates():
df.drop_duplicates()
# Output:
# col_1 col_2
# 0 A 3
# 1 B 4
# 3 B 5
# 4 C 6
This returns all the unique rows in the DataFrame, comparing every column. So if a third column makes every row distinct, no rows are dropped:
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})
df
# Output:
# col_1 col_2 col_3
# 0 A 3 0.0
# 1 B 4 0.1
# 2 A 3 0.2
# 3 B 5 0.3
# 4 C 6 0.4
df.drop_duplicates()
# Output:
# col_1 col_2 col_3
# 0 A 3 0.0
# 1 B 4 0.1
# 2 A 3 0.2
# 3 B 5 0.3
# 4 C 6 0.4
To specify which columns to consider when identifying duplicates, pass them as a list (the subset parameter):
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})
df.drop_duplicates(['col_1','col_2'])
# Output:
# col_1 col_2 col_3
# 0 A 3 0.0
# 1 B 4 0.1
# 3 B 5 0.3
# 4 C 6 0.4
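Which of the duplicated rows survives is controlled by the keep parameter of drop_duplicates(). The sketch below compares its three settings on the same data; the names keys, first, last and none_kept are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['A', 'B', 'A', 'B', 'C'],
                   'col_2': [3, 4, 3, 5, 6],
                   'col_3': [0, 0.1, 0.2, 0.3, 0.4]})
keys = ['col_1', 'col_2']

# keep='first' (the default): keep the first row of each duplicate group
first = df.drop_duplicates(subset=keys)

# keep='last': keep the last row of each duplicate group
last = df.drop_duplicates(subset=keys, keep='last')

# keep=False: drop every row that has a duplicate
none_kept = df.drop_duplicates(subset=keys, keep=False)

print(list(first.index))      # [0, 1, 3, 4]
print(list(last.index))       # [1, 2, 3, 4]
print(list(none_kept.index))  # [1, 3, 4]
```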
# Skip the last column
df.drop_duplicates(['col_1','col_2'])[['col_1','col_2']]
# Output:
# col_1 col_2
# 0 A 3
# 1 B 4
# 3 B 5
# 4 C 6
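An alternative that reads closer to the SQL is to select the columns first and then drop duplicates. The result keeps the original row labels (0, 1, 3, 4), so the sketch below also calls reset_index(drop=True) to renumber them; whether you want that depends on your use case, and the name distinct is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['A', 'B', 'A', 'B', 'C'],
                   'col_2': [3, 4, 3, 5, 6],
                   'col_3': [0, 0.1, 0.2, 0.3, 0.4]})

# Select the columns of interest, drop duplicate rows, then renumber the index
distinct = df[['col_1', 'col_2']].drop_duplicates().reset_index(drop=True)

print(distinct)
```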
Source: How to “select distinct” across multiple data frame columns in pandas?