pandas Masking data based on column value


Example

This will be our example data frame:

  color      name   size
0   red      rose    big
1  blue    violet  small
2   red     tulip  small
3  blue  harebell  small
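
The original does not show how the frame is built. A minimal sketch, assuming pandas is imported as pd and the frame is bound to the name df used in the rest of this example:

```python
import pandas as pd

# Build the example frame; a default integer index 0..3 is created automatically.
df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],
    'name': ['rose', 'violet', 'tulip', 'harebell'],
    'size': ['big', 'small', 'small', 'small'],
})

print(df)
```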

When accessing a single column of a data frame, we can use the comparison operator == to compare every element in the column to a given value, producing a pd.Series of True and False values:

df['size'] == 'small'
0    False
1     True
2     True
3     True
Name: size, dtype: bool

This boolean pd.Series is backed by a NumPy array and shares its index with the data frame. Thus we can hand it to the __getitem__ or [] accessor, which keeps only the rows where the mask is True, as in the example below.

size_small_mask = df['size'] == 'small'
df[size_small_mask]
  color      name   size
1  blue    violet  small
2   red     tulip  small
3  blue  harebell  small
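
Because masks are ordinary boolean Series, they can also be combined or inverted before being used for selection. The sketch below rebuilds the example frame so that it runs on its own. The names color_blue_mask, small_and_blue, not_small and small_names are illustrative and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],
    'name': ['rose', 'violet', 'tulip', 'harebell'],
    'size': ['big', 'small', 'small', 'small'],
})

size_small_mask = df['size'] == 'small'
color_blue_mask = df['color'] == 'blue'   # hypothetical second mask

# & is the element-wise "and"; use it instead of the Python keyword `and`.
small_and_blue = df[size_small_mask & color_blue_mask]

# ~ inverts a mask, selecting the rows where it is False.
not_small = df[~size_small_mask]

# .loc accepts the same mask and can pick columns at the same time.
small_names = df.loc[size_small_mask, 'name']

print(small_and_blue)
print(not_small)
print(small_names)
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, for example df[(df['size'] == 'small') & (df['color'] == 'blue')], because & binds more tightly than ==.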