pandas Data Types


Remarks

dtypes are not native to pandas. They are a result of pandas' close architectural coupling to numpy.
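The numpy coupling is easy to see on a plain numeric column. The sketch below assumes a simple float column. Extension dtypes such as the nullable `Int64` are not numpy dtypes, and the last lines show that contrast.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# For a plain float column, the dtype is numpy's own dtype object...
print(isinstance(s.dtype, np.dtype))  # True
# ...and it is the same dtype that the backing numpy array reports.
print(s.dtype == s.to_numpy().dtype)  # True
print(s.dtype)                        # float64

# pandas extension dtypes (here the nullable integer type) are not numpy dtypes.
nullable = pd.Series([1, None], dtype="Int64")
print(isinstance(nullable.dtype, np.dtype))  # False
```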

The dtype of a column does not have to match the Python type of the objects contained in the column.
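A minimal illustration: a single object-dtype column can hold values of unrelated Python types, and the dtype reports none of them.

```python
import pandas as pd

# One object-dtype column holding an int, a str and a float.
mixed = pd.Series([1, "a", 2.5])
print(mixed.dtype)  # object

# The per-element Python types are not reflected in the dtype.
print([type(v).__name__ for v in mixed])  # ['int', 'str', 'float']
```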

Here we have a pd.Series of floats. Its dtype will be float64.
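Even before any casting, the dtype and the element type are distinct things. In this sketch the column is `float64`, while indexing returns a numpy scalar (`numpy.float64`), which happens to subclass Python's `float`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1., 2., 3., 4., 5.])
print(s.dtype)                  # float64

# A single element comes back as a numpy scalar, not the dtype itself.
print(type(s[0]))               # <class 'numpy.float64'>
print(isinstance(s[0], float))  # True: numpy.float64 subclasses float
```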

Then we use astype to "cast" it to object.

import pandas as pd
pd.Series([1.,2.,3.,4.,5.]).astype(object)
0    1
1    2
2    3
3    4
4    5
dtype: object

The dtype is now object, but the values in the Series are still floats. This makes sense once you recall that in Python everything is an object, so any value can be upcast to object.

type(pd.Series([1.,2.,3.,4.,5.]).astype(object)[0])
float
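Because `astype(object)` only relabels the container and leaves the floats untouched, pandas can recover a numeric dtype by re-examining the values. A small sketch using `Series.infer_objects()`:

```python
import pandas as pd

as_object = pd.Series([1., 2., 3., 4., 5.]).astype(object)
print(as_object.dtype)  # object

# The stored values are still Python floats, so a float dtype can be inferred again.
restored = as_object.infer_objects()
print(restored.dtype)   # float64
```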

Here we try "casting" the floats to strings.

pd.Series([1.,2.,3.,4.,5.]).astype(str)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: object

The dtype is now object, but the entries are of type str. pandas does not use numpy's fixed-width string dtypes for such columns, so it stores the Python strings in an object array, and the dtype records nothing more than "some Python object".

type(pd.Series([1.,2.,3.,4.,5.]).astype(str)[0])
str
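For comparison, the same cast done in numpy alone produces a fixed-width Unicode dtype rather than object. The pandas line is included only for contrast and has no expected output, since the reported dtype may differ between pandas versions (newer releases may use a dedicated string dtype).

```python
import numpy as np
import pandas as pd

arr = np.array([1., 2., 3., 4., 5.]).astype(str)
# numpy uses a fixed-width Unicode dtype (kind 'U'), not object.
print(arr.dtype.kind)  # U
print(list(arr))

# For contrast: what pandas reports for the same cast in the installed version.
print(pd.Series([1., 2., 3., 4., 5.]).astype(str).dtype)
```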

Do not trust dtypes; they are an artifact of an architectural flaw in pandas. Specify them where you must, but do not rely on a column's dtype to tell you what type its values actually are.
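If the dtype cannot be relied on, one option is to inspect the values themselves. The sketch below uses `pandas.api.types.infer_dtype`, which scans the values and returns a label such as `"floating"` or `"string"`. The helper `describe_column` is hypothetical, written only for illustration.

```python
import pandas as pd
from pandas.api.types import infer_dtype

def describe_column(s: pd.Series) -> str:
    """Hypothetical helper: report the declared dtype next to the inferred value type."""
    return f"dtype={s.dtype}, values={infer_dtype(s, skipna=True)}"

floats = pd.Series([1., 2., 3., 4., 5.])
print(describe_column(floats.astype(object)))  # dtype=object, values=floating
# The declared dtype here may vary by pandas version; the inferred label is "string".
print(describe_column(floats.astype(str)))
```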