In sklearn, different operations on data are performed by dedicated classes.
Most of the classes belong to one of the following groups:
- classifiers (sklearn.base.ClassifierMixin) to solve classification problems
- regressors (sklearn.base.RegressorMixin) to solve the problem of reconstructing continuous variables (regression problems)
- transformers (sklearn.base.TransformerMixin) that preprocess the data

Data is stored in numpy.arrays (other array-like objects such as pandas.DataFrames are also accepted, provided they are convertible to numpy.arrays).
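As a minimal sketch of the grouping above, the snippet below checks which mixin a few well-known estimators inherit from and converts a pandas.DataFrame to a numpy.array. The choice of LogisticRegression, LinearRegression and StandardScaler, and the column names 'x' and 'y', are illustrative assumptions and do not come from the original text.

```python
import numpy
import pandas
from sklearn.base import ClassifierMixin, RegressorMixin, TransformerMixin
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

# Each estimator class inherits from the mixin of the group it belongs to.
is_classifier = issubclass(LogisticRegression, ClassifierMixin)
is_regressor = issubclass(LinearRegression, RegressorMixin)
is_transformer = issubclass(StandardScaler, TransformerMixin)
print(is_classifier, is_regressor, is_transformer)

# A pandas.DataFrame is array-like: it can be converted to a numpy.array.
# The column names here are hypothetical, chosen only for illustration.
frame = pandas.DataFrame({'x': [0.0, 1.0, 2.0], 'y': [3.0, 4.0, 5.0]})
converted = numpy.asarray(frame)
print(type(converted), converted.shape)
```

Checking with issubclass is only one way to inspect the group of an estimator; it is used here because it maps directly onto the mixin classes named above.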
Each object in the data is described by a set of features. The general convention is that a dataset is represented as a two-dimensional array, where the first dimension is the sample id and the second dimension is the feature id.
import numpy
data = numpy.arange(10).reshape(5, 2)
print(data)
Output:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
In sklearn conventions, the dataset above contains 5 objects, each described by 2 features.
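The convention can be checked directly from the array shape. The sketch below rebuilds the same array, reads the number of samples and features from it, and passes it to a transformer. StandardScaler is an illustrative choice, not something the original text prescribes.

```python
import numpy
from sklearn.preprocessing import StandardScaler

# Same dataset as above: rows are samples, columns are features.
data = numpy.arange(10).reshape(5, 2)
n_samples, n_features = data.shape
print(n_samples, n_features)

# A transformer treats each column as one feature, so the
# (samples, features) layout is preserved in the output.
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(scaled.shape)
```

Because scaling is done per feature, each column of the result is centered around zero while the array keeps its 5 by 2 shape.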