scikit-learn Tutorial => Interfaces and conventions:

Example

In scikit-learn, operations on data are carried out by dedicated classes.

Most of the classes belong to one of the following groups:

classification algorithms (derived from sklearn.base.ClassifierMixin), which solve classification problems
regression algorithms (derived from sklearn.base.RegressorMixin), which solve the problem of predicting continuous variables (regression problems)
data transformations (derived from sklearn.base.TransformerMixin), which preprocess the data
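
The grouping above can be checked directly with isinstance. The sketch below is only an illustration: the three estimators (LogisticRegression, LinearRegression, StandardScaler) are picked here as examples and are not named in the original text.

```python
from sklearn.base import ClassifierMixin, RegressorMixin, TransformerMixin
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

# Example estimators chosen for illustration (an assumption, not from the text).
estimators = [LogisticRegression(), LinearRegression(), StandardScaler()]

# Map each class name to the group implied by the mixin it derives from.
groups = {}
for estimator in estimators:
    name = type(estimator).__name__
    if isinstance(estimator, ClassifierMixin):
        groups[name] = "classifier"
    elif isinstance(estimator, RegressorMixin):
        groups[name] = "regressor"
    elif isinstance(estimator, TransformerMixin):
        groups[name] = "transformer"

print(groups)
```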

Data is stored in numpy arrays (numpy.ndarray). Other array-like objects, such as pandas DataFrames, are also accepted as long as they can be converted to numpy arrays.
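
As a rough illustration of array-like input, the sketch below passes a plain nested Python list to a transformer. The choice of StandardScaler and the sample values are assumptions made for this example; a pandas DataFrame holding the same numbers should behave similarly.

```python
import numpy
from sklearn.preprocessing import StandardScaler

# A nested list is array-like: it can be converted to a 2-D numpy array.
raw = [[0, 1], [2, 3], [4, 5]]
as_array = numpy.asarray(raw)
print(as_array.shape)

# The transformer accepts the list directly and returns a numpy array.
scaled = StandardScaler().fit_transform(raw)
print(type(scaled))
```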

Each object in the data is described by a set of features. By convention, a dataset is represented as a two-dimensional array in which the first dimension indexes the samples and the second dimension indexes the features.

import numpy
data = numpy.arange(10).reshape(5, 2)
print(data)

Output:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]

By sklearn conventions, the dataset above contains 5 objects, each described by 2 features.
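
A minimal sketch of how this shape convention shows up when fitting an estimator. The target values and the use of LinearRegression are made up for illustration and do not come from the original example.

```python
import numpy
from sklearn.linear_model import LinearRegression

data = numpy.arange(10).reshape(5, 2)

# First dimension: samples (objects); second dimension: features.
n_samples, n_features = data.shape
print(n_samples, n_features)

# Hypothetical target: one value per sample (here, the sum of its features).
target = numpy.array([1.0, 5.0, 9.0, 13.0, 17.0])

model = LinearRegression().fit(data, target)
predictions = model.predict(data)

# One coefficient per feature, one prediction per sample.
print(model.coef_.shape, predictions.shape)
```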
