Getting started with scikit-learn: Interfaces and conventions

Example

In scikit-learn, each kind of operation on data is performed by a dedicated class.

Most of the classes belong to one of the following groups:

  • classification algorithms (derived from sklearn.base.ClassifierMixin), which solve classification problems
  • regression algorithms (derived from sklearn.base.RegressorMixin), which predict continuous variables (regression problems)
  • data transformations (derived from sklearn.base.TransformerMixin), which preprocess the data
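The grouping above can be checked directly with isinstance. The sketch below uses LogisticRegression, LinearRegression and StandardScaler, which are not mentioned in the text and were chosen here only as one representative of each group.

```python
from sklearn.base import ClassifierMixin, RegressorMixin, TransformerMixin
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

# One illustrative class per group (the choice of classes is an assumption).
classifier = LogisticRegression()
regressor = LinearRegression()
transformer = StandardScaler()

# Each object derives from the mixin that marks its group.
is_classifier = isinstance(classifier, ClassifierMixin)
is_regressor = isinstance(regressor, RegressorMixin)
is_transformer = isinstance(transformer, TransformerMixin)

print(is_classifier, is_regressor, is_transformer)
```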

Data is stored in numpy arrays (numpy.ndarray). Other array-like objects, such as a pandas.DataFrame, are also accepted as long as they can be converted to a numpy array.
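As a minimal sketch of this conversion, the example below passes a nested Python list (used here instead of a DataFrame so that only numpy and scikit-learn are needed) to a transformer. The values in rows are made up for illustration, and StandardScaler is just one possible transformer.

```python
import numpy
from sklearn.preprocessing import StandardScaler

# Hypothetical array-like input: 3 samples, 2 features each.
rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]

# Explicit conversion to a numpy array.
as_array = numpy.asarray(rows)

# The transformer accepts the list directly and returns a numpy array.
scaled = StandardScaler().fit_transform(rows)

print(as_array.shape, type(scaled).__name__, scaled.shape)
```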

Each object in the data is described by a set of features. By convention, a dataset is represented as a two-dimensional array in which the first dimension indexes the samples and the second dimension indexes the features.

import numpy
data = numpy.arange(10).reshape(5, 2)
print(data)

Output:
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]

Under sklearn conventions, the dataset above contains 5 objects, each described by 2 features.
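The same layout can be read off the array's shape and passed straight to an estimator. In the sketch below, the target values and the choice of LinearRegression are assumptions made for illustration; they are not part of the original example.

```python
import numpy
from sklearn.linear_model import LinearRegression

data = numpy.arange(10).reshape(5, 2)

# First dimension: samples; second dimension: features.
n_samples, n_features = data.shape

# Hypothetical target: one value per sample (here, the sum of its features).
target = data.sum(axis=1)

# fit expects data of shape (n_samples, n_features) and one target per sample.
model = LinearRegression().fit(data, target)
predictions = model.predict(data)

print(n_samples, n_features, predictions.shape)
```

The model returns one prediction per sample and learns one coefficient per feature, which follows from the (n_samples, n_features) layout.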


