Finding patterns in data often proceeds through a chain of data-processing steps, e.g., feature selection, normalization, and classification. In sklearn, such a chain is expressed as a pipeline of stages.
For example, the following code builds a pipeline consisting of two stages. The first scales the features, and the second trains a k-nearest-neighbors classifier on the scaled data:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=4))
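make_pipeline names each stage automatically after its class. If you want explicit stage names (convenient, for instance, when referring to stage parameters later), the same pipeline can be built with the Pipeline class directly. A minimal equivalent sketch; the names "scaler" and "knn" are arbitrary labels chosen here for illustration:
from sklearn.pipeline import Pipeline
# equivalent pipeline with explicit stage names ("scaler" and "knn" are our choice)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=4)),
])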
Once the pipeline is created, you can use it like a regular stage; its behavior is determined by its final stage. Here the final stage is a classifier, so the pipeline behaves like a classifier and can be used as follows:
# fitting the pipeline: the scaler is fit and applied, then the classifier is fit
pipeline.fit(X_train, y_train)
# getting class probabilities for new samples (the scaler is applied automatically)
pipeline.predict_proba(X_test)
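The snippet above assumes X_train, y_train, and X_test already exist. For completeness, here is a self-contained sketch that runs the pipeline end to end, using the iris dataset bundled with sklearn purely as example data:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# toy dataset, used here only for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=4))
pipeline.fit(X_train, y_train)
# class probabilities for the first three test samples
print(pipeline.predict_proba(X_test[:3]))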