Gradient Boosting for classification. The Gradient Boosting Classifier is an additive ensemble: it starts from a simple base model and, in successive iterations (or stages), adds regression trees, each fitted to the residual error of the previous stage (more precisely, to the negative gradient of the loss).
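The stage-wise correction described above can be sketched by hand. This is only an illustration, not how scikit-learn implements the classifier: it uses a regression problem with squared error, where the negative gradient is simply the residual, and the synthetic data, `learning_rate`, `n_stages`, and tree depth are all assumed values chosen for the example. The classifier applies the same idea to the gradient of a classification loss.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumed for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1  # shrinkage applied to each tree's correction
n_stages = 50        # number of boosting stages

# Stage 0: a constant base model that predicts the mean target.
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_stages):
    # Residuals are the errors left over by the ensemble so far.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)
    # Add a shrunken copy of the tree's correction to the ensemble.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(initial_mse, final_mse)
```

The training error after the last stage is lower than that of the constant base model, because every tree removes part of the remaining residual.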
Import:
from sklearn.ensemble import GradientBoostingClassifier
Load a toy classification dataset (Iris)
from sklearn.datasets import load_iris
iris_dataset = load_iris()
X, y = iris_dataset.data, iris_dataset.target
Let us split this data into training and test sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
Instantiate a GradientBoostingClassifier model using the default parameters and fit it on the training set.
gbc = GradientBoostingClassifier()
gbc.fit(X_train, y_train)
Let us score it on the test set
# We are using the default classification accuracy score
>>> gbc.score(X_test, y_test)
1.0
By default, 100 boosting stages are fitted (for a multiclass problem such as Iris, each stage fits one regression tree per class)
>>> gbc.n_estimators
100
This can be controlled by passing a different n_estimators value when the classifier is instantiated.
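As a minimal sketch of changing the number of stages, the example below fits a smaller ensemble and inspects it. The value 20, the `random_state=0` setting, and the names `gbc_small` and `stage_accuracy` are choices made for this illustration; `staged_predict` is used to look at test accuracy after each stage.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Same data and split as in the walkthrough above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Ask for 20 boosting stages instead of the default 100.
gbc_small = GradientBoostingClassifier(n_estimators=20, random_state=0)
gbc_small.fit(X_train, y_train)

print(gbc_small.n_estimators)       # 20
# One regression tree per class per stage: 20 stages x 3 classes.
print(gbc_small.estimators_.shape)  # (20, 3)

# staged_predict yields the ensemble's predictions after each stage,
# which shows how test accuracy evolves as trees are added.
stage_accuracy = [
    (stage_pred == y_test).mean()
    for stage_pred in gbc_small.staged_predict(X_test)
]
print(stage_accuracy[0], stage_accuracy[-1])
```

More stages generally lower the training error, but they cost more time and can overfit, so n_estimators is usually tuned together with learning_rate.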