One needs the predicted probabilities in order to calculate the ROC-AUC (area under the curve) score. The cross_val_predict
uses the predict
methods of classifiers. In order to be able to get the ROC-AUC score, one can simply subclass the classifier, overriding the predict
method, so that it would act like predict_proba
.
from sklearn.datasets import make_classification from sklearn.linear_model import LogisticRegression from sklearn.cross_validation import cross_val_predict from sklearn.metrics import roc_auc_score class LogisticRegressionWrapper(LogisticRegression): def predict(self, X): return super(LogisticRegressionWrapper, self).predict_proba(X) X, y = make_classification(n_samples = 1000, n_features=10, n_classes = 2, flip_y = 0.5) log_reg_clf = LogisticRegressionWrapper(C=0.1, class_weight=None, dual=False, fit_intercept=True) y_hat = cross_val_predict(log_reg_clf, X, y)[:,1] print("ROC-AUC score: {}".format(roc_auc_score(y, y_hat)))
output:
ROC-AUC score: 0.724972396025