scikit-learn Classification Using Support Vector Machines


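Support vector machines are a family of algorithms that attempt to pass a (possibly high-dimensional) hyperplane between two labelled sets of points, such that the distance of the points from the plane is optimal in some sense. In the simplest, linearly separable case, "optimal" means maximizing the margin, which is the distance from the hyperplane to the nearest points of either set. Soft-margin variants such as sklearn.svm.SVC trade margin width against misclassified points through the parameter C. SVMs can be used for classification or regression (corresponding to sklearn.svm.SVC and sklearn.svm.SVR, respectively).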
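To make the margin idea concrete, here is a minimal sketch that assumes a tiny, hand-made, linearly separable data set (six points invented purely for illustration) and a very large C (a hypothetical choice that approximates a hard margin). For a linear kernel, the fitted hyperplane is w · x + b = 0, with w in coef_ and b in intercept_, and the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn import svm

# A tiny, linearly separable toy data set (made up for illustration only).
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = [0, 0, 0, 1, 1, 1]

# A very large C approximates a hard-margin SVM (margin violations are
# penalized so heavily that the solver avoids them).
clf = svm.SVC(kernel='linear', C=1e6).fit(x, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset: the hyperplane is w . x + b = 0
margin = 2 / np.linalg.norm(w)

# Support vectors lie on the margin boundaries, where w . x + b = +1 or -1.
print(clf.support_vectors_)
print(np.abs(clf.decision_function(clf.support_vectors_)))
print(margin)
```

For this particular data, the closest pair between the two groups is (3, 3) and the segment from (1, 0) to (0, 1), so the computed margin should come out close to 5/√2 ≈ 3.54. Note that coef_ is only available for kernel='linear'.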


Suppose we work in a 2D space. To create some data, we first import NumPy:

import numpy as np

Now we create x and y:

x0, x1 = np.random.randn(10, 2), np.random.randn(10, 2) + (1, 1)
x = np.vstack((x0, x1))

y = [0] * 10 + [1] * 10

Note that x is drawn from two Gaussians: ten points centered around (0, 0) and ten centered around (1, 1). The labels in y record which Gaussian each point came from.
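Because np.random.randn draws fresh random numbers on every run, the points (and therefore the fitted model) change each time. A minimal sketch of a reproducible variant, assuming NumPy's Generator API (np.random.default_rng) and an arbitrary seed of 42 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, used only for reproducibility

# Ten points around (0, 0) and ten around (1, 1), as in the text.
x0 = rng.standard_normal((10, 2))
x1 = rng.standard_normal((10, 2)) + (1, 1)
x = np.vstack((x0, x1))             # shape (20, 2): one row per point
y = np.array([0] * 10 + [1] * 10)   # label 0 for rows of x0, 1 for rows of x1

print(x.shape, y.shape)
print(x0.mean(axis=0), x1.mean(axis=0))  # sample means, roughly (0, 0) and (1, 1)
```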

To build a classifier, we can use:

from sklearn import svm

svm.SVC(kernel='linear').fit(x, y)
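Since fit returns the fitted estimator itself, storing it in a variable lets us inspect what was learned. A minimal sketch, assuming seeded data (the seed 0 is an arbitrary choice) generated the same way as above; the attributes shown are standard SVC attributes, and coef_ exists only for the linear kernel:

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)  # arbitrary seed (assumption, for reproducibility)
x = np.vstack((rng.standard_normal((10, 2)),
               rng.standard_normal((10, 2)) + (1, 1)))
y = [0] * 10 + [1] * 10

clf = svm.SVC(kernel='linear').fit(x, y)  # fit() returns the fitted estimator

print(clf.classes_)               # class labels seen during fitting
print(clf.n_support_)             # number of support vectors per class
print(clf.support_vectors_)       # training points that define the hyperplane
print(clf.coef_, clf.intercept_)  # w and b in w . x + b = 0 (linear kernel only)
```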

Let's check the prediction for (0, 0):

>>> svm.SVC(kernel='linear').fit(x, y).predict([[0, 0]])

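The prediction is class 0, as expected, since (0, 0) is the center of the first Gaussian. Because the data are random and the two Gaussians overlap, a different draw can occasionally give a different answer.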
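For a binary problem, the predicted label follows from the sign of the decision function, which for a linear kernel is simply w · x + b: a positive score maps to classes_[1] and a negative one to classes_[0]. A minimal sketch checking both facts, assuming seeded data (seed 0, arbitrary) and a handful of query points picked only for illustration:

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
x = np.vstack((rng.standard_normal((10, 2)),
               rng.standard_normal((10, 2)) + (1, 1)))
y = [0] * 10 + [1] * 10
clf = svm.SVC(kernel='linear').fit(x, y)

points = np.array([[0, 0], [1, 1], [-2, -2], [3, 3]])  # illustrative queries
scores = clf.decision_function(points)  # one signed score per point
labels = clf.predict(points)

# For a linear kernel, the score is just w . x + b.
manual = points @ clf.coef_[0] + clf.intercept_[0]
print(np.allclose(scores, manual))

# A positive score means classes_[1], a negative one means classes_[0].
print(labels, clf.classes_[(scores > 0).astype(int)])
```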

For regression, we can similarly use sklearn.svm.SVR, which predicts a real-valued output instead of a class label:

svm.SVR(kernel='linear').fit(x, y)
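Unlike SVC.predict, SVR.predict returns real numbers; here the 0/1 labels are simply treated as numeric targets. A minimal sketch, assuming seeded data (seed 0, arbitrary) and writing out epsilon=0.1 (scikit-learn's default) only to make the parameter visible. epsilon is the half-width of the tube around the prediction inside which errors are not penalized:

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
x = np.vstack((rng.standard_normal((10, 2)),
               rng.standard_normal((10, 2)) + (1, 1)))
y = [0] * 10 + [1] * 10  # treated as real-valued targets by SVR

# Errors smaller than epsilon are ignored by the SVR loss.
reg = svm.SVR(kernel='linear', epsilon=0.1).fit(x, y)

queries = np.array([[0, 0], [1, 1]])
pred = reg.predict(queries)
print(pred)  # real numbers, not class labels

# With a linear kernel, the prediction is w . x + b.
manual = queries @ reg.coef_[0] + reg.intercept_[0]
print(np.allclose(pred, manual))
```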