
In this post we will explore the most important parameters of the Sklearn SVC classifier and how they impact our model in terms of overfitting.

The Support Vector Classifier tries to find the best hyperplane to separate the different classes by maximizing the margin, i.e. the distance between the hyperplane and the closest sample points.

We will use the Iris data from sklearn.

We start by importing the libraries and loading the data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
%matplotlib inline
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features; we could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target

We use the same plotting function for all of our plots; it draws the decision regions of the fitted classifier svc over a mesh of the two features.

def plotSVC(title):
    # create a mesh to plot in
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = (x_max - x_min) / 100  # mesh step size
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    plt.subplot(1, 1, 1)
    Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.title(title)
    plt.show()
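
Note that plotSVC reads the fitted classifier from the global variable svc, so in each example below we first fit a model and then call the helper. As a quick check (a minimal sketch using SVC's default parameters), it can be used like this:

svc = svm.SVC().fit(X, y)   # fit with default parameters
plotSVC('default SVC')      # plot its decision regions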

kernel

The kernel parameter selects the type of hyperplane used to separate the data. Using ‘linear’ gives a linear hyperplane (a line in the case of 2D data), while ‘rbf’ and ‘poly’ use a non-linear hyperplane.

kernels = ['linear', 'rbf', 'poly']
for kernel in kernels:
    svc = svm.SVC(kernel=kernel).fit(X, y)
    plotSVC('kernel=' + str(kernel))
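
To make the difference between these kernels more concrete, here is a minimal sketch (the two samples a and b are just illustrative picks from X) of the similarity each kernel computes for a pair of points, using sklearn's pairwise kernel functions:

from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

a = X[:1]   # first sample, shape (1, 2)
b = X[1:2]  # second sample, shape (1, 2)

print(linear_kernel(a, b))                # dot product <a, b>
print(polynomial_kernel(a, b, degree=3))  # (gamma * <a, b> + coef0) ** degree
print(rbf_kernel(a, b, gamma=1.0))        # exp(-gamma * ||a - b||^2)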

gamma

gamma is a parameter of the non-linear kernels (‘rbf’ and ‘poly’). The higher the gamma value, the harder the classifier tries to exactly fit the training data set.

gammas = [0.1, 1, 10, 100]
for gamma in gammas:
    svc = svm.SVC(kernel='rbf', gamma=gamma).fit(X, y)
    plotSVC('gamma=' + str(gamma))

We can see that increasing gamma leads to overfitting, as the classifier tries to perfectly fit the training data.
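
One way to see why: with the ‘rbf’ kernel, gamma controls how fast the similarity exp(-gamma * ||a - b||^2) between two points decays with their distance. A minimal sketch (the distances below are just illustrative values):

distances = np.array([0.1, 0.5, 1.0, 2.0])
for gamma in gammas:
    # with a large gamma, only very close points look similar, so each
    # training point mainly influences its own small neighbourhood
    print(gamma, np.exp(-gamma * distances ** 2))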

C

C is the penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.

cs = [0.1, 1, 10, 100, 1000]
for c in cs:
    svc = svm.SVC(kernel='rbf', C=c).fit(X, y)
    plotSVC('C=' + str(c))

Increasing C values may lead to overfitting the training data.
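
One rough way to see this numerically (a minimal sketch using a single train/test split rather than proper cross-validation) is to compare training and test accuracy for each value of C:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
for c in cs:
    svc = svm.SVC(kernel='rbf', C=c).fit(X_train, y_train)
    print('C=%s train=%.2f test=%.2f' % (c, svc.score(X_train, y_train),
                                         svc.score(X_test, y_test)))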

degree

degree is a parameter used when the kernel is set to ‘poly’. It is basically the degree of the polynomial used to find the hyperplane that splits the data.

degrees = [0, 1, 2, 3, 4, 5, 6]
for degree in degrees:
    svc = svm.SVC(kernel='poly', degree=degree).fit(X, y)
    plotSVC('degree=' + str(degree))

Using degree=1 gives essentially the same decision boundary as the ‘linear’ kernel. Also, increasing this parameter leads to longer training times.
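
To check the first claim (a minimal sketch on this dataset, with the other kernel parameters left at their defaults), we can compare the predictions of a degree-1 polynomial kernel with those of a linear kernel:

linear_svc = svm.SVC(kernel='linear').fit(X, y)
poly1_svc = svm.SVC(kernel='poly', degree=1).fit(X, y)

# fraction of training points on which the two classifiers agree
print(np.mean(linear_svc.predict(X) == poly1_svc.predict(X)))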