
Support Vector Machines

In this post I cover the fundamentals of Support Vector Machines.
At a high level, an SVM is a linear classifier whose decision boundary is chosen so that the minimum distance from the boundary to any training point is maximised.

Main references:

  • PRML by Christopher Bishop - Section 4

Introduction

Linear Decision Rule

We are in the binary classification setting, and we want a linear decision rule that, given a feature vector, predicts the class it belongs to. By a linear decision rule, we mean a rule based on an activation function applied to an affine combination of our features.

In particular, for now we will define our decision rule as:

\[\begin{align} \label{eq:binary linear classifier} f(\bx) &= \begin{cases} +1, &\quad g(\bx;\bw) \geq 0 \\ -1, &\quad g(\bx;\bw) < 0 \end{cases} \\ \text{Where } g(\bx;\bw) &= \langle \bw',\bx \rangle + w_0 \nonumber \end{align}\]

In other words, this is the step activation function applied to the affine function of our features $g(\bx;\bw)$.
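
To make this concrete, here is a minimal NumPy sketch of the decision rule above; the weight vector, bias and example points are made-up illustrations, not values from the text.

```python
import numpy as np

def linear_decision_rule(x, w, w0):
    """Predict +1 or -1 from the sign of the affine score g(x; w) = <w, x> + w0."""
    g = np.dot(w, x) + w0
    return 1 if g >= 0 else -1

# Hand-picked boundary for illustration only.
w = np.array([1.0, -2.0])   # weight vector (w' in the text)
w0 = 0.5                    # bias term (w_0 in the text)

print(linear_decision_rule(np.array([3.0, 1.0]), w, w0))  # g = 1.5  -> +1
print(linear_decision_rule(np.array([0.0, 2.0]), w, w0))  # g = -3.5 -> -1
```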

Decision Boundary Geometry

Before we introduce Support Vector Machines (SVM), I will list some geometric facts related to the decision boundary.

  • The decision boundary is the set $\{\bx : g(\bx;\bw) = 0\}$, i.e. the points where the affine score is exactly zero.
  • The weight vector $\bw'$ is orthogonal to the decision boundary.
  • The signed distance from a point $\bx$ to the decision boundary is $g(\bx;\bw) / \lVert \bw' \rVert$; the sketch after this list checks it numerically.
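
As a quick numerical check of the distance formula, here is a small NumPy sketch (the weight vector and points are made up for illustration):

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed Euclidean distance from x to the hyperplane {x : <w, x> + w0 = 0}.
    Positive on the side that w points towards, negative on the other side."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # ||w|| = 5
w0 = -5.0

print(signed_distance(np.array([3.0, 4.0]), w, w0))  # (9 + 16 - 5) / 5 =  4.0
print(signed_distance(np.array([0.0, 0.0]), w, w0))  # (0 + 0 - 5)  / 5 = -1.0
```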

(Figure: SVM diagram)

SVM

SVM Motivation

The perceptron algorithm chooses the value of $\bw$ by iteratively applying gradient-based updates on observations that are misclassified. If the data is linearly separable, it can be shown that the perceptron algorithm finds a solution (a decision rule under which all training points are correctly classified) in a finite number of steps.
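
For context, here is a toy sketch of the perceptron update just described; the data, the unit step size and the epoch cap are my own simplifications, not part of the post.

```python
import numpy as np

def perceptron(X, y, n_epochs=100):
    """Toy perceptron: update (w, w0) on each misclassified point.
    X has shape (n_samples, n_features); y contains labels in {+1, -1}."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(n_epochs):
        errors = 0
        for x_n, y_n in zip(X, y):
            if y_n * (np.dot(w, x_n) + w0) <= 0:  # misclassified (or on the boundary)
                w += y_n * x_n                    # gradient-style step towards the correct side
                w0 += y_n
                errors += 1
        if errors == 0:                           # every point correctly classified: stop
            break
    return w, w0

# Linearly separable toy data, so the loop terminates in finitely many steps.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))
```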

However, if the data is linearly separable there are infinitely many decision boundaries that separate it perfectly.
Our objective is to find the decision boundary that generalizes “best” to unseen data, and this is the problem that Support Vector Machines (SVMs) try to solve.

High Level Description of SVM

SVMs find the decision boundary that is furthest away from the closest training points. In other words, they search for the decision boundary that maximises the minimum distance from the training feature vectors to the decision boundary.

\[\max_{\bw', w_0} \; \min_{n} \; \frac{y_n \, g(\bx_n;\bw)}{\lVert \bw' \rVert} \;=\; \max_{\bw', w_0} \; \min_{n} \; \frac{y_n \left( \langle \bw', \bx_n \rangle + w_0 \right)}{\lVert \bw' \rVert}\]
Where $y_n \in \{+1, -1\}$ is the label of the $n$-th training point $\bx_n$, so that $y_n \, g(\bx_n;\bw) / \lVert \bw' \rVert$ is the (positive) distance of a correctly classified point to the boundary.
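
To illustrate this objective, the sketch below fits a linear SVM on toy data using scikit-learn (my choice of library, not one used in the post) and reports the minimum distance it achieves; a very large value of C makes the soft-margin solver approximate the hard-margin max-min problem stated above.

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data with labels in {+1, -1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin formulation.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

w = svm.coef_[0]        # learned weight vector w'
w0 = svm.intercept_[0]  # learned bias w_0

# Minimum (signed) distance of any training point to the learned boundary.
margin = np.min(y * (X @ w + w0)) / np.linalg.norm(w)

print("w  =", w, " w0 =", w0)
print("margin =", margin)
print("support vectors:\n", svm.support_vectors_)
```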