Support Vector Machines

The Support Vector Machine is a supervised machine learning algorithm with which we can perform both regression and classification.

In SVM, data points are plotted in n-dimensional space, where n is the number of features. Classification is then done by selecting a suitable hyperplane that separates the two classes. In n-dimensional space, this hyperplane has (n − 1) dimensions.
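
To make this concrete, here is a minimal sketch (not part of the original article) of fitting a linear SVM classifier with scikit-learn; the feature values and labels below are made up purely for illustration.

from sklearn import svm

# Each row is one observation with n = 2 features (e.g. hours studied, hours slept).
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # class 0 ("fail")
     [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]]   # class 1 ("pass")
y = [0, 0, 0, 1, 1, 1]

# kernel='linear' asks for a flat (n-1)-dimensional separating hyperplane.
clf = svm.SVC(kernel='linear')
clf.fit(X, y)

print(clf.predict([[2.0, 2.0], [7.0, 7.0]]))   # -> [0 1]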

The SVM is an extension of two simpler classifiers and is often considered one of the best “out of the box” classifiers:

  1. Maximal Margin Classifier – separable data.
  2. Support Vector Classifier – non-separable data.
  3. Support Vector Machine – non-linear class boundaries.

Hyperplane:

A hyperplane divides a p-dimensional space into two parts. With two predictor variables, we have a 2-D predictor space.

We want to find a 1-D hyperplane (a line) which separates this space into two parts, as in the figure below.

For the table above, we will find a hyperplane that separates the data into two categories: Pass and Fail.

The figure above divides the data into pass and fail categories using a hyperplane. The blue dots represent the pass category, whereas the red dots represent the fail category.

There are many other directions in which a hyperplane could be drawn to separate the two classes.

1. Maximal Margin Classifier:

We compute the perpendicular distance from the hyperplane to each training observation. The smallest such distance is the minimal distance between the hyperplane and the observations, and it is called the margin.

Therefore, the maximal margin hyperplane is the hyperplane that has the largest margin, i.e. the largest minimal distance between the hyperplane and the training observations. Using that hyperplane we can classify test data. If our model has the form

f(x) = β0 + β1·X1 + β2·X2 + … + βp·Xp,

then the maximal margin classifier classifies a new test observation x* based on the sign of f(x*).

The observations that fall on the margin are known as support vectors. These classifiers depend only on the support vectors, which is why this technique differs from conventional ML techniques.
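
As a hedged illustration (not from the article), scikit-learn exposes the fitted support vectors directly, so we can see that only a handful of observations define the boundary; the toy data and the very large C below are assumptions for demonstration.

from sklearn import svm

X = [[1, 1], [2, 1], [1, 2],
     [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# A very large C approximates a hard (maximal) margin on separable data.
clf = svm.SVC(kernel='linear', C=1e6)
clf.fit(X, y)

print(clf.support_vectors_)   # the observations lying on the margin
print(clf.n_support_)         # number of support vectors per class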

Maximal margin classifiers cannot be used if the two classes are not separable by a hyperplane. In the figure below, the points cannot be separated because points of one class are mixed in with points of the other class.

Maximal margin classifiers are also very sensitive to the support vectors: a single additional observation can lead to a dramatic shift in the maximal margin hyperplane.

2. Support Vector Classifier:

We use a support vector classifier to handle the not-perfectly-separable scenario: as shown in the figure above, the maximal margin classifier cannot separate the two classes when their points are mixed together. The support vector classifier solves this problem and also gives greater robustness to individual observations.

In simple words, a single outlier in our observations can change the way the margins are defined, and that change can wrongly classify test observations. In the figure above you can see that outliers from one class lie in the other class's region; to handle and classify those observations we use a support vector classifier to separate the classes.

Support vector classifier = soft margin classifier

How it Separates:

  1. We create a misclassification budget (B).
  2. We limit the sum of the distances of the points on the wrong side of the margin: ε1 + ε2 + ε3 + ε4 < B.
  3. So, we try to maximize the margin while trying to stay within the budget.
  4. Usually, in software packages we use C (the cost, a multiplier of the error term), which is inversely related to B.

Impact of C:

  • When C is small, margins will be wide and there will be many support vectors and many misclassified observations.
  • When C is large, margins will be narrow and there will be fewer support vectors and fewer misclassified observations.
  • However, a low cost value prevents overfitting and may give better test-set performance.
  • We try to find the optimal value of C at which we get the best test performance, as in the sketch below.
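
As a rough sketch of this trade-off (the data and values of C below are illustrative assumptions, not from the article), we can count the support vectors at different costs:

import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),    # class 0 blob
               rng.normal(2.0, 1.0, size=(50, 2))])   # class 1 blob, overlapping
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:>6}: support vectors = {len(clf.support_)}")

# Small C -> wide margin, many support vectors; large C -> narrow margin, fewer.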

3. Support Vector Machines:

Support vector machines (SVMs) are an extension of the support vector classifier that uses kernels to create non-linear boundaries.

Problem:

There are cases where we cannot draw a boundary that separates two classes in a linear fashion, as in the picture below: a linear plane cannot separate training observations belonging to the two different classes.

In that case, we consider enlarging the feature space using functions of the predictors, such as quadratic or cubic terms, in order to address the non-linearity. In the case of support vector classifiers, we can address the problem of non-linear boundaries between classes in a similar way, by enlarging the feature space using quadratic, cubic, or even higher-order polynomial functions of the predictors.

For example:

if we have a 2-D feature space with training observations (X1, Y1), (X2, Y2), …, (Xn, Yn) such that they are inseparable using a linear plane, then we can convert the feature space into a 3-D space by adding a third dimension: (X1, Y1, Z1), (X2, Y2, Z2), …, (Xn, Yn, Zn).

In other words, rather than fitting a support vector classifier using p features, we can instead fit a support vector classifier using 2p features, for example each original feature together with its square.

The change of feature space can be anything one likes, as long as it successfully converts the space into a higher-dimensional space in which the two classes can be separated by a linear plane.
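
As a hedged illustration of this idea (the dataset and feature map below are assumptions chosen for demonstration), points lying on two concentric circles are not linearly separable in 2-D, but adding a third feature equal to X1² + X2² makes them separable by a plane:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Enlarge the feature space: (x1, x2) -> (x1, x2, x1^2 + x2^2)
X3 = np.column_stack([X, (X ** 2).sum(axis=1)])

print(SVC(kernel='linear').fit(X, y).score(X, y))    # poor: no separating line in 2-D
print(SVC(kernel='linear').fit(X3, y).score(X3, y))  # ~1.0: a plane separates in 3-D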

However, we need some kind of standardized computational method that can convert our feature space into higher dimensions. This is where the kernel trick comes in.

Kernel:

The kernel trick is an effective computational approach for enlarging the feature space. The kernel trick uses the inner product of two vectors. The inner product of two r-vectors a and b is defined as ⟨a, b⟩ = a1·b1 + a2·b2 + … + ar·br; for example, for r = 3, ⟨(1, 2, 3), (4, 5, 6)⟩ = 1·4 + 2·5 + 3·6 = 32.

A kernel is some functional relationship (a measure of similarity) between two observations. Some popular kernels are:

  1. Linear 
  2. Polynomial 
  3. Radial 

1. Linear Kernel:

The linear kernel takes the inner product of two observations, K(a, b) = ⟨a, b⟩. This kernel effectively reproduces the support vector classifier.

2. Polynomial Kernel:

The polynomial kernel uses a power function to create non-linear boundaries, K(a, b) = (1 + ⟨a, b⟩)^d, where d is the degree of the polynomial.

3. Radial Kernel:

The radial kernel uses a radial basis function to create radial boundaries, K(a, b) = exp(−γ‖a − b‖²),

where γ is a positive constant.

Gamma defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected.
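
As a hedged sketch (the toy data and parameter values are assumptions for illustration), we can try the three kernels above on the same non-linear data and see the effect of gamma for the radial kernel:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel, params in [("linear", {}),
                       ("poly",   {"degree": 3, "coef0": 1.0}),
                       ("rbf",    {"gamma": 1.0})]:
    clf = SVC(kernel=kernel, **params).fit(X, y)
    print(f"{kernel:>6} kernel: training accuracy = {clf.score(X, y):.2f}")

# A larger gamma makes each point's influence more local, giving a wigglier boundary.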

If you want to see how an SVM behaves dynamically, the following interactive demo is worth exploring:

svmjs Support Vector Machine in Javascript: demo (stanford.edu)

Click the link above to see the SVM visualization in action.

This is how support vector machines categorise classes using these techniques.

I hope you now have a clear idea of how support vector machines work.

Written By: Sangamreddy Manasa Valli

Reviewed By: Umamah

