We’ve seen how logistic regression, LDA, and decision trees can be used to solve classification problems. Now we turn to another classification technique: the support vector machine.

__Support Vector Machines__ are a generalization of the maximal margin classifier. The maximal margin classifier cannot be applied to most datasets, because it requires the classes to be separable by a linear boundary.

As a result, the support vector classifier was developed as an extension of the maximal margin classifier that can be applied in a wider range of circumstances.

Finally, the support vector machine extends the support vector classifier to accommodate non-linear class boundaries.

It is suitable for both binary and multiclass classification.

SVM theory can be somewhat difficult to explain. Hopefully, this post will help you grasp how SVMs function.

After you’ve gone over the theory, you’ll get to put the algorithm to work in four distinct scenarios!

Let’s get started without further ado.

**Maximal Margin Classifier**

This approach separates classes using a hyperplane.

**What exactly is a hyperplane?**

A hyperplane is a flat affine subspace of dimension p-1 in a p-dimensional space. In 2D space the hyperplane appears as a line, and in 3D space as a flat plane.

In principle, if the data can be perfectly separated by a hyperplane, there are infinitely many such hyperplanes, because any one of them can be shifted up or down, or rotated slightly, without touching an observation.

That is why we use the maximal margin hyperplane, also known as the optimal separating hyperplane, which is the separating hyperplane farthest from the observations. Given a separating hyperplane, we compute the perpendicular distance from each training observation to it; the smallest of these distances is called the margin. The best separating hyperplane is therefore the one with the biggest margin.
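The idea above can be sketched in code. A hard-margin classifier can be approximated with sklearn's `SVC` using a linear kernel and a very large `C` (an assumption for illustration; the toy data below is made up and chosen to be linearly separable):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data for illustration)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin (maximal margin) case
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# The margin width is 2 / ||w||; the support vectors are the observations
# that lie closest to the separating hyperplane.
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print(clf.support_vectors_)
print(round(margin, 3))
```

Only the few observations closest to the boundary (the support vectors) determine the hyperplane; moving any other point would not change it.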

**Support vector machine**

The support vector machine is an extension of the support vector classifier that results from enlarging the feature space with kernels. The kernel approach is simply a computationally efficient way of handling a non-linear boundary between classes.

Without going into technical specifics, a kernel is a function that measures the similarity of two observations. The kernel can take many forms.
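As a minimal sketch of this similarity idea, here is the popular RBF (Gaussian) kernel written by hand; the `gamma` value is an assumed example, not a recommendation:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # identical points -> similarity 1.0
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))  # distant points -> similarity near 0
```

Identical observations get similarity 1, and the similarity shrinks toward 0 as the observations move apart.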

**SVM Training and Runtime Complexities**

SVM training time complexity is about O(n²). Because O(n²) becomes very large when n is large, SVMs are not employed in low-latency applications.

The runtime (prediction) complexity is around O(k·d), where

- k denotes the number of support vectors.
- d denotes the data dimensionality.
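The k and d above can be inspected directly on a fitted model; this is a small sketch on made-up toy data (the cluster layout and kernel settings are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# n = 100 observations, d = 3 features, two separated clusters
X = np.vstack([rng.randn(50, 3) - 2, rng.randn(50, 3) + 2])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
k = len(clf.support_vectors_)  # number of support vectors
d = X.shape[1]                 # data dimensionality
print(k, d)  # prediction touches roughly k * d terms per query
```

Each prediction evaluates the kernel against every support vector, which is where the O(k·d) cost comes from.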

**Support Vector Regression (SVR)**

The Support Vector Machine can also be used as a regression approach while retaining the algorithm’s fundamental characteristic: the maximal margin.

With a few minor differences, Support Vector Regression (SVR) uses the same principles as SVM classification. The main difference is that the output is a real number with infinitely many possible values, which makes prediction harder to frame as a margin problem.

In the case of regression, a margin of tolerance (epsilon) is set: errors smaller than epsilon are not penalized. The algorithm is, however, somewhat more involved than its classification counterpart.

The essential principle, however, is always the same: minimize error by finding the hyperplane that maximizes the margin, keeping in mind that some error is tolerated.
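The epsilon-insensitive idea can be sketched with sklearn's `SVR`; the data, `C`, and `epsilon` values below are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
# Noisy linear target: y = 2x + 1 + noise
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.1, 80)

# Errors within the epsilon tube (here 0.2) are not penalized
reg = SVR(kernel="linear", C=10.0, epsilon=0.2).fit(X, y)
pred = reg.predict([[2.0]])[0]  # true value is about 5.0
print(round(pred, 2))
```

Only observations that fall outside the epsilon tube become support vectors and influence the fitted function.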

**The Gamma Hyper-Parameter**

In SVC, there is a highly important hyper-parameter called ‘gamma’ that is frequently tuned.

Gamma: the gamma parameter specifies how far a single training example’s influence extends, with low values meaning ‘far’ and high values meaning ‘near’. In other words, when gamma is low, points far away from a plausible separating line are taken into account when computing the line; when gamma is high, only points close to the plausible line are taken into account.
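A quick sketch of this effect: with a very high gamma, each point’s influence is so local that far more observations end up as support vectors (the data and gamma values below are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two well-separated 2D clusters, 60 points each
X = np.vstack([rng.randn(60, 2) - 2.5, rng.randn(60, 2) + 2.5])
y = np.array([0] * 60 + [1] * 60)

low = SVC(kernel="rbf", gamma=0.5).fit(X, y)     # far-reaching influence
high = SVC(kernel="rbf", gamma=100.0).fit(X, y)  # very local influence

# Higher gamma -> more support vectors -> more wiggly, overfit boundary
print(len(low.support_vectors_), len(high.support_vectors_))
```

The jump in support-vector count is one practical symptom of an overly large gamma.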

**SVM Applications**

**SVM Implementation on Real-World Datasets**

In this section, we will apply SVM to the Donors Choose dataset, using the SVM functions from the sklearn library.

The dataset’s details are available here. It’s a classification problem.

For kernel SVMs, we can use the SVC() function from the sklearn library. You can find detailed information about it here.
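As a minimal sketch of SVC() in use (iris stands in here for the Donors Choose dataset, which requires downloading and preprocessing; the kernel and parameter values are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Kernel SVM with the RBF kernel
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(round(acc, 2))
```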

SGDClassifier(loss='hinge') from the sklearn library can be used for linear SVMs. You can find detailed information about it here.
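A sketch of the linear-SVM route via SGDClassifier with hinge loss; feature scaling matters for SGD, and the breast cancer dataset here is an assumed stand-in for Donors Choose:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# loss='hinge' makes SGDClassifier fit a linear SVM
model = make_pipeline(StandardScaler(),
                      SGDClassifier(loss="hinge", random_state=0))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
print(round(acc, 2))
```

Because training is stochastic and per-sample, this scales to much larger n than the O(n²) kernel solver.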

We used the following steps to apply a linear SVM on the Donors Choose dataset:

Divide the dataset into two parts: train and test.

Then, using the cross-validation approach, we tuned the hyper-parameters to find the optimal values, via the GridSearchCV() function offered by the sklearn library.
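The steps above can be sketched end to end: split, then tune with GridSearchCV (the grid values are assumptions, and iris again stands in for the Donors Choose dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Cross-validated search over an assumed grid of C and gamma values
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.score(X_test, y_test), 2))
```

The test split is held out of the search entirely, so the final score is an honest estimate for the tuned model.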