Implementation of SVM For Spam Mail Detection - THEORY

Aim:

              To implement spam mail detection using the Support Vector Machine (SVM) algorithm.


Theory:


              Support Vector Machine (SVM) is one of the most popular supervised learning algorithms. It can be used for both classification and regression problems, but it is primarily used for classification in machine learning.

The goal of the SVM algorithm is to find the best line or decision boundary that segregates n-dimensional space into classes, so that a new data point can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.
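As a sketch of the aim stated above: if scikit-learn is available, a bag-of-words vectorizer plus a linear SVM gives a minimal spam/ham classifier. The tiny in-line dataset here is made up purely for illustration, not real mail data.

```python
# Minimal spam-detection sketch: bag-of-words features + a linear SVM
# (scikit-learn assumed; the toy dataset below is illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

mails = [
    "win a free prize now",
    "claim your free lottery reward",
    "meeting rescheduled to monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

vectorizer = CountVectorizer()      # turn each mail into word-count features
X = vectorizer.fit_transform(mails)

clf = LinearSVC()                   # linear SVM classifier
clf.fit(X, labels)

# classify an unseen mail
print(clf.predict(vectorizer.transform(["free prize waiting for you"])))
```

The same two-step pattern (vectorize, then fit an SVM) scales directly to a real labeled mail corpus.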

Types of SVM:

There are two types of SVM:

  • Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes with a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.

  • Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified with a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
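The two types can be illustrated with scikit-learn's `SVC`, whose `kernel` parameter switches between a linear and a non-linear (e.g. RBF) decision boundary. The XOR pattern below is a classic example of data that no single straight line can separate (a sketch, assuming scikit-learn and NumPy):

```python
# Linear vs. non-linear SVM on the XOR pattern
# (scikit-learn assumed; XOR is a standard non-linearly-separable toy case).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels: no straight line separates them

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)       # non-linear (RBF kernel)

print(linear.score(X, y))  # a linear SVM cannot fit XOR perfectly
print(rbf.score(X, y))     # the RBF kernel can
```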

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines or decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features in the dataset: with 2 features, the hyperplane is a straight line; with 3 features, it is a two-dimensional plane.

We always choose the hyperplane with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.

Support Vectors:

The data points or vectors that lie closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
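With scikit-learn (assumed here), a fitted `SVC` exposes these closest points directly through its `support_vectors_` attribute; a toy sketch with two well-separated clusters:

```python
# After fitting, scikit-learn reports which points are support vectors
# (toy 2-D data made up for illustration).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)

print(clf.support_vectors_)  # only the points closest to the hyperplane
print(clf.n_support_)        # number of support vectors per class
```

Note that only a few of the six training points end up as support vectors; the rest do not influence the hyperplane's position.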

How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.

Since this is a 2-D space, we can separate the two classes with a straight line. However, there can be multiple lines that separate them.

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the two classes; these points are called support vectors. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
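For a linear SVM with weight vector w, the margin width works out to 2/‖w‖, which can be checked on a fitted model (a sketch, assuming scikit-learn; the toy points are made up):

```python
# The margin of a fitted linear SVM equals 2 / ||w||
# (scikit-learn assumed; toy separable data for illustration).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [3, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=100.0).fit(X, y)  # large C ~ hard margin
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print(margin)  # ~ 3.61: the distance between the closest points of the classes
```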

Non-Linear SVM:

If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line.

So to separate these data points, we need to add one more dimension. For linear data we used two dimensions, x and y, so for non-linear data we add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space becomes three-dimensional, and the two classes become separable.

SVM can now divide the dataset into classes with a plane in this 3-D space.

Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-y plane. If we convert it back to 2-D space at z = 1, the boundary becomes a circle of radius 1. Hence, for this non-linear data, we obtain a circular decision boundary of radius 1.
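The z = x² + y² trick above can be reproduced by hand: adding that third feature to points lying inside and outside a circle makes them linearly separable (a sketch, assuming scikit-learn and NumPy):

```python
# The z = x^2 + y^2 feature map from the text, applied by hand
# (points near the origin vs. points on a circle of radius 2).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
inner = rng.normal(scale=0.3, size=(20, 2))             # class 0: near origin
angles = rng.uniform(0, 2 * np.pi, 20)
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)]   # class 1: radius 2
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# add the third dimension z = x^2 + y^2
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

clf = LinearSVC().fit(X3, y)  # a plane in 3-D now separates the classes
print(clf.score(X3, y))       # expect perfect separation on this toy data
```

In practice, SVMs perform this kind of mapping implicitly via kernel functions instead of constructing the extra dimension explicitly.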

