Simple Linear Regression



                      Introduction to Simple Linear Regression

The term regression was first applied to statistics by the polymath Francis Galton. Galton was a major figure in the development of statistics and genetics.

Linear regression is one of the simplest machine learning algorithms. In its simple form, it models the relationship between two variables by fitting a linear equation to observed data: one variable is treated as the explanatory (independent) variable, and the other as the dependent variable.

                        Simple Linear Regression

In simple linear regression we have a single input variable, and we need to find the line that best fits the data. The best-fit line is the one for which the total prediction error over all data points is as small as possible, where the error for a point is the vertical distance between that point and the regression line.

      E.g.: the relationship between hours of study and marks obtained.


The regression line is written as y = b0 + b1*x, where b0 is the intercept and b1 is the slope. The error for each data point is the difference between the predicted and the actual value, so finding the best-fit line means finding the values of b0 and b1 that minimize the total error.
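To make "total error" concrete, the sketch below scores two candidate lines on a few made-up hours-of-study/marks values. The data and the helper name `sum_squared_error` are hypothetical; the error measure is the squared difference between predicted and actual values, which is what least-squares regression minimizes.

```python
# Hypothetical data: hours of study (x) and marks obtained (y)
hours = [1, 2, 3, 4]
marks = [50, 54, 60, 64]


def sum_squared_error(b0, b1, xs, ys):
    """Total squared error of the line y = b0 + b1*x over all data points (hypothetical helper)."""
    total = 0
    for x, y in zip(xs, ys):
        predicted = b0 + b1 * x
        total += (predicted - y) ** 2
    return total


line_a = sum_squared_error(45, 5, hours, marks)  # predictions 50, 55, 60, 65
line_b = sum_squared_error(40, 6, hours, marks)  # predictions 46, 52, 58, 64
print(line_a, line_b)  # 2 24
```

Line A has the smaller total error, so it fits these points better; linear regression looks for the pair (b0, b1) with the smallest such total over all possible lines.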

           Assumptions of simple linear regression

Simple linear regression is a parametric test, meaning that it makes certain assumptions about the data. These assumptions are:

  • Independence of observations: the observations in the dataset were collected using statistically valid sampling methods and there are no hidden relationships among observations.
  • Homogeneity of variance: the size of the error in our prediction doesn't change significantly across the values of the independent variable.
  • Normality: the residuals (prediction errors) follow a normal distribution.
  • Linearity: the relationship between the independent and dependent variable is linear, i.e. the line of best fit through the data points is a straight line.

If your data do not meet the assumptions of homoscedasticity (homogeneity of variance) or normality, you may be able to use a nonparametric test instead, such as the Spearman rank test.
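The homogeneity-of-variance assumption can be examined through the residuals. The sketch below is a rough heuristic, not a formal statistical test: the helper `residual_spread_ratio` is hypothetical, it fits the line with NumPy's `np.polyfit`, and it compares the residual spread in the lower and upper halves of the x values on made-up data. In practice you would usually also plot the residuals against x and look for a funnel shape.

```python
import numpy as np


def residual_spread_ratio(x, y):
    """Rough homoscedasticity heuristic (hypothetical helper, not a formal test).

    Fits y = b0 + b1*x by least squares, then returns the ratio of the residual
    standard deviation in the upper half of x to that in the lower half.
    A ratio far from 1 suggests the error size changes with x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)      # degree-1 fit returns [slope, intercept]
    residuals = y - (b0 + b1 * x)
    order = np.argsort(x)              # indices that sort the points by x
    half = len(x) // 2
    lower = residuals[order[:half]]
    upper = residuals[order[half:]]
    return upper.std() / lower.std()


# Made-up data: the same line with two kinds of noise
x = np.arange(1, 21)
signs = np.where(x % 2 == 0, 1.0, -1.0)   # alternate -1 / +1
constant_noise = 3 * x + 1 + signs        # error size stays the same
growing_noise = 3 * x + 1 + signs * x     # error size grows with x

print(residual_spread_ratio(x, constant_noise))  # close to 1
print(residual_spread_ratio(x, growing_noise))   # clearly above 1
```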

Let us now use machine learning to train a model on a dataset that predicts Salary from Years of Experience. We will follow these steps:

  • Step 1:- Importing Libraries and Datasets
Three Python libraries are used in the code:

  • pandas
  • matplotlib
  • sklearn

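   ## Importing the necessary libraries
      import pandas as pd
      import matplotlib.pyplot as plt

   ## Importing the dataset (the CSV file path goes inside read_csv)
      data = pd.read_csv("")
      X = data.iloc[:, :-1].values   # all columns except the last -> features (Years of Experience), 2-D array
      y = data.iloc[:, -1].values    # last column -> target (Salary), 1-D array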
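The original does not name the CSV file, so the sketch below uses a small hypothetical DataFrame in its place (the column names and values are made up) to show what the two `iloc` slices produce. Slicing with `[:, :-1]` keeps X two-dimensional, which is the shape scikit-learn estimators expect for features, while `[:, -1]` gives a one-dimensional target.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one feature column followed by the target column
data = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0],
    "Salary": [40000, 45000, 55000, 60000],
})

X = data.iloc[:, :-1].values  # every column except the last -> shape (n_samples, 1)
y = data.iloc[:, -1].values   # only the last column -> shape (n_samples,)

print(X.shape, y.shape)  # (4, 1) (4,)
```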

        Step 2:- Split the Data

In this step, we split the dataset into a Training set, on which the Linear Regression model is trained, and a Test set, on which the trained model is applied to visualize the results. Here test_size=0.3 means that 30% of the data is kept as the Test set and the remaining 70% is used as the Training set; random_state=12 fixes the random shuffle so that the split is the same on every run.
    ## Splitting the dataset into the Training set and Test set

     from sklearn.model_selection import train_test_split
     train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.3, random_state=12)
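A quick way to confirm what `test_size=0.3` and `random_state` do is to split a tiny made-up dataset of 10 rows (the arrays below are hypothetical) and check the sizes and reproducibility:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with one feature each
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 1000

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=12)
print(len(train_X), len(test_X))  # 7 3

# Calling the function again with the same random_state yields the same split
again = train_test_split(X, y, test_size=0.3, random_state=12)
print(np.array_equal(test_X, again[1]))  # True
```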

      Step 3:- Fit and Predict

Import LinearRegression from sklearn.linear_model and assign an instance of it to the variable lr. lr.fit() learns the line from the training data, and lr.predict() makes predictions based on what was learned.

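    ## Fit and Predict Model

    from sklearn.linear_model import LinearRegression
    lr = LinearRegression()            # create the model
    lr.fit(train_X, train_y)           # learn b0 (intercept) and b1 (slope) from the training set
    predicted_y = lr.predict(test_X)   # predict salaries for the test set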
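After fitting, the learned slope b1 and intercept b0 are available as `lr.coef_` and `lr.intercept_`. The sketch below uses made-up hours-of-study/marks values (not the salary dataset) and compares them with the textbook closed-form least-squares estimates computed directly in NumPy:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours of study (x) and marks obtained (y)
x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([50, 54, 60, 64], dtype=float)

lr = LinearRegression()
lr.fit(x.reshape(-1, 1), y)  # scikit-learn expects a 2-D feature array

# Closed-form least-squares estimates, for comparison
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(lr.coef_[0], lr.intercept_)  # slope b1 and intercept b0 learned by fit()
print(b1, b0)                      # the same values from the formulas
print(lr.predict([[5.0]]))         # predicted marks for 5 hours of study
```

Both approaches give a slope of 4.8 and an intercept of 45 (up to floating-point rounding), so the prediction for 5 hours is 69.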

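    Step 4:- Evaluate and Visualize
       Create a pandas DataFrame of predicted and actual values and visualize the results.

     ## Comparing actual and predicted values
     results = pd.DataFrame({'Actual': test_y, 'Predicted': predicted_y})
     print(results)

     ## Visualising the Test set results
     plt.scatter(test_X, test_y, color = 'red')               # actual salaries
     plt.scatter(test_X, predicted_y, color = 'green')        # predicted salaries
     plt.plot(train_X, lr.predict(train_X), color = 'black')  # fitted regression line
     plt.title('Salary vs Experience (Result)')
     plt.xlabel('Years of Experience')
     plt.ylabel('Salary')
     plt.show()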
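The Step 4 heading mentions evaluation; one common way to quantify it is with scikit-learn's `sklearn.metrics` functions. This is a minimal sketch: with the real data you would pass the `test_y` and `predicted_y` arrays from Step 3, while the three values below are made up so the results can be checked by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values for three test points
test_y = np.array([3.0, 5.0, 7.0])
predicted_y = np.array([2.5, 5.0, 8.0])

mae = mean_absolute_error(test_y, predicted_y)  # average |predicted - actual|
mse = mean_squared_error(test_y, predicted_y)   # average (predicted - actual)^2
r2 = r2_score(test_y, predicted_y)              # 1.0 would be a perfect fit

print(mae, mse, r2)
```

Here the absolute errors are 0.5, 0 and 1 (MAE 0.5), the squared errors sum to 1.25 (MSE about 0.417), and R² = 1 - 1.25/8 = 0.84375.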

      Now let us see how the regression line in the plot is determined

The error for a single data point is the squared difference between the predicted and the actual output. The best-fit line uses the values of b0 and b1 that minimize this error summed over all data points:

$$\text{Error} = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \hat{y}_i = b_0 + b_1 x_i$$
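Writing the total squared error as a function of the two coefficients and setting its partial derivatives to zero gives the standard closed-form least-squares estimates. This is a derivation sketch, where \bar{x} and \bar{y} denote the means of x and y:

$$
\begin{aligned}
E(b_0, b_1) &= \sum_{i=1}^{n} (b_0 + b_1 x_i - y_i)^2 \\
\frac{\partial E}{\partial b_0} &= 2\sum_{i=1}^{n} (b_0 + b_1 x_i - y_i) = 0
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial E}{\partial b_1} &= 2\sum_{i=1}^{n} x_i (b_0 + b_1 x_i - y_i) = 0
  \;\Rightarrow\; b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
$$

The result for b1 follows by substituting b0 = \bar{y} - b_1\bar{x} into the second condition. For a single feature, these are the same estimates that scikit-learn's LinearRegression computes, since it minimizes the same sum of squared errors.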

