C++ MLPACK :: LogisticRegression



  • In the previous article in this series on MLpack with C++, we dealt with the LinearRegression algorithm.
  • Here we will look at another important Machine-Learning algorithm in MLpack, LogisticRegression.

Introduction :

  • Logistic Regression is the regression analysis to conduct when the dependent variable is binary, i.e., each observation belongs to one of two classes.

  • Following are two typical questions (applications) that can be answered by Logistic Regression :
    • How does the probability of getting lung cancer (yes vs. no) change for every additional pound a person is overweight and for every pack of cigarettes smoked per day?
    • Do body weight, calorie intake, fat intake, and age have an influence on the probability of having a heart attack (yes vs. no)?

  • The final output of a Logistic Regression classifier is binary (0 or 1), indicating the class to which the given point belongs.
  • Just as Linear regression assumes that the data follows a linear function, Logistic regression models the data using the sigmoid function.
    • The sigmoid function maps any real-valued input to an output between 0 and 1, which can be read as the probability that the point belongs to class 1.

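  • Following is the mathematical formula for the Sigmoid Function : sigmoid(z) = 1 / (1 + e^(-z)). Its chart is an S-shaped curve that approaches 0 for large negative z, passes through 0.5 at z = 0, and approaches 1 for large positive z.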
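
To make the behaviour of the sigmoid concrete, here is a minimal sketch in plain C++ (no MLpack needed). The function name Sigmoid is only an illustrative choice and is not part of the MLpack API.

```cpp
#include <cmath>

// Logistic (sigmoid) function: squashes any real number z
// into a value between 0 and 1.
double Sigmoid(const double z)
{
  return 1.0 / (1.0 + std::exp(-z));
}
```

An input of 0 gives exactly 0.5, large positive inputs give values close to 1, and large negative inputs give values close to 0.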

  • In MLpack, the LogisticRegression class implements an L2-regularized logistic regression model, and supports training with multiple optimizers as well as classification.
  • The class supports different observation types via the MatType template parameter; for instance, logistic regression can be performed on sparse datasets by specifying arma::sp_mat as the MatType parameter.

  • LogisticRegression can be used for general classification tasks, but the class is restricted to support only two classes. For multiclass logistic regression, MLpack provides a separate class called SoftmaxRegression (which we will see in some other article).


Model Description :

  • LogisticRegression is generally used as a classification technique, because its final output is a binary variable.
  • Internally the model produces a continuous value, in the same manner as LinearRegression does. The Sigmoid Function maps this value to a number between 0 and 1, and comparing that number against a decision boundary (0.5 by default) converts it into a discrete class label.
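
The conversion from a continuous value to a discrete label can be sketched as follows. This is a hand-written illustration under stated assumptions, not MLpack's own code: the helper names PredictProbability and PredictLabel and the use of std::vector are hypothetical choices, and the weights are assumed to have been learned already.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Continuous output: sigmoid of the linear combination (intercept + w . x).
// The result lies between 0 and 1.
double PredictProbability(const std::vector<double>& weights,
                          const double intercept,
                          const std::vector<double>& point)
{
  double z = intercept;
  for (std::size_t i = 0; i < weights.size(); ++i)
    z += weights[i] * point[i];
  return 1.0 / (1.0 + std::exp(-z));
}

// Discrete output: class 1 if the probability reaches the decision
// boundary, class 0 otherwise.
std::size_t PredictLabel(const std::vector<double>& weights,
                         const double intercept,
                         const std::vector<double>& point,
                         const double decisionBoundary = 0.5)
{
  const double p = PredictProbability(weights, intercept, point);
  return (p >= decisionBoundary) ? 1 : 0;
}
```

Raising the decision boundary makes the classifier more reluctant to answer 1, and lowering it has the opposite effect.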


Class Description :

  • MLpack also ships a command-line program, mlpack_logistic_regression, built on top of this class. It allows loading a logistic regression model (via the --input_model_file parameter) or training a logistic regression model, given training data (specified with the --training_file parameter), or both those things at once.
  • In addition, this program allows classification on a test dataset (specified with the --test_file parameter), and the classification results may be saved with the --predictions_file output parameter.
  • The trained logistic regression model may be saved using the --output_model_file output parameter.
  • We can furthermore provide a lambda parameter (L2 regularization), which helps to keep the model from over-fitting. This can be specified with the --lambda option.
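
To show what the lambda parameter does, here is a rough sketch of an L2-regularized logistic loss in plain C++. It uses the textbook form (negative log-likelihood plus 0.5 * lambda * squared norm of the weights). The exact scaling that MLpack applies internally is not covered in this article, so treat the formula, the function name RegularizedLoss and the std::vector data layout as assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Negative log-likelihood of binary labels (0 or 1) plus an L2 penalty
// on the weights. A larger lambda punishes large weights more strongly.
double RegularizedLoss(const std::vector<std::vector<double>>& points,
                       const std::vector<int>& labels,
                       const std::vector<double>& weights,
                       const double intercept,
                       const double lambda)
{
  double loss = 0.0;
  for (std::size_t n = 0; n < points.size(); ++n)
  {
    // Linear combination for point n, then its sigmoid.
    double z = intercept;
    for (std::size_t i = 0; i < weights.size(); ++i)
      z += weights[i] * points[n][i];
    const double p = 1.0 / (1.0 + std::exp(-z));

    // Penalize low probability assigned to the true class.
    loss -= (labels[n] == 1) ? std::log(p) : std::log(1.0 - p);
  }

  // L2 penalty; the intercept is left unpenalized in this sketch.
  double squaredNorm = 0.0;
  for (const double w : weights)
    squaredNorm += w * w;

  return loss + 0.5 * lambda * squaredNorm;
}
```

With lambda set to 0 the penalty term vanishes and only the fit to the training data matters, which is where over-fitting can creep in.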

  • As an example, to train a logistic regression model on the data 'data.csv' with labels 'labels.csv' with L2 regularization of 0.1, saving the model to 'lr_model.bin', the following command may be used:
$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 --output_model_file lr_model.bin

  • Then, to use that model to predict classes for the dataset 'test.csv', storing the output predictions in 'predictions.csv', the following command may be used:
$ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv --predictions_file predictions.csv



Major Function Description :


  • The class provides four constructors. The most widely used one is given below :
LogisticRegression(const MatType& predictors,
                   const arma::Row<size_t>& responses,
                   const double lambda = 0)


The parameter description for the constructor is as follows :
predictors : Input training variables.
responses : Outputs resulting from input training variables.
lambda : L2-regularization parameter.


  • Next comes the function that classifies new data-points (the test dataset). The most commonly used overload is :
void Classify(const MatType& dataset,
              arma::Row<size_t>& labels,
              const double decisionBoundary = 0.5) const

Parameters
dataset : Set of points to classify.
labels : Predicted labels for each point.
decisionBoundary : Decision boundary (default 0.5).


  • To calculate the accuracy of the model, we need to compare the actual outputs of the data-points with the outputs that the model predicts. For this purpose, the class provides the ComputeAccuracy function :
double ComputeAccuracy(const MatType& predictors,
                       const arma::Row<size_t>& responses,
                       const double decisionBoundary = 0.5)

Parameters
predictors : Input predictors.
responses : Vector of responses.
decisionBoundary : Decision boundary (default 0.5).

  • The accuracy is returned as a percentage, between 0 and 100. It returns the percentage of responses that are correctly predicted.
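
The percentage described above can be reproduced by hand. The sketch below is a plain C++ illustration of the idea, not MLpack's implementation. AccuracyPercent is a hypothetical helper name, and returning 0 for empty or mismatched inputs is a choice made only for this example.

```cpp
#include <cstddef>
#include <vector>

// Percentage (0 to 100) of predicted labels that match the actual labels.
double AccuracyPercent(const std::vector<std::size_t>& predicted,
                       const std::vector<std::size_t>& actual)
{
  // Guard against division by zero and mismatched inputs.
  if (actual.empty() || predicted.size() != actual.size())
    return 0.0;

  std::size_t correct = 0;
  for (std::size_t i = 0; i < actual.size(); ++i)
  {
    if (predicted[i] == actual[i])
      ++correct;
  }

  return 100.0 * correct / actual.size();
}
```

For instance, three correct predictions out of four give an accuracy of 75.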

  • The required header files are as follows :
#include <mlpack/core.hpp>
#include <mlpack/methods/logistic_regression/logistic_regression.hpp>
#include <ensmallen.hpp> // optimizers (e.g. L-BFGS, SGD)

using namespace mlpack;
using namespace mlpack::regression;


Read about all available functions here : Logistic-Regression

Stay connected for more Articles ...
Like the post if you find it worthy ...




Article by Keshav Kabra.