Python PyCaret How to optimize the probability threshold % in binary classification

How to optimize the probability threshold % in binary classification

 
In classification problems, the cost of false positives is almost never the same as the cost of false negatives. If you are solving a business problem in which Type I errors (false positives) and Type II errors (false negatives) have different impacts, you can tune your classifier's probability threshold against a custom loss function simply by defining the cost of true positives, true negatives, false positives and false negatives separately. By default, all classifiers use a threshold of 0.5.
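To make the idea concrete, here is a minimal sketch of a cost-based threshold search written with plain NumPy. It is not PyCaret's implementation: the helper name `best_threshold`, the 0.01 grid step, the `>=` comparison and the toy labels and probabilities are all assumptions chosen for illustration.

```python
import numpy as np

def best_threshold(y_true, y_prob, tp=0, tn=0, fp=0, fn=0):
    """Hypothetical helper: scan a grid of thresholds and return the
    (threshold, total_value) pair with the highest summed payoff."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    best_t, best_value = None, -np.inf
    # Assumed grid: 0.00, 0.01, ..., 1.00
    for t in np.linspace(0.0, 1.0, 101):
        y_pred = (y_prob >= t).astype(int)
        n_tp = np.sum((y_pred == 1) & (y_true == 1))
        n_tn = np.sum((y_pred == 0) & (y_true == 0))
        n_fp = np.sum((y_pred == 1) & (y_true == 0))
        n_fn = np.sum((y_pred == 0) & (y_true == 1))
        value = tp * n_tp + tn * n_tn + fp * n_fp + fn * n_fn
        # Keep the first threshold that strictly improves the payoff
        if value > best_value:
            best_t, best_value = t, value
    return float(best_t), float(best_value)

# Toy data (made up for this sketch): true labels and predicted
# probabilities of the positive class.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.60]

# Reward each true negative with 1500 and penalize each false negative
# with -5000; true positives and false positives are valued at 0.
threshold, value = best_threshold(y_true, y_prob, tn=1500, fn=-5000)
print(threshold, value)
```

Because a false negative is penalized far more heavily than a true negative is rewarded, the search settles on a threshold well below 0.5 for this toy data.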

See the example below, which uses the "credit" dataset.

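# Load the sample dataset
from pycaret.datasets import get_data
credit = get_data('credit')

# Import the classification module and initialize the setup
from pycaret.classification import *
clf1 = setup(data=credit, target='default')

# Create a model
xgboost = create_model('xgboost')

# Optimize the threshold for the trained model: each true negative is
# valued at 1500 and each false negative at -5000; true_positive and
# false_positive are left at their defaults.
# Note: these cost arguments follow the PyCaret release used in this
# article; other releases may define optimize_threshold differently.
optimize_threshold(xgboost, true_negative=1500, false_negative=-5000)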


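The optimization points to a threshold of 0.2 in this example. You can then pass 0.2 as the probability_threshold parameter of the predict_model function so that 0.2, rather than the default 0.5, is used as the cutoff for classifying the positive class. See the example below: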

predict_model(xgboost, probability_threshold=0.2)
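As a rough illustration of what changing the threshold does to the predicted labels, the sketch below applies a cutoff to positive-class probabilities by hand. The helper `apply_threshold`, the `>=` comparison and the scores are assumptions made for this example; they are not taken from PyCaret's source.

```python
import numpy as np

def apply_threshold(positive_proba, threshold=0.5):
    """Hypothetical helper: label a row 1 when its positive-class
    probability reaches the threshold, otherwise 0."""
    return (np.asarray(positive_proba) >= threshold).astype(int)

# Made-up positive-class probabilities for four rows
scores = [0.10, 0.25, 0.45, 0.70]

default_labels = apply_threshold(scores)                 # cutoff 0.5
lowered_labels = apply_threshold(scores, threshold=0.2)  # cutoff 0.2

print(default_labels.tolist())  # [0, 0, 0, 1]
print(lowered_labels.tolist())  # [0, 1, 1, 1]
```

Lowering the cutoff moves more rows into the positive class, which trades additional false positives for fewer false negatives.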

Author: Aditi Kothiyal