In classification problems, the cost of a false positive is almost never the same as the cost of a false negative. If you are solving a business problem where Type 1 and Type 2 errors have different impacts, you can tune the classifier's probability threshold to optimize a custom loss function by defining the cost of true positives, true negatives, false positives, and false negatives separately. By default, all classifiers use a threshold of 0.5.
See the example below, which uses the "credit" dataset.
# Importing dataset
from pycaret.datasets import get_data
credit = get_data('credit')

# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = credit, target = 'default')

# create a model
xgboost = create_model('xgboost')

# optimize threshold for trained model
optimize_threshold(xgboost, true_negative = 1500, false_negative = -5000)
You can then pass 0.2 as the probability_threshold parameter of the predict_model function to use 0.2 as the threshold for classifying the positive class. See the example below:
predict_model(xgboost, probability_threshold=0.2)
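To make the idea of a cost-based threshold concrete, here is a minimal sketch in plain NumPy of the kind of search described above. It is not PyCaret's implementation. The helper names total_value and best_threshold, the grid of 101 candidate thresholds, the ">=" decision rule, and the toy labels and probabilities are all assumptions made for illustration. Only the values of 1500 per true negative and -5000 per false negative are taken from the example above.

```python
import numpy as np


def total_value(y_true, y_prob, threshold, tp=0.0, tn=0.0, fp=0.0, fn=0.0):
    """Total value of the predictions made at a single threshold.

    tp, tn, fp and fn are the (hypothetical) values assigned to each
    outcome type; costs are expressed as negative numbers.
    """
    y_true = np.asarray(y_true).astype(bool)
    # Assumed decision rule: positive when probability >= threshold.
    y_pred = np.asarray(y_prob) >= threshold

    n_tp = np.sum(y_pred & y_true)
    n_tn = np.sum(~y_pred & ~y_true)
    n_fp = np.sum(y_pred & ~y_true)
    n_fn = np.sum(~y_pred & y_true)
    return float(n_tp * tp + n_tn * tn + n_fp * fp + n_fn * fn)


def best_threshold(y_true, y_prob, grid=None, **values):
    """Return (threshold, value) for the grid point with the highest value."""
    if grid is None:
        # Assumed search grid: 0.00, 0.01, ..., 1.00
        grid = np.linspace(0.0, 1.0, 101)
    scores = [total_value(y_true, y_prob, t, **values) for t in grid]
    best = int(np.argmax(scores))  # first maximum wins on ties
    return float(grid[best]), scores[best]


# Toy data (made up): true labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.125, 0.415, 0.335, 0.875]

threshold, value = best_threshold(y_true, y_prob, tn=1500, fn=-5000)
print(f"best threshold: {threshold:.2f}, total value: {value}")
print("value at default 0.5:", total_value(y_true, y_prob, 0.5, tn=1500, fn=-5000))
```

On this toy data the default threshold of 0.5 misses one positive case and ends up with a negative total, while a lower threshold avoids the expensive false negative. This is why a heavily penalized false negative tends to push the optimized threshold below 0.5.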