Python Random Forest Algorithm Bagging

Bagging creates several subsets of data from the training sample, each chosen randomly with replacement. Each subset is used to train its own decision tree. Bagging is also called bootstrap aggregation.
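The sampling step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `bootstrap_samples`, the seed, and the toy list of ten rows are assumptions made for the example, not part of the original article.

```python
import random

def bootstrap_samples(data, n_subsets, seed=0):
    """Draw n_subsets bootstrap samples, each the same size as data."""
    rng = random.Random(seed)
    # rng.choices samples WITH replacement, so a row can appear
    # more than once in a subset while other rows are left out.
    return [rng.choices(data, k=len(data)) for _ in range(n_subsets)]

# Hypothetical stand-in for ten training rows.
training_sample = list(range(10))
subsets = bootstrap_samples(training_sample, n_subsets=3)

for i, subset in enumerate(subsets):
    print(f"subset {i}: {sorted(subset)}")
```

Each subset would then be handed to a separate decision tree for training.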

The main goal of bagging is to reduce the variance of a decision tree classifier. It does this by combining the predictions of several decision tree classifiers.
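For classification, the usual way to combine the trees is a majority vote over their predictions. The sketch below assumes each tree has already produced a label for the same input; the helper name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one input row.
tree_predictions = ["spam", "ham", "spam", "spam", "ham"]
final_label = majority_vote(tree_predictions)
print(final_label)  # spam
```

Because individual trees make different mistakes on different bootstrap subsets, the vote tends to be more stable than any single tree.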

So the question arises: why do we usually apply bagging to decision trees?

The answer is:

1- A single decision tree is unstable.

2- A minor change in the training data can greatly change the tree structure. Hence, there is high variance in its predictions.

Random Forest

A random forest is a collection of decision trees. Each tree asks a series of questions about the data, and at each node the question is chosen from a random subset of the features. The data is then split according to the answer. So how do we know at which value to split?

The answer is the Gini Index, which tells us which split value to use.

Let us dig deeper and understand what the Gini Index is.

The Gini Index is a cost function used to evaluate splits. The split with the lowest Gini Index is the one chosen.

The mathematical formula is

Gini = 1 - \sum_{i=1}^{C} p_i^2

where p_i is the proportion of the samples at a particular node that belong to class i, and C is the number of classes.
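As a rough illustration, the Gini Index and a simple threshold search can be written in plain Python. The function names (`gini`, `split_gini`, `best_threshold`) and the toy data are assumptions for this sketch. It scores a split by the size-weighted average of the Gini Index of the two child nodes, which is a common convention and not something the article itself specifies.

```python
from collections import Counter

def gini(labels):
    """Gini Index of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted average Gini Index of the two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try each observed value as a threshold and keep the lowest-cost split."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # skip thresholds that leave one side empty
        score = split_gini(left, right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical single feature with two cleanly separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]

print(gini(labels))                    # 0.5 for an evenly mixed node
threshold, score = best_threshold(values, labels)
print(threshold, score)                # 3.0 0.0
```

A Gini Index of 0 means the node is pure, so splitting at 3.0 separates the two classes perfectly in this toy example.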
