Bagging, also called bootstrap aggregation, creates several subsets of the training sample by drawing from it randomly with replacement. Each subset is used to train its own decision tree.
The main goal of bagging is to reduce the variance of a decision tree classifier. It does this by combining the predictions of several decision tree classifiers, each trained on a different subset.
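To make the two steps of bagging concrete, here is a minimal sketch in plain Python. The helper names `bootstrap_sample` and `majority_vote` are hypothetical, chosen only for illustration, and majority voting is assumed as the way the trees' class predictions are combined.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Combine class predictions from several models by taking the most common one."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)               # fixed seed so the run is repeatable
training_sample = list(range(10))    # stand-in for ten training rows

# One bootstrap subset per tree; some rows repeat and some are left out.
subsets = [bootstrap_sample(training_sample, rng) for _ in range(3)]
for subset in subsets:
    print(sorted(subset))

# Suppose three trees predicted these classes for one new row.
print(majority_vote(["spam", "ham", "spam"]))
```

Each subset has the same size as the original sample, so the trees differ only because of which rows happen to be drawn.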
So the question arises: why do we usually use decision trees with bagging?
The answer is:
1- A single decision tree is unstable.
2- A minor change in the training data can change the tree structure considerably. Hence, there is high variance in predictions.
Random Forest
A random forest is a collection of decision trees. At each node, a tree asks a question chosen from a random set of candidate questions, and it makes the split based on that question. So how do we know at which value to split?
The answer is the Gini Index: it lets us find our magical number.
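Before turning to the Gini Index, the "random set of questions" idea can be sketched as follows. This assumes that each question is about one feature and that every split considers only a few randomly chosen features. The function name `candidate_features` and the numbers 9 and 3 are hypothetical.

```python
import random

def candidate_features(n_features, n_candidates, rng):
    """Pick n_candidates distinct feature indices out of n_features, at random."""
    return sorted(rng.sample(range(n_features), n_candidates))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical dataset with 9 features; each split looks at only 3 of them.
for node in range(3):
    print("node", node, "may split on features", candidate_features(9, 3, rng))
```

Because every node sees a different handful of features, the trees in the forest end up less alike than trees grown on all features.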
Let us dig deeper and understand what the Gini Index is.
The Gini Index is a cost function used to evaluate candidate splits. We choose the split that minimizes it.
The mathematical formula is
$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$
where p_i is the proportion of the samples that belong to class i for a particular node, and C is the number of classes.
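Here is a small sketch of how the Gini Index can be computed and used to pick a split value. It rests on two assumptions that the text above does not spell out: candidate thresholds are the midpoints between consecutive distinct feature values, and a split is scored by the size-weighted average of the Gini Index of its two child nodes. The function names are hypothetical.

```python
from collections import Counter

def gini_index(labels):
    """Gini = 1 - sum of squared class proportions for the labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def best_threshold(values, labels):
    """Return (threshold, cost) for the split 'value <= threshold' with the lowest cost.

    Cost is the weighted Gini of the two child nodes (an assumption of this sketch).
    """
    n = len(values)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        cost = (len(left) * gini_index(left) + len(right) * gini_index(right)) / n
        if cost < best[1]:
            best = (threshold, cost)
    return best

print(gini_index(["yes", "yes"]))   # pure node
print(gini_index(["yes", "no"]))    # evenly mixed node
print(best_threshold([1.0, 2.0, 3.0, 4.0], ["no", "no", "yes", "yes"]))
```

A pure node scores 0.0 and an evenly mixed two-class node scores 0.5. In the toy data, splitting at 2.5 separates the classes perfectly, so that threshold has a cost of 0.0.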