In general terms, bootstrapping means "subsampling with replacement".
Subsampling is said to be with replacement if each selected unit is put back into the population before the next unit is drawn.
(1) Suppose we are given 100 integer samples and the goal is to find the mean. Normally we add up the 100 samples and divide by the total number of samples to get the mean.
(2) In bootstrapping, however, we pick a subsample size, let's say 20 samples. We find the mean of these 20 samples and call it μ1. Then we put these 20 samples back into the pile, pick another random 20 samples, find their mean and call it μ2.
(3) We repeat this process of subsampling with replacement 30 times and end up with 30 means. Finally, we take the mean of these means, and that is the final answer.
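The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions: the function name bootstrap_mean, its parameters, the fixed seed and the example data (the integers 1 to 100) are made up for the example, and each subsample is drawn unit by unit with replacement using random.choices from the standard library.

```python
import random
import statistics


def bootstrap_mean(samples, subsample_size=20, rounds=30, seed=0):
    """Return (mean of the subsample means, list of subsample means).

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(rounds):
        # Draw one subsample with replacement from the full pile.
        subsample = rng.choices(samples, k=subsample_size)
        means.append(statistics.mean(subsample))
    # The final answer is the mean of the individual means.
    return statistics.mean(means), means


# Assumed example data: 100 integers.
data = list(range(1, 101))
estimate, means = bootstrap_mean(data)
print("plain mean:       ", statistics.mean(data))
print("bootstrapped mean:", estimate)
print("number of means:  ", len(means))
```

The list of 30 individual means is returned alongside the final answer so that their spread can be inspected as well.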
Now the question arises: why use the bootstrapped mean instead of the plain average?
The answer is that taking an average over averages decreases the variance of the estimated mean. That means there is less chance of overfitting, which is a very desirable property and a good reason to use the bootstrap.