Bagging
- Bootstrap aggregating
- Statistical motivation
- No data set is a perfect match to any canonical distribution
- Some data sets are close enough
- Other data sets are hard to match
- For hard-to-match sets, we can create an estimate of the distribution through bootstrapping
- From a single sample, only one mean can be computed
- We can create arbitrary numbers of new samples from an original of size
n as follows:
- Select
n elements from the sample at random
- But select with replacement
- Hence, there is a very low probability that the samples will be identical
- The plot of means of the new samples gives a picture of the distribution
(next)