Random Forests
- Let N be the number of hypotheses
- Let T be the training set
- Let M be the total number of distinct features used across the training examples
- For each of the N hypotheses
  - Create a training set with |T| examples, selected at random from T, with replacement
  - Train a random decision tree using the set
    - For each interior node
      - Randomly select sqrt(M) features as splitting candidates
      - Pick the best split among them
  - Add the hypothesis to the ensemble
- To select a label for a test example
  - For each label L
    - Count the hypotheses that return L for the test example
  - Return the label with the highest count
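
The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The outline does not say how "best split" is scored, so this sketch assumes numeric features, Gini impurity as the criterion, thresholds at midpoints between neighbouring values, and sqrt(M) rounded down. All function names (`gini`, `best_split`, `build_tree`, `train_forest`, `predict_forest`) are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidates, or None if none helps."""
    best, best_score = None, gini(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # assumption: threshold at the midpoint
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, rng):
    """Grow one random tree; each interior node sees sqrt(M) random features."""
    m = len(rows[0])
    candidates = rng.sample(range(m), max(1, math.isqrt(m)))
    split = best_split(rows, labels, candidates)
    if split is None:
        # Simplification: stop here even if other features could still split.
        return {"label": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "f": f,
        "t": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], rng),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], rng),
    }


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf and return its label."""
    while "label" not in node:
        node = node["left"] if row[node["f"]] <= node["t"] else node["right"]
    return node["label"]


def train_forest(rows, labels, n_trees, seed=0):
    """Train n_trees trees, each on |T| examples drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Return the label with the highest vote count across the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy usage: two well-separated classes, each example with M = 4 features.
rows = [[i * 0.1] * 4 for i in range(10)] + [[10 + i * 0.1] * 4 for i in range(10)]
labels = ["a"] * 10 + ["b"] * 10
forest = train_forest(rows, labels, n_trees=25)
print(predict_forest(forest, [0.5] * 4))
```

Ties in the vote are broken here by whichever label `Counter.most_common` lists first. The outline does not specify a tie-breaking rule, so that choice is an assumption of the sketch.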