Refining the algorithm
- Goal: Find partitions that yield homogeneous sets
- Requires a metric for homogeneity
- What is the set of possible feature splits?
- Boolean features:
- Features with n discrete values:
- Create a tree with n branches rather than two branches
- Numerically-valued features:
- For each concrete value in the training set:
- Perform a binary split around that value
- From all of the possible splits, select the one with the highest gain, i.e. the largest improvement in homogeneity under the chosen metric
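
The outline above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation, and it rests on several assumptions the slides do not state: entropy is used as the homogeneity metric, gain is the drop in size-weighted entropy after a split, and a numeric split "around" a value v sends rows with value <= v to one side and the rest to the other. The function names (entropy, candidate_splits, gain, best_split) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits; zero means the set is perfectly homogeneous."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_splits(rows, feature, kind):
    """Yield ((feature, threshold), partition) pairs for one feature.

    partition is a list of row-index lists, one per branch.
    threshold is None for boolean/discrete features.
    """
    values = [row[feature] for row in rows]
    if kind == "numeric":
        # One binary split per distinct value seen in the training set.
        # Assumption: "around that value" means <= value versus > value.
        for threshold in sorted(set(values)):
            left = [i for i, v in enumerate(values) if v <= threshold]
            right = [i for i, v in enumerate(values) if v > threshold]
            if left and right:  # skip splits that leave a branch empty
                yield (feature, threshold), [left, right]
    else:
        # Boolean or discrete: one branch per observed value
        # (two branches for booleans, n branches for n values).
        groups = {}
        for i, v in enumerate(values):
            groups.setdefault(v, []).append(i)
        if len(groups) > 1:
            yield (feature, None), list(groups.values())


def gain(labels, partition):
    """Entropy of the parent minus the size-weighted entropy of the branches."""
    total = len(labels)
    remainder = sum(
        len(part) / total * entropy([labels[i] for i in part])
        for part in partition
    )
    return entropy(labels) - remainder


def best_split(rows, labels, feature_kinds):
    """Return (gain, (feature, threshold), partition) for the highest-gain split."""
    best = None
    for feature, kind in feature_kinds.items():
        for description, partition in candidate_splits(rows, feature, kind):
            g = gain(labels, partition)
            if best is None or g > best[0]:
                best = (g, description, partition)
    return best


# Toy training set (made up for illustration).
rows = [
    {"windy": True, "outlook": "sunny", "temp": 30},
    {"windy": False, "outlook": "rain", "temp": 28},
    {"windy": True, "outlook": "sunny", "temp": 18},
    {"windy": False, "outlook": "overcast", "temp": 15},
]
labels = ["no", "no", "yes", "yes"]
feature_kinds = {"windy": "boolean", "outlook": "discrete", "temp": "numeric"}

split_gain, split_description, split_partition = best_split(rows, labels, feature_kinds)
```

Calling best_split once chooses the split for a single node; a full tree builder would presumably apply it recursively to each resulting partition until the sets are homogeneous enough.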