Calculating the Gain
- Calculate the homogeneity for each branch b of the split (h_b), as well as for the set prior to the split (h_parent).
- The gain is h_parent - sum(h_b), across all branches b
- High values for the gain indicate that the split creates branches that are relatively more homogeneous than the parent.
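
The gain formula above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `gain` and the optional `weights` parameter are assumptions introduced here. With no weights it computes h_parent - sum(h_b) exactly as written above; many formulations instead weight each branch by its share of the parent's elements, which the optional argument allows.

```python
def gain(h_parent, h_branches, weights=None):
    """Return h_parent minus the (optionally weighted) sum of branch values.

    h_parent   -- homogeneity measure of the set before the split
    h_branches -- list of homogeneity measures, one per branch b
    weights    -- hypothetical optional per-branch weights (for example,
                  each branch's fraction of the parent's elements);
                  defaults to 1 for every branch, matching the plain sum
    """
    if weights is None:
        weights = [1.0] * len(h_branches)
    if len(weights) != len(h_branches):
        raise ValueError("weights and h_branches must have the same length")
    return h_parent - sum(w * h for w, h in zip(weights, h_branches))


# Unweighted: a parent value of 0.5 split into two perfectly pure branches.
print(gain(0.5, [0.0, 0.0]))

# Weighted by branch size: each branch holds half of the parent's elements.
print(gain(0.5, [0.5, 0.0], weights=[0.5, 0.5]))
```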
Calculating Homogeneity
- For each label i, calculate the proportion p_i.
- p_i is the probability that a member of the set has label i
- Find the total number of elements with label i.
- Divide by the total number of elements.
- Gini Coefficient (also called Gini impurity): 1 - sum(p_i^2), for all labels i. Lower values mean a more homogeneous set.
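
The steps above can be sketched in Python as follows. This is a minimal illustration under the assumption that the set is given as a plain list of labels; the function name `gini` is chosen here for the example and is not from the notes.

```python
from collections import Counter


def gini(labels):
    """Return 1 - sum(p_i^2) over all labels i in the given list."""
    total = len(labels)
    if total == 0:
        raise ValueError("cannot compute the Gini value of an empty set")
    # Count the elements carrying each label i.
    counts = Counter(labels)
    # p_i = (elements with label i) / (total elements).
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# A set with a single label is perfectly homogeneous.
print(gini(["a", "a", "a", "a"]))

# An even two-way mix of labels.
print(gini(["a", "a", "b", "b"]))
```

A value of 0 means every element shares one label; the value grows as the labels become more evenly mixed.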