Calculating the Gain
- Calculate the homogeneity measure for each branch $b$ of the split ($h_b$), as well as for the set prior to the split ($h_{\text{parent}}$).
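- The gain is $h_{\text{parent}} - \sum_b \frac{n_b}{n} h_b$, where $n_b$ is the number of elements in branch $b$ and $n$ is the number of elements in the parent set. Each branch is weighted by its share of the elements; an unweighted sum would grow with the number of branches regardless of how good the split is.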
- Here $h$ is an impurity score (lower means more homogeneous), so a high gain indicates that the split creates branches that are, on average, more homogeneous than the parent.
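The gain calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses Gini impurity as $h$, and it weights each branch's impurity by the fraction of the parent's elements that branch receives (the usual convention). The function names `gini` and `gini_gain` are hypothetical, chosen for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels: 1 - sum of squared label proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty set is treated as perfectly homogeneous
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(parent, branches):
    """Parent impurity minus the size-weighted impurity of the branches.

    Assumes every element of `parent` lands in exactly one branch.
    """
    n = len(parent)
    weighted = sum(len(b) / n * gini(b) for b in branches)
    return gini(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(gini_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 0.5
print(gini_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

The first split separates the labels perfectly, so the full parent impurity of 0.5 is recovered as gain; the second split leaves each branch as mixed as the parent, so the gain is zero.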
Calculating Homogeneity
- For each label $i$, calculate the proportion $p_i$.
- $p_i$ is the probability that a randomly chosen member of the set has label $i$.
- Find the total number of elements with label $i$.
- Divide by the total number of elements.
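- Gini impurity: $1 - \sum_i p_i^2$, summed over all labels $i$. (This is also called the Gini index; it is distinct from the Gini coefficient used to measure inequality.)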
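The steps above (count each label, divide by the set size, then apply $1 - \sum_i p_i^2$) can be sketched as follows. This is an illustrative sketch, not a reference implementation; `label_proportions` and `gini_impurity` are hypothetical names introduced here.

```python
from collections import Counter

def label_proportions(labels):
    """Return {label: p_i}, where p_i = (# elements with label i) / (total # elements)."""
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}

def gini_impurity(labels):
    """1 - sum(p_i^2) over all labels i; 0 means the set is perfectly homogeneous."""
    if not labels:
        return 0.0  # convention for an empty set
    return 1.0 - sum(p ** 2 for p in label_proportions(labels).values())

labels = ["a", "a", "a", "b"]
print(label_proportions(labels))  # {'a': 0.75, 'b': 0.25}
print(gini_impurity(labels))      # 0.375
```

For this set, $1 - (0.75^2 + 0.25^2) = 1 - 0.625 = 0.375$. A set containing a single label gives an impurity of 0.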