Probably Approximately Correct
- A hypothesis f is Probably Approximately Correct (PAC) if, with probability at least 1 - δ, its error is at most ε; equivalently, Pr(Error(f) > ε) < δ
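As a rough illustration of the definition, the sketch below estimates Pr(Error(f) > ε) by simulation. The toy problem is not part of these notes: it learns a threshold on [0, 1] from m uniformly drawn examples. The learner, the names learn_threshold and failure_rate, and all parameter values are assumptions made for the example.

```python
import random


def learn_threshold(sample, true_t):
    """Hypothetical consistent learner: return the smallest positive example.

    The target concept labels x positive iff x >= true_t. If the sample has no
    positive example, fall back to 1.0 (predict negative almost everywhere).
    """
    positives = [x for x in sample if x >= true_t]
    return min(positives) if positives else 1.0


def failure_rate(m, epsilon, trials=2000, true_t=0.5, seed=0):
    """Estimate Pr(Error(f) > epsilon) over repeated draws of m examples."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(m)]
        h = learn_threshold(sample, true_t)
        # Under the uniform distribution, the learned threshold h errs exactly
        # on the interval [true_t, h), which has probability mass h - true_t.
        error = h - true_t
        if error > epsilon:
            failures += 1
    return failures / trials


# Illustrative settings only: epsilon = 0.1 and three sample sizes.
for m in (10, 50, 100):
    print(m, failure_rate(m, epsilon=0.1))
```

If the estimated rate falls below the chosen δ, this toy learner is PAC at that sample size. The estimate should shrink quickly as m grows.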
How many examples are needed?
- Depends upon the size of the hypothesis space
- What would this be for decision trees?
- Let |H| be the size of the hypothesis space
- If f is a "bad" hypothesis (i.e., Error(f) > ε), then:
- Probability that the "bad" f is consistent with m independently drawn examples is ≤ (1 - ε)^m
- In other words, this is the chance that the "bad" f fools us
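- Probability that at least one "bad" f in H is consistent with all m examples is ≤ |H| (1 - ε)^m (union bound over at most |H| bad hypotheses)
- We want this probability to be at most δ, the failure probability in the PAC definition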
- Let's do some algebra:
- |H| (1 - ε)^m ≤ δ
- (1 - ε)^m ≤ δ / |H|
- m ≥ log(δ / |H|) / log(1 - ε) (the inequality flips because log(1 - ε) < 0)
- This result bounds the number of training examples (m) from below: with at least this many examples, any learning algorithm that returns a hypothesis consistent with the training data is PAC
- This result is optimistic:
- For a specific learning algorithm, m could well be larger
- Does not really take noisy data into account
- This result is pessimistic:
- Training and test examples can be highly similar, or even duplicated
- Using a distribution-dependent PAC model gets complicated very quickly
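The bound m ≥ log(δ / |H|) / log(1 - ε) derived above can be evaluated numerically. The following is a minimal sketch, not a definitive implementation: the function name pac_sample_bound and the example values of ε, δ and |H| are assumptions chosen for illustration. It takes ln|H| rather than |H| so that very large hypothesis spaces do not overflow. For the decision-tree question, it assumes H is the set of all Boolean functions of n binary attributes, so |H| = 2^(2^n).

```python
import math


def pac_sample_bound(log_h, epsilon, delta):
    """Smallest integer m with |H| * (1 - epsilon)**m <= delta.

    log_h is the natural log of |H|; passing the log avoids overflow when
    the hypothesis space is astronomically large.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    if log_h < 0.0:
        raise ValueError("log_h must be >= 0 (|H| >= 1)")
    # log(delta / |H|) = log(delta) - log|H|; dividing by the negative
    # log(1 - epsilon) yields a positive number of examples.
    m = (math.log(delta) - log_h) / math.log(1.0 - epsilon)
    return max(0, math.ceil(m))


# Illustrative values only: a small finite hypothesis space.
m_small = pac_sample_bound(math.log(1000), epsilon=0.1, delta=0.05)

# Assumption: H is every Boolean function of n binary attributes,
# so |H| = 2**(2**n) and ln|H| = 2**n * ln(2).
n = 10
m_trees = pac_sample_bound((2 ** n) * math.log(2), epsilon=0.1, delta=0.05)

print(m_small, m_trees)
```

Because ln|H| grows as 2^n under this assumption, the required m grows exponentially with the number of attributes, which is one way to read the "pessimistic" remarks above.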