Probably Approximately Correct

A hypothesis f is Probably Approximately Correct (PAC) with error tolerance ε and confidence 1-δ if and only if Pr(Error(f) > ε) < δ

How many examples are needed?

Depends upon the size of the hypothesis space
- What would this be for decision trees?
Let |H| be the size of the hypothesis space
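
One way to make |H| concrete for the decision-tree question: assuming n Boolean attributes and a Boolean label, decision trees can represent every Boolean function of the attributes, so counting semantically distinct hypotheses gives |H| = 2^(2^n). A minimal sketch under that assumption (the function name is illustrative, not from the notes):

```python
def num_boolean_functions(n_attributes: int) -> int:
    """Count the distinct Boolean functions of n Boolean attributes.

    Assumption: hypotheses are counted by the function they compute,
    not by tree shape, so two trees with identical outputs count once.
    """
    if n_attributes < 0:
        raise ValueError("n_attributes must be non-negative")
    # There are 2**n input rows in the truth table, and each row
    # can independently be labelled 0 or 1.
    return 2 ** (2 ** n_attributes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, num_boolean_functions(n))
```

The count grows doubly exponentially in n, which is why the size of the hypothesis space matters so much for the number of examples needed.
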
If f is a "bad" hypothesis (i.e., Error(f) > ε), then:
- Probability that the "bad" f is consistent with m examples is ≤ (1 - ε)^m
  - In other words, this is the chance that the "bad" f fools us
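  - Probability that some "bad" f in H is consistent with all m examples is ≤ |H| (1-ε)^m (union bound over at most |H| bad hypotheses)
  - We want this probability to be at most δ, the failure probability referred to earlier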
- Let's do some algebra:
  - |H| (1-ε)^m ≤ δ
  - (1-ε)^m ≤ δ / |H|
  - m log(1-ε) ≤ log(δ/|H|)
  - m ≥ log(δ/|H|) / log(1 - ε)   (the inequality flips because log(1 - ε) < 0)
- This result gives a number of training examples (m) that is sufficient to guarantee PAC for any learning algorithm that outputs a hypothesis consistent with the training examples
- This result is optimistic:
  - For a specific learning algorithm, m could well be larger
  - Does not really take noisy data into account
- This result is pessimistic:
  - Training and test examples can be highly similar, or even duplicated, which a worst-case bound does not take advantage of
  - Using a distribution-dependent PAC model gets complicated very quickly
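
The bound m ≥ log(δ/|H|) / log(1 - ε) can be evaluated numerically. The following is a minimal Python sketch, not a definitive implementation: the function names and the example values (|H| = 2^(2^3), ε = 0.1, δ = 0.05) are assumptions chosen for illustration. It also computes the common simplification m ≥ (1/ε)(ln|H| + ln(1/δ)), which follows from -ln(1 - ε) ≥ ε and is therefore slightly more conservative.

```python
import math


def pac_sample_size(h_size: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with h_size * (1 - epsilon)**m <= delta."""
    if h_size < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need h_size >= 1, 0 < epsilon < 1, 0 < delta < 1")
    # log(delta / h_size) is written as a difference of logs so that a
    # very large integer h_size does not have to be converted to a float.
    m = (math.log(delta) - math.log(h_size)) / math.log(1 - epsilon)
    return max(0, math.ceil(m))


def pac_sample_size_simple(h_size: int, epsilon: float, delta: float) -> int:
    """Simplified bound: m >= (1/epsilon) * (ln(h_size) + ln(1/delta))."""
    if h_size < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need h_size >= 1, 0 < epsilon < 1, 0 < delta < 1")
    m = (math.log(h_size) + math.log(1 / delta)) / epsilon
    return max(0, math.ceil(m))


if __name__ == "__main__":
    # Illustrative assumption: all Boolean functions of 3 Boolean attributes.
    h_size = 2 ** (2 ** 3)
    epsilon, delta = 0.1, 0.05
    print("exact bound:     ", pac_sample_size(h_size, epsilon, delta))
    print("simplified bound:", pac_sample_size_simple(h_size, epsilon, delta))
```

Because m depends on |H| only through log |H|, even a very large hypothesis space raises the required number of examples fairly slowly, while halving ε roughly doubles it.
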