Modifying Dynamic Programming
- Alternative transitions out of a state define a probability distribution over next states
- Weight each next state's value by its probability
- Value(s) = Discount * sum over s' of ( P(s' | s, Policy(s)) * Value(s') ) + Reward(s)
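
The weighted update above can be sketched as repeated sweeps over the states until the values stop changing. This is a minimal illustration, not a reference implementation: the function name `evaluate_policy`, the nested-dict layout of `transitions`, the stopping tolerance, and the two-state example are all assumptions made for this sketch.

```python
def evaluate_policy(states, policy, transitions, reward,
                    discount=0.9, tol=1e-9, max_iters=10_000):
    """Repeatedly apply
    Value(s) = Discount * sum(p(s') * Value(s')) + Reward(s)
    for a fixed policy until the largest change falls below tol.

    transitions[s][a] maps each next state s' to its probability
    (hypothetical layout chosen for this sketch).
    """
    value = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_value = {}
        for s in states:
            action = policy[s]
            # Expected value of the next state: weight each s' by its probability.
            expected = sum(p * value[s2]
                           for s2, p in transitions[s][action].items())
            new_value[s] = discount * expected + reward[s]
        delta = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if delta < tol:
            break
    return value


# Hypothetical two-state example: "go" reaches the goal with probability 0.8
# and slips back to the start with probability 0.2.
states = ["start", "goal"]
policy = {"start": "go", "goal": "stay"}
transitions = {
    "start": {"go": {"goal": 0.8, "start": 0.2}},
    "goal": {"stay": {"goal": 1.0}},
}
reward = {"start": -1.0, "goal": 0.0}

values = evaluate_policy(states, policy, transitions, reward)
```

In this example Value(goal) stays at 0, and Value(start) satisfies V = 0.9 * 0.2 * V - 1, so it should converge to -1/0.82 (about -1.22). Note that the sketch needs the full `transitions` table up front, which is exactly the limitation listed next.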
This works, but...
- Still requires a perfect state model
- Requires knowledge of each distribution