Dynamic Programming
- For each state:
- Calculate value of executing policy for one time step
- Given the reward after n time steps, compute reward for n + 1 time steps
- Naive update:
- For each state s:
- Value(s) = Value(Policy(s)) + Reward(s)
- What could possibly go wrong?
(next)
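
A minimal sketch of one sweep of the naive update, for experimenting with the question above. It assumes deterministic transitions, so that Policy(s) can be read as the state reached by following the policy from s. The three-state chain and the names `naive_update`, `policy`, `reward`, and `value` are hypothetical, not from the slides.

```python
def naive_update(value, policy, reward):
    """One sweep of Value(s) = Value(Policy(s)) + Reward(s) over all states.

    Assumption: policy[s] is the successor state reached from s
    (deterministic transitions), not an abstract action.
    """
    # Build a fresh table so every state reads the n-step values
    # while the (n + 1)-step values are being written.
    return {s: value[policy[s]] + reward[s] for s in value}


# Hypothetical chain: A -> B -> C, and C loops back to itself.
policy = {"A": "B", "B": "C", "C": "C"}
reward = {"A": 0.0, "B": 1.0, "C": 0.0}

value = {s: 0.0 for s in policy}  # value after 0 time steps
for n in range(3):
    value = naive_update(value, policy, reward)
```

Try changing `reward["C"]` to a nonzero number and running more sweeps to see how the values behave.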