Learning State Values
- Assume perfect state-action model
- Assume fixed policy
- Given a policy and a state, select an action
- Action should maximize "value"
- This is how a policy is assessed
- How do we determine this? The Reward Function!
- Resuming our example
- Reward for not bumping: 1
- Reward for bumping: 0
(next)