Assessing TD-Learning
- Emulates the results of dynamic programming without a model of state transitions
  - Admittedly, it must visit every state many times
- On-line vs. off-line learning
  - A TD-learner can be embedded in an environment and learn from experience as it arrives
  - A dynamic programming learner cannot: it needs the full model in advance
- Where is the Markov chain?
- How do we learn a policy? (next)
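
The comparison above can be made concrete with a minimal sketch. It is not from the lecture: it assumes a five-state random walk with a reward of 1 for exiting on the right, and the names `dp_values`, `td0_values`, and `step` are hypothetical. The dynamic programming routine reads the transition probabilities directly, while the TD(0) learner only sees sampled transitions from `step`, which is one way to picture the Markov chain living inside the environment rather than inside the learner.

```python
import random

N = 5        # non-terminal states 1..N; states 0 and N+1 are terminal (assumed toy problem)
GAMMA = 1.0  # undiscounted, episodic task


def dp_values(sweeps=1000):
    """Iterative policy evaluation. Needs the model: each move is left or right with p = 0.5."""
    v = [0.0] * (N + 2)  # terminal values stay at 0
    for _ in range(sweeps):
        for s in range(1, N + 1):
            total = 0.0
            for s2 in (s - 1, s + 1):
                r = 1.0 if s2 == N + 1 else 0.0
                total += 0.5 * (r + GAMMA * v[s2])
            v[s] = total
    return v[1:N + 1]


def step(s, rng):
    """The environment. The TD learner never inspects these probabilities, only the outcomes."""
    s2 = s + rng.choice((-1, 1))
    r = 1.0 if s2 == N + 1 else 0.0
    return s2, r


def td0_values(episodes=20000, alpha=0.01, seed=0):
    """TD(0) value estimation, updating on-line after every sampled transition."""
    rng = random.Random(seed)
    v = [0.0] * (N + 2)
    for _ in range(episodes):
        s = (N + 1) // 2  # start each episode in the middle state
        while 0 < s < N + 1:
            s2, r = step(s, rng)
            v[s] += alpha * (r + GAMMA * v[s2] - v[s])  # move v[s] toward the TD target
            s = s2
    return v[1:N + 1]


if __name__ == "__main__":
    for s, (d, t) in enumerate(zip(dp_values(), td0_values()), start=1):
        print(f"state {s}: DP = {d:.3f}  TD(0) = {t:.3f}")
```

Under these assumptions the two columns should roughly agree, but only because the TD learner visits each state thousands of times. The step size `alpha` and the episode count are illustrative choices, not tuned values.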