Deterministic Exploration
- Maintain a count of visits for each action in each state
- Pick a target for the number of visits
- If all actions have met the target, pick the action with the highest
Q-value
- Otherwise, pick the action with the lowest count
(next)