Exploration vs. Exploitation

Exploitation
- From a given state s, select the action a that maximizes the current estimate Q(s,a)
Exploration
- For Q-learning to correctly estimate the expected long-term reward Q(s,a) of each state/action pair, it must try every state/action pair many times
- Consequently, we cannot exploit our current knowledge on every move; some moves must try actions that do not currently look best
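
One common way to mix the two behaviours is an epsilon-greedy rule: with a small probability epsilon pick a random action (explore), otherwise pick the action with the highest Q(s,a) (exploit). The slide does not name a specific strategy, so the sketch below is only an illustration; the dictionary-based Q-table, the function names, the state/action labels and the value epsilon=0.1 are all assumptions made for the example.

```python
import random

def greedy_action(Q, s, actions):
    """Exploitation: return the action with the highest estimated Q(s, a).

    Q is assumed to be a dict keyed by (state, action); unseen pairs count as 0.0.
    """
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def epsilon_greedy_action(Q, s, actions, epsilon, rng=random):
    """With probability epsilon explore (uniform random action), else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)       # exploration
    return greedy_action(Q, s, actions)  # exploitation

# Hypothetical Q-table for one state with two actions.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
actions = ["left", "right"]

rng = random.Random(0)  # seeded so the run is repeatable
counts = {a: 0 for a in actions}
for _ in range(1000):
    counts[epsilon_greedy_action(Q, "s0", actions, epsilon=0.1, rng=rng)] += 1
print(counts)  # "right" dominates, but "left" is still tried occasionally
```

With epsilon = 0 the rule reduces to pure exploitation, and with epsilon = 1 it is pure exploration; values in between trade one for the other.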