Exploration vs. Exploitation 
-  Exploitation
  
  -  From a given state s, select the action a that maximizes Q(s,a)
  
 
-  Exploration
  
  -  For Q-learning to correctly estimate the value Q(s,a) of each state/action pair, it must visit every state/action pair many times
  
-  Consequently, we cannot exploit our current knowledge on every move; some moves must be spent exploring
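
One common way to balance the two is an epsilon-greedy rule, which these notes do not name, so treat it as an illustrative assumption rather than the method intended here. The sketch below assumes the Q-values for the current state are given as a list indexed by action; the names select_action, q_values, and epsilon are hypothetical.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick an action index for one state (epsilon-greedy sketch).

    q_values: list of Q(s, a) estimates for the current state s,
              indexed by action a (an assumed representation).
    epsilon:  probability of exploring on this move, in [0, 1].
    rng:      source of randomness; defaults to the random module.
    """
    # Exploration: with probability epsilon, choose uniformly at random
    # so that every state/action pair keeps being visited.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploitation: otherwise choose the action a that maximizes Q(s, a).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With epsilon = 0 the rule always exploits, and with epsilon = 1 it always explores; values in between trade one off against the other.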
  
 