Random Exploration
Let ε be the probability of exploration
Start ε at a high value
Decrease ε as time progresses
If a uniform random value in [0, 1) is less than ε
Pick an action at random
The random pick may be uniform or weighted by Q-values
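The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names decayed_epsilon and choose_action are hypothetical, the exponential decay schedule and its constants are one arbitrary choice among many, the softmax weighting is only one possible way to weight by Q-values, and falling back to the highest-valued action when not exploring follows the usual ε-greedy convention rather than anything stated on this slide.

```python
import math
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Start epsilon high and shrink it as time progresses.

    Assumption: exponential decay with a floor; the schedule and the
    constants are illustrative only.
    """
    return max(eps_min, eps_start * decay ** step)


def choose_action(q_values, epsilon, rng=random, weighted=False, temperature=1.0):
    """Return an action index for one state, given its list of Q-values."""
    if rng.random() < epsilon:
        # Explore: pick an action at random.
        if weighted:
            # Assumption: softmax weighting, so higher Q-values are more likely.
            # Subtracting the max keeps exp() numerically stable.
            top = max(q_values)
            weights = [math.exp((q - top) / temperature) for q in q_values]
            return rng.choices(range(len(q_values)), weights=weights, k=1)[0]
        return rng.randrange(len(q_values))
    # Exploit (assumed convention): pick the action with the highest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    for step in (0, 50, 500):
        eps = decayed_epsilon(step)
        print(step, round(eps, 3), choose_action(q, eps))
```

With ε near 1 almost every choice is exploratory; as ε approaches its floor the choice is almost always the highest-valued action.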