Deterministic Exploration

Maintain a count of visits for each action in each state
Pick a target for the number of visits
If all actions have met the target, pick the action with the highest Q-value
Otherwise, pick the action with the lowest count