CSCI 335 - Programming Project #9

Fall 2011

Due: Tuesday, November 15, beginning of class

Apply Q-learning to automatically learn the six behaviors for which you implemented controllers in the previous assignment:
- Object avoidance
- Directed motion
- Wandering
- Wall-following
- Pursuit
- Light-finding
For each task:
- Use any combination of sensors you would like; feel free to wire the sensors differently for each task.
- Determine how the sensors will be encoded into states. This is a trade-off: a more precise representation of the state space increases both learning time and memory consumption.
- Define the reward function in terms of sensor values. This reward represents a quantitative measurement of the robot's performance at the current time-step.
- Run your original solution, but keep track of the reward it accumulates and the number of times the reward total is updated. From this you can compute the average reward per step. Run it at least three times.
- Develop a program that moves completely at random. It should accumulate rewards according to your reward function for this task. Again run it three times and compute the average reward per step.
- Run your Q-learning solution at least three times, again recording the total reward accumulated, the total steps run, and the average reward per step.
- If the Q-learning solution fails to learn the target behavior to a satisfactory degree, experiment with the reward function and state encodings to produce a superior learner.
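The per-task steps above (state encoding, reward function, and average reward per step) can be sketched as follows. This is only an illustration in Python and would need to be ported to Lua to run on the robot. The sensor names (`sonar_cm`, `bumped`), the bin count, the reward values, and the action list are all hypothetical choices for an object-avoidance task, not values prescribed by the assignment.

```python
import random

# Hypothetical action set; substitute whatever moves your robot supports.
ACTIONS = ["forward", "backward", "left", "right"]

def encode_state(sonar_cm, bumped, num_bins=4, max_cm=100):
    """Map raw sensor values to a small integer state index.

    Assumes a sonar distance in cm and a boolean touch sensor.
    More bins give a more precise state space but a larger Q-table.
    """
    clipped = min(max(sonar_cm, 0), max_cm - 1)
    distance_bin = int(clipped * num_bins // max_cm)  # 0 .. num_bins - 1
    return distance_bin * 2 + (1 if bumped else 0)

def reward(sonar_cm, bumped):
    """Illustrative reward: penalize collisions, reward open space."""
    if bumped:
        return -10.0
    return 1.0 if sonar_cm > 30 else 0.0

class RewardTracker:
    """Accumulates total reward and step count for one run."""
    def __init__(self):
        self.total = 0.0
        self.steps = 0

    def update(self, r):
        self.total += r
        self.steps += 1

    def average(self):
        # Average reward per step; 0.0 if no steps were recorded.
        return self.total / self.steps if self.steps else 0.0

def random_policy(state):
    """Baseline controller: ignores the state and picks any action."""
    return random.choice(ACTIONS)
```

The same `reward` function and `RewardTracker` would be used for the original controller, the random controller, and the Q-learner, so that the three averages are directly comparable.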
Before running the full suite of tasks, calibrate your parameters using one of the tasks. You will need to determine ideal values for:
- The discount rate γ
- The annealing schedule for α and ε
- The amount of time for the robot to run before stopping
Experiment with several values for each of these parameters using your first task until you settle on appropriate values. You will need to document your reasoning in your report.
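To show where the parameters listed above enter the algorithm, here is a minimal tabular Q-learning sketch in Python (again an illustration to be ported to Lua, not a required design). The class name `QLearner`, the default values, and the annealing form `x0 / (1 + decay * n)` are assumptions; choosing the actual schedule and values is part of the calibration.

```python
import random
from collections import defaultdict

class QLearner:
    def __init__(self, actions, gamma=0.9, alpha0=0.5, epsilon0=0.5, decay=0.01):
        self.actions = list(actions)
        self.gamma = gamma            # discount rate
        self.alpha0 = alpha0          # initial learning rate
        self.epsilon0 = epsilon0      # initial exploration probability
        self.decay = decay            # assumed annealing speed
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.visits = defaultdict(int)
        self.steps = 0

    def alpha(self, state, action):
        # Learning rate shrinks with visits to this state-action pair.
        return self.alpha0 / (1.0 + self.decay * self.visits[(state, action)])

    def epsilon(self):
        # Exploration probability shrinks with the total number of steps.
        return self.epsilon0 / (1.0 + self.decay * self.steps)

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        self.steps += 1
        if random.random() < self.epsilon():
            return random.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, r, next_state):
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = r + self.gamma * best_next
        rate = self.alpha(state, action)
        self.q[(state, action)] += rate * (target - self.q[(state, action)])
        self.visits[(state, action)] += 1
```

In a control loop, each time-step would read the sensors, encode the state, call `update` with the previous state, action, and reward, and then call `choose` for the next action. The run length is simply the number of such steps before stopping.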
Write a report detailing your results. Pay particular attention to the degree to which Q-learning succeeded in learning the behaviors you had previously implemented. Also discuss the evidence that the learned behavior represents an improvement over random action selection.
Some pertinent examples of Lua code
Submit using Sauron