CSCI 235 - Intelligent Robotics
Spring 2017
Project #4: Q-Learning
Learning Actions
For this assignment, we will use a library that implements the Q-Learning
algorithm. To use Q-Learning, we specify the following:
- Sensors and condition flags, just as we have done with mode selection.
- A reward function that returns a floating-point value corresponding to
a reward gained for achieving a particular condition.
- A half-life that determines how aggressively the algorithm updates its
learned behavior.
- The target number of visits to a state-action combination before its
learned values are exploited.
- The discount rate (between 0 and 1) that controls the importance of
future rewards.
- The actions that are available to the robot.
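The course library's actual API is not shown here, but the parameters above map directly onto the standard tabular Q-learning algorithm. As an illustration, a minimal sketch in Python (class and method names are my own, not the library's) might look like:

```python
import random
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learner (illustrative sketch; the course
    library's API and parameter names may differ)."""

    def __init__(self, actions, discount=0.9, target_visits=10, epsilon=0.1):
        self.actions = actions
        self.discount = discount            # importance of future rewards
        self.target_visits = target_visits  # explore until this many visits
        self.epsilon = epsilon
        self.q = defaultdict(float)         # (state, action) -> learned value
        self.visits = defaultdict(int)      # (state, action) -> visit count

    def choose(self, state):
        # Prefer exploring under-visited actions; otherwise exploit
        # the action with the highest learned value.
        untried = [a for a in self.actions
                   if self.visits[(state, a)] < self.target_visits]
        if untried and random.random() < self.epsilon:
            return random.choice(untried)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        self.visits[(state, action)] += 1
        # The learning rate shrinks with visit count, so updates become
        # less aggressive over time; the library's "half-life" parameter
        # plays a similar role in controlling update aggressiveness.
        alpha = 1.0 / self.visits[(state, action)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.discount * best_next
        self.q[(state, action)] += alpha * (target - self.q[(state, action)])
```

Each `update` call nudges the stored value for a state-action pair toward the observed reward plus the discounted value of the best follow-up action.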
Library
The library files are in modeselection.zip, which also includes all of the library files from the previous assignments.
Sample Program
The sample program is in proj4.zip.
Assignment
Implement three programs that use Q-Learning. The first program should learn
the following behavior:
- When no obstacles are present, drive forward.
- When obstacles are present, turn away from them.
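One way to encode this behavior is through the reward function: reward driving forward when the way is clear, and reward turning when an obstacle is detected. A sketch (the condition flag and action names below are hypothetical, not the library's actual identifiers):

```python
def reward(obstacle_seen, action):
    """Hypothetical reward function for the obstacle-avoidance task.
    obstacle_seen would come from the sensor condition flags; the
    action names are assumptions for illustration."""
    if not obstacle_seen and action == "forward":
        return 1.0   # reward driving forward in open space
    if obstacle_seen and action.startswith("turn"):
        return 0.5   # reward turning away from an obstacle
    return -1.0      # penalize all other state-action combinations
```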
The other two programs can learn any behavior you would like. At least one
of them should make use of color vision. I strongly encourage you to try to
duplicate behaviors that you implemented in previous projects.
Questions
- For each program, devise a metric for its performance on its task. How
well did each program perform? Feel free to experiment with different
parameter settings to optimize performance. Be sure to discuss the most useful
parameter settings in your report and presentation.
- How much effort does it take to configure Q-Learning compared with
manually defining the mode transitions?
- Overall, what did you find useful or interesting about Q-Learning?