CSCI 235 - Intelligent Robotics
Spring 2017
Project #4: Q-Learning
Learning Actions
For this assignment, we will use a library that implements the Q-Learning
algorithm. To use Q-Learning, we specify the following:
- Sensors and condition flags, just as we have done with mode selection.
- A reward function that returns a floating-point value corresponding to
a reward gained for achieving a particular condition.
- A half-life that determines how aggressively the algorithm updates its
learned behavior.
- The target number of visits to a state-action combination before its
learned values are exploited.
- The discount rate (between 0 and 1) that controls the importance of
future rewards.
- The actions that are available to the robot.
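The course library's actual API is not shown here, but the parameters above map directly onto the standard tabular Q-learning algorithm. As an illustration, a minimal sketch in Python (class and method names are my own, not the library's) might look like:

```python
import random
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learner (illustrative sketch; the course
    library's API and parameter names may differ)."""

    def __init__(self, actions, discount=0.9, target_visits=10, epsilon=0.1):
        self.actions = actions
        self.discount = discount            # importance of future rewards
        self.target_visits = target_visits  # explore until this many visits
        self.epsilon = epsilon
        self.q = defaultdict(float)         # (state, action) -> learned value
        self.visits = defaultdict(int)      # (state, action) -> visit count

    def choose(self, state):
        # Prefer exploring under-visited actions; otherwise exploit
        # the action with the highest learned value.
        untried = [a for a in self.actions
                   if self.visits[(state, a)] < self.target_visits]
        if untried and random.random() < self.epsilon:
            return random.choice(untried)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        self.visits[(state, action)] += 1
        # The learning rate shrinks with visit count, so updates become
        # less aggressive over time; the library's "half-life" parameter
        # plays a similar role in controlling update aggressiveness.
        alpha = 1.0 / self.visits[(state, action)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.discount * best_next
        self.q[(state, action)] += alpha * (target - self.q[(state, action)])
```

Each `update` call nudges the stored value for a state-action pair toward the observed reward plus the discounted value of the best follow-up action.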
Library
The library files are in modeselection.zip, which also includes all of the library files from the previous assignments.
Sample Program
The sample program is in proj4.zip.
Assignment
Implement three programs that use Q-Learning. The first program should learn
the following behavior:
- When no obstacles are present, drive forward.
- When obstacles are present, turn away from them.
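One way to encode this behavior is through the reward function: reward driving forward when the way is clear, and reward turning when an obstacle is detected. A sketch (the condition flag and action names below are hypothetical, not the library's actual identifiers):

```python
def reward(obstacle_seen, action):
    """Hypothetical reward function for the obstacle-avoidance task.
    obstacle_seen would come from the sensor condition flags; the
    action names are assumptions for illustration."""
    if not obstacle_seen and action == "forward":
        return 1.0   # reward driving forward in open space
    if obstacle_seen and action.startswith("turn"):
        return 0.5   # reward turning away from an obstacle
    return -1.0      # penalize all other state-action combinations
```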
The other two programs can learn any behavior you would like. At least one
of them should make use of color vision. I strongly encourage you to try to
duplicate behaviors that you implemented in previous projects.
Questions
- For each program, devise a metric for its performance on its task. How
well did each program perform? Feel free to experiment with different
parameter settings to optimize performance. Be sure to discuss the most useful
parameter settings in your report and presentation.
- How much effort does it take to configure Q-Learning compared with
manually defining the mode transitions?
- Overall, what did you find useful or interesting about Q-Learning?