CSCI 335 - Artificial Intelligence
Fall 2017
Programming Project #8: Handwriting Recognition with Decision Trees and Random Forests
Overview
You will implement the Decision Tree learning algorithm for the task of recognizing handwritten characters.
Programming Assignment
Download decisiontree.zip. Implement the following. Each implementation will extend the
RecognizerAI class.
- Decision Tree Learning
- Random forests
- An ensemble of n decision trees.
- Each tree is trained with a random subsample of the training inputs.
- If there are p pixels in a drawing, each split considers only sqrt(p) of those
pixels, again selected at random. (Note that since p is the drawing's width times its
height, for a square drawing the width or height can serve as sqrt(p).)
- Provided files (* for files you must modify):
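The split-selection and sampling rules above can be sketched as follows. This is a minimal Python sketch, not the project's actual API: the function names, the (features, label) example format, and the binary-pixel assumption are all mine.

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, pixel):
    """Gain from splitting on whether a (binary) pixel is set.
    Each example is a (features, label) pair."""
    labels = [y for _, y in examples]
    on = [y for x, y in examples if x[pixel]]
    off = [y for x, y in examples if not x[pixel]]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (on, off) if part)
    return entropy(labels) - remainder

def best_split(examples, num_pixels, forest_mode=False):
    """Pixel with the highest information gain. In forest mode, only
    sqrt(num_pixels) randomly chosen pixels are candidates."""
    candidates = list(range(num_pixels))
    if forest_mode:
        k = max(1, int(math.sqrt(num_pixels)))
        candidates = random.sample(candidates, k)
    return max(candidates, key=lambda p: information_gain(examples, p))

def bootstrap(examples):
    """Random subsample of the training inputs, drawn with replacement,
    used to train one tree of the forest."""
    return [random.choice(examples) for _ in examples]
```

Each tree of a forest would then be grown on `bootstrap(training_set)`, calling `best_split(..., forest_mode=True)` at every node; a plain decision tree uses the full training set with `forest_mode=False`.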
Assessing performance
- Build a decision tree using the first data file as the training set, and test
its performance on the second file. Then swap the roles: build a new tree trained
on the second file and test it on the first. How well does each tree perform on its test set?
- Once you can build a tree that distinguishes two letters, expand your
training and test sets to train it to distinguish three letters.
Continue iterating this process until you can build a tree that can
distinguish at least eight different letters.
- Feel free to perform additional experiments to clarify any issues that may arise.
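The two-way train/test swap above can be sketched like this. It is a self-contained Python sketch: MajorityClassifier is a hypothetical baseline standing in for the real tree, and the tiny in-line data sets stand in for the actual letter files.

```python
from collections import Counter

class MajorityClassifier:
    """Baseline stand-in for a real tree: always predicts the most
    common label seen during training."""
    def train(self, examples):
        self.label = Counter(y for _, y in examples).most_common(1)[0][0]
    def classify(self, features):
        return self.label

def accuracy(model, test_set):
    """Fraction of test drawings whose predicted label is correct."""
    return sum(model.classify(x) == y for x, y in test_set) / len(test_set)

# Toy stand-ins for the two data files.
set_a = [([1, 0], 'A'), ([1, 1], 'A'), ([0, 0], 'B')]
set_b = [([0, 1], 'B'), ([0, 0], 'B'), ([1, 0], 'A')]

# Train on one set, test on the other, then swap.
for train, test in ((set_a, set_b), (set_b, set_a)):
    model = MajorityClassifier()
    model.train(train)
    print(f"accuracy: {accuracy(model, test):.3f}")
```

Replacing MajorityClassifier with your tree (or forest) gives a quick check of whether the learned function generalizes beyond its own training file.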
Visualization
A visualization is provided for both the regular Decision Tree and the Random Forest. Be sure
to employ the visualizations in your analysis.
Paper
When you are finished with your experiments, write a paper summarizing your findings. Include the following:
- An analysis and discussion of your data. (Be sure to include the data as well.)
- An analysis and discussion of the visualizations of the learned functions.
You are strongly encouraged to include images of the visualizations in support of your analysis.
- How well does decision tree learning perform for this task?
- Are random forests worth the trouble? Why or why not?
Deadlines
- Progress report: Thursday, November 9:
- Discuss initial results using a single decision tree and two letters.
- Project complete: Tuesday, November 14:
- Discuss results with up to 8 letters, for both a single tree and a random forest.