CSCI 335 - Programming Project #6

Fall 2011

Due: Tuesday, October 11, beginning of class

You may continue to work in teams of two for this assignment.

Download a modified drawing program. I recommend putting the files in a different project from the previous assignments, as they are similar but not the same.

Complete the implementation of handwriting.kohonen.SelfOrgMap.java. It should implement a Kohonen Self-Organizing Map.
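
For orientation, the core of a Kohonen map can be sketched as below. This is a minimal winner-take-all sketch under stated assumptions, not the SelfOrgMap skeleton from the download: the class name TinySom, its method names, the seeded random initialization, and the fixed learning rate are all hypothetical, and the provided class may organize its nodes differently.

```java
import java.util.Random;

/** Hypothetical stand-alone sketch; not the SelfOrgMap class from the download. */
class TinySom {
    private final double[][] weights; // weights[node][input]

    TinySom(int numNodes, int numInputs, long seed) {
        Random rng = new Random(seed);
        weights = new double[numNodes][numInputs];
        for (double[] row : weights) {
            for (int i = 0; i < row.length; i++) {
                row[i] = rng.nextDouble(); // random start in [0, 1)
            }
        }
    }

    /** Squared Euclidean distance between one node's weights and the input. */
    double distance(int node, double[] input) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double diff = weights[node][i] - input[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the node whose weights are closest to the input. */
    int bestFor(double[] input) {
        int best = 0;
        for (int node = 1; node < weights.length; node++) {
            if (distance(node, input) < distance(best, input)) {
                best = node;
            }
        }
        return best;
    }

    /** Moves only the winning node toward the input (no neighborhood yet). */
    void train(double[] input, double rate) {
        int winner = bestFor(input);
        for (int i = 0; i < input.length; i++) {
            weights[winner][i] += rate * (input[i] - weights[winner][i]);
        }
    }
}
```

Each call to train pulls the winning node's weights part of the way toward the example, so repeated passes over the training set make different nodes specialize in different inputs.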

Implement the static method SOMClassifier.inputsFor(). This method will encode Drawing objects as inputs to a SelfOrgMap object.
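
One plausible encoding, assuming a drawing can be read as a grid of on/off pixels, is to flatten the grid into one input per pixel. The sketch below uses a boolean[][] as a stand-in for a Drawing object, since the Drawing API is not described here; the class PixelEncoder and its encode method are hypothetical names, not part of the provided code.

```java
/** Hypothetical helper; the boolean grid stands in for a Drawing object. */
class PixelEncoder {
    /** Flattens a pixel grid row by row: 1.0 for a set pixel, 0.0 otherwise. */
    static double[] encode(boolean[][] pixels) {
        int height = pixels.length;
        int width = height == 0 ? 0 : pixels[0].length;
        double[] inputs = new double[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                inputs[y * width + x] = pixels[y][x] ? 1.0 : 0.0;
            }
        }
        return inputs;
    }
}
```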

Implement a class that extends the abstract class SOMClassifier, overriding all of the abstract methods. To do this, you will need to develop an algorithm that generates a label from the output of a self-organizing map. The self-organizing map should be trained by your constructor.
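
One possible labeling algorithm, offered only as a sketch, is a majority vote: after training, record which output node wins for each labeled training example, then label a new drawing with the label most often seen at its winning node. The class NodeLabeler and its methods are hypothetical and do not reflect the abstract methods of SOMClassifier; you are free to design something different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers which labels each output node won during training. */
class NodeLabeler {
    private final Map<Integer, Map<String, Integer>> votes = new HashMap<>();

    /** Records that this node was the winner for a training example with this label. */
    void record(int winnerNode, String label) {
        votes.computeIfAbsent(winnerNode, k -> new HashMap<>())
             .merge(label, 1, Integer::sum);
    }

    /** Returns the label most often seen at this node, or null if it never won. */
    String labelFor(int winnerNode) {
        Map<String, Integer> counts = votes.get(winnerNode);
        if (counts == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Note that a node that never wins during training has no label under this scheme, which is one of the cases a second algorithm might handle differently.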

Try out your implementation as follows:

Modify the SomCreator inner class in DrawingEditor to create an instance of your SOMClassifier extension.
Use the Self-Organizing Map menu as follows:
- Create creates a new map.
- Apply classifies the currently visible drawing.
- View inverter creates a window containing a square for every output node in the current map. Run the mouse over the window to see the inputs that each map output node expects.
- View outputs creates (or refreshes) a window that shows the intensity of activation for every output node, relative to the currently visible drawing.
Test the performance of your classifier using the same data sets you created in Homework 5. Try the following variations:
- Number of iterations through the training set
  - Be sure to try one and two iterations first, as a baseline.
  - Then try at least three larger numbers of iterations to determine the relationship between iterations and performance.
- Number of output nodes
  - Try at least three different numbers of output nodes.
  - Again, start with something that may seem ridiculously small as a baseline, before trying larger values.
- Whether the training set is randomized
  - Try presenting the training set on each iteration in grouped order, so that all examples with the same label run consecutively.
  - Then, try running the training set in a completely randomized sequence, mixing up the groups.
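
The two orderings in the last variation can be sketched as a choice of index order over the training examples. The helper below is hypothetical (the name TrainingOrder and its signature are assumptions, not part of the provided code); it assumes the examples are stored grouped by label, so the unshuffled order is the consecutive-groups case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for choosing the presentation order of training examples. */
class TrainingOrder {
    /** Returns indices 0..n-1, shuffled when randomize is true, in stored order otherwise. */
    static List<Integer> indices(int n, boolean randomize, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        if (randomize) {
            Collections.shuffle(order, rng); // a fresh shuffle on every call
        }
        return order;
    }
}
```

Calling this once per iteration gives a new random sequence each pass; passing a Random with a fixed seed keeps an experiment repeatable.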

Develop a second algorithm for determining a classification from the output of the self-organizing map. Ideally, this will incorporate lessons you have learned from your first set of experiments. Run all of your previous tests from the first set of experiments using this classifier.

Choose ONE of the following variations as an additional alternative. You may do both for extra credit:

Implement a second self-organizing map that, when updating the weights during training, reduces the weight update according to the distance from the output node being modified to the winning output node. Use a Gaussian as the neighborhood function.
Implement dilation by fixing the method Drawing.makeDilated() to return a new Drawing object that is a dilated version of the method invocation target. Re-run all of your experiments to use dilated training and test sets. (You can use DilatedSampleData.parseDataFrom() to create these sets.)
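
Both variations rest on a small computation that can be sketched in isolation. The helpers below are hypothetical and are not part of the provided code. The Gaussian takes an assumed width parameter sigma, and its result would multiply the learning rate for each node. The dilation sketch works on a boolean[][] stand-in for a Drawing and assumes a 3x3 neighborhood; if the course defines dilation with a different neighborhood, use that definition instead.

```java
/** Hypothetical helpers; neither is part of the provided code. */
class Variations {
    /**
     * Gaussian neighborhood factor: 1.0 at the winner, falling toward 0 with distance.
     * The width sigma is an assumed tuning parameter.
     */
    static double gaussian(double distanceToWinner, double sigma) {
        return Math.exp(-(distanceToWinner * distanceToWinner) / (2.0 * sigma * sigma));
    }

    /** Dilation on a pixel grid: a pixel is set if it or any of its 8 neighbors was set. */
    static boolean[][] dilate(boolean[][] pixels) {
        int h = pixels.length;
        int w = h == 0 ? 0 : pixels[0].length;
        boolean[][] out = new boolean[h][w]; // new grid; the input is left unchanged
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!pixels[y][x]) {
                    continue;
                }
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy;
                        int nx = x + dx;
                        if (ny >= 0 && ny < h && nx >= 0 && nx < w) {
                            out[ny][nx] = true;
                        }
                    }
                }
            }
        }
        return out;
    }
}
```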

Helpful hints:

Begin by determining an ideal number of iterations:
- With your first classifier, run the 8-character set with five different numbers of iterations
- Select one of these values for the remaining experiments
Next, test randomization:
- Again using your first classifier and the 8-character set, test the non-randomized vs. randomized sequencing of the examples
- Select either randomized or non-randomized for your remaining experiments
Next, test data set sizes and output map sizes:
- For each of your data sets (2-8 characters):
  - Test your first classifier with at least three different output map sizes
- Analyze the following in this context:
  - How does varying the character set size affect performance?
  - How does varying the output map size affect performance?
  - Is there any relationship between these effects? (For example, is the small output map adequate for a small character set but problematic for a large set?)
Next, re-run the previous experiment set with your second classifier
- That is, test data set sizes and output map sizes
- Select one number of iterations
- Select whether or not to randomize
Finally, check the dilation or Gaussian:
- Select one output map size
- Try both classifiers
- Try all data sets

Write a report in which you analyze the following:

In qualitative terms, what is a typical map layout like?
How does altering the number of iterations and output nodes affect performance?
Does randomizing the input ordering affect performance?
For each of your two classification algorithms:
- How does classification performance compare with a multi-layer perceptron?
Which of your two classification algorithms is superior? Why?
Does image dilation have the predicted benefit of improved classifier performance?

Submit using Sauron