CSci 280: Algorithms and problem-solving paradigms
Home Syllabus Classwork

Assignment 1: Protein folding

Due Friday, January 30, at 4pm. Worth 30 points. Submit by uploading ProteinSearch.java and your ASCII text document to Sauron. [Instructions] You may choose to work with a classmate on this assignment; if you do so, you should submit a single solution together.

Protein folding is a heavily studied question in biology: Given a chain of amino acids, how will the protein tend to arrange itself physically? The physical arrangement of the protein has important consequences for what function the protein can serve in a biological system.

The problem is quite complex and very difficult to compute. Some researchers instead examine to a highly simplified model called the HP model. Here, we imagine that each amino acid appears on a grid, with adjacent amino acids at adjacent grid points. (This simplification is completely unrealistic, but the problem is worth researching even with this massive simplification.) We classify each amino acid into just two categories — amino acid residues that are hydrophobic and others that are polar. The hydrophobic residues are non-polar and so despise being next to polar acids and water molecules. Thus, in a water-based solution, the protein will prefer an arrangement with hydrophobic residues being next to each other. In the HP model, then, we ask: How can we place the protein chain onto a grid so as to maximize the number of adjacent pairs of hydrophobic residues?

As an example, let's consider the protein chain HPPHPPPHPPH. Two possible ways of arranging this protein on a grid are illustrated below.

    
score: 3 score: 2

The first listed has a score of three, since there are three pairs of adjacent hydrophobic residues (each diagrammed with a gold-dashed line), whereas the second has a score of two. The first arrangement would therefore be preferred. In fact, there is no arrangement with a score better than three, so the first arrangement is optimal.

Finding the optimal protein arrangement — even using the much-simplified HP model on two-dimensional grid — has been proven to be NP-complete. But the problem is important, so researchers have done much work with trying to find faster algorithms for computing the best arrangement.

For this assignment, I have produced a program that uses exhaustive search to compute and display the optimal arrangement under the HP model. This code consists of two files.

Protein Defines a class for representing a single protein-folding solution, including code to compute its score and to display it. Also, the class includes the main method. You should not modify this class.
ProteinSearch All search code is contained in this class.

Your assignment is twofold.