Due: 2:00pm, Thursday, September 24. Value: 30 pts. Submit to Sauron.
A detective arrives at a crime scene and locates a DNA sample thought to be from the culprit. The detective collects DNA from several suspects and sends it off to a lab, which determines which suspect's DNA (if any) matches the DNA found at the crime scene. In this assignment we'll write a program implementing a crude approximation for measuring the similarity of DNA sequences. (The same program might also be useful in determining how species are related.)
DNA is composed of a sequence of four nucleotides: adenine, cytosine, guanine, and thymine, commonly abbreviated as A, C, G, and T. In our crude similarity approximation, we'll simply count the number of positions where the sequences match. As an example, consider the two sequences below.
GTGAAGTCCG
GGGTGCAACC
Our measure of similarity would be 3, since they have three nucleotides in common: the first (G for both), the third (G), and the next-to-last (C).
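The position-by-position count described above can be sketched in a few lines of Python. This is only an illustration, not a required design: the function name count_matches is made up for this sketch, and Python itself is an assumption, since the assignment does not name a language here.

```python
def count_matches(scene, suspect):
    """Count the positions at which both sequences hold the same nucleotide.

    Hypothetical helper for illustration; assumes both strings have the
    same length, as the assignment allows.
    """
    # zip pairs up the characters at position 0, position 1, and so on.
    return sum(1 for a, b in zip(scene, suspect) if a == b)


# The two example sequences match at the first, third, and
# next-to-last positions.
print(count_matches("GTGAAGTCCG", "GGGTGCAACC"))  # prints 3
```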
Your assignment is to create a program in which the user first enters the DNA sequence from the crime scene, then the DNA sequence for each suspect. As each suspect's DNA is read, the program should display how many nucleotides it has in the same position as in the crime scene DNA. The following illustrates how your program should interact with the user, with user input shown in green boldface.
Crime scene: GTGAAGTCCG        [Your output should match this example exactly.]
Suspect 1: GGGTGCAACC
Shares 3 nucleotides           [This is the example shown above.]
Suspect 2: CCACGACCGC
Shares 1 nucleotide            [Note that nucleotide is in the singular.]
Suspect 3: GTCACGACAG
Shares 6 nucleotides
Suspect 4: GAGCCGACCA
Shares 5 nucleotides
Suspect 5: GCGACACCCA
Shares 5 nucleotides
Suspect 6:                     [User enters empty sequence to terminate program.]
Your program may assume that all strings have the same length (though the shared length may not be 10!), and that the only characters in each string are A, C, G, and T. That means that you don't need to worry about your program verifying that the inputs are correct.
Note: There are two major gotchas on this assignment: First, if two sequences share just one nucleotide, the output should use the word nucleotide rather than the plural nucleotides. And second, the program should truly stop when the user presses Enter without entering a sequence.
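One way to handle the first gotcha is to choose the noun before building the output line. The sketch below is in Python, and the helper name describe is hypothetical. It also assumes that a count of zero takes the plural form, which the assignment does not spell out.

```python
def describe(shared):
    """Build the output line for a given number of shared nucleotides.

    Hypothetical helper for illustration. Treating zero as plural is an
    assumption; the assignment only specifies the case of exactly one.
    """
    noun = "nucleotide" if shared == 1 else "nucleotides"
    return "Shares {} {}".format(shared, noun)


print(describe(1))  # prints Shares 1 nucleotide
print(describe(5))  # prints Shares 5 nucleotides
```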
Suggestion: Build your program in two steps.
First, write a program that reads in the crime scene DNA and the DNA for just one suspect, then displays how many nucleotides they have in common. Before you go on, make sure this part works. Make sure you test the program when the two DNA sequences share just one nucleotide, where the singular nucleotide should be used.
Once you have that working, modify the program so that it deals with multiple suspects. This will largely be a matter of wrapping most of your step-1 code into a loop. Make sure your program indeed stops when the user enters an empty sequence, as illustrated above.
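The overall shape of the two steps might look like the following Python sketch. It is one possible arrangement under stated assumptions, not the expected solution: the names count_matches and run are made up, and the read and write parameters exist only so the loop can be exercised without a keyboard.

```python
def count_matches(scene, suspect):
    """Step 1 logic: count positions where the two sequences agree."""
    return sum(1 for a, b in zip(scene, suspect) if a == b)


def run(read=input, write=print):
    """Step 2: wrap the step-1 logic in a loop over suspects.

    read and write default to the built-in input and print; they are
    hypothetical hooks added for this sketch so it can be tested.
    """
    scene = read("Crime scene: ")
    number = 1
    while True:
        suspect = read("Suspect {}: ".format(number))
        if suspect == "":
            # An empty sequence ends the program.
            break
        shared = count_matches(scene, suspect)
        noun = "nucleotide" if shared == 1 else "nucleotides"
        write("Shares {} {}".format(shared, noun))
        number += 1
```

Calling run() with no arguments starts an interactive session at the keyboard. The loop checks for the empty sequence before doing any counting, so pressing Enter at a suspect prompt stops the program immediately.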