CSCI 150 - Lab 5
Mutation is the Word


Overview

In this lab we will make extensive use of sentinel loops (while loops). We will explore mutation of DNA through playing the game Doublets.

Materials

Description

As we study computation and life, we can ask “How did life evolve from single-celled organisms into the diversity of life we see today?” The answer lies in a process called mutation.

Separate from the DNA-RNA-Protein process are two separate processes for cell duplication called mitosis and meiosis. Both processes involve a cell replicating its DNA, and it is within this replication that mutations can occur. Perhaps there was a copying mistake, or a random x-ray damaged part of the DNA; in any case, the DNA code we begin with can undergo small alterations. Each mutation during this replication process could possibly have a large impact of the earlier transcription and translation process, such that we could have a new stop codon within a gene, or we could place a different amino acid into our protein chain. While some changes may be very harmful and lead to genetic diseases, others could provide an unexpected benefit and new functionality. These underlying mutations introduce variety into the gene pool for a population, and allow natural selection and evolution to occur.

While the language of DNA was not understood until 1953 by James Watson and Francis Crick with the help of data from Rosalind Franklin, similar properties of mutation were noticed nearly 100 years earlier by Charles Lutwidge Dodgson (aka Lewis Carroll), who is most famous for his novel “Alice in Wonderland.” In an article for Vanity Fair, Carroll created a puzzle called Doublets, which explored the effects of mutation on English words. His rules description is as follows:

The rules of the Puzzle are simple enough. Two words are proposed, of the same length; and the Puzzle consists in linking these together by interposing other words, each of which shall differ from the next word in one letter only. That is to say, one letter may be changed in one of the given words, then one letter in the word so obtained, and so on, till we arrive at the other given word. The letters must not be interchanged among themselves, but each must keep to its own place. As an example, the word ‘head’ may be changed into ‘tail’ by interposing the words ‘heal, teal, tell, tall’. I call the given words ‘a Doublet’, the interposed words ‘Links’, and the entire series ‘a Chain’, of which I here append an example:

H E A D
h e a l
t e a l
t e l l
t a l l
T A I L

It is, perhaps, needless to state that it is de rigueur that the links should be English words, such as might be used in good society.

(More details on the connection between Lewis Carroll and genetics can be found in an article by David B. Searls entitled From Jabberwocky to Genome: Lewis Carroll and Computational Biology).

Our task for this lab is to create a Python program that lets a user play the Doublets game and enforces all the rules given above by Lewis Carroll.

Step 1 - Develop Pseudocode

Begin by developing an algorithmic solution using pseudocode. This should correspond to the structure of a Python program, but is written in English instead of Python.

For example, pseudocode for a guessing-game could look like this:

Generate a number for the human to guess
As long as the human has not guessed the number
    Request a guess from the human
    If it's too high or too low, say so
Congratulate the human on their insight

The point is that this is structured like Python, using indentation and hinting at loops and conditionals. But the individual pieces are just English descriptions, which you will later translate into precise Python code.

To demonstrate the eventual behavior of the program, here is a sample run of the Doublets game.

What is the starting word? CAT
What is the ending word? DOG
Start   = CAT
Current = CAT
End     = DOG
Which character do you want to change? (the first character is 1) 3
What is your new character? B
Current = CAB
End     = DOG
Which character do you want to change? (the first character is 1) 2
What is your new character? O
Current = COB
End     = DOG
Which character do you want to change? (the first character is 1) 3
What is your new character? G
Current = COG
End     = DOG
Which character do you want to change? (the first character is 1) 1
What is your new character? D
Solution path found in 4 steps.
CAT -> CAB -> COB -> COG -> DOG

A second run of the program is shown below, where the user made many mistakes.

What is the starting word? log
What is the ending word? worm
The lengths are not equal. Please try again.
What is the ending word? sfu
sfu is not a word. Please try again.
What is the ending word? bug
Start   = LOG
Current = LOG
End     = BUG
Which character do you want to change? (the first character is 1) 0
0 is out of range. Please try again.
Which character do you want to change? (the first character is 1) 4
4 is out of range. Please try again.
Which character do you want to change? (the first character is 1) 1
What is your new character? 4
4og is not a valid word
Current = LOG
End     = BUG
Which character do you want to change? (the first character is 1) 1
What is your new character? b
Current = BOG
End     = BUG
Which character do you want to change? (the first character is 1) x
x is not a number. Please try again.
Which character do you want to change? (the first character is 1) 2
What is your new character? uu
uu is more than one character. Please try again.
Which character do you want to change? (the first character is 1) 2
What is your new character? u
Solution path found in 2 steps.
LOG -> BOG -> BUG

Add the pseudocode to your Lab Evaluation.

Step 2 - Implement Incrementally

Download the text file english2.txt and the python module dictionary.py files, and make sure they are in the same folder where you will put your lab. You do not need to copy and paste anything! At the top of your Python program, you should put

import dictionary

which will allow you to use the functions in dictionary.py. The dictionary module contains a function valid_word(word, file), which will return True if the word is found in the file and False otherwise. You can call the function by writing something like

dictionary.valid_word(some_word, 'english2.txt')

Now take your pseduocode description and begin to implement your program in Python. Your program should be named doublets.py. Write your code in small sections and test incrementally. Remember that your program should be structured entirely using functions. Your program should look something like this:

  # Input: ...
  # Output: ...
  # Description
  def fun1(x,y,z):
    ...

  ... More functions here with appropriate comments ...

  def main():
    ...

  main()

I recommend starting by getting things to work first without using functions, and then incrementally abstracting parts of your program into functions until the whole program is appropriately decomposed.

Step 3 - Test, Debug and Maintain

With incremental implementation, Steps 3 and 4 should go hand in hand. Never write more than 5-10 lines of code before testing your changes. First make your program work with perfect input, and then expand it to work with erroneous user input.

Once your program is working, try out some of the transformations below

Future Thoughts

Your doublets.py program lets the user chose what character to change at each step, and then verifies that each new word is a valid word. Can you think of a way to always find the minimum number of steps to transform one word to another? How will you know if a path between two words can be found? Include your answers to these questions in the Evaluation Document.

What to Hand In

Make sure you have followed the Python Style Guide, and have run your project through the Automated Style Checker.

You must hand in:

(Yes, please turn in dictionary.py and english2.txt even though you did not create them—it makes the grading process much easier!)

Grading


© Mark Goadrich, Hendrix College