CSCI 150 - Lab 8 - Caesar's Secrets

CSCI 150 - Lab 8
Caesar's Secrets

Overview

In this lab we will be using for loops and functions to encrypt and decrypt secret messages with a Caesar cipher.

Materials

Chapter 1 of Alice's Adventures In Wonderland

Description

Step 1

One of the first popular methods for encoding information is attributed to Julius Caesar. His Caesar Cipher was used to encrypt messages to his generals in the field. While not a very secure method these days, few people were literate and fewer would have known how to attempt any cryptanalysis of his seemingly meaningless messages.

To perform encryption with a Caesar cipher, you first choose a number between 1 and 25, known as the rotation. Then for each letter in your plain-text message, replace it with the character that is found by adding the rotation to the position of the character. For instance, for a rotation of 4, the letter F would be translated into J, since F is the 6th letter in the alphabet and J is the 10th. If the position of the character plus the rotation is larger than 26 (the last letter Z), treat the alphabet as a loop and wrap around to the front. Decryption is performed in a similar manner, but instead the rotation is subtracted from the character to return us to the original plain-text message. This encryption process is illustrated below (thanks to Wikipedia); decryption follows the arrows in reverse.

Open a new python program called caesar.py. In this file, you will first write two functions for encryption and decryption of strings.

encrypt(plain_text: str, rot: int) -> str will take a string and an integer rotation between 0 and 25, (where 1 means A -> B, 2 means A -> C, etc) and return the rotated string.

decrypt(cipher_text: str, rot: int) -> str will also take a string and an integer rotation between 0 and 25 signifying the number of rotations backwards (where 1 means B -> A, 2 mean C -> A, etc), and return the reverse-rotated string.

You can test your functions using string="GREEN" and rot=13; your answer should be "TERRA". Also, any string encrypted with rot should decrypt back with the same rot, such that

decrypt(encrypt("TESTING", i), i)

should return "TESTING" for all feasible values of i.

Note: any characters which are not letters (punctuation, spaces, etc.) should be unchanged by encrypt and decrypt. For example, encrypt("HELLO, THERE!", 13) should yield "URYYB, GURER!".

Hint: make a string containing the letters of the alphabet, "ABCDE...Z". (You can also access this string as string.ascii_uppercase if you add import string to your file.) Now use the string .index(letter) method and Python's mod operator (%) to implement encrypt. Alternatively, you can use the built-in ord and chr functions in place of string.ascii_uppercase.index(...).

Hint 2: once you get encrypt working, there should be a very short (1 line) way to implement decrypt.

Step 2

When faced with an encrypted text, we will not know what rotation was used for the encryption. But since there are only 26 possible alignments (actually 25 since we know that our text is encrypted), we can iterate through them all in a reasonable amount of time.

Write a brute force function brute_decrypt(s: str) to print out all possible Caesar rotations for a given string. Use this function to decrypt the following lines of poetry (note: all four lines are encrypted with the same rotation).

WZDV EULOOLJ DQG WKH VOLWKB WRYHV
GLG JBUH DQG JLPEOH LQ WKH ZDEH
DOO PLPVB ZHUH WKH ERURJRYHV
DQG WKH PRPH UDWKV RXWJUDEH

Record the correct decryption and the rotation which was used in a Lab Evaluation.

Step 3

The above brute-force method requires there to be a human to examine each of the 25 rotations and find the correct decryption. If we wish to automatically decrypt any Caesar-encrypted string, we must learn a bit about the distribution of characters in the English language.

Our first tool for analysis will examine the frequency of each letter in a standard text. While there are many different words and near infinite ways to combine them into sentences, there are regular patterns that we can observe across all words. The figure below (from Wikipedia) shows the relative frequency of each character, derived from counting the frequency of each character in a base text and then dividing by the total number of characters.

This information has been used to simplify early Linotype machines, which ordered the characters to be typed by their frequency (leading to the sometimes cryptic inclusion of etaoin shrdlu in newspaper articles).

Write a function in caesar.py called frequency_count(s: str) -> List[float]. This will return a list of relative frequencies for all letters A-Z found in the string. Create a test string where you know the frequency of each character and make sure your function returns the correct list of relative frequencies.

For example, frequency_count("apple") should return

[0.2, 0, 0, 0, 0.2, 0, 0, 0, 0, 0, 0, 0.2, 0, 0, 0, 0.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

An intermediate step to write this function is to first find a way to return a list of the raw counts for each letter. Then add in code to calculate the relative frequency for each of these letters.

Step 4

As we are encoding and decrypting strings with a Caesar cipher, we can notice that the relative frequency for an encrypted message will tend to follow (if the string is long enough) the same pattern as the above standard, but it will be rotated as well to match the rotation of the encryption.

With 25 possible rotations, we can now examine the relative frequency count from each one and compare them to the base frequencies to find the best match. For this, we need one last tool, the Chi-square statistic, shown below.

Given two frequency distributions (O for Observed, E for Expected), the Chi-square statistic will tell us how good of a fit there is between these two distributions. Every observed frequency is compared to the expected frequency and squared, then divided by the expected frequency. These are summed to return the Chi-square statistic.

For our purposes, we do not need to compare this statistic to a table to see the probability of the match, we only need to find the rotation that produces the minimum Chi-square statistic (note that a perfect match will result in a statistic of 0, and any differences will result in a positive increase).

Write a function chi_square(observed_freq: List[float], expected_freq: List[float]) -> float in your caesar.py to return the Chi-square statistic given the two frequency distributions. This function should work as long as the two frequency distributions are the same length, and also not be dependent on our specific use where the frequencies will be of length 26. For example,

chi_square([0.3, 0.1, 0.6], [0.1, 0.2, 0.7])

should return 0.4642857142857142.

Step 5

Using your functions from the previous steps, write a new function auto_decrypt(s: str, base_text: str) which takes an encrypted string and the file name of a base reference text. Your function will calculate the character frequencies for the encrypted string and the reference text, then iteratively calculate the Chi-square statistic for each possible reverse rotation. Your function should print out

the most likely rotation (minimum Chi-square statistic),
the decrypted message, and
the chi-squared statistic for that rotation.

Note: to read in the contents of a file as a string, you can write something like the following:

f = open(file_name)
file_contents = f.read()
f.close()   # polite but not really necessary

If file_name is a string containing the name of a file (for example, "alice-chapter-one.txt"), then after running the above code, file_contents will be a string containing the entire contents of the file.

Use Chapter 1 of Alice's Adventures In Wonderland as the base text to decrypt the three messages below.

V SBE BAR JRYPBZR BHE ARJ VAFRPG BIREYBEQF XRAG OEBPXZNA

P DVBSK YHAOLY IL MPYZA PU H ZTHSS CPSSHNL PU NHBS AOHU ZLJVUK PU JVTTHUK PU YVTL QBSPBZ JHLZHY

NDIXZ DI OCZ GJIB MPI ZQZMT KGVIZOVMT XDQDGDUVODJI RDGG WZ ZIYVIBZMZY WT DHKVXON AMJH NKVXZ
ZQZMT NPMQDQDIB XDQDGDUVODJI DN JWGDBZY OJ WZXJHZ NKVXZAVMDIB IJO WZXVPNZ JA ZSKGJMVOJMT JM
MJHVIODX UZVG WPO AJM OCZ HJNO KMVXODXVG MZVNJI DHVBDIVWGZ NOVTDIB VGDQZ DA JPM GJIB-OZMH
NPMQDQVG DN VO NOVFZ RZ CVQZ V WVNDX MZNKJINDWDGDOT OJ JPM NKZXDZN OJ QZIOPMZ OJ JOCZM RJMGYN
XVMG NVBVI

Report your results in Lab Evaluation.

What to Hand In

Hand in your lab through the usual submission form. Make sure you have followed the Python Style Guide, and have run your project through the Automated Style Checker.

You must hand in:

caesar.py
Lab Evaluation

Grading

To earn a D on this lab, complete Step 1 and 2.
To earn a C on this lab, do the above and complete Step 3.
To earn a B on this lab, do the above and complete Step 4.
To earn a A on this lab, do the above and complete Steps 5.
To earn a 100 on this lab, do the above and complete the lab evaluation.

CSCI 150 - Lab 8Caesar's Secrets