24 March 2011

Topics in Cryptanalysis: Keyword Cipher

Recently, someone hit my blog looking for an encrypted selection of text. They were probably doing their homework and needed the answer. The text is as follows:

Eve intercepted another message this time from Bob to Alice. She knows they are still using a keyword cipher but they have once again changed their keyword. Use what you know about their messages and any CAP tools you need to find the keyword and the plaintext.

tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja
hstwa wfjjp mkaun wftdt danar baptp fndar systa
khlel avard hvatm wmrry homut darhc hflom o


This actually comes from one of the early chapters from Richard Spillman's book Classical and Contemporary Cryptology. I actually liked that book, but unfortunately it has been out of print since 2004.

But here are the steps I went through to solve the problem.

Step 1: Knowing the cipher.
We know that Eve intercepted an encrypted message from Bob to Alice. We also know that the message is encrypted using a keyword cipher. This is a standard monoalphabetic encryption system. The ciphertext is transposed against the plaintext using a chosen keyword starting at a chosen postion, then fill in the remaining unused letters.

For this problem, we will use lowercase letters as the ciphertext and the uppercase letters as the plaintext.

Step 2: Create the inital cipher
Write out the plaintext letters from A to Z. Fill in the cipher text letters beneath it as they are determined.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


Step 3: Frequency Analysis
Despite the fact that the letters have all been substituted and the word structure has been broken by the 5-letter word groups, the frequency of the letters of a monoalphabetic cipher will not change. Also, the most common letter used in text is the letter "E" with a frequency of about 12.77%, followed by "T" with about 8.55%, "O" at about 8.07%, "A" at about 7.78% and "N" with roughly 6.86%. These five letters are the high frequency letters. Also note that these values are approximates. These will vary from one group of text to the next, and it is possible for a non-high frequency letter to be in the top 5.

In our block of text, the letter "a" is the highest at 13.56%, "t" at 10.59%, "h" at 8.47%, "m" at 8.47% and "l" at 8.05%. So, I will make the assumption that a = E and t = T

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
_ _ _ _ a _ _ _ _ _ _ _ _ _ _ _ _ _ _ t _ _ _ _ _ _

T____ _E_E_ T__T_ E____ T__T_ E___T ___T_ E__TE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

_____ ____T _E___ _____ TE___ _E_E_ _____ _____
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

__E__ ___T_ __T__ _ET__ E____ _____ ____E _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_____ __E_T __T__ E____ _T__E _____ TT_E_ E___E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

__T_E _____ __E__ __T_T _E_E_ _E_T_ ___E_ ___TE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_____ E_E__ __ET_ _____ ____T _E___ _____ _
khlel avard hvatm wmrry homut darhc hflom o


Step 4: Common trigram.
The word THE is a very common trigram. In looking for spaces that currently have the values T_E, there are a few results.
tda - 4 occurrences
toa - 1 occurrence
twa - 1 occurrence.
So, I will assume that tda = THE... therefore d = H

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
_ _ _ _ a _ _ d _ _ _ _ _ _ _ _ _ _ _ t _ _ _ _ _ _

T____ _E_E_ T__TH E____ T__TH E___T H__TH E__TE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

_____ ____T _E___ _____ TE___ _E_E_ _____ H____
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

_HE__ ___T_ __T__ _ET__ E____ _____ ____E _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_____ __E_T H_T_H E____ _T__E _____ TTHE_ E___E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

__T_E _____ __E__ __THT HE_E_ _E_T_ __HE_ ___TE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_____ E_E_H __ET_ _____ ____T HE___ _____ _
khlel avard hvatm wmrry homut darhc hflom o


Step 5: Remaining high frequency letters
We still have the letters of "h", "m" and "l" most likely corresponding to "O", "A" and "N" in no set order. Here I used a little deductive reasoning and knowledge of how the English language is structured to help figure out how I should set the next letters.
Looking through the encrypted text, we see that a "m" is corresponding to the 2nd letter of the encrypted text. The first plaintext letter is "T". This eliminates "N" since this is impossible. Also note that there are there is an occurrence of "mm" in the encrypted text. There is very little chance of it being "AA" since the letter "A" rarely follows itself. So, I will assume that m = O
Now, we still have "h" and "l" to map to "A" and "N". There are a lot of "ll" occurrences, so this should eliminate "A" as a choice leaving "N". So l = N. This leaves h = A.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a _ _ d _ _ _ _ _ l m _ _ _ _ t _ _ _ _ _ _

TOA__ _E_EN TONTH E____ TO_TH E_ONT H_NTH EA_TE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

_NOON _ONOT _EA__ A__A_ TE_A_ _E_E_ _ON__ H__AN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

_HE_A NNOT_ ONT_N _ETO_ E_O__ ____A ____E _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

___N_ A_E_T HAT_H E_ANN OT__E A_O_A TTHE_ E___E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

A_T_E _____ O_E__ __THT HE_E_ _E_T_ __HE_ ___TE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_AN_N E_E_H A_ETO _O___ A_O_T HE_A_ A_N_O _
khlel avard hvatm wmrry homut darhc hflom o


Step 6: Keyword Cipher exploit
We know when we use the keyword cipher, we pick a keyword and a starting location. We spell out the keyword using the first occurrence of each letter. Once the keyword is filled in, we follow it by the remaining letters in order until each plaintext letter has a ciphertext letter mapped to it.
If we look at our plaintext<->ciphertext mapping, we see that E F G H is corresponding to a _ _ d. So, we can probably assume that F and G will map to b and c respectively.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a b c d _ _ _ _ _ l m _ _ _ _ t _ _ _ _ _ _

TOA__ _E_EN TONTH EF___ TOFTH E_ONT H_NTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

_NOON _ONOT _EAF_ A__AF TE_A_ _E_E_ _ON__ H__AN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

_HE_A NNOT_ ONT_N _ETO_ E_O__ ____A ____E _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_F_N_ A_E_T HAT_H E_ANN OT__E A_O_A TTHE_ E___E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

A_T_E _____ O_E__ __THT HE_E_ FE_T_ __HE_ ___TE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_AN_N E_E_H A_ETO _O___ A_O_T HE_AG A_N_O _
khlel avard hvatm wmrry homut darhc hflom o


Step 7: Possible word found
In reviewing the text, there is a fairly long word. We see that at position 37, AFTE_NOON is mapping to hbtarlmml. So, I will assume that r = R

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a b c d _ _ _ _ _ l m _ _ r _ t _ _ _ _ _ _

TOA__ _E_EN TONTH EF_R TOFTH E_ONT H_NTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON _ONOT _EAFR A__AF TERA_ _E_E_ _ON__ H__AN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

_HE_A NNOT_ ONT_N _ETO_ E_O__ ____A ___RE _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_F_N_ A_E_T HAT_H E_ANN OT_RE A_ORA TTHE_ ER__E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

A_T_E _____ O_E__ __THT HE_ER FE_T_ __HER ___TE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_AN_N E_ERH A_ETO _ORR_ A_O_T HERAG A_N_O _
khlel avard hvatm wmrry homut darhc hflom o

Step 8: Keyword Cipher exploit
We now have another letter open for us to exploit. Notice that R S T corresponds to r _ t. So, s = S.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a b c d _ _ _ _ _ l m _ _ r s t _ _ _ _ _ _

TOA__ _ESEN TONTH EF_RS TOFTH E_ONT H_NTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON _ONOT _EAFR A__AF TERA_ _E_E_ SON__ H__AN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHE_A NNOT_ ONT_N _ETO_ ESO__ ____A _S_RE _E___
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_F_N_ A_E_T HATSH E_ANN OT_RE A_ORA TTHE_ ER__E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _____ O_E__ __THT HE_ER FE_T_ __HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_AN_N E_ERH A_ETO _ORR_ A_O_T HERAG A_N_O _
khlel avard hvatm wmrry homut darhc hflom o

Step 9: Word found
Reading what we have so far, we see that atarting at position 8, there is SENTONTHEF_RST. I am going to assume that the blank letter is "I". So, f = I.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a b c d f _ _ _ _ l m _ _ r s t _ _ _ _ _ _

TOA_I _ESEN TONTH EFIRS TOFTH E_ONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON _ONOT _EAFR AI_AF TERA_ _E_EI SON__ H__AN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHE_A NNOT_ ONTIN _ETO_ ESO__ ___IA _S_RE _E_I_
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_FIN_ A_E_T HATSH E_ANN OT_RE A_ORA TTHE_ ER__E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _I___ O_E__ _ITHT HE_ER FE_T_ I_HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

_AN_N E_ERH A_ETO _ORR_ A_O_T HERAG AIN_O _
khlel avard hvatm wmrry homut darhc hflom o

Step 10: Word found
Starting at position 22, we have OFTHE_ONTH, I am assuming the blank is the letter M. So, k = M.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h _ _ _ a b c d f _ _ _ k l m _ _ r s t _ _ _ _ _ _

TOA_I _ESEN TONTH EFIRS TOFTH EMONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON _ONOT _EAFR AI_AF TERA_ _E_EI SON__ H_MAN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHE_A NNOT_ ONTIN _ETO_ ESO__ ___IA MS_RE _E_I_
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_FIN_ A_E_T HATSH E_ANN OT_RE A_ORA TTHE_ ER__E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _I___ OME__ _ITHT HE_ER FE_T_ I_HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

MAN_N E_ERH A_ETO _ORR_ A_O_T HERAG AIN_O _
khlel avard hvatm wmrry homut darhc hflom o

Step 11: Words found
Starting at Position 46, we have _ONOT_EAFRAI_. I am assuming the letters e = D and o = B.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h o _ e a b c d f _ _ _ k l m _ _ r s t _ _ _ _ _ _

TOA_I _ESEN TONTH EFIRS TOFTH EMONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON DONOT BEAFR AIDAF TERA_ _E_EI SON__ H_MAN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHE_A NNOT_ ONTIN _ETOB ESO__ ___IA MS_RE _E_I_
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

_FIND A_E_T HATSH E_ANN OTBRE A_ORA TTHE_ ER__E
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _I___ OME__ _ITHT HE_ER FE_T_ I_HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

MANDN E_ERH A_ETO _ORR_ ABO_T HERAG AINBO B
khlel avard hvatm wmrry homut darhc hflom o

Step 12: Keyword Cipher exploit
We see that we have I J K L M corresponding to f _ _ _ k. We know that "h" was already used, so this means the blanks are g i j.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h o _ e a b c d f g i j k l m _ _ r s t _ _ _ _ _ _

TOALI _ESEN TONTH EFIRS TOFTH EMONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON DONOT BEAFR AIDAF TERAL LE_EI SONL_ H_MAN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHE_A NNOT_ ONTIN _ETOB ESOL_ _K_IA MS_RE _E_IL
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

LFIND AKE_T HATSH E_ANN OTBRE AKORA TTHE_ ER_LE
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _ILL_ OME__ _ITHT HE_ER FE_T_ I_HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

MANDN E_ERH A_ETO _ORR_ ABO_T HERAG AINBO B
khlel avard hvatm wmrry homut darhc hflom o


Step 13: Word found
At position 1, we have TOALI_E. So, I am assuming p = C.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h o p e a b c d f g i j k l m _ _ r s t _ _ _ _ _ _

TOALI CESEN TONTH EFIRS TOFTH EMONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON DONOT BEAFR AIDAF TERAL LE_EI SONL_ H_MAN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHECA NNOTC ONTIN _ETOB ESOL_ CK_IA MS_RE _E_IL
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

LFIND AKE_T HATSH ECANN OTBRE AKORA TTHE_ ER_LE
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

AST_E _ILLC OME__ _ITHT HE_ER FECTC I_HER S_STE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

MANDN E_ERH A_ETO _ORR ABO_T HERAG AINBO B
khlel avard hvatm wmrry homut darhc hflom o

Step 14: Keyword Cipher exploit
We can now figure that P Q will map to n q (since o and p are already mapped) and U V W X Y Z will map to u v w x y z since these are the only letters remaining and the keyword cipher uses the remaining letters sequentially after the keyword is filled in.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
h o p e a b c d f g i j k l m n q r s t u v w x y z

TOALI CESEN TONTH EFIRS TOFTH EMONT HINTH EAFTE
tmhjf pasal tmltd abfrs tmbtd akmlt dfltd ahbta

RNOON DONOT BEAFR AIDAF TERAL LEVEI SONLY HUMAN
rlmml emlmt oahbr hfehb tarhj javaf smljy dukhl

SHECA NNOTC ONTIN UETOB ESOLU CKYIA MSURE WEWIL
sdaph llmtp mltfl uatmo asmju piyfh ksura wawfj

LFIND AKEYT HATSH ECANN OTBRE AKORA TTHEV ERYLE
jbfle hiayt dhtsd aphll mtora himrh ttdav aryja

ASTWE WILLC OMEUP WITHT HEPER FECTC IPHER SYSTE
hstwa wfjjp mkaun wftdt danar baptp fndar systa

MANDN EVERH AVETO WORRY ABOUT HERAG AINBO B
khlel avard hvatm wmrry homut darhc hflom o


So, our final message after all this work is(punctuation and spacing is assumed):
To Alice, sent on the first of the month in the afternoon. Do not be afraid. After all, Eve is only human. She cannot continue to be so lucky. I am sure we will find a key that she cannot break, or at the very least, we will come up with the perfect cipher system and never have to worry about her again. Bob.

No comments: