TransWikia.com

Cipher with uncommon letter distribution

Puzzling Asked by Skippy Doodle on August 17, 2021

I have a cipher that I’m trying to decode. It’s quite lengthy, so I started by trying a simple substitution. I checked the letter frequencies and noticed that every letter of the alphabet is used, and multiple times:
B 6.89%
X 5.47%
L 5.47%
U 5.28%
Z 5.09%
O 4.91%
N 4.81%
R 4.15%
E 3.96%
G 3.96%
Q 3.96%
Y 3.87%
F 3.77%
J 3.77%
K 3.68%
V 3.58%
A 3.58%
H 3.58%
D 3.49%
M 2.83%
C 2.83%
S 2.55%
W 2.26%
T 2.26%
P 2.17%
I 1.79%

That surprised me: Z, Q and X are among the least frequent letters in English, so I wouldn’t expect them to show up this often. I also noticed that there are a lot of duplicated "words", so my next idea was to find a keyword to use. The only problem is that those normally rare letters (X, Z, Q) are some of the most frequent letters in the ciphertext, and I can’t imagine what keyword would contain them.

I’ve been searching for different ways to decode this but have come up short so far. I’m not looking for an answer, I stubbornly want to do this on my own. I’m looking for some ideas, or guidance from someone who is far more experienced. Is there some sort of checklist to go down to figure out how I need to proceed?

One Answer

One possibility is that this is a Vigenere cipher. A Vigenere cipher is in effect several interleaved Caesar ciphers, so your letter frequencies will be a mix of several "rotations" of the letter frequencies of the whole alphabet. The longer the key, the more "evened out" the frequencies will get.
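To make the "interleaved Caesar ciphers" idea concrete, here is a minimal sketch of Vigenere encryption. The function name `vigenere_encrypt` and the sample key are purely illustrative; nothing here is taken from your actual ciphertext.

```python
from string import ascii_uppercase as ALPHABET


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Encrypt letters with a repeating key; non-letters pass through unchanged."""
    out = []
    i = 0  # counts letters only, so spaces and punctuation do not advance the key
    for ch in plaintext.upper():
        if ch in ALPHABET:
            # The key letter chosen for this position fixes a Caesar shift.
            shift = ALPHABET.index(key[i % len(key)].upper())
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)


ciphertext = vigenere_encrypt("ATTACKATDAWN", "LEMON")
print(ciphertext)        # LXFOPVEFRNHR
print(ciphertext[0::5])  # every 5th letter: all shifted by the key letter L
```

With a 5-letter key, letters 1, 6, 11, ... of the text all receive the same shift, letters 2, 7, 12, ... receive another, and so on. Each of those five strands is an ordinary Caesar cipher, which is why the overall letter counts look like a blend of several rotated copies of the normal English distribution.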

I tried the following: (1) construct a table of letter frequencies from a large piece of English text, (2) combine it with random rotations of itself, and (3) look at the resulting distributions.

If we "combine" one random rotation, we just have the original letter frequencies. The 1st, 6th, 11th, 16th, 21st and 26th most common letters account for 13.0%, 7.3%, 3.9%, 2.1%, 1.0%, 0.1% of the text, compared with your text's 6.9%, 4.9%, 4.0%, 3.6%, 2.8%, 1.8%. Plain English is much more uneven, and a simple substitution only relabels the letters without changing that ranked profile, so this surely isn't a simple substitution cipher.

With two random rotations, of course exactly what we get depends on how far apart those rotations are. A typical example is 8.5%, 6.1%, 4.0%, 3.1%, 1.3%, 0.2%. Still much too uneven.

Typical figures with three: 8.7%, 5.2%, 3.9%, 3.2%, 1.7%, 1.3%. Getting better but still distinctly more uneven than yours.

Typical figures with 15 random rotations are 5.7%, 4.4%, 3.9%, 3.7%, 3.3%, 2.3%. By this point things are too smoothed-out and we have a less uneven letter distribution than yours.
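The rotation-mixing experiment described above can be sketched roughly as follows. This is not the exact code behind the figures quoted here: the helper names (`letter_frequencies`, `mix_rotations`, `ranked`) are made up for illustration, and the short `sample` string is only a stand-in that you would replace with a large piece of English text, so the numbers it prints will differ from those above.

```python
import random
from collections import Counter
from string import ascii_uppercase as ALPHABET


def letter_frequencies(text):
    """Return 26 relative frequencies (A..Z) of the letters in text."""
    counts = Counter(ch for ch in text.upper() if ch in ALPHABET)
    total = sum(counts.values())
    return [counts[ch] / total for ch in ALPHABET]


def mix_rotations(freqs, shifts):
    """Average the frequency table over the given Caesar shifts."""
    mixed = [0.0] * 26
    for s in shifts:
        for i, f in enumerate(freqs):
            mixed[(i + s) % 26] += f / len(shifts)
    return mixed


def ranked(freqs, ranks=(1, 6, 11, 16, 21, 26)):
    """Percentages of the 1st, 6th, ... 26th most common letters."""
    ordered = sorted(freqs, reverse=True)
    return [round(100 * ordered[r - 1], 1) for r in ranks]


# Placeholder text (an assumption): far too short to be representative.
sample = "the quick brown fox jumps over the lazy dog and then rests in the shade"
base = letter_frequencies(sample)
rng = random.Random(0)  # fixed seed so repeated runs use the same shifts
for n in (1, 2, 3, 6, 15):
    shifts = [rng.randrange(26) for _ in range(n)]
    print(n, ranked(mix_rotations(base, shifts)))
```

Each call to `mix_rotations` models a Vigenere key whose letters are the given shifts, assuming every key position is used equally often. Running it several times with different random shifts gives a feel for how much the ranked percentages vary for a given key length.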

Eyeballing the numbers, if this is a Vigenere cipher then probably the key length is somewhere around 6 letters, but there are better ways to estimate that than by looking at how uneven the letter distribution is. There are online tools for breaking Vigenere ciphers, but it's also quite possible to do it by hand (or, better, by computer but writing your own code, if you happen to be able to write code). It looks like your text is 1060 letters, which should be plenty.
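One standard way to estimate the key length (named here as a suggestion; it is only one of several options) is the index of coincidence: the probability that two letters picked at random from a text are the same. For ordinary English it is roughly 0.066, while for uniformly random letters it is 1/26, about 0.038. If you split the ciphertext into every k-th letter and k is the true key length (or a multiple of it), each strand is a plain Caesar cipher and its index of coincidence should jump up towards the English value. A minimal sketch, with illustrative function names:

```python
from collections import Counter


def index_of_coincidence(letters: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ioc_by_key_length(ciphertext: str, max_len: int = 12) -> dict:
    """Map each candidate key length k to the mean IoC of its k strands."""
    letters = "".join(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    result = {}
    for k in range(1, max_len + 1):
        # Strand i holds letters i, i+k, i+2k, ... (one Caesar cipher each
        # if k matches the key length).
        columns = [letters[i::k] for i in range(k)]
        result[k] = sum(index_of_coincidence(c) for c in columns) / k
    return result
```

To use it, paste your ciphertext into a string, call `average_ioc_by_key_length(your_text)` and look for the smallest k whose score stands clearly above its neighbours. Multiples of the real key length will score well too, and with about 1060 letters the strands for small k should be long enough for the signal to be visible.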

More generally, any sort of polyalphabetic cipher is likely to give you a similar sort of pattern of letter frequencies.

Answered by Gareth McCaughan on August 17, 2021
