Code Review Asked on October 27, 2021
I have recently been working through a number of the Cryptopals Cryptography Challenges to try and improve my knowledge and understanding of both cryptography and Python. As five of the first six challenges are XOR related problems I thought it would be a good idea to compile my work into a single program capable of encrypting, decrypting, and cracking using the XOR cipher. The program is capable of both single-byte and multi-byte encryption modes and can employ statistical analysis to guess a key when none is given.
I have previously asked for reviews on my Ceasar and Vigenere implementations/crackers and have included all of them together as a small suite for these fun little ciphers which I have uploaded to a repository on GitHub. I won’t include all the code here but if possible I would like to know what improvements I could make to the overall structure of the project as I am trying to learn about how to organise a project like this with the intention of growing it with more and more encryption tools in the future. Due to folder structure dependencies I would suggest you clone the GitHub repository if you intend to run this code although all the relevant code will be posted below.
What I Would Like Feedback On
predictKeySize()
in XORAnalysis.py in which it will have a strong bias towards short keys if it is allowed to guess short keys. As such it is currently hard coded to only guess lengths greater than 6 meaning my program is incapable of cracking keys between two and five characters long. Any ideas about how to improve this would be much appreciatedThe Code
xor.py
#!/usr/bin/python3
"""
Filename: xor.py
Author: Jess Turner
Date: 15/07/20
Licence: GNU GPL V3
Multipurpose XOR Encryption tool, can encrypt and decrypt text using a specified single-byte or multi-byte key or attempt to decrypt an input without a given key by using statistical analysis
Options:
--encrypt Enable encryption mode (Default)
--decrypt Enable decryption mode
--key Specify the encryption key
--guess Attempt to guess the encryption key by statistical analysis
--single-byte Enable single-byte XOR mode (Default)
--multi-byte Enable multi-byte XOR mode
"""
import argparse
import string
import codecs
import sys
from itertools import cycle
from internal.XORAnalysis import predictKeySize, multiByteXORCrack, multiByteXOR, repeatingByteXOR, repeatingByteXORCrack
def initialiseParser():
parser = argparse.ArgumentParser(description = "Encrypt, decrypt, or crack a message using the XOR Cipher")
parser.add_argument("--key", "-k", help = "The encryption key to be used (if relevant)", type = str)
parser.add_argument("--guess", "-g", help = "Perform statistical analysis to estimate the most likely value of the encryption key", action = "store_true")
parser.add_argument("--single-byte", "--single", "-s", help = "Enable single-byte key mode", action = "store_true")
parser.add_argument("--multi-byte", "--multi", "-m", help = "Enable multi-byte key mode", action = "store_true")
parser.add_argument("--decrypt", "-d", help = "Enable decryption mode", action = "store_true")
return parser
def main():
parser = initialiseParser()
args = parser.parse_args()
inputString = sys.stdin.read().encode()
if args.decrypt or args.guess:
inputString = codecs.decode(inputString, "base-64")
if args.guess:
if args.multi_byte:
print("[+] Selecting multi-byte key mode...", file = sys.stderr)
print("[+] Predicting key length...", file = sys.stderr) # At this point we have the entire decoded input in memory, all that is left is to crack it
keyLength = predictKeySize(inputString)
print("[-] Got length of {}...n[+] Attempting to crack key...".format(keyLength), file = sys.stderr)
crack = multiByteXORCrack(inputString, keyLength)
key = crack['key']
else:
print("[+] Selecting single-byte key mode...", file = sys.stderr)
print("[+] Attempting to crack key...", file = sys.stderr)
crack = repeatingByteXORCrack(inputString)
key = chr(crack['key'])
print("[-] Got key: "{}" !n[+] Decrypting message...".format(key), file = sys.stderr)
output = crack['message']
elif args.key != None:
if len(args.key) > 1 and not args.multi_byte:
print("[+] Single-byte mode selected but multi-byte key was given. Defaulting to multi-byte mode...", file = sys.stderr)
args.multi_byte = True
output = multiByteXOR(inputString, [ord(c) for c in args.key]) if args.multi_byte else repeatingByteXOR(inputString, ord(args.key))
else:
print("[-] Error: No key given!", file = sys.stderr)
return
if not args.decrypt and not args.guess:
output = codecs.encode(output.encode(), "base-64").decode()
print(output, end = "")
if __name__ == "__main__":
main()
XORAnalysis.py
"""
Filename: XORAnalysis.py
Author: Jess Turner
Date: 19/06/20
Licence: GNU GPL V3
A collection of analysis functions and pieces of information required byciphertools programs which implement XOR-based algorithms
"""
from itertools import cycle
import string
from .Strings import alphanumeric_characters, buildSubStrings
# XOR analysis functions
def letterRatio(inputString):
return sum([x in alphanumeric_characters for x in inputString]) / len(inputString)
def probablyText(inputString):
return letterRatio(inputString) > 0.7
# Functions for single-byte key XOR
def repeatingByteXOR(inputString, byte):
return "".join(chr(c ^ byte) for c in inputString)
def repeatingByteXORCrack(inputString):
best = None
for byte in range(256):
currentString = repeatingByteXOR(inputString.strip(), byte)
num_chars = sum([x in alphanumeric_characters for x in currentString])
if best == None or num_chars > best['num_chars']:
best = { 'message': currentString, 'num_chars': num_chars, 'key': byte }
return best
# Functions for multi-byte key XOR
def multiByteXORCrack(inputString, keyLength):
key = "".join(chr(repeatingByteXORCrack(string.strip())['key']) for string in buildSubStrings(inputString, keyLength))
message = multiByteXOR(inputString, key.encode())
return { 'message': message, 'key': key }
def multiByteXOR(inputString, key):
return "".join(chr(c ^ byte) for c, byte in zip(inputString, cycle(key)))
# Functions for multi-byte XOR key length prediction
def XORStrings(first, second):
return bytes([i ^ j for i, j in zip(first, second)]) # Convert two byte strings to their xor product
def hammingDistance(first, second):
return bin(int.from_bytes(XORStrings(first, second), "little")).count("1") # Calculate the bit difference between two strings
def predictKeySize(inputString):
bestKey = 0
bestDistance = 10000
for i in range(6, 40): # Set to a lower bound of 6 because otherwise it always guesses a really short key. Will try and fix in later version.
distance = 0
blocks = len(inputString) // i - 1
for x in range(blocks):
distance += hammingDistance(inputString[i * x:i * (x + 2) - 1], inputString[i * (x + 2):i * (x + 4) - 1])
distance /= i
distance /= blocks
if distance < bestDistance:
bestDistance = distance
bestKey = i
return bestKey
Strings.py
"""
Filename: strings.py
Author: Jess Turner
Date: 28/09/19
Licence: GNU GPL V3
A collection of functions for the modification of strings required by multiple programs in the ciphertools suite
"""
import re
alphanumeric_characters = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "
english = { 'monogram-frequencies': [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074 ],
'bigram-frequencies': [] }
def stringPrepare(string, preserveSpacing): # Strip all non alphabetic characters from a string and convert to upper case
return re.compile("[^A-Zs]" if preserveSpacing else "[^A-Z]").sub("", string.upper())
def buildSubStrings(string, separation): # Build a list of substrings required to analyse the ciphertext
return [string[i::separation] for i in range(separation)]
By PEP8, initialiseParser
should be initialise_parser
, and similarly for inputString
, etc.
print("[-] Got length of {}...n[+] Attempting to crack key...".format(keyLength), file = sys.stderr)
is simpler as
print(
f"[-] Got length of {key_length}...n"
"Attempting to crack key...",
file=sys.stderr,
)
For example,
def probablyText(inputString):
can be
def probably_text(input_string: str) -> bool:
sum([x in alphanumeric_characters for x in currentString])
should use the generator directly instead of making a list; i.e.
sum(x in alphanumeric_characters for x in current_string)
The same goes for
return bytes([i ^ j for i, j in zip(first, second)]) # Convert two byte strings to their xor product
best = { 'message': currentString, 'num_chars': num_chars, 'key': byte }
If you're only doing this because you need to return multiple things, idiomatic Python is to simply return them as a tuple, i.e.
best = current_string, num_chars, byte
# ...
return best
But this would be better-represented by a named tuple, or (better) a @dataclass
with type hints. Just not a dictionary.
distance /= i
distance /= blocks
can be
distance /= i * blocks
for x in range(blocks):
distance += hammingDistance(inputString[i * x:i * (x + 2) - 1], inputString[i * (x + 2):i * (x + 4) - 1])
can be
distance = sum(
hamming_distance(
input_string[i*x : i*(x+2)-1],
input_string[i*(x+2) : i*(x+4)-1],
)
for x in range(blocks)
)
Given your current code,
english = { 'monogram-frequencies': [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074 ],
'bigram-frequencies': [] }
should simply be a monogram variable and a bigram variable.
Answered by Reinderien on October 27, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP