Set 1: basics
The first set covers generic utilities and AES in ECB mode.
hex to base64
For the first challenge, you need to be able to convert hexadecimal strings into base64 encoded strings.
The base64 encoding represents data in an ASCII string format using a radix-64 representation. This means that each character can be chosen from a set of 64 characters that are defined by the implementation of the encoding. It follows that I only need 6 bits to represent them, since 2^6 = 64.
A common choice for the set is to include all uppercase letters (A-Z) plus all lowercase letters (a-z) plus all digits (0-9). This sums up to 62 characters; implementations will differ in the choice of the last two characters, although printable characters are to be expected from most of them.
Conversion from ASCII
Let's say I start with the ASCII string "Man". In ASCII, the three characters are represented by the bytes 77, 97, and 110 respectively. If we convert each to bits, and concatenate the results together, we obtain the string 010011010110000101101110.
This string is then split into 4 pieces of 6 bits each. A piece is converted back to decimal; the number will then identify the base64 character. A table for this conversion can be found on Wikipedia.
Converting from a hex-encoded ASCII string is analogous; each hexadecimal character is transformed into 2 bytes, and then the same procedure is applied.
As it stands, most common languages have libraries to do the job for us anyway.
encoded = codecs.decode(hex_string, 'hex')
return base64.b64encode(encoded)
Fixed XOR
Write a function that takes two hex-encoded strings of the same length, computes the XOR between them and returns the result as an hex-encoded strings.
The main part here is the encoding conversions rather than the XOR itself. In Python the quickest way is probably to convert the input into byte strings using the bytes library; then you can compute the XOR between each byte using the ^ operator.
At the time, I decided to convert the hex strings into bit strings; this was achieved by having a table to convert each possible hexadecimal character into its binary representation. Then the bit strings were easily XORed, and a reverse table was used to produce the output hexadecimal string.
Single-byte XOR cipher
In this challenge, you get an hex-encoded string; this is a ciphertext produced by XORing the plaintext string against one specific byte repeatedly. You need to recover the plaintext
Since the key is a simple byte, there are only 2^8 = 256 possible keys. For each of them, compute the XOR between the key, repeated as many times as needed to "cover" the whole ciphertext, and the ciphertext.
We could verify each of the candidate plaintexts by hand, but it is more fun to write some heuristics that tell us whether a short string is likely to be valid English.
First, we eliminate all texts that contain non-readable bytes, since the resulting string must be readable. Each reader may choose to apply his/her own favourite heuristics. Here are those I implemented:
- parentheses and brackets must match, i.e. they are always closed properly;
- some punctuation characters should be followed by a space, unless they are the last character in the string;
- the text should start with a capital letter (this is more of a gamble);
- character frequency: this is suggested in the challenge itself. In particular, "ETAOIN SHRDLU" lists the most common letters in the English language in order of their frequency. Therefore I sorted all the characters in the string by their frequency inside the string, and compared that they were, indeed, "ETAOIN". Since this is a pretty short string, checking the second piece too would have likely created a false negative.
For this specific problem, applying all of these heuristics yields one single candidate, the actual solution.
Detect single-character XOR
Given a list of ciphertexts, derive which one was encrypted using single-byte XOR.
This has been solved trivially by simply applying the decoding algorithm implemented earlier to all of the strings in the input data. I print only the strings for which the algorithm was able to find one viable English candidate, together with the candidate itself so we know the plaintext.
I assumed there was only one candidate, but this may not be true in the general case if the heuristics are not strong enough.
Implement repeating-key XOR
This is an implementation challenge, there is nothing to break (yet). What do we do when we want to encrypt something with XOR, but the key is shorter than the plaintext? We "repeat" the key as many times as we need to cover the whole text, and then compute the XOR.
In this challenge, our key has only 3 bytes, "ICE", and we need to encrypt a much longer text.
As we'll see later, this is a bad idea. In cryptography such an encryption is called the OTP, or One-Time Pad. As every intro course to cryptography teaches, it is proved the OTP is secure as long as:
- the key is as long as the plaintext it is applied on;
- the key is random;
- the key is used to encrypt exactly one plaintext.
Precondition #2 is usually met: we assume an attacker listening on some communication channel does not have access to the internals of the key generation process.
However, the other conditions make the OTP too hard to use in practice, and indeed it is almost never used.
Note in the initial formulation of the OTP used modular addition between bytes rather than the XOR. The Vernam cipher, published in 1919, used XOR instead. It is however common to refer to OTP for the XOR-based implementation too, since they share every property we care about.
Break repeating-key XOR
Now the fun starts! You are given a base64-encoded ciphertext which was encrypted with repeating-key XOR. Breaking the cipher and recovering the original text requires a few steps.
First step: guess the length of the key. You need to try a few values: the authors suggest trying all values from 2 to 40. Let's call KEYSIZE the current guess.
For each KEYSIZE, you need to validate it. The suggested approach is to take consecutive blocks of size KEYSIZE, compute their Hamming distance, and "normalize" it, i.e. divide by the number of blocks times the key length.
The idea behind this approach is that if you XOR the same block against different blocks, you are more likely to get "similar" blocks as a result. The measure of similarity is easily given by the Hamming distance between the blocks. Consider the worst case, where you have two identical blocks of plaintexts: they yield the same block of ciphertext, making the Hamming distance 0.
It follows that if many pairs of blocks in the ciphertext are similar to each other, you should have guessed the right key length.
Now that you know the key size you can break the key byte-by-byte. Divide the ciphertext in blocks of KEYSIZE length. Then take the first byte from each block; you have a sequence of bytes that were all encrypted with the same byte of the key. This is analogous to the single-byte XOR seen earlier, and therefore can be cracked with the same trick. You may need to adjust the heuristics: since the bytes were picked from different blocks, the resulting sequence will not be composed of valid English words, but the single characters should still be readable ASCII.
You repeat this procedure until you have discovered all the bytes in the key; at that point you can just compose the results, or XOR the key against the ciphertext directly to recover your plaintext. Good job!
This is the first challenge where you are actually required to break a cipher. While theorically easy, it shows that some coding and tricks are required to get from the theory to a working solution.
AES in ECB mode
Decrypt a ciphertext using AES in ECB mode, knowing the key. This challenge presents no difficulty; you can implement AES from scratch if you like the idea, or you can use any cryptographic library. For my Python solutions, pycrypto was used.
AES
A complete description of AES can be found on Wikipedia or on any cryptography textbook. Here we just state a few properties of this well-known algorithm.
The most commonly used AES has 128-bits keys, but larger keys, of 192 or 256 bits, are also possible. Like for the XOR-based cipher, the same key is used for both encrypting and decrypting.
As of 2016 no attacks have been published that allow to completely break a key for a correct implementation. In 2011 an attack whose performance is better than a brute force was published; however, despite the performance gain, the time and space required are still several order of magnitudes too large to permit practical implementations.
Despite this, there are several vulnerabilities and "incorrect" choices that can be exploited. This book will explore many of them.
ECB mode
AES uses keys of fixed size (e.g. 128 bits). So what do you do when your plaintext is longer than the key size? There are several ways of extending an AES key, called modes.
We will assume that the plaintext length is a multiple of the key length; when this is not true, padding algorithms are used to extend the plaintext length accordingly.
The simplest mode is ECB, or Electronic Codebook: the key is repeated as many times as necessary to cover the whole plaintext. Each plaintext block and "iteration" of the key is then fed to the AES algorithm independently to produce the ciphertext.
This, we will find out, is again a bad idea, even if the set of operations involved is more complicated than a simple XOR.
Detect ECB mode
Given a list of ciphertexts, discover which one has been encrypted using AES in ECB mode.
This exploits the main vulnerability of ECB; it is stateless, that is, the same block of plaintext will always produce the same block of ciphertext, since the key is the same.
Therefore, you only need to divide each text into blocks of 128 bits each, and figure out which of them has two (or more) identical blocks.
No other AES mode has this property, although of course repeating-key XOR has it too :-).