Statistical techniques for cryptanalysis Essay Example
Statistical techniques for cryptanalysis Essay Example

Statistical techniques for cryptanalysis Essay Example

Available Only on StudyHippo
  • Pages: 11 (2989 words)
  • Published: August 4, 2018
  • Type: Essay
View Entire Sample
Text preview

Introduction: Cryptography involves writing messages in code or cipher to secure the content of a text. The encrypted message can only be deciphered using the key used for encoding. It does not hide the existence of the message, but conceals its content [1]. On the other hand, cryptanalysis aims to retrieve the plaintext from a message without having access to the key. It can successfully recover the plaintext or key from a specific ciphertext [2]. There are five main types of cryptanalytic attacks: 1.

Ciphertext-only attack: The cryptanalyst possesses cipher texts encrypted using the same encryption algorithm. The objective is to determine the plain text or identify the encryption key used.

Known-plaintext attack: The cryptanalyst possesses both ciphertext and corresponding plaintext values encrypted with a specific key. Through analyzing the relationship between the ciphertext and plaintext, the cryptanalyst attempts to deduce the key.

...

Chosen-plaintext attack: The cryptanalyst has access to both ciphertext and associated plaintext for multiple messages. Additionally, the cryptanalyst has the capability to select the plaintext that will undergo encryption.

The individual's task is to determine the encryption key used for the messages or develop an algorithm to decrypt future messages encrypted with the same key. Frequency analysis examines the occurrence of letters or letter groupings within the encrypted text and assists in deciphering traditional ciphers. This analysis relies on the observation that certain letters and combinations of letters appear more frequently than others in written language.

Rubber-hose cryptanalysis is a technique that involves threats, torture, or blackmail to obtain a key from an individual. Frequency analysis, which counts the occurrence of letters in ciphertext, is a basic method used to decrypt substitution cipher algorithms. Thi

View entire sample
Join StudyHippo to see entire essay

technique provides the cryptanalyst with information by associating guessed plaintext letters with their corresponding ciphertext letters. To enhance the analysis, statistics can be applied to pairs of letters, such as digrams and trigrams. By exploiting the weakness in the substitution cipher algorithm, this technique encrypts similar plaintext letters to similar ciphertext letters.

Frequency analysis is a technique used in cryptanalysis to break ciphers based on traditional cryptographic algorithms. However, this method is not effective with modern block cipher based cryptographic algorithms. Frequency analysis works by exploiting the statistical properties of English language, which is not random. It reveals that single alphabetic substitution does not conceal these statistical properties. When dealing with encryption using monoalphabetic substitution, it is helpful to begin deciphering by examining the frequency counts of all letters. The most commonly used letters in English are E, T, A, O, and I, while the least common are Q, Z, and X [7]. By analyzing the redundancy of text in a language, it is possible to detect statistical patterns.

There are several common patterns that can be found in texts from different domains and languages. One such pattern, known as Zipf's law, explains how word frequency decreases as their rank increases. This law has been observed in various written documents across different languages.

In the case of cryptographic substitutions, English characters have a high redundancy rate. Frequency analysis is one method to decrypt a message encoded with a substitution cipher.

"In simpler terms, if the sender uses a substitution encryption method in which English letters are replaced with other English letters, we can still identify the original text by analyzing the frequency patterns of the encrypted characters "[4]. To

perform a frequency analysis, we must know the frequencies of each letter in the English alphabet or the frequency patterns in the language used by the sender to encrypt the message. The table below shows average letter frequencies in English. For example, E accounts for 12.7% of all English letters, while Z only represents 0.1%."

All the frequencies are tabulated and plotted below:-
For example, let us consider the following sentence: “We study Cryptography as part of our course”. Using a simple substitution cipher, let us consider the following: a->c , b->d, c->e…………..w->y, x->z, y->a, z->b So, the cipher text becomes: “yg uvwfa etarvqitcrja cu rctv qh qwt eqwtug”. A simple frequency analysis of the cipher text can be carried out and the results are as given below: The above data can be used by a cryptanalyst to identify the key or the plaintext by using simple substitution to the cipher text till a suitable plaintext value is not identified. Apart from the use of mono alphabetic frequency analysis, cryptanalysts also identify frequency of paired letters better known as digram frequency and that of three letter words, called as Trigram frequencies. These help the cryptanalyst to exploit the redundant features of English language to break the cipher.

The most common Digrams (in order): th, he, in, en, nt, re, er, an, ti, es, on, at, se, nd, or, ar, al, te, co, de, to, ra, et, ed, it, sa, em.

The most common Trigrams (in order): the, and, tha, ent, ing, ion, tio, for, nde, has, nce, edt, tis, oft, sth, men

Table 1: Digram and Trigram Frequencies [6]

These frequencies aid in identifying frequently used terms in English to decrypt a cipher. Digram frequencies decrypt two-letter words like an, to, of etc., while trigram frequencies decrypt three-letter words like the, are, for etc. Once significant two-letter and three-letter words are decrypted, it becomes relatively easy to determine the key by matching the corresponding values in the ciphertext with the cracked plaintext values. This significant vulnerability in the English language is exploited to decrypt cipher texts encrypted using simple algorithms that utilize English alphabets.

The practical use of frequency analysis involves counting the frequency of letters in the ciphertext and then assigning "guessed" plaintext letters to them. Although many letters may have similar frequencies, certain letters occur more frequently in every language. For example, if there are more X's in the ciphertext than any other letter, it's a reasonable assumption that X represents the letter E in English plaintext. However, T and A are also commonly used in English text, so X could also represent either of them. Therefore, the cryptanalyst may have to experiment with different mappings between ciphertext and plaintext letters.

Once the common single letter frequencies have been resolved, paired patterns and other patterns can be resolved as well. Then, when enough characters have been deciphered, the remaining text can be cracked using simple substitution. Frequency analysis is highly effective against simpler substitution ciphers and can easily break surprisingly short cipher texts. Traditional algorithms that encrypt using bit by bit encryption have been vulnerable to cryptanalytic attacks. These attacks exploit frequency analysis and can break such algorithms easily. 1.

The Caesar Cipher is one of the oldest ciphers, replacing letters in plaintext with different

letters to create ciphertext. Each occurrence of a specific letter will always be transformed into the same letter in the cipher. For example, all B's become F's. "Frequency analysis is based on the fact that certain letters and combinations of letters appear with characteristic frequency in all texts in a particular language" [9]. In English, E is commonly used while X is not. Combinations like ST, NG, TH, and QU are common, while XT, NZ, and QJ are rare or even impossible in English. This clearly demonstrates how easily decipherable the Caesar cipher can be by determining the frequency of each letter in the ciphertext.

The Caesar cipher is a type of encryption where the message is easily breakable through exhaustive methods, making it highly vulnerable to cryptanalysis. Substitution ciphers, which include the Caesar cipher, involve using a key that is a permutation of all twenty-six characters in the English alphabet.

Instead of using one key for all encryption processes, we use a unique key for each encryption process. This technique increases the number of possible keys to 26!, or around 4 X 1026, effectively preventing exhaustive cryptanalysis attacks on the keyspace [7]. To decode the cipher, we examine the statistical distribution of individual letter frequencies in English. Then, we compare the frequencies of digrams and trigrams in typical English words to those found in the cipher, allowing us to reconstruct the key and ultimately decrypt the text.

This method efficiently breaks the substitution cipher by representing each plaintext letter with the same ciphertext letter in the message, thus carrying over all properties of the plaintext to the cipher text [1]. In a Vigenere cipher, there is enhanced

security as a given plaintext letter is not consistently represented by the same ciphertext letter. This is accomplished by employing a sequence of n distinct substitution ciphers for message encryption. Consequently, this technique expands the number of potential keys from 26! to (26!)n [1].

Although considered unbreakable, the Kasiski method successfully decrypted the message encrypted with a Vigenere cipher. The first step is to determine the key length (n) by finding identical segments of plaintext that encrypt to the same ciphertext when they are b positions apart, where b=0 mod n. Next, identify all identical segments longer than 3 and record the distance between them. This information can help predict the length of the key (n) according to Kasiski [7].

The key is discovered through a thorough search of all possible combinations in the keyspace. This involves substituting various values for n to generate substrings. By back substituting the key into the cipher, the plaintext message can be automatically identified. This process is repeated for different values of n until the actual key is found, revealing the encrypted plaintext.

The process of breaking a key to uncover the plaintext can be time-consuming when dealing with long key lengths. This is because the keyspace value, which represents the number of possible keys, increases significantly for larger keys. Frequency based attacks have long been employed to break traditional encryption algorithms. These attacks take advantage of the fact that traditional encryption algorithms maintain the statistical properties of the original language. One method to overcome frequency based attacks is to encrypt blocks of characters instead of individual letters [7].

This would ensure that the plaintext and ciphertext do not have the same encryption

for the same text. For example, if we use the Caesar cipher encryption scheme, the word "ADDITIONAL" will be encrypted to "CFFKVKQPCN". We can observe that the letters A, D, and I are repeated multiple times and each time they are encrypted to C, F, and K respectively. This repetition can be exploited in frequency analysis to identify redundant characters and map them back to the original plaintext. However, using a block encryption scheme prevents this occurrence. In a block encryption scheme, the plaintext is divided into blocks of data, which are then encrypted as a whole block rather than as individual characters. This reduces the likelihood of two blocks producing the same ciphertext. Another method to counter frequency analysis is to use synonyms instead of repeating the same word in a sentence.

The English language offers many words with multiple synonyms, allowing for the flexibility to choose the most appropriate word based on context. Checking grammar is important to ensure that replacing words does not change the intended meaning of a sentence. Attackers may attempt to undermine this method by creating a list of optimal synonyms. However, this would be ineffective as different words can be used each time to convey the same meaning, making any advantage gained from this strategy useless. The term in cryptography for using alternative words to bypass cryptanalysis attacks is "Homophones" [7]. Another effective technique for countering cryptanalysis is Polyalphabetic substitution, which involves using multiple alphabets to encrypt the message instead of repeatedly using the same substitution technique. The Vigenere Cipher is an example of a Polyalphabetic cipher.

This guarantees that every character in a message is encrypted with a unique

ciphertext alphabet, thereby preventing the direct analysis of cipher frequencies to reveal the original message. However, additional methods must be used to determine the length of the encryption key. If this information can be acquired, frequency analysis can then be utilized to decrypt the initial plaintext message successfully. In order to counteract frequency analysis, one possible solution is to "encrypt a single character of plaintext with two ciphertext characters" [3].

One technique for ensuring security is to encrypt the message by substituting different characters when encountering the same character twice. This can be accomplished by using a key size that is twice as large as the plaintext and encrypting each character with two distinct key values. This ensures that no two plaintext characters will have identical ciphertext characters, thereby making it difficult to crack the cipher through frequency analysis. Contemporary encryption algorithms and cryptanalysis methods have enhanced this approach by employing block encryption instead of encrypting individual characters, which removes redundancy in ciphertext alphabets corresponding to similar plaintext alphabets.

Block ciphers are essential for creating protocols in shared-key cryptography. These ciphers are represented as E: {0, 1}k A?- {0, 1}n A?A?a‚¬A ‘ {0, 1}n. E takes a k-bit string and an n-bit string as inputs and produces an n-bit string as output [2]. The key is the first input and is used to encrypt the confidential message. The plaintext is the second input and is transformed into ciphertext. The key-length k and block-length n are specific parameters associated with each block cipher.

Block ciphers vary depending on their design and can range from AES, Triple-DES, Blowfish, CAST, and IDEA. Trusted symmetric ciphers include these algorithms. For public-key cryptography,

commonly used systems are RSA and Diffie-Hellman. These systems have not shown any vulnerabilities to date. In an ideal scenario, the publicly specified algorithm for block cipher would be E.

"In typical usage, a random key K is chosen and kept secret between a pair of users. The function EK is used by the sender to encrypt the message, for a given key, before sending it to the intended receiver, who decrypts the message using the same key" [2]. Security relies on the secrecy of the key. Therefore, initially, one might consider the objective of the cryptanalyst as uncovering the key K when given some ciphertext intercepted during transmission. The block cipher must be designed to make this task computationally challenging.

The encryption algorithms employed to secure messages must be designed with intricate mathematics that make it impossible to derive the original message from the encrypted form. The length of the encryption key is a crucial factor in determining the algorithm's effectiveness. Key length is typically measured in bits, and strong ciphers usually have key lengths ranging from 128 to 256 bits. An algorithm is considered strong if no successful attempts have been made to exploit vulnerabilities in it after extensive analysis. Consequently, the most efficient method for decrypting a message without knowledge of the encryption key is through a brute force attack.

With all possible keys, the effort needed to decrypt a message depends on the number of keys, known as the keyspace. By knowing the computer's speed in breaking the key, it is simple to calculate the time it would take to search the keyspace and crack a specific cipher [2]. As an example, let's

consider a cipher that uses 128-bit keys. Each bit in the key can be either 0 or 1, resulting in approximately 2128 or 3.4x1038 keys. If we imagine that ten billion computers are assigned the task of deciphering the code and each computer can test ten billion keys per second, then it would take around 1.3x1018 seconds, or roughly 100 billion years, to go through the entire keyspace. However, in reality, only half of the keyspace needs to be tested to find the correct key, which would take about 50 billion years.

This paragraph demonstrates the impracticality of cracking modern cryptographic algorithms through Brute Force attacks by mentioning that their estimated age is longer than the age of the universe according to cosmology. This highlights the effectiveness and resistance of modern cryptographic algorithms against cryptanalytic attacks. In conclusion, recent progress in cryptography has resulted in successful defense against various forms of cryptanalytic attacks. However, frequency analysis based attacks have been able to exploit weaknesses in traditional encryption algorithms and reveal the encrypted plaintext message.

The nature of the natural language used to encrypt messages is not random, which makes it vulnerable to attacks that rely on frequency counting. By analyzing the frequency of letters in the encrypted text, it is possible to guess the characters in the original message based on their redundancy rate and letter combinations. However, this weakness can be addressed by using stream ciphers, as they don't carry over the redundancy from the original message to the encrypted text. In contrast, modern block ciphers encrypt a block of text into ciphertext and vice versa, effectively eliminating the language redundancy during encryption. While the

algorithm used is important, the length of the encryption key employed in block ciphers plays a crucial role in resisting cryptanalysis attempts. Modern ciphers employ key lengths starting from 128 bits, which makes it virtually impossible to decrypt the message through brute force attacks.

The longer the key length, the longer it takes to break these ciphers. Additionally, these cryptographic algorithms have become more popular in the security community due to these advantages. At this time, there are no known weaknesses in these algorithms that would allow someone to identify the plaintext message. Here are some references for further reading on this topic:

[1] Stallings, W., Cryptography and Network Security, Chapter 1, Third Edition, Prentice Hall, 2003

[2] Schneier, B., Applied Cryptography, Chapter 1, Second Edition, John Wiley & Sons, New York City, New York, USA, 1996

[3] Hart, G.W., To Decode Short Cryptograms, Communications of the ACM 37(9), 1994, pp. 102-108

[4] Lee, K.W., Teh, C.E., Tan, Y.L., Decrypting English Text Using Enhanced Frequency Analysis, National Seminar on Science, Technology and Social Sciences (STSS 2006), Kuantan, Pahang, Malaysia

[5] Zipf, GK., Human Behaviour and the Principle of Least Effort, 1949, Cambridge: Addison Wesley Publications.

[6] Lewand, R.E., Cryptological Mathematics, The Mathematical Association of America, 2000, Pages 345-346

[7] Stamp, M and Low, R.M., Applied Cryptanalysis, 2007, Chapter 1 and 2, John Wiley & Sons, New York City, New York, USA

[8] http://www.simonsingh.net, Online internet frequency analysis tools

[9] http://www.textalyser.net, online text analysis and frequency analysis information

Get an explanation on any task
Get unstuck with the help of our AI assistant in seconds
New