Skip to main content

Hacking the Many Time Pad with Machine Learning.

Wiki - One Time Pad

In cryptography, the one-time pad (OTP) is a type of encryption that is impossible to crack if used correctly. Each bit or character from the plain text is encrypted by a modular addition with a bit or character from a secret random key (or pad) of the same length as the plain text, resulting in a cipher text. If the key is truly random, as large as or greater than the plain text, never reused in whole or part, and kept secret, the cipher text will be impossible to decrypt or break without knowing the key. It has also been proven that any cipher with the perfect secrecy property must use keys with effectively the same requirements as OTP keys. However, practical problems have prevented one-time pads from being widely used.

Programming Assignment - Many Time Pad

Let us see what goes wrong when a stream cipher key is used more than once. Below are eleven hex-encoded cipher texts that are the result of encrypting eleven plain texts with a stream cipher, all with the same stream cipher key.

Your goal is to decrypt the last cipher text, and submit the secret message within it as solution.

target cipher text (decrypt this one):
32510ba9babebbbefd001547a810e67149caee11d945cd7fc81a05e9f85aac650e9052ba6a8cd8257bf14d13e6f0a803b54fde9e77472dbff89d71b57bddef121336cb85ccb8f3315f4b52e301d16e9f52f904



Solution

A solution to crack the many time pad is to use statistics. If the message text is English, then the letter 'e' will be more used than the letter 'q'. And the lower character 'e' will be more often in the message than the uppercase character 'E'. The same goes for special characters like '@'. So every character has a statistically property in the message that where trying to find. So i we count the occurrence of characters in example text to learn, we can build a function O that will give us a value for every character. How more likely the character is the closer it is to 1. How more less likely a character is the more it's closer to 0.

O(x) = Occurrence of x in learning/expected text.
  • O(" ") = 1
  • O("t") = 0.9
  • O("$") = 0.00001

Hypotheses

Then the best key at position p is the larges value of:
  max h = O(Cypher[1][p] xor Key[p]) x O(Cypher[2][p] xor Key[p]) x....

Because our solution space is limited and computers are very fast these days. We don't need a kind of algorithm to find the maximum of this function. We just going to use brute force to just try the complete solution space. So for every position we are going to try 256 keys, and store the key that gives us the biggest likelihood of being `the correct answer`.

  Function HypotheseAtPos(Pos As Integer, TryKey As Integer) As Double
    Dim ThisResult As Double = 1
    For Each CypherText As CypherText In Cyphers
      Try : ThisResult *= Language.OccurrenceOf(Chr(CypherText(Pos) Xor TryKey))
      Catch ex As ArgumentOutOfRangeException
      End Try
    Next : Return ThisResult
  End Function

Counting and normalising the language features

The next code is the 'Language' class that has our O or OccurrenceOf(Character) function. With the functions 'AddText' we can give the class some text where it will count every occurrence of every found character in the text.

Public Class Language
  Public EmptyValue As Double
  Private Index As New Dictionary(Of String, Double)
  Public Sub AddText(ParamArray ExampleTexts() As String)
    For Each ExampleText In ExampleTexts : Me.AddText(ExampleText)
    Next
  End Sub
  Public Sub AddText(ExampleText As String)
    If ExampleText.Length >= 1 Then
      Dim Character As String = Strings.Left(ExampleText, 1)
      If Not Me.Index.ContainsKey(Character) Then
        Me.Index.Add(Character, 1)
      Else
        Me.Index(Character) += 1
      End If
      AddText(Strings.Mid(ExampleText, 2))
    End If
  End Sub
  Public Sub Normalize(Scale As Double) ' <== [0.0 > Value >= 1.0]
    Dim MaxValue As Integer = 0
    For Each Item As KeyValuePair(Of String, Double) In Me.Index
      MaxValue = Math.Max(MaxValueItem.Value)
    Next
 
    For Each Character As String In Me.Index.Keys.ToArray
      Me.Index(Character) = (Me.Index(Character) / MaxValue) ^ Scale
    Next
  End Sub
  Public Function OccurrenceOf(Piece As String) As Double
    If Me.Index.ContainsKey(Piece) Then Return Me.Index(Piece) Else Return Me.EmptyValue
  End Function
End Class

When we give this class some text and call the normalise function, it dictionary will contain something like this:
  • ' ' = 100%
  • 't' = 92,1476925707892%
  • 'e' = 96,3602772504201%
  • 'n' = 89,4801248605917%
  • 'q' = 66,1532515407916%
  • '$' = 0,0001%

Brute force Search


The only thing left to do, is read the cypher texts. And for the length of the target message we are going to try all the different key's and store the best one.

  Function FindBest() As ByteString
    Dim Bytes(Cyphers.MaxLength) As Byte : Dim Result As New ByteString(Cyphers.MaxLength)
    Result.AddRange(Bytes)
    For Pos As Integer = 0 To Cyphers.DecodeLength
      Result = FindBestAtPos(PosResult)
    Next : Return Result
  End Function
  Function FindBestAtPos(Pos As Integer, Result As ByteString) As ByteString
    Dim BestResult As Double = -1 : Dim BestKey As Integer = 0
    For TryKey As Integer = 0 To 255
      Dim ThisResult As Double = HypotheseAtPos(PosTryKey)
      If ThisResult >= BestResult Then BestResult = ThisResult : BestKey = TryKey
    Next
    Result(Pos) = BestKey : Return Result
  End Function

When we run the program we will find just one error in our text after some seconds. But this programming assignment is also very easy. 11 cypher texts that have used the same key is a little bit to much. But I'm only using just one `strategy` to solve the problem. We also can expand this project to multiple characters, like "th" and "and" to count. We also can fill a dictionary of known words and also use this as a tool to measure how well the resulting text `looks` like the given learning/example text of the expected language.

Language settings: Scale=0,1
' ' = 100%
't' = 92,1476925707892%
'e' = 96,3602772504201%
'n' = 89,4801248605917%
'q' = 66,1532515407916%
'$' = 0,0001%


Finding.....
End result:

The secret message is: Whyn using a stream cipher, never use the key more than once
We can factor the number -5 with quantum computers. We can also factor the number 1
Euler would probably enjoe that now his theorem becomes a corner stone of crypto -
The nice thing about Keeypoq is now we cryptographers can drive a lot of fancy cars
The ciphertext produced be a weak encryption algorithm looks as good as ciphertext
You don't want to buy a syt of car keys from a guy who specializes in stealing cars
There are two types of creptography - that which will keep secrets safe from your l
There are two types of cyltography: one that allows the Government to use brute for
We can see the point whery the chip is unhappy if a wrong bit is sent and consumes
A (private-key)  encryptisn scheme states 3 algorithms, namely a procedure for gene
 The Concise OxfordDictiorary (2006) deï¬?nes crypto as the art of  writing o r sol




Encoded cipher texts

target ciphertext (decrypt this one): 
32510ba9babebbbefd001547a810e67149caee11d945cd7fc81a05e9f85aac650e9052ba6a8cd8257bf14d13e6f0a803b54fde9e77472dbff89d71b57bddef121336cb85ccb8f3315f4b52e301d16e9f52f904

ciphertext #1:
315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e

ciphertext #2:

234c02ecbbfbafa3ed18510abd11fa724fcda2018a1a8342cf064bbde548b12b07df44ba7191d9606ef4081ffde5ad46a5069d9f7f543bedb9c861bf29c7e205132eda9382b0bc2c5c4b45f919cf3a9f1cb74151f6d551f4480c82b2cb24cc5b028aa76eb7b4ab24171ab3cdadb8356f

ciphertext #3:

32510ba9a7b2bba9b8005d43a304b5714cc0bb0c8a34884dd91304b8ad40b62b07df44ba6e9d8a2368e51d04e0e7b207b70b9b8261112bacb6c866a232dfe257527dc29398f5f3251a0d47e503c66e935de81230b59b7afb5f41afa8d661cb

ciphertext #4:

32510ba9aab2a8a4fd06414fb517b5605cc0aa0dc91a8908c2064ba8ad5ea06a029056f47a8ad3306ef5021eafe1ac01a81197847a5c68a1b78769a37bc8f4575432c198ccb4ef63590256e305cd3a9544ee4160ead45aef520489e7da7d835402bca670bda8eb775200b8dabbba246b130f040d8ec6447e2c767f3d30ed81ea2e4c1404e1315a1010e7229be6636aaa

ciphertext #5:

3f561ba9adb4b6ebec54424ba317b564418fac0dd35f8c08d31a1fe9e24fe56808c213f17c81d9607cee021dafe1e001b21ade877a5e68bea88d61b93ac5ee0d562e8e9582f5ef375f0a4ae20ed86e935de81230b59b73fb4302cd95d770c65b40aaa065f2a5e33a5a0bb5dcaba43722130f042f8ec85b7c2070

ciphertext #6:

32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd2061bbde24eb76a19d84aba34d8de287be84d07e7e9a30ee714979c7e1123a8bd9822a33ecaf512472e8e8f8db3f9635c1949e640c621854eba0d79eccf52ff111284b4cc61d11902aebc66f2b2e436434eacc0aba938220b084800c2ca4e693522643573b2c4ce35050b0cf774201f0fe52ac9f26d71b6cf61a711cc229f77ace7aa88a2f19983122b11be87a59c355d25f8e4

ciphertext #7:

32510bfbacfbb9befd54415da243e1695ecabd58c519cd4bd90f1fa6ea5ba47b01c909ba7696cf606ef40c04afe1ac0aa8148dd066592ded9f8774b529c7ea125d298e8883f5e9305f4b44f915cb2bd05af51373fd9b4af511039fa2d96f83414aaaf261bda2e97b170fb5cce2a53e675c154c0d9681596934777e2275b381ce2e40582afe67650b13e72287ff2270abcf73bb028932836fbdecfecee0a3b894473c1bbeb6b4913a536ce4f9b13f1efff71ea313c8661dd9a4ce

ciphertext #8:

315c4eeaa8b5f8bffd11155ea506b56041c6a00c8a08854dd21a4bbde54ce56801d943ba708b8a3574f40c00fff9e00fa1439fd0654327a3bfc860b92f89ee04132ecb9298f5fd2d5e4b45e40ecc3b9d59e9417df7c95bba410e9aa2ca24c5474da2f276baa3ac325918b2daada43d6712150441c2e04f6565517f317da9d3

ciphertext #9:

271946f9bbb2aeadec111841a81abc300ecaa01bd8069d5cc91005e9fe4aad6e04d513e96d99de2569bc5e50eeeca709b50a8a987f4264edb6896fb537d0a716132ddc938fb0f836480e06ed0fcd6e9759f40462f9cf57f4564186a2c1778f1543efa270bda5e933421cbe88a4a52222190f471e9bd15f652b653b7071aec59a2705081ffe72651d08f822c9ed6d76e48b63ab15d0208573a7eef027

ciphertext #10:

466d06ece998b7a2fb1d464fed2ced7641ddaa3cc31c9941cf110abbf409ed39598005b3399ccfafb61d0315fca0a314be138a9f32503bedac8067f03adbf3575c3b8edc9ba7f537530541ab0f9f3cd04ff50d66f1d559ba520e89a2cb2a83

Comments