Greedy Goblin

Thursday, October 6, 2016

Unbreakable encryption for texts

This idea has been bugging me for a long time. Encrypting messages is vital for any privacy, yet we know from Snowden that via backdoors and huge server farms the NSA can break anything. From the various leaks embarrassing the USA, we know that the Russians can break anything too. So I've been thinking about an encryption method that cannot be broken with backdoors and server farms, while the password (symmetric key) can still be memorized.

This would only work for texts written in a grammatically simple fashion. You can't encode pictures, videos or poetry. Every encryption performs some transformation on the input to create scrambled data that can be decoded with the password/private key/whatnot. This is no different, but the transformation replaces each word with a dictionary word that grammatically fits the sentence. For our example, let the dictionary contain just 3 words for each grammatical category. The sentence I want to encode is "Joe goes to the cinema", which has the categories "who", "what" and "where". Since we have just 3 words per category in our example dictionary, this provides only 27 possible scrambled sentences for 27 passwords (the first digit picks the "who" word, the second the "what", the third the "where"):
  • 000: Joe goes to the cinema
  • 001: Joe goes to the zoo
  • 002: Joe goes to the kitchen
  • 010: Joe jumps to the cinema
  • 011: Joe jumps to the zoo
  • 012: Joe jumps to the kitchen
  • 020: Joe abandons the cinema
  • 021: Joe abandons the zoo
  • 022: Joe abandons the kitchen
  • 100: Kate goes to the cinema
  • 101: Kate goes to the zoo
  • 102: Kate goes to the kitchen
  • 110: Kate jumps to the cinema
  • 111: Kate jumps to the zoo
  • 112: Kate jumps to the kitchen
  • 120: Kate abandons the cinema
  • 121: Kate abandons the zoo
  • 122: Kate abandons the kitchen
  • 200: I go to the cinema
  • 201: I go to the zoo
  • 202: I go to the kitchen
  • 210: I jump to the cinema
  • 211: I jump to the zoo
  • 212: I jump to the kitchen
  • 220: I abandon the cinema
  • 221: I abandon the zoo
  • 222: I abandon the kitchen
It's trivial to create all 27 variants manually. If the dictionary had 1000 words for each category, a computer could still try all 1000³ (one billion) keys in short order. If the key is not a number but something you can remember, the number of likely passwords will only be a few tens of thousands anyway. The trick is that the transmitted scrambled text has no checksum that would let a computer determine whether a key is good or not. In a usual encryption method, if you use a wrong key, you get a definitive "bad key" answer. Computers can just keep trying until they find the right one. Here there is no such response. If the correct key was "122" (so the transmitted sentence was "Kate abandons the kitchen") and you tried "021", you get "Kate goes to the zoo" instead of "bad key". Now what?
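The post doesn't spell out how a password digit becomes a replacement word. One reading that reproduces every example above is to store the sentence as word indices, one per category, and add each key digit to the matching index modulo the category size (decryption subtracts it). A minimal sketch under that assumption; the dictionaries are just the 3-word example, and `render`, `shift`, `encrypt` and `decrypt` are hypothetical names:

```python
# Hypothetical dictionaries: the 3-word example from the post.
WHO = ["Joe", "Kate", "I"]
# Verb phrases as (third-person form, first-person form) so "I" gets "go", not "goes".
WHAT = [("goes to", "go to"), ("jumps to", "jump to"), ("abandons", "abandon")]
WHERE = ["cinema", "zoo", "kitchen"]
SIZES = (len(WHO), len(WHAT), len(WHERE))


def render(idx):
    """Turn a (who, what, where) index triple into a sentence."""
    who, what, where = idx
    verb = WHAT[what][1] if WHO[who] == "I" else WHAT[what][0]
    return f"{WHO[who]} {verb} the {WHERE[where]}"


def shift(idx, key, sign):
    """Add (sign=+1) or subtract (sign=-1) each key digit modulo the category size."""
    digits = [int(c) for c in key]
    return tuple((i + sign * k) % n for i, k, n in zip(idx, digits, SIZES))


def encrypt(idx, key):
    return shift(idx, key, +1)


def decrypt(idx, key):
    return shift(idx, key, -1)


plain = (0, 0, 0)                       # "Joe goes to the cinema"
cipher = encrypt(plain, "122")
print(render(cipher))                   # Kate abandons the kitchen
print(render(decrypt(cipher, "122")))   # Joe goes to the cinema
print(render(decrypt(cipher, "021")))   # Kate goes to the zoo
```

Under this reading, the list above is exactly what "Joe goes to the cinema" encrypts to under each of the 27 keys.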

All of the possible combinations are grammatically correct, reasonable sentences. Only a human or a true artificial intelligence can determine whether a result makes sense. For a computer, "Kate goes to the zoo" is perfectly OK, even though it's the wrong answer. For a human with contextual information, it's clear that we want nothing in a zoo, and who is Kate anyway? Even if there are only 27 possible passwords, the only way to break this encryption is to show the candidates to a human working in the field who has enough information to rule out every bad one. Good luck doing that with ten thousand possible candidates!
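The "no bad key" argument can be checked on the example, again assuming the per-category modular shift (helper names are hypothetical). Trying all 27 keys on one transmitted sentence yields each of the 27 sentences exactly once, so the scrambled text alone doesn't favor any key; only outside context can. The last line gives the key-space size in bits for 1000-word categories, for comparison with the 128-bit or longer keys of modern ciphers:

```python
import math
from itertools import product

WHO = ["Joe", "Kate", "I"]
WHAT = [("goes to", "go to"), ("jumps to", "jump to"), ("abandons", "abandon")]
WHERE = ["cinema", "zoo", "kitchen"]
SIZES = (len(WHO), len(WHAT), len(WHERE))


def render(idx):
    who, what, where = idx
    verb = WHAT[what][1] if WHO[who] == "I" else WHAT[what][0]
    return f"{WHO[who]} {verb} the {WHERE[where]}"


def decrypt(idx, key):
    # Subtract each key digit modulo the category size.
    return tuple((i - k) % n for i, k, n in zip(idx, key, SIZES))


cipher = (1, 2, 2)  # "Kate abandons the kitchen", as transmitted

# Brute force: every key "works" and yields a grammatical sentence.
candidates = [render(decrypt(cipher, key)) for key in product(*(range(n) for n in SIZES))]
print(len(candidates), len(set(candidates)))  # 27 27

# Key space for 1000 words in each of the 3 categories, in bits.
print(round(math.log2(1000 ** 3), 1))  # 29.9
```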


Anonymous said...

You don't get a "bad key" answer like that. Usually it's only after the information has been decrypted that you see the key was wrong, because you get gibberish instead of what you expected. Systems that can specifically identify a bad key are just the user-convenience ones, mainly used for authentication rather than for encryption proper.

But on this one, you're simply late. Meaning-based anagrams have been in use for at least 2 centuries. Computers have almost made this one obsolete as well, because, given a big enough sample of data, they can help a human cross out the candidates that do not make sense and apply those eliminations to the rest of the data, until for some message only one variant remains valid; that variant reveals the key, which then decrypts the rest of the text. With some tricks the process can even run without a human: assuming the conversation is actually connected and makes sense, a computer can heuristically present the most probable decryption.
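The crossing-out described here can be sketched on the post's 3-word example, assuming the same per-category modular shift and one key reused for several messages. `plausible` is a hypothetical stand-in for the human's contextual judgment (in the spirit of the post's own "we want nothing in a zoo and who is Kate?"); each message's eliminations carry over to the next:

```python
from itertools import product

SIZES = (3, 3, 3)  # who, what, where (the 3-word example)


def decrypt(idx, key):
    return tuple((i - k) % n for i, k, n in zip(idx, key, SIZES))


def plausible(idx):
    """Hypothetical context: Kate (who=1) isn't involved, nobody jumps (what=1),
    and nobody goes to the zoo (where=1)."""
    return 1 not in idx


# Three intercepted sentences, all encrypted with the same unknown key.
intercepted = [(1, 2, 2), (0, 2, 1), (1, 1, 2)]

candidates = set(product(*(range(n) for n in SIZES)))  # all 27 keys
for cipher in intercepted:
    # Cross out every key that turns this message into nonsense.
    candidates = {k for k in candidates if plausible(decrypt(cipher, k))}
    print(len(candidates))  # 8, then 2, then 1

print(candidates)  # {(1, 2, 2)}
```

Three short messages are enough to pin down the key here; with real dictionaries it takes more traffic, but the principle is the same.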

Modern encryption systems are built on the assumption that everything about their internal structure is known and that keys can be compromised at any moment. In fact, most of them have never been cracked. The cracks are usually the fault of people: making mistakes, neglecting proper procedures, leaking keys. So far, the successful attacks on SSL/TLS have all involved implementation errors (Heartbleed), channel tampering (MitM), stolen keys, a compromised RNG or physical server access; with current technology it takes $1 million and a year to directly brute-force a 1024-bit key, which (on a properly configured server) lets you read the contents of one session. A 2048-bit key would keep you safe for another 17 years should Moore's law hold (and it probably won't).

tl;dr: already used, already obsolete. It's anagram-based encryption at its core, and computers have dealt with it since day 1 of their use in codebreaking.

Tal said...

There are several issues with this scheme:
* Even a person can't tell whether Joe or Kate went to the zoo if there is no other context
* You have to share the password between the communicating parties
* Grammatically correct sentences are not enough: encryption is used as part of communication protocols for binary messages. This also doesn't account for "metadata" that people would also want encrypted.
* Verification of the decrypted message is also important to prevent forging messages
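The last point can be illustrated with the same 3-word sketch (assuming the per-category shift; helper names are hypothetical). Because decryption never fails, someone who intercepts the sentence can change it in a predictable way without knowing the key, and the recipient has no way to notice:

```python
WHO = ["Joe", "Kate", "I"]
WHAT = [("goes to", "go to"), ("jumps to", "jump to"), ("abandons", "abandon")]
WHERE = ["cinema", "zoo", "kitchen"]
SIZES = (len(WHO), len(WHAT), len(WHERE))


def render(idx):
    who, what, where = idx
    verb = WHAT[what][1] if WHO[who] == "I" else WHAT[what][0]
    return f"{WHO[who]} {verb} the {WHERE[where]}"


def encrypt(idx, key):
    return tuple((i + k) % n for i, k, n in zip(idx, key, SIZES))


def decrypt(idx, key):
    return tuple((i - k) % n for i, k, n in zip(idx, key, SIZES))


key = (1, 2, 2)                  # shared secret; the attacker never learns it
sent = encrypt((0, 0, 0), key)   # "Joe goes to the cinema"

# In transit, the attacker bumps the "where" slot to the next word.
tampered = (sent[0], sent[1], (sent[2] + 1) % SIZES[2])

print(render(decrypt(sent, key)))      # Joe goes to the cinema
print(render(decrypt(tampered, key)))  # Joe goes to the zoo
```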

Hanura H'arasch said...

The NSA can't break modern crypto itself, or at least we have no evidence that suggests that. The best thing they /may/ be able to break is 1024 bit RSA with tons of specialized hardware and time.

But it's stupid to attack the cipher itself, as it's the most robust part of modern crypto, and the NSA knows that. So we get side-channel attacks, backdoors in operating systems and the like.

"In a usual encryption method if you use a wrong key, you get a definitive "bad key" answer."

This is a misconception. AES itself doesn't care what input it gets; it will always produce some result. A separate construction, authenticated encryption (AEAD, e.g. AES-GCM), is what produces your "bad key" answer.

Anonymous said...

I agree with the first Anonymous. The issue is not breaking encryption but circumventing it. The NSA doesn't have to break SSL if they get a trojan onto your machine.
Stealing emails also isn't (usually) an encryption issue.

dobablo said...

A fixed substitution cipher? I think your cryptography has skipped the 20th century. Since you are using it on text messages, don't use the standard ASCII character set. Assign emoji.

Test said...

Congratulations you just invented one time pad!

Basil said...

Currently, as another commenter mentioned, there is still unbreakable practical encryption. Modern public key encryption won't be breakable for some time yet. Computers, either regular or especially quantum, will eventually make everything sent via public key cryptography readable.

If you're willing to use a much less practical form of encryption, as another commenter mentioned, you can use "one-time pads", which are literally mathematically impossible to break. The impracticality is a big deal, though: you need to securely pre-share as much truly random key data as all the messages you will ever exchange. Truly random data is actually quite expensive to make, and anything short of truly random keys will be crackable by a computer. Exchanging the key material without being able to encrypt it is also logistically challenging and moves part of the final message's security into the real-world realm of cloak-and-dagger security. One meat-space slip-up will cost you the security of all your communication.
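A byte-level one-time pad is only a few lines; what the sketch can't provide is the hard part described above, the randomness and the secure exchange of the pad. `secrets.token_bytes` draws from the operating system's cryptographically secure generator, used here as an assumed stand-in for the truly random source a real pad calls for; the function names are hypothetical:

```python
import secrets


def otp_encrypt(message: bytes):
    """Return (pad, ciphertext). The pad is as long as the message and must never be reused."""
    # Stand-in for a truly random source: the OS's cryptographically secure generator.
    pad = secrets.token_bytes(len(message))
    return pad, bytes(m ^ p for m, p in zip(message, pad))


def otp_decrypt(pad: bytes, ciphertext: bytes) -> bytes:
    return bytes(c ^ p for c, p in zip(ciphertext, pad))


msg = b"Joe goes to the cinema"
pad, ct = otp_encrypt(msg)
print(otp_decrypt(pad, ct))  # b'Joe goes to the cinema'

# For ANY other message of the same length there is a pad that "decrypts" the
# same ciphertext to it, so the ciphertext alone rules nothing out.
fake = b"Kate abandons the zoo!"
fake_pad = bytes(c ^ f for c, f in zip(ct, fake))
print(otp_decrypt(fake_pad, ct))  # b'Kate abandons the zoo!'
```

The second half is why the scheme is "mathematically impossible to break": every same-length plaintext is equally consistent with the ciphertext.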

Titus Tallang said...

Don't. Just don't. Your attempts at cryptography are so laughably outdated that anyone that actually works in the field cannot help but chuckle.

The weakness of modern crypto is not that it can be broken computationally, which it cannot (in reasonable time) assuming some common conjectures, most notably P ≠ NP, which is an open question in itself.
The "weakness" here against state-level attackers is access to one side of the communication. For example, if you're chatting over Facebook's IM service or whatever, your message is not sent encrypted from user A to user B - instead, it goes from user A to Facebook's servers (using encryption negotiated between user A and the server), and then is transferred from Facebook's servers to user B (using encryption negotiated between those two). The weakness here is not the transfer between the parties involved, but the fact that one of the parties (most likely Facebook's servers) has a (voluntary or otherwise) method of data extraction available to the attacker, since the server has the plain text content of the message.

There are other messaging systems out there (like Telegram's "secret chats") that actually employ end-to-end encryption, where the server does not have a plain text version of the message (the encryption is negotiated between A and B directly), and thus cannot be used as an access vector to the contents. Of course, this is just an example; using your own implementation would be safer (what tells you that the Telegram app doesn't have a backdoor built in?), unless you end up messing it up.

tl;dr: It's not an issue of crypto being weak, it's an issue of the server-client model where the server has all the plain text messages (and state-level attackers can access that server). That is why privacy advocates are up in arms about companies logging/storing user data without consent: crypto from you to Google doesn't help you if Google then hands over the decrypted data to the NSA.

User-to-user encryption is safe assuming proper crypto strength.

Gevlon said...

@Titus: but the fact is that the various agencies get access to practically every system (as you mentioned, even Telegram). They probably also gather raw traffic from ISPs. Anything goes out of your computer is seen by them. You both practically and often legally cannot create a secure connection. So if you want to communicate securely with another party, you must design your own encryption system and transmit the information over the compromised internet, preferably in a way that doesn't raise flags by itself. Remember, in several countries encryption itself is illegal.

Tweeting "Kate jumped the zoo" is probably ignored by everyone, and even if someone suspects it isn't innocent, they can't prove it and would need human brainpower to determine whether you are sending an encrypted message or simply tweeting while smoking pot.

Antze said...

The things you describe are a combination of the already mentioned one-time pads and steganography.

Both approaches are used for serious secrets, nothing new here.

Encoding messages through grammatically correct phrases is generally not needed and lacks power for some tasks (you can transmit a short signal message, but you can't encode the blueprints of a missile), but it's a valid method for "casual encryption". For encoding hardcore things, there are other methods of hiding secret data in something that doesn't draw attention (e.g. the colors of random pixels in a selfie).

Even without steganographic tricks, one-time pads are cryptographically invulnerable, but impractical for wide use, since for them to work you need to generate a new key for every message (and transfer it through a separate safe channel). But if you use the same key over and over, and someone sees through your linguistic disguise, it won't be very hard to break.
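Why reuse is fatal can be shown with byte-level pads (XOR, the usual computer form of the one-time pad, rather than the word-substitution scheme of the post). If the same pad encrypts two messages, an eavesdropper can cancel the pad entirely without ever learning it; `xor` is a hypothetical helper:

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


pad = secrets.token_bytes(22)            # one pad...
m1 = b"Joe goes to the cinema"
m2 = b"I abandon the kitchen."           # ...mistakenly used for two messages
c1, c2 = xor(m1, pad), xor(m2, pad)

# The eavesdropper XORs the two ciphertexts: the pad cancels out.
leak = xor(c1, c2)
print(leak == xor(m1, m2))  # True

# Guess (or learn) one message and the other falls out immediately.
print(xor(leak, m1))  # b'I abandon the kitchen.'
```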

>> Anything goes out of your computer is seen by them.

Not really.

>> You both practically and often legally cannot create a secure connection.

Practically, you can. To do it reliably, though, you need to understand the concepts of cryptography, and not just believe that this nice secure instant messenger does everything for you, and does it right.

Phelps said...

If you are going to exchange a key, just exchange one time pads. If you reuse your key in this system, latent semantic analysis is going to burn you quicker than you think (if they care enough to keep looking.)

Just make sure you are using a truly random number generator, and not a pseudo-random system like the one in every OS (including Linux). You need it seeded from a natural random source, like cosmic radiation or a good set of casino dice thrown properly.
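The comment doesn't say how to turn dice throws into key material. One common approach, shown here as a hedged sketch with a hypothetical function name, is rejection sampling: combine two rolls into 36 equally likely values and discard the ones that don't map evenly onto 26 letters, so no letter is favored. The rolls would come from physical dice; the list below is just sample input:

```python
import string


def dice_to_letters(rolls):
    """Turn pairs of fair six-sided die rolls (1-6) into unbiased letters A-Z.

    Two dice give 36 equally likely values (0-35); values 26-35 are thrown
    away so that each of the 26 letters stays equally likely.
    """
    letters = []
    for first, second in zip(rolls[0::2], rolls[1::2]):
        value = (first - 1) * 6 + (second - 1)  # uniform over 0..35
        if value < 26:
            letters.append(string.ascii_uppercase[value])
    return "".join(letters)


print(dice_to_letters([1, 1, 3, 4, 6, 6, 5, 2]))  # APZ  (the 6-6 pair is discarded)
```

Simply taking the value modulo 26 instead would make the letters A-J noticeably more likely than the rest, which is exactly the kind of bias that breaks a pad.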

maxim said...

If you want to hide from a bigger organization, then a combination of hiding in plain sight and staying off the public transaction record as much as possible is indeed the way to go.

The "agencies" getting their hands on a strong enough AI decryptor is simply a matter of time. And either way you wouldn't want to get into a crypto arms race with an entity with both budget and tech far beyond yours.

Finally, you don't crack a high-security system directly. You crack it by stealing the identity of someone with the level of access you need. Which is why the best way to increase security is to require an ever-increasing number of authentication factors.

Theodora Dunkelmauer said...


A more direct answer:

It's your computer that encrypts your communication, so your ISP can't read anything unless it is the communication endpoint (or runs a man-in-the-middle attack, which you should counter with proper certificates).
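As an illustration of "proper certificates" on the client side, Python's standard `ssl` module checks the server's certificate and hostname by default when you use `ssl.create_default_context()`. `open_verified` is a hypothetical helper; it isn't called here because it needs a network connection:

```python
import socket
import ssl


def open_verified(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection that fails unless the server proves it is `host`."""
    # Loads the system's trusted CA certificates and enables certificate
    # verification plus hostname checking.
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port))
    try:
        # A man-in-the-middle without a valid certificate for `host` makes this
        # raise ssl.SSLCertVerificationError instead of silently connecting.
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```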

Also, unless you advertise a file as encrypted, it's nearly impossible to tell whether it is encrypted or just a fairly random but legitimate file format you don't know how to open; both are, after all, a sequence of 0s and 1s. But if you really want to hide it, put it into an image file.

Titus Tallang said...

Raw traffic from ISPs only tells them who you're talking to (as in, which other computer), not what you're exchanging (assuming proper crypto is in place).

Hiding the fact that you're exchanging data is hard, certainly, but if you really wanted to, there are far more effective ways to hide data in seemingly innocuous information. For example, you can encode tons of data into image files by simply changing the color values of a few pixels slightly. Anyone who has the original image could computationally retrieve the data from the difference, while at a glance the fact that the image was modified isn't apparent whatsoever; and these kinds of slight variations commonly occur for entirely innocent reasons, such as the file being re-processed when it is opened in some editors, giving plausible deniability.
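A minimal sketch of the difference-based scheme above, assuming raw 8-bit color values: a plain Python list stands in for one decoded color channel, so no imaging library is needed, and a real photo would have to be saved losslessly (e.g. as PNG) for one-step nudges to survive. In practice the hidden bytes would be ciphertext, not plain text. Function names are hypothetical:

```python
import random


def embed(pixels, message: bytes):
    """Hide each message bit by nudging one 8-bit value: 1 = change by one step, 0 = leave alone."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    out = list(pixels)
    for i, bit in enumerate(bits):
        if bit:
            # Step up, or down if the value is already at the 255 ceiling.
            out[i] = out[i] + 1 if out[i] < 255 else out[i] - 1
    return out


def extract(original, modified, length: int) -> bytes:
    """Recover `length` bytes by comparing with the unmodified image."""
    bits = [int(a != b) for a, b in zip(original, modified)][: length * 8]
    result = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        result.append(byte)
    return bytes(result)


random.seed(1)
photo = [random.randrange(256) for _ in range(1000)]  # stand-in for one color channel
stego = embed(photo, b"zoo at 5")
print(extract(photo, stego, 8))                       # b'zoo at 5'
print(max(abs(a - b) for a, b in zip(photo, stego)))  # 1
```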

Properly encrypted data (which is what you'd encode) cannot be distinguished from random bits (such as those that result from a graphics program re-compressing the image), so there's no way any attacker can "prove" that the photo of your dog you tweeted doesn't contain a secret message that you could retrieve by comparing its color values, pixel by pixel, to another picture of your dog that you sent to someone by email.

Gevlon said...

OK, I see now that I won't change the world in this field. But at least I've learned a lot about cryptography.

However, there is one case where I still believe it has uses: if I can't even trust my computer (see the centrifuge control computers of Iran). Such encoding can be done reasonably quickly with a paper codebook; then I type the encoded message into Twitter on the infected computer and no one is the wiser.

Hanura H'arasch said...


Even in this case, one-time pads are still superior; by hand they work like a Vigenère cipher whose key is random and as long as the message. Spies and the like have used them for about a century for precisely that reason.
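The pencil-and-paper procedure is simple enough to check in code: add each plaintext letter to the matching pad letter modulo 26, and subtract to decrypt. Unlike the classical Vigenère cipher, the pad must be random, at least as long as the message, and used only once. A sketch with hypothetical names; the pad below is just sample letters, not a real random pad:

```python
import string

ALPHABET = string.ascii_uppercase


def letters_only(text: str) -> str:
    # Pencil-and-paper pads usually drop spaces and punctuation.
    return "".join(c for c in text.upper() if c in ALPHABET)


def pad_encrypt(plaintext: str, pad: str) -> str:
    p = letters_only(plaintext)
    if len(pad) < len(p):
        raise ValueError("a one-time pad must be at least as long as the message")
    # Vigenère-style: add each plaintext letter to the matching pad letter, mod 26.
    return "".join(ALPHABET[(ALPHABET.index(a) + ALPHABET.index(k)) % 26] for a, k in zip(p, pad))


def pad_decrypt(ciphertext: str, pad: str) -> str:
    # Subtract the pad letters to undo the encryption.
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26] for c, k in zip(ciphertext, pad))


PAD = "XMCKLQPZRFVHWAUDNS"  # sample letters only; a real pad comes from a random source
ct = pad_encrypt("Joe goes to the cinema", PAD)
print(pad_decrypt(ct, PAD))  # JOEGOESTOTHECINEMA
```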

Anonymous said...

There is always quantum cryptography. It already exists in practice as an unbreakable transmission protocol, though it is not yet commercially viable. But as many here have stated, the transmission isn't the weak spot.

99smite said...

Les sanglots longs
Des violons
De l’automne
Blessent mon cœur
D’une langueur
Monotone.

("The long sobs of the violins of autumn wound my heart with a monotonous languor.")

Now, that was a message that informed the French résistance that D-Day was about to happen.

Both sides, London and the Résistance in France, had agreed on this message, but the Wehrmacht and the SD knew about it too...

The German U-boats had not only Enigma but also a codebook with codes for specific messages. Enigma was insecure for technical reasons, but when the enemy has your codebook as well, you'd better not send secrets via Enigma...

To this day, the importance of Bletchley Park and the outstanding work of Alan Turing cannot be overestimated. The way Turing was treated because of his sexual orientation is still a disgrace for England.

Interestingly enough, England passed the information it deciphered to the Red Army, especially before Operation Citadel, the Battle of Kursk. The defenses were built specifically to counter the German attack plans known at the time, and still the Russians suffered incredible losses... Only the invasion of Sicily forced the termination of the operation.

As the other commenters already said: rather than breaking a code, seduce the secretary who copies the decrypted texts and get a copy from her... or use any other means of circumventing the encryption...