Greedy Goblin

Thursday, October 6, 2016

Unbreakable encryption for texts

This idea has been bugging me for a long time. Encrypting messages is vital for any privacy, yet we know from Snowden that via backdoors and huge server farms the NSA can break anything. From the various leaks embarrassing the USA we know that the Russians can break anything too. So I've been thinking about an encryption method that cannot be broken with backdoors and server farms, while the password (symmetric key) can still be memorized.

This would only work for texts written in a grammatically simple fashion. You can't encode pictures, videos or poetry. Every encryption performs some transformation on the input to create scrambled data which can be decoded with the password/private key/whatnot. This is no different, but the transformation is a dictionary replacement: each word is swapped for another word that grammatically fits the sentence. For our example let the dictionary contain just 3 words for each grammatical category. The sentence I want to encode is "Joe goes to the cinema", which has the categories "who", "what" and "where". Since we have just 3 words per category in our example dictionary, this provides only 27 possible scrambled sentences with 27 passwords:
  • 000: Joe goes to the cinema
  • 001: Joe goes to the zoo
  • 002: Joe goes to the kitchen
  • 010: Joe jumps to the cinema
  • 011: Joe jumps to the zoo
  • 012: Joe jumps to the kitchen
  • 020: Joe abandons the cinema
  • 021: Joe abandons the zoo
  • 022: Joe abandons the kitchen
  • 100: Kate goes to the cinema
  • 101: Kate goes to the zoo
  • 102: Kate goes to the kitchen
  • 110: Kate jumps to the cinema
  • 111: Kate jumps to the zoo
  • 112: Kate jumps to the kitchen
  • 120: Kate abandons the cinema
  • 121: Kate abandons the zoo
  • 122: Kate abandons the kitchen
  • 200: I go to the cinema
  • 201: I go to the zoo
  • 202: I go to the kitchen
  • 210: I jump to the cinema
  • 211: I jump to the zoo
  • 212: I jump to the kitchen
  • 220: I abandon the cinema
  • 221: I abandon the zoo
  • 222: I abandon the kitchen
It's trivial to create all 27 combinations manually. If the dictionary had 1000 words for each category, a computer would still be able to check all of them in a second. If the key is not a number but something you can remember, the number of likely passwords will be around a few tens of thousands anyway. The trick is that the transmitted scrambled information has no checksum to allow a computer to determine if a key is good or not. In a usual encryption method if you use a wrong key, you get a definitive "bad key" answer. Computers can just keep trying until they find the right one. Here there is no such response. If the correct key was "122" and you tried "021", you get "Kate goes to the zoo" instead of "bad key". Now what?
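The post doesn't spell out how the key is applied to the words, so here is a minimal Python sketch under one assumption that happens to match the example above: each key digit is added to the word's position in its category (modulo the category size) to encrypt, and subtracted to decrypt. The three-word lists come from the table; the function names and the handling of "I go" versus "Joe goes" are only illustrative.

```python
# Toy dictionary cipher. ASSUMPTION: key digits are added/subtracted
# modulo the category size; the post itself does not define this step.
WHO = ["Joe", "Kate", "I"]
WHAT = [("goes to", "go to"), ("jumps to", "jump to"), ("abandons", "abandon")]
WHERE = ["the cinema", "the zoo", "the kitchen"]
SIZES = (len(WHO), len(WHAT), len(WHERE))


def render(indices):
    """Turn (who, what, where) indices into a sentence."""
    who, what, where = indices
    verb = WHAT[what][1 if WHO[who] == "I" else 0]  # "I go", "Joe goes"
    return f"{WHO[who]} {verb} {WHERE[where]}"


def _shift(indices, key, sign):
    return tuple((i + sign * k) % n for i, k, n in zip(indices, key, SIZES))


def encrypt(indices, key):
    return _shift(indices, key, +1)


def decrypt(indices, key):
    return _shift(indices, key, -1)


plain = (0, 0, 0)                        # "Joe goes to the cinema"
sent = encrypt(plain, (1, 2, 2))
print(render(sent))                      # Kate abandons the kitchen
print(render(decrypt(sent, (1, 2, 2))))  # Joe goes to the cinema
print(render(decrypt(sent, (0, 2, 1))))  # Kate goes to the zoo
```

Every one of the 27 keys decrypts to some well-formed sentence, which is the whole point: nothing in the output tells a program that "021" was the wrong guess.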

All of the possible permutations are grammatically correct, reasonable sentences. Only a human or a true artificial intelligence can determine if a result makes sense or not. For a computer "Kate goes to the zoo" is perfectly OK, even though it's the wrong answer. For a human who has contextual information, it's clear that we want nothing in a zoo, and who is Kate anyway? Even if there are only 27 possible passwords, the only way to break this encryption is to show the permutations to a human who works in the field and has enough information to rule out every bad permutation. Good luck doing that with ten thousand possible permutations!

22 comments:

Anonymous said...

You don't get "bad key" like that. Usually it's only after the information has been decrypted that you see it's the wrong one, by getting gibberish instead of what you expected. Systems which can identify a bad key specifically are just the user-convenience ones, mainly used for authentication rather than true encryption.

But on this one, you're simply late. Meaning-based anagrams have been in use for at least 2 centuries. Computers have almost obsoleted this one as well, because, given a big enough sample of data, they can assist a human in crossing out variations that do not make sense and associatively apply the cross-out to the rest of the data, until for some piece of data only one variation remains valid; that one is taken as the decrypted value and decrypts the rest of the text. With some tricks, the process can even be human-less: assuming the conversation is actually linked and makes sense, the computer can heuristically present the most probable decryption.

Modern encryption systems are built around the assumption that everything about their internal structure is known and keys can be compromised at any moment. In fact, most of them have never been cracked. The cracks are usually the fault of people: making mistakes, neglecting proper procedures, leaking keys. So far the only successful attacks on SSL/TLS involved implementation errors (Heartbleed), channel tampering (MitM), stealing keys, compromising the RNG, or physical server access; but with current technology, it takes $1 million and a year to directly brute-force a 1024-bit key, which (on a properly configured server) will allow you to read the contents of one session. Having a 2048-bit key would keep you safe for another 17 years should Moore's law hold (and it probably won't).

tl;dr already used that, already obsolete, it's an anagram-based encryption at its core, and computers have dealt with it since day 1 of their use in decryption.

Tal said...

There are several issues with this scheme:
* Even a person can't tell whether Joe or Kate went to the zoo if there is no other context
* You have to share the password between the communicating parties
* Grammatically correct sentences are not enough - encryption is used as part of communication protocols for binary messages. This also doesn't account for "metadata" that people would also want encrypted.
* Verification of the decrypted message is also important to prevent forging messages

Hanura H'arasch said...

The NSA can't break modern crypto itself, or at least we have no evidence that suggests that. The best thing they /may/ be able to break is 1024 bit RSA with tons of specialized hardware and time.

But it's stupid to attack the cipher itself, as it's the most robust part of modern crypto, and the NSA knows that. So we get side-channel attacks, backdoors in operating systems and the like.

"In a usual encryption method if you use a wrong key, you get a definitive "bad key" answer."

This is a misconception. AES itself doesn't care what input it gets; it will always produce a valid result. It takes an authenticated mode on top of the cipher, called AEAD (authenticated encryption with associated data), to produce your "bad key" answer.
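To make that difference concrete, here is a rough Python sketch using only the standard library. It is not AES and not a real AEAD mode: the "cipher" is a toy keystream built from SHA-256, and the integrity check is a plain HMAC over the ciphertext (encrypt-then-MAC). All function names are made up for this illustration, and nobody should use it for real secrets.

```python
import hashlib
import hmac


def _subkey(key: bytes, label: bytes) -> bytes:
    # Separate keys for encryption and for the tag (toy derivation).
    return hashlib.sha256(label + key).digest()


def _keystream(key: bytes, length: int) -> bytes:
    blocks = [
        hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def raw_crypt(key: bytes, data: bytes) -> bytes:
    """Bare cipher: XOR with a keystream. The same call encrypts and decrypts."""
    stream = _keystream(_subkey(key, b"enc"), len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append a 32-byte HMAC tag over the ciphertext."""
    ciphertext = raw_crypt(key, plaintext)
    tag = hmac.new(_subkey(key, b"mac"), ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Verify the tag first; only this layer can say 'bad key'."""
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(_subkey(key, b"mac"), ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad key (or tampered message)")
    return raw_crypt(key, ciphertext)


message = b"Joe goes to the cinema"
ciphertext = raw_crypt(b"right key", message)
print(raw_crypt(b"wrong key", ciphertext))  # garbage bytes, but no error

try:
    open_sealed(b"wrong key", seal(b"right key", message))
except ValueError as err:
    print(err)  # bad key (or tampered message)
```

The bare cipher happily "decrypts" with any key; the explicit rejection only appears once a tag is checked.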

Anonymous said...

i agree with the first anonymous. the issue is not breaking encryption, but circumventing it. the nsa doesn't have to break ssl if they get a trojan onto your machine.
stealing emails also is not an encryption issue (usually).

Anonymous said...

A fixed substitution cipher? I think your cryptography has skipped the 20th century. Since you are using it on text messages don't use the standard ASCII characterset. Assign emoji.

Test said...

Congratulations you just invented one time pad! https://en.wikipedia.org/wiki/One-time_pad

Basil said...

Currently, as another commenter mentioned, there is still unbreakable practical encryption. Modern public key encryption won't be breakable for some time yet. Computers, either regular or especially quantum, will eventually make everything sent via public key cryptography readable.

If you're willing to use a much less practical form of encryption, as another commenter mentioned, you can use "one time pads", which are literally mathematically impossible to break. The impracticality is a big deal, though- you need to securely pre-share enough truly random key data to exchange any message. Truly random data is actually quite expensive to make, and anything short of truly random keys will be crackable by a computer. Exchanging any data without encryption is also logistically challenging and moves part of the final message's security to a realm of real-world cloak-and-dagger security. One meat-space slip up will result in losing security on all your communication.
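For the curious, a one-time pad is only a few lines of Python. This is a sketch, assuming the pad comes from the standard `secrets` module (the operating system's random source) and is as long as the message; the variable names and the decoy sentence are made up for illustration. The second half shows why no amount of computing power helps the attacker: for any decoy text of the same length there is a pad that "decrypts" the ciphertext to it.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("pad and message must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"Joe goes to the cinema"
pad = secrets.token_bytes(len(message))   # must be pre-shared and used once
ciphertext = xor_bytes(message, pad)
print(xor_bytes(ciphertext, pad))         # b'Joe goes to the cinema'

# Any other plaintext of the same length is explained by some other pad.
decoy = b"Kate jumps to the zoo".ljust(len(message))
fake_pad = xor_bytes(ciphertext, decoy)
print(xor_bytes(ciphertext, fake_pad) == decoy)   # True
```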

Unknown said...

Don't. Just don't. Your attempts at cryptography are so laughably outdated that anyone that actually works in the field cannot help but chuckle.

The weakness of modern crypto is not that it can be broken computationally, which it cannot (in reasonable time) assuming some common conjectures, most notably P ≠ NP, which is an open question in itself.
The "weakness" here against state-level attackers is access to one side of the communication. For example, if you're chatting over Facebook's IM service or whatever, your message is not sent encrypted from user A to user B - instead, it goes from user A to Facebook's servers (using encryption negotiated between user A and the server), and then is transferred from Facebook's servers to user B (using encryption negotiated between those two). The weakness here is not the transfer between the parties involved, but the fact that one of the parties (most likely Facebook's servers) has a (voluntary or otherwise) method of data extraction available to the attacker, since the server has the plain text content of the message.

There are other messaging systems out there (like Telegram) that actually employ end-to-end encryption, where the server does not have a plain text version of the message (the encryption is negotiated between A and B directly), and thus cannot be used as an access vector to the contents. Of course, this is just an example; using your own implementation would be safer (who's to say the Telegram app doesn't have a backdoor built in?) unless you end up messing up.

tl;dr: It's not an issue of crypto being weak, it's an issue of the server-client model where the server has all the plain text messages (and state-level attackers can access that server). That is why privacy advocates are in arms about companies logging/storing user data without agreement - because crypto from you to Google doesn't help you if Google then hands over the decrypted data to the NSA.

User-to-user encryption is safe assuming proper crypto strength.

Gevlon said...

@Titus: but the fact is that the various agencies get access to practically every system (as you mentioned, even Telegram). They probably also gather raw traffic from ISPs. Anything that goes out of your computer is seen by them. You both practically and often legally cannot create a secure connection. So if you want to communicate securely with another party, you must design your own encryption system and transmit information over the compromised internet, preferably in a way that doesn't raise flags by itself. Remember, in several countries encryption itself is illegal.

Tweeting "Kate jumped the zoo" is probably ignored by everyone, and even if someone suspects it isn't innocent, they can't prove it and need human computational power to determine if you are sending an encrypted message or simply tweeting while smoking pot.

Antze said...

The things you describe are a combination of the already mentioned one-time pads and https://en.wikipedia.org/wiki/Steganography

Both approaches are used for serious secrets, nothing new here.

Encoding messages through grammatically correct phrases is generally not needed and lacks power for some tasks (you can transmit a short signal message, but you can't encode blueprints of a missile) but it's a valid method for "casual encryption". For encoding hardcore things, there are other methods of hiding secret data under something which doesn't draw attention (colors of random pixels in a selfie photo).

Even without steganographic tricks, one-time pads are actually cryptographically invulnerable, but impractical for wide use, since for them to work, you need to generate a new pad with every new message (and transfer it through a different safe channel). But if you use the same pad over and over, and someone sees through your linguistic disguise, it won't be very hard to break.
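A small Python sketch of why reuse hurts, assuming a byte-wise XOR pad (the sample sentences are made up): XORing two ciphertexts made with the same pad cancels the pad completely, so whoever knows or guesses one message reads the other.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


first = b"Joe goes to the cinema"
second = b"Kate jumps to the zoo".ljust(len(first))
pad = secrets.token_bytes(len(first))

c1 = xor_bytes(first, pad)
c2 = xor_bytes(second, pad)      # the mistake: same pad used twice

# The pad drops out: c1 XOR c2 equals first XOR second.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(first, second))   # True

# Knowing (or guessing) one plaintext now reveals the other.
print(xor_bytes(leak, first))
```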

>> Anything goes out of your computer is seen by them.

Not really.

>> You both practically and often legally cannot create a secure connection.

Practically, you can. To do it reliably, though, you need to understand the concepts of cryptography, and not just believe that this nice secure instant messenger does everything for you, and does it right.

Unknown said...

Actually, you can quite easily create a secure connection if you know what you're doing; the whole problem is usability and M&S.

In a modern encryption system, the key and only the key holds the secrecy. And you don't even have to share it with the other party thanks to handshaking.

The main problem is that most users use a weak key, like a word, azerty, etc. So hackers can just run a dictionary attack or even brute force. But if you use a proper key, like this :

-----BEGIN RSA PRIVATE KEY-----
MIIJKQIBAAKCAgEAuKovkWW3sr5yFNgk4OiGQrfjOAHzv0j0Xvv3E3VgGlL4I4hjJnxmP8ZT/tKT0cldvlvm1mlbzABFpbmL9lABo1cDcIWrtrIEKFDojhov4Ub5SHGiihjeqkFYLGXmD0iZcvVNFDnQ+z1DvPuFOElvybrFX+K4bKf+nYYTS4fRsZ9aw+FcpJyLJZcR6SxdDhBvHHMwm/roQVHh8lfc877XSyrFn8htT+dHQF0m+GtFv1A4vGN4s37xp7vasTq0ej9k0yK2Hj1zGfyVliXua4BugOTeHDC9hEssNpvpn+u1gDy7ZAZ8NXLZKnVhRxTg6M3yhn9FPrswlw/P6v7KcFswbdDuAVDziECZ2Iq+tc0S3ZONvfiAyet+pXTTiVWBqqeVXT2X78DYCGvAO6qNHkrxjg2t7JZ0NVieQ8f1yF1Uc2HjCmfVpcubFnjJ450vBaLo2uijexjpD+cd5aQiWtbZYAbrVHr9ur+erwO+p89QnXlOEpmqe5EGs/sfEiquj1xCzKObfhxxiLclQAaz9qHxXbGcdo/8z+To17v1KuCrETpykuwv0IYEp2V9yaSlhW+OrdsjPRFJex
-----END RSA PRIVATE KEY-----

Well, even the NSA can't really brute force that. Sadly, it's also not really usable for most people, even if more and more websites, mainly those targeted at developers, allow you to use that kind of key to connect.
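The dictionary attack mentioned above is about as simple as attacks get. Here is a hedged Python sketch of the idea: it assumes the key is derived from the password with a single SHA-256 and that the attacker has captured one message together with its HMAC tag, so every guess can be checked. The word list, the derivation and the function names are all made up for illustration.

```python
import hashlib
import hmac
import secrets


def key_from_password(password: str) -> bytes:
    # ASSUMPTION: naive key derivation, chosen only to keep the sketch short.
    return hashlib.sha256(password.encode("utf-8")).digest()


def make_tag(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def dictionary_attack(message: bytes, captured_tag: bytes, wordlist):
    """Try every candidate password; return the one that reproduces the tag."""
    for word in wordlist:
        guess = make_tag(key_from_password(word), message)
        if hmac.compare_digest(guess, captured_tag):
            return word
    return None


wordlist = ["password", "qwerty", "azerty", "letmein"]
message = b"Joe goes to the cinema"

weak_tag = make_tag(key_from_password("azerty"), message)
print(dictionary_attack(message, weak_tag, wordlist))      # azerty

strong_tag = make_tag(secrets.token_bytes(32), message)    # random 256-bit key
print(dictionary_attack(message, strong_tag, wordlist))    # None
```

A real word list has millions of entries instead of four, but the loop is the same; a random key simply never shows up in it.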

Now, how they get access to a properly secured system usually has nothing to do with encryption, as it's most of the time the hardest point of defense. You usually attack the weakest link :

* Compromising the implementation you got from your supplier ( Google, Yahoo, etc ).
* Feed you some tasty looking yet poisonous data / executable files, like those funny animations shared by families or co-workers, torrent files pre-loaded with malicious software, a "lost USB stick" in the parking lot of your target, a private computer physically accessed by a janitorial contractor or "hotel personnel" while you are at the bar. All those options allow you to turn a computer into a backdoor / bot by just relying on the user's incompetence. ( And unlike what you see on TV, unless it's ransomware, you will never get any big flashing windows when your computer gets compromised. )
* Impersonate a critical public service. For some critical cases, the NSA built some DNS / search engine servers that would answer the target's request faster than the real one would, so they could redirect the target to their own version of the website the target wanted to visit.

If you read carefully what came out of the last years' leaks and the FBI's crypto wars on pedophiles, what comes out of it is that it was never the encryption that was defeated; it was the other flaws of the systems that were systematically exploited.

PS : got to remove more than 75% of the key for blogger to accept my post...

Anonymous said...

@Titus Tallang

Facebook has introduced end-to-end encryption into their messaging service like last week.
It's not enabled by default and can't be enabled for all contacts, but per-contact setting is there.

@Gevlon

> but the fact is that the various agencies get access to practically every system (as you mentioned, even Telegram). They probably also gather raw traffic from ISPs. Anything goes out of your computer is seen by them.

With end-to-end encryption it doesn't matter if they intercept traffic or have server access. It's encrypted between your computer and the other party's; unless they have a trojan on either of your PCs, you are safe for a year. Intercepting already encrypted outgoing traffic does nothing.

> You both practically and often legally cannot create a secure connection.

Practically, any communication between untapped PCs with enough modern crypto-strength is secure. The problem is your crypto setup and remaining untapped.

> So if you want to communicate securely with another party, you must design your own encryption system and transmit information on the compromised internet, preferably in a way that doesn't raise flags by itself.

Current encryption systems are good enough. Not raising flags is an obfuscation problem, not an encryption problem, and it's fairly trivial considering the petabytes of monthly torrent traffic, which is fully encrypted nowadays. Mimic it and you're done.

> Remember, in several countries encryption itself is illegal.

It's not directly illegal in any country. Even North Korea. Most major countries have signed "personal use exemption", which allows personal use of any cryptography without restrictions. Non-signatories require some sort of certification for personal encrypted devices, which, in most cases, is available, and doesn't include decryption (imagine physically requiring decryption for every smartphone... impossiburu).

> Tweeting "Kate jumped the zoo" is probably ignored by everyone and even if someone suspects it's not, can't prove it and needs human computational power to determine if you are sending encrypted message or simply tweeting while smoking a pot.

This type of encryption is literally ages old; I think I've seen it used in 17th century mail myself. However it has many flaws: mainly weak encryption, unsustainability, and, as previous posters have mentioned, it requires pre-exchange of passwords. Modern asymmetric encryption systems exchange public keys, which are safe to be compromised; it's the private key you need to keep hidden. Without the private key you can't decrypt, and if only your device has the private key, you can't get into people's mail without tapping their device.

Phelps said...

If you are going to exchange a key, just exchange one time pads. If you reuse your key in this system, latent semantic analysis is going to burn you quicker than you think (if they care enough to keep looking.)

Just make sure you are using a truly random number generator, and not a pseudo-random system like in every OS (including Linux). You need to have it seeded from a natural random source, like celestial radiation or a good set of casino dice thrown properly.

maxim said...

If you want to hide from a bigger organization, then a combination of hiding in plain sight and staying off the public transaction record as much as possible is indeed the way to go.

The "agencies" getting their hands on a strong enough AI decryptor is simply a matter of time. And either way you wouldn't want to get into a crypto arms race with an entity with both budget and tech far beyond yours.

Finally, you don't crack a high security system directly. You crack it by stealing the identity of someone with the level of access you need. Which is why the best way to increase security is to have an ever-increasing amount of key factors.

Unknown said...

@Gevlon

A little more direct answer :

It's your computer that encrypts your communication, so your ISP can't read anything unless it's the communication endpoint. ( Or it's running a man-in-the-middle attack, in which case you should counter it with proper certificates. )

Also, unless you advertise your file content as some encrypted text, it's near impossible to really figure out if a file is something encrypted, or just a quite random but legitimate file format you just don't know how to use; both are after all a series of 0s and 1s. But if you really want to hide it, put it into an image file : https://en.wikipedia.org/wiki/Steganography

Unknown said...

Raw traffic from ISPs only tells them who you're talking to (as in, which other computer), not what you're exchanging (assuming proper crypto is in place).

Hiding the fact that you're exchanging data is hard, certainly, but if you really wanted to, there are far more effective ways to hide data in seemingly innocuous information. For example, you can encode tons of data into image files by simply changing the color value of a few pixels slightly. Anyone who has the original image could computationally retrieve the data from the difference, while at a glance the fact that the image is modified isn't apparent whatsoever. These kinds of slight variations also commonly occur for entirely innocent reasons, such as the file being re-processed when you open it in some editors or whatnot, which gives plausible deniability.

Properly encrypted data (which is what you'd encode) can not be distinguished from random bits (such as what would result by a graphics program re-compressing the image), so there's no way any attacker can "prove" that the photo of your dog you tweeted doesn't contain a secret message that you could retrieve by comparing its color values, pixel by pixel, to another picture of your dog that you sent to someone by email.
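The pixel trick described above can be sketched in a few lines of Python. To stay self-contained this assumes the "image" is just a flat list of 0-255 color values rather than a real image file, and that the payload is already encrypted; the function names are made up for illustration. Each payload bit flips the lowest bit of one value, so no value changes by more than 1, and the receiver recovers the bits by comparing against the original.

```python
def to_bits(data: bytes):
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(original, payload: bytes):
    """Flip the lowest bit of a value wherever the payload bit is 1."""
    bits = to_bits(payload)
    if len(bits) > len(original):
        raise ValueError("cover image too small for payload")
    stego = list(original)
    for position, bit in enumerate(bits):
        stego[position] ^= bit
    return stego


def extract(original, stego, payload_length: int) -> bytes:
    """Recover the payload from the difference between the two images."""
    diff = [(o ^ s) & 1 for o, s in zip(original, stego)]
    return from_bits(diff[:payload_length * 8])


cover = [(i * 37) % 256 for i in range(256)]   # stand-in for real pixel data
stego = embed(cover, b"hi")
print(extract(cover, stego, 2))                         # b'hi'
print(max(abs(a - b) for a, b in zip(cover, stego)))    # 1
```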

Gevlon said...

OK, I see now that I won't change the World on this field. But at least I've learned a lot about cryptography.

However there is one case where I still believe it has uses: if I can't even trust my computer (see the centrifuge control computers of Iran). Such coding can be done with reasonable speed using a literal codebook and then I type the message on an infected computer to Twitter and no one knows better.

Hanura H'arasch said...

@Gevlon

Even in this case one-time pads are still superior, using the Vigenère cipher. They have been used by spies and the like for hundreds of years for precisely that reason. See here.

Anonymous said...

Sending messages securely has some tricky points. If you use one-time pads, which are proven completely secure, you need to transfer the pad over a secure channel. There are no secure channels, so how do you send secure data over a non-secure medium like the internet? That means you are back to the start: you need to send some messages securely.

One way it's done is using randomness and public key encryption. It's basically lending out, through the internet, safes which only you can open, but into which everyone else can put stuff. The result is that only you can see the messages that are sent to you.

The first problem you get: everyone can use those safes, both good and bad people. The way around it is to let both sides use those "public encryption key" safes. For example, John sends his safes to Mary over a non-secure channel. Mary puts her safes into one of John's safes and sends it back. No one can open it, only John. John opens it, gets Mary's safe, puts the message "I got your safes" in it and sends it back. Again, no one can open the safe but Mary. Now both sides have safes and can send messages securely to each other.
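The "safe" picture maps fairly directly onto textbook RSA. Below is a toy Python sketch with tiny primes, purely to show the mechanics: the public key (e, n) is the open safe anyone can drop a number into, and the private key (d, n) is the only thing that opens it. The primes, the names `lock`/`unlock` and the message value are illustrative assumptions; real systems use keys of 2048 bits or more plus padding, and this needs Python 3.8+ for the modular inverse via `pow`.

```python
# Toy RSA key generation (far too small to be secure).
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (modular inverse)

public_key = (e, n)            # John hands this out: the open safe
private_key = (d, n)           # John keeps this: the only way to open it


def lock(message: int, key) -> int:
    """Anyone can do this with the public key. Message must be < n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def unlock(ciphertext: int, key) -> int:
    """Only the holder of the private key can do this."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


secret = 65
locked = lock(secret, public_key)
print(unlock(locked, private_key))   # 65
```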

To avoid man-in-the-middle attacks, you need some confirmation that Mary is Mary and John is John, not some malicious James Bond. That's why you need a third person, who collects those safes. That third person says: if you want to send stuff to John, you must use exactly these safes that I have here; if someone tells you to use a different kind, don't accept them. And that's basically what a https://en.wikipedia.org/wiki/Certificate_authority does.

Randomness is a big part of public key cryptography. It basically comes down to how long it would take to guess a random number. If the number is small, not many tries are needed. If you increase the size, the number of tries increases too, up to the point where the universe would cease to exist before you get the correct guess, on average. There is always the possibility of getting lucky with the first guess too. What this means is that public key cryptography is not secure forever. It's secure for a time period. After some time (days, years, millions of years) you will decrypt the messages and get the content. Compare that to one-time pad cryptography: if it's done correctly and both pads are destroyed after use, it's secure forever.
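To put rough numbers on "how long it would take to guess", here is a back-of-the-envelope Python sketch. It assumes pure guessing of a uniformly random key at a made-up rate of 10^12 guesses per second, and that on average half the key space has to be tried. Real attacks on public keys use factoring rather than blind guessing, so this only illustrates how fast the search grows with key size.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_years(key_bits: int, guesses_per_second: float = 1e12) -> float:
    """Average time to guess a random key of the given size by brute force."""
    expected_tries = 2 ** (key_bits - 1)      # half the key space on average
    return expected_tries / guesses_per_second / SECONDS_PER_YEAR


for bits in (40, 64, 128, 256):
    print(f"{bits:3d}-bit key: about {expected_years(bits):.3g} years")
```

Every extra bit doubles the work, which is why a memorable password (a few tens of thousands of candidates, roughly 15 bits) falls instantly while a 128-bit random key outlasts the universe at this rate.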

If someone gets a quantum computer, it takes the randomness off the table. What this means is that a quantum computer can guess any random number correctly, instantly. Well, up to a certain quantum bit limit. And if you remove the time factor, public key encryption ceases to be useful.

So yes, encryption is a tricky business. Sending messages that are secure forever is a very hard problem to solve. If you do, I'm sure you have a high chance of winning a Nobel prize.

Anonymous said...

There is always quantum cryptography. It practically already exists as an unbreakable transmission protocol. It is not yet commercially viable though. But as many here have stated, the transmission isn't the weak spot.

Unknown said...


Les sanglots longs
Des violons
De l’automne
Blessent mon cœur
D’une langueur
Monotone.

Now, that was a message that informed the French résistance that D-Day was about to happen.

Both sides, London and France, agreed on this message, but the Wehrmacht or SD knew about this message...

The German U-boats had not only the Enigma, but they also used a code book in which they had codes for specific messages. The Enigma was insecure for technical reasons, but when the enemy has your code book as well, you'd better not tell secrets via Enigma...

Up to today, the importance of Bletchley Park and the outstanding work of Alan Turing can't be rated highly enough. The way Turing was treated because of his sexual orientation is still a shame for England.

Interestingly enough, England provided all the information they deciphered to the Red Army, especially before Operation Citadel, the battle of Kursk. The defenses were built specifically to counter the then known German attack plans and still the Russians suffered incredible losses... Only the invasion of Sicily forced the termination of this operation.

As the other commenters already posted: rather than breaking a code, seduce the secretary who copies the decrypted texts and get a copy from her... OR use any other means of circumventing the coding...

Anonymous said...

> can't even trust my computer

Unless your computer has a hardware tap on it, you can just use any popular flash-bootable anonymity-preservation OS, and you're good to go.

Tails (https://tails.boum.org/) is a good choice. Leaves no traces, forces Tor for any connection to internet (you can't hide the fact you sent data, but you can hide the recipient identity this way, and a few extra layers of encryption never hurt anyone), can run from DVD/USB/SD devices, has latest crypto tools available.

Even if it has a hardware tap, where could it sit to compromise the live bootable OS? Network card? Nah, it transmits only already encrypted material. RAM/CPU? Seriously, those things pull gigabits per second; sorting important data from random junk is seemingly impossible, and the tap must be extremely specific about which process it hunts and where to look... then you change versions and it goes to hell. I/O controller? If you're using hard drive encryption, that'll pull gibberish. Ironically, the video card is probably the best bet, as it has fairly big memory and the data you're working with tends to be rendered on screen... and then it has to transmit somehow, in a way that works for any OS and any motherboard it happens to be plugged into, with a guarantee that a free driver in Tails doesn't have any hidden shit in it. The attackers are pretty much fucked. Your entire PC would have to be one big piece of spyware in order to be compromised while running Tails, and while some reasonable level of mistrust is normal, the idea that your custom-built PC (which is what most gamers run anyway) somehow turned itself into a spy hardware complex has too much tinfoil for my taste.