Sci-Fi Crypto

From YobiWiki
Jump to navigation Jump to search

This is the copy of an idea I've spread during the HackCamp about Sci-Fi crypto

Sci-Fi crypto

===Brainstorming on how to securely exchange messages while Big Brother has at its disposal nice quantum computers or how to defeat brute-force attacks on message ciphers===

Some time ago I was reading "Digital Fortress" by Dan Brown.
A thriller in the NSA headquarters where the head of the crypto division is facing an unbreakable code, resistant to the brute-force attacks of the NSA's supercomputer TRANSLTR.
The thrilling fact is that the author of this code threatened NSA that the code will be revealed to the entire world if he happens to die.
And his death is the start of the book...

So I was reading and while the protagonists were busy imaginating how this unbreakable code was designed, I thought:
"Bu..sh..![1] To make an unbreakable code this is not like that that you have to do. You've rather to..."
And that's how I imaginated what I'm about to talk about.

Reading further, I found one quote much smarter than the rest of the book, which illustrates very well my technique:
"you could have broken the code but maybe you didn't see it!"

Let's try to create a message cipher resistant to brute force.
I said "message cipher" and not "stream cipher" because this cipher will not work on binary data but english text.
Brute force by definition can break any known symmetric cipher, but what if you don't know you broke it?
Generating all the possible plaintexts is only half of the game, the other half is to have a decision criterion to discriminate the right plaintext amongst the others.
And that's this half that we'll try to make harder for the cryptanalyst.
Here I need a new word because usually crypto talks about plaintext and ciphertext but not about possibly wrongly decoded ciphertexts.
We'll call that an attemptext (tentative of plaintext).

Imagine a ciphered text that is always deciphered to a grammatically correct attemptext, no matter which (wrong) key you used.
Cryptanalysis requires therefore a large amount of work to retrieve the correct plaintext among all attemptexts.
This is the start of a war between artificial intelligences...
Anything the cryptanalyst program can find to discard an attemptext means that you can probably tune the initial grammar model to create more plausible attemptexts.
The problem for the cryptanalyst is a variation of the Turing test as it's a test of humanity, with some differences with the original test:

  • the test procedure is not a dialog but consists of n texts of which only one was written by a human
  • the tester is rather a computer than a human (like for a captcha(tm)[6])
  • the human does not want to prove his humanity, to the contrary, so he can deliberately introduce some non-sense in his text [7]

Let's represent a language by a graph.
Each sentence is one path through the language graph.
Taking another random path will lead to another grammatically correct sentence.
The perfect language graph represents all possible sentences but no junk, so a cryptanalysis has to be done on another level than the grammar or other easy tricks like alphabet frequency analysis.


To cipher our message, the first step is not a cryptographic one but the encoding of the sentence as a path along the grammar graph.
This path has to be identified numerically (enumerated) amongst the possible paths.
Ideally the enumeration must be balanced by the frequency of common grammatical constructions and vocabulary, for example something you get for free if you manage to map some Huffman coding on it.
If there is a complete map between all the paths up to a given length and a bounded set of integers, then we have the guarantee that any random pick in the set will be accepted by the deciphering routine and will lead to an attemptext.
And the numerical representation can now be ciphered by any classic symmetric cipher.
So the complete solution has to follow some rules:

  • it must run without revealing any metadata confirming the right key when brute-forced, so e.g. don't introduce any checksum in the plaintext [2] but on the ciphered text!
  • any wrong key must lead to a proper deciphering and an attemptext, no rejection.

Some additional thoughts:
Such a cipher method covering a balanced language graph would become for free one of the best natural language text compressor available (like when you order in a chinese restaurant the numbers 3, 10 and 12 (I recommend the 12)).
This gives plausible deniability, especially if re-decoded (to another attemptext): you can even deny it is a message encoded with this method, this text was just the poetry of a drunken man ;-)
A little junk is tolerated in the brute-forced plaintexts (even a lot! 99% of detectable junk = loss of 6.6 bits of key material).
A limited language coverage with a stronger grammar can help but then we need a special predictive editor to create only sentences respectful of the cipher grammar.
We covered english language but this could be extended to any formal media description if a grammar can be written for it.
Problems: Grammar/Vocabulary could be limited, we need special care when entering a sentence and be sure it'll be handled properly by the grammar.
We also have to foresee loopholes for whatever will not be in a dictionary (family names, brands, acronyms, numbers...)

Proof-of-concept


Good starting points are grammar and spelling checkers, maybe Turing games (mimicking a human, elisa etc).

$ apt-cache search grammar
link-grammar - Carnegie Mellon University's link grammar parser

See also [3]

$ apt-cache show link-grammar
    In Selator, D. and Temperly, D. "Parsing English with a Link Grammar" (1991), the authors defined a new formal grammatical system called a "link grammar". A sequence of words is in the language of a link grammar if there is a way to draw "links" between words in such a way that the local requirements of each word are satisfied, the links do not cross, and the words form a connected graph. The authors encoded English grammar into such a system, and wrote this program to parse English using this grammar.

Links, graphs, mmm this description talks to me...

# apt-get install link-grammar
$ echo "the cat is chasing the mouse" |link-grammar en
                   +-----Os-----+
 +-Ds-+--Ss-+-Pg*b-+      +--Ds-+
 |    |     |      |      |     |
the cat.n is.v chasing.v the mouse.n 

The difficulty is the enumeration of paths that would cover the key space.
For a first attempt I will keep the grammar links of the plaintext and replace every word by another which respects the same grammar structure.
After scripting some bash around link-grammar and its dictionnaries, here's what I get (see at the bottom of the page for source code):

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode 
@24:2 n.1:3651 v.4.2:1105 a n.1:6281 to v.4.1:1430 a adj.1:973 n.1:9127 adv.1:385 with a @476:16 n.1:2021

This is the encoding of the input, every word is replaced by a reference to a wordlist and its position in the list.

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode|./decode
my example illustrates a mean to obfuscate a complex sentence easily with an impressive code

So far, so good.

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode 123
@24:1 n.1:8481 v.4.2:1980 a n.1:1626 to v.4.1:469 a adj.1:6271 n.1:8763 adv.1:1393 with a @476:28 n.1:6878

Here I'm changing the positions using a secret key with a very very stupid 16bit numeric cipher.

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode 123|./decode 123
my example illustrates a mean to obfuscate a complex sentence easily with an impressive code

So far, so good.
But because any wrong key leads to an attemptext, we could send this attemptext instead of the encoded path so it looks less like ciphered stuff than like psychedelic poem:

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode 123|./decode
its rehash romanticizes a caterpillar to curry a unobstructed rotter tensely with a undeniable nose
$ echo its rehash romanticizes a caterpillar to curry a unobstructed rotter tensely with a undeniable nose|./encode|./decode 123
my example illustrates a mean to obfuscate a complex sentence easily with a basic number

So far, so good.

$ echo my example illustrates a mean to obfuscate a complex sentence easily with an impressive code|./encode 123|./decode 456
their age attunes a deciliter to co-produce a pious kingpin anon with a plausible subcontinent

Deciphering with a wrong key gives a grammatically correct sentence.[4]

$ echo their age attunes a deciliter to co-produce a pious kingpin anon with a plausible subcontinent|./encode 456|./decode 123
my example illustrates a mean to obfuscate a complex sentence easily with a basic number
$ echo their age attunes a deciliter to co-produce a pious kingpin anon with a plausible subcontinent|./encode 334|./decode 1
my example illustrates a mean to obfuscate a complex sentence easily with a basic number

With this simple cipher, it's a bit like stepping forward and backward and 123-456+334-1=0

Brute-forcing:

$ echo "@24:1 n.1:8481 v.4.2:1980 a n.1:1626 to v.4.1:469 a adj.1:6271 n.1:8763 adv.1:1393 with a @476:28 n.1:6878"|./force     
...
28364:its budgie drapes a firefly to facet an inseparable neighbour deeply with a regrettable wager
28365:your armistice models a dispersion to plagiarize a whiny lirone unfeelingly with a paradoxical teaching
28366:our vine segregates a connotation to uncork a neat humour ploddingly with a self-evident slight
28367:their appetite test-drives a dioxide to wangle a fevered librarian negatively with a false tailspin
28368:its vendor cross-indexes a conch to donate a tractable horse globally with a unbelievable siren
28369:your swimsuit implements a bulldog to notate an inflammable fragrance categorically with a understandable repair
28370:my uptake misplaces a commercial to overstrain a constant hitter adroitly with a significant shuttle
28371:their super scotches a bristle to tamp a rhombohedral force rowdily with a debatable regard
28372:its shipwright blunts an apricot to decolonize a fallback drawing languorously with an improbable pioneer
28373:your reconnaissance facelifts a vernacular to jinx a thorough correlation enquiringly with a conceivable neckband
28374:my shadow imitates an ante to mimic an opinionated donor defectively with a regrettable pianoforte
28375:their rationalist recolors a uttermost to short-change a comprehensible conveyor unflinchingly with a paradoxical musket
28376:its pestilence walls a surface to clamp a reproducible can pneumatically with a self-evident levy
28377:our radical blackmails a underclass to convey an isodicentric constabulary negligently with a false motive
28378:my penchant expropriates a succession to incense an adiabatic caber gloomily with a unbelievable lawmaker
28379:their moneylender overtaxes a shareholder to renounce an obstinate auricle causally with a understandable hindrance
28380:your patron rearranges a stroller to rival a frightful bunk adversely with a significant lair
28381:our misfit videotapes a sensationalist to brand a unavoidable aspect royally with a debatable heliport
^C

This was just a proof-of-concepts, the numeric cipher is stupid, probably leading to some cycles in some wordlists, the key is much too short, the words in a list are not balanced by their frequency, the grammar links are not ciphered at all, my script retains only the first path found by link-grammar and this can lead to situations the grammar links discovered on a modified plaintext does not correspond to the initial plaintext links [5], etc.
BTW why mentioning quantum computers? I'm not sure they would help in brute-force, they will help breaking public key cryptography for sure, but I found it sexy in the title ;-)

polygen /usr/share/polygen/eng/debian/thanks.grm

Notes:
[1] there is more than one technological error in this book (cf e.g. http://victoria.tc.ca/int-grps/books/techrev/bkdgtlft.rvw ), one of the most obvious ones is probably that NSA never considered that the "unbreakable message" could have been random garbage.
Giving a "ciphered text" as a challenge to break, without even giving access to the cipher itself, even not as oracle, is by no mean a proof your cipher is unbreakable, even if I remember in the past having faced such a marketing communication from fools trying to sell a new cipher.
Note that what is finally revealed as the Digital Fortress unbreakable code makes eventually sense in its context (if the proper terminology was used), think about the integer overflow vulnerability in the command "file" for example (CVE-2007-2799)
No I will not tell more, for those who didn't read the book yet and plan to do so, I'm not a spoiler ;-)
[2] that's how I broke my first 32bit cipher for HP48 when I was 17, its author was so confident that he prepended the plaintext with the key.
[3] I also discovered "polygen": PolyGen is a program for generating random sentences according to a grammar definition, that is following custom syntactical and lexical rules.
Not suitable for us unless someone writes a complete grammar of english, but so funny, try it ;-)
[4] Actually on this particular input link-grammar found 2 paths and the first one considers "complex" as a noun rather than an adjective, using the second path would have been more correct.
[5] Here's an example with an older version of link-grammar (secret diffusion was commented out in the code to build this case more easily) echo "it's a complete push" |./encode 1|./decode|./encode|./decode 1 => it 's a complement nonentity
It happened because the intermediate sentence was "it 's a complex sentence", "complex" was build from the adjective list (complete->complex) but analysed in "|./encode|" by link-grammar as a noun, so deciphered as (complex->complement)
This specific example works again in latest versions of link-grammar but I'm not sure it could never happen again with other sentences...
It's a rare border case but if the message has to be conveyed as ciphered sentence instead of a ciphered path, check first that it can be deciphered properly.
[6] CAPTCHA is an acronym: "Completely Automated Public Turing test to tell Computers and Humans Apart" and is trademarked by Carnegie Mellon University.
Note that in our case, a human could manually finish the test if the computer already managed to reduce the original set of n texts to a more reasonable amount.
[7] The intended recipient won't have any problem to distinguish in one single text which parts are meaningful and which are just bananas jumping after the carpet.

Scripts

They require link-grammar
Latest scripts revision were tested with link-grammar version 4.7.4)
SECURITY WARNING link-grammar versions prior to v4.2.5 have an exploitable buffer-overflow bug:

echo xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|link-grammar en
linkparser> Segmentation fault

Be sure to not parse unknown text and to upgrade link-grammar

Feel free to share your opinion

Philippe Teuwen <phil teuwen org>