MoVfuscator Writeup

Intro

Three days ago Chris Domas announced the release of M/o/Vfuscator2, a beautiful single instruction C compiler leveraging the paper "mov is Turing-complete" (pdf), by Stephen Dolan.
That's it: once compiled, your program is made only of MOV instructions.
See the REcon 2015 slides (pdf) for more insight.
The code is available here: https://github.com/xoreaxeaxeax/movfuscator and by default check.sh will apply it on https://github.com/kokke/tiny-AES128-C , a small portable AES128 implementation.

So, is it safe to protect your AES crypto with M/o/Vfuscator2?

Coincidentally we published a new attack against white-boxes a few days ago: Differential Computation Analysis: Hiding your White-Box Designs is Not Enough.
M/o/Vfuscator2 doesn't transform your AES into a traditional white-box based on look-up tables but we should admit it's quite intimidating for a reverser.

Visualization

For example, here is a trace of the initial AES, up to the first three rounds:

Legend: memory range on the X-axis, time counting from top to bottom on the Y-axis, instructions in black, mem reads in green, mem writes in red.

And here once it's compiled with M/o/Vfuscator2:

Legend: memory range on the X-axis, time counting from top to bottom on the Y-axis, instructions in black, mem reads in green, mem writes in red.

Ouch! That's why I limited the trace to the first 3 rounds, it's so huge that I've disk space and RAM issues to display more...
The black square is simply the main MOV loop repeating all long, so if there is information, it should be in the memory accesses.

Challenge

The initial tiny-AES128-C example encrypts a fixed plaintext with a fixed key but let's make a more realistic challenge where one can provide arbitrary plaintexts.
You can download this code [{{#file: challenge.c}} as challenge.c]

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#define CBC 0
#define ECB 1
#include "aes.h"

static const uint8_t *CHRHEX = (uint8_t *)
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\x0A\x0B\x0C\x0D\x0E\x0F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\x0A\x0B\x0C\x0D\x0E\x0F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
    "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF";

static void phex(uint8_t* str)
{
    unsigned char i;
    for(i = 0; i < 16; ++i)
        printf("%.2x", str[i]);
    printf("\n");
}

int main(int argc, char *argv[])
{
    uint8_t key[16] =  { 0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c };
    uint8_t plaintext[16];
    uint8_t ciphertext[16];
    int n=0;
    const char *p;
    const uint8_t *D;
    uint8_t c=0, e, *d;
    p = argv[1];
    d = plaintext;
    D = (d + sizeof (plaintext));
    for (; d != D && *p; p++) {
        e = CHRHEX [(int) *p];
        if (e != 0xFF) {
            c = ((c << 4) | e);
            n++;
            if (n == 2) {
                *(d++) = c;
                n = 0;
            }
        }
    }
    AES128_ECB_encrypt(plaintext, key, ciphertext);
    printf("plaintext:\n");
    phex(plaintext);
    printf("ciphertext:\n");
    phex(ciphertext);
    return 0;
}

To download, compile and test it, you can use [{{#file: build_challenge.sh}} this script]:

#!/bin/sh
git clone https://github.com/xoreaxeaxeax/movfuscator
cd movfuscator
./build.sh
[ ! -d "examples/aes" ] && git clone https://github.com/kokke/tiny-AES128-C examples/aes
wget -O examples/aes/challenge.c "http://wiki.yobi.be/index.php?title=MoVfuscator_Writeup&action=raw&anchor=challenge.c"
movcc examples/aes/aes.c examples/aes/challenge.c -o examples/aes/challenge -s
./examples/aes/challenge 6bc1bee22e409f96e93d7e117393172a

Yes, I kept the same AES key, but that doesn't really matter. You can change it if you wish.

./build_challenge.sh
...
emit/mov>cnsti4(0)
emit/mov>cnsti4(0)
emit/mov>reti4(cnsti4(0))
emit/mov>labelv(17)

M/o/Vfuscation complete.

plaintext:
6bc1bee22e409f96e93d7e117393172a
ciphertext:
3ad77bb40d7a3660a89ecaf32466ef97

Traces

Let's use a very simple tracing code with Intel PIN to record all 1-byte memory writes:

#include "pin.H"
#include <fstream>

std::ofstream TraceFile;
PIN_LOCK lock;
ADDRINT main_begin;
ADDRINT main_end;

static ADDRINT WriteAddr;
static INT32 WriteSize;

static VOID RecordWriteAddrSize(ADDRINT addr, INT32 size)
{
    WriteAddr = addr;
    WriteSize = size;
}

static VOID RecordMemWrite(ADDRINT ip)
{
    UINT8 memdump[256];
    PIN_GetLock(&lock, ip);
    PIN_SafeCopy(memdump, (void *)WriteAddr, WriteSize);
    if (WriteSize==1)
        TraceFile << static_cast<CHAR>(*memdump);
    PIN_ReleaseLock(&lock);
}

VOID Instruction_cb(INS ins, VOID *v)
{
    ADDRINT ip = INS_Address(ins);
    if ((ip < main_begin) || (ip > main_end))
        return;

    if (INS_IsMemoryWrite(ins))
    {
        INS_InsertPredicatedCall(
            ins, IPOINT_BEFORE, (AFUNPTR)RecordWriteAddrSize,
            IARG_MEMORYWRITE_EA,
            IARG_MEMORYWRITE_SIZE,
            IARG_END);
        if (INS_HasFallThrough(ins))
        {
            INS_InsertCall(
                ins, IPOINT_AFTER, (AFUNPTR)RecordMemWrite,
                IARG_INST_PTR,
                IARG_END);
        }
    }
}

void ImageLoad_cb(IMG Img, void *v)
{
    PIN_GetLock(&lock, 0);
    if(IMG_IsMainExecutable(Img))
    {
        main_begin = IMG_LowAddress(Img);
        main_end = IMG_HighAddress(Img);
    }
    PIN_ReleaseLock(&lock);
}

VOID Fini(INT32 code, VOID *v)
{
    TraceFile.close();
}

int  main(int argc, char *argv[])
{
    PIN_InitSymbols();
    PIN_Init(argc,argv);
    TraceFile.open("trace-1byte-writes.bin");
    if(TraceFile == NULL)
        return -1;
    IMG_AddInstrumentFunction(ImageLoad_cb, 0);
    INS_AddInstrumentFunction(Instruction_cb, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}

Compile it with your Intel PIN release and use it: (I used the -ifeellucky switch because current latest PIN doesn't know about my linux kernel 4.0 yet)

../../../pin -ifeellucky -t ./obj-ia32/Tracer.so -- ./aes 6bc1bee22e409f96e93d7e117393172a

Attack

Ok. Now, we've everything to apply the attack of our paper:

Acquire a few traces with different random plaintexts, keep the traces and corresponding plaintexts (we also keep the ciphertexts, handy to validate a recovered AES key...).

#!/usr/bin/env python
import os
import random
import subprocess
for i in range(50):
    plaintext=random.randint(0, (1<<(8*16))-1)
    cmd = '../../../pin -ifeellucky -t ./obj-ia32/Tracer.so -- ./challenge %0*x; exit 0' % (32, plaintext)
    output=subprocess.check_output(cmd, shell=True)
    ciphertext=int(output.split('\n')[3], 16)
    os.rename('trace-1byte-writes.bin', 'trace_1byte_write_%02i_%0*X_%0*X.bin' % (i, 32, plaintext, 32, ciphertext))
    print '%02i %0*X -> %0*X' % (i, 32, plaintext, 32, ciphertext)

The traces are still pretty large with 3.988.860 1-byte memory writes captured per trace but most are common to all traces and therfore contain zero useful information, let's filter them out.

#!/usr/bin/env python

import glob
N=10 # traces to decide which bytes to keep
filelist = sorted(glob.glob('trace_1byte_write_*'))
ref_samples=[ord(x) for x in open(filelist[0], 'rb').read()]
mask=[False]*len(ref_samples)
for filename in filelist[1:N]:
    samples=[ord(x) for x in open(filename, 'rb').read()]
    mask=[m or (x-y != 0) for m,x,y in zip(mask,ref_samples, samples)]
for filename in filelist:
    with open(filename, 'rb') as trace, open(filename+".small", 'wb') as small:
        samples=[x for x,m in zip(trace.read(),mask) if m]
        small.write(''.join(samples))

Now each trace contains only 6625 1-byte writes, much better!

A classical DPA attack on those traces will reveal the key with as few as 20 traces.
Small reminder about the settings to mount a DPA attack with as few traces as possible, from our paper:

Serialize the traces to consider each bit individually
Attack first round SBox output, considering each target bit individually, not their Hamming weight.
DPA works slightly better than DCA

One may use autocorrelation on a single trace to find the underlying AES structure and find the first round:

In red I've circled the spot where useful information leaks. I don't know why there is this unusual black cross.
You see here clearly the nine AES rounds with their MixColumn operation (last round doesn't have MixColumn and is therefore much smaller).
In a typical white-box setup it would leak somewhere further in the first big square because the first round key is merged in the tables of the next operations but here it's mixed immediately, as per the AES definition.

Open samples

In typical white-boxes, the key scheduling is unrolled when the executable is generated and the round keys are merged somehow into the processing, typically in some look-up tables.
But here it was not the case so, if someone knows exactly where to look, he may directly extract the key.
This knowledge requires so called open samples: we need to compile a few same challenges with other known keys.
In such case, we can correlate directly the keys with the binaries to see if the key bytes (or bits) are directly accessible somewhere in the binaries.
Once the spots have been identified, simply extract the key material from the initial challenge binary.
This is left as an exercise for the reader ;)

Conclusions

M/o/Vfuscator2 is wonderful at hiding a processing and someone has still to prove if the transform can be somehow automatically reversed into a more readable disassembly.
And so e.g. crackme2, a simple 15 line crackme in C, is probably pretty hard to crack.
But cryptography is not ordinary processing, and even if the compiled tiny-AES128-c is much larger and slower than the simple crackme2, we saw that it's relatively easy to crack automatically, with a generic attack agnostic to the MOV transform.

Lessons:
Yes, M/o/Vfuscator2 sounds very appealing to harden your processing.
No, don't count on it to harden any cryptographic processing!