MoVfuscator Writeup
Intro
Three days ago Chris Domas announced the release of M/o/Vfuscator2, a beautiful single instruction C compiler leveraging the paper "mov is Turing-complete" (pdf), by Stephen Dolan.
That's it: once compiled, your program is made only of MOV instructions.
See the REcon 2015 slides (pdf) for more insight.
The code is available here: https://github.com/xoreaxeaxeax/movfuscator and by default check.sh will apply it on https://github.com/kokke/tiny-AES128-C , a small portable AES128 implementation.
So, is it safe to protect your AES crypto with M/o/Vfuscator2?
Coincidentally we published a new attack against white-boxes a few days ago: Differential Computation Analysis: Hiding your White-Box Designs is Not Enough.
M/o/Vfuscator2 doesn't transform your AES into a traditional white-box based on look-up tables but we should admit it's quite intimidating for a reverser.
Visualization
For example, here is a trace of the initial AES, up to the first three rounds:
Legend: memory range on the X-axis, time counting from top to bottom on the Y-axis, instructions in black, mem reads in green, mem writes in red.
And here once it's compiled with M/o/Vfuscator2:
Legend: memory range on the X-axis, time counting from top to bottom on the Y-axis, instructions in black, mem reads in green, mem writes in red.
Ouch! That's why I limited the trace to the first 3 rounds, it's so huge that I've disk space and RAM issues to display more...
The black square is simply the main MOV loop repeating all long, so if there is information, it should be in the memory accesses.
Challenge
The initial tiny-AES128-C example encrypts a fixed plaintext with a fixed key but let's make a more realistic challenge where one can provide arbitrary plaintexts.
You can download this code [{{#file: challenge.c}} as challenge.c]
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#define CBC 0
#define ECB 1
#include "aes.h"
static const uint8_t *CHRHEX = (uint8_t *)
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\x0A\x0B\x0C\x0D\x0E\x0F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\x0A\x0B\x0C\x0D\x0E\x0F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF" \
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF";
static void phex(uint8_t* str)
{
unsigned char i;
for(i = 0; i < 16; ++i)
printf("%.2x", str[i]);
printf("\n");
}
int main(int argc, char *argv[])
{
uint8_t key[16] = { 0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6, 0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c };
uint8_t plaintext[16];
uint8_t ciphertext[16];
int n=0;
const char *p;
const uint8_t *D;
uint8_t c=0, e, *d;
p = argv[1];
d = plaintext;
D = (d + sizeof (plaintext));
for (; d != D && *p; p++) {
e = CHRHEX [(int) *p];
if (e != 0xFF) {
c = ((c << 4) | e);
n++;
if (n == 2) {
*(d++) = c;
n = 0;
}
}
}
AES128_ECB_encrypt(plaintext, key, ciphertext);
printf("plaintext:\n");
phex(plaintext);
printf("ciphertext:\n");
phex(ciphertext);
return 0;
}
To download, compile and test it, you can use [{{#file: build_challenge.sh}} this script]:
#!/bin/sh
git clone https://github.com/xoreaxeaxeax/movfuscator
cd movfuscator
./build.sh
[ ! -d "examples/aes" ] && git clone https://github.com/kokke/tiny-AES128-C examples/aes
wget -O examples/aes/challenge.c "http://wiki.yobi.be/index.php?title=MoVfuscator_Writeup&action=raw&anchor=challenge.c"
movcc examples/aes/aes.c examples/aes/challenge.c -o examples/aes/challenge -s
./examples/aes/challenge 6bc1bee22e409f96e93d7e117393172a
Yes, I kept the same AES key, but that doesn't really matter. You can change it if you wish.
./build_challenge.sh ... emit/mov>cnsti4(0) emit/mov>cnsti4(0) emit/mov>reti4(cnsti4(0)) emit/mov>labelv(17) M/o/Vfuscation complete. plaintext: 6bc1bee22e409f96e93d7e117393172a ciphertext: 3ad77bb40d7a3660a89ecaf32466ef97
Traces
Let's use a very simple [{{#filelink: tracer.cpp}} tracing code tracer.cpp] with Intel PIN to record all 1-byte memory writes: {{#fileanchor: tracer.cpp}}
#include "pin.H"
#include <fstream>
std::ofstream TraceFile;
PIN_LOCK lock;
ADDRINT main_begin;
ADDRINT main_end;
static ADDRINT WriteAddr;
static INT32 WriteSize;
static VOID RecordWriteAddrSize(ADDRINT addr, INT32 size)
{
WriteAddr = addr;
WriteSize = size;
}
static VOID RecordMemWrite(ADDRINT ip)
{
UINT8 memdump[256];
PIN_GetLock(&lock, ip);
PIN_SafeCopy(memdump, (void *)WriteAddr, WriteSize);
if (WriteSize==1)
TraceFile << static_cast<CHAR>(*memdump);
PIN_ReleaseLock(&lock);
}
VOID Instruction_cb(INS ins, VOID *v)
{
ADDRINT ip = INS_Address(ins);
if ((ip < main_begin) || (ip > main_end))
return;
if (INS_IsMemoryWrite(ins))
{
INS_InsertPredicatedCall(
ins, IPOINT_BEFORE, (AFUNPTR)RecordWriteAddrSize,
IARG_MEMORYWRITE_EA,
IARG_MEMORYWRITE_SIZE,
IARG_END);
if (INS_HasFallThrough(ins))
{
INS_InsertCall(
ins, IPOINT_AFTER, (AFUNPTR)RecordMemWrite,
IARG_INST_PTR,
IARG_END);
}
}
}
void ImageLoad_cb(IMG Img, void *v)
{
PIN_GetLock(&lock, 0);
if(IMG_IsMainExecutable(Img))
{
main_begin = IMG_LowAddress(Img);
main_end = IMG_HighAddress(Img);
}
PIN_ReleaseLock(&lock);
}
VOID Fini(INT32 code, VOID *v)
{
TraceFile.close();
}
int main(int argc, char *argv[])
{
PIN_InitSymbols();
PIN_Init(argc,argv);
TraceFile.open("trace-1byte-writes.bin");
if(TraceFile == NULL)
return -1;
IMG_AddInstrumentFunction(ImageLoad_cb, 0);
INS_AddInstrumentFunction(Instruction_cb, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
Compile it with your Intel PIN release, e.g. put this file as [{{#filelink: tracer.cpp}} tracer.cpp] in pin-2.13-65163-gcc.4.4.7-linux/source/tools/mystuff/ together with a [{{#filelink: Makefile}} Makefile]: {{#fileanchor: Makefile}}
CONFIG_ROOT := ../Config
include $(CONFIG_ROOT)/makefile.config
include $(TOOLS_ROOT)/Config/makefile.default.rules
all: ia32 intel64
ia32:
mkdir -p obj-ia32
$(MAKE) TARGET=ia32 obj-ia32/Tracer.so
intel64:
mkdir -p obj-intel64
$(MAKE) TARGET=intel64 obj-intel64/Tracer.so
clean-all:
$(MAKE) TARGET=ia32 clean
$(MAKE) TARGET=intel64 clean
And use it: (I used the -ifeellucky
switch because current latest PIN doesn't know about my linux kernel 4.0 yet)
make ../../../pin -ifeellucky -t ./obj-ia32/Tracer.so -- ./challenge 6bc1bee22e409f96e93d7e117393172a
Attack
Ok. Now, we've everything to apply the attack of our paper:
Acquire a few traces with different random plaintexts, keep the traces and corresponding plaintexts (we also keep the ciphertexts, handy to validate a recovered AES key...).
#!/usr/bin/env python
import os
import random
import subprocess
for i in range(50):
plaintext=random.randint(0, (1<<(8*16))-1)
cmd = '../../../pin -ifeellucky -t ./obj-ia32/Tracer.so -- ./challenge %0*x; exit 0' % (32, plaintext)
output=subprocess.check_output(cmd, shell=True)
ciphertext=int(output.split('\n')[3], 16)
os.rename('trace-1byte-writes.bin', 'trace_1byte_write_%02i_%0*X_%0*X.bin' % (i, 32, plaintext, 32, ciphertext))
print '%02i %0*X -> %0*X' % (i, 32, plaintext, 32, ciphertext)
The traces are still pretty large with 3.988.860 1-byte memory writes captured per trace but most are common to all traces and therfore contain zero useful information, let's filter them out.
#!/usr/bin/env python
import glob
N=10 # traces to decide which bytes to keep
filelist = sorted(glob.glob('trace_1byte_write_*'))
ref_samples=[ord(x) for x in open(filelist[0], 'rb').read()]
mask=[False]*len(ref_samples)
for filename in filelist[1:N]:
samples=[ord(x) for x in open(filename, 'rb').read()]
mask=[m or (x-y != 0) for m,x,y in zip(mask,ref_samples, samples)]
for filename in filelist:
with open(filename, 'rb') as trace, open(filename+".small", 'wb') as small:
samples=[x for x,m in zip(trace.read(),mask) if m]
small.write(''.join(samples))
Now each trace contains only 6625 1-byte writes, much better!
A classical DPA attack on those traces will reveal the key with about 20 to 30 traces.
Small reminder about the settings to mount a DPA attack with as few traces as possible, from our paper:
- Serialize the traces to consider each bit individually
- Attack first round SBox output, considering each target bit individually, not their Hamming weight.
- DPA works slightly better than CPA
If nevertheless you're trying CPA on Hamming weight of first round SBox, it will require about 160 traces, still quite doable.
Here is a simple DPA on the serialized traces written in Python. For larger datasets, better to use faster alternatives...
#!/usr/bin/env python
import glob
import numpy as np
import os
filenames = sorted(glob.glob('trace_1byte_write*small'))
#ntraces = len(filenames)
# Limit our number of traces for a faster result:
ntraces=30
nsamples = os.path.getsize(filenames[0])*8
#sample_start = 0
#sample_end = nsamples-1
# Focus on the first round for a faster result:
sample_start = 3000
sample_end = 3500
traces = np.zeros([ntraces, nsamples],np.uint8)
plaintexts = np.zeros([ntraces, 16],np.uint8)
for filename in filenames[:ntraces]:
i,plain,cipher = filename[18:-10].split('_')
open(filename, 'rb').read()
traces[i,:] = np.unpackbits(np.fromfile(filename,dtype=np.uint8))
plaintexts[i,:] = [ord(x) for x in plain.decode('hex')]
SBOX = [0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67,
0x2b, 0xfe, 0xd7, 0xab, 0x76, 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59,
0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0, 0xb7,
0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1,
0x71, 0xd8, 0x31, 0x15, 0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05,
0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75, 0x09, 0x83,
0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29,
0xe3, 0x2f, 0x84, 0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf, 0xd0, 0xef, 0xaa,
0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c,
0x9f, 0xa8, 0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc,
0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2, 0xcd, 0x0c, 0x13, 0xec,
0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19,
0x73, 0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee,
0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb, 0xe0, 0x32, 0x3a, 0x0a, 0x49,
0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4,
0xea, 0x65, 0x7a, 0xae, 0x08, 0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6,
0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a, 0x70,
0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9,
0x86, 0xc1, 0x1d, 0x9e, 0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e,
0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf, 0x8c, 0xa1,
0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0,
0x54, 0xbb, 0x16]
# code derived from https://media.blackhat.com/eu-13/briefings/OFlynn/bh-eu-13-for-cheapstakes-oflynn-slides.pdf
# by Colin O'Flynn, slide #16
np.seterr(all='ignore')
for bnum in range(0, 16):
diffs = [0]*256
for key in range(0, 256):
mean1 = np.zeros([sample_end-sample_start,8])
mean0 = np.zeros([sample_end-sample_start,8])
num1 = [0]*8
num0 = [0]*8
for tnum in range(ntraces):
Hyp = SBOX[plaintexts[tnum, bnum] ^ key]
for targetbit in range(8):
if (Hyp & (1 << targetbit)) != 0:
mean1[:,targetbit] = np.add(mean1[:,targetbit], traces[tnum,sample_start:sample_end])
num1[targetbit] += 1
else:
mean0[:,targetbit] = np.add(mean0[:,targetbit], traces[tnum,sample_start:sample_end])
num0[targetbit] += 1
mean1=np.divide(mean1, num1)
mean0=np.divide(mean0, num0)
absdiff = np.fabs(np.subtract(mean1, mean0))
diffs[key] = np.nanmax(absdiff)
print "%02x" % diffs.index(max(diffs)),
Running it:
time ./dpa2.py 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c real 0m7.927s user 0m7.908s sys 0m0.012s
One may use autocorrelation on a single trace to find the underlying AES structure and find the first round:
In red I've circled the spot where useful information leaks. I don't know why there is this unusual black cross.
You see here clearly the nine AES rounds with their MixColumn operation (last round doesn't have MixColumn and is therefore much smaller).
In a typical white-box setup it would leak somewhere further in the first big square because the first round key is merged in the tables of the next operations but here it's mixed immediately, as per the AES definition.
Open samples
In typical white-boxes, the key scheduling is precomputed when the executable is generated and the round keys are merged somehow into the processing, typically in some look-up tables.
But here it was not the case so, if someone knows exactly where to look, he may directly extract the key.
This knowledge requires so called open samples: we need to compile a few same challenges with other known keys.
In such case, we can correlate directly the keys with the binaries to see if the key bytes (or bits) are directly accessible somewhere in the binaries.
Once the spots have been identified, simply extract the key material from the initial challenge binary.
This is left as an exercise for the reader ;)
Conclusions
M/o/Vfuscator2 is wonderful at hiding a processing and someone has still to prove if the transform can be somehow automatically reversed into a more readable disassembly.
And so e.g. crackme2, a simple 15 line crackme in C, is probably pretty hard to crack.
But cryptography is not ordinary processing, and even if the compiled tiny-AES128-c is much larger and slower than the simple crackme2, we saw that it's relatively easy to crack automatically, with a generic attack agnostic to the MOV transform.
Lessons:
Yes, M/o/Vfuscator2 sounds very appealing to obfuscate your processing.
No, don't count on it to harden any cryptographic processing!
Comments?
You can reply to my tweet or contact me.