Hack.lu 2017 Writeups

From YobiWiki
Jump to navigation Jump to search

2017

2017 Not My Digest

cf https://flatearth.fluxfingers.net/challenges/15 and https://notmydigest.flatearth.fluxfingers.net/

A webpage allows to sign and verify messages.
index.php and md5.php are made available.

Signature:

// $secret = 13 chars a-z0-9
function sign($str, $secret) {
    if(preg_match("/[a-zA-Z]/", $str)) { die( "Alpha characters are evil, thus you are not allowed to sign them!" ); }
    return htmlspecialchars($str.MD5_custom($str.$secret));
}

So a signed message is my_message|MD5_custom(my_message|secret) but my_message can't contain alphabetic chars (accentuated chars still work).

Verification:

function verify($str, $secret)
{
    if(strlen($str) < 32) { return false; }
    $md5 = substr($str, -32);
    $content = substr($str, 0, -32);
    if($md5 === MD5_custom($content.$secret)) { return true; }
}
[...]
//main
    if($valid) {
        $result = "success!";
        $str = substr($_POST["postdata"], 0, -32);
        if($str === "the earth is not flat!") {
            $result = "LIAR! GO AWAY, REPTILIAN AND DON'T COME BACK: " . $flag;
        }
    } else {
        $result = "fail!";
    }

So it expects a signed message with at least a hash appended, and it's computing again the custom MD5 and comparing it.
It's funny to note that when the hash is incorrect, the function returns nothing...
if the signature is correct and if the message is exactly the earth is not flat!, then the flag will be displayed. Note the usage of === so we can't abuse it.

So the goal is to craft a signature on a given message which contains obviously chars that will be rejected by the sign function.

Let's look a bit closer to the custom MD5.
The obvious customization is highlighted at the beginning:

/*
define("MD5_R_A", 0x67452301);
define("MD5_R_B", 0xefcdab89);
define("MD5_R_C", 0x98badcfe);
define("MD5_R_D", 0x10325476);
*/
// switch constants 4 security!
define("MD5_R_A", 0x10325476);
define("MD5_R_B", 0x98badcfe);
define("MD5_R_C", 0xefcdab89);
define("MD5_R_D", 0x67452301);

Modified constants could leak to backdoors, remember https://malicioussha1.github.io/ ? But in this case a simple swap A, B, C, D = D, C, B, A is not likely to affect MD5 security.
We can ask oracle to sign an empty message, it returns a4c9bcbb5a72ff69fc02f35d2c13f338, so we have MD5_custom(secret) = a4c9bcbb5a72ff69fc02f35d2c13f338.
At first, we modified hashcat to tackle this modified MD5, simply by swapping typedef enum md5_constants in OpenCL/inc_hash_constants.h.
With that, we could break short secrets we hashed ourselves with the php code (just add echo (MD5_custom($argv[1])); at the end of md5.php and use PHP CLI), but if [a-z0-9]{9} is doable, [a-z0-9]{13} is out of reach.

When comparing with another implementation (and because the implementation of Filippo is much easier to read), we noticed a first difference:
Normally the 8 last characters of the padding are the length of the message in bits: struct.pack('<Q', len(message) * 8), but in PHP it does pack("ll", $len*8, 0);, so the last 4 bytes are forced to zero. This means a message of (512Mb*"A" + "B"), when padded, would get the same last padded block as the message "B".
It's unclear anything could be abused in this specific CTF scenario, but when trying to validate this finding, we quickly found that there was yet another divergence between MD5_custom and our custom version of FiloSottile MD5: As soon as there is more than one MD5 block, i.e. as soon as the message is >= 43 chars, the hashes differ.
One can notice MD5.php is not indented. Running astyle can fix that.
When comparing both implementations and the intermediate states, it appeared that the end of MD5_custom

        $a &= 0xffffffff;
        $b &= 0xffffffff;
        $c &= 0xffffffff;
        $d &= 0xffffffff;
    }
    $a = $sum_a = ($sum_a + $a) & 0xffffffff;
    $b = $sum_b = ($sum_b + $b) & 0xffffffff;
    $c = $sum_c = ($sum_c + $c) & 0xffffffff;
    $d = $sum_d = ($sum_d + $d) & 0xffffffff;
    return bin2hex(pack("l", $sum_a).pack("l", $sum_b).pack("l", $sum_c).pack("l", $sum_d));
}

should have been

        $a &= 0xffffffff; // even not required actually
        $b &= 0xffffffff;
        $c &= 0xffffffff;
        $d &= 0xffffffff;
        $a = $sum_a = ($sum_a + $a) & 0xffffffff;
        $b = $sum_b = ($sum_b + $b) & 0xffffffff;
        $c = $sum_c = ($sum_c + $c) & 0xffffffff;
        $d = $sum_d = ($sum_d + $d) & 0xffffffff;
    }
    return bin2hex(pack("l", $sum_a).pack("l", $sum_b).pack("l", $sum_c).pack("l", $sum_d));
}

Ha, this is why they removed the indentation, to make it a bit harder to find the bug :)
The consequence is that $sum_[abcd] are not applied and updated properly after each intermediate block and only at the very end the MD5 state is added with the initial chaining value.
So when getting a 2-block message hash, we can subtract the initial chaining value from the hash to get the intermediate values a, b, c, d after the MD5 rounds and the mix with the second message block.
These rounds are revertible so if we make a guess about the content of the second block, one can compute a, b, c, d values before the processing of the second block which is therefore the chaining value after the processing of the first block.
This allows to mount a meet-in-the-middle attack:
Assume the secret is "XXXXXXXXXXXXX", hash the message "012345678901234567890123456789012345678901234567890123456XXXXXXXXXXXXX". It will be split in two blocks and the secret will be split across the two blocks: 7 chars at the end of the first block and 6 chars at the beginning of the second block. Compute backwards the states against all possible second blocks (given the message hash returned by the oracle, which is "be75f49ca582d673346bf85209aba13c" for message "012345678901234567890123456789012345678901234567890123456") and store these chaining values, then compute all possible first blocks and check for a match.
This requires quite some RAM, so in practice one can store only partial results (e.g. about half of the chaining value, ~32b as address and 32b as data) in a table smaller than the possible 36^6 values. E.g. when using 8GB, we get 2^31 buckets. This will cause quite some collisions as we're storing "randomly" about the same amount of values in the buckets and about 50% of the stored values will be overwritten by a newer value. Then we'll need a third phase to run again the backward phase and check the matches of phase 2. Because of the collisions in storage, the process may fail and will have to be restarted with different initial conditions.
The source code is available here: https://gist.github.com/doegox/d222ab2aab3d7baa0613e12400d42d2d
At the end, only one candidate is left: candidate: 8spld0srmk4jf
Now let's compute MD5_custom("the earth is not flat!8spld0srmk4jf") = 82136dab78e42b520168c6acc0408bf9 and forge the message "the earth is not flat!82136dab78e42b520168c6acc0408bf9".
Verify it. We're greeted with
LIAR! GO AWAY, REPTILIAN AND DON'T COME BACK: FLAG{One_bad_curly_breaks_your_crypto}.