learning a language through deobfuscation

This will be a periodically updated post as I come across more deobfuscating fun.

Having worked in WordPress for a number of years I can say with some confidence that there’s basically never a time where you want to find obfuscated code buried somewhere in /wp-content. However, having worked in WordPress for a number of years I can also say that I’ve found precisely that a few times. After going through the appropriate process for remediation (hyperventilate, sudo rm -rf, wp plugin update --all && wp theme update --all, kidding…), I began to enjoy the process of picking apart the code that caused me such trouble. That’s when I found out that there’s a lot to learn from it.

In retrospect this makes a lot of sense. It’s to the benefit of code obfuscators to be as arcane as possible, so they have every reason to leverage any and every quirk of a programming language. The great benefit to us is that obfuscated code becomes an unguided tour of these programming nooks and crannies and seldom-used functions.

tricks of the trade

One of the primary modes of obfuscation is just to hide the signal in the noise. And a simple way to do that is to make mentally parsing the code difficult.

smash it all together

For languages without semantic whitespace, putting the entirety of the code on a single line makes no difference to the compiler or interpreter, but it makes the code look like either a wall of text or a single line disappearing into the distance, depending on whether your editor is wrapping lines.

For example:

wc dont_run_this.php
       0       9  113968 dont_run_this.php

For wc, a line is counted for each newline character and a word is a string delimited by whitespace. So that’s 0 lines, 9 words, and almost 114,000 bytes. Your code editor won’t like this either and will probably stutter when you try to navigate the file. Let’s help it out:

sed 's/;/;\n/g' dont_run_this.php > still_dont_run_this.php

which will give us:

wc still_dont_run_this.php
      12      21  113980 still_dont_run_this.php

Really? Only 12 lines?

Either this code isn’t doing much or there’s more obfuscation going on. I’m betting on the latter.
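The byte count is a useful sanity check here: it grew by exactly 12 (113,968 to 113,980), one newline per semicolon, so the file holds 12 statements. A rough Python sketch of the same split and count, assuming the long string literals contain no semicolons of their own; `wc_like` and `split_statements` are hypothetical helper names, and the one-line sample is made up:

```python
def wc_like(text: str) -> tuple:
    """Return (lines, words, bytes) roughly the way wc counts them."""
    data = text.encode("utf-8")
    # A "line" is really a newline byte; a "word" is a run of non-whitespace.
    return data.count(b"\n"), len(data.split()), len(data)


def split_statements(source: str) -> str:
    """Insert a newline after every semicolon, like the sed command."""
    return source.replace(";", ";\n")


one_liner = "<?php $a = 'x';$b = 'y';eval($a . $b);?>"
expanded = split_statements(one_liner)

print(wc_like(one_liner))
print(wc_like(expanded))
# Each semicolon adds one newline, so the line count and the byte count
# both grow by the number of statements.
```

Note that the `\n` in the sed replacement is a GNU sed convenience; other sed implementations may treat it differently.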

stretch it all out

Since our editor isn’t stumbling around like a one-year-old chasing bubbles anymore, we can poke around the file much more easily now. What jumps out right away is that we’ve got a ’$’ followed by extremely long strings, and since we’re working in PHP these are variable names.

<?php
    $sn30n /* ... 241 more characters */ = ''

This gives us the opportunity for a simple search and replace. Given the number of lines, we’re likely dealing with only a handful of expressions in this file, so we can do this manually. What we end up with looks like this:

<?php
    $part_0 = "..."; // ~31,000 characters
    $part_1 = "..."; // ~31,000 characters
    $part_2 = "..."; // ~31,000 characters
    $part_3 = "..."; // ~15,000 characters
    $part_4 = $part_0 . '' . $part_1;
    $part_5 = $part_4 . '' . $part_2;
    $to_rotate = $part_5 . '' . $part_3;
    $to_decode_0 = str_rot13($to_rotate);
    $to_decode_1 = base64_decode($to_decode_0);
    $to_inflate = base64_decode($to_decode_1);
    $to_eval = gzinflate($to_inflate);
    eval($to_eval);
?>
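With only a handful of names the manual search and replace is fine, but the renaming could also be scripted. A minimal Python sketch, assuming every `$` followed by ASCII identifier characters is a variable and numbering names in order of first appearance. The `$var_N` scheme, the helper `rename_variables`, and the sample input are all hypothetical; the descriptive names in the listing above were still chosen by hand.

```python
import re

# PHP variable (ASCII only): '$', a letter or underscore, then word characters.
VARIABLE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def rename_variables(source: str, prefix: str = "var") -> str:
    """Replace each distinct variable with $<prefix>_<n>, in order of first use."""
    names = {}

    def replace(match):
        old = match.group(0)
        if old not in names:
            names[old] = f"${prefix}_{len(names)}"
        return names[old]

    return VARIABLE.sub(replace, source)


# Made-up stand-ins for the 240+ character names in the real file.
sample = "$sn30nqqq = 'abc';$zz91kkk = 'def';$p0o9i8 = $sn30nqqq . '' . $zz91kkk;"
print(rename_variables(sample))
```

A blanket rename like this would also rewrite built-ins such as `$this` or `$_GET`, so it only suits files like this one where every name is junk.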

The idea is clear now. The real source code is buried in the 108,000 characters of the first 4 variables; it just needs a bit of polish to uncover:

concatenate all 4 strings
use str_rot13 on the result
base64 decode it twice
gzinflate that result
finally, run the result
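The same chain can be run outside PHP so that nothing from the sample is ever passed to eval. A minimal Python sketch, assuming PHP’s gzinflate corresponds to raw DEFLATE (zlib with negative window bits); `peel_layer` and `wrap_layer` are hypothetical helpers, and the payload is a harmless made-up string used only for a round trip:

```python
import base64
import codecs
import zlib


def peel_layer(encoded: str) -> bytes:
    """Undo one layer: str_rot13, base64_decode twice, then gzinflate."""
    rotated = codecs.decode(encoded, "rot_13")   # str_rot13
    once = base64.b64decode(rotated)             # first base64_decode
    twice = base64.b64decode(once)               # second base64_decode
    return zlib.decompress(twice, -15)           # gzinflate (raw DEFLATE, assumed)


def wrap_layer(payload: bytes) -> str:
    """Inverse of peel_layer, used here only to build a harmless sample."""
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(payload) + compressor.flush()
    doubled = base64.b64encode(base64.b64encode(deflated)).decode("ascii")
    return codecs.encode(doubled, "rot_13")


sample = wrap_layer(b"echo 'hello';")
print(peel_layer(sample))   # inspect the result; never eval it
```

For the real file, the input to `peel_layer` would be the four long strings concatenated in order, and the final step is to read the output rather than run it.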

There are a couple of interesting tidbits here. The docs on str_rot13 say it “shifts every letter by 13 places in the alphabet while leaving non-alpha characters untouched.” If we don’t take the time to simplify our variables and only see that there’s some base64 shenanigans going on, we might be tempted to try to decode some of our initial strings directly. The results of this are really unhelpful:

echo "long initial string..." | base64 --decode
...:{���q�2k���9$%�G"!1&a:�`e�D��^[[?62;22c*wL�Nb�5...

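The str_rot13 step ensures that we end up with gibberish: it shifts only the letters of the input strings and leaves the digits and symbols alone, so the text still looks like base64 but no longer decodes to anything meaningful.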
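To see why, note that rot13 maps letters to letters and leaves digits, +, / and = alone, so rotated base64 is still made of base64 characters; it just decodes to different bytes. A small Python sketch with a made-up input string:

```python
import base64
import codecs

plain = b"hello world"   # made-up stand-in for the real payload
encoded = base64.b64encode(plain).decode("ascii")
rotated = codecs.encode(encoded, "rot_13")   # what the obfuscator stores

print(encoded)
print(rotated)

# Decoding the stored string directly "works" but yields the wrong bytes.
print(base64.b64decode(rotated))

# Undoing the rotation first recovers the original.
print(base64.b64decode(codecs.decode(rotated, "rot_13")))
```

The order matters: the rotation was applied last when the file was built, so it has to be undone first.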

Perhaps unsurprisingly, when we reverse this chain of events and echo out what’s to be finally evaluated we get a string that looks like this (whitespace for clarity):

<?php
    eval(
        gzinflate(
            base64_decode(
                base64_decode(
                    str_rot13(
                        'another 85,000 character long string...')))));

So let’s unwind those operations too until we finally get:

<?php
    eval(gzinflate(base64_decode(base64_decode(str_rot13('66,000 more characters')))));

At this point, I can’t roll my eyes hard enough. I mean, I’ve already gotten this far. Just show me what you’re doing!

Fine. Let’s do it all again and we get source code!

…

Just kidding. We have to follow the same rotate, decode, and inflate procedure twice more to get the actual meat of this malicious program.
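Unwinding by hand gets tedious, so the repetition can be sketched as a loop. This Python sketch assumes every inner layer has exactly the shape eval(gzinflate(base64_decode(base64_decode(str_rot13('...'))))) with a single-quoted literal, and that gzinflate means raw DEFLATE. The regular expression and the helpers `peel_layer`, `unwrap_all` and `wrap_in_php` are hypothetical, and the first layer (the one assembled from concatenated variables) would still need to be handled separately.

```python
import base64
import codecs
import re
import zlib

# Matches the nested wrapper and captures the single-quoted payload.
WRAPPER = re.compile(
    r"eval\s*\(\s*gzinflate\s*\(\s*base64_decode\s*\(\s*base64_decode\s*\("
    r"\s*str_rot13\s*\(\s*'([^']*)'"
)


def peel_layer(encoded: str) -> str:
    """rot13, base64 twice, raw inflate; returns the next layer as text."""
    data = base64.b64decode(base64.b64decode(codecs.decode(encoded, "rot_13")))
    return zlib.decompress(data, -15).decode("utf-8", errors="replace")


def unwrap_all(source: str, max_layers: int = 20) -> tuple:
    """Peel wrappers until none is left; return (final source, layers peeled)."""
    layers = 0
    while layers < max_layers:
        match = WRAPPER.search(source)
        if match is None:
            break
        source = peel_layer(match.group(1))
        layers += 1
    return source, layers


def wrap_in_php(code: str) -> str:
    """Build one harmless test layer in the same shape (for the demo only)."""
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(code.encode("utf-8")) + compressor.flush()
    text = base64.b64encode(base64.b64encode(deflated)).decode("ascii")
    rotated = codecs.encode(text, "rot_13")
    return f"<?php eval(gzinflate(base64_decode(base64_decode(str_rot13('{rotated}')))));"


# Made-up payload wrapped three times, then unwound without executing anything.
nested = "<?php echo 'payload';"
for _ in range(3):
    nested = wrap_in_php(nested)

final, count = unwrap_all(nested)
print(count, final)
```

The `max_layers` cap is only a guard against a sample that never bottoms out; the output is printed for reading and is never executed.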