Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I have to throw my personal favorites into the ring: decompilation and deobfuscation.

In either case, after a certain amount of analysis you'll end up with a set of nodes (typically basic blocks) which connect to each other, but they'll be what I call 'immature' nodes. You know they connect, but they all effectively end with gotos. To make any sense of them, you have to make them into mature nodes: if, if-else, while, do-while, for, etc. Via graph reduction this is possible.

For instance, a few years back I was reverse-engineering Apple's Fairplay DRM and ran into an odd obfuscation scheme. There were dozens of functions which had the same structure: a switch inside a while statement; a state machine. After some hand-analysis, I figured out that at the end of each state, it'd set the state variable to either a constant value (a jump) or one of two conditional values (a conditional branch). At this point, it became clear that they were simply taking basic blocks and turning them into a big obfuscated state machine.

To make this into something worthwhile, I wrote a script that would go through the function and 'execute' it, to determine the state connections automatically. Then from there, I wired them up into a big graph, with two types of nodes: Fallthrough (no conditional) and Branch (conditional). Now, that helps a lot in terms of showing you what's happening, but not nearly as much so as control flow structures do, so the battle was only half over.

Now, if you look at common control flow structures from a graph perspective, they all look pretty straightforward (using something like dot graph notation):

    A; if(X) {B} else {C} D  becomes A -> B, A -> C, B -> D, C -> D.
    A; if(X) {B} C  becomes A -> B, A -> C, B -> C
    A; while(X) {B} C  becomes A -> B, A -> C, B -> C
So if you walk over the graph looking for these patterns (only looking at single nodes, mind you), replacing them with their mature equivalents, and recursively do that until you don't change anything, you can end up with a fully deobfuscated, cleanly readable result.


Great story. Worth pointing out here that reversers have been running with this idea full speed since ~2000. For instance, Halvar Flake and his team at Zynamics use graph isomorphisms of basic blocks in BinDiff, which is the industry standard tool for reversing security patches back to vulnerabilities.

Obviously, anyone who's ever used IDA Pro is familiar with control flow graphs and bblocks; hit the space bar in IDA and you get it graphically.


Yea, there's a lot of interesting things that have been done with graphs. I'm currently working on a reversing platform, and one of the core things I'm doing is elegant graph manipulation. It's amazing how much you can do when you have nice graphs, e.g. native pattern matching over graphs.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: