Pages

Friday, August 16, 2013

Obfuscated C map generator program Explained

Hello folks ,

Today ,our journey won't be about any topic related to IT security field ,we are going to analyze a C program that became famous the last couple of days ( at least to me ) ,and it uses code obfuscation techniques to make it hard for a usual coder to understand the code ,we are going to walk through it byte by byte until reaching the overall idea on how it's functioning .

Okay ,so let's start by showing the code ,here are two applications i saw lately :

Egypt Map generator

India Map generator

Note :

-I didn't remove any bit in any of the codes provided above ,so all rights are reserved to their real authors ,the coder of the egyptian map  is Eng. Tocka Ayman ,while the indian one ,i found it on StackOverflow ,so its right is reserved to its real coder .


Explanation :

-Let's start by showing respect to the guys who wrote these codes ,i was really happy to waste time and effort analyzing their code trying to de-obfuscate their techniques to reach something useful ,Hats Off !

-We are going to take the egyptian map generator as our study case today ,also im going to use Python to analyze the code ,you can use your best scripting language .

-So let's start the journey ,the code starts with the very usual inclusion for our lovely header files which contain functions explicit prototypes ,in our case ,we included the standard input and output header to use later on putchar() function .
-After that ,we started the main() function with some declarations and one useless initialization ( the variable count ,which has nothing to do in the program ) ,then from now on comes the pain .


-Our main components are two nested for loops which role are to print the characters of the map figure on screen ,let's take a closer look at the header of the first for loop ,we can see the long string in the condition part of the loop ,it looks like this :
 "Microsoft Tech Club Azhar Unv. @Gv@S@@Zv@@bGCAGAEV@@dCFAEAHAEN@@nACAIAFM@@oAAAJAGL@@pAJBHK@@qAJBHAI@@@qAKBFAJ@@@qALBDAK@@@pANBCAK@@@oAPBACJ@@@oAQFI@@@oARFH@@@pARFG@@@qARFF@@@rARFE@@@sARFD@@@rATFC@@@sATFB@@@sAUG@@sAVF@@rBWE@@rAXE@@oAAAZD@@nAAA[@BA@@nAb"

After this string ,we can see the indexing part ,which is little bit obfuscated [b+++21] ,this part can be written in a better way as : [b++ + 21] ,that way ,we can clearly see that it contains a post-increment statement ,an arithmetic operator ,and a constant ,in the initialization part we can see that b and c are assigned the constant 10 ,so the indexed part can be solved to 31 in the first iteration ,that means the string is going to start from the 32'th character in the string ,thus we can conclude that the 'real' string that is responsible to 'draw' the map is :
"@Gv@S@@Zv@@bGCAGAEV@@dCFAEAHAEN@@nACAIAFM@@oAAAJAGL@@pAJBHK@@qAJBHAI@@@qAKBFAJ@@@qALBDAK@@@pANBCAK@@@oAPBACJ@@@oAQFI@@@oARFH@@@pARFG@@@qARFF@@@rARFE@@@sARFD@@@rATFC@@@sATFB@@@sAUG@@sAVF@@rBWE@@rAXE@@oAAAZD@@nAAA[@BA@@nAb"

So far so good ,now let's get deeper and ask about what that string is ,what it can do ?!
This is a kind of compression ( or encoding ,if we can say ) ,named run-length compression ,it's a way to compress a bit-stream buffer into a less-in-size buffer ,which means ,instead of having a buffer of more than hundreds binary digits ,we can compress it into less number of bytes by transforming it into another way of representation which is easy to read ,and much smaller in size than the older bit-stream look .
Here ,the run-length compression is used to transform 1's and 0's into this string ,so that it's much smaller to represent into a program ,and easier to draw on screen .

-So ,let's continue digging into the code ,let's look into the other for loop ,its header has only a condition part ,which is ( a-- > 64 ) ,which is easily makes the second for loop iterates N times ,where ( N = ord(a) - 64 )* ,inside the for loop ,we can see the putchar() function which prints the code out on screen ,it has a little condition ,which prints either a space character ,or an exclamation mark character :
++c=='Z' ? c = c/ 9:33^b&1
 it's a ternary operator statement ,which acts like a branching statement ,if c is equal to 90 ( decimal representation for 'Z' ASCII character ) ,then it will return c to 10 as it was and prints it ( 10 is the decimal representation for the new line feed ASCII character '\n' ) ,if not ( less than 90 ) ,it will execute that weird equation ( 33^b&1 ) ,what this equation really does is too simple ,if (b) is even number ,it will print 33 on screen ( decimal representation for '!' ASCII character ) ,if (b) is odd ,it will decrement one and print 32 on screen ( decimal representation for space ' ' ASCII character ) .

The explanation of this equation is too easy ,if any character ( group of bits ) is AND'ed ( bitwise AND operation ) with the digit 1 ,the result will be the least significant bit of that input character ( also known as bit masking operation ),so here if b&1 gave 1 ,it means that the LSB of b is 1, which means that b is an odd number ,and if it gave 0 ,it means that the LSB of b is 0 ,which means that b is an even number ,after that ,it XOR's the result with 33 ,we know that any bit XOR'ed with zero ,will give the same inputed bit ,that means that if the result of the masking operation ( b&1 ) was zero ,that means that the LSB of b is zero ,which means that 33 won't change and it will be printed as is on screen ,and if the result of the masking operation  was one ,that means that the LSB of b is one ,which means that 1 XOR 1 = zero ,which decrements one from 33 and becomes 32 and printed as a white space on screen .

And voilĂ  ,the code just decodes every character in the string of the first loop ,and prints the bitmap on screen using '!' and white space characters ,forming that beautiful map figure we see .

The application can be written in a lot easier way than the above form :

Code

Note that i have tried to make the code as easy as possible ,if you have a more easier implementation ,just drop a comment here or send me a private message with your code .

Finally ,i hope that this analysis is helpful for anyone interested in programming in C and learning different ways of coding .

Notes :

*As i mentioned in the very beginning of this post ,i used python while analyzing this C program ,for those who aren't familiar with Python ,ord() is a built-in method in python 2.7x that takes an ASCII character ,and returns its decimal equivalent .

2 comments:

  1. nice work...keep on man, thanks for the hard work

    ReplyDelete
  2. Thanks for your comment ,glad it helped .

    ReplyDelete