So I competed ASIS CTF Finals 2018 as part of team dcua. It was a nice experience even though I personally was able to really tackle one problem: Light Fence. (I also helped with other problems, but this was the main problem that was done by myself). In the end this problem was only solved by two other teams (which were also the top 3 teams), so I was pretty happy for solving this problem.
So a brief rundown of the problem, it takes an input file as argument. It will
also accept an optional
-d (for decode) or
-e (for encode) flag along with
the file to decode or encode. If no flag is supplied
-e is assumed. The
problem is the decode operation is actually not implemented yet (would've made
the problem super easy then).
In the encode algorithm, it creates two temporary files, a
*.nsg file and a
*.nsg.huff file. These are two steps in an encoding algorithm. The first step
I have no idea about, but I had a strong intuition that the second step is a
huffman coding step. This
algorithm is also used as a basis for many compression algorithms found in
formats like ZIP and PNG.
I put this here to explain my process of thinking about this interesting program. If you want the answer of how I did it, click here
Originally I thought about simply reversing the encoding algorithms. And I got a bit way before deciding against it (after all I also wanted to see a huffman coding algorithm in the works). So I started with the NSG encoding; it took a while but I found that there are two steps in it.
In the first step, the algorithm scrambles the order by first creating a "view" of the entire stream by starting with the current character at some index, and then moving forward with each character, wrapping around, until all characters are visited. Then all the views are then placed into a list and sorted as if each view represent a different string of that same data from some index. Then the scrambled version would just simply be the previous character in that view going down the list.
This may be very useless, but in fact by picking the previous character down the list, we can create a state machine graph detailing a partial ordering of the characters. Of course I was a bit rusty in graph algorithm stuff, so I was kinda lazy to solving it... But here's an example:
Example: Suppose I have this text "asgj3". Then each view would be
The next step involves taking a look up table F(x), and the inverse G(y), and then rotating this look up table for each character, encoding a new string from this look up table.
Now the second part of NSG was easy to reverse, the first part took too much
algo, so I was reluctant to writing a decode algorithm. Then I realized maybe I
don't have to write one. My teammate told me that the decode functions were
actually embeded in the program itself... The huffman decoding algorithm was at
0x18A0, and the nsg decoding algorithm was at
0x14a0. The nsg decoding
algorithm took two parameters, an input file and output file, but the huffman
decoding algorithm took four parameters!
The first two were the input and output file, but the next two I didn't know what it was. I was a bit confused at first, but slowly I realized that the third parameter took the header+4 256-byte table of the enc file, which I realized later was the table of bit-lengths for each character. Then the fourth character was simply a number of bits to skip from the end. This was found at the character offset +3. This was necessary because huffman coding encoding works on the bit level, and may sometimes end up with an number trailing of bits at the end because the file would have to be encoded in units of bytes.
I decided the easiest way was to do a patching of the binary file, so I wrote
some basic assembly code to write the decoding function, taking in a single
parameter: the encoded filename. To minimize the linking that would otherwise
need to be done, I wrote my own versions for some library functions, and used
raw syscalls instead of libc wrappers. In hindsight I should've at least just
printf as well, since that would result in less library functions that
I need to write.
Then finally the patching! So I decided to overwrite the function at
since it was considerably long, and it wasn't actually used in the executable
since it was optimized out. Oh, I also had to patch the
main function where it
calls the "not-implemented" decode function, to instead call my decode function.
I wrote a patch script to do that for me:
cp $BIN $PATCH
exec 3<> $TEMP
# Patch the call at 0xb67
echo -e "\x44\x11" >&3
dd conv=nocreat,notrunc bs=1 count=2 if=$TEMP of=$PATCH seek=2920
dd conv=nocreat,notrunc bs=1 count=526 if=unencrypt of=$PATCH seek=7344
Originally when I ran it, the nsg conversion seemed to work perfectly, but I had trouble reproducing the huff decoding. I thought it was an issue with the huff decoding in the binary, but I wasn't sure. I had no time debugging, so I decided to write my own huffman decoding algorithm. I also had no time to reverse their algorithm, so I went blindly and assumed that it was a huffman coding algorithm, and decided to do mix of blackbox and whitebox reversing. This time my gamble worked perfectly, and I have the code here. It was quite fun having to write an huffman decoding algorithm on the spot here, and I would say that now I have a very detailed understanding of how huffman coding really works, and have a deeper understanding of how ZIP files compress data.
After the CTF, when writing this up, I realized that my own assembly code was to blame (oops!)... I was passing the wrong argument to the huff decode function... But I would not regret having to write a huffman decode algorithm... it was a fun experience to do!
If you want to play around with the patched "fully implemented" encode/decode binary, here it is.