A while back, I wrote an esoteric assembly language called RE^2. The main purpose of this language was to write for a CTFLearn problem. I will take this time to briefly explain the mechanics for this language. (If you are trying to solve this problem, hehehe... you struck gold). First, a background story behind this:
RE^2 came about from the idea that you would have to reverse engineer the assembler interpreter and then any code written in that esoteric assembly language. The interpreter (which was written in Java) should've been easily disassembled, even without accompanying source code, since Java itself is relatively easy to have automatic decompilation to a degree that produces somewhat readable source code. At the time of this writing, I had an intrigue towards both the x86 assembly and Java bytecode, whom I based this assembly language off of. I made this a hybrid of Java bytecode and x86 assembly. This allowed me to create an interesting language to explore some of the typical exploits (buffer overflow, ROP) in a different surrounding.
RE^2 does many of its arithmetic operations off of the stack, i.e. the
instructions ADD
, SUB
, MULT
, DIV
, and the assorted binary bitwise
operations internally pop two values from the stack (in reverse order) and then
push the result value back on the stack. This implicit stack allows for much
more condensed instructions (without the need of operands in the basic
instructions). Of course this does incur a large speed deficiency (doesn't
matter in a small scale scenareo) with the frequent access to the stack memory.
Here's a brief example of the RE^2 language:
1 | .entry start |
This code performs an ROT13 on the code
string and then prints out hello
world!
As can be noted by the code, I created this language just to demonstrate
an aspect of illegibility in assembler code because of the complex states when
you involve a stack. For example within that loop, we duplicated our current
character read of our string on the stack, but if we prematurely exit that loop,
i.e. if the character was outside a particular bounds, we had to then clean up
the stack and remove those extra values (or we could ignore that and let the
bp-based stack fix that).
This code, unlike x86, has a lot of its states in the stack itself, and in fact, the registers play a minimal role in the code (it only serves any purpose in indirect memory loads/stores).
This was a code that I created a while back just as a brainteaser. After an year or so (and learning how a RISC language like MIPS works), I find this particular instruction set quite amateur because of its inconsitency and such. Looking back on this project, I think I would definitely had made the language a little more standard (either completely utilize a stack, or just use registers to perform operations). Nevertheless, it was amazing how convoluted assembly code could get, even compared to MIPS or x86, when you involve operations on a stack.