Hiding between opcode bytes - GULoader-like string obfuscation in Rust
David S March 30, 2024 Updated: April 02, 2024 #Obfuscation #Malware #Reverse Engineering
I recently came across the GULoader malware family with its string obfuscation and wondered whether one can build a similar technique in Rust.
Idea
Reading this blogpost by 0verfl0w_ about GULoader's stack manipulation for hiding hardcoded strings or arbitrary data, I thought that a similar technique should be doable in Rust too.
Hence, my goal was to write a function that, as in GULoader, gets called right before the embedded data bytes. It fetches the saved eip from the stack to calculate the position of the data that follows the call, decrypts the data, and then adjusts the saved eip so that it points to the next valid instruction when the function returns.
If possible, I wanted to make the string embedding as simple as calling a macro.
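To make the idea concrete, here is a minimal, portable sketch that models the trick on a plain byte buffer instead of real machine code. The names `KEY` and `decrypt_and_skip`, the single-byte XOR cipher and the explicit `len` parameter are assumptions made for illustration; the post does not state which cipher is used or how the length is passed.

```rust
/// Hypothetical single-byte XOR key; the cipher is an assumption.
const KEY: u8 = 0x5A;

/// Models the decryption routine: `ret_addr` plays the role of the saved eip,
/// which points at the first embedded data byte right after the call.
/// Returns the decrypted string and moves `ret_addr` past the data.
fn decrypt_and_skip(code: &[u8], ret_addr: &mut usize, len: usize) -> String {
    let start = *ret_addr;
    let plain: Vec<u8> = code[start..start + len].iter().map(|b| b ^ KEY).collect();
    *ret_addr = start + len; // execution resumes behind the data
    String::from_utf8_lossy(&plain).into_owned()
}

fn main() {
    let secret = b"hello";
    // Simulated instruction stream: call (5 bytes), encrypted data, then a nop.
    let mut code: Vec<u8> = vec![0xE8, 0, 0, 0, 0];
    code.extend(secret.iter().map(|b| b ^ KEY));
    code.push(0x90);

    let mut ret_addr = 5; // what the call would push: the address right after itself
    let s = decrypt_and_skip(&code, &mut ret_addr, secret.len());
    println!("{s}, resume at offset {ret_addr}"); // prints "hello, resume at offset 10"
}
```

In the real binary the "buffer" is the code section itself and `ret_addr` is the stack slot holding the return address, which is why the actual routine needs inline assembly.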
Implementation
Decryption
Let's start with the decryption function which does the actual work at runtime.
Firstly, this function needs to find the saved eip, i.e. the return address pushed by the call in the calling function. In C or C++ one could do this by declaring the function as naked. Then no function prologue is added to the function body, so one can get the saved eip by just popping it from the stack and pushing it back (or with mov eax, [esp]).
But in Rust, declaring a function as naked is still a nightly-only feature, and it additionally forbids normal Rust code in the body, which forces one to write the whole function in assembly!
Hence, I use the fact that the compiler calculates the stack usage (local variables etc.) of a function at compile time and hardcodes this value in the corresponding function prologue (the sub esp, <num of needed bytes> instruction).
Thus, we can:
- easily write the function in Rust with some inline assembly
- compile it
- find out the stack usage using a disassembler by looking at the sub esp, <stack usage> of the prologue
- adjust our hardcoded value for calculating the position of the esp before the prologue
Now the decryption function looks like this:
use embed_str;
use asm;
unsafe extern "C"
It first moves the current esp to eax and then adds the number of bytes that the prologue subtracted, which recovers the previous esp. We also need to add a further 16 bytes to account for the registers pushed in the prologue. After that, we can read the saved eip from the stack, which tells us where the encrypted string bytes start.
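The address arithmetic can be checked with a few lines of ordinary Rust. This is only a sketch of the calculation, not the inline assembly itself; `saved_eip_slot` is a hypothetical helper and the concrete numbers are made up for illustration.

```rust
/// Address of the stack slot holding the saved eip, as seen from inside the
/// function body after the prologue has run.
/// `frame_size` is the constant from `sub esp, <frame_size>`;
/// `pushed_regs` is the number of 4-byte registers pushed by the prologue.
fn saved_eip_slot(esp: u32, frame_size: u32, pushed_regs: u32) -> u32 {
    esp + frame_size + 4 * pushed_regs
}

fn main() {
    // Hypothetical values for illustration only.
    let esp_at_entry: u32 = 0x0019_FF00; // esp right after the call; [esp] holds the saved eip
    let frame_size: u32 = 0x40;
    let pushed_regs: u32 = 4; // four pushes = 16 bytes

    // What the prologue does to esp: register pushes plus the frame allocation.
    let esp_in_body = esp_at_entry - 4 * pushed_regs - frame_size;

    let slot = saved_eip_slot(esp_in_body, frame_size, pushed_regs);
    assert_eq!(slot, esp_at_entry);
    println!("saved eip is stored at {slot:#x}");
}
```

Because the frame size is chosen by the compiler, any change to the function body or the optimization settings can change it, so the hardcoded constant has to be re-checked in the disassembler after every such change.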
We can now decrypt the string and put it on the heap.
The last step is important for the control flow: the function needs to adjust the saved eip by adding the length of the string, so that it points to the next opcode bytes. When the decryption function returns, its ret pops this adjusted eip from the stack. It now points to valid assembly, and the normal control flow continues in the calling function.
Embedding
Rust has a really nice feature called procedural macros. With them, one can write arbitrary code which manipulates the syntax tree at compile time.
Thus, we can easily write a macro embed_str which gets called with a string literal as an argument, encrypts the literal at compile time and outputs a code block which
- allocates a string on the heap
- calls the decryption function with a reference to the allocated string
- embeds the encrypted string literal right after the function call, so that the binary contains the string bytes at this exact position
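The compile-time half of such a macro boils down to turning the literal into encrypted bytes that can be emitted behind the call. The sketch below shows one way this step could look as a plain helper function. The XOR cipher, the `KEY` constant, the function name and the choice of an assembler `.byte` directive as output are all assumptions for illustration, not the post's actual macro.

```rust
/// Hypothetical XOR key shared by the macro and the runtime decryption routine.
const KEY: u8 = 0x5A;

/// Encrypts `literal` and renders it as an assembler `.byte` directive,
/// i.e. text that generated code could place directly behind the call.
fn encrypted_byte_directive(literal: &str) -> String {
    let bytes: Vec<String> = literal
        .bytes()
        .map(|b| format!("{:#04x}", b ^ KEY))
        .collect();
    format!(".byte {}", bytes.join(", "))
}

fn main() {
    // 'h' = 0x68 and 'i' = 0x69, XORed with 0x5A.
    println!("{}", encrypted_byte_directive("hi")); // prints ".byte 0x32, 0x33"
}
```

A procedural macro could call a helper like this while expanding and splice the resulting text into an inline assembly block right after the call instruction.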
extern crate proc_macro;
use ;
Example
Let's see how we can now use the code we have written from ordinary Rust code:
use embed_str;
use asm;
... and build it as a release version (cargo b --release) to see if it's working:
Pretty clean and easy!
Analyzing the binary
Let's see what the example binary looks like. First we take a look at the main function (which needs to be found first...): At 0xa7142c we see the call which our macro spat out, followed by the embedded encrypted string bytes. Binary Ninja obviously cannot recognize them as a string and parses them as opcode bytes.
The real code starts again at 0xa7145f. Our decryption function will alter the saved eip to point to this location when it returns.
Finally, let's take a look at the function prologue of the decryption function and the recovering of the previous esp:
The assembly at 0xa711a4 and above forms the function prologue. Right after that, we can see our inline assembly consisting of mov eax, esp and add eax, 0x74.
From the perspective of a reverse engineer, this instruction pattern may seem unusual and suspicious, since a compiler would rather spit out a mov eax, esp; sub esp, 0x74 for efficiency. One could also write a YARA rule for it without getting many false positives. And once one has found the decryption function, one can also write a script to deobfuscate the string bytes after each call.
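Such a deobfuscation script could be sketched as follows. It scans a code buffer for call rel32 instructions (opcode 0xE8) that target the decryption function and decodes the bytes behind each hit. The blob format is an assumption made purely for this example: a one-byte length prefix followed by single-byte-XOR data. The real layout and cipher would have to be read from the decryption function itself, and `recover_strings` is a hypothetical name.

```rust
/// Finds every `call rel32` (opcode 0xE8) in `code` whose target is
/// `decrypt_offset` and decodes the blob that follows the call.
/// Assumed blob format (not from the post): one length byte, then XORed data.
fn recover_strings(code: &[u8], decrypt_offset: usize, key: u8) -> Vec<(usize, String)> {
    let mut found = Vec::new();
    let mut i = 0;
    while i + 5 <= code.len() {
        if code[i] == 0xE8 {
            let rel = i32::from_le_bytes([code[i + 1], code[i + 2], code[i + 3], code[i + 4]]);
            // The call target is relative to the end of the 5-byte call instruction.
            let target = i as i64 + 5 + rel as i64;
            if target == decrypt_offset as i64 {
                let data = i + 5;
                if let Some(&len) = code.get(data) {
                    let end = data + 1 + len as usize;
                    if end <= code.len() {
                        let plain: Vec<u8> =
                            code[data + 1..end].iter().map(|b| b ^ key).collect();
                        found.push((i, String::from_utf8_lossy(&plain).into_owned()));
                        i = end; // continue scanning behind the blob
                        continue;
                    }
                }
            }
        }
        i += 1;
    }
    found
}

fn main() {
    // Tiny synthetic image: the "decryption function" is a lone ret at offset 0,
    // followed by a call to it, a length-prefixed blob, and a nop.
    let key: u8 = 0x5A;
    let mut code: Vec<u8> = vec![0xC3, 0xE8];
    code.extend_from_slice(&(-6i32).to_le_bytes()); // 1 + 5 - 6 = target offset 0
    code.push(2);
    code.extend(b"hi".iter().map(|b| b ^ key));
    code.push(0x90);

    for (offset, s) in recover_strings(&code, 0, key) {
        println!("call at {offset:#x}: {s}"); // prints "call at 0x1: hi"
    }
}
```

A linear byte scan like this is only a heuristic, since 0xE8 can also occur inside other instructions or data. In practice one would rather enumerate the cross-references to the decryption function through the disassembler's API.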
Conclusion
In my opinion, implementing a GULoader-like string obfuscation technique in Rust felt more ergonomic than in C/C++ due to Rust's powerful macro system. Once implemented, a malware author can use it by simply wrapping each string literal in a single macro call, which makes it possible to use this technique for nearly every string. This makes static analysis of Rust binaries much harder!
Hence, I think that we will see even more sophisticated malware that abuses the powerful features Rust provides. And we should be prepared to analyze more Rust binaries.
You can find the code here.
THANK YOU FOR READING!