Hiding between opcode bytes - GULoader-like string obfuscation in Rust

David Schramm March 30, 2024 Updated: April 02, 2024 #Obfuscation #Malware #Reverse Engineering


I recently came across the GULoader malware family and its string obfuscation, and wondered whether a similar technique could be built in Rust.

Idea

Reading this blogpost by 0verfl0w_ about GULoader's stack manipulation for hiding hardcoded strings or arbitrary data, I thought that a similar technique should be doable in Rust too.

Hence, my goal was to write a function which is likewise called right before the embedded data bytes. It should fetch the saved eip from the stack to calculate the position of the data that follows the call, decrypt it, and then manipulate the saved eip so that it points to the next valid instruction after the return.

If possible, I wanted to make the string embedding as simple as calling a macro.

Implementation

Decryption

Let's start with the decryption function which does the actual work at runtime.

First, this function needs to find the saved eip of the calling function. In C or C++, one could do this by declaring the function as naked, so that no function prologue is added to the function body. One can then get the saved eip by simply popping it from the stack and pushing it back (or with mov eax, [esp]). But in Rust, naked functions are still nightly-only and additionally forbid normal Rust code in the body, which forces one to write the whole function in assembly!

Hence, I use the fact that the compiler calculates the stack usage (local variables etc.) of a function at compile time and then hardcodes this value in the corresponding function prologue (the sub esp, <number of needed bytes> instruction).

Thus, we can:

easily write the function in Rust with some inline assembly
compile it
find out the stack usage using a disassembler by looking at the sub esp, <stack usage> of the prologue
adjust our hardcoded value for calculating the position of the esp before the prologue

Now the decryption function looks like this:

use enc_macros::embed_str;
use std::arch::asm;

#[inline(never)]
unsafe extern "C" fn decrypt(out: *mut String) {
    let mut esp = 0usize;

    // magic offset 0x74 comes from the stack usage of this function, use a disassembler to
    // find the sub esp, [num] at the start of this fn
    asm!("mov {esp}, esp; add {esp}, 0x74", esp = inout(reg) esp);

    esp += 16; // push ebp/ebx/edi/esi

    println!("[decrypt] recovered: {:x}", esp + 0x4);

    let mut eip: *mut u8;
    asm!("mov {eip}, [{esp}]", eip = out(reg) eip, esp = in(reg) esp);

    println!("[decrypt] eip of calling func: {:?}", eip);

    let out = &mut *out;

    let mut eip_c = eip;

    loop {
        let b: u8 = *eip_c;
        if b == 0 {
            eip_c = eip_c.add(1);
            break;
        }
        out.push((b ^ 0xc8) as char);
        eip_c = eip_c.add(1);
    }
    println!(
        "[decrypt] string: \"{}\" eip of next real instruction: {:x?}",
        out, eip_c
    );
    asm!("mov [{esp}], {eip_c}", esp = in(reg) esp, eip_c = in(reg) eip_c);
}

It first moves the current esp into a register (eax in the compiled binary) and then adds the number of bytes that the prologue subtracted, which recovers the esp from before the stack allocation. Then, we need to add another 16 bytes for the four registers (ebp, ebx, edi, esi) pushed in the prologue. After that, esp points at the saved eip, which we can read from the stack to learn where the encrypted string bytes start.
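The address arithmetic can be checked in isolation. The following is only a minimal sketch with plain integers, not part of the original code: the names return_address_slot and esp_after_prologue are hypothetical, and the values 0x74 and four pushed registers are the ones from this particular build and would differ for another compiler version or function body.

```rust
/// Size of one stack slot on 32-bit x86.
const SLOT: usize = 4;

/// Hypothetical helper: models what the prologue does to esp.
/// The registers are pushed first, then `sub esp, frame_size` allocates locals.
fn esp_after_prologue(esp_at_entry: usize, frame_size: usize, pushed_regs: usize) -> usize {
    esp_at_entry - pushed_regs * SLOT - frame_size
}

/// Hypothetical helper: undoes the prologue to find the stack slot that
/// holds the saved eip. At function entry, esp points exactly at that slot.
fn return_address_slot(esp_in_body: usize, frame_size: usize, pushed_regs: usize) -> usize {
    esp_in_body + frame_size + pushed_regs * SLOT
}
```

With an esp of 0x1000 inside the function body, a frame size of 0x74 and four pushed registers, the saved eip would sit at 0x1000 + 0x74 + 0x10 = 0x1084.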

We can now decrypt the string and put it on the heap.

The last step is important for the control flow: the function needs to manipulate the saved eip to point to the next opcode bytes by adding the length of the string plus its terminating null byte. On return, ret pops the adjusted eip from the stack, which now points to valid assembly, and the normal control flow continues.
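The decode-and-skip logic can also be tried on an ordinary byte slice, without touching the real stack. This is a sketch under assumptions: decode_after_call is a hypothetical name, the slice stands in for the code section, and ret is the index a saved eip would refer to.

```rust
/// XOR key, same value as in the decryption function above.
const KEY: u8 = 0xc8;

/// Hypothetical helper: decodes the bytes starting at `ret` up to the null
/// terminator. Returns the plaintext and the index of the first byte after
/// the terminator, which is where the patched return address would point.
/// Returns `None` if `ret` is out of range or no terminator is found.
fn decode_after_call(image: &[u8], ret: usize) -> Option<(String, usize)> {
    let rest = image.get(ret..)?;
    let len = rest.iter().position(|&b| b == 0)?;
    // Like the original loop, every decoded byte is pushed as one char,
    // so this is only meaningful for ASCII plaintext.
    let text: String = rest[..len].iter().map(|&b| (b ^ KEY) as char).collect();
    Some((text, ret + len + 1))
}
```

One consequence of the null terminator: a plaintext byte equal to the key (0xc8) would encode to 0x00 and cut the string short, which does not happen for ASCII input.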

Embedding

Rust has a really nice feature called procedural macros. With them, one can write arbitrary code which manipulates the syntax tree at compile time. Thus, we can easily write a macro embed_str which gets called with a string literal as its argument and works as follows:

the macro encrypts the literal at compile time
the emitted code block allocates a string on the heap
it calls the decryption function with a pointer to the allocated string
it embeds the encrypted string literal right after the function call, so the binary contains the string bytes at this exact position

extern crate proc_macro;
use proc_macro::{TokenStream, TokenTree};

#[proc_macro]
pub fn embed_str(in_stream: TokenStream) -> TokenStream {
    let lit = match in_stream.into_iter().next() {
        Some(TokenTree::Literal(l)) => l.to_string(),
        t => panic!("[embed_str] {:?} not a string literal!", t),
    };

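    // `Literal::to_string` returns the token as written in the source,
    // so the delimiters have to be stripped by hand.
    let raw_lit = if lit.starts_with("\"") {
        lit.strip_prefix("\"").unwrap().strip_suffix("\"").unwrap()
    } else if lit.starts_with("r#\"") {
        // strip the quotes together with the hashes, otherwise they end up in the string
        lit.strip_prefix("r#\"").unwrap().strip_suffix("\"#").unwrap()
    } else if lit.starts_with("r\"") {
        lit.strip_prefix("r\"").unwrap().strip_suffix("\"").unwrap()
    } else {
        &lit
    };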

    let mut enc_lit = String::new();

    for c in raw_lit.chars() {
        enc_lit.push_str(&format!("\\\\x{:02x}", (c as u8) ^ 0xc8));
    }

    let payload = format!(
        "{{
        let mut s = String::new();
        decrypt((&mut s) as *mut _);

        asm!(\".string \\\"{}\\\"\");
        s
    }}",
        enc_lit
    );

    payload.parse().unwrap()
}
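To see what the macro hands to the assembler, the encoding step can be reproduced as a standalone function. This is a sketch, not the macro itself: encode_for_directive and string_directive are hypothetical names, it works on bytes instead of chars (identical for ASCII input), and it shows the text after Rust has resolved the string literal, which is why there is only a single backslash per byte here.

```rust
/// XOR key, same value as in the macro above.
const KEY: u8 = 0xc8;

/// Hypothetical helper: XORs every byte and renders it as a `\xNN` escape,
/// the form the assembler expects inside a quoted string.
fn encode_for_directive(plain: &str) -> String {
    plain.bytes().map(|b| format!("\\x{:02x}", b ^ KEY)).collect()
}

/// Hypothetical helper: builds the complete directive text.
/// `.string` appends a null byte, which the decryption loop uses as terminator.
fn string_directive(plain: &str) -> String {
    format!(".string \"{}\"", encode_for_directive(plain))
}
```

For the input "AB" this yields the directive .string "\x89\x8a", since 0x41 ^ 0xc8 = 0x89 and 0x42 ^ 0xc8 = 0x8a.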

Example

Let's see how we can now use this code from ordinary Rust code:

use enc_macros::embed_str;
use std::arch::asm;

fn main() {
    let mut esp: usize;
    unsafe {
        asm!("mov {esp}, esp" , esp = out(reg) esp);
    }

    println!("real esp: {:x}", esp);
    let s = unsafe { embed_str!("This is some secret string you will never find!") };

    println!("string: {}", s);
}

... and build it in release mode (cargo b --release) to see if it works:

[Image: Example]

Pretty clean and easy!

Analyzing the binary

Let's see what the example binary looks like. First, we take a look at the main function (which needs to be found first...). At 0xa7142c we see the call which our macro spat out, followed by the embedded encrypted string bytes. BinaryNinja obviously cannot recognize them as a string and parses them as opcode bytes.

The real machine code resumes at 0xa7145f. Our decryption function will alter the eip to point to this location after its return.

Finally, let's take a look at the function prologue of the decryption function and the recovering of the previous esp:

The assembly at 0xa711a4 and above forms the function prologue. Right after that, we can see our inline assembly consisting of mov eax, esp and add eax, 0x74.

From the perspective of a reverse engineer, this instruction pattern may look unusual and suspicious, since a compiler would more likely fold it into a single lea eax, [esp + 0x74]. One could also write a YARA rule for it without getting many false positives. And once the decryption function has been found, one can write a script to deobfuscate the string bytes after each call.
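As a rough illustration of such a signature, the pattern can be searched for with a few lines of Rust. This is a sketch under assumptions: find_esp_recovery is a hypothetical name, the encodings 89 e0 (mov eax, esp) and 83 c0 ib (add eax, imm8) are assumed to be the ones the assembler picked, and another build may use a different register, an alternative encoding, or a 32-bit immediate.

```rust
/// Assumed encoding of `mov eax, esp` followed by the opcode and ModRM byte
/// of `add eax, imm8`; the immediate itself is the next byte.
const PATTERN: [u8; 4] = [0x89, 0xe0, 0x83, 0xc0];

/// Hypothetical helper: returns every offset where the pattern occurs,
/// together with the immediate that is added (the assumed frame size).
fn find_esp_recovery(code: &[u8]) -> Vec<(usize, u8)> {
    code.windows(PATTERN.len() + 1)
        .enumerate()
        .filter(|(_, w)| w.starts_with(&PATTERN))
        .map(|(i, w)| (i, w[PATTERN.len()]))
        .collect()
}
```

A hit only marks a candidate for the decryption function. The strings themselves still have to be decoded at each call site that targets it.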

Conclusion

In my opinion, implementing a GULoader-like string obfuscation technique in Rust felt more ergonomic than in C/C++ due to Rust's powerful macro system. Once implemented, the malware engineer only has to wrap each string literal in a single macro, which makes it practical to use this technique for nearly every string. This makes static analysis of Rust binaries much harder!

Hence, I think that we will see even more sophisticated malware, abusing the powerful features Rust provides. And we should be prepared to analyze more Rust binaries.

You can find the code here.

THANK YOU FOR READING!