Shellcode: Data Compression

Introduction

This post examines data compression algorithms suitable for position-independent code and assumes you’re already familiar with the concept and purpose of data compression. For those of you curious to know more about the science, or information theory, read Data Compression Explained by Matt Mahoney. For a historical perspective, read History of Lossless Data Compression Algorithms. Charles Bloom has a great blog on the subject that goes way over my head. For questions and discussions, Encode’s Forum is popular among experts and should be able to help with any queries you have.

For shellcode, algorithms based on the following conditions are considered:

  1. Compact decompressor.
  2. Good compression ratio.
  3. Portable across operating systems and architectures.
  4. Difficult to detect by signature.
  5. Unencumbered by patents and licensing.

Meeting the requirements isn’t that easy. Search for “lightweight compression algorithms” and you’ll soon find recommendations for algorithms that aren’t compact at all. That’s not an issue on machines with 1TB hard drives, of course, but it is a problem for resource-constrained environments like microcontrollers and wireless sensors. The best algorithms are usually optimized for speed, and they contain arrays and constants that allow them to be easily identified with signature-based tools.
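To make the signature problem concrete, here is a minimal sketch of the kind of check a scanner could perform. It assumes a naive byte search for the first four entries of the standard CRC-32 lookup table, which ships with many gzip and zip style libraries. The function name find_signature is hypothetical, and real tools use far richer rules than this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First four entries of the standard CRC-32 lookup table
   (reflected polynomial 0xEDB88320), stored little-endian
   as they would appear inside a compiled binary. */
const uint8_t crc32_sig[16] = {
    0x00, 0x00, 0x00, 0x00,   /* 0x00000000 */
    0x96, 0x30, 0x07, 0x77,   /* 0x77073096 */
    0x2C, 0x61, 0x0E, 0xEE,   /* 0xEE0E612C */
    0xBA, 0x51, 0x09, 0x99    /* 0x990951BA */
};

/* Return the offset of the first match, or -1 if the pattern is absent. */
long find_signature(const uint8_t *buf, size_t len) {
    size_t i;

    if (buf == NULL || len < sizeof(crc32_sig)) return -1;

    for (i = 0; i + sizeof(crc32_sig) <= len; i++) {
        if (memcmp(buf + i, crc32_sig, sizeof(crc32_sig)) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

A decompressor with no tables at all gives a search like this nothing to match, which is why condition 4 favours the tiny LZ-style depackers discussed later.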

Algorithms that are compact might have suboptimal compression ratios, or the compressor component is closed source or restricted by licensing. There is light at the end of the tunnel, however, thanks primarily to the efforts of those designing executable compression. First, we look at those algorithms and then at which Windows API can be used as an alternative. There are also open source libraries designed for interoperability that support Windows compression on other platforms like Linux.

Table of contents

  1. Executable Compression
  2. Windows NT Layer DLL
  3. Windows Compression API
  4. Windows Packaging API
  5. Windows Imaging API
  6. Direct3D HLSL Compiler
  7. Windows-internal libarchive library
  8. LibreSSL Cryptography Library
  9. Windows.Storage.Compression
  10. Windows Undocumented API
  11. Summary

1. Executable Compression

The first tool known to compress executables and save disk space was Realia SpaceMaker published sometime in 1982 by Robert Dewar. The first virus known to use compression in its infection routine was Cruncher published in June 1993. The author of Cruncher used routines from the disk reduction utility for DOS called DIET. Later on, many different viruses utilized compression as part of their infection routine to reduce the size of infected files, presumably to help evade detection longer. Although completely unrelated to shellcode, I decided to look at e-zines from twenty years ago when there was a lot of interest in using lightweight compression algorithms.

The following list of viruses used compression back in the late 90s/early 00s. It’s not an extensive list, as I only searched the more popular e-zines like 29A and Xine by iKX.

  • Redemption, by Jacky Qwerty/29A
  • Inca, Hybris, by Vecna/29A
  • Aldebaran, by Bozo/iKX
  • Legacy, Thorin, Rhapsody, Forever, by Billy Belcebu/iKX
  • BeGemot, HIV, Vulcano, Benny, Milennium, by Benny/29A
  • Junkmail, Junkhtmail, by roy g biv/29A/defjam

The following compression engines were examined. A 1MB EXE file was used as the raw data and not all of them were tested.

BCE, which appeared in 29a#4, was disappointing with only an 8% compression ratio. BNCE, which appeared in DCA#1, was no better at 9%, although the decompressor is only 54 bytes. The decompressor for LSCE is 25 bytes, but the compressor simply encodes repeated sequences of zero bytes and nothing else. JQCoding has a ~20% compression ratio while LZCE provides the best at 36%. With the exception of the last two, I was unable to find anything in the e-zines with a good compression ratio. They were super tiny, but also super eh... inefficient. Worth a mention is KITTY, by snowcat.
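To show how little an engine that only packs zero runs has to do, here is a rough sketch in portable C. The byte layout is an assumption made for illustration (a 0x00 marker followed by a run length of 1 to 255) and is not the actual LSCE format; the names zrle_encode and zrle_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode: every run of zero bytes becomes the pair {0x00, run length 1..255}.
   Non-zero bytes are copied as-is. Returns the number of bytes written,
   or 0 if the output buffer is too small (or the input is empty). */
size_t zrle_encode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;

    while (i < inlen) {
        if (in[i] == 0) {
            size_t run = 0;
            while (i < inlen && in[i] == 0 && run < 255) { i++; run++; }
            if (outcap - o < 2) return 0;
            out[o++] = 0;
            out[o++] = (uint8_t)run;
        } else {
            if (o >= outcap) return 0;
            out[o++] = in[i++];
        }
    }
    return o;
}

/* Decode the stream produced by zrle_encode. Returns the number of bytes
   written, or 0 on a truncated pair, a zero run length or lack of space. */
size_t zrle_decode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;

    while (i < inlen) {
        uint8_t b = in[i++];
        if (b == 0) {
            size_t run;
            if (i >= inlen) return 0;            /* marker without a length */
            run = in[i++];
            if (run == 0 || run > outcap - o) return 0;
            memset(out + o, 0, run);
            o += run;
        } else {
            if (o >= outcap) return 0;
            out[o++] = b;
        }
    }
    return o;
}
```

A scheme like this only pays off on data with long zero padding, such as section alignment in a PE file, which fits the poor ratios reported for the simplest engines.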

While I could be wrong, the earliest example of compression being used to unpack shellcode can be found in a generator written by Z0MBiE/29A in 2004 (shown in figure 1). NRV compression algorithms, similar to what’s used in UPX, were re-purposed to decompress the shellcode (see freenrv2 for more details).

Figure 1: Shellcode constructor by Z0MBiE/29A

UPX is a very popular tool for executable compression based on UCL. Included with the source is a PE packer example called UCLpack (thanks Peter) which is ideal for shellcode, too. aPLib also provides a good compression ratio and the decompressor doesn’t contain lots of unique constants that would assist in detection by signature. The problem is that the compressor isn’t open source and requires linking with static or dynamic libraries compiled by the author. Thankfully, an open-source implementation by Emmanuel Marty (apultra) is available and this is also ideal for shellcode.
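The reason LZ-style depackers suit shellcode is that the decoder is just a copy loop with no lookup tables. The sketch below decodes a made-up byte-oriented format to illustrate the idea. It is not the aPLib or UCL/NRV bitstream: the control byte layout (values below 0x80 start a literal run, anything else is a back-reference with a one-byte distance) and the name lz_decode are assumptions for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy LZ77 decoder.
     control < 0x80 : copy (control + 1) literal bytes from the input
     control >= 0x80: copy ((control & 0x7F) + 3) bytes from the output,
                      starting 'distance' bytes back (next input byte, 1..255)
   Returns the number of bytes written, or 0 on malformed input. */
size_t lz_decode(const uint8_t *in, size_t inlen, uint8_t *out, size_t outcap) {
    size_t i = 0, o = 0;

    while (i < inlen) {
        uint8_t c = in[i++];

        if (c < 0x80) {
            size_t n = (size_t)c + 1;
            if (n > inlen - i || n > outcap - o) return 0;
            memcpy(out + o, in + i, n);
            i += n;
            o += n;
        } else {
            size_t n = (size_t)(c & 0x7F) + 3;
            size_t dist;
            if (i >= inlen) return 0;             /* missing distance byte */
            dist = in[i++];
            if (dist == 0 || dist > o || n > outcap - o) return 0;
            /* copy byte by byte so that overlapping matches repeat data */
            while (n--) {
                out[o] = out[o - dist];
                o++;
            }
        }
    }
    return o;
}
```

For example, the six bytes 02 'a' 'b' 'c' 83 03 expand to "abcabcabc": three literals followed by a six-byte match at distance three. Real formats pack the control information into bits rather than bytes, which is where most of the ratio comes from.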

Other libraries worth mentioning that I didn’t think were entirely suitable are Tiny Inflate and uzlib. The rest of this post focuses on compression provided by various Windows API.

2. Windows NT Layer DLL

Used by the Sofacy group to decompress a payload, RtlDecompressBuffer is also popular for PE Packers and in-memory execution. rtlcompress.c demonstrates using the API.

  • Compression

Obtain the size of the workspace required for compression via the RtlGetCompressionWorkSpaceSize API. Allocate the workspace and a buffer for the compressed data, then pass both, along with the raw data, to RtlCompressBuffer. The following example in C demonstrates this.

DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {      
    ULONG                            wspace, fspace;
    ULONG                            outlen = 0; // initialized so a failed call returns 0
    DWORD                            len;
    NTSTATUS                         nts;
    PVOID                            ws, outbuf;
    HMODULE                          m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlCompressBuffer_t              RtlCompressBuffer;
      
    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlCompressBuffer              = (RtlCompressBuffer_t)GetProcAddress(m, "RtlCompressBuffer");
        
    if(RtlGetCompressionWorkSpaceSize == NULL || RtlCompressBuffer == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }
        
    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      engine | COMPRESSION_ENGINE_MAXIMUM, 
      &wspace, &fspace);
          
    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace); 
      if(ws != NULL) {
        // 3. allocate memory for output 
        outbuf = malloc(inlen);
        if(outbuf != NULL) {
          // 4. compress data
          nts = RtlCompressBuffer(
            engine | COMPRESSION_ENGINE_MAXIMUM, 
            inbuf, inlen, outbuf, inlen, 0, 
            (PULONG)&outlen, ws); 
              
          if(nts == 0) {
            // 5. write the original length
            WriteFile(outfile, &inlen, sizeof(DWORD), &len, 0);
            // 6. write compressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          }
          // 7. free output buffer
          free(outbuf);
        }
        // 8. free workspace
        free(ws);
      }
    }
    return outlen;
}
  • Decompression

LZNT1 and Xpress data can be unpacked using RtlDecompressBuffer. Xpress Huffman data, however, can only be unpacked using RtlDecompressBufferEx or the multi-threaded RtlDecompressBufferEx2. The last two require a WorkSpace buffer.
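The rule above can be captured in a small helper that decides which routine a loader needs to resolve. This is only a sketch: pick_unpack_api and the unpack_api enum are hypothetical names, the format values are the ones published in the Windows headers, and the 0x00FF mask assumes the format sits in the low byte with the engine in the high byte.

```c
#include <stdint.h>

#ifndef COMPRESSION_FORMAT_LZNT1
#define COMPRESSION_FORMAT_LZNT1       0x0002
#define COMPRESSION_FORMAT_XPRESS      0x0003
#define COMPRESSION_FORMAT_XPRESS_HUFF 0x0004
#endif

typedef enum {
    UNPACK_UNSUPPORTED,   /* format not handled here */
    UNPACK_PLAIN,         /* RtlDecompressBuffer is enough */
    UNPACK_EX             /* RtlDecompressBufferEx plus a workspace buffer */
} unpack_api;

/* Map a CompressionFormatAndEngine value to the routine required. */
unpack_api pick_unpack_api(uint16_t format_and_engine) {
    switch (format_and_engine & 0x00FF) {    /* strip the engine bits */
    case COMPRESSION_FORMAT_LZNT1:
    case COMPRESSION_FORMAT_XPRESS:
        return UNPACK_PLAIN;
    case COMPRESSION_FORMAT_XPRESS_HUFF:
        return UNPACK_EX;
    default:
        return UNPACK_UNSUPPORTED;
    }
}
```

For shellcode, the practical consequence is that choosing LZNT1 or Xpress saves resolving RtlGetCompressionWorkSpaceSize and allocating the workspace.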

    typedef NTSTATUS (WINAPI *RtlDecompressBufferEx_t)(
      USHORT                 CompressionFormatAndEngine,
      PUCHAR                 UncompressedBuffer,
      ULONG                  UncompressedBufferSize,
      PUCHAR                 CompressedBuffer,
      ULONG                  CompressedBufferSize,
      PULONG                 FinalUncompressedSize,
      PVOID                  WorkSpace);
      
DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    ULONG                            wspace, fspace;
    ULONG                            outlen = 0;
    DWORD                            len;
    NTSTATUS                         nts;
    PVOID                            ws, outbuf;
    HMODULE                          m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlDecompressBufferEx_t          RtlDecompressBufferEx;
      
    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlDecompressBufferEx          = (RtlDecompressBufferEx_t)GetProcAddress(m, "RtlDecompressBufferEx");
        
    if(RtlGetCompressionWorkSpaceSize == NULL || RtlDecompressBufferEx == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }
        
    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      engine | COMPRESSION_ENGINE_MAXIMUM, 
      &wspace, &fspace);
          
    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace); 
      if(ws != NULL) {
        // 3. allocate memory for output
        outlen = *(DWORD*)inbuf;
        outbuf = malloc(outlen);
        
        if(outbuf != NULL) {
          // 4. decompress data
          nts = RtlDecompressBufferEx(
            engine | COMPRESSION_ENGINE_MAXIMUM, 
            outbuf, outlen, 
            (PBYTE)inbuf + sizeof(DWORD), inlen - sizeof(DWORD), 
            (PULONG)&outlen, ws); 
              
          if(nts == 0) {
            // 5. write decompressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          } else {
            printf("RtlDecompressBufferEx failed with %08lx\n", nts);
          }
          // 6. free output buffer
          free(outbuf);
        } else {
          printf("malloc() failed\n");
        }
        // 7. free workspace
        free(ws);
      }
    }
    return outlen;
}
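Both routines above rely on a simple container: a 4-byte original length followed by the compressed stream. The decompressor trusts that length as-is, so here is a hedged sketch of validating the header first. The packed_blob structure, parse_packed_blob and the 64MB cap are all hypothetical, and the length is assumed to be stored little-endian, as it would be when written by WriteFile on x86 or x64.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_UNPACKED_SIZE (64u * 1024u * 1024u)   /* hypothetical sanity cap */

typedef struct {
    uint32_t       unpacked_len;   /* original length from the header */
    const uint8_t *packed;         /* start of the compressed stream */
    size_t         packed_len;     /* bytes of compressed data after the header */
} packed_blob;

/* Split a buffer into header and payload.
   Returns 1 on success, 0 if the buffer is too short or the length is implausible. */
int parse_packed_blob(const uint8_t *buf, size_t len, packed_blob *out) {
    uint32_t n;

    if (buf == NULL || out == NULL || len <= 4) return 0;

    /* read the length byte by byte to avoid unaligned access */
    n = (uint32_t)buf[0]
      | ((uint32_t)buf[1] << 8)
      | ((uint32_t)buf[2] << 16)
      | ((uint32_t)buf[3] << 24);

    if (n == 0 || n > MAX_UNPACKED_SIZE) return 0;

    out->unpacked_len = n;
    out->packed       = buf + 4;
    out->packed_len   = len - 4;
    return 1;
}
```

The unpacked_len field would then size the output buffer, and packed and packed_len would be passed as the compressed buffer arguments.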

3. Windows Compression API

Despite being well documented and offering better compression ratios than RtlCompressBuffer, it’s unusual to see these API used at all. Four engines are supported: MSZIP, Xpress, Xpress Huffman and LZMS. To demonstrate using these API, see xpress.c
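The engine argument passed to CreateCompressor and CreateDecompressor is one of four constants from compressapi.h. A test tool needs some way to select one, so here is a small sketch that maps a name to the constant. The name strings and engine_from_name are hypothetical; the numeric values are the ones defined in the SDK header and are repeated here only so the sketch builds without it.

```c
#include <string.h>

#ifndef COMPRESS_ALGORITHM_MSZIP
#define COMPRESS_ALGORITHM_MSZIP       2
#define COMPRESS_ALGORITHM_XPRESS      3
#define COMPRESS_ALGORITHM_XPRESS_HUFF 4
#define COMPRESS_ALGORITHM_LZMS        5
#endif

/* Return the Compression API engine for a name, or 0 if the name is unknown. */
unsigned long engine_from_name(const char *name) {
    static const struct {
        const char    *name;
        unsigned long  id;
    } engines[] = {
        { "mszip",       COMPRESS_ALGORITHM_MSZIP       },
        { "xpress",      COMPRESS_ALGORITHM_XPRESS      },
        { "xpress_huff", COMPRESS_ALGORITHM_XPRESS_HUFF },
        { "lzms",        COMPRESS_ALGORITHM_LZMS        }
    };
    size_t i;

    if (name == NULL) return 0;

    for (i = 0; i < sizeof(engines) / sizeof(engines[0]); i++) {
        if (strcmp(name, engines[i].name) == 0) return engines[i].id;
    }
    return 0;
}
```

The result can be passed straight through as the engine parameter of the CompressBuffer and DecompressBuffer routines in this section.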

Compression

DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    COMPRESSOR_HANDLE ch = NULL;
    BOOL              r;
    SIZE_T            outlen, len;
    LPVOID            outbuf;
    DWORD             wr;
    
    // Create a compressor
    r = CreateCompressor(engine, NULL, &ch);
    
    if(r) {    
      // Query compressed buffer size.
      Compress(ch, inbuf, inlen, NULL, 0, &len);      
      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for compressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Compress data and write data to outbuf.
          r = Compress(ch, inbuf, inlen, outbuf, len, &outlen);
          // if compressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, outlen, &wr, NULL);
          } else xstrerror("Compress()");
          free(outbuf);
        } else xstrerror("malloc()");
      } else xstrerror("Compress()");
      CloseCompressor(ch);
    } else xstrerror("CreateCompressor()");
    return r;
}

Decompression

DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    DECOMPRESSOR_HANDLE dh = NULL;
    BOOL                r;
    SIZE_T              outlen, len;
    LPVOID              outbuf;
    DWORD               wr;
    
    // Create a decompressor
    r = CreateDecompressor(engine, NULL, &dh);
    
    if(r) {    
      // Query Decompressed buffer size.
      Decompress(dh, inbuf, inlen, NULL, 0, &len);      
      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for decompressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Decompress data and write data to outbuf.
          r = Decompress(dh, inbuf, inlen, outbuf, len, &outlen);
          // if decompressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, outlen, &wr, NULL);
          } else xstrerror("Decompress()");
          free(outbuf);
        } else xstrerror("malloc()");
      } else xstrerror("Decompress()");
      CloseDecompressor(dh);
    } else xstrerror("CreateDecompressor()");
    return r;
}

4. Windows Packaging API

If you’re a developer that wants to sell a Windows application to customers on the Microsoft Store, you must submit a package that uses the Open Packaging Conventions (OPC) format. Visual Studio automates building packages (.msix or .appx) and bundles (.msixbundle or .appxbundle). There’s also a well documented interface (IAppxFactory) that allows building them manually. While not intended to be used specifically for compression, there’s no reason why you can’t. An SDK sample to extract the contents of packages uses SHCreateStreamOnFileEx to read the package from disk. However, you can also use SHCreateMemStream and decompress a package entirely in memory.

5. Windows Imaging API (WIM)

These encode and decode .wim files on disk. WIMCreateFile internally calls CreateFile to return a file handle to an archive that’s then used with WIMCaptureImage to compress and add files to the archive. From what I can tell, there’s no way to work with .wim files in memory using these API.

For Linux, the Windows Imaging (WIM) library supports Xpress, LZX and LZMS algorithms. libmspack and this repo provide good information on the various compression algorithms supported by Windows.

6. Direct3D HLSL Compiler

Believe it or not, the best compression ratio on Windows is provided by the Direct3D API. As far as I can tell, they internally use the DXT/Block Compression (BC) algorithms, which are designed specifically for textures/images. In my tests, they gave a better compression ratio than anything else available on Windows: 60% for a 1MB EXE file, and using the API is very easy. The following example in C uses D3DCompressShaders and D3DDecompressShaders. While untested, I believe OpenGL API could likely be used in a similar way.

Compression

#pragma comment(lib, "D3DCompiler.lib")
#include <windows.h>
#include <stdint.h>
#include <d3dcompiler.h>
uint32_t d3d_compress(const void *inbuf, uint32_t inlen) {
    
    D3D_SHADER_DATA dsa;
    HRESULT         hr;
    ID3DBlob        *blob;
    SIZE_T          outlen = 0;
    LPVOID          outbuf;
    HANDLE          file;
    DWORD           len;
    
    file = CreateFile("compressed.bin", GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;
    
    dsa.pBytecode      = inbuf;
    dsa.BytecodeLength = inlen;
    
    // compress data
    hr = D3DCompressShaders(1, &dsa, D3D_COMPRESS_SHADER_KEEP_ALL_PARTS, &blob);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      
      WriteFile(file, outbuf, outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return outlen;
}

Decompression

uint32_t d3d_decompress(const void *inbuf, uint32_t inlen) {
    D3D_SHADER_DATA dsa;
    HRESULT         hr;
    ID3DBlob        *blob;
    SIZE_T          outlen = 0;
    LPVOID          outbuf;
    HANDLE          file;
    DWORD           len;
    
    // create file to save decompressed data to
    file = CreateFile("decompressed.bin", GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;
    
    dsa.pBytecode      = inbuf;
    dsa.BytecodeLength = inlen;
    
    // decompress buffer
    hr = D3DDecompressShaders(inbuf, inlen, 1, 0, 0, 0, &blob, NULL);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      
      WriteFile(file, outbuf, outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return outlen;    
}

The main problem with dynamically resolving these API is knowing what version is installed. The file name on my Windows 10 system is “D3DCompiler_47.dll”. It will likely be different on legacy systems.
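One way around the versioning problem is to probe for the DLL by name, newest first. The sketch below is an assumption about how that could be done rather than tested loader code: d3dc_dll_name and d3dc_load are hypothetical helpers, and the lower bound of 33 is a guess at the oldest version worth trying. Only the name builder is portable; the loading loop is compiled on Windows alone.

```c
#include <stdio.h>

#define D3DC_NEWEST 47   /* version found on the Windows 10 system mentioned above */
#define D3DC_OLDEST 33   /* assumed lower bound, adjust as needed */

/* Write "D3DCompiler_<ver>.dll" into name. Returns 1 on success,
   0 if the version is out of range or the buffer is too small. */
int d3dc_dll_name(int ver, char *name, size_t cap) {
    int n;

    if (name == NULL || ver < D3DC_OLDEST || ver > D3DC_NEWEST) return 0;

    n = snprintf(name, cap, "D3DCompiler_%d.dll", ver);
    return n > 0 && (size_t)n < cap;
}

#ifdef _WIN32
#include <windows.h>

/* Try each candidate from newest to oldest and return the first module
   that exports D3DCompressShaders, or NULL if none is found. */
HMODULE d3dc_load(void) {
    char name[32];
    int  ver;

    for (ver = D3DC_NEWEST; ver >= D3DC_OLDEST; ver--) {
        HMODULE m;

        if (!d3dc_dll_name(ver, name, sizeof(name))) continue;

        m = LoadLibraryA(name);
        if (m != NULL) {
            if (GetProcAddress(m, "D3DCompressShaders") != NULL) return m;
            FreeLibrary(m);
        }
    }
    return NULL;
}
#endif
```

Checking for the export matters because an older DLL may load successfully without providing the shader compression routines.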

7. Windows-internal libarchive library

Since the release of Windows 10 build 17063, the tape archiving tool ‘bsdtar’ is available and uses a stripped down version of the open source Multi-format archive and compression library (libarchive) to create and extract compressed files both in memory and on disk. The version found on Windows supports bzip2, compress and gzip formats. Although bsdtar lists support for xz, lzma and lzip, they appear to be unsupported, at least on my system.

8. LibreSSL Cryptography Library

Windows 10 Fall Creators Update and Windows Server 1709 include support for an OpenSSH client and server. The crypto library used by this port appears to have been compiled from the LibreSSL project, and if available can be found in C:\Windows\System32\libcrypto.dll. As some of you know, Transport Layer Security (TLS) supports compression prior to encryption. LibreSSL supports the ZLib and RLE methods, so it’s entirely possible to use COMP_compress_block and COMP_expand_block to compress and decompress raw data in memory.

9. Windows.Storage.Compression

This namespace, located in Windows.Storage.Compression.dll, internally uses the Windows Compression API. CreateCompressor is invoked with the COMPRESS_RAW flag set. It also invokes SetCompressorInformation with the COMPRESS_INFORMATION_CLASS_BLOCK_SIZE class if the user specifies a block size in the Compressor constructor.

10. Windows Undocumented API

DLLs on Windows use the DEFLATE algorithm extensively to support various audio, video, image encoders/decoders and file archives. Normally, the deflate routines are used internally and can’t be resolved dynamically via GetProcAddress. However, on at least Windows 7 through 10, there’s a DLL called PresentationNative_v0300.dll in the C:\Windows\System32 directory. (There may also be PresentationNative_v0400.dll, but I haven’t investigated this thoroughly enough.) Four public symbols grabbed my attention: ums_deflate_init, ums_deflate, ums_inflate_init and ums_inflate. For a PoC demonstrating how to use them, see winflate.c

Compression

The following code uses zlib.h to compress a buffer and write to file.

DWORD CompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    SIZE_T             outlen, len;
    LPVOID             outbuf;
    DWORD              wr;
    HMODULE            m;
    z_stream           ds;
    ums_deflate_t      ums_deflate;
    ums_deflate_init_t ums_deflate_init;
    int                err;
    
    m = LoadLibrary("PresentationNative_v0300.dll");
    ums_deflate_init = (ums_deflate_init_t)GetProcAddress(m, "ums_deflate_init");
    ums_deflate      = (ums_deflate_t)GetProcAddress(m, "ums_deflate");
    
    if(ums_deflate_init == NULL || ums_deflate == NULL) {
      printf("  [ unable to resolve deflate API.\n");
      return 0;
    }
    // allocate memory for compressed data
    outbuf = malloc(inlen);
    if(outbuf != NULL) {
      // Compress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)inlen;       // size of input
      ds.next_in   = (Bytef *)inbuf;    // input buffer
      ds.avail_out = (uInt)inlen;       // size of output buffer
      ds.next_out  = (Bytef *)outbuf;   // output buffer
      
      if(ums_deflate_init(&ds, Z_BEST_COMPRESSION, "1", sizeof(ds)) == Z_OK) {
        if((err = ums_deflate(&ds, Z_FINISH)) == Z_STREAM_END) {
          // write the original length first
          WriteFile(outfile, &inlen, sizeof(DWORD), &wr, NULL);
          // then the data
          WriteFile(outfile, outbuf, ds.total_out, &wr, NULL); // total_out = bytes produced
          FlushFileBuffers(outfile);
        } else {
          printf("  [ ums_deflate() : %x\n", err);
        }
      } else {
        printf("  [ ums_deflate_init()\n");
      }
      free(outbuf);
    }
    return 0;
}

Decompression

Inflating/decompressing the data is based on an example using zlib.

DWORD DecompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    SIZE_T             outlen, len;
    LPVOID             outbuf;
    DWORD              wr;
    HMODULE            m;
    z_stream           ds;
    ums_inflate_t      ums_inflate;
    ums_inflate_init_t ums_inflate_init;
    
    m = LoadLibrary("PresentationNative_v0300.dll");
    ums_inflate_init = (ums_inflate_init_t)GetProcAddress(m, "ums_inflate_init");
    ums_inflate      = (ums_inflate_t)GetProcAddress(m, "ums_inflate");
    
    if(ums_inflate_init == NULL || ums_inflate == NULL) {
      printf("  [ unable to resolve inflate API.\n");
      return 0;
    }
    // allocate memory for decompressed data
    outlen = *(DWORD*)inbuf;
    outbuf = malloc(outlen*2);
    
    if(outbuf != NULL) {
      // decompress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)inlen - 4;       // size of input, minus the 4-byte length header
      ds.next_in   = (Bytef*)inbuf + 4;     // input buffer
      ds.avail_out = (uInt)outlen*2;        // size of output buffer
      ds.next_out  = (Bytef*)outbuf;        // output buffer
      
      printf("  [ initializing inflate...\n");
      if(ums_inflate_init(&ds, "1", sizeof(ds)) == Z_OK) {
        printf("  [ inflating...\n");
        if(ums_inflate(&ds, Z_FINISH) == Z_STREAM_END) {
          WriteFile(outfile, outbuf, ds.total_out, &wr, NULL); // total_out = bytes produced
          FlushFileBuffers(outfile);
        } else {
          printf("  [ ums_inflate()\n");
        }
      } else {
        printf("  [ ums_inflate_init()\n");
      }
      free(outbuf);
    } else {
      printf("  [ malloc()\n");
    }
    return 0;
}

11. Summary/Results

That sums up the algorithms I think are suitable for shellcode. For the moment, UCL and apultra seem to provide the best solution. Using Windows API is a good option, but they are susceptible to monitoring and may not be portable. One area I didn’t cover due to time is the Media Foundation API. It may be possible to use audio, video and image encoders to compress raw data and the decoders to decompress. Worth researching?

Library / API        Algorithm / Engine   Compression Ratio
RtlCompressBuffer    LZNT1                39%
RtlCompressBuffer    Xpress               47%
RtlCompressBuffer    Xpress Huffman       53%
Compress             MSZIP                55%
Compress             Xpress               40%
Compress             Xpress Huffman       48%
Compress             LZMS                 58%
D3DCompressShaders   DXT/BC               60%
aPLib                N/A                  45%
UCL                  N/A                  42%
Undocumented API     DEFLATE              46%
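
The post doesn’t spell out how the percentages are measured. Assuming they represent space saved relative to the raw 1MB file (higher is better, which matches how the results are ranked), a figure for another engine could be computed with a helper like the one below. The name space_saved_pct is hypothetical.

```c
/* Percentage of space saved: 100 * (1 - packed / raw).
   Returns 0 for an empty input to avoid dividing by zero. */
double space_saved_pct(unsigned long raw_len, unsigned long packed_len) {
    if (raw_len == 0) return 0.0;
    return 100.0 * (1.0 - (double)packed_len / (double)raw_len);
}
```

Under that assumption, a 1,048,576 byte file packed down to 419,430 bytes would be reported as roughly 60%.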