Invoking System Calls and Windows Debugger Engine


Introduction

Quick post about Windows System calls that I forgot about working on after the release of Dumpert by Cn33liz last year, which is described in this post. Typically, EDR and AV set hooks on Win32 API or NT wrapper functions to detect and mitigate against malicious activity. Dumpert attempts to bypass any user-level hooks by invoking system calls directly. It first queries the operating system version via RtlGetVersion and then selects the applicable code stubs to execute. SysWhispers generates header/ASM files by extracting the system call numbers from the code stubs in NTDLL.dll and evilsocket also demonstrated how to do this many years ago. @FuzzySec and @TheWover have also implemented dynamic invocation of system calls after remapping NTDLL in Sharpsploit, which you can read about in their Bluehat presentation.

Using system calls on Windows to interact with the kernel has always been problematic because the numbers assigned for each kernel function change between the versions released. Just after Cn33liz published Dumpert, I thought of how invocation might be improved without using assembly and there are lots of ways, but consider at least three for now. The first method, which is probably the simplest and safest, maps NTDLL.dll into executable memory and resolves the address of any system call via the Export Address Table (EAT) before executing. This is relatively simple to implement. The second approach maps NTDLL.dll into read-only memory and uses a disassembler, or at the very least, a length disassembler to extract the system call number. The third will also map NTDLL.dll into read-only memory, copy the code stub to an executable buffer before invoking. The length of the stub is read from the exception directory. Overcomplicated, perhaps, and I did consider a few disassembly libraries for the second method, but just to save time settled with the Windows Debugger Engine, which has a built-in disassembler already.

A PoC to inject a DLL into remote process can be found here. It doesn’t use a disassembler, but because it uses the exception directory to locate the end of a system call, it will only work with 64-bit processes.

Windows Debugging Engine

Disassembling code via the engine requires a live process. Thankfully it’s possible to attach the debugger to the local process in noninvasive mode. You can just map NTDLL into executable memory and invoke any system call from there, however, I wanted an excuse to use the debugging engine. 😛 lde.c, lde.h

LDE::LDE() {
    CHAR path[MAX_PATH];
    
    ctrl = NULL;
    clnt = NULL;
    // create a debugging client
    hr = DebugCreate(__uuidof(IDebugClient), (void**)&clnt);
    if(hr == S_OK) {
      // get the control interface
      hr = clnt->QueryInterface(__uuidof(IDebugControl3), (void**)&ctrl);
      if(hr == S_OK) {
        // attach to existing process
        hr = clnt->AttachProcess(NULL, 
          GetProcessId(GetCurrentProcess()), 
          DEBUG_ATTACH_NONINVASIVE | DEBUG_ATTACH_NONINVASIVE_NO_SUSPEND);
        if(hr == S_OK) {
          hr = ctrl->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE);
        }
      }
    }
    ExpandEnvironmentStrings("%SystemRoot%\\system32\\NTDLL.dll", path, MAX_PATH);
    // open file
    file = CreateFile(path, 
      GENERIC_READ, FILE_SHARE_READ, NULL, 
      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
      
    if(file == INVALID_HANDLE_VALUE) return;
    
    // map file
    map = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if(map == NULL) return;
    
    // create read only view of map
    mem = (LPBYTE)MapViewOfFile(map, FILE_MAP_READ, 0, 0, NULL);
}

WinDbg has a command to disassemble a complete function called uf (Unassemble Function). Internally, WinDbg builds a Control-flow Graph (CFG) to map the full function before displaying the disassembly of each code block. You can execute a command like uf via the Execute method and so long as you’ve setup IDebugOutputCallbacks, you can capture the disassembly that way. I considered using a CFG to implement something similar to uf, which you can if you wish. The system calls on my own build of Windows 10 have at the most, one branch, so I scrapped the idea of using a CFG or executing uf. With NTDLL mapped, you can use something like the following to resolve the address of an exported API.

FARPROC LDE::GetProcAddress(LPCSTR lpProcName) {
    PIMAGE_DATA_DIRECTORY   dir;
    PIMAGE_EXPORT_DIRECTORY exp;
    DWORD                   rva, ofs, cnt;
    PCHAR                   str;
    PDWORD                  adr, sym;
    PWORD                   ord;
    
    if(mem == NULL || lpProcName == NULL) return NULL;
    
    // get pointer to image directories for NTDLL
    dir = Dirs();
    
    // no exports? exit
    rva = dir[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if(rva == 0) return NULL;
    
    ofs = rva2ofs(rva);
    if(ofs == -1) return NULL;
    
    // no exported symbols? exit
    exp = (PIMAGE_EXPORT_DIRECTORY)(ofs + mem);
    cnt = exp->NumberOfNames;
    if(cnt == 0) return NULL;
    
    // read the array containing address of api names
    ofs = rva2ofs(exp->AddressOfNames);        
    if(ofs == -1) return NULL;
    sym = (PDWORD)(ofs + mem);

    // read the array containing address of api
    ofs = rva2ofs(exp->AddressOfFunctions);        
    if(ofs == -1) return NULL;
    adr = (PDWORD)(ofs + mem);
    
    // read the array containing list of ordinals
    ofs = rva2ofs(exp->AddressOfNameOrdinals);
    if(ofs == -1) return NULL;
    ord = (PWORD)(ofs + mem);
    
    // scan symbol array for api string
    do {
      str = (PCHAR)(rva2ofs(sym[cnt - 1]) + mem);
      // found it?
      if(lstrcmp(str, lpProcName) == 0) {
        // return the address
        return (FARPROC)(rva2ofs(adr[ord[cnt - 1]]) + mem);
      }
    } while (--cnt);
    return NULL;
}

The following will use the Disassemble method to show the code. You can also use it to inspect bytes if you wanted to extract the system call number. The beginning and end of the system call is read from the Exception directory.

bool LDE::DisassembleSyscall(LPCSTR lpSyscallName) {
    ULONG64                       ofs, start=0, end=0, addr;
    PIMAGE_DOS_HEADER             dos;
    PIMAGE_NT_HEADERS             nt;
    PIMAGE_DATA_DIRECTORY         dir;
    PIMAGE_RUNTIME_FUNCTION_ENTRY rf;
    DWORD                         i, rva;
    CHAR                          buf[LDE_MAX_STR];
    HRESULT                       hr;
    ULONG                         len;
    
    // resolve address of function in NTDLL
    addr = (ULONG64)GetProcAddress(lpSyscallName);
    if(addr == NULL) return false;
    
    // get pointer to image directories
    dir = Dirs();
    
    // no exception directory? exit
    rva = dir[IMAGE_DIRECTORY_ENTRY_EXCEPTION].VirtualAddress;
    if(rva == 0) return false;
    
    ofs = rva2ofs(rva);
    if(ofs == -1) return false;
    
    rf = (PIMAGE_RUNTIME_FUNCTION_ENTRY)(ofs + mem);

    // for each runtime function (there might be a better way??)
    for(i=0; rf[i].BeginAddress != 0; i++) {
      // is it our system call?
      start = rva2ofs(rf[i].BeginAddress) + (ULONG64)mem;
      if(start == addr) {
        // save end and exit search
        end = rva2ofs(rf[i].EndAddress) + (ULONG64)mem;
        break;
      }
    }
    
    if(start != 0 && end != 0) {
      while(start < end) {
        hr = ctrl->Disassemble(
          start, 0, buf, LDE_MAX_STR, &len, &start);
          
        if(hr != S_OK) break;
        
        printf("%s", buf);
      }
    }
    return true;
}

The following code will disassemble the system call.

int main(int argc, char *argv[]) {
    LDE *lde;
    
    if(argc != 2) {
      printf("usage: dis <system call name>\n");
      return 0;
    }
    
    // create length disassembly engine
    lde = new LDE();
      
    lde->DisassembleSyscall(argv[1]);

    delete lde;
    
    return 0;
}

Just to illustrate disassembly of NtCreateThreadEx and NtWriteVirtualMemory. The address of SharedUserData doesn’t change and therefore doesn’t require fixups to the code just because it’s been mapped somewhere else.

Invoking

Simply copy the code for the system call to memory allocated by VirtualAlloc with PAGE_EXECUTE_READWRITE permissions. Rewriting the above code, we have something like the following.

LPVOID LDE::GetSyscallStub(LPCSTR lpSyscallName) {
    ULONG64                       ofs, start=0, end=0, addr;
    PIMAGE_DOS_HEADER             dos;
    PIMAGE_NT_HEADERS             nt;
    PIMAGE_DATA_DIRECTORY         dir;
    PIMAGE_RUNTIME_FUNCTION_ENTRY rf;
    DWORD                         i, rva;
    SIZE_T                        len;
    LPVOID                        cs = NULL;
    
    // resolve address of function in NTDLL
    addr = (ULONG64)GetProcAddress(lpSyscallName);
    if(addr == NULL) return NULL;
    
    // get pointer to image directories
    dir = Dirs();
    
    // no exception directory? exit
    rva = dir[IMAGE_DIRECTORY_ENTRY_EXCEPTION].VirtualAddress;
    if(rva == 0) return NULL;
    
    ofs = rva2ofs(rva);
    if(ofs == -1) return NULL;
    
    rf = (PIMAGE_RUNTIME_FUNCTION_ENTRY)(ofs + mem);

    // for each runtime function (there might be a better way??)
    for(i=0; rf[i].BeginAddress != 0; i++) {
      // is it our system call?
      start = rva2ofs(rf[i].BeginAddress) + (ULONG64)mem;
      if(start == addr) {
        // save the end and calculate length
        end = rva2ofs(rf[i].EndAddress) + (ULONG64)mem;
        len = (SIZE_T) (end - start);
        
        // allocate RWX memory
        cs = VirtualAlloc(NULL, len,  MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        if(cs != NULL) {
          // copy stub to memory
          CopyMemory(cs, (const void*)start, len);
        }
        break;
      }
    }
    // return pointer to code stub
    return cs;
}

Summary

Invoking system calls via remapping of the NTDLL.dll is of course the simplest approach. A lightweight LDE and CFG with no dependencies on external libraries would be useful for other Red Teaming activities like hooking API or even detecting hooked functions. It could also be used for locating GetProcAddress without touching the Export Address Table (EAT) or Import Address Table (IAT). However, GetSyscallStub demonstrates that you don’t need a disassembler just to read the code stub.

This entry was posted in assembly, programming, redteam, security, windows and tagged , , , , , . Bookmark the permalink.

1 Response to Invoking System Calls and Windows Debugger Engine

  1. Pingback: Weekly News Roundup — May 31 to June 6

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s