Analysing Windows Malware on Apple Mac M1/M2 (Windows 11 ARM) - Part II

Testing advanced x86-64 malware evasion techniques on Windows 11 ARM #

In the previous post, x86/x64 emulation internals on Windows 11 ARM, we explored the internals of WOW64 on Windows 11 ARM.

However, malware has many intricacies: it exploits and manipulates the runtime environment in ways that can hamper normal execution or cause an emulated system to misbehave.

So, in this blog post, we will test some common malware techniques that might cause emulation issues and see which of them succeed.

The tests that we will perform are invoking SYSCALL directly instead of system APIs, API hooking, process injection, and Heaven’s Gate.

Invoking SYSCALL instead of system APIs #

Most kernel32.dll API exports eventually land in ntdll.dll, where the Native API (the Nt*/Zw* functions) issues the syscall. These stubs are the user-mode entry points into the Windows kernel.

word-image-313791-1.jpeg
Image: paloalto networks

Let’s compile a sample application that performs a direct x64 SYSCALL to see if it gets executed on Windows 11 ARM.

We will use an example from ired.team.

syscall.c

#include <Windows.h>
#include <winternl.h>

EXTERN_C NTSTATUS SysNtCreateFile(
    PHANDLE FileHandle, 
    ACCESS_MASK DesiredAccess, 
    POBJECT_ATTRIBUTES ObjectAttributes, 
    PIO_STATUS_BLOCK IoStatusBlock, 
    PLARGE_INTEGER AllocationSize, 
    ULONG FileAttributes, 
    ULONG ShareAccess, 
    ULONG CreateDisposition, 
    ULONG CreateOptions, 
    PVOID EaBuffer, 
    ULONG EaLength);

int main()
{

    OBJECT_ATTRIBUTES oa;
    HANDLE fileHandle = NULL;
    NTSTATUS status = NULL;
    UNICODE_STRING fileName;
    IO_STATUS_BLOCK osb;
    WCHAR lpBuffer[MAX_PATH] = {0};
    WCHAR lpFinalPath[MAX_PATH] = {0};
    GetCurrentDirectoryW(MAX_PATH,lpBuffer);

    wcscpy(lpFinalPath, L"\\??\\");
    wcscat(lpFinalPath, lpBuffer);
    wcscat(lpFinalPath, L"\\test.txt");

    RtlInitUnicodeString(&fileName, (PCWSTR)lpFinalPath);

    ZeroMemory(&osb, sizeof(IO_STATUS_BLOCK));
    InitializeObjectAttributes(&oa, &fileName, OBJ_CASE_INSENSITIVE, NULL, NULL);

    SysNtCreateFile(
        &fileHandle, 
        FILE_GENERIC_WRITE, 
        &oa, 
        &osb, 
        0, 
        FILE_ATTRIBUTE_NORMAL, 
        FILE_SHARE_WRITE, 
        FILE_OVERWRITE_IF, 
        FILE_SYNCHRONOUS_IO_NONALERT, 
        NULL, 
        0);

    return 0;
}

syscallFile.asm

.code
    SysNtCreateFile proc
            mov r10, rcx
            mov eax, 55h
            syscall
            ret
    SysNtCreateFile endp
end

Running ml64.exe syscallFile.asm assembles the 64-bit stub into an object file (syscallFile.obj), which is then linked with the object file from the C code.

cl syscall.c ntdll.lib user32.lib syscallFile.obj
Microsoft (R) C/C++ Optimizing Compiler Version 19.38.33135 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

syscall.c
syscall.c(23): warning C4047: 'initializing': 'NTSTATUS' differs in levels of indirection from 'void *'
Microsoft (R) Incremental Linker Version 14.38.33135.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:syscall.exe
syscall.obj
ntdll.lib
user32.lib
syscallFile.obj

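This essentially creates an empty file, test.txt, using NtCreateFile, which has the SYSCALL number 0x55 in the stub above. If successful, it drops the empty file in the current directory. The test was successful: the file was created in the current directory. The C4047 warning in the build output is harmless; it comes from initialising the NTSTATUS variable status with NULL rather than 0.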
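The value 0x55 is hard-coded in the assembly stub, but syscall numbers are not a stable interface and can differ between Windows builds. As a minimal sketch (not part of the original test), the number can instead be read from the first bytes of the ntdll export, assuming the usual unhooked x64 stub layout of mov r10, rcx followed by mov eax, imm32. The helper name extract_syscall_number is hypothetical, and we have not verified that the x64 view of ntdll.dll on Windows 11 ARM uses this layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the syscall number encoded in an x64
 * ntdll-style stub, or -1 if the bytes do not begin with
 *   4C 8B D1        mov r10, rcx
 *   B8 imm32        mov eax, imm32
 * (for example because the stub is hooked or laid out differently). */
int64_t extract_syscall_number(const uint8_t *stub, size_t len)
{
    assert(stub != NULL);

    if (len < 8)
        return -1;
    if (stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1 || stub[3] != 0xB8)
        return -1;

    /* imm32 is stored little-endian right after the B8 opcode. */
    uint32_t number = (uint32_t)stub[4]
                    | ((uint32_t)stub[5] << 8)
                    | ((uint32_t)stub[6] << 16)
                    | ((uint32_t)stub[7] << 24);
    return (int64_t)number;
}
```

For the bytes 4C 8B D1 B8 55 00 00 00 this returns 0x55, matching the number used in syscallFile.asm. On a live system the bytes would come from the address that GetProcAddress returns for NtCreateFile.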

Screenshot 2024-03-10 at 3.58.20 PM.png

API Hooking Test #

Malware employs API hooking as a mechanism to intercept and modify the behavior of application programming interfaces (APIs) provided by the operating system. By inserting code snippets known as hooks into the API function tables, malware can intercept API calls made by legitimate processes, redirecting them to its own code for manipulation or analysis. This allows the malware to gain control over system activities, such as file operations, network communication, and process management, enabling malicious actions such as data exfiltration, privilege escalation, and system compromise.

As mentioned in my previous blog, x86/x64 emulation internals on Windows 11 ARM, the FFS (fast-forward sequence) exists precisely so that hooking remains possible. According to the Microsoft docs, the emulator normally skips these sequences because they do nothing useful; running them on every call would be wasted emulation. They are only executed when a change is detected, i.e., when the function has been hooked.
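As a rough illustration of what "detecting a change" could mean, the sketch below checks whether the first bytes of an export still match the x64 fast-forward sequence. The byte pattern (mov rax, rsp; mov [rax+20h], rbx; push rbp; pop rbp; jmp rel32) is our reading of the Microsoft ARM64EC documentation and should be treated as an assumption, as should the helper name looks_like_ffs. It is not how the emulator itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed x64 fast-forward sequence, up to and including the jmp opcode:
 *   48 8B C4       mov rax, rsp
 *   48 89 58 20    mov [rax+20h], rbx
 *   55             push rbp
 *   5D             pop rbp
 *   E9 rel32       jmp <real implementation>                          */
static const uint8_t FFS_PREFIX[] = {
    0x48, 0x8B, 0xC4, 0x48, 0x89, 0x58, 0x20, 0x55, 0x5D, 0xE9
};

/* Returns 1 if `code` starts with the assumed FFS, 0 otherwise. */
int looks_like_ffs(const uint8_t *code, size_t len)
{
    assert(code != NULL);

    /* Need the prefix plus the 4-byte jmp displacement. */
    if (len < sizeof FFS_PREFIX + 4)
        return 0;
    return memcmp(code, FFS_PREFIX, sizeof FFS_PREFIX) == 0;
}
```

An inline hook that overwrites the first bytes with its own jmp makes this return 0, which is the case in which the sequence can no longer be skipped.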

Let us first compile a sample application that uses Microsoft Detours for hooking. One of the features of Microsoft Detours is that it supports x86, x64, Itanium, and ARM processors.

#include <windows.h>

#include <stdio.h>
#include "detours.h"


HANDLE  WINAPI MyCreateFileA(
    LPCSTR                lpFileName,
    DWORD                 dwDesiredAccess,
    DWORD                 dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD                 dwCreationDisposition,
    DWORD                 dwFlagsAndAttributes,
    HANDLE                hTemplateFile
)
{
    printf("\nCreateFileA hook called ... ");
    return INVALID_HANDLE_VALUE;
}

HANDLE(WINAPI* pCreateFileA) (
    LPCSTR                lpFileName,
    DWORD                 dwDesiredAccess,
    DWORD                 dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD                 dwCreationDisposition,
    DWORD                 dwFlagsAndAttributes,
    HANDLE                hTemplateFile
    );

int main(int argc, char **argv)
{
    HMODULE hKernel32 = LoadLibraryA("kernel32.dll");
    if (hKernel32 == NULL) {
        fprintf(stderr, "Unable to load kernel32.dll\n");
        return EXIT_FAILURE;
    }

    // Resolve the real CreateFileA; after the commit, Detours points this at a trampoline to the original
    pCreateFileA = (HANDLE(WINAPI*)(LPCSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE))
        GetProcAddress(hKernel32, "CreateFileA");
    if (pCreateFileA == NULL) {
        fprintf(stderr, "Unable to resolve CreateFileA\n");
        return EXIT_FAILURE;
    }

    // Install the hook inside a Detours transaction
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(&(PVOID&)pCreateFileA, MyCreateFileA);
    if (DetourTransactionCommit() != NO_ERROR) {
        printf("Unable to load Detours engine ..");
        return EXIT_FAILURE;
    }

    const char* filePath = "example.txt";

    // This call is now routed to MyCreateFileA, which returns INVALID_HANDLE_VALUE
    HANDLE hFile = CreateFileA(
        filePath,                     // File path
        GENERIC_READ | GENERIC_WRITE, // Desired access (read and write)
        0,                            // Share mode (not shared)
        NULL,                         // Security attributes (default)
        OPEN_ALWAYS,                  // Creation disposition (open or create if not exists)
        FILE_ATTRIBUTE_NORMAL,        // File attributes (normal)
        NULL                          // Template file (not used)
    );

    if (hFile != INVALID_HANDLE_VALUE) {
        printf("File opened successfully!\n");
        CloseHandle(hFile);
    }
    else {
        DWORD dwError = GetLastError();
        fprintf(stderr, "Failed to open the file. Error code: %lu\n", dwError);
    }

    return EXIT_SUCCESS;
}

Screenshot 2024-03-10 at 10.23.28 PM.png

As we can see here, the hook is called when CreateFileA is invoked. Compiled as an ARM binary, the same program also functions properly; in the ARM case, the FFS is instead based on a jump table.

Compiling the same application for ARM would require a few changes, including linking against a detours.lib compiled specifically for ARM.

detours.png

armdetours.png

From WinDbg, we can observe a jump table where the transition from kernel32.dll to kernelbase.dll occurs. This approach is similar to the 5-instruction FFS (fast-forward sequence) in the x64 kernel32.dll.

Screenshot 2024-03-11 at 12.34.25 PM.png

Process Injection #

Most Windows malware executables are still x86, so we will test injection with x86 binaries.

Malware often attempts to inject code into legitimate processes to evade detection, maintain persistence, and execute malicious activities without raising suspicion. Some of the commonly targeted processes for code injection by malware include:

Explorer.exe: As the Windows shell process responsible for managing the desktop and file management operations, injecting code into Explorer.exe allows malware to gain extensive control over the user’s system and interact with the graphical user interface.

svchost.exe: Since svchost.exe hosts multiple Windows services, injecting code into this process provides malware with the ability to execute arbitrary commands with system-level privileges, facilitating various malicious activities.

services.exe: As the Service Control Manager (SCM) process responsible for managing system services, injecting code into services.exe grants malware the ability to manipulate system services, modify system configurations, and execute arbitrary commands with elevated privileges.

Code injection, in general, involves launching a suspended instance of the executable and injecting/mapping/rebasing the malicious code into the process. The execution is then resumed either by changing the thread context of the original main thread to point towards the newly injected code or by creating a new thread that resumes execution at the injected code.

pic.jpg
Image: cynet.com

But this raises the question: how would cross-injection work if the platform is Windows ARM and not x86?

To address this issue, there is a mechanism known as WOW64 file system (FS) redirection. Applications running under WOW64 have their file paths and registry paths redirected. Hence, under WOW64, system DLLs are loaded from the C:\Windows\SysWOW64\ folder instead of System32, which contains the 64-bit DLLs on an x64 operating system.

Most of the system32 DLLs and executables for the emulated architecture (x86 or x64) are provided in the redirected folder (SysWOW64). The DLLs and executables are compiled as x86 or x64 binaries.

Therefore, considering this redirection, the issue of binary type and path is already addressed.
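To make the redirection concrete, here is a toy model of what a 32-bit process sees when it asks for a System32 path. It is only a sketch: the real redirector lives inside WOW64, uses the actual Windows directory rather than the hard-coded C:\Windows assumed here, and exempts some subdirectories that this model ignores. The function name redirect_for_wow64 is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed default locations; a real system would use %windir%. */
#define SYSTEM32_PREFIX "C:\\Windows\\System32\\"
#define SYSWOW64_PREFIX "C:\\Windows\\SysWOW64\\"

/* Hypothetical model of the redirector for a 32-bit process.
 * Writes the path that would actually be opened into `out`.
 * Returns 1 if redirected, 0 if copied unchanged, -1 if `out` is too small. */
int redirect_for_wow64(const char *path, char *out, size_t out_len)
{
    assert(path != NULL && out != NULL);

    size_t prefix_len = strlen(SYSTEM32_PREFIX);
    int redirected = strlen(path) >= prefix_len;

    /* Windows paths are case-insensitive, so compare without regard to case. */
    for (size_t i = 0; redirected && i < prefix_len; i++) {
        if (tolower((unsigned char)path[i]) !=
            tolower((unsigned char)SYSTEM32_PREFIX[i]))
            redirected = 0;
    }

    int written = redirected
        ? snprintf(out, out_len, "%s%s", SYSWOW64_PREFIX, path + prefix_len)
        : snprintf(out, out_len, "%s", path);

    if (written < 0 || (size_t)written >= out_len)
        return -1;
    return redirected;
}
```

In this model, an x86 sample that launches C:\Windows\System32\svchost.exe ends up with C:\Windows\SysWOW64\svchost.exe, so the target it injects into matches its own architecture.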

Let’s take a practical example from the well-known Tofsee trojan. In Tofsee, the injection is done by launching a suspended instance of svchost.exe.

Screenshot 2024-03-14 at 2.24.33 PM.png

Debugging the malware to see whether it executes successfully, we can clearly observe that the process is created and the injection succeeds.

Screenshot 2024-03-15 at 1.24.29 PM.png

Heaven’s Gate #

In the WOW64 environment, 32-bit applications run in a subsystem that emulates a 32-bit operating system on a 64-bit version of Windows. Normally, these applications cannot directly execute 64-bit code due to architectural limitations. However, Heaven’s Gate lets a 32-bit process running under WOW64 switch to 64-bit mode and execute 64-bit code directly.

The typical way for 32-bit code on 64-bit Windows to call a system service is a far jump known as Heaven’s Gate. The transition takes the call from the 32-bit ntdll.dll to the 64-bit ntdll.dll via j_Wow64Transition, and from there a SYSCALL is executed. This only occurs under the WOW64 subsystem. On a native 32-bit system, the transition to kernel land instead used the legacy int 0x2E interrupt gate, which eventually lands in the kernel at KiSystemService().

Imported File.jpeg

Either way, let’s experiment to see if an x86-64 Heaven’s Gate can be simulated in Windows ARM.

For this, we will write a simple Heaven’s Gate program that does nothing but a simple jump to Heaven’s Gate and back using far return.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>

__declspec(naked) void HeavensGate(void)
{
    __asm
    {

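            // 32-bit side: far return to selector 0x33 (64-bit code segment)
            push 0x33
            call next
            next:
            add   [esp], 5      // skip the add (4 bytes) and the retf (1 byte)
            retf
            // 64-bit side, emitted as raw bytes
            __emit(0xE8)        // call $+5
            __emit(0) 
            __emit(0) 
            __emit(0) 
            __emit(0)                                
            __emit(0xC7)        // mov dword ptr [rsp+4], 0x23
            __emit(0x44) 
            __emit(0x24) 
            __emit(4) 
            __emit(0x23) 
            __emit(0) 
            __emit(0) 
            __emit(0)                                
            __emit(0x83)        // add dword ptr [rsp], 0xD
            __emit(4) 
            __emit(0x24) 
            __emit(0xD)                                
            __emit(0xCB)        // retf, back to selector 0x23 (32-bit code segment)

            ret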
    }
}


int main( int argc, char **argv)
{
    printf("Entering Heavens gate ...");

    HeavensGate();
    printf("\nLeaving Heavens gate ...");
    return 0;
}
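The two constants in the gate, 5 and 0xD, are just byte counts: each call pushes the address of the instruction that follows it, and the add moves that address past the retf so the far return lands on the next piece of code. The sketch below re-derives both values from the instruction lengths. It assumes the compiler picks the short encodings shown in the comments; the names ENTRY32, EXIT64 and bytes_from are ours and purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of the gate: its text and its encoded length in bytes. */
struct insn {
    const char *text;
    size_t      len;
};

/* 32-bit entry half, as written in the __asm block. */
static const struct insn ENTRY32[] = {
    { "push 0x33",              2 },  /* 6A 33                   */
    { "call next",              5 },  /* E8 00 00 00 00          */
    { "add dword ptr [esp], 5", 4 },  /* 83 04 24 05             */
    { "retf",                   1 },  /* CB                      */
};

/* 64-bit half, as emitted byte by byte. */
static const struct insn EXIT64[] = {
    { "call $+5",                    5 },  /* E8 00 00 00 00          */
    { "mov dword ptr [rsp+4], 0x23", 8 },  /* C7 44 24 04 23 00 00 00 */
    { "add dword ptr [rsp], 0xD",    4 },  /* 83 04 24 0D             */
    { "retf",                        1 },  /* CB                      */
};

/* Sums the lengths of seq[first..count): the distance from the return
 * address pushed by the call (the start of seq[first]) to the byte
 * immediately after the retf. */
size_t bytes_from(const struct insn *seq, size_t count, size_t first)
{
    assert(seq != NULL && first <= count);

    size_t total = 0;
    for (size_t i = first; i < count; i++)
        total += seq[i].len;
    return total;
}
```

bytes_from(ENTRY32, 4, 2) gives 5 and bytes_from(EXIT64, 4, 1) gives 13 (0xD), matching the two add operands. In the 64-bit half, the mov also overwrites the upper half of the 8-byte return address with 0x23, so the stack holds a 32-bit offset followed by the 32-bit code selector when retf executes.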

Not surprisingly, it fails on Windows 11 ARM because the far jumps are not implemented in the x86 emulator on Windows ARM.

So one failure that occurs when emulating an x86 binary on Windows 11 ARM is malware attempting to use Heaven’s Gate, either to switch to its 64-bit variant or to make a 64-bit syscall (for EDR bypass or hook bypass).

A way to circumvent this is to dump the shellcode after the retf (provided it doesn’t share any context with the 32-bit binary, e.g., handles or objects) and create a binary blob out of it.

In summary, techniques such as invoking syscalls, API hooking, process injection, and remapping/manual loading of system DLLs showed promising results for dynamically analyzing x86-64 malware on Windows 11 ARM under emulation, while the Heaven’s Gate transition remains a challenge because it fails under the x86 emulator.

 