A Guide to Malware Binary Reconstruction

When analyzing or unpacking malware, we often need to reconstruct a binary. Automated tools do not always work, and sometimes manual reconstruction is required. In this post we will cover both manual and automated binary reconstruction.

Reconstructing IAT from stolen API code #

Packers use this technique to hinder IAT reconstruction after the malware finishes unpacking its code. First, though, we need to understand how the IAT is implemented in a PE (Portable Executable) file.

IAT basics #

The IAT (Import Address Table) is an internal structure in a PE file. It holds the information the Windows loader needs to load dynamic link libraries and resolve the addresses of the API functions imported from them. If you examine a PE file you will notice two relevant pointers in the data directory of the IMAGE_OPTIONAL_HEADER. One points to an array of _IMAGE_IMPORT_DESCRIPTOR structures, each of which in turn points to _IMAGE_IMPORT_BY_NAME entries. The other points to an array of resolved API addresses.

1.png

Functions are imported either by name or by ordinal (the API's number in the exporting DLL).
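To make the name/ordinal distinction concrete, here is a minimal sketch for 32-bit thunks. The helper names are made up for illustration; the only fact relied on is that the high bit of a thunk value (IMAGE_ORDINAL_FLAG32 in the Windows headers) marks an import by ordinal.

```c
#include <stdbool.h>
#include <stdint.h>

/* High bit of a 32-bit thunk marks an import by ordinal
 * (same value as IMAGE_ORDINAL_FLAG32 in winnt.h). */
#define ORDINAL_FLAG32 0x80000000u

/* True when the thunk encodes an ordinal rather than a name RVA. */
static bool thunk_is_ordinal32(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* When the flag is set, the low 16 bits carry the ordinal number. */
static uint16_t thunk_ordinal32(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}

/* Otherwise the thunk is the RVA of an IMAGE_IMPORT_BY_NAME entry. */
static uint32_t thunk_name_rva32(uint32_t thunk)
{
    return thunk & 0x7FFFFFFFu;
}
```

For PE+ files the thunks are 64 bits wide and the flag moves to bit 63, so the same idea applies with wider types.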

The FirstThunk member points to an array of resolved API function addresses, which is known as the import address table.

Here is an example that imports GetProcAddress() from kernel32.dll:
2.png
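The descriptor walk described in this section can be sketched as follows. This is an illustration only: the struct mirrors the 20-byte IMAGE_IMPORT_DESCRIPTOR layout, find_iat_rva is a hypothetical helper, and the code assumes a little-endian host and a trusted, fully mapped image (no bounds checks), so that an RVA can be used directly as a buffer offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 20-byte IMAGE_IMPORT_DESCRIPTOR layout. */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the import name table  */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the DLL name string    */
    uint32_t FirstThunk;         /* RVA of the IAT for this DLL   */
} ImportDescriptor;

/* Walks the descriptor array at import_dir_rva and returns the
 * FirstThunk RVA of the descriptor whose name equals dll, or 0 when
 * the zeroed terminator is reached first. Hypothetical helper. */
static uint32_t find_iat_rva(const uint8_t *image,
                             uint32_t import_dir_rva,
                             const char *dll)
{
    ImportDescriptor d;
    uint32_t rva;

    for (rva = import_dir_rva; ; rva += (uint32_t)sizeof d) {
        memcpy(&d, image + rva, sizeof d);
        if (d.Name == 0 && d.FirstThunk == 0)
            return 0; /* terminator: no more descriptors */
        if (strcmp((const char *)image + d.Name, dll) == 0)
            return d.FirstThunk;
    }
}
```

The comparison here is exact; real DLL names are matched case-insensitively, so a practical version would normalize case first.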

Packers usually destroy the original IAT and resolve the API functions themselves rather than relying on the Windows loader. In these situations we need to reconstruct the IAT. We will be using Scylla v0.9.6b for the reconstruction.

Stolen API code #

Some packers hinder IAT rebuilding with the stolen code technique. In this technique, the first few instructions of an API routine are emulated in a separately allocated memory region, followed by a jump into the rest of the original routine. Scanning for imports then yields unexpected or invalid API resolutions.

3.png

4.png
5.png

With stolen API code, Scylla alone cannot rebuild the IAT. We will write a Scylla plugin that recovers the correct addresses so the IAT can be rebuilt.

Writing a Scylla plugin #

Scylla provides a basic API for building plugins, as well as an API for embedding Scylla in your own application. A Scylla plugin is basically a DLL file which gets injected into the target process. To share its internal structures, Scylla provides a named file mapping which the plugin DLL can open to reach that memory region.

The file mapping is named "ScyllaPluginExchange".

The following is the definition of the internal structure used by Scylla:

6.png

A pointer to the UNRESOLVED_IMPORT array can be obtained using SCYLLA_EXCHANGE.offsetUnresolvedImports.

First we obtain the file mapping using this code, which is taken from the plugins shipped with Scylla:


BOOL getMappedView()
{
    hMapFile = OpenFileMappingA(FILE_MAP_ALL_ACCESS, 0, FILE_MAPPING_NAME); //open named file mapping object

    if (hMapFile == 0)
    {
        writeToLogFile("OpenFileMappingA failed\r\n");
        return FALSE;
    }

    lpViewOfFile = MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, 0); //map the view with full access

    if (lpViewOfFile == 0)
    {
        CloseHandle(hMapFile); //close mapping handle
        hMapFile = 0;
        writeToLogFile("MapViewOfFile failed\r\n");
        return FALSE;
    }
    else
    {
        return TRUE;
    }
}

Once this function succeeds, lpViewOfFile points to a SCYLLA_EXCHANGE structure, which can be used to obtain a pointer to the UNRESOLVED_IMPORT structures.

The UNRESOLVED_IMPORT array contains the list of unresolved imports.


typedef struct _UNRESOLVED_IMPORT {       // Scylla Plugin exchange format
    DWORD_PTR ImportTableAddressPointer;  //in VA, address in IAT which points to an invalid api address
    DWORD_PTR InvalidApiAddress;          //in VA, invalid api address that needs to be resolved
} UNRESOLVED_IMPORT, *PUNRESOLVED_IMPORT;

ImportTableAddressPointer is the address of the IAT slot that holds the API pointer, and InvalidApiAddress is the address that slot currently points to. In our case that is a dynamically allocated memory region where the stolen code is emulated.
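How the array is reached and walked can be sketched portably. Note the assumptions: ExchangeHeader is a stand-in for SCYLLA_EXCHANGE whose field order is a guess for illustration (only the offsetUnresolvedImports field name comes from the text), the offset is taken to be relative to the start of the mapped view, and the helper names are hypothetical. A real plugin should use the definitions from Scylla's own headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SCYLLA_EXCHANGE; layout is an assumption. */
typedef struct {
    uint8_t   status;
    uintptr_t imageBase;
    uintptr_t imageSize;
    uintptr_t numberOfUnresolvedImports;
    uint8_t   offsetUnresolvedImports; /* byte offset from view start */
} ExchangeHeader;

/* Same two fields as UNRESOLVED_IMPORT, with portable types. */
typedef struct {
    uintptr_t ImportTableAddressPointer;
    uintptr_t InvalidApiAddress;
} UnresolvedImport;

/* Returns the first element of the unresolved import array. */
static const UnresolvedImport *first_unresolved(const void *view)
{
    const ExchangeHeader *hdr = view;
    return (const UnresolvedImport *)
        ((const uint8_t *)view + hdr->offsetUnresolvedImports);
}

/* Counts entries up to the zeroed terminator element. */
static size_t count_unresolved(const void *view)
{
    const UnresolvedImport *imp = first_unresolved(view);
    size_t n = 0;

    while (imp->ImportTableAddressPointer != 0) {
        n++;
        imp++;
    }
    return n;
}
```

Storing an offset instead of a pointer keeps the structure valid no matter where each process maps the view.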

7.png

As you can clearly see, we need to locate the JMP instruction in the stub at each InvalidApiAddress. We also need to count the bytes of the instructions preceding that JMP; this count is later subtracted from the JMP destination to get the original API base address. The byte count varies from function to function.
For disassembly we will use the libdasm library.

while (unresolvedImport->ImportTableAddressPointer != 0) //last element is a nulled struct
    {
        insDelta = 0; //bytes of stolen instructions seen before the JMP
        invalidApiAddress = unresolvedImport->InvalidApiAddress;
        sprintf(buffer, "API Address = 0x%p\t IAT Address = 0x%p\n", (void*)invalidApiAddress, (void*)unresolvedImport->ImportTableAddressPointer);

        writeToLogFile(buffer);

        IATbase = unresolvedImport->InvalidApiAddress; //current disassembly address inside the stub
        for (j = 0; j < COUNT_INS; j++)
        {
            memset(&inst, 0x00, sizeof(INSTRUCTION));

            i = get_instruction(&inst, IATbase, MODE_32); //returns the instruction length, 0 on failure
            if (i == 0)
            {
                break; //undecodable bytes, leave this import untouched
            }
            memset(buffer, 0x00, sizeof(buffer));
            get_instruction_string(&inst, FORMAT_ATT, 0, buffer, sizeof(buffer));
            if (strstr(buffer, "jmp"))
            {
                //skip "jmp " (4 chars) and "0x" (2 chars); strtoul avoids overflow on large operands
                unsigned long jmpDest = strtoul(strstr(buffer, "jmp") + 4 + 2, NULL, 16);
                DWORD_PTR apiAddress = (DWORD_PTR)jmpDest + IATbase - insDelta;

                printf(" JUMP Dest = 0x%lx", jmpDest);
                *(DWORD*)(unresolvedImport->ImportTableAddressPointer) = (DWORD)apiAddress; //patch the IAT slot
                unresolvedImport->InvalidApiAddress = apiAddress; //report the fixed address back to Scylla
                break;
            }
            else
            {
                insDelta = insDelta + i;
            }

            IATbase = IATbase + i; //advance to the next instruction
        }
        unresolvedImport++; //next pointer to struct
    }


This code traverses the list of unresolved imports and, for each one, tries to locate the destination address of the jump.

The opcode delta (insDelta) is then subtracted from the destination address, and the result becomes the final value of InvalidApiAddress:

InvalidApiAddress = (parsed jmp operand + address of the jmp instruction) - insDelta
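The arithmetic can be checked by hand without a disassembler. The sketch below is not the plugin's code: original_api_address is a hypothetical helper that assumes the number of stolen bytes is already known, that the stub ends in a 5-byte `E9 rel32` jump, that the stolen instructions have the same length as the originals, and that the host is little-endian.

```c
#include <stdint.h>
#include <string.h>

#define JMP_REL32_OPCODE 0xE9u
#define JMP_REL32_LENGTH 5u

/* stub       : bytes copied from the emulation region
 * stub_va    : address of the stub in the target process
 * stolen_len : bytes of stolen instructions preceding the JMP
 * Returns the original API entry address, or 0 when the byte at
 * stolen_len is not an E9 jump. Hypothetical helper. */
static uint32_t original_api_address(const uint8_t *stub,
                                     uint32_t stub_va,
                                     uint32_t stolen_len)
{
    int32_t rel;
    uint32_t jmp_va, target;

    if (stub[stolen_len] != JMP_REL32_OPCODE)
        return 0;

    memcpy(&rel, stub + stolen_len + 1, sizeof rel);
    jmp_va = stub_va + stolen_len;
    /* rel32 is relative to the instruction after the JMP */
    target = jmp_va + JMP_REL32_LENGTH + (uint32_t)rel;
    /* the JMP lands stolen_len bytes past the API entry point */
    return target - stolen_len;
}
```

As a worked example, take a stub at 0x00340000 holding `mov edi,edi; push ebp; mov ebp,esp` (5 bytes) followed by `E9 76 1D 4C 7C`. The jump target is 0x0034000A + 0x7C4C1D76 = 0x7C801D80, and subtracting the 5 stolen bytes gives 0x7C801D7B as the API entry.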

A full-range manual IAT search in Scylla finds most of the API calls, but some of the entries are invalid and will not be fixed by the plugin. These imports must be discarded from the thunk manually.

8.png
9.png

After running the plugin most of the imports are resolved, but some are still invalid and need to be deleted from the thunk.
10.png

11.png

Dumping and Aligning RunPE+ (64-bit PE) #

RunPE works by creating a dummy process in suspended mode, hollowing it out and injecting hostile code into that remote process. This technique is used to stay hidden. RunPE-injected code can be carved out into a valid PE file. The header format for PE+ files differs slightly from the 32-bit version: Microsoft widened some fields to QWORDs to suit the 64-bit architecture.
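Before carving, it helps to confirm that a buffer really holds a PE+ image. The sketch below relies only on documented PE constants: `e_lfanew` at offset 0x3C, a 4-byte signature and 20-byte file header before the optional header, and a Magic value of 0x10B for PE32 or 0x20B for PE32+. The function and enum names are made up for illustration, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum PeKind { PE_INVALID = 0, PE_32 = 32, PE_32_PLUS = 64 };

/* Classifies a buffer by the Magic field of its optional header. */
static enum PeKind pe_kind(const uint8_t *buf, size_t len)
{
    uint32_t e_lfanew;
    uint16_t magic;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return PE_INVALID;

    memcpy(&e_lfanew, buf + 0x3C, sizeof e_lfanew);
    /* need signature (4) + file header (20) + Magic (2) */
    if (e_lfanew > len || len - e_lfanew < 4 + 20 + 2)
        return PE_INVALID;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return PE_INVALID;

    memcpy(&magic, buf + e_lfanew + 4 + 20, sizeof magic);
    if (magic == 0x10B)
        return PE_32;
    if (magic == 0x20B)
        return PE_32_PLUS;
    return PE_INVALID;
}
```

Checking Magic first avoids parsing a 32-bit optional header with the 64-bit structure, whose fields sit at different offsets.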

When the file is mapped by the Windows loader, each section is placed at its VirtualAddress and occupies VirtualSize bytes, as given in its IMAGE_SECTION_HEADER, rounded up to SectionAlignment. The section's SizeOfRawData may be smaller than its VirtualSize; if it is, the rest is filled with zeros. A PE on disk, however, is aligned according to FileAlignment as given in struct _IMAGE_OPTIONAL_HEADER64.

12.png

So when converting a memory map to a PE+ dump, we need to lay out the PE+ file as dictated by FileAlignment.
13.png
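The rounding that an alignment value implies is a one-liner. This is a generic sketch, with align_up as an illustrative name, and it assumes the alignment is a power of two, which holds for FileAlignment and SectionAlignment in valid PE files.

```c
#include <stdint.h>

/* Rounds value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, 0x1234 bytes of section data occupy 0x1400 bytes on disk with a FileAlignment of 0x200, but 0x2000 bytes in memory with a SectionAlignment of 0x1000.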

Loading the dump as it is gives unexpected results because it is not properly aligned.
14.png

Aligner code is required to fix this problem. We begin by parsing the PE+ file into the standard PE structures:

IMAGE_DOS_HEADER DosHdr = {0};
IMAGE_FILE_HEADER FileHdr = {0};
IMAGE_OPTIONAL_HEADER64 OptHdr = {0};

// Read All Structure as per offset

fread(&DosHdr, sizeof(IMAGE_DOS_HEADER), 0x01, fp);

fseek(fp, (unsigned int)DosHdr.e_lfanew + 4,SEEK_SET);

fread(&FileHdr, sizeof(IMAGE_FILE_HEADER), 1, fp);
fread(&OptHdr, sizeof(IMAGE_OPTIONAL_HEADER64), 1, fp);

Next, traverse and read all the section headers:

while (iNumSec < FileHdr.NumberOfSections)
    {
        fread(&pTail[iNumSec], sizeof( IMAGE_SECTION_HEADER), 1, fp);
        iNumSec++;
    }

After that we copy everything up to PointerToRawData of the first section, that is, all of the headers:

i = ftell(fp);

    buffer = (unsigned char*) malloc(sizeof(char) * pTail[0].PointerToRawData + 1);

    fseek(fp, 0, SEEK_SET);

    fread(buffer, pTail[0].PointerToRawData, 1, fp); // Read/Write Everything Till the beginning of first section

    fwrite(buffer, pTail[0].PointerToRawData, 1, out);

Finally we rewrite the data in aligned form. For each section, the code reads SizeOfRawData bytes starting at the section's VirtualAddress in the dump and writes them to the output, then moves on to the next section's VirtualAddress.


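i = 0; // restart at the first section (i was reused above to hold ftell's result)
while (i < iNumSec)
    {
        buffer = (unsigned char*) malloc(sizeof(char) * pTail[i].SizeOfRawData + 1);

        fseek(fp, pTail[i].VirtualAddress, SEEK_SET); // in the dump, section data sits at its RVA
        fread(buffer, pTail[i].SizeOfRawData, 1, fp);

        fseek(out, pTail[i].PointerToRawData, SEEK_SET); // on disk it belongs at its raw offset
        fwrite(buffer, pTail[i].SizeOfRawData, 1, out);
        free(buffer); // release the per-section buffer
        i++;
    }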
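The same copy can be sketched on in-memory buffers, which makes the layout change easy to test. This is an illustration, not the aligner used in this post: SectionLayout keeps only the three IMAGE_SECTION_HEADER fields the copy needs, and unmap_image is a hypothetical helper that returns 0 when a section does not fit either buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three IMAGE_SECTION_HEADER fields the copy needs. */
typedef struct {
    uint32_t VirtualAddress;   /* where the data sits in the dump   */
    uint32_t SizeOfRawData;    /* how many bytes belong on disk     */
    uint32_t PointerToRawData; /* where the data belongs on disk    */
} SectionLayout;

/* Copies headers, then each section from memory layout (mem) to
 * file layout (out). Returns the resulting file size, or 0 when a
 * range falls outside either buffer. Hypothetical helper. */
static size_t unmap_image(const uint8_t *mem, size_t mem_len,
                          const SectionLayout *sec, size_t count,
                          uint32_t headers_size,
                          uint8_t *out, size_t out_len)
{
    size_t end = headers_size;
    size_t i;

    if (headers_size > mem_len || headers_size > out_len)
        return 0;
    memcpy(out, mem, headers_size);

    for (i = 0; i < count; i++) {
        size_t src_end = (size_t)sec[i].VirtualAddress + sec[i].SizeOfRawData;
        size_t dst_end = (size_t)sec[i].PointerToRawData + sec[i].SizeOfRawData;

        if (src_end > mem_len || dst_end > out_len)
            return 0;
        memcpy(out + sec[i].PointerToRawData,
               mem + sec[i].VirtualAddress,
               sec[i].SizeOfRawData);
        if (dst_end > end)
            end = dst_end;
    }
    return end;
}
```

Placing each section at its own PointerToRawData, rather than appending sections back to back, keeps the output correct even when the raw offsets are not contiguous.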

After fixing the dump it becomes a valid PE+ file and loads properly in IDA.

15.png

 