A Guide to Malware Binary Reconstruction
While analyzing or unpacking malware, we often need to reconstruct the binary. Automated tools are not always enough, and sometimes manual reconstruction is required. In this post we will cover both manual and automated binary reconstruction.
Reconstructing IAT from stolen API code #
The stolen API code technique is used to hinder IAT reconstruction after malware finishes unpacking its code. Before looking at it, we need to understand how the IAT is implemented in a PE (Portable Executable) file.
IAT basics #
The IAT (Import Address Table) is an internal structure in a PE file. It holds the information the Windows loader needs to load dynamic link libraries and resolve the addresses of the API functions imported from them. If you examine a PE file, you will notice two relevant entries in the DataDirectory array of the IMAGE_OPTIONAL_HEADER. One points to an array of _IMAGE_IMPORT_DESCRIPTOR structures, whose members in turn lead to _IMAGE_IMPORT_BY_NAME entries. The other points to an array of resolved API addresses.
Functions are imported either by name or by ordinal (API number).
The FirstThunk member points to an array of resolved API function addresses, which is known as the import address table.
The example above imports GetProcAddress() from kernel32.dll.
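To make the name-versus-ordinal distinction concrete, here is a minimal sketch of how a 32-bit import thunk can be decoded before the loader resolves it. The flag value matches IMAGE_ORDINAL_FLAG32 from winnt.h; the type and function names (`thunk_kind`, `decoded_thunk`, `decode_thunk32`) are made up for this illustration and are not part of any Windows header.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as IMAGE_ORDINAL_FLAG32 in winnt.h: the top bit marks an ordinal import. */
#define ORDINAL_FLAG32 0x80000000u

/* Hypothetical helper types for this sketch. */
typedef enum { IMPORT_BY_NAME, IMPORT_BY_ORDINAL } thunk_kind;

typedef struct {
    thunk_kind kind;
    uint32_t value; /* ordinal number, or RVA of an IMAGE_IMPORT_BY_NAME entry */
} decoded_thunk;

/* Decode one non-zero 32-bit thunk (a zero thunk terminates the array). */
static decoded_thunk decode_thunk32(uint32_t thunk)
{
    decoded_thunk d;
    assert(thunk != 0);
    if (thunk & ORDINAL_FLAG32) {
        d.kind = IMPORT_BY_ORDINAL;
        d.value = thunk & 0xFFFFu;      /* the ordinal lives in the low 16 bits */
    } else {
        d.kind = IMPORT_BY_NAME;
        d.value = thunk & 0x7FFFFFFFu;  /* RVA of the hint/name entry */
    }
    return d;
}
```

For PE+ files the thunks are 64 bits wide and the flag moves to the top bit of the QWORD, but the idea is the same.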
Packers usually destroy the original form of the IAT and resolve the API functions themselves instead of relying on the Windows loader. In these situations we need to reconstruct the IAT. We will be using Scylla v0.9.6b for the reconstruction.
Stolen API code #
Some packers hinder IAT rebuilding with the stolen code technique. In this technique, some of the instructions at the beginning of an API subroutine are emulated in a separately allocated memory region, which then jumps into the API past those instructions. Scanning for imports then yields unexpected or invalid API resolutions.
With stolen API code, Scylla cannot rebuild the IAT on its own. We will write a Scylla plugin that recovers the correct addresses so that the IAT can be rebuilt.
Writing a Scylla plugin #
Scylla provides a basic API for building plugins, and also an API for embedding Scylla in your own application. A Scylla plugin is simply a DLL file that gets injected into the target process. To share its internal structures, Scylla provides a named file mapping that the plugin DLL can open to reach that memory region.
The file mapping is named "ScyllaPluginExchange".
The exchange uses an internal structure defined by Scylla, SCYLLA_EXCHANGE.
A pointer to the UNRESOLVED_IMPORT array can be obtained using SCYLLA_EXCHANGE.offsetUnresolvedImports.
First we obtain the file mapping, using this code found in some of the plugins shipped with Scylla:
BOOL getMappedView()
{
    hMapFile = OpenFileMappingA(FILE_MAP_ALL_ACCESS, 0, FILE_MAPPING_NAME); //open named file mapping object
    if (hMapFile == 0)
    {
        writeToLogFile("OpenFileMappingA failed\r\n");
        return FALSE;
    }
    lpViewOfFile = MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, 0); //map the view with full access
    if (lpViewOfFile == 0)
    {
        CloseHandle(hMapFile); //close mapping handle
        hMapFile = 0;
        writeToLogFile("MapViewOfFile failed\r\n");
        return FALSE;
    }
    else
    {
        return TRUE;
    }
}
On success, lpViewOfFile points to the SCYLLA_EXCHANGE structure, which can be used to obtain a pointer to the UNRESOLVED_IMPORT structures.
The UNRESOLVED_IMPORT array contains the list of unresolved imports.
typedef struct _UNRESOLVED_IMPORT { // Scylla Plugin exchange format
    DWORD_PTR ImportTableAddressPointer; //in VA, address in IAT which points to an invalid api address
    DWORD_PTR InvalidApiAddress; //in VA, invalid api address that needs to be resolved
} UNRESOLVED_IMPORT, *PUNRESOLVED_IMPORT;
ImportTableAddressPointer is the address of the IAT slot that holds an invalid API pointer, and InvalidApiAddress is the value stored there, that is, the place where the call actually lands. In our case this is a dynamically generated memory region where the stolen code is emulated.
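As a rough illustration of how a plugin can reach and walk this list, the sketch below locates the array at a byte offset inside a mapped view and counts entries up to the zeroed terminator. It is portable C, so `uintptr_t` stands in for DWORD_PTR. The helper names and the assumption that the offset is relative to the start of the view are mine, not taken from the Scylla headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for UNRESOLVED_IMPORT (DWORD_PTR replaced by uintptr_t). */
typedef struct {
    uintptr_t ImportTableAddressPointer; /* address of the IAT slot */
    uintptr_t InvalidApiAddress;         /* value currently stored in that slot */
} unresolved_import;

/* Hypothetical helper: find the array inside the mapped view.
   Assumes the offset is counted in bytes from the start of the view. */
static unresolved_import *first_unresolved(void *view_base, size_t offset)
{
    assert(view_base != NULL);
    return (unresolved_import *)((unsigned char *)view_base + offset);
}

/* Count entries up to, but not including, the zeroed terminator. */
static size_t count_unresolved(const unresolved_import *imp)
{
    size_t n = 0;
    while (imp[n].ImportTableAddressPointer != 0)
        n++;
    return n;
}
```

A plugin would call `first_unresolved` with the mapped view pointer and the offset read from the exchange structure, then iterate until it hits the terminator.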
For each entry we need to locate the JMP instruction in the stub at InvalidApiAddress, and we also need to count the bytes of the instructions preceding that JMP. This count is later subtracted from the JMP destination to get the original API base address. The count varies from function to function.
For disassembly we will use the libdasm library.
// Assumes jmpText (char *), jmpOperand and resolved (DWORD_PTR) are declared with the other locals.
while (unresolvedImport->ImportTableAddressPointer != 0) //last element is a nulled struct
{
    insDelta = 0; // bytes of stolen instructions seen before the jmp
    invalidApiAddress = unresolvedImport->InvalidApiAddress;
    sprintf(buffer, "API Address = 0x%p\t IAT Address = 0x%p\n", (void *)invalidApiAddress, (void *)unresolvedImport->ImportTableAddressPointer);
    writeToLogFile(buffer);
    IATbase = invalidApiAddress; // despite its name, this cursor walks the emulated stub
    for (j = 0; j < COUNT_INS; j++)
    {
        memset(&inst, 0x00, sizeof(INSTRUCTION));
        i = get_instruction(&inst, (BYTE *)IATbase, MODE_32); // instruction length, 0 on failure
        if (i == 0)
            break; // undecodable bytes: leave this import unresolved
        memset(buffer, 0x00, sizeof(buffer));
        get_instruction_string(&inst, FORMAT_ATT, 0, buffer, sizeof(buffer));
        jmpText = strstr(buffer, "jmp");
        if (jmpText)
        {
            // skip "jmp " and the "0x" prefix; strtoul avoids clamping values above 0x7FFFFFFF
            jmpOperand = (DWORD_PTR)strtoul(jmpText + 4 + 2, NULL, 16);
            resolved = (jmpOperand + IATbase) - insDelta; // jump destination minus the stolen bytes
            printf(" JUMP Dest = 0x%lx", (unsigned long)(jmpOperand + IATbase));
            *(DWORD*)(unresolvedImport->ImportTableAddressPointer) = (DWORD)resolved; // patch the IAT slot
            unresolvedImport->InvalidApiAddress = resolved; // report the fix back to Scylla
            break;
        }
        insDelta = insDelta + i;
        IATbase = IATbase + i;
    }
    unresolvedImport++; //next pointer to struct
}
This code traverses the list of unresolved imports and, for each one, tries to locate the jump to the real API.
The opcode delta (the byte count of the instructions preceding the JMP) is subtracted from the jump destination, and the result becomes the final value of InvalidApiAddress:
InvalidApiAddress = (jmpOperand + IATbase) - insDelta, where jmpOperand is the hex operand parsed from the disassembled jmp and IATbase is the address of the jmp instruction.
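The arithmetic can be checked in isolation with a small, self-contained sketch. It does not use libdasm: `toy_insn_len` is a toy length decoder that only knows the three opcodes appearing in a typical 32-bit API prologue, and both function names are invented for this example. It assumes a little-endian host and a stub that ends in a `jmp rel32` (opcode 0xE9).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy length decoder: handles only the opcodes used in this demo.
   A real plugin would rely on a disassembler such as libdasm. */
static size_t toy_insn_len(const uint8_t *p)
{
    switch (p[0]) {
    case 0x55: return 1; /* push ebp */
    case 0x8B: return 2; /* mov r32, r32 (register form only) */
    case 0xE9: return 5; /* jmp rel32 */
    default:   return 0; /* unknown to this sketch */
    }
}

/* Given a stub located at virtual address stub_va, return the recovered
   API base address, or 0 if no jmp was found. */
static uint32_t recover_api_va(const uint8_t *stub, size_t stub_len, uint32_t stub_va)
{
    size_t off = 0; /* bytes of stolen instructions seen so far (the delta) */
    assert(stub != NULL);
    while (off < stub_len) {
        size_t len = toy_insn_len(stub + off);
        if (len == 0 || off + len > stub_len)
            return 0;
        if (stub[off] == 0xE9) {
            int32_t rel;
            uint32_t dest;
            memcpy(&rel, stub + off + 1, sizeof rel); /* little-endian host assumed */
            /* destination = address of the byte after the jmp + rel32 */
            dest = stub_va + (uint32_t)off + 5u + (uint32_t)rel;
            return dest - (uint32_t)off; /* step back over the stolen bytes */
        }
        off += len;
    }
    return 0;
}
```

For example, a stub at 0x00500000 containing `mov edi, edi; push ebp; mov ebp, esp` followed by a jump to 0x77001005 resolves to the API base 0x77001000, because five bytes were stolen.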
A full-range manual IAT search in Scylla finds most of the API calls, but some of them are invalid and will not be fixed by the plugin. These imports must be removed from the thunk manually.
After running the plugin, most of the imports are resolved, but a few are still invalid and need to be deleted from the thunk.
Dumping and Aligning RunPE+ (64-bit PE) #
RunPE works by creating a dummy process in suspended mode and hollowing it out, injecting hostile code into that remote process. This technique is used for staying hidden. The injected code can be carved out into a valid PE file. The header format for PE+ files differs slightly from the 32-bit version: Microsoft introduced some QWORD fields that are relevant to the 64-bit architecture.
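Before treating a carved buffer as PE+, it helps to confirm which header flavour it carries. The sketch below reads the optional-header Magic from a raw buffer. The two magic values are the documented ones from winnt.h; the function and macro names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_MAGIC_PE32     0x10Bu /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE_MAGIC_PE32PLUS 0x20Bu /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the optional-header Magic, or 0 if buf is not a plausible PE image. */
static uint16_t pe_optional_magic(const uint8_t *buf, size_t len)
{
    uint32_t e_lfanew;
    assert(buf != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    e_lfanew = rd32(buf + 0x3C); /* offset of the "PE\0\0" signature */
    /* need the signature (4) + IMAGE_FILE_HEADER (20) + Magic (2) */
    if (e_lfanew > len || len - e_lfanew < 26)
        return 0;
    if (rd32(buf + e_lfanew) != 0x00004550u)
        return 0;
    return rd16(buf + e_lfanew + 24);
}
```

A result of `PE_MAGIC_PE32PLUS` means the optional header should be parsed as IMAGE_OPTIONAL_HEADER64.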
When the Windows loader maps the file, each section is placed in memory according to the VirtualAddress and VirtualSize given in its IMAGE_SECTION_HEADER. The actual SizeOfRawData of a section may be smaller than its VirtualSize; in that case the rest is zero-filled. A PE on disk, however, is aligned according to FileAlignment, as given in struct _IMAGE_OPTIONAL_HEADER64.
So while converting a memory map to a PE+ dump, we need to lay the file out the way FileAlignment dictates.
Loading the dump as it is would give unexpected results, because it is not properly aligned.
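The alignment rule itself is simple rounding. As a minimal sketch, the helper below rounds a size or offset up to the next multiple of an alignment, assuming the alignment is a power of two (which FileAlignment and SectionAlignment are required to be). The name `align_up` is just a label chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With a FileAlignment of 0x200, a section holding 0x1234 bytes of data occupies `align_up(0x1234, 0x200)`, that is 0x1400 bytes, on disk.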
Aligner code is required to fix this problem. We begin by parsing the PE+ file into the standard PE structures:
IMAGE_DOS_HEADER DosHdr = {0};
IMAGE_FILE_HEADER FileHdr = {0};
IMAGE_OPTIONAL_HEADER64 OptHdr = {0};
// Read each structure at its offset
fread(&DosHdr, sizeof(IMAGE_DOS_HEADER), 1, fp);
// e_lfanew points at the "PE\0\0" signature; skip its 4 bytes to reach the file header
fseek(fp, (long)DosHdr.e_lfanew + 4, SEEK_SET);
fread(&FileHdr, sizeof(IMAGE_FILE_HEADER), 1, fp);
fread(&OptHdr, sizeof(IMAGE_OPTIONAL_HEADER64), 1, fp);
Next, traverse and read all the section headers, which directly follow the optional header:
while (iNumSec < FileHdr.NumberOfSections)
{
    fread(&pTail[iNumSec], sizeof(IMAGE_SECTION_HEADER), 1, fp);
    iNumSec++;
}
After that, we copy everything up to the PointerToRawData of the first section:
buffer = (unsigned char*) malloc(pTail[0].PointerToRawData + 1);
fseek(fp, 0, SEEK_SET);
fread(buffer, pTail[0].PointerToRawData, 1, fp); // read everything up to the beginning of the first section
fwrite(buffer, pTail[0].PointerToRawData, 1, out); // and write it out unchanged
free(buffer);
Finally, we rewrite the data in an aligned form. For each section, the code reads SizeOfRawData bytes starting at the section's VirtualAddress in the dump, writes them out, and then seeks to the next section's VirtualAddress for the next chunk.
for (i = 0; i < iNumSec; i++) // start from the first section
{
    buffer = (unsigned char*) malloc(pTail[i].SizeOfRawData + 1);
    fseek(fp, pTail[i].VirtualAddress, SEEK_SET); // in the dump, section data sits at its RVA
    fread(buffer, pTail[i].SizeOfRawData, 1, fp);
    fwrite(buffer, pTail[i].SizeOfRawData, 1, out); // sections are appended back to back
    free(buffer);
}
After fixing the dump, it becomes a valid PE+ file and loads properly in IDA.
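The same conversion can be sketched on in-memory buffers, which makes it easy to test without touching files. Note one deliberate difference from the loop above: instead of appending sections one after another, this version places each section at its own PointerToRawData, so it also copes with section tables whose raw data is not contiguous. The `section_info` struct and the `unmap_image` function are hypothetical names; only the three field names follow IMAGE_SECTION_HEADER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only the section-header fields this sketch needs. */
typedef struct {
    uint32_t VirtualAddress;   /* where the section sits in the memory dump */
    uint32_t SizeOfRawData;    /* how many bytes belong in the file */
    uint32_t PointerToRawData; /* where those bytes belong in the file */
} section_info;

/* Copy headers and sections from a memory-layout dump into file layout.
   out should be zero-initialised by the caller.
   Returns the number of bytes used in out, or 0 if something does not fit. */
static size_t unmap_image(const uint8_t *dump, size_t dump_len,
                          const section_info *sec, size_t nsec,
                          uint8_t *out, size_t out_len)
{
    size_t i, end;
    assert(dump != NULL && sec != NULL && out != NULL);
    if (nsec == 0)
        return 0;
    /* Headers: everything before the first section's raw data. */
    end = sec[0].PointerToRawData;
    if (end > dump_len || end > out_len)
        return 0;
    memcpy(out, dump, end);
    for (i = 0; i < nsec; i++) {
        size_t src = sec[i].VirtualAddress;
        size_t dst = sec[i].PointerToRawData;
        size_t n = sec[i].SizeOfRawData;
        if (src > dump_len || n > dump_len - src ||
            dst > out_len || n > out_len - dst)
            return 0;
        memcpy(out + dst, dump + src, n); /* read at the RVA, write at the file offset */
        if (dst + n > end)
            end = dst + n;
    }
    return end;
}
```

The returned length is the size of the rebuilt file, which can then be written to disk in one call.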