
Hook Scanning for Application-Layer Processes

Application-layer hooks are a programming technique that lets a program define or extend its behavior by running specific code when a particular event occurs. Third-party applications commonly use hooks to implement extensions. Because a hook modifies the code in memory while the file on disk keeps the original bytes, hooks can be detected by scanning. The principle is to disassemble the code section of the PE file on disk and compare it with the disassembly of the same section in memory. A difference that is not explained by normal loader activity, such as address relocation, indicates that the program has been hooked.
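The comparison principle can be sketched at the byte level, before any disassembly is involved. The example below is only an illustration: the helper name diff_ranges and the two sample byte strings are assumptions and do not come from the scanner developed in this article.

```python
def diff_ranges(disk_bytes, memory_bytes):
    """Return (offset, length) pairs, one for each run of differing bytes."""
    ranges = []
    run_start = None
    limit = min(len(disk_bytes), len(memory_bytes))
    for offset in range(limit):
        if disk_bytes[offset] != memory_bytes[offset]:
            # Open a new run at the first differing byte
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            # The bytes match again, so close the current run
            ranges.append((run_start, offset - run_start))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, limit - run_start))
    return ranges

# Sample code on disk: push ebp; mov ebp, esp; push 0
disk = bytes.fromhex("558bec6a00")
# Same code in memory with 'push 0' (6a 00) overwritten by two NOPs (90 90)
memory = bytes.fromhex("558bec9090")

print(diff_ranges(disk, memory))  # [(3, 2)]
```

A raw byte comparison like this also flags every relocated address, which is one reason the article compares disassembled instructions instead.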

This case uses the Capstone engine to implement hook scanning. Capstone was chosen because it provides Python bindings and therefore interacts easily with the LyScript plugin. Capstone is also widely used in reverse engineering, vulnerability analysis, and malware analysis, and many analysis tools build on it.

The main features of the Capstone engine include:

  • Multiple instruction sets: supports x86, ARM, MIPS, PowerPC, and others, and runs on different platforms.
  • Lightweight and efficient: written in C, with concise code and fast disassembly.
  • Easy to use: provides simple APIs and documentation, with bindings for languages such as Python, Ruby, and Java.
  • Customizable: provides configurable options to meet different needs.

Installing Capstone is straightforward: run pip install capstone. To disassemble, pass the raw machine-code bytes and a starting address to md.disasm(HexCode, 0), where md is a Cs instance created for the target architecture and mode.

Read the disk machine code and disassemble it

In the following code, the pefile library first reads the PE file and obtains its ImageBase, along with the VirtualAddress, Misc_VirtualSize, and PointerToRawData fields of the .text section header. The code then calculates the starting address StartVA and ending address StopVA of the .text section and reads the section's raw data from the file at PointerToRawData. Finally, the data is disassembled with Capstone and returned.

python
from capstone import *
import pefile

# Disassembling disk files
def DisassemblyFile(FilePath):
    ref_list = []

    # Load the PE file
    pe = pefile.PE(FilePath)

    # Get the base address of the program
    ImageBase = pe.OPTIONAL_HEADER.ImageBase

    # Find the .text section (section names are NUL-padded to 8 bytes)
    for item in pe.sections:
        if item.Name.rstrip(b'\x00') == b'.text':
            VirtualAddress = item.VirtualAddress
            VirtualSize = item.Misc_VirtualSize
            ActualOffset = item.PointerToRawData
            break
    else:
        # Fail clearly instead of hitting an undefined variable later
        raise ValueError("no .text section found in " + FilePath)

    # Calculate start and stop virtual addresses of the .text section
    StartVA = ImageBase + VirtualAddress
    StopVA = ImageBase + VirtualAddress + VirtualSize

    # Open the file and read the binary data of the .text section
    with open(FilePath, "rb") as fp:
        fp.seek(ActualOffset)
        HexCode = fp.read(VirtualSize)

    # Initialize Capstone disassembler for x86 (32-bit)
    md = Cs(CS_ARCH_X86, CS_MODE_32)

    # Disassemble the binary code
    for item in md.disasm(HexCode, 0):
        # Calculate the address of each instruction
        addr = hex(int(StartVA) + item.address)

        # Construct a dictionary with address and assembly instruction
        dic = {"Address": str(addr), "Assembly": item.mnemonic + " " + item.op_str}

        # Append the dictionary to the reference list
        ref_list.append(dic)

    return ref_list

if __name__ == "__main__":
    # Example usage of DisassemblyFile function
    dasm = DisassemblyFile("C://Win32Project.exe")

    # Print out disassembled instructions
    for index in dasm:
        print("{} | {}".format(index.get("Address"), index.get("Assembly")))

Running the above code prints the disassembly of the entire .text section of Win32Project.exe. Part of the output is shown below:

python
0x401000 | push ebp
0x401001 | mov ebp, esp
0x401003 | sub esp, 0x1c
0x401006 | push esi
0x401007 | mov esi, dword ptr [0x4020e0]
0x40100d | push edi
0x40100e | mov edi, dword ptr [ebp + 8]
0x401011 | push 0x64
0x401013 | push 0x403438
0x401018 | push 0x67

Read memory machine code and disassemble

Next, the machine code is read from the memory of the running process and disassembled with Capstone. The DisassemblyMemory function below implements this. The script first calls get_memory_localbase to obtain the base address of the program's code in memory and get_memory_localsize to obtain its length. DisassemblyMemory then uses get_memory_byte to read the machine code one byte at a time and calls md.disasm to disassemble it.

python
from x32dbg import Debugger
from capstone import *

# Disassemble a range of process memory
def DisassemblyMemory(address, offset, length):
    ref_list = []

    # Collect the bytes read from the debugged process
    ref_memory_list = bytearray()

    # Read the offsets in [offset, length) one byte at a time
    for index in range(offset, length):
        char = dbg.get_memory_byte(address + index)
        ref_memory_list.append(char)

    # Disassemble from offset 0 so that item.address is relative to the first byte read
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for item in md.disasm(bytes(ref_memory_list), 0):
        addr = hex(int(address) + offset + item.address)
        ref_list.append({"Address": str(addr), "Assembly": item.mnemonic + " " + item.op_str})
    return ref_list

if __name__ == "__main__":
    dbg = Debugger(address="127.0.0.1",port=6589)
    if False == dbg.connect():
        exit()

    # Read the .text base address from memory
    pe_base = dbg.get_memory_localbase()

    # Read the length of .text in memory
    pe_size = dbg.get_memory_localsize()

    # Get memory disassembly code
    dasm = DisassemblyMemory(pe_base,0,pe_size)

    # Print out disassembled instructions
    for index in dasm:
        print("{} | {}".format(index.get("Address"), index.get("Assembly")))

    dbg.close()

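Running the above code prints the disassembly of Win32Project.exe as it exists in memory. Comparing the two listings shows that the instruction sequences match. Only the addresses differ, because the module is loaded at a different base address in memory and operands that hold absolute addresses (such as 0xd620e0) are relocated accordingly. The addresses in this listing are also one byte higher than the true instruction addresses (0xd61001 rather than 0xd61000), which is consistent with a starting offset of 1 having been passed to md.disasm when the listing was captured: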

python
0xd61001 | push ebp
0xd61002 | mov ebp, esp
0xd61004 | sub esp, 0x1c
0xd61007 | push esi
0xd61008 | mov esi, dword ptr [0xd620e0]
0xd6100e | push edi
0xd6100f | mov edi, dword ptr [ebp + 8]
0xd61012 | push 0x64
0xd61014 | push 0xd63438
0xd61019 | push 0x67
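Since the two listings differ only in where the section is based, subtracting each listing's base address gives a section-relative offset that identifies the same instruction in both. The sketch below joins two listings on that offset instead of on list position, so a patch that changes instruction lengths does not shift every later comparison. The helper names index_by_offset and compare_by_offset and the short sample listings are hypothetical; only the dictionary shape with "Address" and "Assembly" keys follows the article's code.

```python
def index_by_offset(listing, base):
    """Map section-relative offset -> assembly text for one listing."""
    return {int(entry["Address"], 16) - base: entry["Assembly"] for entry in listing}

def compare_by_offset(dasm_memory, mem_base, dasm_file, file_base):
    """Return (offset, memory_asm, file_asm) for every memory instruction
    whose text differs from the disk instruction at the same offset.
    file_asm is None when no disk instruction starts at that offset."""
    file_index = index_by_offset(dasm_file, file_base)
    mem_index = index_by_offset(dasm_memory, mem_base)
    mismatches = []
    for offset in sorted(mem_index):
        file_asm = file_index.get(offset)
        if file_asm != mem_index[offset]:
            mismatches.append((offset, mem_index[offset], file_asm))
    return mismatches

# Made-up listings: the two-byte 'push 0' on disk was replaced by two NOPs in memory
sample_file = [
    {"Address": "0x401000", "Assembly": "push ebp"},
    {"Address": "0x401001", "Assembly": "mov ebp, esp"},
    {"Address": "0x401003", "Assembly": "push 0"},
    {"Address": "0x401005", "Assembly": "ret"},
]
sample_memory = [
    {"Address": "0xd61000", "Assembly": "push ebp"},
    {"Address": "0xd61001", "Assembly": "mov ebp, esp"},
    {"Address": "0xd61003", "Assembly": "nop"},
    {"Address": "0xd61004", "Assembly": "nop"},
    {"Address": "0xd61005", "Assembly": "ret"},
]

mismatches = compare_by_offset(sample_memory, 0xD61000, sample_file, 0x401000)
for offset, mem_asm, file_asm in mismatches:
    print(hex(offset), "|", mem_asm, "|", file_asm)
```

In this sample an index-based comparison would pair the second nop with ret and report a spurious mismatch, while the offset-based join reports only the two patched offsets.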

Comparing disk and memory disassembly

Finally, comparing the memory and disk listings reveals which locations have been hooked. Before comparing, the addresses in memory must be made consistent with those on disk, since the module is usually loaded at a different base address. Two functions are provided: scan_for_hooks_all reports every instruction that differs, while scan_for_hooks reports only the differences that are not explained by relocation.

python
from x32dbg import Debugger
from capstone import *
import pefile

# Disassemble a range of process memory
def DisassemblyMemory(address, offset, length):
    ref_list = []

    # Collect the bytes read from the debugged process
    ref_memory_list = bytearray()

    # Read the offsets in [offset, length) one byte at a time
    for index in range(offset, length):
        char = dbg.get_memory_byte(address + index)
        ref_memory_list.append(char)

    # Disassemble from offset 0 so that item.address is relative to the first byte read
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for item in md.disasm(bytes(ref_memory_list), 0):
        addr = hex(int(address) + offset + item.address)
        ref_list.append({"Address": str(addr), "Assembly": item.mnemonic + " " + item.op_str})
    return ref_list

# Disassembling disk files
def DisassemblyFile(FilePath):
    ref_list = []

    # Load the PE file
    pe = pefile.PE(FilePath)

    # Get the base address of the program
    ImageBase = pe.OPTIONAL_HEADER.ImageBase

    # Find the .text section (section names are NUL-padded to 8 bytes)
    for item in pe.sections:
        if item.Name.rstrip(b'\x00') == b'.text':
            VirtualAddress = item.VirtualAddress
            VirtualSize = item.Misc_VirtualSize
            ActualOffset = item.PointerToRawData
            break
    else:
        # Fail clearly instead of hitting an undefined variable later
        raise ValueError("no .text section found in " + FilePath)

    # Calculate start and stop virtual addresses of the .text section
    StartVA = ImageBase + VirtualAddress
    StopVA = ImageBase + VirtualAddress + VirtualSize

    # Open the file and read the binary data of the .text section
    with open(FilePath, "rb") as fp:
        fp.seek(ActualOffset)
        HexCode = fp.read(VirtualSize)

    # Initialize Capstone disassembler for x86 (32-bit)
    md = Cs(CS_ARCH_X86, CS_MODE_32)

    # Disassemble the binary code
    for item in md.disasm(HexCode, 0):
        # Calculate the address of each instruction
        addr = hex(int(StartVA) + item.address)

        # Construct a dictionary with address and assembly instruction
        dic = {"Address": str(addr), "Assembly": item.mnemonic + " " + item.op_str}

        # Append the dictionary to the reference list
        ref_list.append(dic)

    return ref_list

# Report every instruction whose text differs, including relocated operands
def scan_for_hooks_all(dasm_memory, dasm_file, mem_base, file_base):
    hooks_found = False
    for index in range(min(len(dasm_memory), len(dasm_file))):
        if dasm_memory[index]["Assembly"] != dasm_file[index]["Assembly"]:
            print("Hook found at address: {}".format(dasm_memory[index]["Address"]))
            print("Memory disassembly: {}".format(dasm_memory[index]["Assembly"]))
            print("File disassembly:   {}".format(dasm_file[index]["Assembly"]))
            hooks_found = True
    return hooks_found

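import re  # needed only by the relocation-aware comparison below

# Matches hexadecimal literals such as 0xd620e0 in Capstone operand text
HEX_LITERAL = re.compile(r"0x[0-9a-fA-F]+")

# Return True when two instructions differ only by relocated addresses
def same_after_rebase(mem_asm, file_asm, delta):
    # Mnemonic, registers, and punctuation must match exactly
    if HEX_LITERAL.sub("#", mem_asm) != HEX_LITERAL.sub("#", file_asm):
        return False
    mem_vals = [int(v, 16) for v in HEX_LITERAL.findall(mem_asm)]
    file_vals = [int(v, 16) for v in HEX_LITERAL.findall(file_asm)]
    # Each literal must be unchanged or shifted by exactly the load delta
    return all(m == f or m - f == delta for m, f in zip(mem_vals, file_vals))

# Scan for hooks, ignoring differences that relocation explains
def scan_for_hooks(dasm_memory, dasm_file, mem_base, file_base):
    hooks_found = False
    # Distance between the section's address in memory and its address on disk
    delta = mem_base - file_base
    for index in range(min(len(dasm_memory), len(dasm_file))):
        mem_asm = dasm_memory[index]["Assembly"]
        file_asm = dasm_file[index]["Assembly"]

        # Skip identical instructions and operands that were merely relocated
        if same_after_rebase(mem_asm, file_asm, delta):
            continue

        print("Hook found at address: {}".format(dasm_memory[index]["Address"]))
        print("Memory disassembly: {}".format(mem_asm))
        print("File disassembly:   {}".format(file_asm))
        hooks_found = True

    return hooks_found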

if __name__ == "__main__":
    dbg = Debugger(address="127.0.0.1",port=6589)
    if False == dbg.connect():
        exit()

    # Read the .text base address from memory
    pe_base = dbg.get_memory_localbase()

    # Read the length of .text in memory
    pe_size = dbg.get_memory_localsize()

    # Get memory disassembly code
    dasm_memory = DisassemblyMemory(pe_base, 0, pe_size)

    # Example usage of DisassemblyFile function
    dasm_file = DisassemblyFile("C://Win32Project.exe")

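    # Base address of .text on disk: the address of the first disassembled instruction
    file_base = int(dasm_file[0]["Address"],16)

    # Report every difference; call scan_for_hooks instead to ignore relocated operands
    hooks_found = scan_for_hooks_all(dasm_memory, dasm_file, pe_base, file_base)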
    if not hooks_found:
        print("No hooks found.")

    dbg.close()

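First, run scan_for_hooks_all, which prints every instruction whose text differs, along with its memory address. Because the base addresses on disk and in memory differ, most of the entries it reports are relocated operands rather than hooks, as shown below. An entry such as call 0xf1 versus call 0xf0 is likewise not a hook: Capstone computes relative branch targets from the starting offset passed to md.disasm, so a one-byte difference appears when the two buffers are disassembled from different starting offsets.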

python
Hook found at address: 0xd61008
Memory disassembly: mov esi, dword ptr [0xd620e0]
File disassembly:   mov esi, dword ptr [0x4020e0]

Hook found at address: 0xd61014
Memory disassembly: push 0xd63438
File disassembly:   push 0x403438

Hook found at address: 0xd61020
Memory disassembly: push 0xd63370
File disassembly:   push 0x403370

Hook found at address: 0xd6102c
Memory disassembly: call 0xf1
File disassembly:   call 0xf0

Hook found at address: 0xd61036
Memory disassembly: nop
File disassembly:   push 0

The scan_for_hooks function filters out the differences caused by relocated addresses and reports only genuinely modified instructions, as shown below:

python
Hook found at address: 0xd61037
Memory disassembly: nop
File disassembly:   push 0
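Once a modified instruction is reported, it is often useful to locate the original bytes in the file. The sketch below shows the arithmetic under stated assumptions: the helper names are hypothetical, 0xd61000 is assumed to be the .text base in memory, 0x401000 is ImageBase plus VirtualAddress on disk, and the PointerToRawData value of 0x400 is made up for illustration.

```python
def memory_to_disk_va(mem_addr, mem_section_base, disk_section_va):
    """Translate an address in the loaded section to the address the
    same byte has when the image sits at its preferred base."""
    return mem_addr - mem_section_base + disk_section_va

def memory_to_file_offset(mem_addr, mem_section_base, pointer_to_raw_data):
    """Translate an address in the loaded section to an offset in the PE file."""
    return mem_addr - mem_section_base + pointer_to_raw_data

mem_section_base = 0xD61000    # assumed .text base in memory
disk_section_va = 0x401000     # ImageBase + VirtualAddress on disk
pointer_to_raw_data = 0x400    # hypothetical PointerToRawData of .text
hook_addr = 0xD61036           # address taken from a scan report

disk_va = memory_to_disk_va(hook_addr, mem_section_base, disk_section_va)
file_offset = memory_to_file_offset(hook_addr, mem_section_base, pointer_to_raw_data)

print(hex(disk_va))      # 0x401036
print(hex(file_offset))  # 0x436
```

Seeking to file_offset in the executable and reading a few bytes gives the unmodified machine code, which can then be compared with, or written back over, the patched bytes in memory.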