Self-Restoring Binary Deobfuscation with Unicorn and Capstone

This page has been machine-translated from the original page.

This article covers the deobfuscation technique for self-restoring binaries using Unicorn and Capstone, based on the Rev challenge “Singlestep” from Cyber Apocalypse CTF 2025.

Reference: Cyber Apocalypse CTF 2025 Writeup

I had used Unicorn to emulate code once, a long time ago, but had barely touched it since.

Reference: Emulating x86_64 Architecture Shellcode with Unicorn

During the contest, I used gdb-python to forcibly extract the deobfuscated assembly at runtime. Because I did not account for loops, I ended up extracting an enormous amount of code that could not be decompiled.

By contrast, the Unicorn + Capstone method described here deobfuscates the binary much more cleanly and recovers the deobfuscated code in a form that can be decompiled.

Problem Overview
Hooking and Dumping Execution Code with Unicorn and Capstone
Deobfuscating the Challenge Binary
Obtaining the Flag
Summary

Problem Overview

Singlestep

Malakar has locked away a sacred artifact away in his dungeon. He has enchanted the locking mechanism to be self-protecting. Can you embark on a mission to free the artifact back to the people’s hands?

Running the provided ELF binary prompts for some input.

Entering the correct value appears to yield the flag.

Decompiling the binary shows that the main function simply calls a function at address 0x43e0.

Looking at this function, it is clearly using an XOR operation to rewrite the code at 0x43ec and then continuing execution there.

Analyzing this operation in gdb confirms that when the XOR at 0x43e1 is executed, the code at 0x43ec is replaced with what appears to be a function prologue.

Furthermore, the XOR at 0x43f1 applies the same operation to 0x43ec again, returning it to its original encoded form so that the decrypted code does not persist in memory after execution.

Tracing through the rest of the code, it becomes clear that the binary handles every real instruction the same way: a pushfq -> xor -> popfq sequence decodes it, the instruction executes, and another pushfq -> xor -> popfq sequence re-encodes it.
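To make the pattern concrete, here is a minimal pure-Python model of one decode/execute/re-encode cycle, treating memory as a bytearray. The key and instruction bytes are hypothetical values chosen for illustration, not values taken from the challenge binary. The pushfq/popfq pair presumably exists only to keep the xor from clobbering RFLAGS, so it has no counterpart in the model.

```python
def xor_range(mem: bytearray, start: int, key: bytes) -> None:
    """XOR len(key) bytes of mem, beginning at start, with key (in place)."""
    for k, b in enumerate(key):
        mem[start + k] ^= b

# Hypothetical values for illustration only (not taken from the challenge)
PROLOGUE = bytes.fromhex("554889e5")   # push rbp; mov rbp, rsp
KEY = bytes.fromhex("a1b2c3d4")

# "On disk" the instruction is stored XOR-encoded, surrounded by other code
encoded = bytes(p ^ k for p, k in zip(PROLOGUE, KEY))
mem = bytearray(b"\x90" * 4 + encoded + b"\x90" * 4)
OFFSET = 4

xor_range(mem, OFFSET, KEY)            # pushfq; xor; popfq -> decode
decoded = bytes(mem[OFFSET:OFFSET + 4])
# ... the real instruction would execute here ...
xor_range(mem, OFFSET, KEY)            # pushfq; xor; popfq -> re-encode
restored = bytes(mem[OFFSET:OFFSET + 4])

print(decoded == PROLOGUE, restored == encoded)
```

Because XOR with the same key is its own inverse, applying it twice returns the bytes to their encoded form, which is why a static disassembler only ever sees the encoded bytes.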

During the contest I forcibly extracted the deobfuscated assembly using gdb-python, but this time I’ll deobfuscate more elegantly using Unicorn and Capstone.

Hooking and Dumping Execution Code with Unicorn and Capstone

The following code hooks every instruction emulated by Unicorn and prints its disassembly using Capstone.

(Because the binary modifies its own code, the hook output around the XOR instructions that rewrite the code may differ slightly from the program's actual behavior; this is ignored for now.)

from unicorn import *
from unicorn.x86_const import *
from capstone import *

# ============================
# Globals
# ============================

# Initialize Unicorn & Capstone
mu = Uc(UC_ARCH_X86, UC_MODE_64) # mu is initialized as a virtual CPU
md = Cs(CS_ARCH_X86, CS_MODE_64) # md is initialized as an x64 disassembler

call_addrs = [0x43E0]
current_call_addr = 0
previous_call_addr = 0

# Hook function
def hook_code(uc, address, size, user_data):
    global current_call_addr
    global previous_call_addr
    previous_call_addr = current_call_addr
    
    instruction_bytes =  uc.mem_read(address, size)
    for i,instruction in enumerate(md.disasm(instruction_bytes, address)):
        complete_instruction = f"{i} 0x{instruction.address:x}:\t{instruction.mnemonic}\t{instruction.op_str}"
        print(complete_instruction)
        
        if instruction.mnemonic == "call":
            addr = int(instruction.op_str,16)
            if addr < 0x1260:
                print("PLT function called.")
                # uc.emu_stop()
                return

        elif instruction.mnemonic == "ret":
            # uc.emu_stop()
            return

# ============================


# Initialize virtual memory, RSP/RBP
stack_addr = 0x900000
stack_size = 0x100000
mu.mem_map(stack_addr, stack_size)
mu.reg_write(UC_X86_REG_RSP, stack_addr + stack_size - 8 - 0x200)
mu.reg_write(UC_X86_REG_RBP, stack_addr + stack_size - 8)

# Read original binary
with open("singlestep", "rb") as f:
    code = f.read()

# Allocate 0x50000 bytes of virtual memory and map the execution code
code_addr = 0x0
mu.mem_map(code_addr,0x50000)
mu.mem_write(code_addr,code)

# Add hook to Unicorn
# UC_HOOK_CODE invokes the hook immediately before each instruction executes
# hook_code(mu, address, size, user_data)
mu.hook_add(UC_HOOK_CODE, hook_code)

# Run program
while len(call_addrs) > 0:
    try:
        current_call_addr = call_addrs.pop()
        mu.emu_start(current_call_addr,0x900D) # emu_start(begin,end)
    except Exception as e:
        print("Error: %s" % e)
        print("at : %s" % hex(mu.reg_read(UC_X86_REG_RIP)))
        break

Emulator Initialization

The following is an excerpt of the emulator initialization code.

Uc(UC_ARCH_X86, UC_MODE_64) and Cs(CS_ARCH_X86, CS_MODE_64) create a Unicorn emulator and a Capstone disassembler, respectively, both targeting x86-64.

The code then allocates a stack region, initializes the RSP/RBP registers, and finally maps the raw contents of the executable file into memory starting at address 0.

To properly emulate an ELF, you would normally need to reproduce the ELF loader's behavior (library loading, segment alignment, etc.), but since we only need to emulate the XOR-based restoration of the code, mapping the raw ELF file directly is sufficient. This relies on the code's file offsets matching its virtual addresses when the binary is loaded at base 0.
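Before relying on the raw mapping, it may be worth confirming that assumption. The sketch below is a hypothetical helper (not part of the original solver) that parses the ELF64 program headers with only the standard library and reports whether a virtual address lies in a PT_LOAD segment whose file offset equals its virtual address. It assumes a little-endian ELF64 file.

```python
import struct

PT_LOAD = 1

def load_segments(data: bytes):
    """Return (p_offset, p_vaddr, p_filesz, p_memsz) for every PT_LOAD segment
    of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and count
    segments = []
    for k in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + k * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz))
    return segments

def raw_mapping_matches(data: bytes, vaddr: int) -> bool:
    """True if vaddr is file-backed by a PT_LOAD segment whose offset equals its
    vaddr, i.e. mapping the raw file at base 0 puts the right bytes at vaddr."""
    for p_offset, p_vaddr, p_filesz, _memsz in load_segments(data):
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset == p_vaddr
    return False
```

For example, reading singlestep into `data` and calling `raw_mapping_matches(data, 0x43E0)` should return True if the assumption holds for this binary; a False result would mean each PT_LOAD segment has to be written at its own p_vaddr instead.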

# Initialize Unicorn & Capstone
mu = Uc(UC_ARCH_X86, UC_MODE_64) # mu is initialized as a virtual CPU
md = Cs(CS_ARCH_X86, CS_MODE_64) # md is initialized as an x64 disassembler


# Initialize virtual memory, RSP/RBP
stack_addr = 0x900000
stack_size = 0x100000
mu.mem_map(stack_addr, stack_size)
mu.reg_write(UC_X86_REG_RSP, stack_addr + stack_size - 8 - 0x200)
mu.reg_write(UC_X86_REG_RBP, stack_addr + stack_size - 8)


# Read original binary
with open("singlestep", "rb") as f:
    code = f.read()

    
# Allocate 0x50000 bytes of virtual memory and map the execution code
code_addr = 0x0
mu.mem_map(code_addr,0x50000)
mu.mem_write(code_addr,code)

Reference: Programming with C & Python languages – Unicorn – The Ultimate CPU emulator

Hooking Execution Code with Unicorn

The following code registers a UC_HOOK_CODE hook, which Unicorn invokes before each instruction is executed.

Unicorn supports hooking other operations too, such as specific memory accesses.

Reference: alexander-hanel/unicorn-engine-notes

# Add hook to Unicorn
# UC_HOOK_CODE invokes the hook immediately before each instruction executes
# hook_code(mu, address, size, user_data)
mu.hook_add(UC_HOOK_CODE, hook_code)

# Run program
while len(call_addrs) > 0:
    try:
        current_call_addr = call_addrs.pop()
        mu.emu_start(current_call_addr,0x900D) # emu_start(begin,end)
    except Exception as e:
        print("Error: %s" % e)
        print("at : %s" % hex(mu.reg_read(UC_X86_REG_RIP)))
        break

Disassembling Execution Code with Capstone

In the hook function called at each instruction, Capstone is used to disassemble the execution code.

When the hook function is invoked by Unicorn, the instruction address and its size are passed as arguments.

The byte data obtained via uc.mem_read(address, size) is passed to md.disasm(instruction_bytes, address), which returns an iterator containing the disassembly result.

Reference: Disassemble in iteration style – Capstone – The Ultimate Disassembler

# Hook function
def hook_code(uc, address, size, user_data):
    global current_call_addr
    global previous_call_addr
    previous_call_addr = current_call_addr
    
    instruction_bytes =  uc.mem_read(address, size)
    for i,instruction in enumerate(md.disasm(instruction_bytes, address)):
        complete_instruction = f"{i} 0x{instruction.address:x}:\t{instruction.mnemonic}\t{instruction.op_str}"
        print(complete_instruction)
        
        if instruction.mnemonic == "call":
            addr = int(instruction.op_str,16)
            if addr < 0x1260:
                print("PLT function called.")
                # uc.emu_stop()
                return

        elif instruction.mnemonic == "ret":
            # uc.emu_stop()
            return

Running this dumps the disassembly of each instruction as the hook sees it.

Deobfuscating the Challenge Binary

To deobfuscate the challenge binary, two main operations are needed:

Replace every instruction in the executable code that is not a real (deobfuscated) instruction with NOPs.
Replace the obfuscated (encoded) bytes with the deobfuscated instructions.
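Both operations amount to patching a copy of the file. The following emulator-independent sketch applies them from a recorded trace. The trace format (address, live bytes, is_real) is a hypothetical structure for illustration; the solver below does the same patching directly inside its hook.

```python
NOP = 0x90  # x86 single-byte NOP

def apply_patches(original: bytes, trace) -> bytearray:
    """Build a patched copy of the file from a recorded trace.

    trace: iterable of (address, live_bytes, is_real), where live_bytes are the
    instruction bytes read from emulated memory at the moment they executed.
    """
    out = bytearray(original)
    real_addrs = set()
    for address, live_bytes, is_real in trace:
        end = address + len(live_bytes)
        if is_real:
            out[address:end] = live_bytes                     # keep the decoded instruction
            real_addrs.add(address)
        elif address not in real_addrs:
            out[address:end] = bytes([NOP]) * len(live_bytes)  # obfuscation glue -> NOPs
    return out

# Hypothetical trace; the xor entry uses placeholder bytes, not a real encoding
original = bytes(range(16))
trace = [
    (0, b"\x9c", False),          # pushfq
    (1, b"\xaa\xbb\xcc", False),  # decoding xor (placeholder bytes)
    (4, b"\x9d", False),          # popfq
    (5, b"\x55", True),           # real instruction: push rbp
    (6, b"\x9c", False),          # pushfq of the re-encoding sequence
]
print(apply_patches(original, trace).hex())
```

Letting real instructions take precedence means that revisiting one of their addresses later cannot NOP it out again; bytes that were never traced are left untouched.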

The following challenges arise for this binary:

A condition must be defined to clearly distinguish the actual deobfuscated instructions from all other instructions.
Library function calls and similar must be ignored so emulation can continue.
The deobfuscated code itself does not need to be executed (input validation, branching, etc.).

With the above in mind, I wrote the following solver.

Running it writes a file named deobfuscated that contains the restored code.

from unicorn import *
from unicorn.x86_const import *
from capstone import *
import copy

# ============================
# Globals
# ============================

# Initialize Unicorn & Capstone
mu = Uc(UC_ARCH_X86, UC_MODE_64) # mu is initialized as a virtual CPU
md = Cs(CS_ARCH_X86, CS_MODE_64) # md is initialized as an x64 disassembler

# Initial values
with open('singlestep', 'rb') as f:
    code = f.read()

code = bytearray(code)
deobfuscated_code = copy.deepcopy(code)

call_addrs = [0x43E0]
deobfuscated_addrs = []
popfq_flag = False
current_call_addr = 0
previous_call_addr = 0

# Hook function
def hook_code(uc, address, size, user_data):
    global current_call_addr
    global previous_call_addr
    global popfq_flag
    global deobfuscated_code
    global deobfuscated_addrs
    previous_call_addr = current_call_addr
    
    instruction_bytes =  uc.mem_read(address, size)

    for i,instruction in enumerate(md.disasm(instruction_bytes, address)):
        complete_instruction = f"{i} 0x{instruction.address:x}:\t{instruction.mnemonic}\t{instruction.op_str}"

    if instruction.mnemonic == "popfq":
        popfq_flag = True
        deobfuscated_code[address:address+size] = b"\x90" * size
    
    elif instruction.mnemonic == "pushfq":
        popfq_flag = False
        deobfuscated_code[address:address+size] = b"\x90" * size
    
    elif instruction.mnemonic == "ret":
        popfq_flag = False
        print("ret.")
        # print(complete_instruction)
        uc.emu_stop()
        return
    
    else:
        if popfq_flag and address not in deobfuscated_addrs:
            popfq_flag = False

            # print(complete_instruction)
            deobfuscated_code[address:address+size] = instruction_bytes
            deobfuscated_addrs.append(address)

            # Skip the deobfuscated instruction instead of executing it
            uc.reg_write(UC_X86_REG_RIP, address + size)
            
            if instruction.mnemonic == "call":
                addr = int(instruction.op_str,16)
                if addr < 0x1260:
                    print("PLT function called.")
                    return
                else:
                    call_addrs.append(addr)

        else:
            deobfuscated_code[address:address+size] = b"\x90" * size

    return
    
# ============================


# Initialize virtual memory, RSP/RBP
stack_addr = 0x900000
stack_size = 0x100000
mu.mem_map(stack_addr, stack_size)
mu.reg_write(UC_X86_REG_RSP, stack_addr + stack_size - 8 - 0x200)
mu.reg_write(UC_X86_REG_RBP, stack_addr + stack_size - 8)

# Read original binary
with open("singlestep", "rb") as f:
    code = f.read()

# Allocate 0x50000 bytes of virtual memory and map the execution code
code_addr = 0x0
mu.mem_map(code_addr,0x50000)
mu.mem_write(code_addr,code)

# Add hook to Unicorn
# UC_HOOK_CODE invokes the hook immediately before each instruction executes
# hook_code(mu, address, size, user_data)
mu.hook_add(UC_HOOK_CODE, hook_code)

# Run program
while len(call_addrs) > 0:
    try:
        current_call_addr = call_addrs.pop()
        if current_call_addr not in deobfuscated_addrs:
            mu.emu_start(current_call_addr,0x900D) # emu_start(begin,end)
            deobfuscated_addrs.append(current_call_addr)
    except Exception as e:
        print("Error: %s" % e)
        print("at : %s" % hex(mu.reg_read(UC_X86_REG_RIP)))
        break

with open('deobfuscated', 'wb') as f:
    f.write(deobfuscated_code)
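The outer while loop is essentially a work-list traversal of the call graph: each function is traced once, and every non-PLT call target found while tracing it is queued for a later emu_start run. The sketch below isolates that idea; trace_function is a hypothetical stand-in for one emulation run, and the call graph is made up.

```python
def trace_all(entry, trace_function):
    """Trace each reachable function once, starting from entry.

    trace_function(addr) stands in for one mu.emu_start(...) run and returns the
    non-PLT call targets discovered while tracing the function at addr.
    """
    pending = [entry]
    traced = set()
    order = []
    while pending:
        addr = pending.pop()      # LIFO, like call_addrs.pop() in the solver
        if addr in traced:
            continue
        traced.add(addr)          # mark before tracing
        order.append(addr)
        pending.extend(trace_function(addr))
    return order

# Hypothetical call graph for illustration only
CALLS = {0x43E0: [0x4A00, 0x5100], 0x4A00: [0x5100], 0x5100: [0x43E0]}
print([hex(a) for a in trace_all(0x43E0, lambda a: CALLS.get(a, []))])
```

This version marks an address as traced just before tracing it, whereas the solver appends it to deobfuscated_addrs right after emu_start returns; both have the same effect, because nothing is popped in between.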

Below is a summary of the key points.

Defining Conditions to Clearly Distinguish Deobfuscated Instructions from Others

As confirmed earlier, the binary decodes, executes, and re-encodes each real instruction in a block that starts with pushfq -> xor and ends with xor -> popfq.

Therefore, we only need to capture the first instruction after each pushfq -> xor -> popfq sequence, unless that instruction is itself a pushfq (the start of the next block). Every other traced instruction is replaced with NOPs.
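The same condition can be expressed as a small state machine over (address, mnemonic) pairs, independent of Unicorn. This sketch mirrors the popfq_flag logic in hook_code but leaves out the call and ret handling; the trace in the example is hypothetical.

```python
def classify_trace(trace):
    """Return the addresses of real instructions from (address, mnemonic) pairs.

    The first instruction after a popfq is real unless it is a pushfq;
    each address is recorded only once.
    """
    real = []
    seen = set()
    after_popfq = False
    for address, mnemonic in trace:
        if mnemonic == "popfq":
            after_popfq = True
        elif mnemonic == "pushfq":
            after_popfq = False
        elif after_popfq and address not in seen:
            after_popfq = False
            seen.add(address)
            real.append(address)
        # anything else (the decoding/re-encoding xor, etc.) is obfuscation glue
    return real

# Hypothetical trace of two obfuscated instructions
trace = [
    (0x10, "pushfq"), (0x11, "xor"), (0x18, "popfq"),  # decode
    (0x19, "push"),                                     # real instruction
    (0x1a, "pushfq"), (0x1b, "xor"), (0x22, "popfq"),  # re-encode
    (0x23, "pushfq"), (0x24, "xor"), (0x2b, "popfq"),  # decode the next one
    (0x2e, "mov"),                                      # real instruction
    (0x31, "pushfq"),                                   # re-encode begins
]
print([hex(a) for a in classify_trace(trace)])
```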

Library Function Calls Must Be Ignored to Continue Emulation

In this case, Unicorn emulates one function at a time. When a real call instruction is encountered, it is skipped in the current trace. Calls into the PLT (targets below 0x1260) are ignored; any other target is pushed onto call_addrs so that a later mu.emu_start run traces that function separately.

# Skip the deobfuscated instruction instead of executing it
uc.reg_write(UC_X86_REG_RIP, address + size)

if instruction.mnemonic == "call":
    addr = int(instruction.op_str,16)
    if addr < 0x1260:
        print("PLT function called.")
        return
    else:
        call_addrs.append(addr)
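One caveat: int(instruction.op_str, 16) only works for direct calls, whose operand Capstone prints as an address such as 0x1130. For an indirect call like `call rax` or `call qword ptr [rip + 0x2fe2]` it raises ValueError inside the hook. The challenge binary may not contain such calls, but a more defensive variant might look like this; the helper names are hypothetical, and 0x1260 is the PLT boundary used above.

```python
import re

PLT_END = 0x1260  # boundary used above: call targets below it are PLT stubs

def direct_call_target(op_str: str):
    """Return the target of a direct call (e.g. '0x1130') as an int, else None."""
    op = op_str.strip()
    if re.fullmatch(r"0x[0-9a-fA-F]+", op):
        return int(op, 16)
    if op.isdigit():        # accept a plain decimal operand as well, just in case
        return int(op)
    return None             # register or memory operand: indirect call

def classify_call(op_str: str, plt_end: int = PLT_END):
    """Return ('plt' | 'trace' | 'indirect', target_or_None)."""
    target = direct_call_target(op_str)
    if target is None:
        return "indirect", None
    if target < plt_end:
        return "plt", target
    return "trace", target
```

Inside hook_code, the "indirect" case could simply be logged and skipped, since its target cannot be known without emulating the register or memory load.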

The Deobfuscated Code Itself Does Not Need to Be Executed

Since the deobfuscated instructions do not need to actually run, the hook overwrites RIP with the address of the next instruction (address + size), so Unicorn skips them.

# Skip the deobfuscated instruction instead of executing it
uc.reg_write(UC_X86_REG_RIP, address + size)

Obtaining the Flag

Running the deobfuscated binary produced by the above code shows that it behaves the same as the original program.

Analyzing the restored binary shows that unnecessary parts have been replaced with NOPs, leaving only the deobfuscated execution code.

The decompiler is also able to analyze the deobfuscated execution code.

Looking at the code, it first initializes a 4×4 memory region to create the following 2D array:

var_278 = [[ 88, -17,  19, -57],
           [ 45,  -9,  10, -29],
           [-56,  11, -12,  36],
           [-40,   8,  -9,  26]]

It also verifies that the input is 19 characters long and follows a hyphen-delimited format like AAAA-BBBB-CCCC-DDDD.

Finally, it multiplies the 4×4 array initialized at the beginning by a 4×4 array derived from the input and checks whether the product equals the identity matrix. In other words, the input must encode the inverse of var_278.
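As a sanity check, the validation can be re-implemented in plain Python. This is a hypothetical reconstruction, not decompiled code: the per-character mapping ord(c) - ord("A") - i*j is inferred by inverting the recovery script below, and the format check only enforces the 4-4-4-4 layout. The multiplication order does not matter here, because for square matrices A·M = I holds exactly when M·A = I.

```python
import re

var_278 = [[ 88, -17,  19, -57],
           [ 45,  -9,  10, -29],
           [-56,  11, -12,  36],
           [-40,   8,  -9,  26]]

def to_matrix(password: str):
    """Hypothetical mapping inferred from the recovery script:
    the character at row i, column j becomes ord(c) - ord('A') - i*j."""
    groups = password.split("-")
    return [[ord(c) - ord("A") - i * j for j, c in enumerate(group)]
            for i, group in enumerate(groups)]

def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def check_password(password: str) -> bool:
    # 19 characters in four hyphen-separated groups of four (4*4 + 3 = 19)
    if len(password) != 19 or not re.fullmatch(r"[^-]{4}(?:-[^-]{4}){3}", password):
        return False
    identity = [[int(i == j) for j in range(4)] for i in range(4)]
    return matmul(var_278, to_matrix(password)) == identity

print(check_password("BFCF-EJJL-CKKL-BLJQ"))  # True
```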

The inverse of the initialized array is:

[[1, 5, 2, 5],
 [4, 8, 7, 8],
 [2, 8, 6, 5],
 [1, 8, 3, 7]]
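One way to obtain this inverse exactly is Gauss-Jordan elimination over fractions.Fraction, which avoids floating-point rounding. Because var_278 is an integer matrix with an integer inverse, its determinant must be ±1, so every entry comes out as a whole number. A minimal stdlib-only sketch:

```python
from fractions import Fraction

def inverse(matrix):
    """Exact inverse of a square matrix via Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity: [A | I]
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick a row at or below col with a non-zero entry in this column
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

var_278 = [[ 88, -17,  19, -57],
           [ 45,  -9,  10, -29],
           [-56,  11, -12,  36],
           [-40,   8,  -9,  26]]

inv = inverse(var_278)
inv_int = [[int(x) for x in row] for row in inv]  # every entry is an integer here
print(inv_int)  # [[1, 5, 2, 5], [4, 8, 7, 8], [2, 8, 6, 5], [1, 8, 3, 7]]
```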

Using the following script to recover the ASCII string from the inverse matrix, we can identify that the password required to obtain the flag is BFCF-EJJL-CKKL-BLJQ.

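secret = ""
# Inverse of var_278
array = [[1, 5, 2, 5],
         [4, 8, 7, 8],
         [2, 8, 6, 5],
         [1, 8, 3, 7]]

for i in range(4):
    for j in range(4):
        n = array[i][j]
        # Turn each matrix entry back into a letter: entry + i*j, offset from 'A'
        secret += chr(n + i*j + ord("A"))

    # Separate the four-character groups with hyphens
    secret += "-"

# Drop the trailing hyphen
print(secret[:-1])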

Summary

During the contest I was forcibly extracting the deobfuscated code with gdb, but the Unicorn + Capstone approach was very enlightening.

That said, both Unicorn and Capstone have relatively limited documentation, so I feel like it could take some time to become proficient with them.

For simple execution-time hooking alone, it might also be worth trying Frida.

Published Mar 30, 2025

Aspiring Reverse Engineer and CTF Player (Team: 0nePadding). Passionate about WinDbg and Anti-Virus internals. OSCP / CISSP. Working at Microsoft Japan, but all views expressed are my own. かしわば (@kash1064) on Twitter