
Idek CTF 2022 Writeup

This page has been machine-translated from the original page.

I participated in Idek CTF 2022, which was held in January 2023.

Despite the contest being held in 2023, the name remained “2022.”

I dropped out early after getting stuck on the very first problem and finished with zero solves, so I decided to write up the challenges I couldn't solve as a review exercise.


Polyglot

This challenge provided a small binary named polyglot, and the goal was to recover the flag embedded inside it in two parts.

The word “polyglot” means “a person who speaks multiple languages,” but in programming it usually refers to code that is valid in several languages at once, so that it can be run by more than one interpreter or compiler.

Reference: Polyglot Programming

The file command reported the file as “DOS executable (COM),” so I suspected it might be a polyglot source file compiled as a DOS binary.

$ file polyglot
polyglot: DOS executable (COM)

However, I couldn't run it even with emulators such as DOSBox, and even after some research I couldn't figure out what the file actually was, so I gave up and moved on.

Reading the Binary

Examining the binary’s hex dump produces the following output.

$ xxd polyglot
00000000: eb7e 9090 0002 0010 0370 0091 0200 80d2  .~.......p......
00000010: ff83 00d1 0168 6238 6468 6238 4204 0091  .....hb8dhb8B...
00000020: 2100 044a e16b 2238 5f70 00f1 41ff ff54  !..J.k"8_p..A..T
00000030: 0808 80d2 2000 80d2 e107 0091 8203 80d2  .... ...........
00000040: 0100 00d4 afbc f06b 0482 05a4 56b6 1648  .......k....V..H
00000050: c093 ae51 788f b5b8 4e31 b5ed 9fa5 b3a0  ...Qx...N1......
00000060: c6d8 9500 7fdd 5af3 3ecf 497d f0cc c365  ......Z.>.I}...e
00000070: 16d6 ea8c 3c52 ddd8 c0c9 82cb 4bfd d484  ....<R......K...
00000080: e88b 0100 0066 0f6f 0514 0200 0049 89d2  .....f.o.....I..
00000090: 31c9 4531 c00f 1147 0266 0f6f 0510 0200  1.E1...G.f.o....
000000a0: 000f 1147 1266 0f6f 0514 0200 000f 1147  ...G.f.o.......G
000000b0: 2266 0f6f 0518 0200 000f 1147 3266 0f6f  "f.o.......G2f.o
000000c0: 051c 0200 000f 1147 4266 0f6f 0520 0200  .......GBf.o. ..
000000d0: 000f 1147 5266 0f6f 0524 0200 000f 1147  ...GRf.o.$.....G
000000e0: 6266 0f6f 0528 0200 000f 1147 7266 0f6f  bf.o.(.....Grf.o
000000f0: 052c 0200 000f 1187 8200 0000 660f 6f05  .,..........f.o.
00000100: 2d02 0000 0f11 8792 0000 0066 0f6f 052e  -..........f.o..
00000110: 0200 000f 1187 a200 0000 660f 6f05 2f02  ..........f.o./.
00000120: 0000 0f11 87b2 0000 0066 0f6f 0530 0200  .........f.o.0..
00000130: 000f 1187 c200 0000 660f 6f05 3102 0000  ........f.o.1...
00000140: 0f11 87d2 0000 0066 0f6f 0532 0200 000f  .......f.o.2....
00000150: 1187 e200 0000 660f 6f05 3302 0000 0f11  ......f.o.3.....
00000160: 87f2 0000 0066 c707 0000 4889 c831 d244  .....f....H..1.D
00000170: 0fb6 4c0f 0249 f7f2 0fb6 0416 4401 c841  ..L..I......D..A
00000180: 01c0 410f b6c0 0fb6 5407 0288 540f 0248  ..A.....T...T..H
00000190: 83c1 0144 884c 0702 4881 f900 0100 0075  ...D.L..H......u
000001a0: c9c3 0fb7 0741 5449 89fa 5553 0fb6 dc48  .....ATI..US...H
000001b0: 85d2 7453 4189 c049 89d3 89c5 4883 c702  ..tSA..I....H...
000001c0: 4129 f04c 8d0c 1641 83c0 0141 8d14 300f  A).L...A...A..0.
000001d0: b6d2 4801 fa0f b602 01c3 0fb6 cb48 01f9  ..H..........H..
000001e0: 440f b621 4488 2288 0144 01e0 0fb6 c00f  D..!D."..D......
000001f0: b604 0730 0648 83c6 0149 39f1 75cd 410f  ...0.H...I9.u.A.
00000200: b6c3 4000 e888 dc5b 5d66 4189 0241 5cc3  ..@....[]fA..A\.
00000210: 4881 ec58 0100 0066 0f6f 0582 0100 00ba  H..X...f.o......
00000220: 2000 0000 48b8 80aa 0ab4 418e 7b1b 488d   ...H.....A.{.H.
00000230: 7c24 4048 8d74 2420 0f29 4424 2066 0f6f  |$@H.t$ .)D$ f.o
00000240: 056c 0100 000f 2944 2430 660f 6f05 6f01  .l....)D$0f.o.o.
00000250: 0000 0f29 0424 4889 4424 0fe8 25fe ffff  ...).$H.D$..%...
00000260: 4889 e6ba 1700 0000 e835 ffff ff48 c7c0  H........5...H..
00000270: 0100 0000 48c7 c701 0000 0048 89e6 48c7  ....H......H..H.
00000280: c217 0000 000f 05e8 0000 0000 31c0 4881  ............1.H.
00000290: c458 0100 0048 c7c0 3c00 0000 4831 ff0f  .X...H..<...H1..
000002a0: 0500 0102 0304 0506 0708 090a 0b0c 0d0e  ................
000002b0: 0f10 1112 1314 1516 1718 191a 1b1c 1d1e  ................
000002c0: 1f20 2122 2324 2526 2728 292a 2b2c 2d2e  . !"#$%&'()*+,-.
000002d0: 2f30 3132 3334 3536 3738 393a 3b3c 3d3e  /0123456789:;<=>
000002e0: 3f40 4142 4344 4546 4748 494a 4b4c 4d4e  ?@ABCDEFGHIJKLMN
000002f0: 4f50 5152 5354 5556 5758 595a 5b5c 5d5e  OPQRSTUVWXYZ[\]^
00000300: 5f60 6162 6364 6566 6768 696a 6b6c 6d6e  _`abcdefghijklmn
00000310: 6f70 7172 7374 7576 7778 797a 7b7c 7d7e  opqrstuvwxyz{|}~
00000320: 7f80 8182 8384 8586 8788 898a 8b8c 8d8e  ................
00000330: 8f90 9192 9394 9596 9798 999a 9b9c 9d9e  ................
00000340: 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa abac adae  ................
00000350: afb0 b1b2 b3b4 b5b6 b7b8 b9ba bbbc bdbe  ................
00000360: bfc0 c1c2 c3c4 c5c6 c7c8 c9ca cbcc cdce  ................
00000370: cfd0 d1d2 d3d4 d5d6 d7d8 d9da dbdc ddde  ................
00000380: dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ebec edee  ................
00000390: eff0 f1f2 f3f4 f5f6 f7f8 f9fa fbfc fdfe  ................
000003a0: ff62 5a46 8aaa 47b6 8784 bf1b e6da 0ad7  .bZF..G.........
000003b0: 4081 0e14 6af7 6e2b f119 d52e 33a8 b6d1  @...j.n+....3...
000003c0: 7618 2537 37f5 1470 6359 1d85 0ea5 d9db  v.%77..pcY......

At first glance it is hard to make sense of this, but since file recognized it as a COM program, I tried disassembling it as x86.

The leading bytes appear to be Intel-architecture shellcode that jumps to 0x80 and makes some function calls.

$ objdump -M intel -D -b binary -m i386 polyglot
00000000 <.data>:
   0:   eb 7e                   jmp    0x80
   2:   90                      nop
   3:   90                      nop
   ... (abbreviated)
  80:   e8 8b 01 00 00          call   0x210
  85:   66 0f 6f 05 14 02 00    movdqa xmm0,XMMWORD PTR ds:0x21
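Both the jmp and the call in this listing are relative branches: the target is the address of the next instruction plus a signed displacement. The small helper below (rel_target is a hypothetical name, not part of any tool) reproduces the two targets from the raw bytes in the dump. It is only a sketch that handles the EB rel8 and E8 rel32 encodings.

```python
import struct

# Raw bytes copied from the xxd dump above.
JMP_AT_0 = bytes.fromhex("eb7e")          # file offset 0x00
CALL_AT_80 = bytes.fromhex("e88b010000")  # file offset 0x80


def rel_target(insn: bytes, addr: int) -> int:
    """Destination of a short jmp (EB rel8) or near call (E8 rel32) located at addr.

    x86 relative branches are measured from the end of the instruction.
    """
    if insn[0] == 0xEB:  # jmp rel8: 2 bytes, signed 8-bit displacement
        return addr + 2 + struct.unpack_from("<b", insn, 1)[0]
    if insn[0] == 0xE8:  # call rel32: 5 bytes, signed 32-bit displacement
        return addr + 5 + struct.unpack_from("<i", insn, 1)[0]
    raise ValueError(f"unsupported opcode {insn[0]:#04x}")


if __name__ == "__main__":
    print(hex(rel_target(JMP_AT_0, 0x00)))    # 0x80
    print(hex(rel_target(CALL_AT_80, 0x80)))  # 0x210
```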

However, the output also contains many (bad) entries.

objdump prints (bad) when it cannot decode the bytes at that position as a valid instruction.

Since the challenge description stated that “two flags are embedded,” I inferred that the binary also contained a second architecture beyond x86_64.

I wanted to isolate the x86_64 portion, but unfortunately Ghidra couldn’t handle this kind of binary well. (IDA Pro reportedly can, but the free version didn’t support it.)

Analyzing the Shellcode with Binary Ninja

So I turned to Binary Ninja.

After selecting the x86_64 architecture and opening the [Linear] view, Binary Ninja cleanly decompiled just the x86_64 shellcode section.


Emulating the x86_64 Shellcode with Unicorn

As noted in the challenge description, the flag is split into two parts.

Judging from Binary Ninja's decompiled output, executing the x86_64 shellcode should print one half of the flag.

I used the Unicorn emulator to run the shellcode.

I based the solver on the sample Python code from the official documentation.

Reference: Programming with C & Python languages – Unicorn – The Ultimate CPU emulator

Reference: Unicorn – The Ultimate CPU emulator

From the decompilation, the decrypted flag string is (likely) printed by the write syscall at offset 0x285, which writes 0x17 bytes starting at rsp to stdout.
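The register setup right before that syscall can also be read directly from the hex dump. On Linux x86_64, rax selects the syscall (1 is write) and rdi, rsi and rdx hold the file descriptor, buffer and length. The sketch below (trace_to_syscall is a hypothetical helper) decodes only the encodings that actually appear in bytes 0x26d to 0x286; it is not a general x86 decoder.

```python
import struct

# Bytes at file offsets 0x26d..0x286, copied from the xxd dump above.
CODE = bytes.fromhex(
    "48c7c001000000"  # 0x26d
    "48c7c701000000"  # 0x274
    "4889e6"          # 0x27b
    "48c7c217000000"  # 0x27e
    "0f05"            # 0x285
)
BASE = 0x26D
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def trace_to_syscall(code: bytes, base: int):
    """Decode REX.W mov instructions up to the first syscall; return (addr, regs)."""
    regs = {}
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x0f\x05":            # syscall
            return base + i, regs
        if code[i] != 0x48 or i + 2 >= len(code):
            raise ValueError(f"unexpected byte at {base + i:#x}")
        op, modrm = code[i + 1], code[i + 2]
        if modrm >> 6 != 0b11:
            raise ValueError("only register operands are handled")
        rm, reg = modrm & 7, (modrm >> 3) & 7
        if op == 0xC7 and reg == 0:                 # mov r/m64, imm32 (sign-extended)
            regs[REG64[rm]] = struct.unpack_from("<i", code, i + 3)[0]
            i += 7
        elif op == 0x89:                            # mov r/m64, r64
            regs[REG64[rm]] = REG64[reg]            # record the source register's name
            i += 3
        else:
            raise ValueError(f"unhandled opcode 48 {op:02x}")
    raise ValueError("no syscall found")


if __name__ == "__main__":
    addr, regs = trace_to_syscall(CODE, BASE)
    print(hex(addr), regs)  # 0x285 {'rax': 1, 'rdi': 1, 'rsi': 'rsp', 'rdx': 23}
```

This corresponds to write(1, rsp, 0x17), and 0x17 = 23 matches the length of the string recovered by emulation.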


I wrote the following solver that uses Unicorn to run the code up to 0x285 and dump everything on the stack at that point.

from unicorn import *
from unicorn.x86_const import *

# code to be emulated
code = open("polyglot", "rb").read()

# memory address where emulation starts
ADDRESS = 0x0

print("Emulate x86 code")
try:
    mu = Uc(UC_ARCH_X86, UC_MODE_64)

    # map 1 MiB (0x100000 bytes) of memory for both the code and the stack
    mu.mem_map(ADDRESS,  0x100000)

    # write machine code to be emulated to memory
    mu.mem_write(ADDRESS, code)

    # initialize machine registers
    mu.reg_write(UC_X86_REG_RSP, 0x0 + 0x100000)
    mu.reg_write(UC_X86_REG_RBP, 0x0 + 0x100000)

    # emulate code in infinite time & unlimited instructions
    mu.emu_start(ADDRESS, ADDRESS + 0x285)

    # print the decrypted string: it starts at rsp, so cut at the first NUL byte
    print("Emulation done. Decrypted string on the stack:")

    rsp = mu.reg_read(UC_X86_REG_RSP)
    rbp = mu.reg_read(UC_X86_REG_RBP)
    print(mu.mem_read(rsp, rbp-rsp).split(b'\x00')[0])

except UcError as e:
    print("ERROR: %s" % e)

Running this outputs bytearray(b'3_X86_N_4rM_1n_0n3_biN}'), giving us the second half of the flag.

Disassembling the ARM64 Shellcode with Capstone

Now let’s recover the first half of the flag.

I tried every architecture Binary Ninja supports, but none of them produced a usable analysis.


According to other writeups, IDA Pro can analyze this, but it's far too expensive for personal use.

The following video showed me how to write a small ARM64 disassembly script with Capstone.

Reference: Using Unicorn Engine for emulation | Polyglot - IdekCTF 2023 - YouTube

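The code itself is very simple: it just disassembles the file as ARM64. It skips the first 4 bytes (eb 7e 90 90, the x86 jmp and two nops) and uses 0x10000 as the base address, which only affects the addresses shown in the listing.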

import capstone

with open("polyglot", "rb") as fp:
    bytecode = fp.read()

engine = capstone.Cs(capstone.CS_ARCH_ARM64, capstone.CS_MODE_ARM)
disasm = engine.disasm(bytecode[4:], 0x10000)

for item in disasm:
    print(f"{item.address:#08x}: {item.mnemonic:8} {item.op_str}")

True to its name “polyglot,” this binary is a polyglot of x86_64 and ARM64.

Running the script produces the following assembly.

0x010000: adr      x0, #0x10040
0x010004: add      x3, x0, #0x1c
0x010008: mov      x2, #0
0x01000c: sub      sp, sp, #0x20
0x010010: ldrb     w1, [x0, x2]
0x010014: ldrb     w4, [x3, x2]
0x010018: add      x2, x2, #1
0x01001c: eor      w1, w1, w4
0x010020: strb     w1, [sp, x2]
0x010024: cmp      x2, #0x1c
0x010028: b.ne     #0x10010
0x01002c: mov      x8, #0x40
0x010030: mov      x0, #1
0x010034: add      x1, sp, #1
0x010038: mov      x2, #0x1c
0x01003c: svc      #0
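This listing also shows how the polyglot works. On x86 the first 4 bytes, eb 7e 90 90, are jmp 0x80 followed by two nops. Read as a little-endian ARM64 word (0x90907eeb), the same bytes encode an adrp into x11, a register the ARM64 code above never uses, so an ARM64 CPU simply continues to the adr at offset 4. That adr targets file offset 0x44, which is 0x10040 in the listing because of the 4-byte skip and the 0x10000 base. The hand-written decoder below (decode_adr is a hypothetical helper, not part of Capstone) checks this for the first two words. It is a sketch that handles only ADR/ADRP.

```python
import struct

# First 8 bytes of polyglot (two little-endian 32-bit words), copied from the xxd dump above.
HEADER = bytes.fromhex("eb7e9090" "00020010")
MASK64 = (1 << 64) - 1


def decode_adr(word: int, pc: int):
    """Decode an AArch64 ADR/ADRP instruction at pc; return None for anything else."""
    if ((word >> 24) & 0x1F) != 0b10000:  # bits 28..24 must be 10000
        return None
    immlo = (word >> 29) & 0x3
    immhi = (word >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):                   # sign-extend the 21-bit immediate
        imm -= 1 << 21
    rd = f"x{word & 0x1F}"
    if word >> 31 == 0:                   # ADR: byte offset from pc
        return "adr", rd, (pc + imm) & MASK64
    # ADRP: 4 KiB page of pc plus imm pages
    return "adrp", rd, ((pc & ~0xFFF) + (imm << 12)) & MASK64


if __name__ == "__main__":
    for offset in (0, 4):
        (word,) = struct.unpack_from("<I", HEADER, offset)
        print(f"{offset:#x}: {word:#010x} -> {decode_adr(word, offset)}")
```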

The code is short enough to solve by hand, but let’s use Unicorn again.
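For comparison, solving it by hand comes down to one XOR loop. For i from 0 to 0x1b, the code loads byte i from x0 (file offset 0x44) and byte i from x3 = x0 + 0x1c (file offset 0x60), XORs them, and stores the result at sp + i + 1. The sketch below replays that loop in plain Python using the two 28-byte arrays copied from the xxd dump; the names BUF_X0, BUF_X3 and xor_loop are illustrative, and this is not the approach used in the rest of this writeup.

```python
# Two 0x1c-byte arrays copied from the xxd dump above.
BUF_X0 = bytes.fromhex("afbcf06b048205a456b61648c093ae51788fb5b84e31b5ed9fa5b3a0")  # offset 0x44
BUF_X3 = bytes.fromhex("c6d895007fdd5af33ecf497df0ccc36516d6ea8c3c52ddd8c0c982cb")  # offset 0x60


def xor_loop(a: bytes, b: bytes) -> bytes:
    """Replay the ldrb/ldrb/eor/strb loop: out[i] = a[i] ^ b[i]."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    print(xor_loop(BUF_X0, BUF_X3))  # b'idek{__Why_50_m4nY_4rch5_l1k'
```

With the actual file contents in data, data[0x44:0x60] and data[0x60:0x7c] are the same two arrays.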

I based the ARM64 emulation script on the following sample.

Reference: unicorn/sample_arm64eb.py at master · unicorn-engine/unicorn

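The flow is almost the same as for x86_64. This time, the disassembly shows that the decrypted flag string is written starting at sp + 1 and that x1 points to it when the write syscall (x8 = 0x40) is made.

So I ran the emulation up to address 0x3c and then read 0x1c bytes from x1. This script maps the file at address 0, whereas the Capstone listing above starts at file offset 4 with base 0x10000, so the emulation stops just before the mov x2, #0x1c instruction; x1 has already been set by that point.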

#!/usr/bin/env python
# Sample code for ARM64 of Unicorn. Nguyen Anh Quynh <aquynh@gmail.com>
# Python sample ported by Loi Anh Tuan <loianhtuan@gmail.com>
# AARCH64 Python sample ported by zhangwm <rustydaar@gmail.com>

from __future__ import print_function
from unicorn import *
from unicorn.arm64_const import *


# code to be emulated
code = open("polyglot", "rb").read()

# memory address where emulation starts
ADDRESS = 0x0

# Test ARM64
def arm64():
    print("Emulate ARM64 code")
    try:
        # The ARM64 code in this file is little-endian, so use the default
        # little-endian mode (the original sample used UC_MODE_BIG_ENDIAN)
        mu = Uc(UC_ARCH_ARM64, UC_MODE_ARM)

        mu.mem_map(ADDRESS, 0x100000)
        mu.mem_write(ADDRESS, code)

        mu.mem_map(ADDRESS + 0x100000, 0x100000)
        mu.reg_write(UC_ARM64_REG_SP, 0x100000+0x100000)

        # emulate machine code in infinite time
        mu.emu_start(ADDRESS, ADDRESS + 0x3c)

        x1 = mu.reg_read(UC_ARM64_REG_X1)
        print(mu.mem_read(x1, 0x1c).split(b'\x00')[0])

    except UcError as e:
        print("ERROR: %s" % e)

if __name__ == '__main__':
    arm64()

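Running this solver outputs bytearray(b'idek{__Why_50_m4nY_4rch5_l1k'), giving us the first half of the flag. Combined with the second half, the full flag is idek{__Why_50_m4nY_4rch5_l1k3_X86_N_4rM_1n_0n3_biN}.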

Summary

It feels a little unfair that the difficulty changes so drastically depending on whether you have IDA Pro, but it was an extremely educational challenge.

I want to get more proficient with both Unicorn and Capstone going forward.