Google CTF 2022 d8: From V8 Bytecode to Code Execution

This weekend I have played Google CTF with r3kapig. On the first day I tried the OCR challenge but failed to solve it, and on the second day I spent the whole day working on the d8 that I am more familiar with. Finally I managed to solve it at midnight as the second blood. This challenge is quite interesting so it is worth to do a write-up.

0x00 Overview

In this challenge, we need to exploit a runner.cc that takes binary input and passes it to v8::ScriptCompiler::CachedData, which is to be executed. After some investigation, we found that we can use such primitive to execute arbitrary V8 bytecode. It turns out that V8 bytecode execution has many out-of-bound primitives that can be exploited because they are deemed as trusted input by V8. The final solution utilizes an out-of-bound read in CreateArrayLiteral to fetch a faked ArrayBoilerplateDescription, leading to an object faking primitive and thus code execution with regular exploitation technique.

0x01 V8 Code Caching

More information about this can be found here and here. To make it simple, it is a technique that allows V8 to avoid having to parse a same script for many times. When a JavaScript script is compiled, a cache that stores the compilation result can be generated, and when such script is encountered again, such cache can be used instead of re-compiling the same script again.

Generating Code Cache

According to the documentation in the link above, we need to use v8::ScriptCompiler::kProduceCodeCache or v8::ScriptCompiler::GetCodeCache to generate cache for a script being compiled. However, non of these is found in the given V8 version. After checking test-api.cc, we found that we need to use v8::ScriptCompiler::CreateCodeCache to generate the code cache for a script that was just compiled and executed. Yes, it turns out the documentation makes the mistake. Note that we must call CreateCodeCache after script->Run(context), otherwise some lazily compiled functions are not cached.

In addition, v8::V8::SetFlagsFromCommandLine can be used to allow the script to use native syntaxes such as %DebugPrint. This can make debugging much more convenient. Note that the %DebugPrint is compiled into code cache in the V8 bytecode, so when such cache is executed in the runner.cc, the %DebugPrint output can still be shown.

Another thing to note is that when runner.cc loads the cache, an empty script string is also provided. The cache loader would check if the hash inside the binary cache is identical to the hash of the script, if not the cache will be rejected. After some debugging, we found that the hash of empty script is 0 and the hash of binary cache is the 4 bytes at offset +8. Therefore to allow the cache to be executed such field is set to 0.

Also, the cache generated by debug/release version can not be shared among each other, otherwise the cache will be rejected. In debug version, a flag FLAG_verify_snapshot_checksum is set to perform some additional checksum checking, to disable this, this flag is manually set to false at function SerializedCodeData::SanityCheckWithoutSource.

At this point we can generate the cache for our JavaScript code that can be run by runner.cc. The full code about this is here. We can use ./gen exp.js --allow-natives-syntax --print-bytecode to compile the JavaScript into cache and store the binary cache into ./blob.bin. We use --print-bytecode to see V8 bytecode being generated, and such bytecode can also be found in generated cache.

0x02 Exploiting V8 Bytecode

Initially we are thinking if the raw machine code generated by JIT is also stored into cache, if so we can directly execute the shellcode by modifying them in the cache. However, after some trials, it turns out that thing cannot be easy like this. Therefore, it seems that we should use V8 bytecode to achieve the exploit.

By arbitrarily modifying the bytecode, the V8 easily crashes. I have come up with some exploitation ideas but only the last one works for me finally.

Use bytecode to leak the hole into JavaScript, which is an exploitation primitive that has been previously used. However, it seems that this primitive is already mitigated in current version according to my friend sakura, so I have not spent much time on it, although it can potentially work.
When V8 bytecode accesses the argument register, there is an index value byte in the instruction byte sequence. By modifying such byte, OOB can be caused. However, after further investigation, the OOB occurs on stack, and the data behind is not easily controllable. In addition, the argument array is not stored in compressed pointer form but 64-bit pointers, so it is hard to exploit by simply writing Smi numbers to stack. Therefore, this idea does not work.
We found that CreateArrayLiteral also has an index value that is used to access a FixedArray in OldSpace. The value being fetched is a pointer to ArrayBoilerplateDescription, which describes how array is initialized. By controlling the content after the FixedArray, we can fake such ArrayBoilerplateDescription instance, so we can also obtain an object faking primitive. After then the exploitation is regular.

CreateArrayLiteral

This is a bytecode that is used to create JavaScript array. Let’s write some test code to see how it works.

function foo()
{
	const o = [[], 1.1, 0x123];
	return o[0];
}

foo();
readline();
// ./d8 test.js --print-bytecode

Here is the bytecode generated for this foo function:

79 00 00 04       CreateArrayLiteral [0], [0], #4
c4                Star0
0c                LdaZero
2f fa 01          GetKeyedProperty r0, [1]
a9                Return

It turns out that an instruction CreateArrayLiteral [0], [0], #4 is generated for the array creation. The question is how does interpreter know what elements to put into the array? The answer lies at [0] of CreateArrayLiteral. Such value is used as index to access an FixedArray, which is printed below the bytecode.

0x1b1f00253b31: [FixedArray] in OldSpace
 - map: 0x1b1f00002239 <Map>
 - length: 1
           0: 0x1b1f00253b25 <ArrayBoilerplateDescription PACKED_ELEMENTS, 0x1b1f00253af9 <FixedArray[3]>>

The index 0 accesses an ArrayBoilerplateDescription instance. This is an instance used to describe how the array elements are initialized. Let see what information it contains.

pwndbg> job 0x1b1f00253b25
0x1b1f00253b25: [ArrayBoilerplateDescription] in OldSpace
 - map: 0x1b1f000033f5 <Map[12]>
 - elements kind: PACKED_ELEMENTS
 - constant elements: 0x1b1f00253af9 <FixedArray[3]>
pwndbg> job 0x1b1f00253af9
0x1b1f00253af9: [FixedArray] in OldSpace
 - map: 0x1b1f00002239 <Map>
 - length: 3
           0: 0x1b1f00253b0d <ArrayBoilerplateDescription PACKED_SMI_ELEMENTS, 0x1b1f00002261 <FixedArray[0]>>
           1: 0x1b1f00253b19 <HeapNumber 1.1>
           2: 291
pwndbg> p/x 291
$1 = 0x123

The ArrayBoilerplateDescription instance contains a constant elements field which points to another FixedArray. Such FixedArray contains the elements to be initialized for the newly created array. One interesting point to note is that the first element is another ArrayBoilerplateDescription pointer instead of a JSArray pointer, and this makes sense: each time when we create an array, we want a new JSArray instance to be created instead of using the reference to the old JSArray.

An important question to ask is that if we can fake such ArrayBoilerplateDescription instance used for CreateArrayLiteral, can we have object faking primitive? After manually modifying pointers inside the constant elements to an existing JavaScript object pointer, the answer turns out to be yes. Therefore, the next question is how to fake the ArrayBoilerplateDescription via OOB read of CreateArrayLiteral instruction.

Controlling Memory after FixedArray

Since the FixedArray is in the OldSpace, it is quite intuitive to try to create an Array with double elements and calls garbage collection to put it into OldSpace, and to see if the array elements can locate at memory after the FixedArray(e.i. the OOB read victim). However, it seems that the element content is too far away from the FixedArray, so this approach does not work.

Then we found that the content inside constant elements is very close to FixedArray, but it is before instead of after the FixedArray. However, if we create another function that also contains a CreateArrayLiteral instruction, its constant elements can reside after the target victim FixedArray, as long as this function declaration is located after the victim function. In addition, if the array created contains only double elements, its constant elements is a FixedDoubleArray, which means unboxed double is stored in the memory, so we can fully control the memory content after the victim FixedArray!

The specific OOB index is found by some debugging and trials. Initially we set the unboxed double to As, and then we inspect the memory after the target victim FixedArray in the gen.cc process (debug version). Note that we cannot do so in runner.cc because we cannot print the bytecode there. Nonetheless, fortunately the memory layout is very similar among them. With this we calculate an index value and set it as index used by instruction CreateArrayLiteral, and then we can run the challenge binary with the modified cache. If a crash occurs with address 0x????41414141, the index is valid for OOB access.

Final Exploit

At this point the exploitation steps should be clear:

Prepare a large double array, and the low 32 bits of its element address are fixed (V8 pointer compression). Such array is used to prepare the faked instances such as ArrayBoilerplateDescription, FixedArray, Uint32Array, etc.
Spray the low 32 bits of the address of the elements of the large array as FixedDoubleArray after the FixedArray used as OOB read victim. At the large array, the ArrayBoilerplateDescription should be faked (Note that in pointer compression mode low 32 bits of pointers to built-in instances such as Map are fixed).
Call the victim function, whose CreateArrayLiteral instruction should be modified beforehand to cause the OOB read, and we return the element as the faked object.
As long as we have the faked object, the exploitation is very regular, which I will not discuss here.

The full exploit is here. We firstly need to compile this exploit to a cache binary, and then use the Python script to locate and modify the CreateArrayLiteral instruction, and then use the modified cache as the final exploit to be used for the challenge binary.