Chromium Issue 944971 Notes
0x00 Overview
The vulnerability is in the function RegExpReplace, which is reached when the replacement string contains '$'. The problem is that this function assumes the RegExp is unmodified, but its call to Object::ToString can run code that modifies the RegExp. This can change the memory layout of the object, after which the set_last_index call becomes an out-of-bounds write.
Commit hash: d9734801b75a638a22253166e9edaafef86f77ed
0x01 Coverage
Reading the given exploit, it seems that rgx = new RegExp(/AAAAAAAA/y); rgx[Symbol.replace]; is the key to triggering this function. However, if I just run rgx[Symbol.replace]("AAAAAAAA", "BBBB") with a breakpoint set on RegExpReplace, the breakpoint is never hit. This suggests that RegExp.prototype[Symbol.replace] does not correspond to RegExpReplace directly; instead, RegExpReplace is one of the subroutines that RegExp.prototype[Symbol.replace] calls.
So we first need to find the function that corresponds directly to RegExp.prototype[Symbol.replace]. My idea is to run the exploit, set a breakpoint at RegExpReplace, and then use the backtrace command to find that function.
#0 v8::internal::(anonymous namespace)::RegExpReplace at runtime-regexp.cc:1255
#1 v8::internal::__RT_impl_Runtime_RegExpReplaceRT at runtime-regexp.cc:1706
#2 v8::internal::Runtime_RegExpReplaceRT at runtime-regexp.cc:1687
#3 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () from libv8.so
#4 Builtins_RegExpReplace () from libv8.so
#5 Builtins_RegExpPrototypeReplace () from libv8.so
Builtins_RegExpPrototypeReplace seems to be the function that corresponds directly to RegExp.prototype[Symbol.replace], since it is the outermost frame in the backtrace whose name is still related to regular expression replacement. We can verify the guess by setting a breakpoint on it and calling the function with any arguments.
Searching for the name in src/ suggests that regexp-replace.tq is the file that implements this function. It is written in Torque.
In RegExpPrototypeReplace, there is a branch:
if (regexp::BranchIfFastRegExp(context, rx)) {
return RegExpReplace(UnsafeCast<JSRegExp>(rx), s, replaceValue);
// note this RegExpReplace is CSA implementation
// not the vulnerable RegExpReplace function that we want to reach
} else {
return RegExpReplaceRT(context, rx, s, replaceValue);
}
RegExpReplaceRT corresponds to RUNTIME_FUNCTION(Runtime_RegExpReplaceRT), and this function calls RegExpReplace if RegExpUtils::IsUnmodifiedRegExp returns true.
// Fast-path for unmodified JSRegExps (and non-functional replace).
if (RegExpUtils::IsUnmodifiedRegExp(isolate, recv)) {
// We should never get here with functional replace because unmodified
// regexp and functional replace should be fully handled in CSA code.
CHECK(!functional_replace);
RETURN_RESULT_OR_FAILURE(
isolate, RegExpReplace(isolate, Handle<JSRegExp>::cast(recv), string,
replace_obj));
}
Therefore, if we make regexp::BranchIfFastRegExp(context, rx) return false and RegExpUtils::IsUnmodifiedRegExp(isolate, recv) return true, RegExpReplace will be called.
However, there is no way to do that directly. After reading the source of these two functions, I found that they check the same thing: whether the RegExp instance is unmodified (by checking Map pointers). There is also no call between them that would let us execute arbitrary JavaScript to modify anything, so this situation can never arise.
The key is the CSA implementation of RegExpReplace at builtins-regexp-gen.cc:2973, which contains the following code.
// 3. Does ToString({replace_value}) contain '$'?
BIND(&checkreplacestring);
{
TNode<String> const replace_string =
ToString_Inline(context, replace_value);
// ToString(replaceValue) could potentially change the shape of the RegExp
// object. Recheck that we are still on the fast path and bail to runtime
// otherwise.
{
Label next(this);
BranchIfFastRegExp(context, regexp, &next, &runtime);
BIND(&next);
}
TNode<String> const dollar_string = HeapConstant(
isolate()->factory()->LookupSingleCharacterStringFromCode('$'));
TNode<Smi> const dollar_ix =
CAST(CallBuiltin(Builtins::kStringIndexOf, context, replace_string,
dollar_string, SmiZero()));
GotoIfNot(SmiEqual(dollar_ix, SmiConstant(-1)), &runtime);
Return(
ReplaceSimpleStringFastPath(context, regexp, string, replace_string));
}
// ......
BIND(&runtime);
Return(CallRuntime(Runtime::kRegExpReplaceRT, context, regexp, string,
replace_value)); // call RegExpReplaceRT function!
Therefore, as the code above shows, RegExpReplaceRT can still be called even when the CSA implementation of RegExpReplace is taken, as long as the replacement string contains '$', although I don't know why it works this way. This explains the return 'BBBB$' in the exploit.
Thus, the way to reach RegExpReplace is now clear: /aa/[Symbol.replace]('aaaaaa', 'BB$').
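As a quick sanity check from JavaScript, the snippet below issues the call described above next to a variant without '$'. It is only a sketch: JavaScript cannot tell which internal path ran, so confirming that the runtime RegExpReplace is reached still requires a breakpoint in a debug build. The variable names are illustrative.

```javascript
// A '$' that is not followed by a recognised pattern character is kept
// literally, so the visible result does not depend on the internal path.
const withDollar = /aa/[Symbol.replace]('aaaaaa', 'BB$');
const withoutDollar = /aa/[Symbol.replace]('aaaaaa', 'BB');

console.log(withDollar);    // BB$aaaa
console.log(withoutDollar); // BBaaaa
```

Running the first call under gdb with a breakpoint on RegExpReplace is the part that actually demonstrates the coverage.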
0x02 Vulnerability
The issue page states:
RegExpReplace expects the incoming regexp object to be an unmodified regexp, but there is a call to Object::ToString that can change the type of the regexp. This leads to OOB reads and writes to regexp.lastIndex.
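The ToString call mentioned in the quote is observable from JavaScript when the replacement is an object rather than a string. The following is a minimal sketch, not the original exploit: the replacement object and the calls array are hypothetical, and the hook only records that it ran instead of modifying the regexp.

```javascript
const calls = [];
const rgx = new RegExp(/AAAAAAAA/y);

// Hypothetical replacement object: its toString hook runs arbitrary
// JavaScript whenever the engine converts the replacement to a string.
const replacement = {
  toString() {
    calls.push(rgx.lastIndex); // any code could run here, including code that touches rgx
    return 'BBBB$';
  },
};

const result = rgx[Symbol.replace]('AAAAAAAA', replacement);

console.log(result);       // BBBB$
console.log(calls.length); // at least 1; the exact count depends on the engine version
```

Anything placed inside toString runs while the replace operation is in progress, which is the window in which the regexp can be modified.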
The question is why a modified regexp causes an OOB access. In the exploit, rgx.lastIndex is set to an object whose valueOf function calls to_dict(rgx), defined as below.
function to_dict(obj){
obj.__defineGetter__('x',()=>2);
obj.__defineGetter__('x',()=>2);
}
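From JavaScript's point of view, to_dict has no visible effect on lastIndex, which is why the change only shows up in a debugger. The sketch below repeats the function so that it is self-contained; the variable names are illustrative, and it checks only the JavaScript-visible behaviour, not the internal layout.

```javascript
function to_dict(obj) {
  obj.__defineGetter__('x', () => 2);
  obj.__defineGetter__('x', () => 2);
}

const rgx = /AAAAAAAA/y;
const before = rgx.lastIndex;

to_dict(rgx);

console.log(rgx.x);                    // 2
console.log(rgx.lastIndex === before); // true

rgx.lastIndex = 3;                     // ordinary property writes still work
console.log(rgx.lastIndex);            // 3
```

To see the internal change without gdb, d8 run with --allow-natives-syntax offers %DebugPrint(rgx), which prints similar object information.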
So how does this affect the structure of rgx? Let's look at the memory layout of JSRegExp first.
gef➤ job 0x5c3edbcdbf1
0x5c3edbcdbf1: [JSRegExp]
...
- properties: 0x0915789c0c71 <FixedArray[0]> {
#lastIndex: 0 (data field 0)
}
gef➤ x/8gx 0x5c3edbcdbf1-1
0x5c3edbcdbf0: 0x000008d524e01359 0x00000915789c0c71
0x5c3edbcdc00: 0x00000915789c0c71 0x000005c3edbcf479
0x5c3edbcdc10: 0x000038ef4cedf099 0x0000000000000000
0x5c3edbcdc20: 0x0000000000000000 <---- lastIndex 0x00000915789c08a1
gef➤ job 0x000008d524e01359
0x8d524e01359: [Map]
- type: JS_REGEXP_TYPE
- instance size: 56 <---- size = 7 * 8 = 56
- inobject properties: 1 <---- field lastIndex
...
After to_dict is called, the memory layout of rgx changes.
gef➤ job 0x5c3edbcdbf1
0x5c3edbcdbf1: [JSRegExp]
...
- properties: 0x05c3edbd1df9 <NameDictionary[29]> {
#x: 0x38ef4cee1e51 <AccessorPair> (accessor, dict_index: 2, attrs: [WEC])
#lastIndex: 0 (data, dict_index: 1, attrs: [W__]) <---- lastIndex now stored in properties
}
gef➤ x/8gx 0x5c3edbcdbf1-1
0x5c3edbcdbf0: 0x000008d524e0aa49 0x000005c3edbd1df9
0x5c3edbcdc00: 0x00000915789c0c71 0x000005c3edbcf479
0x5c3edbcdc10: 0x000038ef4cedf099 0x0000000000000000
0x5c3edbcdc20: 0x00000915789c0321 <---- becomes FILLER_TYPE map 0x00000915789c08a1
gef➤ job 0x000008d524e0aa49
0x8d524e0aa49: [Map]
- type: JS_REGEXP_TYPE
- instance size: 48 <---- size shrinks to 6 * 8 = 48 !
- inobject properties: 0 <---- becomes 0
...
gef➤ job 0x00000915789c0321
0x915789c0321: [Map]
- type: FILLER_TYPE
- instance size: 8
...
As the dumps show, lastIndex is migrated into properties and the size of rgx shrinks from 56 to 48 bytes. The offsets of the other fields remain unchanged. However, RegExpReplace still calls set_last_index.
if (match_indices_obj->IsNull(isolate)) {
if (sticky) regexp->set_last_index(Smi::kZero, SKIP_WRITE_BARRIER);
return string;
}
// ...
if (sticky) {
// to trigger this we need the rgx to be sticky,
// which explains the `y` flag in /AAAAAAAA/y in the exploit
regexp->set_last_index(Smi::FromInt(end_index), SKIP_WRITE_BARRIER);
}
set_last_index is a low-level function that writes to the offset of last_index directly, without any check. In other words, this operation is an OOB write that can overwrite the map pointer of the next object in the heap.
These should also be the only two pieces of code that access last_index. Other subroutines such as RegExpImpl::Exec only access unchanged fields such as data, which do not trigger the vulnerability.
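The offset arithmetic behind the OOB write can be modelled with a toy heap. This is a simplified sketch, not V8 code: the array of words, writeStaysInside and rawSetLastIndex are hypothetical stand-ins, and the only numbers taken from the dumps are the instance sizes (56 and 48) and the lastIndex offset (0x5c3edbcdc20 - 0x5c3edbcdbf0 = 48).

```javascript
const WORD = 8;               // bytes per field in the dumps
const LAST_INDEX_OFFSET = 48; // 0x5c3edbcdc20 - 0x5c3edbcdbf0

// True when an 8-byte write at LAST_INDEX_OFFSET stays inside an object
// of the given instance size.
function writeStaysInside(instanceSize) {
  return LAST_INDEX_OFFSET + WORD <= instanceSize;
}

// Toy heap: words 0..5 are the shrunken RegExp (48 bytes); word 6 is
// whatever ends up directly behind it.
const heap = [];
for (let i = 0; i < 6; i++) heap.push(`regexp word ${i}`);
heap.push('map of next object');

// Stand-in for set_last_index: writes at a fixed offset, no bounds check.
function rawSetLastIndex(heapWords, baseWord, value) {
  heapWords[baseWord + LAST_INDEX_OFFSET / WORD] = value;
}

rawSetLastIndex(heap, 0, 'Smi(8)');

console.log(writeStaysInside(56)); // true
console.log(writeStaysInside(48)); // false
console.log(heap[6]);              // Smi(8)
```

With the original 56-byte layout the write lands in the object's own last field; with the 48-byte layout the same offset points at the first word after the object.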
0x03 Exploitation
Actually, I don't know how this can be exploited: we can only overwrite the map of the next object with a Smi, which almost always causes a segmentation fault. Even if garbage collection is triggered as the exploit does, the value that gets overwritten is still a map pointer. Indeed, as I expected, the exploit does not work and crashes when accessing 0x80000000b, because end_index == 8 and the map pointer is replaced by 0x800000000.
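The value 0x800000000 matches how a Smi would be stored if the 32-bit payload sits in the upper half of a 64-bit word. That encoding is an assumption about this build (64-bit, no pointer compression), although the lastIndex value of 0 in the dumps is consistent with it. A small sketch of the arithmetic, with a hypothetical helper smiWord:

```javascript
// Assumption: the Smi payload is stored shifted left by 32 bits.
function smiWord(value) {
  return BigInt(value) << 32n;
}

const endIndex = 8;                // end_index from the notes
const fakeMap = smiWord(endIndex); // what set_last_index writes over the map
const crashAddress = 0x80000000bn; // faulting address from the notes

console.log('0x' + fakeMap.toString(16));                  // 0x800000000
console.log('0x' + (crashAddress - fakeMap).toString(16)); // 0xb
```

The remaining 0xb would be consistent with the engine reading a field a few bytes into the bogus map, but that interpretation is a guess and is not verified in these notes.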
TODO: investigate more in the future.