0x00 Overview

This vulnerability is in the function RegExpReplace, which is reached when the replacement string contains '$'. The function assumes that the RegExp stays unmodified throughout, but its call to Object::ToString can run code that modifies this RegExp. That can change the memory layout of the object, after which set_last_index performs an out-of-bounds write.

Commit hash: d9734801b75a638a22253166e9edaafef86f77ed

0x01 Coverage

Reading the given exploit, it seems that rgx = new RegExp(/AAAAAAAA/y); rgx[Symbol.replace]; is the key to triggering this function. However, if I just run rgx[Symbol.replace]("AAAAAAAA", "BBBB"), a breakpoint set on RegExpReplace is never hit. This suggests that RegExp.prototype[Symbol.replace] does not correspond to RegExpReplace directly; instead, RegExpReplace is one of the subroutines that RegExp.prototype[Symbol.replace] may call.

So we first need to find the function that corresponds directly to RegExp.prototype[Symbol.replace]. My idea is to run the exploit with a breakpoint set at RegExpReplace, then find the corresponding function using the backtrace command.

#0 v8::internal::(anonymous namespace)::RegExpReplace at runtime-regexp.cc:1255
#1 v8::internal::__RT_impl_Runtime_RegExpReplaceRT at runtime-regexp.cc:1706
#2 v8::internal::Runtime_RegExpReplaceRT at runtime-regexp.cc:1687
#3 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit () from libv8.so
#4 Builtins_RegExpReplace () from libv8.so
#5 Builtins_RegExpPrototypeReplace () from libv8.so

Builtins_RegExpPrototypeReplace seems to be the function that corresponds directly to RegExp.prototype[Symbol.replace], since it is the outermost frame in the backtrace whose name is still related to regular expression replacement. We can also verify the guess by setting a breakpoint on it and calling RegExp.prototype[Symbol.replace] with any arguments.

Searching for the name in src/ suggests that regexp-replace.tq is the file that implements this function. It is written in Torque.

In RegExpPrototypeReplace, there is a branch.

if (regexp::BranchIfFastRegExp(context, rx)) {
  return RegExpReplace(UnsafeCast<JSRegExp>(rx), s, replaceValue);
  // note this RegExpReplace is CSA implementation
  // not the vulnerable RegExpReplace function that we want to reach
} else {
  return RegExpReplaceRT(context, rx, s, replaceValue);
}

RegExpReplaceRT corresponds to RUNTIME_FUNCTION(Runtime_RegExpReplaceRT), and in this function RegExpReplace will be called if RegExpUtils::IsUnmodifiedRegExp returns true.

// Fast-path for unmodified JSRegExps (and non-functional replace).
if (RegExpUtils::IsUnmodifiedRegExp(isolate, recv)) {
  // We should never get here with functional replace because unmodified
  // regexp and functional replace should be fully handled in CSA code.
  CHECK(!functional_replace);
  RETURN_RESULT_OR_FAILURE(
      isolate, RegExpReplace(isolate, Handle<JSRegExp>::cast(recv), string,
                             replace_obj));
}

Therefore, if we let regexp::BranchIfFastRegExp(context, rx) return false, and RegExpUtils::IsUnmodifiedRegExp(isolate, recv) return true, RegExpReplace can be called.

However, the problem is that there is no such way. After reading the source code of these two functions, I found that they actually check the same thing: whether the RegExp instance is unmodified (by checking Map pointers). There is also no function in between that would let us execute arbitrary JavaScript code to modify things, so we can never produce such a situation.

The key point is in the CSA implementation of RegExpReplace at builtins-regexp-gen.cc:2973. It contains the following code.

// 3. Does ToString({replace_value}) contain '$'?
BIND(&checkreplacestring);
{
  TNode<String> const replace_string =
      ToString_Inline(context, replace_value);

  // ToString(replaceValue) could potentially change the shape of the RegExp
  // object. Recheck that we are still on the fast path and bail to runtime
  // otherwise.
  {
    Label next(this);
    BranchIfFastRegExp(context, regexp, &next, &runtime);
    BIND(&next);
  }

  TNode<String> const dollar_string = HeapConstant(
      isolate()->factory()->LookupSingleCharacterStringFromCode('$'));
  TNode<Smi> const dollar_ix =
      CAST(CallBuiltin(Builtins::kStringIndexOf, context, replace_string,
                       dollar_string, SmiZero()));
  GotoIfNot(SmiEqual(dollar_ix, SmiConstant(-1)), &runtime);

  Return(
      ReplaceSimpleStringFastPath(context, regexp, string, replace_string));
}
// ......
BIND(&runtime);
Return(CallRuntime(Runtime::kRegExpReplaceRT, context, regexp, string,
                   replace_value)); // call RegExpReplaceRT function!

Therefore, as the code above shows, RegExpReplaceRT can still be called even when the CSA implementation of RegExpReplace runs, as long as the replacement string contains '$', although I don’t know why it works this way. This explains the return 'BBBB$' in the exploit.

Thus, the way to trigger RegExpReplace is now clear: /aa/[Symbol.replace]('aaaaaa', 'BB$').
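
As a quick sanity check of that call, the snippet below runs it in plain JavaScript. It only shows the observable result; which internal path the engine takes is not visible from script and has to be confirmed with the breakpoint described above. The variable names are mine.

```javascript
// Fresh regexp literal, so nothing about it has been modified before the call.
const re = /aa/;

// The replacement contains '$'. A trailing '$' does not start any
// substitution pattern ($$, $&, $1, ...), so it is copied literally.
const out = re[Symbol.replace]('aaaaaa', 'BB$');

// Non-global regexp: only the first match is replaced.
console.log(out); // BB$aaaa
```

The '$' at the end of the replacement is what matters for coverage; the result itself is just ordinary replace behavior.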

0x02 Vulnerability

The issue page says:

RegExpReplace expects the incoming regexp object to be an unmodified regexp, but there is a call to Object::ToString that can change the type of the regexp. This leads to OOB reads and writes to regexp.lastIndex.

The question is, why does a modified regexp cause an OOB access? In the exploit, rgx.lastIndex is set to an object whose valueOf function calls to_dict(rgx), defined as below.

function to_dict(obj){
	obj.__defineGetter__('x',()=>2);
	obj.__defineGetter__('x',()=>2);
}
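
To make the order of events concrete, here is a minimal sketch reconstructed from the description above. It is not the original exploit: the events array, the valueOf body and the returned 0 are my assumptions. On an engine that contains the fix it simply runs and shows that user code executes in the middle of the replace call.

```javascript
function to_dict(obj) {
  obj.__defineGetter__('x', () => 2);
  obj.__defineGetter__('x', () => 2);
}

const events = []; // hypothetical log, only for illustration
const rgx = new RegExp(/AAAAAAAA/y);

// lastIndex becomes an object; converting it to a number runs valueOf,
// which reshapes rgx while the replace call is still in progress.
rgx.lastIndex = {
  valueOf() {
    events.push('valueOf ran');
    to_dict(rgx);
    return 0; // assumed: start matching at index 0
  },
};

const result = rgx[Symbol.replace]('AAAAAAAA', 'BBBB$');
events.push('replace returned');

console.log(events[0]);     // valueOf ran
console.log(result);        // BBBB$
console.log(rgx.lastIndex); // 8 (sticky match ended at index 8)
console.log(rgx.x);         // 2
```

Whether rgx really ends up in dictionary mode cannot be seen from plain JavaScript; that is what the job output below is for.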

So how does this affect the structure of rgx? Let’s look at the memory layout of JSRegExp first.

gef➤  job 0x5c3edbcdbf1
0x5c3edbcdbf1: [JSRegExp]
 ...
 - properties: 0x0915789c0c71 <FixedArray[0]> {
    #lastIndex: 0 (data field 0)
 }
gef➤  x/8gx 0x5c3edbcdbf1-1
0x5c3edbcdbf0:	0x000008d524e01359	0x00000915789c0c71
0x5c3edbcdc00:	0x00000915789c0c71	0x000005c3edbcf479
0x5c3edbcdc10:	0x000038ef4cedf099	0x0000000000000000
0x5c3edbcdc20:	0x0000000000000000 <---- lastIndex	0x00000915789c08a1
gef➤  job 0x000008d524e01359
0x8d524e01359: [Map]
 - type: JS_REGEXP_TYPE
 - instance size: 56 <---- size = 7 * 8 = 56
 - inobject properties: 1 <---- field lastIndex
 ...

After to_dict is called, the memory layout of rgx changes.

gef➤  job 0x5c3edbcdbf1
0x5c3edbcdbf1: [JSRegExp]
 ...
 - properties: 0x05c3edbd1df9 <NameDictionary[29]> {
   #x: 0x38ef4cee1e51 <AccessorPair> (accessor, dict_index: 2, attrs: [WEC])
   #lastIndex: 0 (data, dict_index: 1, attrs: [W__]) <---- lastIndex now stored in properties
 }
gef➤  x/8gx 0x5c3edbcdbf1-1
0x5c3edbcdbf0:	0x000008d524e0aa49	0x000005c3edbd1df9
0x5c3edbcdc00:	0x00000915789c0c71	0x000005c3edbcf479
0x5c3edbcdc10:	0x000038ef4cedf099	0x0000000000000000
0x5c3edbcdc20:	0x00000915789c0321 <---- becomes FILLER_TYPE map	0x00000915789c08a1
gef➤  job 0x000008d524e0aa49
0x8d524e0aa49: [Map]
 - type: JS_REGEXP_TYPE
 - instance size: 48 <---- size shrinks to 6 * 8 = 48 !
 - inobject properties: 0 <---- becomes 0
 ...
gef➤  job 0x00000915789c0321
0x915789c0321: [Map]
 - type: FILLER_TYPE
 - instance size: 8
 ...

As the dumps illustrate, lastIndex is migrated into the properties dictionary and the size of rgx shrinks, while the offsets of the other fields remain unchanged. However, RegExpReplace still calls set_last_index.
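
The address arithmetic behind this can be checked with the numbers from the two dumps. The snippet below is only that arithmetic; the constant names are mine, and the word size of 8 bytes is taken from the "7 * 8" annotation in the dump.

```javascript
const kWordSize = 8n;            // one tagged field on this 64-bit build
const base = 0x5c3edbcdbf0n;     // untagged address of rgx (0x5c3edbcdbf1 - 1)
const sizeBefore = 56n;          // instance size before to_dict
const sizeAfter = 48n;           // instance size after to_dict

// lastIndex was the last in-object word of the original instance.
const lastIndexOffset = sizeBefore - kWordSize;
const lastIndexSlot = base + lastIndexOffset;

// First address that no longer belongs to rgx after it shrinks.
const endAfter = base + sizeAfter;

console.log(lastIndexSlot.toString(16)); // 5c3edbcdc20
console.log(endAfter.toString(16));      // 5c3edbcdc20
console.log(lastIndexSlot >= endAfter);  // true: the old slot is now outside rgx
```

So a write to the old lastIndex offset lands exactly on the first word after the shrunken object, the word that the second dump shows holding a FILLER_TYPE map.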

if (match_indices_obj->IsNull(isolate)) {
  if (sticky) regexp->set_last_index(Smi::kZero, SKIP_WRITE_BARRIER);
  return string;
}
// ...
if (sticky) { 
  // to trigger this we need the rgx to be sticky, 
  // which explains the `y` flag in /AAAAAAAA/y in the exploit
  regexp->set_last_index(Smi::FromInt(end_index), SKIP_WRITE_BARRIER);
}

set_last_index is a low-level function that writes to the fixed offset of last_index directly, without any check! In other words, once the object has shrunk, this operation is an OOB write that overwrites the map pointer of the next object in the heap.

These should also be the only two pieces of code that access last_index. Other subroutines such as RegExpImpl::Exec only access unchanged fields such as data, so they do not trigger the vulnerability.

0x03 Exploitation

Actually I don’t know how this can be exploited: we can only overwrite the map pointer of the next object with a Smi, which almost always causes a segmentation fault. Even if garbage collection is triggered as the exploit does, the value that gets overwritten is still a map pointer. Indeed, as I expected, the exploit does not work and crashes when accessing 0x80000000b, because end_index == 8 and the map pointer is replaced by 0x800000000.
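
The relation between end_index and the overwritten value can be reproduced with a little arithmetic. This assumes a 64-bit build without pointer compression, where a Smi keeps its 32-bit payload in the upper half of the word; that assumption matches the 0x800000000 observed above but I have not checked it against the build flags.

```javascript
const endIndex = 8n;

// Assumed Smi encoding: payload shifted left by 32 bits, low bits zero.
const smiBits = endIndex << 32n;
console.log(smiBits.toString(16)); // 800000000

// Distance between the faulting address and the fake "map pointer".
const faultAddress = 0x80000000bn;
console.log((faultAddress - smiBits).toString(16)); // b
```

The remaining 0xb would then be a small field offset that the engine dereferences relative to the bogus map pointer. That reading is a guess on my side.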

TODO: investigate more in the future.