
Andy Wingo: the last 5 years of V8’s garbage collector

Captain, status report: I’m down here in a Jefferies tube, poking at V8’s
garbage collector. I have been working on other areas of the project
recently; V8 is now so large that it’s necessary to ignore whole subsystems when working on any given task. But now I’m looking at the GC in anger: what is its deal? What does V8’s GC even look like these days?

The last public article on the structure of V8’s garbage collector
was in 2019; fine enough, but dated. Now in the evening of 2023 I think
it could be useful to revisit it and try to summarize the changes since
then. At least, it would have been useful to me had someone else written
this article.

To my mind, work on V8’s GC has had three main goals over the last 5
years: improving interactions between the managed heap and C++,
improving security, and increasing concurrency. Let’s visit these in
turn.

C++ and GC

Building on the 2018 integration of the Oilpan tracing garbage
collector into the Blink web engine, there was some refactoring to move
the implementation of Oilpan into V8 itself. Oilpan is known internally
as cppgc.

I find the cppgc name a bit annoying because I can never remember what
it refers to, because of the other thing that has been happening in C++
integration: a migration away from precise roots and instead towards
conservative root-finding.

Some notes here: with conservative stack scanning, we can hope for
better mutator throughput and fewer bugs. The throughput comes from not
having to put all live pointers in memory; the compiler can keep them in
registers, and avoid managing the HandleScope. You may be able to avoid
the compile-time and space costs of stack maps (side tables telling the
collector where the pointers are). There are also two classes of bug
that we can avoid: holding on to a handle past the lifetime of a
HandleScope, and holding on to a raw pointer (instead of a handle)
during a potential GC point.

Somewhat confusingly, it would seem that conservative stack scanning has
garnered the acronym “CSS” inside V8. “What does CSS have to do with
GC?”, I ask. I know the answer but my brain keeps asking the question.

In exchange for this goodness, conservative stack scanning means that
because you can’t be sure that a word on the stack refers to an object
and isn’t just a spicy integer, you can’t move objects that might be the
target of a conservative root. And indeed the conservative edge might
not actually point to the start of the object; it could be an interior
pointer, which places an additional constraint on the heap: it must be
able to resolve interior pointers back to object starts.
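To make the tradeoff concrete, here is a toy sketch of conservative scanning (my own illustration, not V8’s code): a contiguous heap of fixed-size objects, a pile of “stack” words, and a scan that treats any word landing in the heap as a pinning root, resolving interior pointers back to object starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy heap: objects are fixed-size slots carved out of one contiguous block.
struct ToyHeap {
  static constexpr size_t kObjectSize = 32;
  std::vector<uint8_t> block;
  explicit ToyHeap(size_t n_objects) : block(n_objects * kObjectSize) {}

  // Conservative filter: does this word even point into the heap?
  bool Contains(uintptr_t word) const {
    uintptr_t base = reinterpret_cast<uintptr_t>(block.data());
    return word >= base && word < base + block.size();
  }

  // Resolve a possibly-interior pointer to the start of its object.
  uint8_t* ObjectStart(uintptr_t word) {
    uintptr_t base = reinterpret_cast<uintptr_t>(block.data());
    size_t offset = (word - base) / kObjectSize * kObjectSize;
    return block.data() + offset;
  }
};

// Scan a range of "stack" words: any word that happens to land in the heap
// is treated as a root, even if it is really just a spicy integer.
std::vector<uint8_t*> ConservativeScan(ToyHeap& heap,
                                       const std::vector<uintptr_t>& stack) {
  std::vector<uint8_t*> roots;
  for (uintptr_t word : stack) {
    if (heap.Contains(word)) roots.push_back(heap.ObjectStart(word));
  }
  return roots;
}
```

Because a spicy integer can alias a heap address, everything this scan returns must be pinned: the collector may not move it.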

Security

Which brings us to security and the admirable nihilism of the sandbox
effort.
The idea is that everything is terrible, so why not just assume that no
word is safe and that an attacker can modify any word they can address. The only way to limit the scope of an attacker’s modifications is then to
limit the address space. This happens firstly by pointer compression,
which happily also has some delightful memory and throughput benefits.
Then the pointer cage is placed within a larger cage, and off-heap data
such as Wasm memories and array buffers go in that larger cage. Any
needed executable code or external object is accessed indirectly,
through dedicated tables.
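For the pointer-compression part, a minimal sketch of the idea (not V8’s implementation, which also has to deal with alignment, Smis, and so on; assumes a 64-bit host): on-heap pointers become 32-bit offsets from a cage base, and a full pointer is recovered by adding the base back.

```cpp
#include <cassert>
#include <cstdint>

// A 4 GiB "cage": compressed pointers are 32-bit offsets from its base,
// so a corrupted compressed pointer can only address within the cage.
struct PointerCage {
  uintptr_t base;  // assumed suitably aligned in a real implementation

  uint32_t Compress(uintptr_t full) const {
    return static_cast<uint32_t>(full - base);  // keep the low 32 bits
  }
  uintptr_t Decompress(uint32_t compressed) const {
    return base + compressed;
  }
};
```

Halving on-heap pointer size is where the memory benefit comes from; the confinement to the cage is where the security argument comes from.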

However, this indirection comes with a cost of a proliferation in the
number of spaces. In the beginning,
there was just an evacuating newspace, a mark-compact oldspace, and a
non-moving large object space. Now there are closer to 20 spaces: a
separate code space, a space for read-only objects, a space for
trusted objects, a space for each kind of indirect descriptor used by
the sandbox, in addition to spaces for objects that might be shared
between threads, newspaces for many of the preceding kinds, and so on.
From what I can see, managing this complexity has taken a significant
effort. The result is pretty good to work with, but you pay for what
you get. (Do you get security guarantees? I don’t know enough to say.
Better pay some more to be sure.)

Finally, the C++ integration has also had an impact on the spaces
structure, and with a security implication to boot. The thing is,
conservative roots can’t be moved, but the original evacuating newspace
required moveability. One can get around this restriction by
pretenuring new allocations from C++ into the mark-compact space, but
this would be a performance killer. The solution that V8 is going for
is to use the block-structured mark-compact space that is already used for the old-space, but for new
allocations. If an object is ever traced during a young-generation
collection, its page will be promoted to the old generation, without
copying. Originally called minor mark-compact or MinorMC in the
commit logs, it was renamed to minor mark-sweep or MinorMS to
indicate that it doesn’t actually compact. (V8’s mark-compact old-space
doesn’t have to compact: V8 usually chooses to just mark in place.
But we call it a mark-compact space because it has that capability.)
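Here is a toy model of that promotion-in-place policy (my own sketch; the names are made up): during a minor collection, any young page holding at least one marked object flips its generation bit instead of being evacuated.

```cpp
#include <cassert>
#include <vector>

enum class Generation { kYoung, kOld };

struct Page {
  Generation gen = Generation::kYoung;
  std::vector<bool> live;  // per-object mark bits for this page
};

// Minor GC over young pages: any page that holds at least one traced
// (live) object is promoted wholesale -- a flag flips, nothing is copied.
void MinorMarkSweep(std::vector<Page>& pages) {
  for (Page& page : pages) {
    if (page.gen != Generation::kYoung) continue;
    bool any_live = false;
    for (bool l : page.live) any_live = any_live || l;
    if (any_live)
      page.gen = Generation::kOld;  // promote in place, dead slots and all
    else
      page.live.clear();  // sweep: the whole page is free again
  }
}
```

The locality hazard is visible even here: the promoted page drags its dead neighbors into the old generation along with the one live object.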

This last change is a performance hazard: yes, you keep the desirable
bump-pointer allocation scheme for new allocations, but you lose on
locality in the old generation, and the rate of promoted bytes will be
higher than with the semi-space new-space. The only relief is that for
a given new-space size, you can allocate twice as many objects, because
you don’t need the copy reserve.
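The copy-reserve arithmetic is simple enough to state as code (a trivial sketch, with hypothetical names): a semi-space new-space of a given total size can only hand out half of it, because the other half is reserved as to-space for evacuation.

```cpp
#include <cassert>
#include <cstddef>

// Usable allocation bytes for a new-space of `total` bytes: a copying
// semi-space must hold back half as the copy reserve (to-space), while a
// non-moving MinorMS-style nursery can hand out all of it.
size_t UsableSemiSpace(size_t total) { return total / 2; }
size_t UsableMinorMS(size_t total) { return total; }
```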

Why do I include this discussion in the security section? Well, because
most MinorMS commits mention this locked bug. One day
we’ll know, but not today. I speculate that evacuating is just too rich
a bug farm, especially with concurrency and parallel mutators, and that
never-moving collectors will have better security properties. But
again, I don’t know for sure, and I prefer to preserve my ability to
speculate rather than to ask for too many details.

Concurrency

Speaking of concurrency, ye gods, the last few years have been quite the
ride, I think. Every phase that can be done in parallel (multiple
threads working together to perform GC work) is now fully parallel:
semi-space evacuation, mark-space marking and compaction, and sweeping.
Every phase that can be done concurrently (where the GC runs threads
while the mutator is running) is concurrent: marking and sweeping. A
major sweep task can run concurrently with an evacuating minor GC. And,
V8 is preparing for multiple mutators running in parallel. It’s all a
bit terrifying but again, with engineering investment and a huge farm of
fuzzers, it seems to be a doable transition.
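As a concrete illustration of “parallel” in this sense (my own sketch, nothing like V8’s actual sweeper): the mutator is paused while several GC threads sweep disjoint chunks of the page list at once.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Parallel sweep: the mutator is stopped; n_threads GC workers each take a
// disjoint chunk of pages and count the fully-dead ones (live count == 0).
size_t ParallelSweep(const std::vector<int>& page_live_counts,
                     unsigned n_threads) {
  std::atomic<size_t> freed{0};
  std::vector<std::thread> workers;
  size_t chunk = (page_live_counts.size() + n_threads - 1) / n_threads;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      size_t begin = t * chunk;
      size_t end = std::min(page_live_counts.size(), begin + chunk);
      for (size_t i = begin; i < end && begin < end; ++i)
        if (page_live_counts[i] == 0) freed += 1;  // page is reclaimable
    });
  }
  for (auto& w : workers) w.join();
  return freed.load();
}
```

A concurrent phase is the harder variant: the same work racing against a running mutator, which is where the fuzzer farm earns its keep.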

Concurrency and threads means that V8 has sprouted new schedulers:
should a background task have incremental or concurrent marking? How
many sweepers should a given isolate have? How should you pause
concurrency when the engine needs to do something gnarly?

The latest in-progress work would appear to be concurrent marking of
the new-space.
I think we should expect this work to result in a lower overall
pause-time, though I am curious also to learn more about the model: how
precise is it? Does it allow a lot of slop to get promoted? It seems
to have a black allocator, so there will be some slop, but perhaps it
can avoid promotion for those pages. I don’t know yet.

Summary

Yeah, GCs, man. I find the move to a non-moving young generation quite
interesting, and I wish the team luck as they whittle down the last
sharp edges from the conservative-stack-scanning performance profile.
The sandbox is pretty fun too. All good stuff and I look forward to
spending a bit more time with it; engineering out.


Source: Planet GNU
