Andy Wingo: ephemeral success

December 15, 2022 | by Arround The Web | No comments

Andy Wingo: ephemeral success

Good evening, patient hackers 🙂 Today finishes off my series on
implementing ephemerons in a garbage
collector.

Last time, we had a working solution for ephemerons, but it involved
recursively visiting any pending ephemerons from within the copy
routine—the bit of a semi-space collector that is called when
traversing the object graph and we see an object that we hadn't seen
yet. This recursive visit could itself recurse, and so we could
overflow the control stack.

The solution, of course, is "don't do that": instead of visiting
recursively, enqueue the ephemeron for visiting later. Iterate, don't
recurse. But here we run into a funny problem: how do we add an
ephemeron to a queue or worklist? It's such a pedestrian question
("just... enqueue it?") but I think it illustrates some of the
particular concerns of garbage collection hacking.

speak, memory

The issue is that we are in the land of "can't use my tools because I
broke my tools with my tools". You can't make a standard List<T>
because you can't allocate list nodes inside the tracing routine: if you
had memory in which you could allocate, you wouldn't be calling the
garbage collector.

If the collector needs a data structure whose size doesn't depend on the
connectivity of the object graph, you can pre-allocate it in a reserved
part of the heap. This adds memory overhead, of course; for a 1000 MB
heap, say, you used to be able to make graphs 500 MB in size (for a
semi-space collector), but now you can only do 475 MB because you have
to reserve 50 MB (say) for your data structures. Another way to look at
it is, if you have a 400 MB live set and then you allocate 2GB of
garbage, if your heap limit is 500 MB you will collect 20 times, but if
it's 475 MB you'll collect 26 times, which is more expensive. This is
part of why GC algorithms are so primitive; implementors have to
be stingy that we don't get to have nice things / data structures.

However in the case of ephemerons, we will potentially need one worklist
entry per ephemeron in the object graph. There is no optimal fixed size
for a worklist of ephemerons. Most object graphs will have no or few
ephemerons. Some, though, will have practically the whole heap.

For data structure needs like this, the standard solution is to reserve
the needed space for a GC-managed data structure in the object itself. For
example, for concurrent copying collectors, the GC might reserve a word
in the object for a forwarding pointer, instead of just clobbering the
first word. If you needed a GC-managed binary tree for a specific kind
of object, you'd reserve two words. Again there are strong pressures to
minimize this overhead, but in the case of ephemerons it seems sensible
to make them pay their way on a per-ephemeron basis.

so let's retake the thing

So sometimes we might need to put an ephemeron in a worklist. Let's add
a member to the ephemeron structure:

struct gc_ephemeron {
  struct gc_obj header;
  int dead;
  struct gc_obj *key;
  struct gc_obj *value;
  struct gc_ephemeron *gc_link; // *
};

Incidentally this also solves the problem of how to represent the
struct gc_pending_ephemeron_table; just reserve 0.5% of the heap or so
as a bucket array for a buckets-and-chains hash table, and use the
gc_link as the intrachain links.

struct gc_pending_ephemeron_table {
  struct gc_ephemeron *resolved;
  size_t nbuckets;
  struct gc_ephemeron buckets[0];
};

An ephemeron can end up in three states, then:

Outside a collection: gc_link can be whatever.
In a collection, the ephemeron is in the pending ephemeron table: gc_link is part of a hash table.
In a collection, the ephemeron's key has been visited, and the ephemeron is on the to-visit worklist; gc_link is part of the resolved singly-linked list.

Instead of phrasing the interface to ephemerons in terms of visiting
edges in the graph, the verb is to resolve ephemerons. Resolving an
ephemeron adds it to a worklist instead of immediately visiting any
edge.

struct gc_ephemeron **
pending_ephemeron_bucket(struct gc_pending_ephemeron_table *table,
                         struct gc_obj *key) {
  return &table->buckets[hash_pointer(obj) % table->nbuckets];
}

void add_pending_ephemeron(struct gc_pending_ephemeron_table *table,
                           struct gc_obj *key,
                           struct gc_ephemeron *ephemeron) {
  struct gc_ephemeron **bucket = pending_ephemeron_bucket(table, key);
  ephemeron->gc_link = *bucket;
  *bucket = ephemeron;
}

void resolve_pending_ephemerons(struct gc_pending_ephemeron_table *table,
                                struct gc_obj *obj) {
  struct gc_ephemeron **link = pending_ephemeron_bucket(table, obj);
  struct gc_ephemeron *ephemeron;
  while ((ephemeron = *link)) {
    if (ephemeron->key == obj) {
      *link = ephemeron->gc_link;
      add_resolved_ephemeron(table, ephemeron);
    } else {
      link = &ephemeron->gc_link;
    }
  }
}

Copying an object may add it to the set of pending ephemerons, if it is
itself an ephemeron, and also may resolve other pending ephemerons.

void resolve_ephemerons(struct gc_heap *heap, struct gc_obj *obj) {
  resolve_pending_ephemerons(heap->pending_ephemerons, obj);

  struct gc_ephemeron *ephemeron;
  if ((ephemeron = as_ephemeron(forwarded(obj)))
      && !ephemeron->dead) {
    if (is_forwarded(ephemeron->key))
      add_resolved_ephemeron(heap->pending_ephemerons,
                             ephemeron);
    else
      add_pending_ephemeron(heap->pending_ephemerons,
                            ephemeron->key, ephemeron);
  }
}

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  ...
  resolve_ephemerons(heap, obj); // *
  return new_obj;
}

Finally, we need to add something to the core collector to scan resolved
ephemerons:

int trace_some_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *resolved = heap->pending_ephemerons->resolved;
  if (!resolved) return 0;
  heap->pending_ephemerons->resolved = NULL;
  while (resolved) {
    resolved->key = forwarded(resolved->key);
    visit_field(&resolved->value, heap);
    resolved = resolved->gc_link;
  }
  return 1;
}

void kill_pending_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *ephemeron;
  struct gc_pending_ephemeron_table *table = heap->pending_ephemerons;
  for (size_t i = 0; i < table->nbuckets; i++) {
    for (struct gc_ephemeron *chain = table->buckets[i];
         chain;
         chain = chain->gc_link)
      chain->dead = 1;    
    table->buckets[i] = NULL;
  }
}

void collect(struct gc_heap *heap) {
  flip(heap);
  uintptr_t scan = heap->hp;
  trace_roots(heap, visit_field);
  do { // *
    while(scan < heap->hp) {
      struct gc_obj *obj = scan;
      scan += align_size(trace_heap_object(obj, heap, visit_field));
    }
  } while (trace_ephemerons(heap)); // *
  kill_pending_ephemerons(heap); // *
}

The result is... not so bad? It makes sense to make ephemerons pay
their own way in terms of memory, having an internal field managed by
the GC. In fact I must confess that in the implementation I have been
woodshedding, I actually have three of these damn things; perhaps more
on that in some other post. But the perturbation to the core algorithm
is perhaps less than the original code. There are still some
optimizations to make, notably postponing hash-table lookups until the
whole strongly-reachable graph is discovered; but again, another day.

And with that, thanks for coming along with me for my journeys into
ephemeron-space.

I would like to specifically thank Erik Corry and Steve Blackburn for
their advice over the years, and patience with my ignorance; I can only
imagine that it's quite amusing when you have experience in
a domain to see someone new and eager come in and make many of the
classic mistakes. They have both had a kind of generous parsimony in
the sense of allowing me to make the necessary gaffes but also providing
insight where it can be helpful.

I'm thinking of many occasions but I especially appreciate the advice to
start with a semi-space collector when trying new things, be it
benchmarks or test cases or API design or new functionality, as it's a
simple algorithm, hard to get wrong on the implementation side, and
perfect for bringing out any bugs in other parts of the system. In this
case the difference between fromspace and tospace pointers has a
material difference to how you structure the ephemeron implementation;
it's not something you can do just in a trace_heap_object function, as
you don't have the old pointers there, and the pending ephemeron table
is indexed by old object addresses.

Well, until some other time, gentle hackfolk, do accept my sincerest waste
disposal greetings. As always, yours in garbage, etc.,

Source: Planet GNU