2022-05-16

A Wrong Name: Fifteen Years of TransitiveClosure in MMTk

by Kunshan Wang

Table of contents [show/hide]

The TransitiveClosure interface in MMTk is confusing. It should have been split into two different interfaces, but not… until now. What’s more interesting is how we ended up having an interface like that 15 years ago, and why it stayed that way since then.

MMTk?

I have been contributing to the Memory Management Toolkit (MMTk) project since I left Huawei. MMTk is a framework for garbage collection. It was part of the JikesRVM, and has been a good platform for GC research. Many state-of-the-art garbage collection algorithms have been developed on it. Now we are re-implementing MMTk in Rust so that it can be integrated into many different languages and VMs, such as OpenJDK, V8, Julia, GHC, PyPy and, of course Ruby which I am working on.

As I started working on MMTk, one part of the code, that is, the TransitiveClosure interface/trait has always confused me.

TransitiveClosure?

In the core MMTk repository, the mmtk-core, you can find the TransitiveClosure trait and its implementation for all ProcessEdgesWork instances. (Let’s not worry about what ProcessEdgesWork does for now.)

pub trait TransitiveClosure {
    fn process_edge(&mut self, slot: Address);
    fn process_node(&mut self, object: ObjectReference);
}

impl<T: ProcessEdgesWork> TransitiveClosure for T {
    fn process_edge(&mut self, _slot: Address) {
        unreachable!();
    }
    #[inline]
    fn process_node(&mut self, object: ObjectReference) {
        ProcessEdgesWork::process_node(self, object);
    }
}

The presence of unreachable!(); startled me. The code implemented one method (process_node), but declared the other method (process_edge) unreachable. This is not how we usually use traits. When we define a trait with two methods, we expect both methods to be callable on all instances. Otherwise, why do we even have the process_edge method in TransitiveClosure in the first place?

I guess some types must have implemented the TransitiveClosure trait and provided a proper process_edge implementation. And… I am right. It is ObjectsClosure.

impl<'a, E: ProcessEdgesWork> TransitiveClosure for ObjectsClosure<'a, E> {
    #[inline(always)]
    fn process_edge(&mut self, slot: Address) {
        if self.buffer.is_empty() {
            self.buffer.reserve(E::CAPACITY);
        }
        self.buffer.push(slot);
        // ... more code omitted.
    }
    fn process_node(&mut self, _object: ObjectReference) {
        unreachable!()
    }
}

unreachable!() again? What the …!

Clearly ProcessEdgesWork and ObjectsClosure are implementing two different interfaces.

ProcessEdgesWork only implements process_node, while
ObjectsClosure only implements process_edge.

Should we split TransitiveClosure?

Maybe we should split it into two different traits, one contains process_node and the other contains process_edge. In this way, a type may only implement the trait it needs, and not the unreachable!() stub.

Or, should we?

To confirm this, let’s find their call sites, and see whether we ever use both methods at the same time. The short answer is, no.

The process_node method

process_node is only called by XxxSpace::trace_object, where XxxSpace is a concrete space. It can be CopySpace, MallocSpace, LargeObjectSpace and so on.

ImmixSpace even has two flavours of trace_object, both call process_node.

The trace_object method of a space visits an object during tracing. It marks or copies the object and, if it is the first time it visits an object, it enqueues the object into the marking queue by calling the process_node method which does the actual enqueuing.

The process_edge method

process_edge is only called by VM::Scanning::scan_object.

scan_object is implemented by a VM binding (such as the OpenJDK binding) because it is VM-specific. MMTk calls scan_object when it needs the VM to locate all reference fields in an object.

pub trait Scanning<VM: VMBinding> {
    fn scan_object<T: TransitiveClosure>(
        trace: &mut T,
        object: ObjectReference,
        tls: VMWorkerThread,
    );

    // ... more code omitted
}

The trait parameter is a call-back. When scan_object(trace, object, tls) is called, it scans object, and calls trace.process_edge on each edge, i.e. each reference field, of object.

Yes! They are different!

We confirmed that each of the two methods is used in a different scenario.

process_node is only used by trace_object, and
process_edge is only used by scan_object.

And nothing calls both process_node and process_edge at the same time.

So let’s split them into two traits.

But my colleague Yi Lin reminded me that there were other classes that extends TransitiveClosure in the original MMTk in JikesRVM. To be safe, I looked into the original JikesRVM MMTk before making further decisions.

Back in JikesRVM

Now we temporarily move away from Rust MMTk, and go back to JikesRVM MMTk. If the TransitiveClosure trait is designed like this in Rust, it must have its roots back in JikesRVM.

TransitiveClosure in JikesRVM MMTk

In JikesRVM, there is a class named TransitiveClosure. Yes. It is a class, not an interface.

@Uninterruptible
public abstract class TransitiveClosure {
  // Other methods omitted...

  public void processEdge(ObjectReference source, Address slot) {
    VM.assertions.fail("processEdge not implemented.");
  }

  public void processNode(ObjectReference object) {
    VM.assertions.fail("processNode not implemented.");
  }
}

Defining TransitiveClosure as a class allows it to provide failing default implementations of processEdge and processNode. This allows its subclasses to override one method while leaving the other failing.

(Note that the JikesRVM code was written before Java 8, so interface methods could not have default implementations.)

TransitiveClosure subclasses in JikesRVM MMTk

In JikesRVM, many classes inherit from TransitiveClosure.

TransitiveClosure
- TraceLocal
  - NoGCTraceLocal
  - MSTraceLocal
  - SSTraceLocal
  - MCMarkTraceLocal
  - MCForwardTraceLocal
  - ImmixTraceLocal
  - ImmixDefragTraceLocal
  - …
- RCZero
- RCModifiedProcessor
- GenRCModifiedProcessor
- ObjectReferenceBuffer
  - RCDecBuffer
- TraceWriteBuffer

TraceLocal

Back in JikesRVM MMTk, there was no concept of “work packet”. The abstraction of tracing is the Trace class and its thread-local counterpart, TraceLocal. ThreadLocal was the local context of a GC thread during tracing GC.

public abstract class TraceLocal extends TransitiveClosure {
    @Override
    @Inline
    public final void processNode(ObjectReference object) {
        values.push(object);
    }

    @Override
    @Inline
    public final void processEdge(ObjectReference source, Address slot) {
        ObjectReference object = VM.activePlan.global().loadObjectReference(slot);
        ObjectReference newObject = traceObject(object, false);
        if (overwriteReferenceDuringTrace()) {
            VM.activePlan.global().storeObjectReference(slot, newObject);
        }
    }

    // ... more code omitted
}

In JikesRVM, TraceLocal is the counterpart of both ProcessEdgesWork and ObjectsClosure of the current Rust MMTk.

Space.traceObject calls TraceLocal.processNode to enqueues newly visited objects,
- just like XxxSpace::trace_object calling ProcessEdgesWork.process_node in Rust MMTk.
Scanning.scanObject calls TraceLocal.processEdge to process each reference field (edge),
- just like Scanning::scan_object calling ObjectsClosure.process_edge in Rust MMTk.

In JikesRVM MMTk, each plan (GC algorithm) defines its own TraceLocal subclass.

(Some GC algorithms. such as MarkCompact and Immix, even have more than one TraceLocal for different kinds of traces.)

It looks like the TransitiveClosure is a proper interface for TraceLocal. It implements both processNode and processEdge.

However, TraceLocal is the only class that implements both processNode and processEdge. Other classes don’t.

processEdge: Field visitors

Some subclasses of TransitiveClosure are related to reference counting (RC). They are RCZero, RCModifiedProcessor, etc. They only override the processEdge method, assuming processNode is never called.

What do they do?

They are field visitors. For example, RCZero visits edges and stores null to each field.

public final class RCZero extends TransitiveClosure {
    // Does not override processNode, leaving it failing

    public void processEdge(ObjectReference source, Address slot) {
        slot.store(ObjectReference.nullReference());
    }
}

During RC collection, RCZero is passed to Scanning.scanObject as the callback to set all reference fields of an object to null.

public abstract class RCBaseCollector extends StopTheWorldCollector {
    private final RCZero zero;  // This implements TransitiveClosure
    // ... more code omitted
    @Override
    public void collectionPhase(short phaseId, boolean primary) {
        // ... more code omitted
        if (phaseId == RCBase.PROCESS_DECBUFFER) {
            // ... more code omitted
            while (!(current = decBuffer.pop()).isNull()) {
                // ... more code omitted
                VM.scanning.scanObject(zero, current);  // Passing zero as callback
            }
        }
    }
}

Other RC-related classes are similar. They record the fields or apply decrements to the objects pointed by each field.

processNode: to enqueue objects

The TraceWriteBuffer class only implements processNode.

(In fact, TraceWriteBuffer is the only class that overrides processNode but note processEdge throughout the history of JikesRVM.)

public final class TraceWriteBuffer extends TransitiveClosure {
    private final WriteBuffer buffer;
    // ... more code omitted
    @Inline
    public void processNode(ObjectReference object) {
        buffer.insert(object.toAddress());
    }
}

TraceWriteBuffer is only used by CMSMutator (concurrent mark-sweep mutator).

public class CMSMutator extends ConcurrentMutator {
    private final TraceWriteBuffer remset;
    // ... more code omitted
    @Override
    protected void checkAndEnqueueReference(ObjectReference ref) {
        if (ref.isNull()) return;
        if (barrierActive) {
            if (!ref.isNull()) {
                if      (Space.isInSpace(CMS.MARK_SWEEP, ref)) CMS.msSpace.traceObject(remset, ref);            // here
                else if (Space.isInSpace(CMS.IMMORTAL,   ref)) CMS.immortalSpace.traceObject(remset, ref);      // here
                else if (Space.isInSpace(CMS.LOS,        ref)) CMS.loSpace.traceObject(remset, ref);            // here
                else if (Space.isInSpace(CMS.NON_MOVING, ref)) CMS.nonMovingSpace.traceObject(remset, ref);     // here
                else if (Space.isInSpace(CMS.SMALL_CODE, ref)) CMS.smallCodeSpace.traceObject(remset, ref);     // here
                else if (Space.isInSpace(CMS.LARGE_CODE, ref)) CMS.largeCodeSpace.traceObject(remset, ref);     // here
            }
        }
        // ... more code omitted
    }
}

In simple terms, the code above means “Trace the object ref in the space it is in, and, if it is the first time the object is traced, enqueue it in remset.”

What’s in common between TraceWriteBuffer and TraceLocal is that both of them contain a buffer (a remember set, or the tracing queue) where the traceObject method can enqueue newly visited objects to. This is what processNode is for, i.e. enqueuing objects.

So why not introducing a dedicated interface?

There comes an interesting question:

If some classes are just field visitors, why don’t we have a dedicated interface for it, and name it FieldVisitor?

and

If some classes are just places to enqueue objects, why don’t we have a dedicated interface for it, and name it ObjectBuffer?

TransitiveClosure has been there for 15 years. Many developers have made contributions to MMTk, and some of them must have noticed the issues I talked about. Why have TransitiveClosure remained this way till today?

And why did we end up having this TransitiveClosure amalgamation in the first place?

Through the history

To answer these questions, I dug into the Git revision history of the JikesRVM repository.

The git blame command can show me in which commit any line in any source file is last modified. I use vim-fugitive, and it even allows me to follow a line of code from one commit to another, and see every single change to a line of code in history.

Early days of object scanning

The history of Scanning.scanObject is the history of object scanning interface and implementation.

Back in 2003, MMTk (was JMTk back then) and JikesRVM were more tightly coupled than they are today. Unlike the modern Scanning interface, the ScanObject class back then contained concrete implementations directly. The ScanObject.scan method enumerates reference fields, and directly calls the Plan.traceObjectLocation static method, which does the load/traceObject/store sequence like our modern ProcessEdgesWork::process_edge method. Everything was hard-wired. The operation for visiting field was fixed.

A commit in 2003 introduced the ScanObject.enumeratePointers method which calls back to Plan.enumeratePointerLocation which can be customised. This allows a certain degree of freedom of what to do with each field, instead of load/traceObject/store.

Another commit in 2003 introduced the Enumerate class which was subsequently renamed to Enumerator and made fully abstract.

abstract public class Enumerator implements Uninterruptible {
    abstract public void enumeratePointerLocation(Address location);
}

The ScanObject.enumeratePointers method then used Enumerator as the call back instead calling into Plan directly, allowing the behavior of visiting each edge to be fully customised.

As I conjectured, some developers did notice that the call-back for scanObjects should be customisable, and Enumerator was introduced just for that.

In 2006, just before that important change

Both the Scanning.scanObject and Scanning.enumeratePointers existed in the Scanning class before late 2006.

public abstract class Scanning implements Constants, Uninterruptible {
    public abstract void scanObject(TraceLocal trace, ObjectReference object);
    public abstract void enumeratePointers(ObjectReference object, Enumerator e);
    // ... more code omitted
}

Both scanObject and enumeratePointers enumerate reference fields in an object. However, they are used in totally different places.

Scanning.scanObject

The Scanning.scanObject method was used for tracing, as it took TraceLocal as parameter, and called TraceLocal.traceObjectLocation for each reference filed.

Note that at that time, TraceLocal was a root class. There was no superclasses or interfaces like TransitiveClosure for it to extend/implement. (Constants is an all-static interface, and Uninterruptible is just a marker.) This means scanObject was only applicable to subclasses of TraceLocal, and nothing else.

public abstract class TraceLocal implements Constants, Uninterruptible { // no superclass
    public final void enqueue(ObjectReference object) throws InlinePragma { // traceObject calls this
        values.push(object);
    }
    
    public final void traceObjectLocation(Address objLoc, boolean root) // scanObject calls this
            throws InlinePragma {
        ObjectReference object = objLoc.loadObjectReference();
        ObjectReference newObject = traceObject(object, root);
        objLoc.store(newObject);
    }
    // ... more code omitted
}

Scanning.enumeratePointers

On the other hand, the Scanning.enumeratePointers could in theory be used by any code that needs to enumerate reference fields. At that time, it was used for (deferred) reference counting. The following was the “mark grey” operation in trial-deletion for cycle collection in reference counting.

public final class TrialDeletion extends CycleDetector
        implements Constants, Uninterruptible {
    private TDGreyEnumerator greyEnum;  // extends Enumerator
    // ... more code omitted
    private final boolean markGrey(ObjectReference object, long timeCap)
            throws InlinePragma {
        // ... more code omitted
        while (!object.isNull()) {
            // ... more code omitted
            if (!abort && !RefCountSpace.isGrey(object)) {
                RefCountSpace.makeGrey(object);
                Scan.enumeratePointers(object, greyEnum);  // pay attention to this call site
            }
            object = workQueue.pop();
        }
        return !abort;
    }
    // ... more code omitted
}

The call site of Scan.enumeratePointers passed a TDGreyEnumerator instance which customised the behaviour of visiting fields. It just forward the call to TrialDeletion.enumerateGrey.

public class TDGreyEnumerator extends Enumerator implements Uninterruptible {
    private TrialDeletion td;
    // ... more code omitted
    public void enumeratePointerLocation(Address objLoc) throws InlinePragma {
        td.enumerateGrey(objLoc.loadObjectReference());
    }
}

The obvious problem

We then had Scanning.scanObject for tracing, and Scanning.enumeratePointers for RC.

But why did we need both methods? scanObject was basically a special case of enumeratePointers that called TraceObject.enumeratePointerLocation.

Could we unify them? Apparently someone noticed that, and he did a refactoring.

Unifying scanObject and enumeratePointers with “TraceStep”

In late 2006, someone created a commit which introduced a new version of reference counting collector, and at the same time did “a huge refactoring” (see the commit message).

This commit created a class named TraceStep. Scanning.scanObject then took TraceStep as parameter instead of TraceLocal. The enumeratePointers method was removed.

public abstract class TraceStep implements Constants, Uninterruptible {
    public abstract void traceObjectLocation(Address objLoc);
}

public abstract class Scanning implements Constants, Uninterruptible {
    public abstract void scanObject(TraceStep trace, ObjectReference object);
}

Obviously, TraceStep was intended to replaced Enumerator as the call-back interface for scanObject to enumerating fields. Both tracing and RC started using scanObject from then on.

For tracing, the TraceLocal started to extend TraceStep.

public abstract class TraceLocal extends TraceStep   // Now extends TraceStep
        implements Constants, Uninterruptible {
    public final void traceObjectLocation(Address objLoc) // called by scanObject
            throws InlinePragma {
        traceObjectLocation(objLoc, false);
    }
    public final void traceObjectLocation(Address objLoc, boolean root)
            throws InlinePragma {
        // ... just like before
    }
}

RC operations, such as “mark grey”, started extending TraceStep instead of Enumerator.

public final class TrialDeletionGreyStep extends TraceStep  // Now extends TraceStep instead of Enumerator
        implements Uninterruptible {
    public void traceObjectLocation(Address objLoc) {
        ObjectReference object = objLoc.loadObjectReference();
        ((TrialDeletionCollector)CDCollector.current()).enumerateGrey(object);
    }
}

public final class TrialDeletionCollector extends CDCollector
        implements Uninterruptible, Constants {
    public TrialDeletionGreyStep greyStep;  // A TraceStep instead of Enumerator
    // ... more code omitted
    private final void markGrey(ObjectReference object) throws InlinePragma {
        // ... more code omitted
        Scan.scanObject(greyStep, object);  // now passes a TraceStep as arg
        // ... more code omitted
    }
}

This commit successfully unified the object scanning interface. Scanning.scanObject became the only object-scanning method. It became applicable to all TraceStep instances alike, and was no longer coupled with TraceLocal. Good! Nicely done!

More over, this commit cleverly found a concept that describes both TraceLocal and the various operations in reference counting, such as “mark grey”, “scan black”, etc. It was “TraceStep”. Intuitively, all of them were steps of tracing.

But really? We will soon find that it is not really that clever.

Here comes TransitiveClosure

In 2007, someone made another commit to “reorganise the core of transitive closure”, and the motivations were “concurrent collection” and “implementing prefetching during trace”.

In this commit, TraceStep was renamed to our familiar TransitiveClosure.

The TraceStep.traceObjectLocation method was renamed to TransitiveClosure.processEdge, and
a new method processNode was added.

@Uninterruptible
public abstract class TransitiveClosure {
  /**
   * Trace an edge during GC.
   *
   * @param objLoc The location containing the object reference.
   */
  public void processEdge(Address objLoc) {
    VM.assertions.fail("processEdge not implemented.");
  }

  /**
   * Trace a node during GC.
   *
   * @param object The object to be processed.
   */
  public void processNode(ObjectReference object) {
    VM.assertions.fail("processNode not implemented.");
  }
}

Note that I didn’t add any // ... more code omitted comment because back in 2007, that was the entire class body of TransitiveClosure.

TraceLocal now extends TransitiveClosure instead of TraceStep, and

the traceObjectLocation method was renamed to processEdge, and
the enqueue method was renamed to processNode.

public abstract class TraceLocal extends TransitiveClosure {
    public final void processNode(ObjectReference object) {
        // ... like before
    }

    public final void processEdge(ObjectReference source, Address slot) {
        // ... like before
    }
    // ... more code omitted
}

A few days later, a subsequent commit introduced a TraceWriteBuffer class that extended TransitiveClosure and overrode processNode alone, and not processEdge.

public final class TraceWriteBuffer extends TransitiveClosure {
    // ... more code omitted
    @Inline
    public void processNode(ObjectReference object) {
        // .. as you have seen in previous sections.
    }
}

It remains the only class that overrides processNode but note processEdge in the history of JikesRVM, even today.

With this change, TransitiveClosure started to serve two distinct purposes.

the callback for scanObject, and
the place to enqueue an object after tracing it.

If a class was only used in one of the two cases, it would override only one of the two methods.

10 years passed…

TransitiveClosure remained this way in JikesRVM MMTk since then.

And the TransitiveClosure class grew a little bit. Some static fields and methods about specialised scanning were added to it, as if it were a good place to hold information for specialised scanning.

…and there was Rust MMTk.

In 2017, we started porting MMTk to Rust.

Porting MMTk to Rust

The TransitiveClosure was copied to the Rust version, except this time it was represented as a Rust trait.

pub trait TransitiveClosure {
    fn process_edge(&mut self, slot: Address);
    fn process_node(&mut self, object: ObjectReference);
}

And TraceLocal was a trait that requires TransitiveClosure.

pub trait TraceLocal: TransitiveClosure {
    // ... other methods
}

The Scanning.scan_object method took TransitiveClosure as parameter, just like JikesRVM MMTk.

pub trait Scanning<VM: VMBinding> {
    fn scan_object<T: TransitiveClosure>(
        trace: &mut T,
        object: ObjectReference,
        tls: OpaquePointer,
    );

    // ... more code omitted
}

And the XxxSpace::trace_object methods still took TransitiveClosure as parameter, like this:

impl<VM: VMBinding> CopySpace<VM> {
    pub fn trace_object<T: TransitiveClosure>(
        &self,
        trace: &mut T,
        object: ObjectReference,
        allocator: Allocator,
        tls: OpaquePointer,
    ) -> ObjectReference {

    // ... more code omitted
}

And there was still TraceLocal. (Not any more now.)

pub trait TraceLocal: TransitiveClosure {
    // ... omitted.  There are many methods, but none is interesting here.
}

Like in JikesRVM MMTk,

the scan_object method still called trace.process_edge for each visited edge, and
the trace_object method still called trace.process_node to enqueue the object on first visit.

Initially, we did not address the fact that TransitiveClosure served two different purposes. By that time, we had just begun porting MMTk to Rust. We prioritised making Rust MMTk working, and ported from JikesRVM MMTk in a style closely resembled the original Java code.

And MMTk worked. Not just worked, but worked for OpenJDK, JikesRVM and several other VMs, too.

Introducing work packets

We later removed TraceLocal and introduced the work packet system.

The work packet system represents each unit of work as a “packet” that can be scheduled on any GC worker thread. The TraceLocal class was replaced by two work packets:

The ProcessEdgesWork work packet represents a list of edges to be traced.
The ScanObjects work packet represents a list of objects to be scanned.

A ProcessEdgesWork work packet also carries an object queue inside it. It needs the process_node method so that XxxSpace.trace_object can call it and queue newly visited objects.

Therefore, ProcessEdgesWork implements TransitiveClosure because trace_object expects it. And… remember? That implementation startled me…

impl<T: ProcessEdgesWork> TransitiveClosure for T {
    fn process_edge(&mut self, _slot: Address) {
        unreachable!();
    }
    #[inline]
    fn process_node(&mut self, object: ObjectReference) {
        ProcessEdgesWork::process_node(self, object);
    }
}

Then what visits edges when scanning an object? It is now the ObjectsClosure object. It needs to provide process_edge, but Scanning::scan_object expects TransitiveClosure.

So we have to do what the … we need to satisfy that requirement, as we have seen before:

impl<'a, E: ProcessEdgesWork> TransitiveClosure for ObjectsClosure<'a, E> {
    #[inline(always)]
    fn process_edge(&mut self, slot: Address) {
        if self.buffer.is_empty() {
            self.buffer.reserve(E::CAPACITY);
        }
        self.buffer.push(slot);
        // ... more code omitted.
    }
    fn process_node(&mut self, _object: ObjectReference) {
        unreachable!()
    }
} 

As you can see, even after we migrated to the work packet system, and even though we no longer have any type that overrides both process_edge and process_node, we still kept both process_edge and process_node in the TransitiveClosure trait.

Why do we end up having TransitiveClosure like this?

We have been startled by smelly code. We have expressed our anger, impatience, surprise, etc., without explicit vulgarity. We have looked into JikesRVM for the old MMTk. We have gone through the history to see the change of the object scanning interface, and read the commits from developers with the intention of improving MMTk.

But why do we end up having a TransitiveClosure trait like this?

Have we noticed that object scanning is not necessarily part of tracing?

Yes. The Enumerator interface was introduced just for that. When called back from scan_object, it allows us to do anything to reference fields.

Have we refactored scan_object so it takes a simple callback instead of TraceLocal?

Yes. When TraceStep was introduced, We unified Scanning.scanObject and Scanning.enumeratePointers. Scanning.scanObject was refactored to only depend on TraceStep, and TraceStep was an abstract class with only an abstract method traceObjectLocation(Address objLoc).

public abstract class TraceStep implements Constants, Uninterruptible {
    public abstract void traceObjectLocation(Address objLoc);
}

public abstract class Scanning implements Constants, Uninterruptible {
    public abstract void scanObject(TraceStep trace, ObjectReference object);
}

This should have been the ideal interface for Scanning.scan_object for MMTk in Java. (Rust could use closure to make it more concise.)

But why did we migrate away from it?

Probably only the author of this commit knows the exact reason.

To my understanding, I think it was because TraceStep was such a good, but a wrong, name.

We named it “TraceStep”.
We made it the superclass of TraceLocal.
We then introduced TraceWriteBuffer.
We noticed TraceWriteBuffer was another place to enqueue objects in addition to TraceLocal.
Then we naturally thought that both TraceWriteBuffer and TraceLocal should have a common superclass that had a method named enqueue.

But TraceStep doesn’t have enqueue.
Then we extended TraceStep into TransitiveClosure, and added enqueue.

And we even renamed enqueue to processNode to make it consistent with processEdge.
Then we have TransitiveClosure which had processEdge and processNode.

public abstract class TraceLocal extends TransitiveClosure {
    public final void processNode(ObjectReference object) { ... }
}

public final class TraceWriteBuffer extends TransitiveClosure {
    public void processNode(ObjectReference object) { ... }
}

Wow! “TransitiveClosure” was an even better name! That was what TraceLocal really was, i.e. computing the transitive closure of an object graph! A “transitive closure” is a graph! A graph has nodes and edges!

public abstract class TransitiveClosure {
  public void processEdge(Address objLoc) { ... }
  public void processNode(ObjectReference object) { ... }
}

But TraceWriteBuffer doesn’t override processEdge, and RCZero doesn’t override processNode!

No worries. We leave them “unreachable”.

“Unreachable”? That doesn’t sound right.

But it worked… for 15 years.

The name “TransitiveClosure” made so much sense that we stuck to TransitiveClosure forever, even after we ported MMTk to Rust.

But TraceStep is a wrong name.

TraceLocal, “mark grey”, “scan black”, RCZero and so on are all steps in tracing,
and those trace steps process edges,
hence Scanning.scanObject should accept TraceStep as a call-back argument, so Scanning.scanObject can give TraceStep edges to process.

Wrong.

Scanning.scanObject accepts a callback argument not because the callback is a step of tracing, but simply because the callback visits edges.

I don’t really know what counts as a “trace step”.

Does “assigning null to each reference field” count as a “trace step”?
Does “applying decrement operations to the reference counts of all neighbor objects” count as a “trace step” even if it is used in reference counting, only?
Is trial-deletion considered as a kind of tracing at all?

No matter what it is, isn’t it much easier to just say

“Scanning.scanObject accepts it as a callback argument because it visits edges.”

The nature of interfaces.

This is the nature of interfaces. A reuseable component should not make assumptions about its neighbours more than necessary. This is the separation of concern.

It is just like when we do the following in rust:

vec![1,2,3].iter().foreach(|e| {
    println!("{}", e);
});

The Iterator::foreach function accepts this closure not because it prints things, but because it visits elements. foreach is intended for visiting elements. The closure receives the object. That is the contract of the foreach method. Whether it prints the element or how it prints the element is not part of the contract.

So “Enumerator” was a right name. It correctly describes the role of the object in a scanObject invocation, that is, “it enumerates fields”, nothing more. That’s all what Scanning.scanObject need to care about. It passes each field to the callback, and that’s it. It should not assume what that callback object does to each edge. Whether it is a TraceLocal or an RC operation is beyond its obligation.

“TraceStep” was wrong. “TransitiveClosure” was also wrong. Neither of them is what Scanning.scanObject cares about.

Finding the way out

We have seen the history, and know why it ended up like this.

We know what was wrong, and what would be right.

“Enumerator” was right. “TraceStep” was wrong. “TransitiveClosure” was wrong, too, but it just sounded so good.

No matter how good it sounds, we need to fix it.

From our analysis in the beginning of this article, we should split TransitiveClosure into two traits,

one as the callback of Scanning::scan_object, and
the other to be used by XxxSpace::trace_object to enqueue object.

I have opened an issue, and detailed the steps of splitting and removing TransitiveClosure from mmtk-core.

Refactoring Scanning::scan_object

Scanning::scan_object takes an object and a callback as parameters. It will find all reference fields in the object, and invoke the callback for each reference field.

We need a proper name for the callback of Scanning::scan_object.

I name it EdgeVisitor, and its only method is, as you can imagine, visit_edge.

pub trait EdgeVisitor {
    fn visit_edge(&mut self, edge: Address);
}

And it replaces TransitiveClosure as the parameter type of Scanning::scan_object:

pub trait Scanning<VM: VMBinding> {
    fn scan_object<EV: EdgeVisitor>(
        tls: VMWorkerThread,
        object: ObjectReference,
        edge_visitor: &mut EV,
    );
}

Then the only callback that have ever been passed to Scanning::scan_object, i.e. ObjectsClosure, now implements EdgeVisitor, instead.

impl<'a, E: ProcessEdgesWork> EdgeVisitor for ObjectsClosure<'a, E> {
    fn visit_edge(&mut self, slot: Address) {
        // ... code omitted
    }
}

And this change has been merged into the master branch of mmtk-core.

Meanwhile in Australia…

While I was refactoring mmtk-core and working on mmtk-ruby, my colleague Wenyu Zhao was busy with his paper about the LXC GC algorithm targetting PLDI 2022.

Wenyu independently introduced the EdgeIterator struct. It is a wrapper over Scanning::scan_object and TransitiveClosure, and the unreachable!() statement, too! :P

pub struct EdgeIterator<'a, VM: VMBinding> {
    f: Box<dyn FnMut(Address) + 'a>,
    _p: PhantomData<VM>,
}

impl<'a, VM: VMBinding> EdgeIterator<'a, VM> {
    pub fn iterate(o: ObjectReference, f: impl FnMut(Address) + 'a) {
        let mut x = Self { f: box f, _p: PhantomData };
        <VM::VMScanning as Scanning<VM>>::scan_object(&mut x, o, VMWorkerThread(VMThread::UNINITIALIZED));
    }
}

impl<'a, VM: VMBinding> TransitiveClosure for EdgeIterator<'a, VM> {
    #[inline(always)]
    fn process_edge(&mut self, slot: Address) {
        (self.f)(slot);
    }
    fn process_node(&mut self, _object: ObjectReference) {
        unreachable!()
    }
}

With this struct, it allows a closure to be used in the place of a TransitiveClosure. The following code applies the inc RC operation to all adjacent objects:

EdgeIterator::<E::VM>::iterate(src, |edge| {
    self.inc(unsafe { edge.load() });
})

And the following applies dec, and optionally frees the object:

EdgeIterator::<VM>::iterate(o, |edge| {
    let t = unsafe { edge.load::<ObjectReference>() };
    if !t.is_null() {
        if Ok(1) == super::rc::dec(t) {
            debug_assert!(super::rc::is_dead(t));
            f(t);
        }
    }
});

Pretty neat, isn’t it? It is so neat that I want to steal the code and add an EdgeVisitor::from_closure factory method for my EdgeVisitor.

One interesting thing is, Wenyu introduced this for reference counting, according to the commit message. What a coincidence! Daniel introduced TraceStep in 2006 for exactly the same reason: reference counting, according to its commit message, too. Understandably, reference counting is very different from tracing. RC needs to scan objects, but not for tracing, so passing TraceLocal to scan_object doesn’t make sense. Therefore, those additional operations, be it mark-grey, scan-black or just freeing objects, all define their own callbacks to be called by scan_object. This necessitates the creation of a better interface for the callback of scan_object.

Refactoring Space::trace_object

The XxxSpace::trace_object method usually has this form:

impl<VM: VMBinding> CopySpace<VM> {
    #[inline]
    pub fn trace_object<T: TransitiveClosure>(
        &self,
        trace: &mut T,
        object: ObjectReference,
        semantics: Option<CopySemantics>,
        worker: &mut GCWorker<VM>,
    ) -> ObjectReference {
        // ... more code omitted
        if /* is first visited */ {
            let new_object = object_forwarding::forward_object::<VM>(...);

            trace.process_node(new_object);  // enqueue object
            new_object  // return the forwarded obj ref
        }
    }

    // ... more code omitted
}

If an object is first visited, it enqueues the object in trace, and returns the forwarded object reference.

This method is polymorphic w.r.t. T. T is the ProcessEdgesWork (a sub-trait of TransitiveClosure) type used by the plan. We used to have one different ProcessEdgesWork type for each plan, which made this generic type parameter necessary.

We already noticed that we may safely remove this only use case of the TransitiveClosure trait (i.e. trace_object) once we remove plan-specific ProcessEdgesWork implementations. And the good thing is, we have recently just removed all plan-specific ProcessEdgesWork implementations. Although we are not sure whether we will have plan-specific ProcessEdgesWork for complex GC algorithms in the future, I think the code is much cleaner for a refactoring.

However, I believe the trace parameter is completely unnecessary. We just need a return value to indicate whether it is the first time the object is visited, so that the caller of trace_object (which is only ProcessEdgesWork::process_edge at this time) can enqueue the object. So instead of

fn process_edge(&mut self, slot: Address) {
    let object = unsafe { slot.load::<ObjectReference>() };
    let new_object = self.trace_object(object);
    if Self::OVERWRITE_REFERENCE {
        unsafe { slot.store(new_object) };
    }
}

we shall have

fn process_edge(&mut self, slot: Address) {
    let object = unsafe { slot.load::<ObjectReference>() };
    let (new_object, first_visit) = self.trace_object(object);
    if Self::OVERWRITE_REFERENCE {
        unsafe { slot.store(new_object) };
    }
    if first_visit {
        self.enqueue(new_object);
    }

However, the #[inline] above trace_object indicates that it is very performance-critical. We’d better measure before making the decision to change.

Epilogue

After fifteen years, the TransitiveClosure trait is finally going to change.

The Rust MMTk is under active development now. As we proceed, we may see more things like this, things that have remained in its current state for years, or even decades. But this doesn’t mean they are always right. We have to rethink about the code again and again, and fix the problems whenever we can.

Update

We eventually removed TransitiveClosure as well as the stale TraceLocal trait from mmtk-core. However, unlike we discussed in previous sections, we did not remove the trace parameter from trace_object. We think “enqueuing objects” is still a responsibility of the trace_object method, so we should still enqueue objects in trace_object instead of in process_edges using the return value. Therefore, we simply renamed TransitiveClosure to ObjectQueue, and renamed process_node to enqueue, so they indicate exactly what they do. And now it is the actual object queue instead of ProcessEdgesWork that implements ObjectQueue.

The signature of CopySpace::trace_object now looks like this:

#[inline(always)]
pub fn trace_object<Q: ObjectQueue>(
    &self,
    queue: &mut Q,
    object: ObjectReference,
    semantics: Option<CopySemantics>,
    worker: &mut GCWorker<VM>,
) -> ObjectReference {

Note that ObjectQueue is still a trait. It does not have to be a physical queue. If a GC algorithm does not hold objects in a queue during tracing, it can implement this trait and process the “enqueued” object immediatey.