Home | Kunshan Wang

Recent blog

2024-03-07 OSPP'2023: The good, the bad, and the ugly

TL;DR: Last year, the MMTk project participated in the Open-Source Promotion Plan (OSPP). We mentored two students and they completed two student projects, which was encouraging. But the OSPP’2023 event itself was organised in a way I found frustrating, and even hostile to the free software community. Meanwhile, I realised that there were toxic people lurking in the community, which was worrying.

WARNING: Contains harsh words. Viewer discretion is advised.


About Me

I am a programming language and virtual machine enthusiast.

I am a contributor to the Memory Management Toolkit (MMTk) project. I am actively developing an MMTk binding for Ruby so that the Ruby interpreter can use MMTk as its garbage collection backend.

I got my PhD degree from the Australian National University. I designed the Mu micro virtual machine. You can find my publications here.

I worked for Huawei before. The Open Ark Compiler still contains some of my code. However, having participated in that project does not mean I appreciate its design choices, especially the use of naive reference counting as the garbage collection mechanism.

You can find my fun projects on GitHub.

I recently created TUIModPlayer, a Rust program that plays Module (a.k.a. “mod”) music in terminals.

Publications

Kunshan Wang, Stephen M. Blackburn, Peter Zhu and Matthew Valentine-House, "Reworking Memory Management in CRuby: A Practitioner Report", in Proceedings of the 2025 ACM SIGPLAN International Symposium on Memory Management (ISMM'25), 2025.

Ruby is a dynamic programming language that was first released in 1995 and remains heavily used today. Ruby underpins Ruby on Rails, one of the most widely deployed web application frameworks. The scale at which Rails is deployed has placed increasing pressure on the underlying CRuby implementation, and in particular its approach to memory management. CRuby implements a mark-sweep garbage collector which until recently was non-moving and only allocated fixed-size 40-byte objects, falling back to malloc to manage all larger objects.

This paper reports on a multi-year academic-industrial collaboration to rework CRuby’s approach to memory management with the goal of introducing modularity and the ability to incorporate modern high performance garbage collection algorithms. This required identifying and addressing deeply ingrained assumptions across many aspects of the CRuby runtime. We describe the longstanding CRuby implementation and enumerate core challenges we faced and lessons they offer.

Our work has been embraced by the Ruby community, and the refactorings and new garbage collection interface we describe have been upstreamed.

We look forward to this work being used to deploy a new class of garbage collectors for Ruby. We hope that this paper will provide important lessons and insights for Ruby developers, garbage collection researchers and language designers.
Kunshan Wang, Stephen M. Blackburn, Antony L. Hosking and Michael Norrish, "Hop, Skip, & Jump: Practical On-Stack Replacement for a Cross-Platform Language-Neutral VM", in 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2018), March 24–25, 2018, Williamsburg, VA, USA, 2018.

On-stack replacement (OSR) is a performance-critical technology for many languages, especially dynamic languages. Conventional wisdom, apparent in JavaScript engines such as V8 and SpiderMonkey, is that OSR must be implemented in a low-level (i.e., in assembly) and language-specific way.

This paper presents an OSR abstraction based on Swapstack, materialized as the API for a low-level virtual machine, and shows how the abstraction of resumption protocols facilitates an elegant implementation of this API on real hardware. Using an experimental JavaScript implementation, we demonstrate that this API enables the language implementation to perform OSR without the need to deal with machine-level details. We also show that the API itself is implementable on concrete hardware. This work helps crystallize OSR abstractions and, by providing a reusable implementation, brings OSR within reach for more language implementers.
Kunshan Wang, "Micro Virtual Machines: A Solid Foundation for Managed Language Implementation", Ph.D. thesis, College of Engineering and Computer Science, The Australian National University, 2017.

Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today’s languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages.

Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and forced implementers to reinvent low-level mechanisms in order to obtain performance.

My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages.

The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and light-weight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving the language implementers free to focus on the higher levels of their language design.

The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages.

The third contribution is demonstrating the viability of Mu through RPython, a real-world non-trivial language implementation. We also did some preliminary research on GHC as a Mu client.

We have created the Mu specification and its reference implementation, both of which are open-source. We show that Mu’s on-stack replacement API can gracefully support dynamic languages such as JavaScript, and it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter.

With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will be improved as a result.
Yi Lin, Kunshan Wang, Stephen M. Blackburn, Michael Norrish and Antony L. Hosking, "Stop and Go: Understanding Yieldpoint Behavior", in Proceedings of the Fourteenth ACM SIGPLAN International Symposium on Memory Management (ISMM 2015), Portland, OR, June 14, 2015.

Yieldpoints are critical to the implementation of high performance garbage collected languages, yet the design space is not well understood. Yieldpoints allow a running program to be interrupted at well-defined points in its execution, facilitating exact garbage collection, biased locking, on-stack replacement, profiling, and other important virtual machine behaviors. In this paper we identify and evaluate yieldpoint design choices, including previously undocumented designs and optimizations. One of the designs we identify opens new opportunities for very low overhead profiling. We measure the frequency with which yieldpoints are executed and establish a methodology for evaluating the common case execution time overhead. We also measure the median and worst case time-to-yield. We find that Java benchmarks execute about 100 M yieldpoints per second, of which about 1/20000 are taken. The average execution time overhead for untaken yieldpoints on the VM we use ranges from 2.5% to close to zero on modern hardware, depending on the design, and we find that the designs trade off total overhead with worst case time-to-yield. This analysis gives new insight into a critical but overlooked aspect of garbage collector implementation, and identifies a new optimization and new opportunities for very low overhead profiling.
Kunshan Wang, Yi Lin, Stephen M. Blackburn, Michael Norrish and Antony L. Hosking, "Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development", in 1st Summit oN Advances in Programming Languages (SNAPL 2015), 2015.

Many of today’s programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated.

An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages.

We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two-year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation.

Contact

Email: wks1986 AT gmail DOT com