Traversing nested lists with coroutines, Rosetta Code style
by Kunshan Wang
I’ll try to use coroutines to traverse nested lists, Rosetta Code style. That means I’ll do it in many different programming languages and libraries, including Ruby, Lua, Python (including greenlets), JavaScript, Rust, C#, etc. This task shows the difference between symmetric vs asymmetric coroutines, and stackful vs stackless coroutines.
Note that this post alone may not be enough to teach you how to use coroutines in all those languages.
I’ll also provide basic information about coroutines, swap-stack, async/await, etc. in the appendices.
The task
Input:
- a nested list of numbers, such as
[1, [[2, 3], [4, 5]], [6, 7, 8]]
Output:
- Recursively output all numbers in the list. At each level, visit all numbers in one element before visiting any subsequent elements.
When given the list above, the output should be 1, 2, 3, 4, 5, 6, 7 and 8, in that order.
Requirement:
- Use coroutine(s) to enumerate a nested list, and yield elements to the calling coroutine one at a time.
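For reference, the expected flattening can be produced without coroutines by plain recursion (sketched here in Python, purely to pin down the expected output — the point of this post is to do it with coroutines instead):

```python
# Plain recursion, no coroutines: collect the numbers depth-first.
def flatten(x):
    if isinstance(x, list):
        result = []
        for elem in x:
            result.extend(flatten(elem))
        return result
    return [x]

print(flatten([1, [[2, 3], [4, 5]], [6, 7, 8]]))
# → [1, 2, 3, 4, 5, 6, 7, 8]
```

The coroutine versions below produce the same sequence, but yield the numbers one at a time instead of building a list.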
I will try to do this task using as many programming languages as possible, Rosetta Code style, to compare their coroutine syntax and API. At the time of writing (2022), different programming languages still differ greatly w.r.t. the design of coroutines.
The code
Ruby fibers (stackful, both asymmetric and symmetric)
Fibers are “primitives for implementing light weight cooperative concurrency in Ruby”.
Ruby fibers are stackful. According to the documentation:
As opposed to other stackless light weight concurrency models, each fiber comes with a stack. This enables the fiber to be paused from deeply nested function calls within the fiber block.
Ruby fibers can operate in both asymmetric and symmetric mode. I’ll demonstrate the task in both modes below.
Asymmetric
The Fiber#resume
instance method resumes a fiber, and a subsequent call to
the Fiber.yield
class method jumps back to the resumer.
def traverse(x)
  if x.is_a? Array
    x.each do |elem|
      traverse(elem) # recursive call
    end
  else
    Fiber.yield x # can yield within recursive calls
  end
end

def fiber_traverse(x)
  Fiber.new do
    traverse(x)
    raise StopIteration
  end
end

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

fiber = fiber_traverse DATA

loop do # Break if StopIteration is raised.
  value = fiber.resume
  puts value
end
The Enumerator class can automatically turn block-based visitor functions into fiber-based coroutines, and it uses fibers only when necessary: external iteration (calling e.next explicitly) is backed by a fiber, while internal iteration (e.each { |v| ... }) still uses plain callbacks.
def traverse(x, &block)
  if x.is_a? Array
    x.each do |elem|
      traverse(elem, &block)
    end
  else
    block.yield(x) # This is just a usual method call, not a coroutine yield.
  end
end

def enum_traverse(x)
  Enumerator.new do |yielder| # The yielder encapsulates how to yield.
    traverse(x) do |value|
      yielder.yield(value) # This may or may not use coroutine yield.
    end
  end
end

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

# We can do internal iteration
enum_traverse(DATA).each do |value| # use call-back
  puts value
end

# and external iteration too.
e = enum_traverse(DATA)
loop do # Break if StopIteration is raised.
  value = e.next # This will use fiber.
  puts value
end
Symmetric
The Fiber#transfer
method can switch to any fiber, but always needs an
explicit fiber to switch to. We can pass the current fiber to the new fiber
when we create it, so it can remember which fiber to transfer back to.
require "fiber"

def traverse(x, parent)
  if x.is_a? Array
    x.each do |elem|
      traverse(elem, parent) # always remember the parent
    end
  else
    parent.transfer x
  end
end

def fiber_traverse(x)
  current = Fiber.current # get the current fiber
  Fiber.new do
    traverse(x, current) # pass the fiber as parent
    raise StopIteration
  end
end

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

fiber = fiber_traverse DATA

loop do # Break if StopIteration is raised.
  value = fiber.transfer
  puts value
end
Lua threads (stackful, asymmetric)
Lua “threads” are stackful coroutines. Lua has a stackless interpreter, therefore it can easily implement stackful coroutines (Why? See appendix).
Lua provides asymmetric coroutines (with limitations) for the sake of simplicity and portability.
- Simplicity: According to Coroutines in Lua, asymmetric coroutines may be easier to understand.
On the other hand, asymmetric coroutines truly behave like routines, in the sense that control is always transferred back to their callers. Since even novice programmers are familiar with the concept of a routine, control sequencing with asymmetric coroutines seems much simpler to manage and understand, besides allowing the development of more structured programs
- Portability: Supporting symmetric coroutines (or even proper stackful asymmetric coroutines) would require C to have coroutine facilities, such as swap-stack, which are not always available. According to Coroutines in Lua:
Lua and C code can freely call each other; therefore, an application can create a chain of nested function calls wherein the languages are interleaved. Implementing a symmetric facility in this scenario imposes the preservation of C state when a Lua coroutine is suspended. This preservation is only possible if a coroutine facility is also provided for C; but a portable implementation of coroutines for C cannot be written.
Lua also added a limitation: a coroutine cannot yield while there are C function frames on its stack. Otherwise, Lua would require a swap-stack mechanism for C, making Lua less portable.
In Lua, the coroutine.resume
function continues the execution of a
coroutine, and coroutine.yield
jumps back to the calling coroutine.
function traverse(x)
  if type(x) == "table" then
    for i, v in ipairs(x) do
      traverse(v) -- recursive call
    end
  else
    coroutine.yield(x) -- can yield within recursive calls
  end
end

function coroutine_traverse(x)
  return coroutine.create(function()
    traverse(x)
    return nil
  end)
end

local list = {1, {{2, 3}, {4, 5}}, {6, 7, 8}}

local coro = coroutine_traverse(list)

while true do
  local _, value = coroutine.resume(coro)
  if value == nil then
    break -- terminated
  end
  print(value)
end
The coroutine.wrap
function can wrap the coroutine into an iterator
function suitable for the generic for
statement.
function traverse(x)
  if type(x) == "table" then
    for i, v in ipairs(x) do
      traverse(v)
    end
  else
    coroutine.yield(x)
  end
end

function coroutine_traverse(x)
  return coroutine.wrap(function()
    traverse(x)
  end)
end

local list = {1, {{2, 3}, {4, 5}}, {6, 7, 8}}

for value in coroutine_traverse(list) do
  print(value)
end
Python generators (stackless, asymmetric)
Python generators have been a built-in feature since Python 2.2 (PEP 255). They are single-frame coroutines.
A function that contains the yield keyword is a generator function. Calling a generator function creates a new generator object, stopped at the beginning of the function body, which can be resumed with the next(...) built-in function.
Being stackless, each coroutine has only one frame, so it cannot yield while calling another function. To implement recursive traversal with stackless coroutines, it is common to create one generator for each level of nested list, and yield values from the innermost coroutine to the outer coroutine, level by level.
def traverse(x):
    if isinstance(x, list):
        for elem in x:
            new_gen = traverse(elem)  # Create the next level of generator
            try:
                while True:
                    v = next(new_gen)
                    yield v  # Yield everything the inner generator yields
            except StopIteration:  # until iteration stops.
                pass
    else:
        yield x

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

gen = traverse(DATA)  # The top-level generator
try:
    while True:
        v = next(gen)  # Iterate through it
        print(v)
except StopIteration:  # until iteration stops.
    pass
Python’s for statement is syntactic sugar for calling next(...) until the StopIteration exception is thrown. The yield from statement is syntactic sugar for yielding everything from another generator. Using all the syntactic sugar, the code above becomes the following:
def traverse(x):
    if isinstance(x, list):
        for elem in x:
            yield from traverse(elem)  # Use "yield from" to yield everything.
    else:
        yield x

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

for v in traverse(DATA):
    print(v)
Python coroutines (WTF?)
Python 3.5 introduced async/await-based asynchronous programming mechanisms, but it uses the word “coroutine” to refer to functions annotated with the async keyword, like async def foo(...), which is confusing. Async functions may contain the new await expression, but its semantics is only vaguely defined as “suspend the execution of coroutine on an awaitable object”, whatever “on an awaitable object” means. That is in stark contrast to the highly detailed semantics of the co_await expression in C++20 and the .await expression in Rust.
In the face of ambiguity, refuse the temptation to guess.
Because it is so confusing, I am not going to do the task using Python “coroutines”.
Python greenlet (stackful, symmetric)
Symmetric
The greenlet library provides stackful symmetric coroutines.
Greenlets are stackful. According to the documentation:
A “greenlet” is a small independent pseudo-thread. Think about it as a small stack of frames; the outermost (bottom) frame is the initial function you called, and the innermost frame is the one in which the greenlet is currently paused.
Greenlets are symmetric. One greenlet can switch to another using the
glet.switch()
method to pass a value, or
glet.throw()
to switch and immediately raise an exception.
There are implementations of Greenlets for both CPython and PyPy.
The official greenlet implementation for CPython uses platform-specific assembly code (for amd64, aarch64, riscv, etc.) to switch native stacks, similar to what [Boost Context] does.
PyPy implements the greenlet API using stacklets, which are PyPy’s own swap-stack mechanism. Like [Boost Context] and the official greenlet for CPython, PyPy also uses platform-specific assembly code (for x86-64, aarch64, mips64, etc. Sorry, RISC-V.) to switch between stacks.
import greenlet

def traverse(x, parent):
    if isinstance(x, list):
        for elem in x:
            traverse(elem, parent)  # recursive call
    else:
        parent.switch(x)  # can switch at any level of stack

def greenlet_traverse(x):
    current = greenlet.getcurrent()  # remember the current coroutine
    def _traverse_x():
        traverse(x, current)  # and pass it as the parent
        current.throw(StopIteration)
    return greenlet.greenlet(_traverse_x)

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

glet = greenlet_traverse(DATA)

try:
    while True:
        v = glet.switch()  # switch to the greenlet
        print(v)
except StopIteration:
    pass
Emulate asymmetric coroutines using the parent field
We have just demonstrated that greenlets are symmetric. However, each greenlet has a parent: the coroutine to switch to when the current greenlet terminates, normally or by exception. This doesn’t make greenlets asymmetric, though, because the parent can be changed at any time during execution, and it is not wrong to explicitly switch to the parent.
We can rewrite our last example and use the glet.parent
field instead of our
own parent
variable to record the parent.
import greenlet

def traverse(x):
    if isinstance(x, list):
        for elem in x:
            traverse(elem)
    else:
        greenlet.getcurrent().parent.switch(x)  # switch to parent

def greenlet_traverse(x):
    def _traverse_x():
        traverse(x)
        raise StopIteration()
    return greenlet.greenlet(_traverse_x)  # The parent is the current greenlet

DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

glet = greenlet_traverse(DATA)

try:
    while True:
        v = glet.switch()  # switch to the greenlet
        print(v)
except StopIteration:
    pass
JavaScript generators (stackless, asymmetric)
The function* declaration defines a generator function. Generator functions can use the yield operator, which pauses the execution of the coroutine. When a generator function is called, it creates a generator object, which can be used like an iterator. The next method resumes the coroutine.
function* traverse(x) {
  if (Array.isArray(x)) {
    for (const elem of x) {
      for (const y of traverse(elem)) {
        yield y; // Yield what the inner layer yields.
      }
    }
  } else {
    yield x;
  }
}

const DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]];

let gen = traverse(DATA);
for (;;) {
  const result = gen.next(); // Resumes the coroutine.
  if (result.done) {
    break;
  }
  console.log(result.value);
}
And there is syntactic sugar. The yield* operator yields everything from another generator. Because a generator behaves like an iterator, the for-of statement can iterate through the values it yields.
function* traverse(x) {
  if (Array.isArray(x)) {
    for (const elem of x) {
      yield* traverse(elem)
    }
  } else {
    yield x
  }
}

const DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

for (const v of traverse(DATA)) {
  console.log(v);
}
JavaScript async/await (stackless, asymmetric, asynchronous)
JavaScript provides asynchronous programming facilities in the form of
async/await (see appendix). An async
function always returns a Promise
object which
can be settled (fulfilled or rejected) later. An async
function may contain
await
operators which cause async function execution to pause
until its operand (a Promise
) is settled, and resume execution after
fulfilment.
Asynchronous programming is more like cooperative multi-tasking than coroutines.
Despite the difference, I’ll now give an example of traversing nested lists using async/await. I create two concurrent tasks: one traverses the nested list, the other prints the numbers, and they communicate through a “zero-capacity queue”. It is similar to multi-threaded programming, except that a single thread executes both tasks in alternation, driven by a scheduler.
// A zero-capacity queue.
// `enqueue` will block until another task calls `dequeue`,
// and `dequeue` will block until another task calls `enqueue`.
class ZeroQueue {
  constructor() {
    this.getter_resolver = null
    this.data = null
    this.setter_resolver = null
  }

  async enqueue(num) {
    if (this.getter_resolver != null) {
      // If a consumer came before, we satisfy it.
      this.getter_resolver(num)
      this.getter_resolver = null
    } else {
      // If we come first, we leave the value and wait until consumed.
      this.data = num
      await new Promise((resolve) => {
        this.setter_resolver = resolve
      })
    }
  }

  async dequeue() {
    if (this.setter_resolver != null) {
      // If a producer already came, we take the value and let it continue.
      this.setter_resolver(null)
      this.setter_resolver = null
      return this.data
    } else {
      // If we come first, we wait for the producer.
      return await new Promise((resolve) => {
        this.getter_resolver = resolve
      })
    }
  }
}

const queue = new ZeroQueue()

async function traverse(x) {
  if (Array.isArray(x)) {
    for (const elem of x) {
      await traverse(elem)
    }
  } else {
    // await may potentially yield,
    // giving the user an impression of block-waiting.
    await queue.enqueue(x)
  }
}

async function print_all() {
  for (;;) {
    // await may potentially yield,
    // giving the user an impression of block-waiting.
    const v = await queue.dequeue()
    if (v == "end") {
      return
    } else {
      console.log(v)
    }
  }
}

const DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

// The first task traverses the list and signals termination.
traverse(DATA).then(() => {
  queue.enqueue("end")
})

// The second task keeps polling till the end.
print_all()
JavaScript async generators (stackless, asymmetric, asynchronous)
A function declared with async function* is an async generator function. An async generator is like a generator, but its .next() method returns a Promise so it can be awaited. This allows the generator to use await while iterating. The for await ... of statement is available as syntactic sugar.

This practice is like building coroutines on top of async/await on top of coroutines, which looks ugly to me. Anyway, here is the code:
async function* traverse(x) {
  if (Array.isArray(x)) {
    for (const elem of x) {
      yield* await traverse(elem)
    }
  } else {
    yield x
  }
}

const DATA = [1, [[2, 3], [4, 5]], [6, 7, 8]]

async function main() {
  for await (const v of traverse(DATA)) {
    console.log(v);
  }
}

main()
Rust async/await (stackless, asymmetric, asynchronous)
Rust’s async and await keywords provide support for asynchronous programming (see appendix) based on stackless asymmetric coroutines. There is a dedicated book that covers asynchronous programming in Rust.
An async function or an async block, when executed, does not run its body immediately, but creates an object that holds the execution context of that function or block. Each async function or block is represented to the user as a Future. The Future::poll method will resume the async thing until it yields (at an await site) or finishes.
The await expression can only be used in async functions or blocks. Its semantics is complicated but well-documented. It calls Future::poll on a Future object and, if the Future is ready, it grabs its value and continues without yielding; otherwise, it yields from the current async function or block. When resumed, it will poll the Future again, and may or may not yield depending on whether the Future is ready.
Implementation-wise, the documentation suggests that
Rust decomposes an async
function (or block) into a state machine (see
appendix) where each state represents an await
site.
Async/await is not supposed to be used like coroutines. In fact, the book Asynchronous Programming in Rust contrasts async/await against coroutines. I have given an example in JavaScript that traverses nested list using async/await using two tasks and a channel. It is possible to do the same in Rust, but that’ll need a scheduler. Since I am too lazy to write a scheduler or introduce a third-party scheduler, I’ll try a different approach here.
I’ll abuse the async/await mechanism to exploit its underlying coroutine and make it behave like a generator.
- Resume: We know that Future::poll resumes the coroutine. We call Future::poll directly, which is seldom done in usual async/await-based programs unless we are implementing the “executor” (i.e. scheduler).
- Yield: Each .await corresponds to a yield site. We customise the behaviour of our Future object (i.e. WaitUntilResultTaken) so that the .await always yields (Pending) when reached from within the coroutine, but will continue (Ready) when resumed from the main function. The behaviour is controlled by the result_taken variable.
The async fn traverse
will recursively call itself in the .await
expression,
and will yield level by level through all the .await
sites to the main
function.
Then we have a problem. Whenever it yields, the value yielded to the .poll()
call site in main
is always Poll::Pending
. Then how do we pass the
traversed numbers to main
? To work around this, the coroutine writes the
number in a shared variable result_reporter.num
before it yields. This makes
the ResultReporter
struct effectively behave like a “zero-capacity queue” in
our previous example, but it connects the generator and
the main function instead of two tasks.
use async_recursion::async_recursion;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::task::{Context, Poll};

#[derive(Default)]
struct ResultReporter {
    num: AtomicI32,           // Rust has trouble proving that the coroutine doesn't race with main,
    result_taken: AtomicBool, // so we just use atomic variables here.
}

impl ResultReporter {
    fn report_result<'a>(&'a self, num: i32) -> WaitUntilResultTaken<'a> {
        self.num.store(num, Ordering::Relaxed);
        self.result_taken.store(false, Ordering::Relaxed);
        WaitUntilResultTaken {
            result_taken: &self.result_taken,
        }
    }

    fn take_result(&self) -> i32 {
        let num = self.num.load(Ordering::Relaxed);
        self.result_taken.store(true, Ordering::Relaxed);
        num
    }
}

struct WaitUntilResultTaken<'a> {
    result_taken: &'a AtomicBool,
}

impl<'a> Future for WaitUntilResultTaken<'a> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Self::Output> {
        if self.result_taken.load(Ordering::Relaxed) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

enum NestedList {
    Leaf(i32),
    Nested(Vec<NestedList>),
}

#[async_recursion]
async fn traverse(list: &NestedList, reporter: &ResultReporter) {
    match list {
        NestedList::Leaf(num) => {
            // This `await` expression will call `WaitUntilResultTaken::poll()` twice.
            // The first time is when we reach the `await` here. It returns `Poll::Pending` so we yield.
            // The second time is when `main` calls `poll`. It returns `Poll::Ready(())` so we continue.
            reporter.report_result(*num).await
        }
        NestedList::Nested(lists) => {
            for elem in lists {
                // This `await` will pass the `Poll::Pending` to the caller level by level until it reaches `main`.
                traverse(elem, reporter).await
            }
        }
    }
}

fn main() {
    // [1, [[2, 3], [4, 5]], [6, 7, 8]]
    let nested_list = NestedList::Nested(vec![
        NestedList::Leaf(1),
        NestedList::Nested(vec![
            NestedList::Nested(vec![NestedList::Leaf(2), NestedList::Leaf(3)]),
            NestedList::Nested(vec![NestedList::Leaf(4), NestedList::Leaf(5)]),
        ]),
        NestedList::Nested(vec![
            NestedList::Leaf(6),
            NestedList::Leaf(7),
            NestedList::Leaf(8),
        ]),
    ]);

    let result_reporter = ResultReporter::default();
    let mut coro = traverse(&nested_list, &result_reporter);
    let null_ctx = unsafe { &mut *std::ptr::null_mut::<Context>() };

    loop {
        let coro_p = unsafe { Pin::new_unchecked(&mut coro) };
        let poll_result = coro_p.poll(null_ctx); // Let the coroutine run.
        match poll_result {
            Poll::Pending => {
                // When pausing (at `await` sites) during execution, we get the result.
                let num = result_reporter.take_result();
                println!("{}", num);
            }
            Poll::Ready(()) => {
                // When finished, we just quit.
                break;
            }
        }
    }
}
Rust coroutine crate (stackful, asymmetric)
The third-party crate coroutine provides stackful asymmetric coroutines. It is built upon the context crate (see below).
It looks like this crate has not been maintained for quite some time and it
won’t compile. I’ll not do the task using the coroutine
crate. If you are
interested, their documentation contains some examples.
Rust context crate (stackful, symmetric)
The third-party crate context is similar to [Boost Context]. It provides the abstraction of stackful symmetric coroutines. It implements swap-stack using machine-specific assembly code (x86-64, AArch64, ppc64; sorry, RISC-V).
The Context::resume method switches the thread to the other coroutine, and passes along the context of the original coroutine so the thread can switch back.
use context::stack::ProtectedFixedSizeStack;
use context::{Context, Transfer};

#[derive(Debug)]
enum NestedList {
    Leaf(i32),
    Nested(Vec<NestedList>),
}

fn traverse(list: &NestedList, t: &mut Option<Transfer>) {
    match list {
        NestedList::Leaf(num) => {
            // Context-switching consumes the `Context` object. We have to take and replace it.
            let old_t = t.take().unwrap();
            let new_t = unsafe { old_t.context.resume(*num as usize) };
            *t = Some(new_t);
        }
        NestedList::Nested(lists) => {
            for elem in lists {
                // This is a stackful coroutine. We can do recursive function calls.
                traverse(elem, t)
            }
        }
    }
}

fn main() {
    // [1, [[2, 3], [4, 5]], [6, 7, 8]]
    let nested_list = NestedList::Nested(vec![
        NestedList::Leaf(1),
        NestedList::Nested(vec![
            NestedList::Nested(vec![NestedList::Leaf(2), NestedList::Leaf(3)]),
            NestedList::Nested(vec![NestedList::Leaf(4), NestedList::Leaf(5)]),
        ]),
        NestedList::Nested(vec![
            NestedList::Leaf(6),
            NestedList::Leaf(7),
            NestedList::Leaf(8),
        ]),
    ]);

    extern "C" fn context_function(t: Transfer) -> ! {
        // The initial Transfer carries the pointer to the list as `data`.
        let list = unsafe { &*(t.data as *const NestedList) };
        // From now on, we'll frequently take and replace the Transfer. Use Option.
        let mut t_holder = Some(t);
        traverse(list, &mut t_holder);
        // Send a special value to indicate the end of traversal.
        let t = t_holder.unwrap();
        unsafe { t.context.resume(usize::MAX) };
        unreachable!();
    }

    let stack = ProtectedFixedSizeStack::default();
    let mut t = Transfer::new(unsafe { Context::new(&stack, context_function) }, 0);

    // The initial `resume` sends the list reference as a usize.
    t = unsafe { t.context.resume(&nested_list as *const NestedList as usize) };

    // Use this special value to indicate end of traversal.
    while t.data != usize::MAX {
        println!("{}", t.data);
        // Subsequent `resume` doesn't need to carry values.
        t = unsafe { t.context.resume(0usize) };
    }
}
C# iterators (stackless, asymmetric)
C# can implement iterators using the yield return and yield break statements. A function that contains yield returns an IEnumerable&lt;T&gt; or IEnumerator&lt;T&gt;, and is resumed every time an item is requested.
using System;
using System.Collections.Generic;

interface NestedList
{
    IEnumerable<int> Traverse();
}

class Leaf : NestedList
{
    private int num;

    public Leaf(int num)
    {
        this.num = num;
    }

    public IEnumerable<int> Traverse()
    {
        yield return num;
    }
}

class Branch : NestedList
{
    private List<NestedList> children;

    public Branch()
    {
        children = new List<NestedList>();
    }

    public Branch Add(NestedList child)
    {
        children.Add(child);
        return this;
    }

    public IEnumerable<int> Traverse()
    {
        foreach (var child in children)
        {
            // It is stackless. We need to yield from the inner iterators.
            foreach (var y in child.Traverse())
            {
                yield return y;
            }
        }
    }
}

public class IteratorTraversal
{
    public static void Main()
    {
        // [1, [[2, 3], [4, 5]], [6, 7, 8]]
        var nestedList = new Branch()
            .Add(new Leaf(1))
            .Add(new Branch()
                .Add(new Branch()
                    .Add(new Leaf(2))
                    .Add(new Leaf(3)))
                .Add(new Branch()
                    .Add(new Leaf(4))
                    .Add(new Leaf(5))))
            .Add(new Branch()
                .Add(new Leaf(6))
                .Add(new Leaf(7))
                .Add(new Leaf(8)));

        foreach (var n in nestedList.Traverse()) // Create the iterator
        {
            Console.WriteLine(n);
        }
    }
}
C# async/await (stackless, asymmetric, asynchronous)
C# supports asynchronous programming (see appendix). Like async/await in any other language, its programming model is more like cooperative multi-task programming than coroutines. It is possible to implement asynchronous traversal using an AsyncQueue to communicate between the traversal task and a consumer task that consumes the visited values. I am not going to give an example here.
C swapcontext (stackful, symmetric)
The C programming language itself doesn’t have any support for coroutines or swap-stack.
The POSIX function makecontext
can create a “context” for a
function on a given stack so that when a subsequent invocation of
swapcontext
swaps to that context, it will continue
execution from the beginning of the given function on the given stack. This
effectively provides a swap-stack mechanism, and can be used to implement
symmetric coroutines.
One design flaw of makecontext
is that it only supports passing int
arguments to the given function. Because the size of int
is
platform-specific, it is hard to pass pointers across makecontext
. According
to the man page, POSIX 2008 removed makecontext
and
swapcontext
, citing portability issues, and recommended the use of POSIX
threads.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

struct Node {
    int value;
    struct Node *first_child;
    struct Node *next_sibling;
};

// [1, [[2, 3], [4, 5]], [6, 7, 8]]
struct Node node8 = { .value = 8, .first_child = NULL, .next_sibling = NULL };
struct Node node7 = { .value = 7, .first_child = NULL, .next_sibling = &node8 };
struct Node node6 = { .value = 6, .first_child = NULL, .next_sibling = &node7 };
struct Node node678 = { .value = -1, .first_child = &node6, .next_sibling = NULL };
struct Node node5 = { .value = 5, .first_child = NULL, .next_sibling = NULL };
struct Node node4 = { .value = 4, .first_child = NULL, .next_sibling = &node5 };
struct Node node45 = { .value = -1, .first_child = &node4, .next_sibling = NULL };
struct Node node3 = { .value = 3, .first_child = NULL, .next_sibling = NULL };
struct Node node2 = { .value = 2, .first_child = NULL, .next_sibling = &node3 };
struct Node node23 = { .value = -1, .first_child = &node2, .next_sibling = &node45 };
struct Node node2345 = { .value = -1, .first_child = &node23, .next_sibling = &node678 };
struct Node node1 = { .value = 1, .first_child = NULL, .next_sibling = &node2345 };
struct Node node12345678 = { .value = -1, .first_child = &node1, .next_sibling = NULL };

ucontext_t main_context;
ucontext_t coro_context;

int current_value;
bool finished;

void do_yield(int value) {
    current_value = value;
    swapcontext(&coro_context, &main_context); // switch back to main
}

void visit_node(struct Node *node) {
    for (struct Node *current = node; current != NULL; current = current->next_sibling) {
        if (current->first_child == NULL) {
            do_yield(current->value);
        } else {
            visit_node(current->first_child); // recursive call
        }
    }
}

void traverse() {
    finished = false;
    visit_node(&node12345678);
    finished = true;
}

int main() {
    getcontext(&coro_context);

    // Obviously, it is stackful.
    size_t stack_size = 65536; // Not sure if 65536 is enough, though.
    void *stack = malloc(stack_size);
    // If we were careful enough, we would use mprotect
    // to create a PROT_NONE region to protect against
    // stack overflow.
    coro_context.uc_stack = (stack_t){ .ss_sp = stack, .ss_size = stack_size };
    coro_context.uc_link = &main_context;
    makecontext(&coro_context, (void(*)())traverse, 0);

    while (true) {
        swapcontext(&main_context, &coro_context); // switch to coroutine
        if (finished) {
            break;
        } else {
            printf("%d\n", current_value);
        }
    }

    return 0;
}
C++ coroutine (stackless, asymmetric, asynchronous)
C++20 introduced “coroutines”. More precisely, it introduced mechanisms so that libraries can implement asynchronous programming on top of them.
Any function that contains co_await, co_yield and/or co_return is a coroutine function.
Calling a coroutine function behaves like the pseudo-code defined in Section 9.5.4 (dcl.fct.def.coroutine) paragraph 5.
{
  promise-type promise promise-constructor-arguments ;
  try {
    co_await promise.initial_suspend() ;
    function-body
  } catch ( ... ) {
    if (!initial-await-resume-called)
      throw ;
    promise.unhandled_exception() ;
  }
final-suspend :
  co_await promise.final_suspend() ;
}
Basically, it implicitly creates a “promise object” which is called (awaited) at different points of the coroutine execution, like “hooks” in aspect-oriented programming. The compiler will generate those promise.xxxx() method calls, and it is the programmer’s (or library writer’s) responsibility to define the “promise type” and the .initial_suspend(), .unhandled_exception(), and .final_suspend() methods to make the generated calls meaningful.
And evaluating a co_await expression behaves as defined in Section 7.6.2.3 (expr.await) paragraph 5. It takes an “awaiter” (more precisely, anything that can be converted to an “awaiter”) as its operand, and
- It calls awaiter.await_ready().
  - If it returns true, it calls awaiter.await_resume(), and that is the value of the co_await expression.
  - If it returns false, it calls awaiter.await_suspend(), and depending on its result, it may suspend the execution of the coroutine, suspend and switch to another coroutine, or just continue execution.
This is the “conditionally yielding” behaviour of common async/await-based programming. Again, the compiler generates the above method calls, and it is again the programmer’s (or library writer’s) responsibility to implement the .await_ready(), .await_resume() and .await_suspend() methods to make those generated calls meaningful.
The co_yield expression calls promise.yield_value(x), and the programmer (or library writer) shall implement the promise object to make the .yield_value method meaningful. Likewise, the co_return statement calls promise.return_void(), and the programmer (or library writer) shall implement the promise object to make the .return_void method meaningful. Finally, the standard library function coroutine_handle::resume() resumes a paused “coroutine”.
As we can see,
- the specification defines the semantics of coroutine functions, the co_await, co_yield and co_return expressions/statements, and the coroutine_handle library object and its .resume() method,
- the compiler (GCC, Clang, etc.) generates code for the co_xxx expressions/statements, and
- the programmer defines the promise type and (optionally) “awaiter” types and fills in lots and lots of methods to customise the behaviour.
How complicated C++20 “coroutines” are!
It is complicated enough to confuse a C++ programmer with 25 years’ experience.
And I admit I am not smart enough to use C++20 coroutines.
If you are not smart enough either, but want to learn about C++20 coroutines, I recommend starting with another language, such as Ruby or Lua, and coming back to C++20 when the idea of coroutines no longer scares you.
Anyway, here is the code. The following code tries to use C++20 coroutines as generators.
#include <iostream>
#include <coroutine>
struct Node {
int value;
Node *first_child;
Node *next_sibling;
};
// [1, [[2, 3], [4, 5]], [6, 7, 8]]
Node node8 = { .value = 8, .first_child = nullptr, .next_sibling = nullptr };
Node node7 = { .value = 7, .first_child = nullptr, .next_sibling = &node8 };
Node node6 = { .value = 6, .first_child = nullptr, .next_sibling = &node7 };
Node node678 = { .value = -1, .first_child = &node6, .next_sibling = nullptr };
Node node5 = { .value = 5, .first_child = nullptr, .next_sibling = nullptr };
Node node4 = { .value = 4, .first_child = nullptr, .next_sibling = &node5 };
Node node45 = { .value = -1, .first_child = &node4, .next_sibling = nullptr };
Node node3 = { .value = 3, .first_child = nullptr, .next_sibling = nullptr };
Node node2 = { .value = 2, .first_child = nullptr, .next_sibling = &node3 };
Node node23 = { .value = -1, .first_child = &node2, .next_sibling = &node45 };
Node node2345 = { .value = -1, .first_child = &node23, .next_sibling = &node678 };
Node node1 = { .value = 1, .first_child = nullptr, .next_sibling = &node2345 };
Node node12345678 = { .value = -1, .first_child = &node1, .next_sibling = nullptr };
struct Traverser {
struct promise_type;
using handle_type = std::coroutine_handle<promise_type>;
struct promise_type {
int current_value_ = -1;
bool finished_ = false;
// This is executed when the "coroutine" is created.
Traverser get_return_object() {
return Traverser(handle_type::from_promise(*this));
}
// Called at the beginning of the coroutine.
// We let it stop there to mimic Python generator behaviour.
std::suspend_always initial_suspend() {
return {};
}
// Called when the coroutine finished execution.
std::suspend_always final_suspend() noexcept {
// We set a variable so the main function knows it finished.
finished_ = true;
return {};
}
// Called when a co_yield expression is evaluated.
std::suspend_always yield_value(int value) {
current_value_ = value;
return {};
}
// Called when an exception is thrown.
void unhandled_exception() {
std::terminate();
}
};
handle_type handle_;
Traverser(handle_type handle) : handle_(handle) {}
void resume() {
handle_.resume();
}
bool finished() const {
return handle_.promise().finished_;
}
int get_value() {
return handle_.promise().current_value_;
}
};
Traverser visit_node(Node *node) {
for (Node *current = node; current != nullptr; current = current->next_sibling) {
if (current->first_child == nullptr) {
// Yield.
co_yield current->value;
} else {
// It is stackless. We need multiple levels of coroutines.
auto sub_traverser = visit_node(current->first_child);
while (true) {
sub_traverser.resume();
if (sub_traverser.finished()) {
break;
}
// Yield from sub-coroutine.
co_yield sub_traverser.get_value();
}
}
}
}
int main() {
auto traverser = visit_node(&node12345678);
while (true) {
traverser.resume();
if (traverser.finished()) {
break;
}
std::cout << traverser.get_value() << std::endl;
}
return 0;
}
C++ Boost.Coroutine2 (stackful, asymmetric)
Note: Boost.Coroutine is deprecated in favour of Boost.Coroutine2.
Boost.Coroutine2 provides stackful asymmetric coroutines. It is implemented on top of Boost.Context (see below).
A boost::coroutines2::coroutine<T> has two member types: pull_type and push_type. A coroutine can be created by instantiating either of them. In this task, we will create a pull_type so that the main function can pull data from the coroutine. Its constructor creates a coroutine whose body receives a push_type& so that it can yield and push data back to the main function.
#include <iostream>
#include <boost/coroutine2/coroutine.hpp>
struct Node {
int value;
Node *first_child;
Node *next_sibling;
};
// [1, [[2, 3], [4, 5]], [6, 7, 8]]
Node node8 = { .value = 8, .first_child = nullptr, .next_sibling = nullptr };
Node node7 = { .value = 7, .first_child = nullptr, .next_sibling = &node8 };
Node node6 = { .value = 6, .first_child = nullptr, .next_sibling = &node7 };
Node node678 = { .value = -1, .first_child = &node6, .next_sibling = nullptr };
Node node5 = { .value = 5, .first_child = nullptr, .next_sibling = nullptr };
Node node4 = { .value = 4, .first_child = nullptr, .next_sibling = &node5 };
Node node45 = { .value = -1, .first_child = &node4, .next_sibling = nullptr };
Node node3 = { .value = 3, .first_child = nullptr, .next_sibling = nullptr };
Node node2 = { .value = 2, .first_child = nullptr, .next_sibling = &node3 };
Node node23 = { .value = -1, .first_child = &node2, .next_sibling = &node45 };
Node node2345 = { .value = -1, .first_child = &node23, .next_sibling = &node678 };
Node node1 = { .value = 1, .first_child = nullptr, .next_sibling = &node2345 };
Node node12345678 = { .value = -1, .first_child = &node1, .next_sibling = nullptr };
typedef boost::coroutines2::coroutine<int> coro_t;
void visit_node(Node *node, coro_t::push_type &sink) {
for (Node *current = node; current != nullptr; current = current->next_sibling) {
if (current->first_child == nullptr) {
sink(current->value); // Yield at any level. It's stackful!
} else {
visit_node(current->first_child, sink); // Recursive call.
}
}
}
int main() {
auto traverser = coro_t::pull_type([](coro_t::push_type &sink) {
visit_node(&node12345678, sink);
});
for (auto value : traverser) { // Can be used as iterator using begin() and end().
std::cout << value << std::endl;
}
return 0;
}
C++ Boost.Context (stackful, symmetric)
Boost.Context implements a swap-stack mechanism using machine-dependent
assembly language(x86-64, ARM64,
RISCV64, etc.). We can also implement symmetric coroutines
using its C++ API. A boost::coroutine::fiber
represents a coroutine, and
fiber::resume()
switches to that coroutine. Because fiber::resume()
doesn’t
pass values, we need to use a side channel (the shared State
object) to pass
the value and indicate that the traversal has finished.
#include <iostream>
#include <boost/context/fiber.hpp>
struct Node {
int value;
Node *first_child;
Node *next_sibling;
};
// [1, [[2, 3], [4, 5]], [6, 7, 8]]
Node node8 = { .value = 8, .first_child = nullptr, .next_sibling = nullptr };
Node node7 = { .value = 7, .first_child = nullptr, .next_sibling = &node8 };
Node node6 = { .value = 6, .first_child = nullptr, .next_sibling = &node7 };
Node node678 = { .value = -1, .first_child = &node6, .next_sibling = nullptr };
Node node5 = { .value = 5, .first_child = nullptr, .next_sibling = nullptr };
Node node4 = { .value = 4, .first_child = nullptr, .next_sibling = &node5 };
Node node45 = { .value = -1, .first_child = &node4, .next_sibling = nullptr };
Node node3 = { .value = 3, .first_child = nullptr, .next_sibling = nullptr };
Node node2 = { .value = 2, .first_child = nullptr, .next_sibling = &node3 };
Node node23 = { .value = -1, .first_child = &node2, .next_sibling = &node45 };
Node node2345 = { .value = -1, .first_child = &node23, .next_sibling = &node678 };
Node node1 = { .value = 1, .first_child = nullptr, .next_sibling = &node2345 };
Node node12345678 = { .value = -1, .first_child = &node1, .next_sibling = nullptr };
struct State {
int current_value = -1;
bool finished = false;
};
void visit_node(Node *node, State &state, boost::context::fiber &sink) {
for (Node *current = node; current != nullptr; current = current->next_sibling) {
if (current->first_child == nullptr) {
state.current_value = current->value;
sink = std::move(sink).resume(); // Yield at any level. It's stackful!
} else {
visit_node(current->first_child, state, sink); // Recursive call.
}
}
}
int main() {
State state;
auto traverser = boost::context::fiber([&state](boost::context::fiber &&sink) {
visit_node(&node12345678, state, sink);
state.finished = true;
return std::move(sink);
});
while (true) {
traverser = std::move(traverser).resume(); // Resume the coroutine.
if (state.finished) {
break;
}
std::cout << state.current_value << std::endl;
}
return 0;
}
Appendices
Why this task?
The purpose of this task is to compare symmetric, asymmetric, stackful and stackless coroutines.
This task is natural to implement with coroutines, because the traversal of a data structure is relatively independent from the consumption of the values.
This task is also much easier with stackful coroutines than with stackless ones. Because the data structure (nested lists) is recursive, it is easiest to traverse using a recursive algorithm, and recursion needs a stack. Stackful coroutines handle recursive calls trivially. With stackless coroutines, however, we have to chain up multiple coroutines to form a stack of coroutines, and yield values from the innermost coroutine through multiple layers of coroutines to the consumer.
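As a concrete sketch of the stackless case, here is a minimal Python version (Python generators are stackless asymmetric coroutines): each level of nesting gets its own generator, and yield from forwards values through the whole chain to the consumer.

```python
def visit(lst):
    # Python generators are stackless: each generator object holds exactly
    # one frame, so a recursive traversal needs one generator per level.
    for item in lst:
        if isinstance(item, list):
            # "yield from" drives the inner generator and forwards every
            # value it yields, layer by layer, up to the outermost consumer.
            yield from visit(item)
        else:
            yield item

print(list(visit([1, [[2, 3], [4, 5]], [6, 7, 8]])))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The yield from expression is exactly the "chain of coroutines" hack made into syntax: every intermediate generator in the chain re-yields the innermost value.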
From the code examples above, we can clearly see the difference between different kinds of coroutines.
What are coroutines anyway?
According to Conway, a coroutine is “an autonomous program which communicates with adjacent modules as if they were input or output subroutines”. Coroutines are subroutines all at the same level, each acting as if it were the master program when in fact there is no master program.
Coroutines have the following characteristics. (See Coroutines in Lua.)
- The values of data local to a coroutine persist between successive calls.
- The execution of a coroutine is suspended as control leaves it, only to carry on where it left off when control re-enters the coroutine at some later stage.
The first characteristic means a coroutine can be resumed from where it paused.
In the second characteristic, “as control leaves it” means it is the programmer that decides where to pause a coroutine, not the implicit scheduler. Control flow is part of a program, not the runtime.
For this reason, the “fibers” in Ruby, the “generators” in Python, the “threads” in Lua, and the “user contexts” in the swapcontext POSIX API are all coroutines, despite not being called “coroutines”. The programmer explicitly transfers control from one to another.
On the other hand, a “goroutine” in the Go programming language is not a coroutine, despite the similar name. In fact, the official document defines a “goroutine” as “an independent concurrent thread of control”, i.e. a thread.
Note: Goroutine is a threading mechanism implemented in the user space, an “M × N” green thread system. The Go runtime schedules M goroutines on N native threads. It is the scheduler that switches a native thread between different goroutines at unspecified time and locations, usually when IO operations may block, or when queues are full or empty. To the programmer, a goroutine is just like a thread: it keeps going forward. It may block, but not because the programmer asked it to yield, but because some requests cannot be satisfied, such as queues and IO.
Coroutines are not mutually exclusive with threads. Each thread is executing one coroutine at a time, and each thread may jump from one coroutine to another according to the control flow in the program.
Symmetric and asymmetric coroutines.
With symmetric coroutines, all coroutines are equal. A thread can jump from any coroutine to any other coroutine. When switching, the programmer always needs to specify which coroutine to jump to, i.e. destination.
With asymmetric coroutines, coroutines have a parent-child relation. There are two different operations that jump between coroutines, namely resume and yield. When a thread “resumes” a coroutine, the destination becomes the child of the source coroutine, until it “yields”, when the thread jumps back from the child to the parent coroutine. A coroutine cannot be resumed when it already has a parent. When yielding, the programmer doesn’t need to specify the destination, because the destination is always implicitly the parent coroutine.
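Python generators, used here as a small sketch, show this implicit destination: next() plays the role of resume, while yield names no target and always jumps back to the resumer.

```python
def child():
    # Asymmetric: "yield" names no destination; control always returns
    # to whichever coroutine resumed this one.
    yield "first"
    yield "second"

g = child()
print(next(g))  # resume: control enters child() and runs to the first yield
print(next(g))  # resume again: child() continues after the first yield
```

By contrast, a symmetric switch (e.g. greenlet's switch()) would have to name the destination coroutine explicitly.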
Symmetric and asymmetric coroutines have equal expressive power, and they can implement each other. See Coroutines in Lua for more details.
Stackful and stackless coroutines.
A stackful coroutine has its own execution stack. A thread can switch coroutine when the current coroutine has any number of frames on its stack. When switching, the thread saves the entire stack and switches to a whole new stack (implementation-wise, it only needs to save registers, change the stack pointer, and restore registers from the new stack).
A stackless coroutine has only one frame. Therefore, a coroutine is usually defined by a “coroutine function”. To my knowledge, all stackless coroutines are asymmetric. A coroutine function can only yield within the coroutine function itself. That means when a coroutine C1 calls another function F2, the thread cannot yield from C1 while executing F2; and when a coroutine C1 resumes another coroutine C2, the thread can only yield from C2 back to C1, but not directly from C2 to C1’s parent.
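A short Python sketch of this restriction: a yield written in a helper function does not propagate to the caller's consumer, because the helper thereby becomes a separate one-frame coroutine of its own.

```python
def helper(item):
    # Because it contains "yield", helper is itself a generator function.
    yield item * 10

def gen(items):
    for item in items:
        helper(item)  # creates a generator object and discards it;
                      # its yield never reaches gen's consumer
        yield item    # only yields written directly in gen reach the consumer

print(list(gen([1, 2])))  # [1, 2] -- the helper's values are lost
```

To forward the helper's values, gen would have to write `yield from helper(item)` explicitly, resuming the helper as a sub-coroutine rather than calling it.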
Stackful coroutines are more powerful, but need some kind of “swap-stack” mechanism (see below) to implement. Stackless coroutines are more restricted, but do not require swap-stack.
The swap-stack mechanism
Conceptually, the context of nested function calls is a stack, a last-in-first-out data structure, called the control stack, often simply called a stack.
A control stack has many frames. Each frame contains the execution context of a function activation (which is why a frame is also known as an activation record). The context includes the program counter (PC) as well as the values of local variables. The top frame is the context of the most recently entered function, and is the only frame on a stack that is active. All other frames are paused at a call site, waiting for the called function (the frame above it) to return.
In most programming languages, a thread is always bound to one stack. More generally, though, a thread can switch among different stacks. When a thread switches from one stack to another, it pauses the top frame, and with it the whole stack. It then switches its stack pointer to the new stack and resumes the top frame of the new stack, continuing from where that frame was paused.
This is basically what stackful coroutines do. Each coroutine has a stack, which can be paused and resumed. When resumed, it continues from where it was paused.
Implementing swap-stack with compiler
Implementation-wise, if the language is compiled, it needs a special instruction sequence that does the following things:
- Save live registers on the top of the current stack, and
- set the stack pointer (SP) register to the destination stack, and
- restore the saved registers from the top of the destination stack.
The C and C++ programming languages themselves do not have support for swap-stack. In practice, we usually rely on libraries or compiler extensions to do that in C or C++.
Here I give two examples of implementations of swap-stack for compiled code.
- One is from Boost.Context. It is implemented as a library in assembly language, therefore it has to be platform-specific. Here are the code for x86-64, ARM64 and RISCV64. Because it is implemented as a library, it can only depend on the application binary interface (ABI) of the platform. In this case, the assembly code has to conservatively save all callee-saved registers, whether they are still in use or not, because as a library it does not have the liveness information the compiler has.
- The other is an LLVM extension created by Dolan et al. As part of a compiler framework, it can identify and save only live registers, making it much more efficient than library-based approaches.
Implementing swap-stack with interpreter
If the language is interpreted, it depends. An interpreter can be stackful or stackless, and even stackless interpreters can allow the interpreted functions to call foreign C functions.
A stackful interpreter uses the native (C) stack to implement function invocation. Such interpreters usually have the following form:
void interpret_function(Frame *frame) {
while (true) {
switch (current_instruction().type()) {
case ...:
...
case CALL:
...
Frame *new_frame = create_frame(called_function);
interpret_function(new_frame); // Recursive call
...
}
}
}
Because a language-level function call corresponds to an interpreter-level (C-level) function call, each frame in the interpreted language corresponds to a C frame. It is difficult to implement swap-stack with a stackful interpreter, because swapping out the interpreted language’s stack means swapping out the C stack (which holds C frames). As we have discussed before, C does not have native support for swap-stack, and needs libraries written in assembly language or compiler extensions to do so.
What about stackless interpreters?
A stackless interpreter does not turn language-level function calls into C-level function calls. A stackless interpreter usually has the following form:
void interpret_function(Frame *initial_frame) {
Frame *current_frame = initial_frame;
while (true) {
switch (current_instruction().type()) {
case ...:
...
case CALL:
...
Frame *new_frame = create_frame(called_function);
new_frame->parent = current_frame;
current_frame = new_frame; // Only replace the frame pointer
...
}
}
}
A stackless interpreter always remains in the single interpret_function activation even when the interpreted program makes a call. Swap-stack is relatively easy to implement with a stackless interpreter, because it does not need to swap out any C frames…
… unless it allows foreign function calls. If the stackless interpreter allows the interpreted language to call foreign C functions, then C functions must have frames on some stack. Then we face the same problem as implementing swap-stack for compiled languages.
The Mu micro virtual machine
I designed the Mu micro virtual machine, and it is the main part of my PhD thesis. Swap-stack is a very important mechanism of the Mu micro VM, and it is designed to be supported by the JIT compiler. It enables the implementation of symmetric stackful coroutines, and it is the foundation of other VM mechanisms, such as trapping and on-stack replacement (OSR). If you are interested, read Section 5.3.6 of my thesis.
Decomposing a function into a state machine
Interpreters usually have no problem saving the frame of a function at a yield
point so that it can be resumed later. The interpreter can implement the layout
of stack frames and the behaviour of function calls / coroutine resumption in
any way it wants. They may even allocate frames in the heap so that they can
temporarily remove a frame from the stack and put it back later. For compilers,
if swap-stack is available, one thread can just save the register states on one
stack and restore them from another stack. Without swap-stack, however, it may
be a challenge.
One way to implement pause-able and resume-able functions is to decompose a function into a state machine. Each yield point becomes a state, and the function starts by matching the state and jumping to the right place.
For example, assume the yield() call represents a coroutine yield in the following C pseudo-code:
void foo() {
printf("Long\n");
yield();
printf("time\n");
yield();
printf("no\n");
yield();
printf("see\n");
return;
}
Such a function can be transformed into a function that takes a state when called, and returns a new state when yielding or returning.
#include <stdio.h>
enum State {
START, STATE1, STATE2, STATE3, END
};
enum State foo(enum State state) {
switch(state) {
case START:
printf("Long\n");
return STATE1;
case STATE1:
printf("time\n");
return STATE2;
case STATE2:
printf("no\n");
return STATE3;
case STATE3:
printf("see\n");
return END;
}
}
int main() {
enum State state = START;
while (state != END) {
printf("Resuming foo()...\n");
state = foo(state);
printf("foo() paused\n");
}
}
What about local variables? Local variables can be packed into the states, too.
C programmers may use the union type, but it is easier with tagged unions or object-oriented programming.
Suppose we have a (pseudo) Rust function where yield!() represents a coroutine yield.
fn square_sum(a: i32, b: i32) -> Future<i32> {
let a2 = a * a;
println!("a * a = {}", a2);
yield!();
let b2 = b * b;
println!("b * b = {}", b2);
yield!();
let result = a2 + b2;
println!("result = {}", result);
return result;
}
We can use an enum to hold the live local variables (those that will be used later) at each yield!() point.
/// Note: each state holds the live local variables.
enum State {
Start { a: i32, b: i32 },
State1 { b: i32, a2: i32 }, // Note: a is no longer useful.
State2 { a2: i32, b2: i32 }, // Note: neither a nor b are useful now.
End { result: i32 },
}
/// Calling this function merely gets the initial state.
/// It doesn't actually execute the body of the original function.
fn square_sum(a: i32, b: i32) -> State {
State::Start { a, b }
}
/// Call this for each step.
fn square_sum_step(state: State) -> State {
match state {
State::Start { a, b } => { // Restore local variables from the state.
let a2 = a * a;
println!("a * a = {}", a2);
State::State1 { b, a2 } // Save useful local variables into state.
}
State::State1 { b, a2 } => { // restore
let b2 = b * b;
println!("b * b = {}", b2);
State::State2 { a2, b2 } // save
}
State::State2 { a2, b2 } => { // restore
let result = a2 + b2;
println!("result = {}", result);
State::End { result } // save
}
State::End { .. } => {
panic!("Coroutine already finished!");
}
}
}
fn main() {
let mut state = square_sum(3, 4);
loop {
match state {
State::End { result } => {
println!("Execution finished. Result is {}", result);
break;
}
_ => {
println!("Resuming...");
state = square_sum_step(state);
println!("Yielded.");
}
}
}
}
Asynchronous programming (async/await) and coroutines
In asynchronous programming, a program consists of many tasks that can be completed in the future, and one task can wait for other tasks to complete before continuing. There are many ways to implement asynchronous programming. It can be trivially executed inline (e.g. the X10 compiler is allowed to inline an async activity), executed sequentially, using threads, or using coroutines.
The notions of “Future” and its companion “Promise” are well-known in multi-threaded programming (C++, Java, C# and Python). A pair of Future and Promise represents a value yet to be produced. The Future waits for the value to be produced, and the Promise is a place to store the value to be acquired via the Future. C++ even has the std::async function in the standard library to launch an asynchronous task in a new thread.
Recently, many programming languages have adopted a style of asynchronous programming based on coroutines in the form of async/await. I guess the reason behind its gaining popularity is twofold:
- Native, OS-provided threads are too heavy-weight, but not many programming languages support light-weight “M × N” green threads, that is, M OS threads are multiplexed to run N application-level threads with N ≫ M. AFAIK, only Erlang and Go support such light-weight threads.
- Not many languages support stackful coroutines. As we discussed before, stackful coroutines are only practical with swap-stack. Some languages (such as Kotlin) target runtimes (such as the JVM) that don’t support swap-stack.
As a compromise, some languages resorted to coroutines: they implement cooperative multi-tasking using coroutines that yield when they are about to block, plus a scheduler that decides which coroutine can continue without blocking. Hence async/await.
A function can be annotated with the async keyword. An async function is like a Python generator. When called, it doesn’t execute the body of the function immediately, but creates an object that holds the execution context of the function, like a frame. An async function may contain await expressions. An await is like a conditional yield. If a given Future is ready, the coroutine grabs the value and continues; otherwise, it suspends and gives control to the scheduler so that it can find something else to execute. Async and await give the programmer the feeling of multi-threaded programming, except that the programmer must explicitly annotate each place that may potentially yield with await.
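As a minimal sketch (using Python's asyncio here, rather than the JavaScript of the earlier example): two coroutines communicate over a bounded queue, and every await is a point where a coroutine may suspend and hand control to the scheduler.

```python
import asyncio

async def produce(queue):
    for value in (1, 2, 3):
        await queue.put(value)  # may suspend if the queue is full
    await queue.put(None)       # sentinel: tell the consumer we are done

async def consume(queue):
    results = []
    while True:
        value = await queue.get()  # suspends until a value is available
        if value is None:
            return results
        results.append(value)

async def main():
    # A bounded queue forces both coroutines to actually suspend at "await",
    # so the scheduler keeps switching between producer and consumer.
    queue = asyncio.Queue(maxsize=1)
    results, _ = await asyncio.gather(consume(queue), produce(queue))
    return results

print(asyncio.run(main()))  # [1, 2, 3]
```

It reads like two threads talking through a channel, yet everything runs on one thread, with switches only at the annotated await points.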
You can find an async/await example in JavaScript earlier in this post. It looks pretty like two threads communicating with each other using a channel.
Consequence of being stackless
Without proper swap-stack support, the compiler has to implement coroutines by decomposing async functions into state machines. await expressions are places where the function may yield, and each await represents a state in the state machine.
However, async/await is not the only way to implement cooperative multi-tasking on top of coroutines. gevent is a Python framework based on greenlets, which implement symmetric coroutines. With the ability to switch coroutines at any level of the stack, each coroutine can yield to the scheduler inside potentially blocking functions (such as sleeping, IO operations, etc.), and programmers do not need to annotate any expression with await.