Bootstrapping a language

Every Swanson host must be able to execute S₀ code. But S₀ is very low level (we often call it Swanson’s “assembly language”), so writing S₀ code directly is neither easy nor enjoyable. Instead, you’ll want to write your programs in some other, high-level language. But Swanson hosts are not required to know anything about these high-level languages. This leaves us with a conundrum: how do we execute programs written in a high-level language on a Swanson host?

We solve this problem by making translation part of the loading step in Swanson’s execution model. Loading takes the name of a unit and produces code for that unit that the host is able to execute; how it produces that code is up to the loader. If a Swanson unit is implemented in a high-level language, then the loader for that unit translates the code into S₀ as part of the loading process.

As an example, S₁ is a (slightly) higher level language that’s (slightly) more pleasant to program in than S₀. If we can write a translator from S₁ into S₀, then we can load Swanson units that are implemented as S₁ modules. We use the name of the module to find the .s1 file containing the module’s code, use the translator to translate that S₁ code into S₀, and provide that to the Swanson host as the loaded value for the unit.
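The loading steps for an S₁ module can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `load_unit`, `toy_translate`, and the flat search directory are hypothetical names invented here, and the "translation" is a placeholder that does not reflect real S₁ or S₀ syntax.

```python
from pathlib import Path
from typing import Callable


def load_unit(name: str, search_dir: Path, translate: Callable[[str], str]) -> str:
    """Hypothetical loader: find <name>.s1, translate it, return the S0 code."""
    source_path = search_dir / f"{name}.s1"  # the module name picks the .s1 file
    s1_source = source_path.read_text(encoding="utf-8")
    return translate(s1_source)  # the host only ever receives the S0 result


def toy_translate(s1_source: str) -> str:
    """Stand-in for an S1 -> S0 translator; NOT real S1 or S0 syntax."""
    return "\n".join(f"s0: {line}" for line in s1_source.splitlines())
```

The translator is passed in as a parameter so that the loader itself stays independent of any particular high-level language.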

This process can be recursive! You could implement a translator for another, even higher level language (for instance, a hypothetical S₂). However, this translator need not translate S₂ directly into S₀. Instead, it could translate S₂ into S₁, and then rely on the S₁ translator to translate that into S₀.
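One way to picture this chaining is a table that maps each language to the language its translator targets. The sketch below is an assumption-laden toy: the `TRANSLATORS` registry, the language keys, and the string-wrapping "translators" are all hypothetical and only show how a chain of translations eventually bottoms out at S₀.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]

# Hypothetical registry: source language -> (target language, translator).
# The lambdas merely wrap the input so that the chain is visible in the result.
TRANSLATORS: Dict[str, Tuple[str, Translator]] = {
    "s2": ("s1", lambda code: f"s1({code})"),
    "s1": ("s0", lambda code: f"s0({code})"),
}


def translate_to_s0(language: str, code: str) -> str:
    """Apply translators one after another until the code is S0."""
    while language != "s0":
        target, translate = TRANSLATORS[language]
        code = translate(code)
        language = target
    return code


print(translate_to_s0("s2", "x"))  # prints "s0(s1(x))"
```

Adding an even higher level language would only require one more registry entry that targets any language already in the table.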

All of this works, but it still leaves one problem unsolved: it doesn’t let you self-host a high-level language. That is, you cannot write the translator for S₁ in S₁ itself. When loading an S₁ module, the process described above depends on the S₁ translator already being available for the Swanson host to execute. If the S₁ translator is written in S₁, then we have a chicken-and-egg problem: we need the S₁ translator to already be translated into S₀ so that we can execute it, but the only thing that can translate it is the S₁ translator itself!
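The cycle can be made concrete by recording, for each unit, which translator unit must be loaded before it. The following sketch uses hypothetical unit names (`app`, `s1_translator`) and a hypothetical `load_order` helper; `None` marks a unit that is already available as S₀.

```python
from typing import Dict, List, Optional, Tuple

# unit name -> translator unit needed to load it (None = already S0)
SELF_HOSTED: Dict[str, Optional[str]] = {
    "app": "s1_translator",
    "s1_translator": "s1_translator",  # written in S1, so it needs itself
}
BOOTSTRAPPED: Dict[str, Optional[str]] = {
    "app": "s1_translator",
    "s1_translator": None,  # provided to the host as S0
}


def load_order(unit: str, needs: Dict[str, Optional[str]],
               in_progress: Tuple[str, ...] = ()) -> List[str]:
    """Return the units in the order they must be loaded, or fail on a cycle."""
    if unit in in_progress:
        raise RuntimeError("cycle: " + " -> ".join(in_progress + (unit,)))
    translator = needs[unit]
    if translator is None:
        return [unit]
    return load_order(translator, needs, in_progress + (unit,)) + [unit]
```

With `SELF_HOSTED`, `load_order("app", SELF_HOSTED)` raises a `RuntimeError` because the translator depends on itself. With `BOOTSTRAPPED`, the same call returns `["s1_translator", "app"]`, since the translator is already in a form the host can execute.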

To solve this, we rely on bootstrapping, in which you implement the translator for your language twice. The “real” implementation, which most Swanson hosts will use, is self-hosted and implemented in the language itself. But we must also provide a separate “bootstrap” implementation written in some other language. (It doesn’t matter which language, as long as you can break the cycle of translation dependencies.)

For S₁, we have written a bootstrap translator in Python. It takes in all of the S₁ code that is needed to implement the self-hosted S₁ translator, and translates it into S₀. Other Swanson hosts load this translated S₀ code directly. From the host’s point of view, the S₁ translator isn’t self-hosted at all; it’s as if we had implemented it by hand in S₀. We chose Python for the S₁ bootstrap translator because the Python interpreter is widely available. The bootstrap translator has no dependencies on external Python libraries, and we even kept the whole implementation in a single file. This makes it trivial to run the bootstrap translation process when you need to.
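The ahead-of-time nature of this step can be sketched as a small batch job. This is not the actual bootstrap translator: the function name `bootstrap_translate_all`, the directory layout, and the `.s0` output extension are assumptions made for illustration, and the translation itself is supplied by the caller.

```python
from pathlib import Path
from typing import Callable, List


def bootstrap_translate_all(src_dir: Path, out_dir: Path,
                            translate: Callable[[str], str]) -> List[Path]:
    """Translate every .s1 file in src_dir and write the results to out_dir.

    Hypothetical sketch: the .s0 extension and flat layout are assumptions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for s1_path in sorted(src_dir.glob("*.s1")):
        s0_path = out_dir / (s1_path.stem + ".s0")
        s0_code = translate(s1_path.read_text(encoding="utf-8"))
        s0_path.write_text(s0_code, encoding="utf-8")
        written.append(s0_path)
    return written
```

Because the sketch uses only the Python standard library and fits in one file, it mirrors the property described above: running it requires nothing beyond a Python interpreter.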

This chicken-and-egg problem isn’t limited to the S₁ → S₀ translator; it applies to any self-hosted language. When designing a new language (or adding support for an existing language), you have a choice: implement your translator twice, so that one of the implementations can be self-hosted, or avoid the temptation to self-host your translator.