An SL file holds a library of Swanson code that has already been translated into S₀. The SL file uses a binary format that should be very easy to parse, making it useful for “bootstrap” code.
The bulk of the Swanson standard library is implemented in S₁, and the translators for other languages will typically treat S₁ as the “simplest” language in which they can be implemented. But we don’t want every host to have to implement an S₁ parser and translator directly. Instead, we use a single “bootstrap host” to translate that S₁ code into S₀ and write it out into SL files. Since the standard library includes an S₁ translator implemented in S₁ itself, all other hosts can get by with loading SL files: by loading the standard library from the bootstrapped SL files, they get an S₁ translator for free.
This document assumes familiarity with the Swanson execution model and the S₀ language.
An SL file represents a Swanson library, and contains one or more units. Each unit is implemented as an S₀ module.
An SL file consists of three sections:
a header section
a binaries section
a modules section
All integer values in an SL file, other than the fixed-width fields of the header section, are encoded using a variable-length encoding.
The first byte includes a prefix indicating the encoded length of this particular integer. The length prefix is a sequence of 0 bits followed by a 1 bit. The number of 0 bits indicates the number of additional bytes (not including the byte containing the length prefix). All remaining bits in the first byte, along with all bits in any additional bytes, provide the value of the integer, encoded in big-endian order.
Some examples:
0x80 => 1_0000000 => 0
Length prefix of 1 means no additional bytes. Remaining bits 0000000 encode the number 0.
0xff => 1_1111111 => 127
Length prefix of 1 means no additional bytes. Remaining bits 1111111 encode the number 127.
0x40 0x80 => 01_000000 10000000 => 128
Length prefix of 01 means one additional byte. Remaining bits 000000 10000000 encode the number 128.
0x20 0xc3 0x50 => 001_00000 11000011 01010000 => 50,000
Length prefix of 001 means two additional bytes. Remaining bits 00000 11000011 01010000 encode the number 50,000.
When decoding, this scheme has the nice property that you can determine the number of bytes needed for an integer using a single “count leading zeros” operation, which is available as a single instruction on most modern CPUs and exposed as an intrinsic in most host languages.
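As a rough illustration of the encoding described above, the following Python sketch decodes and encodes these integers. The function names decode_varint and encode_varint are hypothetical and not part of any Swanson API. The sketch makes two assumptions that the text does not state: an encoder picks the shortest possible encoding, and a first byte of 0x00 (a prefix with no 1 bit) is rejected.

```python
def decode_varint(buf, pos=0):
    """Decode one integer starting at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    if first == 0:
        # Assumption: a prefix with no terminating 1 bit is treated as invalid.
        raise ValueError("length prefix has no terminating 1 bit")
    # The number of leading 0 bits in the first byte is the number of
    # additional bytes ("count leading zeros" on an 8-bit value).
    extra = 8 - first.bit_length()
    end = pos + 1 + extra
    if end > len(buf):
        raise ValueError("truncated integer")
    # Mask off the length prefix, keeping the remaining value bits.
    value = first & (0x7F >> extra)
    for byte in buf[pos + 1:end]:
        value = (value << 8) | byte  # big-endian accumulation
    return value, end


def encode_varint(value):
    """Encode a non-negative integer using the shortest encoding (assumed)."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    for extra in range(8):
        value_bits = 7 * (extra + 1)  # (7 - extra) bits + 8 * extra bits
        if value < (1 << value_bits):
            marker = 1 << value_bits  # the 1 bit that ends the length prefix
            return (marker | value).to_bytes(extra + 1, "big")
    raise ValueError("value needs more than 56 bits")
```

With these definitions, decode_varint(bytes([0x20, 0xC3, 0x50])) yields the value 50000 and the position 3 of the next unread byte, matching the last example above.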
The header section identifies this file as an SL file, and contains pointers to the other sections in the file.
header section {
  magic number [uint32_be]
  version [uint32_be]
}
The magic number is the four-byte big-endian constant 0x534C4942, which is the same as the ASCII string “SLIB”.
The version is the four-byte big-endian constant 0x00000003, indicating that this is version 3 of the SL file format.
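A minimal sketch of reading the header in Python follows. The name read_header is hypothetical. The function only validates the magic number and returns the version, leaving the version comparison to the caller. It also assumes that the next section starts immediately after these eight bytes, which the layout above suggests but does not state explicitly.

```python
import struct

SL_MAGIC = 0x534C4942  # the ASCII string "SLIB"


def read_header(buf):
    """Validate the magic number; return (version, offset_after_header)."""
    if len(buf) < 8:
        raise ValueError("truncated header")
    # Two unsigned 32-bit big-endian integers: magic number, then version.
    magic, version = struct.unpack_from(">II", buf, 0)
    if magic != SL_MAGIC:
        raise ValueError("not an SL file")
    return version, 8
```

A caller could then compare the returned version against the format versions it knows how to parse before reading any further.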
The binaries section contains all of the binary constants used throughout the rest of the file. Binary constants are used for S₀ names, and for the value of any Swanson literals created by an S₀ “create literal” statement.
binaries section {
  binary count [varint]
  binary constants [array of constant]
}
constant {
  length [varint]
  content [bytes]
}
The binary count field specifies how many binary constants there are in the section. Each constant then appears consecutively. Each constant starts with a length field indicating how long the constant’s content is. The content then follows. The constant is not encoded in any way; its binary content is included in the file verbatim.
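The binaries section can be read with a simple loop, sketched below in Python. The names read_varint and read_binaries are hypothetical. A compact integer decoder is included so that the example is self-contained, and it assumes that the length field counts bytes.

```python
def read_varint(buf, pos):
    """Decode one variable-length integer; return (value, next_pos)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("invalid length prefix")  # assumption: rejected
    extra = 8 - first.bit_length()  # leading 0 bits = additional bytes
    end = pos + 1 + extra
    if end > len(buf):
        raise ValueError("truncated integer")
    value = first & (0x7F >> extra)
    for byte in buf[pos + 1:end]:
        value = (value << 8) | byte
    return value, end


def read_binaries(buf, pos):
    """Read the binaries section; return (list_of_constants, next_pos)."""
    count, pos = read_varint(buf, pos)
    constants = []
    for _ in range(count):
        length, pos = read_varint(buf, pos)
        if pos + length > len(buf):
            raise ValueError("truncated binary constant")
        # The content is stored verbatim, so it is copied out unchanged.
        constants.append(bytes(buf[pos:pos + length]))
        pos += length
    return constants, pos
```

Other parts of the file refer to these constants by their index, so keeping them in a list in file order is enough for later lookups.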
“Names” are used throughout the modules section. Each name is annotated with the source file location where the name appears. Names all have the same structure:
name {
  content [varint]
  location [source location]
}
source location {
  source file [varint]
  start line [varint]
  start column [varint]
  end line [varint]
  end column [varint]
}
The content and source file fields are each the index of one of the binary constants in the binaries section. The content field’s constant gives the content of the name. The source file field’s constant gives the name of the source file where the name appears. The start line, start column, end line, and end column fields give the location of the name within source file. Each of these fields is 0-indexed. The end fields should point at the character immediately following the name in its source file. (That means that if the name appears on a single line, with start line and end line being equal, then subtracting start column from end column will give you the length of the source file syntax that the name comes from.)
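The name and source location layouts can be sketched in Python as follows. The types SourceLocation and Name and the functions read_source_location, read_name, and single_line_length are hypothetical names chosen for this illustration. A compact integer decoder is repeated so that the example is self-contained.

```python
from typing import NamedTuple


class SourceLocation(NamedTuple):
    source_file: int   # index of a binary constant
    start_line: int    # all line and column fields are 0-indexed
    start_column: int
    end_line: int
    end_column: int    # points just past the last character


class Name(NamedTuple):
    content: int       # index of a binary constant
    location: SourceLocation


def read_varint(buf, pos):
    first = buf[pos]
    if first == 0:
        raise ValueError("invalid length prefix")  # assumption: rejected
    extra = 8 - first.bit_length()
    end = pos + 1 + extra
    if end > len(buf):
        raise ValueError("truncated integer")
    value = first & (0x7F >> extra)
    for byte in buf[pos + 1:end]:
        value = (value << 8) | byte
    return value, end


def read_source_location(buf, pos):
    fields = []
    for _ in range(5):
        value, pos = read_varint(buf, pos)
        fields.append(value)
    return SourceLocation(*fields), pos


def read_name(buf, pos):
    content, pos = read_varint(buf, pos)
    location, pos = read_source_location(buf, pos)
    return Name(content, location), pos


def single_line_length(location):
    """Length of the source text when it sits on one line, else None."""
    if location.start_line != location.end_line:
        return None
    return location.end_column - location.start_column
```

Because the end fields are exclusive, the single-line length needs no off-by-one adjustment.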
Several parts of a module include a globbed list, which consists of a list of names, along with an optional glob. Like names, globs are annotated with the source file location where the glob appears.
glob {
  present [u8 = "*"]
  location [source location]
}
missing glob {
  missing [u8 = " "]
}
optional glob = glob | missing glob
globbed list {
  name count [varint]
  names [array of name]
  glob [optional glob]
}
The name count field specifies how many elements there are in the names field.
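A globbed list can be read by decoding the names and then inspecting the one-byte glob marker, as in the Python sketch below. The function names are hypothetical. For brevity, names and source locations are represented as plain tuples of integers (six fields for a name, five for a location), and any marker byte other than “*” or a space is treated as an error, which is an assumption.

```python
def read_varint(buf, pos):
    first = buf[pos]
    if first == 0:
        raise ValueError("invalid length prefix")  # assumption: rejected
    extra = 8 - first.bit_length()
    end = pos + 1 + extra
    if end > len(buf):
        raise ValueError("truncated integer")
    value = first & (0x7F >> extra)
    for byte in buf[pos + 1:end]:
        value = (value << 8) | byte
    return value, end


def read_varints(buf, pos, n):
    """Read n consecutive integers; return (tuple_of_values, next_pos)."""
    values = []
    for _ in range(n):
        value, pos = read_varint(buf, pos)
        values.append(value)
    return tuple(values), pos


def read_globbed_list(buf, pos):
    """Return (names, glob_location_or_None, next_pos)."""
    count, pos = read_varint(buf, pos)
    names = []
    for _ in range(count):
        # A name is its content index followed by five location fields.
        name, pos = read_varints(buf, pos, 6)
        names.append(name)
    marker, pos = buf[pos], pos + 1
    if marker == ord("*"):
        glob, pos = read_varints(buf, pos, 5)  # the glob's source location
    elif marker == ord(" "):
        glob = None                            # no glob present
    else:
        raise ValueError("unexpected glob marker")
    return names, glob, pos
```

Returning None for a missing glob lets callers distinguish “no glob” from a glob whose location fields all happen to be zero.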
The modules section provides the definition of each S₀ module in the file.
modules section {
  module count [varint]
  modules [array of module]
}
The module count field specifies how many modules there are in the file. Each module then appears consecutively.
module {
  module name [name]
  block count [varint]
  blocks [array of block]
}
The block count field specifies how many blocks there are in the module. Each block then appears consecutively.
block {
  block name [name]
  containing [globbed list]
  branch count [varint]
  branches [array of branch]
}
Each block starts with its name and its containing clause, which is a globbed list. After the containing clause is the block’s list of branches. The branch count field specifies how many branches there are in the block. Each branch then appears consecutively.
branch {
  branch name [name]
  receiving [globbed list]
  statement [array of statement]
  invocation [invocation]
}
Each branch starts with its name and its receiving clause, which is a globbed list. After the receiving clause is the list of statements in the branch, followed by the branch’s invocation. Each kind of statement, and the invocation, have different formats.
statement = create closure | create literal | rename
create closure {
  code [u8 = "C"]
  dest [name]
  block [varint]
  close over [globbed list]
}
The one-byte code field has the value 0x43 (ASCII “C”) for a create closure statement. The block field is the index of one of the blocks in this module. The close-over clause is a globbed list.
create literal {
  code [u8 = "L"]
  dest [name]
  content [varint]
  location [source location]
}
The one-byte code field has the value 0x4C (ASCII “L”) for a create literal statement. The content field specifies the content of the new literal. It is an index of one of the binary constants in the binaries section. (Note that like names, the literal content is annotated with information about its location within a source file.)
rename {
  code [u8 = "R"]
  dest [name]
  source [name]
}
The one-byte code field has the value 0x52 (ASCII “R”) for a rename statement.
invocation {
  code [u8 = "I"]
  target [name]
  branch [name]
  inputs [globbed list]
}
The one-byte code field has the value 0x49 (ASCII “I”) for an invocation. The invocation’s inputs are a globbed list.
The invocation is the last portion of a branch. The next branch in the block immediately follows the invocation. (If there are no more branches in the block, the next block in the module immediately follows. If there are no more blocks in the module, the next module in the file immediately follows. If there are no more modules in the file, no more content appears in the file.)
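The branch layout shows no statement count, so one plausible reading is that a parser keeps reading statements and dispatches on each one-byte code until it meets the “I” of the invocation. The Python sketch below follows that assumption. All function names are hypothetical, and names, source locations, and parsed statements are represented as plain tuples for brevity.

```python
def read_varint(buf, pos):
    first = buf[pos]
    if first == 0:
        raise ValueError("invalid length prefix")  # assumption: rejected
    extra = 8 - first.bit_length()
    end = pos + 1 + extra
    if end > len(buf):
        raise ValueError("truncated integer")
    value = first & (0x7F >> extra)
    for byte in buf[pos + 1:end]:
        value = (value << 8) | byte
    return value, end


def read_varints(buf, pos, n):
    values = []
    for _ in range(n):
        value, pos = read_varint(buf, pos)
        values.append(value)
    return tuple(values), pos


def read_name(buf, pos):
    # Content index followed by the five source location fields.
    return read_varints(buf, pos, 6)


def read_globbed_list(buf, pos):
    count, pos = read_varint(buf, pos)
    names = []
    for _ in range(count):
        name, pos = read_name(buf, pos)
        names.append(name)
    marker, pos = buf[pos], pos + 1
    if marker == ord("*"):
        glob, pos = read_varints(buf, pos, 5)
    elif marker == ord(" "):
        glob = None
    else:
        raise ValueError("unexpected glob marker")
    return (names, glob), pos


def read_branch(buf, pos):
    """Return ((name, receiving, statements, invocation), next_pos)."""
    name, pos = read_name(buf, pos)
    receiving, pos = read_globbed_list(buf, pos)
    statements = []
    while True:
        code, pos = chr(buf[pos]), pos + 1
        if code == "C":    # create closure
            dest, pos = read_name(buf, pos)
            block, pos = read_varint(buf, pos)
            close_over, pos = read_globbed_list(buf, pos)
            statements.append(("closure", dest, block, close_over))
        elif code == "L":  # create literal
            dest, pos = read_name(buf, pos)
            content, pos = read_varint(buf, pos)
            location, pos = read_varints(buf, pos, 5)
            statements.append(("literal", dest, content, location))
        elif code == "R":  # rename
            dest, pos = read_name(buf, pos)
            source, pos = read_name(buf, pos)
            statements.append(("rename", dest, source))
        elif code == "I":  # the invocation ends the branch
            target, pos = read_name(buf, pos)
            branch, pos = read_name(buf, pos)
            inputs, pos = read_globbed_list(buf, pos)
            return (name, receiving, statements, (target, branch, inputs)), pos
        else:
            raise ValueError("unknown statement code")
```

The position returned by read_branch is where the next branch, block, or module begins, so a block reader can call it once per entry in its branch count.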
Version 1 was introduced in February 2021. Up until then, S₀ code was always encoded in its human-readable text format. The SL format was created to be easier for hosts to parse.
Version 2 was introduced in January 2022, as part of the work to add explicit inputs and input globs to S₀ invocations.
Version 3 was introduced in March 2022, as part of the work to add globs to the containing and receiving clauses in S₀ blocks and branches.
Version 4 was introduced in June 2022, and adds location information to literals.