SBL file format

An SBL file holds a library of Swanson code that was written in, or has already been translated into, S₀. The SBL file uses a binary format that is easy to parse, making it useful for “bootstrap” code.

This document assumes familiarity with the Swanson execution model and the S₀ language.

Overview

An SBL file represents a Swanson library, and contains one or more units. Each unit is implemented as an S₀ module.

An SBL file consists of two sections:

  • a header section

  • a modules section

Header section

The header section identifies this file as an SBL file.

header section {
  magic number [u16be]
  version [u16be]
}

The magic number is the 2-byte big-endian constant 0x5342, which is the same as the ASCII string “SB”.

The version is the 2-byte big-endian constant 0x3031, which is the same as the ASCII string “01”, indicating that this is version 1 of the SBL file format.
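As an illustration, the header check described above could be sketched in Python as follows. The function name `parse_header` and its error messages are hypothetical and not part of the format; the two constants come directly from the text.

```python
import struct

MAGIC = 0x5342    # ASCII "SB"
VERSION = 0x3031  # ASCII "01", i.e. version 1 of the format


def parse_header(data: bytes) -> int:
    """Validate the 4-byte header and return the offset of the modules section.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if len(data) < 4:
        raise ValueError("truncated SBL header")
    # ">HH" reads two unsigned 16-bit big-endian integers.
    magic, version = struct.unpack_from(">HH", data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic number: {magic:#06x}")
    if version != VERSION:
        raise ValueError(f"unsupported version: {version:#06x}")
    return 4
```

A reader would call this first and continue parsing the modules section at the returned offset.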

Modules section

The modules section provides the definition of each S₀ module in the file.

modules section {
  module count [u16be]
  modules [array of module]
}

The module count field is a 16-bit big-endian integer that specifies how many modules there are in the file. (An SBL file can therefore contain at most 65,535 modules.) The modules then follow consecutively.

module {
  module name length [u16be]
  module name [bytes]
  branch count [u8]
  branches [array of branch]
}

A module consists of a name and a list of branches. The module name length field is a 16-bit big-endian integer that specifies the length of the module’s name; the binary content of the name immediately follows. The branch count field specifies how many branches there are in the module. Each branch then appears consecutively.
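A minimal sketch of reading the fixed-layout fields at the start of the modules section and of a module is shown below. The helper names are hypothetical. Note that the sketch stops at the branch count: since branches are variable-length, finding where the next module begins requires parsing every branch of the current one.

```python
import struct
from typing import Tuple


def read_module_count(data: bytes, offset: int) -> Tuple[int, int]:
    """Read the u16be module count; return (count, next offset)."""
    (count,) = struct.unpack_from(">H", data, offset)
    return count, offset + 2


def read_module_prefix(data: bytes, offset: int) -> Tuple[bytes, int, int]:
    """Read a module's name and branch count.

    Returns (name, branch count, offset of the first branch).
    Hypothetical helper; the branches themselves are not parsed here.
    """
    (name_len,) = struct.unpack_from(">H", data, offset)
    offset += 2
    name = data[offset:offset + name_len]
    if len(name) != name_len:
        raise ValueError("truncated module name")
    offset += name_len
    if offset >= len(data):
        raise ValueError("missing branch count")
    branch_count = data[offset]  # u8
    return name, branch_count, offset + 1
```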

branch {
  branch name [branch name]
  instruction count [u16be]
  instructions [array of instruction]
  invocation [invoke instruction]
}

branch name {
  name length [u8]
  name [bytes]
}

Each branch consists of a name, whose length is encoded as a 1-byte integer, a list of instructions, and one invocation. The instruction count field specifies how many instructions there are in the branch, and does not include the invocation. Each instruction then appears consecutively, followed by the invocation.
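To make the layout concrete, here is a sketch of a branch encoder. It assumes that the instructions and the invocation have already been encoded to bytes by the caller; the function names are hypothetical. The point it illustrates is that the instruction count covers only the regular instructions, not the trailing invocation.

```python
import struct
from typing import Sequence


def encode_branch_name(name: bytes) -> bytes:
    """Encode a branch name as a u8 length followed by the name bytes."""
    if len(name) > 0xFF:
        raise ValueError("branch name longer than 255 bytes")
    return bytes([len(name)]) + name


def encode_branch(name: bytes, instructions: Sequence[bytes], invocation: bytes) -> bytes:
    """Encode one branch from pre-encoded instructions (hypothetical helper)."""
    if len(instructions) > 0xFFFF:
        raise ValueError("too many instructions for a u16be count")
    out = bytearray(encode_branch_name(name))
    # The count excludes the invocation that closes the branch.
    out += struct.pack(">H", len(instructions))
    for instruction in instructions:
        out += instruction
    out += invocation
    return bytes(out)
```

For example, a branch named "body" with a nil and a newtag instruction, ending in an invocation of "done", encodes as the name, a count of 2, the bytes "N" and "T", and then the invocation.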

instruction = bytes instruction
            | newtag instruction
            | droptag instruction
            | lock instruction
            | unlock instruction
            | nil instruction
            | dropnil instruction
            | cons instruction
            | uncons instruction
            | move instruction
            | quote instruction
            | enclosed quote instruction
            | upquote instruction
            | enclosed upquote instruction

bytes instruction {
  code [u8 = "B"]
  length [u32be]
  content [bytes]
}

newtag instruction {
  code [u8 = "T"]
}

droptag instruction {
  code [u8 = "t"]
}

lock instruction {
  code [u8 = "/"]
}

unlock instruction {
  code [u8 = "\"]
}

nil instruction {
  code [u8 = "N"]
}

dropnil instruction {
  code [u8 = "n"]
}

cons instruction {
  code [u8 = "C"]
}

uncons instruction {
  code [u8 = "c"]
}

move instruction {
  source and target stacks [u8]
}

quote instruction {
  code [u8 = "q"]
  name length [u8]
  name [bytes]
  branch count [u8]
  branches [array of branch]
}

enclosed quote instruction {
  code [u8 = "Q"]
  name length [u8]
  name [bytes]
  branch count [u8]
  branches [array of branch]
}

upquote instruction {
  code [u8 = "%"]
  depth [u8]
}

enclosed upquote instruction {
  code [u8 = "^"]
  depth [u8]
}

invoke instruction {
  code [u8 = "i"]
  label [branch name]
}

Each instruction begins with a 1-byte code that identifies the instruction type; some instructions carry additional fields after the code. Most instructions operate on stack 0.
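The single-character codes listed in the definitions above can be collected into a lookup table, sketched here in Python. This assumes each code is the ASCII value of the quoted character, which the format does not state explicitly. The table and function names are hypothetical, and move instructions are left out because their codes are not characters.

```python
# Instruction codes taken from the definitions above, keyed by byte value.
# Assumption: each quoted character stands for its ASCII byte value.
OPCODES = {
    ord("B"): "bytes",
    ord("T"): "newtag",
    ord("t"): "droptag",
    ord("/"): "lock",
    ord("\\"): "unlock",
    ord("N"): "nil",
    ord("n"): "dropnil",
    ord("C"): "cons",
    ord("c"): "uncons",
    ord("q"): "quote",
    ord("Q"): "enclosed quote",
    ord("%"): "upquote",
    ord("^"): "enclosed upquote",
    ord("i"): "invoke",
}


def instruction_name(code: int) -> str:
    """Return the instruction name for a code byte (hypothetical helper)."""
    try:
        return OPCODES[code]
    except KeyError:
        raise ValueError(f"unknown instruction code: {code:#04x}") from None
```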

The code for a move instruction defines the source and target stacks of the operation:

0xe1  from stack 1 to stack 0
0xe2  from stack 2 to stack 0
0xe3  from stack 3 to stack 0
0xf1  from stack 0 to stack 1
0xf2  from stack 0 to stack 2
0xf3  from stack 0 to stack 3
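The table above maps directly onto a lookup, sketched below. Only the six listed codes are accepted; the sketch deliberately does not generalize from the apparent nibble pattern, since the format does not define any other values. The names `MOVE_CODES` and `decode_move` are hypothetical.

```python
# Move codes from the table above: code byte -> (source stack, target stack).
MOVE_CODES = {
    0xE1: (1, 0),
    0xE2: (2, 0),
    0xE3: (3, 0),
    0xF1: (0, 1),
    0xF2: (0, 2),
    0xF3: (0, 3),
}


def decode_move(code: int):
    """Return (source, target) for a move code byte (hypothetical helper)."""
    if code not in MOVE_CODES:
        raise ValueError(f"not a move instruction code: {code:#04x}")
    return MOVE_CODES[code]
```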

A bytes instruction contains the binary content that is pushed onto the execution stack. The length of the content is encoded as a 32-bit unsigned big-endian integer, so the content can be at most 2³² − 1 bytes long (just under 4 GiB).
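As a sketch, a bytes instruction can be written and read back as follows. The function names are hypothetical, and the code byte is assumed to be the ASCII value of "B".

```python
import struct


def encode_bytes_instruction(content: bytes) -> bytes:
    """Encode a bytes instruction: code "B", u32be length, then the content."""
    if len(content) > 0xFFFFFFFF:
        raise ValueError("content too large for a u32be length")
    return b"B" + struct.pack(">I", len(content)) + content


def decode_bytes_instruction(data: bytes, offset: int):
    """Decode a bytes instruction at offset; return (content, next offset)."""
    if data[offset:offset + 1] != b"B":
        raise ValueError("not a bytes instruction")
    (length,) = struct.unpack_from(">I", data, offset + 1)
    start = offset + 5  # 1 code byte + 4 length bytes
    content = data[start:start + length]
    if len(content) != length:
        raise ValueError("truncated bytes instruction")
    return content, start + length
```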

A move instruction has no fields beyond its code byte; that single 8-bit value determines both the source and target stacks, as listed above.

A quote instruction or enclosed quote instruction contains a name and a list of branches. The name length and branch count fields are each encoded as a 1-byte integer.

An upquote instruction or enclosed upquote instruction contains the depth of the enclosing quotation to upquote, encoded as a 1-byte integer.

A seq instruction also contains a list of branch names. The branch name count field is encoded as a 1-byte integer.

An invoke instruction contains a single branch name, which identifies the branch that control is passed to.
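Because the invocation always closes a branch, a reader needs to decode it to find where the next branch starts. A minimal sketch, with a hypothetical function name and assuming the code byte is the ASCII value of "i":

```python
def decode_invoke(data: bytes, offset: int):
    """Decode an invoke instruction at offset; return (label, next offset)."""
    if data[offset:offset + 1] != b"i":
        raise ValueError("not an invoke instruction")
    length_pos = offset + 1
    if length_pos >= len(data):
        raise ValueError("truncated invoke instruction")
    name_len = data[length_pos]  # u8 length of the branch name
    start = length_pos + 1
    label = data[start:start + name_len]
    if len(label) != name_len:
        raise ValueError("truncated branch name")
    return label, start + name_len
```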

Version history

Version 1 was introduced in February 2023.