VSPL

This is a Very Simple Programming Language designed to demonstrate how
a compiler and interpreter can be implemented.  It can also be used as
a case study to compare different styles of compiler design and as a
language independent benchmark (similar to the BCPL benchmark program
bench.b). This distribution includes several different implementations
of VSPL. All are meant to yield identical results. Currently (1 May
2003) the versions for BCPL, MCPL, C and lex/yacc all work. Other
versions are under (slow) development.

The language is simple but not so simple as to be unrealistic. The aim
was for it to be implementable by a program that was about 20 pages
long. It has a fairly conventional syntax which should be easy to
understand by anyone familiar with BCPL, ML, C, Modula-3 or Java.
Like BCPL, it is typeless but has a more restrictive syntax in order
not to give ad hoc recursive descent parsers an unfair advantage over
table-driven compilers. Unlike BCPL the reserved words 'do' and 'then'
are not optional, and semicolons used to separate declarations and
commands are also mandatory.

The language allows both static and local variables and vectors, and
it allows both pointers and indirection. It has variadic recursive
functions and routines, and all these quantities can be passed as
arguments, returned as results, or assigned to variables. Strings use
backslash ('\') as the escape character and are zero terminated (as in
C), and there is an output mechanism similar to printf in C. It also
has a function (sys(n,...)) that allows dynamic control of the
interpreter.

A typical VSPL program is the following:

static count, isprime[1000]

let start() = valof
{ for i = 2 to 999 do isprime[i] := true; // Until proved otherwise.
  count := 0;
  for p = 2 to 999 do if isprime[p] do
  { let i = p*p;
    while i<=999 do { isprime[i] := false; i := i+p };
    out(p)
  };
  resultis 0
}
 
let out(n) be
{ if count mod 10 = 0 do printf("\n");
  printf(" %3d", n);
  count := count+1
}

The BCPL implementation (vspl.b) can be compiled and run (under the
BCPL Cintcode System) by the command:

vspl primes.vs

To aid understanding of how the compiler and interpreter work, it can
trace the output of the lexical analyser, print the parse tree, output
a mnemonic form of the compiled code and trace the execution of the
interpreter. These are illustrated by the following (running in
directory VSPL/bcpl under BCPL Cintcode):

0> type ../hello.vs
let start() = valof
{ printf("Hello world\n");
  resultis 0
}
0> vspl ../hello.vs -l

VSPL (27 May 2000) BCPL Version
token =  43 Let
token =   2 Name       start
token =  49 Lparen
token =  50 Rparen
token =  18 Eq
token =   6 Valof
token =  53 Lcurly
token =  82 Printf
token =  49 Lparen
token =   3 String     "Hello world
"
token =  50 Rparen
token =  61 Semicolon
token =  34 Resultis
token =   1 Num        0
token =  54 Rcurly
10>
0> vspl ../hello.vs -p -c -t

VSPL (27 May 2000) BCPL Version
Parse Tree
Decl
*-Fndef
! *-start
! *-Nil
! *-Valof
!   *-Seq
!     *-Printf  -- line 2
!     ! *-"Hello world
"
!     *-Resultis  -- line 3
!       *-0
*-Nil
Laddr L1
Fnap 3
Halt
String L2 start
Entry L1 L2
Stack 2
String L3 Hello world

Laddr L3
Printf 3
Ln 0
Fnrn
Program size: 15   Data size: 6
p:10006  sp:10008          0          0    100: Laddr    106
p:10006  sp:10009          0        106    102: Fnap     3
p:10009  sp:10011        104        106    106: Stack    2
p:10009  sp:10011        104        106    108: Laddr    10002
p:10009  sp:10012        106      10002    110: Printf   3
Hello world
p:10009  sp:10011        104        106    112: Ln       0
p:10009  sp:10012        106          0    114: Fnrn    
p:10006  sp:10008          0          0    104: Halt    

Instructions executed: 8
10> 

Lexical Analyser

Comments are introduced by // and continue to the end of the line.

The lexical tokens are:

Name       eg  x v1  Tax_rate
Num        eg 123 'A' '\n' #3FF        Hex constants added 9/8/2021
String     eg "Hello\n"

Ind, Not, Mul, Div, Mod, Add, Sub,
Eq, Ne, Lt, Gt, Le, Ge,
Lsh, Rsh, And, Or, Xor

Lparen, Rparen, Lcurly, Rcurly, Lsquare, Rsquare,
Comma, Semicolon,

Static, Let, Be,
Assign, Seq, If, Unless, Do, Test, Then, Else,
While, Until, For, Valof, Resultis, Return,
Printf, Sys, Eof
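
The token classes above can be illustrated by a small tokenizer. The
following Python sketch is not taken from any of the VSPL
implementations. It assumes that reserved words are written in lower
case, that names start with a letter, and that the only escapes are
\n, \\, \" and \'. The names tokenize, KEYWORDS, SYMBOLS and ESCAPES
are hypothetical, '@' is left out because no token for it is listed
above, and the numeric token codes shown in the -l trace are omitted.

```python
import re

# Reserved words mapped to token names taken from the list above.
KEYWORDS = {w: w.capitalize() for w in (
    "static let be if unless do test then else while until for "
    "valof resultis return printf sys mod").split()}

# Longest symbols first so that "<=" is not read as "<" followed by "=".
SYMBOLS = [(":=", "Assign"), ("~=", "Ne"), ("<=", "Le"), (">=", "Ge"),
           ("<<", "Lsh"), (">>", "Rsh"), ("(", "Lparen"), (")", "Rparen"),
           ("{", "Lcurly"), ("}", "Rcurly"), ("[", "Lsquare"),
           ("]", "Rsquare"), (",", "Comma"), (";", "Semicolon"),
           ("=", "Eq"), ("<", "Lt"), (">", "Gt"), ("+", "Add"),
           ("-", "Sub"), ("*", "Mul"), ("/", "Div"), ("&", "And"),
           ("|", "Or"), ("^", "Xor"), ("!", "Ind"), ("~", "Not")]

ESCAPES = {"n": "\n", "\\": "\\", '"': '"', "'": "'"}  # assumed escape set

TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|//[^\n]*)              # white space and // comments
  | (?P<name>[A-Za-z][A-Za-z0-9_]*)     # names and reserved words
  | (?P<hex>\#[0-9A-Fa-f]+)             # hex constants such as #3FF
  | (?P<num>[0-9]+)                     # decimal constants
  | (?P<char>'(?:\\.|[^'\\])')          # character constants such as 'A'
  | (?P<string>"(?:\\.|[^"\\])*")       # string constants
""", re.VERBOSE)


def unescape(text):
    """Replace backslash escapes in the body of a string or character."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(ESCAPES.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def tokenize(src):
    """Return a list of token tuples, ending with ("Eof",)."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m:
            kind, text = m.lastgroup, m.group()
            pos = m.end()
            if kind == "skip":
                continue
            if kind == "name":
                if text in KEYWORDS:
                    tokens.append((KEYWORDS[text],))
                else:
                    tokens.append(("Name", text))
            elif kind == "hex":
                tokens.append(("Num", int(text[1:], 16)))
            elif kind == "num":
                tokens.append(("Num", int(text)))
            elif kind == "char":
                tokens.append(("Num", ord(unescape(text[1:-1]))))
            else:
                tokens.append(("String", unescape(text[1:-1])))
            continue
        for sym, name in SYMBOLS:
            if src.startswith(sym, pos):
                tokens.append((name,))
                pos += len(sym)
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
    tokens.append(("Eof",))
    return tokens
```

Applied to the text of hello.vs, this sketch yields the same sequence
of token names as the -l trace shown earlier, followed by Eof.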


Syntax

At the outermost level there can be static variable, function
and routine declarations. 

Static Declarations

static S1, S2,..., Sn

where each Si is either a name or a vector declaration of the form
N[k], and k is a number specifying the number of elements to allocate.

Function and Routine Declarations

let N(N1,...,Nn) = E
let N(N1,...,Nn) be C

where N, N1,...,Nn are identifiers, E is an expression and C is a command.

All identifiers declared at the outermost level must be distinct and
have the scope of the whole program.  The main function 'start' must
be declared. The scope of the arguments N1,...,Nn of a function or
routine is the body E or C, respectively.  Arguments are called by
value.

Expressions

E -> N                -- a name
     K                -- a number
     S                -- a string
     ( E )            -- parentheses are used for grouping
     N(E1,...,En)     -- function call
     printf(format, E1,...,En) -- formatted output
     sys(E1,...,En)   -- system call
     E1[E2]           -- a subscripted expression
     @ E              -- address of
     ! E              -- indirection
     + E              -- monadic plus
     - E              -- monadic minus
     E * E            -- integer multiplication
     E / E            -- integer division
     E mod E          -- integer remainder after division
     E + E            -- integer addition
     E - E            -- integer subtraction
     E = E            -- equals
     E ~= E           -- not equals
     E <= E           -- less than or equals
     E >= E           -- greater than or equals
     E < E            -- less than
     E > E            -- greater than
     E << E           -- left shift
     E >> E           -- right shift
     E & E            -- bitwise and
     E | E            -- bitwise or
     E ^ E            -- bitwise exclusive or
     valof C          -- Execute C until 'resultis E' encountered

Commands

C -> N := E                    -- assignment
     N(E1,...,En)              -- routine call
     printf(E1,...,En)         -- formatted output
     sys(E1,...,En)            -- system call
     resultis E                -- return from a 'valof' expression
     return                    -- return from a routine
     { D1;...; Dn; C1;...;Ck } -- block
     { C1;...;Ck }             -- compound command
     if E do C                 -- conditional commands
     unless E do C
     test E then C1 else C2
     while E do C              -- repetitive commands
     until E do C
     for N = E1 to E2 do C     -- N is a new local variable

D  -> let N = E                -- local variable declaration
      vec N [ k ]              -- local vector declaration

Abstract Syntax Tree

In what follows, ln denotes a line number.

N  -> [Name, -, <chars>]                       -- Name

NL -> 0 | N | [Comma, N, NL]                   -- Name list

EL -> 0 | E | [Comma, E, EL]                   -- Expression list

E  -> N | [Num, k] | [String, chars] |         -- Expression
      [Fnap, N, EL, ln] | [Printf, EL, ln] | [Sys, EL, ln] |
      [Ind, E] | [Lv, E] | [Vecap, E, E] | [Neg, E] |
      [Mul, E, E] | [Div, E, E] | [Mod, E, E] |
      [Add, E, E] | [Sub, E, E] |
      [Eq, E, E] | [Ne, E, E] |
      [Le, E, E] | [Ge, E, E] |
      [Lt, E, E] | [Gt, E, E] |
      [Lsh, E, E] | [Rsh, E, E] |
      [And, E, E] | [Or, E, E] | [Xor, E, E] |
      [Valof, C]

C  -> [Assign, E, E, ln] | [Rtap, N, EL, ln] |  -- Command
      [Printf, EL, ln] | [Sys, EL, ln] |
      [Resultis, E, ln] | [Return, ln] |
      [If, E, C, ln] | [Unless, E, C, ln] | [Test, E, C, C, ln] |
      [While, E, C, ln] | [Until, E, C, ln] | [For, N, E, E, C, ln] |
      [Let, N, E, C, ln] | [Vec, N, [Num, k], C, ln] | [Seq, C, C]

S  -> N | [Statvec, [Num, k]]                  -- Static declarator

SL -> S | [Comma, S, SL]                       -- Declarator list

D  -> [Static, SL, ln] |                       -- Declaration 
      [Fndef, N, NL, E, ln] | [Rtdef, N, NL, C, ln]

P  -> 0 | [Decl, D, P]                         -- Program
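
As an illustration of how such a tree might be walked, the following
Python sketch translates an expression tree into a postfix sequence of
the stack instructions described in the Interpretive Code section. It
is a sketch under stated assumptions, not the code generator of any
VSPL implementation: tree nodes are modelled as Python lists, only
Num, Neg, Ind and the dyadic operators are handled, and the names
compile_expr and DYADIC are hypothetical.

```python
# Dyadic operator nodes from the abstract syntax above.
DYADIC = {"Mul", "Div", "Mod", "Add", "Sub", "Eq", "Ne", "Le", "Ge",
          "Lt", "Gt", "Lsh", "Rsh", "And", "Or", "Xor"}


def compile_expr(node, code=None):
    """Append instructions for an expression tree to code; return code."""
    if code is None:
        code = []
    op = node[0]
    if op == "Num":
        code.append(f"Ln {node[1]}")      # load a constant
    elif op in ("Neg", "Ind"):
        compile_expr(node[1], code)       # operand first, then operator
        code.append(op)
    elif op in DYADIC:
        compile_expr(node[1], code)       # left operand is pushed first
        compile_expr(node[2], code)       # right operand ends up on top
        code.append(op)
    else:
        raise NotImplementedError(op)     # names, calls etc. not sketched
    return code


# The tree for 1 + 2*3
tree = ["Add", ["Num", 1], ["Mul", ["Num", 2], ["Num", 3]]]
print(compile_expr(tree))
```

Pushing the left operand first matches the dyadic instructions, which
combine s!0 (left) with s!1 (right) after decrementing s.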


Interpretive Code

Instructions are compiled into a code vector, and static data into
a data vector. The VSPL abstract machine has 5 registers:

res   -- The result register, used for the result by VALOF and function calls.
p     -- The stack frame pointer, pointing to return information,
         arguments and local variables belonging to the current function.
         The return information consists of the following three words:
         1) The previous p pointer
         2) The return address
         3) The entry address, to help debugging. The word before the
            entry instruction is a string giving the name of the function.
s     -- The stack pointer, s!0 is the top element of the stack.
pc    -- Pointer to the next instruction to obey.
count -- A counter, decremented every time an instruction is obeyed.
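
The frame layout can be made concrete with a small Python sketch. It
is one reading of Fnap, Stack and Fnrn that reproduces the p and sp
values of the hello.vs execution trace shown earlier; it is not taken
from the VSPL sources. The names HELLO_CODE and run are hypothetical,
Printf merely records the address of its format string, and the way s
is reset by Fnap, Printf and Fnrn is inferred from that trace.

```python
from collections import defaultdict

# Compiled code for hello.vs at the addresses shown in the trace.
# An instruction with an operand occupies two words, otherwise one.
HELLO_CODE = {
    100: ("Laddr", 106), 102: ("Fnap", 3), 104: ("Halt",),
    106: ("Stack", 2), 108: ("Laddr", 10002), 110: ("Printf", 3),
    112: ("Ln", 0), 114: ("Fnrn",),
}


def run(code, pc=100, p=10006, s=10008):
    """Execute code; return (res, trace, printed format addresses)."""
    mem = defaultdict(int)            # stack memory, initially zero
    res, trace, printed = 0, [], []
    while True:
        instr = code[pc]
        op = instr[0]
        trace.append((p, s, pc, op))  # state before obeying op
        pc += len(instr)
        if op in ("Ln", "Laddr"):     # push a constant or an address
            s += 1
            mem[s] = instr[1]
        elif op == "Stack":
            s = p + instr[1]
        elif op == "Fnap":
            entry = mem[s]            # function address, at p+k
            new_p = p + instr[1]
            mem[new_p] = p            # 1) the previous p pointer
            mem[new_p + 1] = pc       # 2) the return address
            mem[new_p + 2] = entry    # 3) the entry address
            p, pc, s = new_p, entry, new_p + 2
        elif op == "Fnrn":
            res = mem[s]              # result is the top of the stack
            s = p - 1                 # assumed: discard the whole frame
            pc = mem[p + 1]
            p = mem[p]
        elif op == "Printf":
            printed.append(mem[p + instr[1]])
            s = p + instr[1] - 1      # assumed: arguments are discarded
        elif op == "Halt":
            return res, trace, printed


res, trace, printed = run(HELLO_CODE)
for p, s, pc, op in trace:
    print(f"p:{p}  sp:{s}  {pc}: {op}")
```

The sketch obeys eight instructions, the same number reported at the
end of the trace.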

Data Statements

String Li <string>   -- Assemble a string addressed by label Li
Var Li               -- Allocate a static variable addressed by Li
Vec Li k             -- Allocate a vector of k consecutive 
                        locations addressed by label Li

Program directives and instructions

Loading instructions

Ln  n        -- s := s+1; s!0 := n            load a constant
Lp  n        -- s := s+1; s!0 := p!n          load a local
Llp n        -- s := s+1; s!0 := @ p!n        load address of a local
Ll  Li       -- s := s+1; s!0 := <var Li>     load a static variable
Laddr Li     -- s := s+1; s!0 := @ <var Li>   load address of a static
Lres         -- s := s+1; s!0 := res          load function result

Monadic expression operators

Ind          -- s!0 := ! (s!0)                indirection
Neg          -- s!0 := - (s!0)                monadic minus
Not          -- s!0 := ~ (s!0)                monadic bitwise not

Dyadic expression operators

Vecap    -- s := s-1; s!0 := (s!0) ! (s!1)    subscription
Mul      -- s := s-1; s!0 :=  s!0  *  s!1     integer multiply
Div      -- s := s-1; s!0 :=  s!0  /  s!1     integer division
Mod      -- s := s-1; s!0 :=  s!0 MOD s!1     integer remainder
Add      -- s := s-1; s!0 :=  s!0  +  s!1     add
Sub      -- s := s-1; s!0 :=  s!0  -  s!1     subtract
Eq       -- s := s-1; s!0 :=  s!0  =  s!1     equal
Ne       -- s := s-1; s!0 :=  s!0 ~=  s!1     not equal
Le       -- s := s-1; s!0 :=  s!0 <=  s!1     less or equal
Ge       -- s := s-1; s!0 :=  s!0 >=  s!1     greater or equal
Lt       -- s := s-1; s!0 :=  s!0  <  s!1     less than
Gt       -- s := s-1; s!0 :=  s!0  >  s!1     greater than
Lsh      -- s := s-1; s!0 :=  s!0 <<  s!1     logical left shift
Rsh      -- s := s-1; s!0 :=  s!0 >>  s!1     logical right shift
And      -- s := s-1; s!0 :=  s!0  &  s!1     bitwise and
Or       -- s := s-1; s!0 :=  s!0  |  s!1     bitwise or
Xor      -- s := s-1; s!0 :=  s!0 XOR s!1     bitwise xor
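
The common pattern of the dyadic operators (decrement s, then combine
s!0 with s!1) can be sketched in Python as follows. This is only an
illustration: the names eval_postfix and DYADIC are hypothetical,
instructions are given as strings, Python integers are unbounded
rather than word sized, and Div, Mod, the shifts and the comparisons
are left out because their rounding and truth value conventions are
not stated here.

```python
import operator

# A subset of the dyadic operators whose results do not depend on
# word length or rounding conventions.
DYADIC = {"Mul": operator.mul, "Add": operator.add, "Sub": operator.sub,
          "And": operator.and_, "Or": operator.or_, "Xor": operator.xor}


def eval_postfix(code):
    """Obey a list of instruction strings and return the final s!0."""
    stack = [0] * 16          # small fixed evaluation stack
    s = -1                    # index of the top element, so s!0 is stack[s]
    for instr in code:
        op, *arg = instr.split()
        if op == "Ln":        # s := s+1; s!0 := n
            s += 1
            stack[s] = int(arg[0])
        elif op == "Neg":     # s!0 := - (s!0)
            stack[s] = -stack[s]
        else:                 # s := s-1; s!0 := s!0 op s!1
            s -= 1
            stack[s] = DYADIC[op](stack[s], stack[s + 1])
    return stack[s]


print(eval_postfix(["Ln 1", "Ln 2", "Ln 3", "Mul", "Add"]))  # 1 + 2*3
```

Because s is decremented before the operation, the operand that was
pushed first is the left operand.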

Assignment Instructions

Sp  k    -- s := s-1; p!k := s!1              store in local
Sl  Li   -- s := s-1; <var Li> := s!1         store in static
Stind    -- s := s-2; !(s!2) := s!1           store indirect

Labels and Jumps

Lab Li    -- Set label Li to the address of the next instruction
Jt  Li    -- s := s-1; IF     s!1 GOTO Li     jump if true
Jf  Li    -- s := s-1; UNLESS s!1 GOTO Li     jump if false
Jump Li   -- GOTO Li

Other Statements

Entry Li Lj    -- Mark the entry point to a function or routine,
                  set label Li to point to the next instruction,
                  Lj refers to the name of the function or routine.
Stack k        -- s := p+k
Fnap k         -- Enter function s!0, setting p to p+k
Rtap k         -- Enter routine  s!0, setting p to p+k
Rtrn           -- Return from current routine
Fnrn           -- res := s!0; Return from current function
Resultis Li    -- res := s!0; s := s-1; GOTO Li
Halt           -- Terminate execution of the interpreter
Printf k       -- Perform formatted output similar to C's printf,
                  but dealing only with %d, %s, %x and %c (%c added 9/8/2021).
Sys    k       -- Call the sys function with arguments p!k,p!(k+1)...
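
The four directives handled by Printf can be illustrated by the
following Python sketch. It is not the formatter used by any VSPL
implementation. It assumes that an optional decimal field width may
precede the directive letter (as in " %3d" in the primes program),
that %x prints upper case hex digits, and that %s receives a Python
string rather than the address of a zero terminated string. The name
vspl_format is hypothetical.

```python
def vspl_format(fmt, *args):
    """Expand %d, %s, %x and %c in fmt using args, left to right."""
    out, i, pending = [], 0, list(args)
    while i < len(fmt):
        ch = fmt[i]
        i += 1
        if ch != "%":
            out.append(ch)                # ordinary character
            continue
        width = 0
        while i < len(fmt) and fmt[i].isdigit():
            width = width * 10 + int(fmt[i])
            i += 1
        if i >= len(fmt):
            raise ValueError("format ends in the middle of a directive")
        conv = fmt[i]
        i += 1
        arg = pending.pop(0)              # next argument
        if conv == "d":
            text = str(arg)               # decimal number
        elif conv == "x":
            text = format(arg, "X")       # hex, upper case assumed
        elif conv == "c":
            text = chr(arg)               # character with this code
        elif conv == "s":
            text = arg                    # string
        else:
            raise ValueError(f"unsupported directive %{conv}")
        out.append(text.rjust(width))     # pad on the left to the width
    return "".join(out)


print(vspl_format("%s: %3d %c", "count", 42, 65))
```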

The following VSPL sys calls are defined:

sys(0, code)   -- Return from the interpreter with result code
sys(1, regs)   -- Recursively enter the interpreter with register
                  values specified by the VSPL vector regs. On return
                  the final state will be left in regs.
sys(2, b)      -- Start/stop tracing the interpreter execution depending
                  on whether b is TRUE or FALSE
sys(3, k)      -- Set the count register to k, returning the previous
                  value


VSPL command Options

  -o file    redirect output to a file
  -l         trace the lexical analyser
  -p         output the abstract syntax tree
  -c         output the interpretive code
  -t         trace interpreter execution
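
For readers who want to experiment with a driver of their own, the
option set above can be modelled in Python with the standard argparse
module. This is only a sketch: the function name parse_vspl_args and
the attribute names are hypothetical, and the real vspl commands do
their own argument decoding.

```python
import argparse


def parse_vspl_args(argv):
    """Decode a vspl style argument list into a namespace."""
    parser = argparse.ArgumentParser(prog="vspl")
    parser.add_argument("source", help="VSPL source file")
    parser.add_argument("-o", dest="output", metavar="file",
                        help="redirect output to a file")
    parser.add_argument("-l", dest="lex_trace", action="store_true",
                        help="trace the lexical analyser")
    parser.add_argument("-p", dest="parse_tree", action="store_true",
                        help="output the abstract syntax tree")
    parser.add_argument("-c", dest="code", action="store_true",
                        help="output the interpretive code")
    parser.add_argument("-t", dest="trace", action="store_true",
                        help="trace interpreter execution")
    return parser.parse_args(argv)


opts = parse_vspl_args(["../hello.vs", "-p", "-c", "-t"])
print(opts.source, opts.parse_tree, opts.code, opts.trace)
```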

Current Implementations

Currently, VSPL has implementations in BCPL, MCPL, C and lex/yacc/C.
Implementations in Java, C++, Smalltalk, ML, Prolog, Haskell and an
Object Oriented BCPL version may appear one day.

There is a VSPL test program, try: vspl vcmpltest.vs

Another demonstration is: vspl rantree.vs

Debugging tricks

1) Use printf(format, <args>)

2) Turn VSPL instruction tracing on and off using sys(2,1) and sys(2,0)

3) Define and use abort.
   let abort(n) be
   { printf("\nabort: n=%d\n", n);
     n := n/0   // Used when debugging
   }

Possible (easy) projects

1) Add manifest constants.

2) Add a switchon command with cases, default and endcase.

3) Add break and loop to repetitive commands.

4) Make do optional before commands starting with { or a keyword.

5) Make ; optional if the next token is on another line.

(c) Martin Richards   23 Oct 2006
Updated 23 Jan 2024 
