Building a Programming Language Pt. 3 - Interpreting

2 August 2019

So far in this series we've built a lexer and a parser for a new language called Smol. If you haven't read the other posts it's a good idea to do so before continuing on.

Today we will develop a very basic Smol interpreter. This interpreter is far from perfect and doesn't always behave exactly as you'd expect. For now it can only handle variables, functions, and if statements. Still, it's a fairly powerful language that could even be useful as an embeded scripting language.

A Recap of the AST

Recall from the last post we built a parser that output an AST that looks like this:

[ { type: 'variable-declaration', name: 'x', varType: 'let' },
  { type: 'function-call',
    function: '=',
    args:
     [ { type: 'number', value: 1 },
       { type: 'identifier', name: 'x' } ] },
  { type: 'variable-declaration', name: 'y', varType: 'let' },
  { type: 'function-call',
    function: '=',
    args:
     [ { type: 'number', value: 7 },
       { type: 'identifier', name: 'y' } ] },
  { type: 'if',
    condition:
     { type: 'function-call',
       function: '>',
       args:
        [ { type: 'identifier', name: 'x' },
          { type: 'identifier', name: 'y' } ] },
    body:
     [ { type: 'function-call',
         function: 'print',
         args: [ { type: 'string', value: 'Hello world!' } ] } ] } ]

In a more advanced language the AST would likely be transformed into some form of optimized bytecode. Smol is not one of those languages. To keep things simple we are just going to interpret the AST directly.

Interpreting the AST

As before, all code is written in typescript and can be run and modified from this repl.it sandbox.

Lets define our interpreter function:

export function interpreter(instruction: AST | Instruction, mem: Mem): Value {

}

The interpreter will take an instruction which can be an entire AST or a single Instruction. It also takes a Mem (Memory) context and returns a Value. These are defined as follows:

export interface Value {
    // The result of what was being interpreted.
    val: any;
    // If true, the value is meant to be returned from a function.
    isReturn?: boolean;
}

// Mem is a key value pair where the key is the identifier of whatever is being referenced.
export interface Mem {
    [key: string]: {
        // If the memType is let, the value should not be reassignable.
        memType: "let" | "var";

        // We store our Value as val in mem. Unfortunately this means that accessing the true value
        // of an identifier looks like this: mem["identifier"].val.val
        // We'll figure out a better solution in the future but it works for now.
        val: Value | undefined;
    }
}

Lets flesh out the interpreter a little. We'll start by handling an AST.

export function interpreter(instruction: AST | Instruction, mem: Mem): Value {
    // If the instruction is an array, it's an entire AST.
    if (instruction instanceof Array) {
        let val: Value = { val: undefined };

        // Loop through each instruction of the AST and recursively call the interpreter on it.
        for (const instr of instruction) {
            val = interpreter(instr, mem);

            // If our value is meant to return, stop looping and return said value. This isn't
            // a perfect solution (we don't even check to see if we're in a function). But it works.
            if (val.isReturn) { return val; }
        }

        // Since our langauge is expression oriented we always return the value of the last
        // instruction interpreted by our AST. This enables implicit returns in functions as well.
        return val;
    }
}

Ok. Our interpreter can take in an AST. Now we need to be able to execute single instructions. Anything that comes after if (instruction instanceof Array) {} is a single instruction. So figuring out what to do with the instruction is as simple as checking it's type.

export function interpreter(instruction: AST | Instruction, mem: Mem): Value {
    if (instruction instanceof Array) { /** Unchanged */ }

    // Check to see if the instruction is a primitive value.
    const isPrimitive = instruction.type === "string" ||
        instruction.type === "boolean" ||
        instruction.type === "number";

    // Primitive instructions are simply returned.
    if (isPrimitive) {
        return { val: instruction.value };
    }

    // If the type is a function we return the raw instruction so it can be evaluated on call.
    if (instruction.type === "function") {
        return { val: instruction };
    }

    // We found the declaration of a variable.
    if (instruction.type === "variable-declaration") {
        // Basic check to make sure there is not already a variable with the same name.
        if (mem[instruction.name]) throw new Error(`Variable ${instruction.name} is already defined`);

        // Initialize the variable in our mem object. instruction.name is its identifier.
        mem[instruction.name] = { memType: instruction.varType, val: undefined };

        // This instruction does not return a value, so we return undefined.
        return { val: undefined };
    }

    // We'll define the interpretFnCall function later.
    if (instruction.type === "function-call") {
        return interpretFnCall(instruction, mem);
    }

    // A return instruction.
    if (instruction.type === "return") {
        // We mark isReturn as true, and evaluate the instruction's expression.
        return { isReturn: true, val: interpreter(instruction.exp, mem).val };
    }

    // An if instruction.
    if (instruction.type === "if") {
        // Start by evaluating the if's condition.
        const result = interpreter(instruction.condition, mem);

        // If the condition result is truthy we evaluate the if body and return its result.
        if (result.val) {
            // { ...mem } creates a new block of memory that inherits mem from the if statement's
            // outer scope. This keeps variables defined in the if body, scoped to the if body.
            return interpreter(instruction.body, { ...mem });
        }

        // Our if condition returned falsey, so we return undefined.
        return { val: undefined };
    }

    // The instruction is an identifier, we return the value of said identifier.
    if (instruction.type === "identifier") {
        // For now we assume the coder actually defined the identifier and don't do any error
        // checking.
        return { val: mem[instruction.name].val!.val! }
    }

    // If we've made it this far we ran into an instruction we don't understand.
    throw new Error(`Unkown instruction ${inspect(instruction, false, 100)}`);
}

That's really all there is to our main interpreter function. All thats left is to define the interpretFnCall function.

Interpreting a Function Call

Lets get to it.

export function interpretFnCall(call: FunctionCall, mem: Mem): Value {
    // We start by handling the assignment function
    if (call.function === "=") {
        // Our identifier is the last item in the args list, so we pop it.
        const identifier = call.args.pop()!;

        // We handle a few edge case errors.
        if (identifier instanceof Array) throw Error("Syntax error on assignment");
        if (identifier.type !== "identifier") throw new Error("Expected identifier at assignment");

        // We make sure the identifier has been declared.
        const existing = mem[identifier.name];
        if (!existing) throw new Error("Var not defined");

        // Forbid reassignmet of "let" variables.
        if (existing.memType === "let" && existing.val) {
            throw new Error("Variable already assigned");
        }

        // Evaluate the right hand side of the assignment and save the value in memory.
        existing.val = interpreter(call.args.pop()!, mem);

        // For now assignment doesn't return a value..
        return { val: undefined };
    }

    // Handle all of our primitive infix functions

    if (call.function === "<") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left < right };
    }

    if (call.function === ">") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left > right };
    }

    if (call.function === "==") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left === right };
    }

    if (call.function === "+") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left + right };
    }

    if (call.function === "-") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left - right };
    }

    if (call.function === "*") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left * right };
    }

    if (call.function === "/") {
        const left = interpreter(call.args.pop()!, mem)!.val;
        const right = interpreter(call.args.pop()!, mem)!.val;

        return { val: left / right };
    }

    // Print primitive function.
    if (call.function === "print") {

        // Takes one argument. We call our interpreter as that arg is stored as an AST, this
        // Allows us to handle any expression within the function call.
        console.log(interpreter(call.args.pop()!, mem)!.val);

        // Print does not return a value
        return { val: undefined };
    }

    // If we get to this point, the user is calling a user defined function.

    // Here we ensure the function identifier is defined.
    const variable = mem[call.function];
    if (!variable || !variable.val) throw new Error(`Function ${call.function} is not defined`);

    // Grab the function value and ensure it's type is actually a function.
    const fn: SmolFunction = variable.val.val;
    if (fn.type !== "function") throw new Error(`Function ${call.function} is not a function`);

    // Evaluate the arguments and store them in a new mem object.
    const args: Mem = {};
    fn.args.forEach((arg, index) => {
        args[arg] = { memType: "let", val: interpreter(call.args[index], mem) };
    });

    // Interpret the function body and pass a new mem object that inherits the outer scope
    // and includes the passed arguments.
    return interpreter(fn.body, { ...mem, ...args });
}

Thats it! We've now got a basic working interpreter!

Putting It All Together

We've now got all the main pieces of a working Smol interpreter. Lets put it all together in a function that can evaluate Smol code (just like JavaScript's eval function).

function evaluate(smol: string) {
    const tokens = lexer(smol);
    const ast = parser(tokens);
    interpreter(ast, {});
}

evaluate(`
    let x = 1;
    let y = 7;

    let do_math = |x, y| {
        let add = |a, b| { a + b };

        print("x + y:");
        print(add(x, y));

        if y > 3 {
            print("y is greater than 3");
        }
    };

    do_math(x, y);
`);

This will conclude the Building a Programming Language series for the time being. The full source code to Smol is available here. As always don't hesitate to contact me with questions or suggestions.

Drew Youngwerth's Blog

Code, Art and the Universe.