Building a Lisp Interpreter (in Rust) - Part 2
Table of Contents
1. Introduction
This is part 2 of 3 of the Building a Lisp Interpreter (in Rust) series. In
Part 1, we created the Lisp value and a tokenizer/parser. In this part, we will
evaluate expressions.
To recap, a Lisp interpreter has 2 main components.
read- Read the text into a Lisp data structure. (In Part 1)eval- Evaluate that structure according to Lisp’s simple rules. (This part)
Our main goal will be to implement:
fn eval(expr: &Val, env: &mut Environment) -> Result<Val, Error> { .. }
Where expr is a Lisp data structure that holds the code we will evaluate and
env contains the variables and functions that are available for use.
1.1. eval Rules
Simple data (atoms) return just themselves.
12 ;; => 12 #true ;; => #true
A naked symbol returns the variable it references.
pi ;; => 3.1415... ? does-not-exist ;; => error
Lists are function calls. The first element is the function and the rest are the arguments.
(+ 1 2) ;; => 3 (+ 1 2 (* 3 4)) ;; => (+ 1 2 12) => 15
Because we aren't guaranteed to have a function at the first position, sometimes it may be an error to evaluate a list.
(1 2 3) ;; Error, 1 is not a function
Also, the function itself may be misused leading to an error.
(/) ;; Error, / requires at least 1 argument. (+ #true #false) ;; Error, + only accepts numbers.
A list can also have special behavior based on the first element. For example, an if statement.
(if #true (run-this-code) (never-run-this-code))
Or define can be used to set a variable.
(define x 12) x ;; 12
2. Data structures
We defined a basic datastructure in Lisp 1. Before continuing, they need a little more refinement. We want to support:
- Functions - To be defined in native Rust at the moment.
- Environment - A place where we can store and retrieve variables.
- Errors - Your standard Rust
enumbased error.
2.1. Function Type in Value
In our language, we have the concept of a function. For example, in (+ 1 2 3),
+ is a function that adds all its arguments. We will keep a function simple. A
function takes in any number of arguments and can return either the resulting
value or an error.
type Function = fn(&[Val]) -> Result<Val, Error>;
Which makes our Val:
#[derive(Clone, Debug, PartialEq)] pub enum Val { Int(i64), Bool(bool), Symbol(String), List(Vec<Val>), Function(Function), // Support for functions! }
2.2. Environment/Variables
In the symbol example, we referenced a variable.
pi ;; => 3.14
Even in (+ 1 2 3), we also reference a variable. + is bound to a
function.
To handle this, we will create an Environment. Values can be stored and
retrieved from the environment, similar to a hashmap. In fact, we'll back it
with a HashMap.
use std::collections::HashMap; #[derive(Debug, Default)] pub struct Environment { values: HashMap<String, Val>, } impl Environment { pub fn with_value(mut self, name: impl Into<String>, val: Val) -> Environment { self.set(name.into(), val); self } pub fn set(&mut self, name: impl Into<String>, val: Val) { self.values.insert(name.into(), val); } pub fn get(&self, name: &str) -> Option<&Val> { self.values.get(name) } }
2.3. Errors
We've been alluding to errors happening. Here are the final set of errors we use in this post.
#[derive(Debug, Clone, PartialEq)] pub enum Error { TokenizerError(TokenizerError), ValueNotFound(String), ValueNotCallable(Val), WrongType { expected: &'static str, got: Val }, WrongArity { expected: usize, got: usize }, Other(&'static str), } impl From<TokenizerError> for Error { fn from(e: TokenizerError) -> Error { Error::TokenizerError(e) } }
3. Eval
3.1. API
The eval function will take in an expression and an environment where it can store and retrieve values from. The behavior is based on the value.
pub fn eval(expr: &Val, env: &mut Environment) -> Result<Val, Error> { match expr { Val::Int(x) => todo!(), Val::Bool(x) => todo!(), Val::Symbol(x) => todo!(), Val::List(x) => todo!(), Val::Function(x) => todo!(), } }
For convenience purposes, we will also create eval_str which automatically
converts a string into a Val expression for us.
In case there are multiple expressions, we will return the final result. There are some other reasonable alternatives like returning a vector of all the values or maybe even returning an error if there is more than one expression.
pub fn eval_str(s: &str, env: &mut Environment) -> Result<Val, Error> { let mut res_val: Option<Val> = None; let mut tokenizer = Tokenizer::new(s); while let Some(expr) = read_next(&mut tokenizer)? { res_val = Some(eval(&expr, env)?); } res_val .ok_or(Error::TokenizerError(TokenizerError::UnexpectedEndOfInput)) }
Now all we have to do is implement the eval function.
3.2. Simple Case: Atoms/Constants
The case for integers and booleans is easy. They just evaluate to themselves. Another one that may not seem obvious is a function. When we encounter a function (that is not the first element of a list), we should just return that function as a value.
12 ;; => 12 #true ;; => #true #false ;; => #false + ;; => <function>
#[test] fn test_eval_int() { assert_eq!( eval_str("12", &mut Environment::default()).unwrap(), Val::Int(12) ); } #[test] fn test_eval_bool() { assert_eq!( eval_str("#true", &mut Environment::default()).unwrap(), Val::Bool(true) ); assert_eq!( eval_str("#false", &mut Environment::default()).unwrap(), Val::Bool(false) ); } #[test] fn test_eval_function() { fn no_op(_: &[Val]) -> Result<Val, Error> { Ok(Val::Int(0)) } assert_eq!( eval(&Val::Function(no_op), &mut Environment::default()).unwrap(), Val::Function(no_op) ); }
To implement this, we will simply return a copy of our input:
pub fn eval(expr: &Val, env: &mut Environment) -> Result<Val, Error> { match expr { Val::Int(_) | Val::Bool(_) | Val::Function(_) => Ok(expr.clone()), Val::Symbol(x) => todo!(), Val::List(x) => todo!(), } }
3.3. Symbol Case: Variables
A variable can be referenced by using a raw symbol. Let's say we manually define
x as 10 in our environment, then:
x ;; 10
#[test] fn test_variable() { assert_eq!( eval_str("x", &mut Environment::default().with_value("x", Val::Int(10))).unwrap(), Val::Int(10) ); }
To implement this, we will simply query the Environment.
pub fn eval(expr: &Val, env: &mut Environment) -> Result<Val, Error> { match expr { Val::Int(_) | Val::Bool(_) | Val::Function(_) => Ok(expr.clone()), Val::Symbol(x) => env .get(x.as_str()) .cloned() .ok_or_else(|| Error::ValueNotFound(x.clone())), Val::List(x) => todo!(), } }
3.4. Function Call Case: List Evaluation
List evaluation is more complex. Let's fork this off into a function.
pub fn eval(expr: &Val, env: &mut Environment) -> Result<Val, Error> { match expr { Val::Int(_) | Val::Bool(_) | Val::Function(_) => Ok(expr.clone()), Val::Symbol(x) => env .get(x.as_str()) .cloned() .ok_or_else(|| Error::ValueNotFound(x.clone())), Val::List(x) => eval_list(x.as_slice(), env), } } fn eval_list(expr: &[Val], env: &mut Environment) -> Result<Val, Error> { todo!() }
For our test, let's use the simple (+ 1 2 3) and only handle integers.
(+ 1 2 3) ;; 6
fn add(args: &[Val]) -> Result<Val, Error> { let mut sum: i64 = 0; for arg in args { let Val::Int(x) = arg else { return Err(Error::WrongType { expected: "Int", got: arg.clone(), }); }; sum += x; } Ok(Val::Int(sum)) } #[test] fn test_add() { let mut env = Environment::default().with_value("+", Val::Function(add)); assert_eq!(eval_str("(+ 1 2 3)", &mut env).unwrap(), Val::Int(6)); }
Let's see what it takes to evaluate a list. In (func a b c), what do we need
to do? We need to get the arguments, and pass them to func. So there are
roughly 3 steps:
- Evaluate the first argument. If it does not resolve to a function, then it is an error.
- Evaluate the arguments and collect them.
- Call the function with the given arguments.
fn eval_list(expr: &[Val], env: &mut Environment) -> Result<Val, Error> { // 1. Evaluate the first argument. let (func_expr, arg_vals) = match expr { [] => return Err(Error::Other("empty function call")), [first, rest @ ..] => (first, rest), }; let func_val = eval(func_expr, env)?; // 1a. If it does not resolve to a function, then return an error. let Val::Function(f) = func_val else { return Err(Error::ValueNotCallable(func_val)); }; // 2. Evaluate the arguments and collect them. let args: Vec<Val> = arg_vals .iter() .map(|x| eval(x, env)) .collect::<Result<_, _>>()?; // 3. Call the function with the given arguments. f(&args) }
Any sufficiently complicated [program] contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of [Lisp].
— Greenspun's Tenth Rule
Tada 🥳! We have a basic interpreter that is relatively powerful despite the simple implementation.
4. Special Forms
Not all expressions follow the standard evaluation rules. These are called
special forms. In a normal function call like (+ 1 2), all arguments are
evaluated before being passed to the function. Special forms have their own
evaluation rules. We will introduce the special forms if and define.
4.1. If
The if special form takes 3 arguments: a condition, a "true" branch, and a
"false" branch. Only one branch should be evaluated based on the condition.
(if #true 1 2) ;; => 1 (if #false 1 2) ;; => 2
If if were a normal function, both branches would be evaluated. This would be
problematic for side effects. For example, this program would be broken if if
were a function call!
(if #false (crash-my-program!) 42)
#[test] fn test_if_lazy_evaluation() { fn crash(_: &[Val]) -> Result<Val, Error> { panic!("This should never be called!"); } let mut env = Environment::default() .with_value("crash-my-program!", Val::Function(crash)); // The true branch (call crash-my-program!) should not be evaluated on // #false. assert_eq!( eval_str("(if #false (crash-my-program!) 42)", &mut env).unwrap(), Val::Int(42) ); }
4.1.1. Implementation
We need to handle if before attempting to call it as a function. Let's add an
escape hatch for our special forms in eval_list:
fn eval_list(expr: &[Val], env: &mut Environment) -> Result<Val, Error> { let (func_expr, arg_vals) = match expr { [] => return Err(Error::Other("empty function call")), [first, rest @ ..] => (first, rest), }; // Handle special forms if let Val::Symbol(s) = func_expr { match s.as_str() { "if" => return eval_if(arg_vals, env), _ => {} } } // Normal function call let func_val = eval(func_expr, env)?; // ... rest of function unchanged }
Now, it is easy to evaluate our if expression.
- First we need to deconstruct our arguments into the
condition, thetrue branchand thefalse branch. - Then we evaluate our
condition. How do we evaluate our condition? Well,evalcan evaluate an expression given an environment. We can call that. - Based on the result, we can then choose to evaluate the
true branchorfalse branch.
fn eval_if(args: &[Val], env: &mut Environment) -> Result<Val, Error> { // 1. Deconstruct our arguments. let [cond, true_branch, false_branch] = args else { return Err(Error::WrongArity { expected: 3, got: args.len(), }); }; // 2. Evaluate our condition. let cond_val = eval(cond, env)?; let Val::Bool(b) = cond_val else { return Err(Error::WrongType { expected: "Bool", got: cond_val, }); }; // 3. Evaluate the chosen branch. if b { eval(true_branch, env) } else { eval(false_branch, env) } }
4.2. Define
The define special form binds a value to a name in the environment. The first
argument is the symbol (not evaluated), and the second is the value (evaluated).
Here is a simple definition and variable reference:
(define x 10) x ;; => 10
#[test] fn test_define() { let mut env = Environment::default().with_value("+", Val::Function(add)); eval_str("(define x 10)", &mut env).unwrap(); assert_eq!(eval_str("x", &mut env).unwrap(), Val::Int(10)); }
Now here is an expression assigned to a variable:
(define y (+ 1 2 3)) y ;; => 6
#[test] fn test_define_with_expression() { let mut env = Environment::default().with_value("+", Val::Function(add)); eval_str("(define y (+ 1 2 3))", &mut env).unwrap(); assert_eq!(eval_str("y", &mut env).unwrap(), Val::Int(6)); }
To implement this, we can add another escape hatch in eval_list for
define. Then we implement define.
fn eval_list(expr: &[Val], env: &mut Environment) -> Result<Val, Error> { // ... destructure expr into func_expr and arg_vals ... // Handle special forms if let Val::Symbol(s) = func_expr { match s.as_str() { "if" => return eval_if(arg_vals, env), "define" => return eval_define(arg_vals, env), _ => {} } } // ... rest of function unchanged }
Define can be implemented with a similar philosophy to function evaluation and if.
- Deconstruct the args into the
nameof the variable and theexpression. - Evaluate the
expression. - Assign its result to
name.
fn eval_define(args: &[Val], env: &mut Environment) -> Result<Val, Error> { // 1. Deconstruct the name and expression. let [name, expr] = args else { return Err(Error::WrongArity { expected: 2, got: args.len(), }); }; // 1a. Name must be a symbol. let Val::Symbol(name) = name else { return Err(Error::WrongType { expected: "Symbol", got: name.clone(), }); }; // 2. Evaluate the expression. let val = eval(expr, env)?; // 3. Assign the result to name. env.set(name.clone(), val.clone()); // We must return a value, so here you go. Some Lisps choose to return an // "undefined" value. Ok(val) }
5. Conclusion
With function evaluation, if and define, we can now write some non-trivial
programs. In the next part, we'll add user-defined functions with lambda.
Preview:
(define foo (lambda (a b c) (+ a b c))) (foo 10 20 30) ;; 60