How to Build a Programming Language And Why With Bob Nystrom

Introduction

Adam: Hello, this is Adam Gordon Bell. Join me as I learned about building software. This is CoRecursive.

Bob: If you’re working on a language that has real users, the error messages, the error handling strategy of your parser is like the main user interface of your language. So this is what you think about all day every day, is error recovery, good error messages, graceful error handling. Because if you watch what a user does when they’re sitting at their IDE, more than half the time, the code is in a broken state.

Adam: Was Bob Nystrom. He is going to teach us about an area of fundamental computer science, but in a practical way, building a language implementation. Bob is writing a book about building interpreters. And today, he walks us through the process for building our own programming language and language implementation. If you don’t intend to build your own programming language, the skills and tricks that language authors use to parse unstructured texts into a form that can be executed, it really comes up all the time in day-to-day programming. And if you find yourself reaching for regex a lot, I think that this knowledge will be quite helpful.

Today, I’m speaking with Bob Nystrom, author of Crafting Interpreters. Bob, let’s talk about interpreters.

Bob: Oh boy, it’s my favorite subject.

What Is an Interpreter?

Adam: I always start off with these definitions to get the context for things. I think it’s going to be challenging in this case. What’s an interpreter?

Bob: Oh man, we can have the 30-second conversation about that, the five-minute, or maybe the three-hour conversation. I actually spent a decent amount of time in one of the intro chapters of the book trying to come up with a good answer for what is a compiler and what is an interpreter. And I’m still not entirely satisfied with the answer. I think the short answer is, an interpreter is a program where if you throw some source code at it, it will run it. I think that’s probably the best one-sentence answer I can give. How does that sound?

What Is A Compiler?

Adam: That’s a good definition. What is a compiler then? Well, define a compiler in comparison, maybe. What do you think?

Bob: That’s the harder question. One of the things that’s tricky about this is, because we’re programmers and we tend to like things being black and white, we like really crisp definitions for things. And it’s important to have canonical definitions because that’s how we communicate with each other. If I use a word and you think I mean something different, then we’re not effectively sharing information. So there’s this goal of clarity, but at the same time, our industry is getting older and naturally words change in meaning over time. So if you ask what a compiler means, I think you’ll get different answers if you ask different people and you’ll get different answers over time.

And that doesn’t mean that those answers are wrong. It just means the word gets harder to use effectively. But the pragmatic informal answer is that people think of some kinds of programming language implementations as being compilers and some kinds of them as being interpreters. And I think there are a bunch of them that people have rough consensus on. Like if you ask a bunch of people, is GCC a compiler? They’ll say yes. And if you ask a bunch of them if Python is an interpreter, most of them will probably say yes, but then it starts to get more interesting.

One way to look at it is, there’s the user experience answer, which is that a compiler is a tool that takes some source code and turns it into another form, but doesn’t actually run it. So if you have some code that says, Print, hello world, a compiler is not going to put hello world on screen, it’s going to produce some other output file that then you could use to put hello world on screen, but it won’t do it itself, whereas an interpreter will. Maybe that’s one way to think of the distinction between the two.

Adam: So it’s like interpreter has two steps where a compiler is just the first step and then it has to additionally be run, in the simplest form?

Bob: Right. Or another way to think of it is, those two steps exist, but in a compiler, the developer sees them explicitly, whereas with an interpreter, they’re both bundled up together.

Why Write an Interpreter?

Adam: So what would you say is the benefit to writing an interpreter or just a language implementation? You’re writing a book about doing this, so I assume you think it’s important.

Bob: No, not important. I’m not the person who thinks about things at that level a lot. The short answer is, I just think programming languages are super cool. Who wouldn’t want to make a programming language? It’s awesome. Super fun. To some degree, one of the categories of software that I find most rewarding is ones where I write a program of some finite length and it can do a combinationatorial number of interesting things. I was listening to the interview you did with Jamis Buck a little while ago, and maze generators are a good example of that. You write this little 100-line program and it can generate an infinite number of mazes, and that feels like this exponential power amplification.

And programming languages are like that. You write an interpreter, maybe it’s 1,000 lines of code, and you can throw an infinite number of programs edit and it can interpret all of them, and that’s like a really cool sensation. So one answer is just like, this is a cool software. And they avoid like a lot of the grungy stuff about programming. Like if you want to make a like a web app or something, 90% of your job is just looking up stuff on Stack Overflow and dealing with cruddy APIs. It’s just like taping stuff together. It’s not super intellectually rewarding.

Whereas, if you’re implementing a programming language, it doesn’t actually touch the outside world very much. It’s mostly like, it just needs us a string of input and maybe it produces a string of output or prints some stuff to standard out, but then there’s this big ball of code that you get to right in the middle. So you spend most of your time mentally focused on the interesting parts of the problem and the domain, which is nice in terms of like, “Is this a fun kind of code to work on?” It’s fun in that sense.

Adam: Yeah. And it’s like working from first principles… When I build a crud web app, I guess, I don’t know, databases and web servers do a lot of the heavy lifting and I’m just writing the glue.

Bob: Yeah. And to some degree, that’s good, because it means you’re effective. It’d be really hard to make a web app if you had to like build your own database engine from scratch every time. But if you’re a bottom-up thinker and you’re interested in what’s really going on under the hood, then programming at that level maybe isn’t quite as satisfying.

On Not Using Libraries

Adam: Yeah. In your book, you build this programming language called Lox. What libraries do you use to build that?

Bob: None. One of the main conceits of the book, if you are a programmer, you were a consumer of languages, so they’re relevant to your life, but they have a reputation for being mystifying, opaque, something like, “Oh, programming language people, those are different kinds of people. They have these special skills,” and it can be intimidating. So one of my goals of the book is to demystify that and to make it more approachable and say, “Yeah, these are just programs. They were just written in some code and they take in text and they output text and there’s not any real magic going on there.”

And my goal is, there’s a meta goal here, which is, hopefully I can make people feel a little more confident and courageous in their ability to understand things, even things that seem intimidating or unapproachable at first. So that’s like the conceit of the book. And in order to make that tangible, the structure of the book is, it shows you every single line of code in two complete interpreter implementations and it doesn’t defer to any compiler compilers or third-party libraries or anything. So we use the Java standard library array list and hash map, we use the C standard library, which is nothing right off, and that’s it.

And then I walk you through every single line of code, and it’s not actually that much, which is interesting. You can actually build a fairly full featured interpreter in just like, I think each of them is under 2,000 lines of code.

Lox Language

Adam: Oh, wow.

Bob: Yeah. It’s pretty cool. And I think it’s a cool lesson to show that, “Yeah, this is how simple they can be.” It’s a really simple language. And in particular, it has a really limited core library. It can print, and that’s about all it can do. And that stuff tends to be a lot of the surface area of a language implementation. But in terms of language features, it’s fairly full featured, it has a complete syntax, it has like infix operators, operator precedence, statements, control flow, function calls, recursion, classes, single inheritance, super calls, field, methods, closures, full lexical scope. It’s semantically pretty close to JavaScript except class-based instead of prototype based. So it is a real language and it doesn’t actually take that much code to implement one, which is interesting.

Adam: That is interesting. And it sounds like it has more features than at least most JavaScript until recently lacked.

Bob: Yeah. It’s a strictly smaller language than JavaScript, but JavaScript has a reputation for seeming small because there’s this perception that prototypes are conceptually simpler than classes. There isn’t really like complexity wise, that much difference between the two. So the fact that Lox has classes doesn’t really make it much bigger than JavaScript, and JavaScript has just a lot of stuff, especially over the past few years that it’s accreted over time. You can think of Lox as like a toy JavaScript, maybe.

Adam: Okay. Why did you choose something JavaScript-esque as your language to build an implementation for?

Bob: It’s sort of coincidental. Syntactically, I wanted something in the C tradition, so curly braces, and semi-colons just to, one, to have something fairly familiar and approachable because if I’m teaching someone how to write an interpreter for the first time, how to write their own parser and stuff, which they’ve never done before, I don’t want to also force them to learn a new style of syntax. And I didn’t want to do something, a LISP Black syntax, because one of the things that I think is interesting to talk about is like, how do you write a parser for a language that actually does have rich syntax on operator precedence and stuff like that?

Learning Compilers in School

Adam: In school I took a course that was on compilers or whatever. We had the Dragon Book and our big final project was translating scheme into Java. I get what you’re saying, because something with as expressions, it seems like cheating because it’s already kind of parsed.

Bob: Yeah. You technically have to still write a scanner and a parser, but it’s almost trivial. And in some areas, that’s a totally good trade-off to make. If the interesting things you’re trying to present to someone are not around syntax and parsing, then it’s like, “Yeah, what’s the minimal amount of syntax and parsing we can do to get to the stuff we care about,” which in your class is cogeneration and semantics and things like that. But for me, if I’m trying to get someone to understand how does an actual programming language work? Most of the languages they use do have syntax, do have more complicated syntax. So they won’t understand how that works unless I show them a parser that can handle that kind of syntax.

Lexical Analysis

Adam: Yeah. I like this approach, very pragmatic. So where do we start? So we start with a string of, or a file of texts, and then what do we do?

Bob: So we basically move front to back. So you start with a string and that’s literally just a sequence of characters. We could have a long discussion here about stringing coding and UTF-8 and ASCII and Unicode, but sidestep all of that. So the first thing you do is this phase called lexical analysis, which is the fancy term for it. They call it scanning and lexing. Basically, you can think of it as taking a string of letters and turning it into a string of words. So you end up with something that’s still linear, it’s still a sequence, but it’s been chunked together. So maybe you start with the characters V-A-R. And what you end up with is a single token that represents the keyword Var.

Things like punctuation, parentheses become single character tokens, things like identifiers and keywords become tokens that have multiple characters within them. So then what you’re left with is you have this string of tokens so this is your words. The next phase of that is parsing. So if you’ve done, depending on how your English education was, there used to be a thing where people were taught how to diagram sentences.

Parsing

Adam: Oh, I hated that. It never made sense to me.

Bob: Well, if I got the book for you. So diagramming sentences is basically what a parser does. So it takes a series of words and it says, “Okay, there’s actually this tree-like grammatical structure in there.” And what a parser does is it builds that tree. So for example, if the tokens you get are a one token followed by a plus token followed by a two token, the scanner doesn’t know that there’s any relationship between those, but the parser will say, “Okay, that’s an addition where the plus is the operator and one and two are the two operands, those are children of that addition tree node.

So the parser does that to your entire program. It turns it into, or at least each source file individually, it builds a big tree out of it. You have statements that contain key words and expressions. Expressions may contain subexpressions, all the way down to the leaves. And the leaves are little things like the literals, numbers, variables, stuff like that. So that’s your parser. So you get this grammatical structure out of it so you know how things are nested and how things relate to each other, but you don’t actually know what anything means. So if you have if you have an expression that’s like A plus B, you know you’re adding A and B, but you don’t actually know what A and B are and you don’t even know what they refer to.

Adam: To make sure I’m following. So we start with a string of just characters, so we have Var X equals seven, and then we turn that into like token. So a token Var, a token X, token equals, token seven. And then we turn this into a tree. So what’d our tree look like?

Bob: So that tree is going to say, “Okay, the root node is a variable declaration statement, and the name child node of that variable declaration is going to be X. And the initializer expression node is going to be seven, which is an integer literal expression node.” It ends up blowing up into something that feels like verbose and legalese because you’re tracking, not just the pieces of everything now, but what they represent in the grammar and how they relate to each other.

Adam: That makes sense.

Bob: Yeah. So that’s what the parser does. And then either after that or during that, there’s a resolution process, which is where you start figuring out how different parts of the program refer to each other. So if you see an expression that mentioned some variable A during resolution, that means you go and you find the syntax for a variable declaration where a variable name day was being declared, and you say, “Okay. I understand the scoping rules of the language, so now I know that this A expression is referring to that A declaration.” So that’s the point where you start to get some actual semantics, you start to know what the program means.

Adam: So if I have that Var X equals seven, and then if I have something else that says like, Y equals X plus seven, then somehow I link those two in the tree or did I lose the plot?

Bob: Think about it starting at the very top level. In a scripting language, a program is a series of statements. So then the root node of your program is going to be this program node and it will have this list of child statement nodes. And you walk them one after the other. So you’ll have one of those statement nodes, maybe a variable declaration for Var X equals seven. You may have another one for Var Y equals three, and maybe you’ll have another statement that’s like Print X. And then during resolution, you’re walking those statements in order and you’re keeping track of scopes as blocks, begin and end.

And then when you walk down that tree into expressions and you start finding variable references, you can look back at the variable declarations that you’ve already seen and say, “Okay, well, I did see a declaration for X before, so I know that this X expression now is referring to that one.”

Evaluation

Adam: That makes sense. I was thinking, this is why I’m not a language designer, my implementation would be like just some hash map that I carried around and I just threw things in it, and maybe I put some key on the front for like scoping.

Bob: Yeah, your intuition is exactly right. So the way that you do resolution is you build this thing called a symbol table, and it’s exactly what you say. It’s a hash table that is keeping track of the variables that have been declared that are currently in scope. Actually, in a lot of implementations, what you do is you do a stack of hash tables because you have nested blocks scopes. So you get a hash table for each scope and then you stack them up as you get into inner scopes. So the way it works is during resolution, you’re walking through the tree that the parser produced, and each time you get to a variable declaration, you take that variable name and you add it to the hash table and you say, “Okay, now I know that that variable exists.”

When you reached like the open curly brace of a new block, you push a new empty hash table onto that stack because now you’re in a new scope. When you get to the right curly brace, you pop that hash table off because now all of those variables have gone out of scope and they don’t exist anymore. And then whenever you find a variable expression node, you just look up in that stack of hash table and you say, “Okay, can I find a key with that name?” And if so, then that variable exists and it’s in scope. There’s interesting questions around, what do the values in that hash table represent? Do they point to declarations? That depends on what your implementation is trying to do.

But in general, yeah, you have this stack of hash tables and that represents the language’s understanding of the scope and the declarations that are available at any point in time.

Getting Into the Details

Adam: One thing I like about getting in there and understanding this parsing and evaluation step is, I feel like it gives some intuition for, I don’t know, aspects of languages that are as a result of their implementation. And I think you mentioned in your book… I’m thinking of like Turbo Pascal, I think it is, you have to declare all your variables at the top. This is probably has to do with how they build this tree of state or the stack of-

Bob: It’s exactly right. And it’s actually an interesting historical artifact too. Because you’re building a compiler or an interpreter, when you refer to a variable, you need to be able to figure out which declaration that variable refers to, especially if it’s a statically typed language, because you need to know the type of that variable to know what to do with it, to know how to generate code for accessing it or even to be able to interpret it on runtime. So the easiest way to make sure that that always happens is that the declarations always come before the uses of the variables.

And for local variables, we take for granted because if you use a local variable before it’s declared, it doesn’t have a value anyway, so the language forbids that. But it’s a little more interesting to think about function calls. We definitely do take advantage of recursion and we take advantage of mutual recursion so you can have a function A that calls some other function B, and under some circumstances, B may call back to A, something that we use fairly frequently. But if you think about what the compiler has to do so it will be processing the body of that A function and it’s going to see a call to B, and it hasn’t processed B yet, it hasn’t reached the declaration of the definition of B. And you can’t move B first because then you would get the exact same problem in the other direction because the body of B refers to A.

So there’s this question, how does the language implementation handle this? And we take for granted now that languages like JavaScript and Python and Java, this just works. So in Java, you can have a bunch of methods in a class and you don’t have to think about the order that those methods appear at all because Java just makes it work and they can call each other freely and everything’s fine. The way that works under the hood is, when Java’s doing resolution, it’s a two-stage process. So it parses the whole file, and it walks it once, finds all the declarations for the methods, and it keeps track of everything. It needs to know about the existence of those methods, but it doesn’t step into their bodies yet. It just says, “Okay, I know that there’s this method and here’s the parameters it takes.”

And then it walks through it again, and then goes into the bodies and then it can start actually compiling calls to those methods. And now that it knows every method already, it can compile a call to any of them, so recursion just is totally fine and everything’s fine.

Historical Language Perspective

Adam: It makes me think of like a code style, some people will, let’s say Java, they’ll Java with all of the, like helper, private things at the top and then the thing that calls it at the bottom. And I prefer the opposite where it’s a top-down, where the first thing you see is the top level thing. But this bottom-up style, I think it’s because of what you’re describing, from languages where you needed to declare things before you could call them.

Bob: Right. So historically, we take for granted that Java can do that now, because it’s like, “Yeah, that’s fine. Just parse the whole file and then walk it twice. No big deal.” But if you were Nicholas Wirth in the ’70s, writing a Pascal compiler, the computer that it was running on literally didn’t have enough memory to hold the entire source file that you’re compiling, so you couldn’t walk the whole file multiple times, or if you did, you would have to actually reparse it multiple times, which is really computationally slow. So C and Pascal and a lot of other languages from the time were designed such that you could only call functions whose definition or declaration you had already seen.

So that’s why Pascal forces you to put all the types first and then all the variables and then the code that uses the variables, because it ensures that everything the compiler needs to know, it will have seen before it needs to know it. And that’s why in C you have explicit forward declaration. So anytime you want to do recursion in C, you have to do the separate syntax that says, “Here’s the function. I’m not giving you the body, I’m just going to tell you the structure of it,” so that the compiler sees that before it calls and compiles any calls to it.

And it’s purely just like a historical artifact, we just didn’t have enough memory. But then there is a cultural effect, because you do see some programmers, they naturally order their functions in bottom-up order versus top-down order, and a lot of the ones who do it in bottom-up order are older programmers coming from C where that is the way you do it because you don’t want to make unnecessary forward declarations. So you want to make sure that your helpers are first so that they’re fully defined before you compile a call to it. And so weird that this hardware limitation lingers on in the culture, right?

Adam: Yeah. And if nobody rethinks it, this cultural aspect can travel forward in time way past when it’s necessary. It’s interesting. We need an anthropologist or something.

Programming Language Anthropologist

Bob: Yeah. I’m a college dropout, so I don’t have a degree, but when I was in college, I was also simultaneously working on a minor in sociology. And that whole side of things is super fascinating to me, how programming culture is built and how it’s shared over time and changes over time, I think is super fascinating. If there was a job for being like a programming language anthropologist or something, that would be the career for me.

Adam: Nice. You can make it up. Let’s make it up.

Bob: Yeah, maybe I should.

Parser Generators

Adam: I don’t want totally skip over parsing. As I said, I did this class back in university, it’s been a long time. But what I recall is that we used some like parser generator where we-

Bob: Yeah, you probably used Yacker, Flex, maybe Bison.

Adam: Yeah. I don’t even know which one, but my question is, why aren’t you using a parser generator?

Bob: There’s a couple pieces to that. So one of them is, if we do use one and the reader doesn’t know how that tool works, they may be left feeling that there’s still some magic, where they’re like, “Yeah, I understand this, but I couldn’t have written that tool myself so I still really don’t know what’s going on. I’m still obligated to rely on someone else’s expertise to understand how a language works.” So I wanted to make sure that it didn’t have that problem, which is the philosophical reason. The pragmatic reason, strangely, a lot of culture around parsers and a lot of cultural tension around programming languages in industry versus academia around parsers and parser generators and stuff like that.

So academia has this, it leans towards things that are publishable, and there’s a certain amount of math envy in the PL community. So the ideal programming language paper lets you prove things about a programming language. When they do parsers, they really like formalisms. And that also leads them in a direction of, they like parser generators because they say, “Okay, well I can prove that I can write a tool that will correctly parse grammars of this nature then everyone will just use that tool and everything’s better. And I’ve demonstrated something universal about grammars maybe.”

So in academia, a parser generators are really popular. And if you read the Dragon Book, if you read Compiler Is Principles, Practices, and Tools, something like that. If you read the Dragon Book, a big chunk of it, when it’s talking about parsing, it’s talking about parser generators and the discoveries that they made about what you can prove about certain classes of grammars, and then taking those discoveries and then showing how you can use those to make your own parser generator. So that’s a big thing in academia.

Adam: I think that I skipped an important part of this, which is I should define a parser generator.

Bob: Oh, sorry. Yes. Parser generator is a piece of software where you give it a textual description of your language’s grammar and it will spit out a chunk of code that is an implementation of a parser for that language, which is a weird meadow thing to think about because then the parser generator is also a parser because you’re giving it this grammar specification that is itself a text file, so then you’re like, “How do you write your parser generator?”

Adam: I never thought of that. So it emits a program that given your string, will produce the tree?

Bob: Right. Exactly right. You can think of it as it’s basically a domain specific language for parsers. So these grammars are written in a high level declarative language, and then it spits out an imperative parser for you in C or Java, whatever your implementation language is. So you can do that, you can use one of these parser tools and then all you have to do is write down your grammar and it will give you a parser implementation for you, or you can handle all your parser, you just write the imperative code yourself by hand. So in the book, we handle the parser, we don’t use a partial generator. The part of it is because I don’t want people to think that, “Okay, well the parser generator’s magic. So I still don’t know what’s going on.”

But also part of it is that if you look at language implementations that people actually use, most of those in fact do have hand-rolled parsers. So when you’re learning about programming languages, you’re usually learning about it from material coming out of academia, you’re learning about it through textbooks and stuff like that. They all tell you like, “Obviously, you should be using a parser generator. Parser generators are way better.” And that is the dominant trend in academia. But meanwhile, over an industry these people are not writing books, but they’re like, “Actually, we just hand rolled a parser.” And that’s what they do.

I like hand-rolled parsers, I think they’re cool. So my book is focused on, if you want to understand how the language is, you actually use your implemented, or if you’re likely to build one yourself that is similar to how people write them, a hand-rolled parser is actually what you’re most likely to do.

Academia

Adam: So this is the real world implementation or a form of it?

Bob: Yeah. I feel like I’m slagging off on academia here. The fact that partial generators exist is really cool, the technology behind them is really interesting, the algorithms are interesting. And there are a lot of good things about using them. It’s the exact reason why all declarative languages are useful in that they let you express your intent at a higher level, and they save you from doing a bunch of grungy work. So parser generators are cool, it’s worth exploring them. But for the book, I think of them as non-essential, and they weren’t on the critical path of me getting you to understand every bit of how a language implementation works.

Adam: Yeah. If you’re going to build your own language and it’s a learning exercise, it’s helpful to do it yourself. I think what you’re saying is on the other end of the spectrum, if this is a real deal, programmers are going to use this every day language, then maybe there’s performance gains or optimizations or something that are hand-rolled?

Bob: Yeah. So there were pragmatic reasons to want to hand roll your parser. They can be more of a chore to maintain because you’re handwriting them in a lower level, so it’s more tedious, but in return for that, performance questions are a little hard to answer, but I think in general, they can be fast and fast enough. But in particular, what you find is that hand-rolled parsers, it’s much easier to have high quality error handling. There are parser generators that can do that okay, but a lot of times with it just a typical parser generator, if you don’t put a lot of effort into it, the parser you get out of it, if the user types in code that has a mistake, will just give you some impenetrable error.

If you’re just trying to write a really simple language implementation so that you can prove some other properties or say something interesting about programming language theory, that doesn’t matter because you’re not super interested in incorrect code anyway, but if you’re working on a language that has real users, the error messages, the error handling strategy of your parser is like the main user interface of your language, so this is what you think about all day every day, is error recovery, good error messages, graceful error handling. Because if you watch what a user does when they’re sitting at their IDE, more than half the time the code is in a broken state, right there in the middle of typing stuff, they make mistakes.

Errors Are the User Interface

So error handling is a huge part of the job. And if you hand roll your parser, that’s all your code, so you can put whatever, a little weird error handling edge cases in there that you need to without any problems, whereas if you’re relying on code, that’s generated from a parser generator, it’s harder to put that stuff in there

Recursive Descent Pratt Parsing

Adam: In your language, I forget how many lines you said it was, how much of it is parsing? Is that a bulk of it or a small amount?

Bob: No. It’s not the bulk of it. A hand-rolled recursive descent parser, which is the strategy we use for most of the parsers, it’s actually fairly straightforward, there’s not too much overhead. If you want to listen to me click around, I could even get the line count, but it’s not the majority of the code base. That’s one of the things that I found interesting because when I was first getting into programming languages, parsing is intimidating because you read the literature and they spend a lot of time talking about it. So it makes it seem like, man, parsing is like a really big deal. There’s all these formalisms, they’re talking about like LR(k), LL(k), LALR. And they made all these tools to automate it for you.

And it’s just like, “Man, this has to be heavy stuff.” But then you learn about recursive descent, which is the simplest way to hand-roll a parser and it’s just imperative code for… Basically, it looks like if you take the textual description of the grammar, which is you could say like an, IF statement starts with the keyword, IF, and then the left parenthesis and then there’s a condition expression, a recursive descent parser is basically just that in imperative code where you say, “Look for the F key word, look for the left parenthesis, parse the condition expression.”

So there’s not really that much to it, but I think the reason that there’s a lot of focus on it in academia was just because there were a lot of fruitful things that they could research about it. But in terms of, is it particularly difficult or important when you’re making your language implementation today, it’s like, no, it’s just a kind of thing you just do.

Adam: I have a related story. I was reading your book and another one, and then I had a ticket at work to basically look at what Python libraries have of CVEs like security issues. So there’s like a GitHub project that’s crowdsourced, it’ll basically give you a string that says like, if it’s greater than or equal to version 1.07, but not less than two or if it starts with 3.0X. So I had to basically determine if a library meets that rule and therefore, it has a security condition. And I had done this in the past and it was a big hairy regexes, I guess, you make some assumptions about the format.

So I ended up building this parser and it worked well. It’s actually not that complicated, I’m going to agree with you.

Bob: Awesome. That’s good to hear. I went through the same phase when I was first learning programming, and it was where I’m like, “I need to take this string and break it into pieces. Obviously, I’m going to use regex.” And that gets you far enough to get yourself out of the weeds for really simple things that works, but once you get to something that starts to feel like real nested expressions, then you hit a wall and then it feels really difficult. And then you have this mentality of parsing must be really hard because all of a sudden this got way more difficult. But I think a lot of it is just that you hit the point where a regex isn’t the right technique.

And if you don’t know what technique to use instead, then the difficulty ramps up. But once you wrap your head around a straightforward, like scanner and a simple like Pratt parser or recursive descent parser, you’re like, “Okay, I know what to do now when I’m presented with a problem like this.” And then you just use that technique.

Adam: Yeah. I think the tricky thing is taking this list of tokens and then turning that into a tree. And especially when the tree basically precedents rules are complicated, is what I would say.

Bob: Yeah. And it is a little tricky, once you learn a couple of techniques and wrap your head around it, you’re like, “Okay, it’s not too bad.” But it is difficult, especially at first, because one, it’s recursive. You’re parsing an arithmetic expression and some of the sub expressions are also arithmetic expressions. And recursion itself is hard for people to wrap their heads around. It takes a while to learn it. The other thing is, it was hard. If you go back the old, early computer science literature when they were building their first compilers, famous people that have Wikipedia articles, they struggled with this stuff too because it was totally new to them like it’s new to us now.

And it turns out, it is difficult if you don’t have those techniques, and it’s even harder for them because they had to invent the techniques. But it is pretty difficult to wrap your head around, you’re stepping through this line of tokens one at a time, but you’re trying to build up a tree that you don’t know what the tree is going to be. It is a weird thing to wrap your head around.

Adam: Totally. That’s why in this thing that I’m describing, there is actually a comment with a link to a blog post that you wrote, and hopefully, some future me will know what it means.

Bob: There’s such a cool feeling when you find those techniques where it’s like, “Oh yeah.” Once you have the mental model, then it’s like, “Oh, I can solve this. I can solve an entire class of problems now.” And it feels you just grew another appendage or something. And especially if you’ve really tried to meander your way through it without that technique, so you really feel the pain of not knowing how to solve it. Then once you get that, you learn that one trick and then you’re like, “Oh wow, I can do this thing.” It’s a really cool sensation.

Adam: Yeah. That makes me think of, there was this blog post series way back with Ron Jeffries, and he was trying to make a Sudoku solver, and it’s just hard. And so he wrote 10 articles and then never got a solution. And then… I’m trying to think of the other guy’s name. Anyway, somebody else was familiar with constraint propagation, it’s just a way to track and find a solution that works. And they just had 100-line Python solution, but it’s not that they were genius it’s that they knew, “Hey, there’s this class of problems and this fits within it.”

Bob: Yeah. They have the right tool.

Adam: Yeah. How did you get started on building programming languages?

How Did This All Start

Bob: Man, it’s an interesting question because I think the answer that I’ve given has changed over time, even though the history hasn’t. So the usual anecdote I give is that my oldest daughter had just been born, so I had a chunk of paternity leave and I also had some vacation time at the same time. So I was off work for a big chunk of time. And we were on this weird schedule where my daughter was waking up in the middle of the night a lot, so I would just take all the nighttime feedings, and then my wife would get the daytime feedings. And I was basically on the night shift. So I was just nocturnal for a month or so.

And I had a lot of downtime while she was sleeping. So I was like, “Okay, well I need something to do.” And I’d gotten this book on F# because I wanted to learn F#, it is an interesting language. I was mostly a C# developer at the time. I needed something to write an F#. And the book had a couple of chapters on, F# has built into it, a version of Lex and Yacc, which are scanner and parser generators. So I was like, “Oh, this is interesting. Maybe I’ll try to make a little compiler a programming language thing.” And I started tinkering on that and I was like, “Man, this is the greatest thing ever, basically.” And I got super into it.

That’s where it started, but thinking about it after that, there are definitely points prior to that where I had more interest in programming languages and I had an affinity for it. So I have a memory of… My first programming language was BASIC when I was in middle school, elementary school. And I can remember doodling on a notebook in school and imagining if I were to make my own programming language, what keywords I would use and stuff like that. And thinking that was this really fascinating thing, even though I had zero idea as to how BASIC worked, it was totally magic to me, but there were still some part of my brain that was like, “I want to make a language, doing this seems cool.”

Adam: That’s great. What was in your language? I think I’ve had the similar thought as a kid, but mine would be to print something, you’d have to say like, “Adam’s awesome.” And then give the string.

Bob: Yeah. Basically at that level. Not that I remember any details about it, but something along those lines.

Magpie Lang

Adam: Did your F# language experiments bear fruit, did you end up playing around or implementing something?

Bob: I started making this little hobby language called Magpie that went through a bunch of different incarnations over time. You can basically watch me discover different programming language concepts by seeing me reboot the language in different ways. So it was initially statically typed and had an ML-type system because F# has a type system like that. And this was a brand new thing to me, and I was like, “I want to make a thing like that.” And then at some point. I discovered optional typing and I was like, “I should make it an optionally type language.” And then at some point I discovered multi-method and I was like, “I’m going to reboot it around multimethods.”

And then it fizzled out, it never had more than zero users. So in that sense, it was not successful, but in terms of… I learned a ton of things, writing it, I had a lot of fun writing it. And I wrote some blog posts that people have read and maybe that helped them understand bits and pieces of other languages that they actually give a damn about. So maybe it provided some value to other people too. It felt fruitful to me, even though no one uses it.

Icon - Programming Language History

Adam: It’s interesting in computer science, it seems to me there isn’t very many areas where you can do this historical exploration, but it sounds like you’re uncovering existing old techniques.

Bob: Yeah. For whatever reason, a big part of my brain is just really fascinated by programming language, history and features. Some people just have this encyclopedic memory for baseball scores, they can tell you like, who won the world series in 1967, I’m not that guy, but I am that guy for programming language history and language features. And I don’t know why I’m that guy, but I definitely, that stuff just sticks, and I spend a lot of time on Wikipedia and I’m just like, “I don’t know why, but I just find all the stuff fascinating.”

Adam: What’s an interesting programming language, historical anecdote?

Bob: They’re all interesting, they’re each beautiful snowflakes. Here’s a relevant one. A while back, I was reading about this language called Icon, which I can’t remember, I’ll get the details wrong here, but I think it grew out of the snowball family, but it’s like a scripting language high level. The interesting thing about it is it has generators which a lot of languages have them now, but they’re more deeply woven into the language where in most language, an expression, you evaluate an expression and it produces a value. That’s what expressions are. Expressions are things that produce a value.

In Icon, every expression can produce a value or no values or multiple values. So it’s like, every expression is returning a generator. And then an expression that evaluates to know values is what they consider failure. I’ve never actually written any Icon code, so my understanding of Icon is hand wavy at some point. So in a lot of cases where you’re in a context where you just want to evaluate an expression and get a value, that’s the behavior that you get, but in cases where it could be meaningful to not have a value arrive or to handle multiple values, the language will seamlessly do that.

And it does a bit of backtracking and repetition, almost Prolog or another logic language. I won’t do a good job of coming up with an example, but the idea is that in some places, when you… It looks you’re just evaluating expression, but what it’ll really do is it’ll keep trying as long as it gets values until it finds one that meets the criteria that it needs. And in other places, it looks you’re just evaluating an expression, if that expression fails to produce a value, imagine you’re reading from a file and the file doesn’t exist, you won’t get a string back, you’ll get the absence of a value. And if you do that in certain contexts, it understands to just like, “Okay, well, I’ll just short-circuit the rest of the expression because the file wasn’t there.”

So it has this short-circuiting and error handling seamlessly built into it, which was a really cool concept, but it was like a language dead end, no one’s using Icon anymore, and I don’t think there are any languages that are significantly derived from it.

Adam: It sounds do notation in high school. I don’t know if you’re familiar?

Bob: Yeah. It’s a bit like that. It’s like you get a little bit of programmatic control over the languages order of execution or short-circuiting. I’m probably doing a poor job of explaining it. I certainly understand it poorly. The interesting part of the anecdote is my day job now is I work on this programming language called Dart at Google. And one of the things that I’ve been working on the past year is the main framework that’s growing really quickly in Dart right now is this framework called Flutter. And it’s like a client application UI framework, so you’re building UI.

And it doesn’t use a separate template language, so your actual UI layout and stuff is all just vanilla Dart code, it’s like just expressions and constructor calls and stuff. And in practice, that means you’re doing a lot of collection literals like list literals. Like you have some widget that contains a list of buttons, that means you’re doing a constructor call that takes a list literal that has all the buttons inside the list. And that actually works out fairly fine. It’s surprising that C syntax can handle that stuff gracefully, but where it falls down is a lot of times, you actually do need some interesting conditional logic in there, where you want to make a widget that contains this list of buttons, but in some configurations, you don’t want to have those buttons

And when you’re in that situation, if you don’t have that conditional logic, the code looks pretty clean and declarative. It’s just like return constructor call for the outer widget, list literal right in place, all the child buttons, it has this nice textual tree shape where the shape of the code matches the shape of the UI. And it’s pretty nice. But once you need to do some conditional behavior, then you’re like, “Okay, well that list literal now, I need to hoist that out to a variable and then do some imperative code where it’s like, if it’s in this condition, add these child buttons to the list, otherwise do this.”

So now, the code doesn’t look declarative anymore, it reads bottom up instead of top down, and it’s kind of gross. So my job for the past year or so was, we need to figure out what language changes we can do to make that code more graceful, more declarative, make it nicer. And what we’re doing now is we basically, we added support for, if inside, list literals. Dart is a typical scripting language where it has square brackets for list literals like you do in Python and JavaScript. So right in the middle of that, you can do IF, and so you could do [A, B, if it’s Tuesday, see comedy].

And what that means is, on Tuesday, you get a list with A, B, C and D, and on other days, you just get A, B and D. So when in that condition doesn’t meet, then you don’t get that element, which is fairly straight forward, fairly intuitive. What’s interesting is, that looks a lot like languages like Ruby, where everything is an expression.

Adam: Yeah. It’s like IF expression, you can evaluate it.

Bob: Right. It’s like an IF expression, but it’s not because in that example, when it’s not Tuesday, you don’t get a list of A, B, NULL, D. So a list in IF expression always produces a value, because that’s what an expression does, an expression always produces a value. But that’s not what we want. What we want is this conditional thing that can sometimes produce a value or sometimes produce no values at all. And we also added a spread syntax, which lets you unpack a list into another list, which means also we have the syntax that lets you produce multiple values right in place, not an object containing multiple values, but the spread set of them.

And that sounds an awful lot like what Icon does. So in Icon, these things that are expression-like, can produce zero, one, multiple values. So I took that and I was like, “That’s actually a great solution for what we need here.” We need something that you can put inside a list literal that’s like an expression, but not really because it can produce an arbitrary number of values and they just get interpolated right in place. So it’s a really fun opportunity to take some random bit of programming lore from 30 years ago and be like, “Oh, this is a great solution for this problem that I have right now today.|

Adam: That’s awesome. I read a sci-fi book before where programmers were called programmer archeologists because it was in the future, and there was basically any problem they could find, there was already an existing solution, they just had to dig through the eons of existing code and find something.

Bob: Stack Overflow?

Adam: Yeah.

Bob: It does feel that a lot of times.

Missing Programming Language Features - Multi-Methods

Adam: What programming language feature doesn’t exist yet? Or what language feature is the world missing or combination perhaps?

Bob: One of my pet features that I think is super cool and wish it was more popular is this thing called multimethods. I don’t know if you’ve ever heard that, I can talk about what multimethods are.

Adam: I heard you say it earlier. Why don’t you define it?

Bob: Yeah. They relate to polymorphism and virtual dispatch. It depends on which language tradition you’re coming from. So I will assume that you and most of the listeners are coming from an object-oriented background. So in an OOP language, when you call a method on some object, that method is polymorphic in that whatever object is to the left of the dot can have a different implementation of that method. And at runtime, it looks at the object, finds the right method based on that object type and then calls that method. So even though it looks textually, it looks like you’re just calling this method do stuff, that at runtime will dispatch to different actual concrete methods based on the type of the runtime type of the object on the left, going into pedantic detail here because the actual details get relevant.

In most object-oriented languages, that’s how it works with the receiver, the part of the method that is to the left of the methods name. But all the other arguments to the method, so if you call food.dostuff(bar, bars). Bar and bars don’t get involved in choosing which do stuff method to call. So those arguments, they just aren’t involved in the dispatch process at all. So they say the most object learning languages are single dispatch, which means there’s only a single argument that gets involved in the process of figuring out which actual method to call. And in most OOP languages, we make that object even syntactically look privileged by pulling it out of the argument list and putting it to the left of the method name.

So multimethods, don’t do that. Multimethods do multiple dispatch. So what that means is that all of the arguments to the method get involved at runtime in determining which actual concrete method body you select. So most statically-typed languages today like C# and C++, and Java support overloading. So you can have multiple methods with the same name, but different argument lists. But those overloads are still elected at compile time based on the static type of the arguments. So with multimethods, you can imagine something overloading, except those overloads are selected at runtime based on the actual object types.

That sounds super weird and obscure and theoretical, what’s interesting about it is if you’ve done a lot of object oriented programming, you have run into stuff that just feels weird and hard. One example is like, if you’ve ever written your own, like maybe you’ve done a little graphics library or something, and you’ve written your own point or vector class, it’s nice to have that support like arithmetic operators. So you can add a vector and add two vectors and stuff like that. Most languages do support a certain amount of operator overloading. You can imagine being able to scale a vector too, so you want to be able to scale its magnitude by a number.

So you want to be able to support the multiply operator on vectors and numbers. And what’s interesting is that in some object-oriented languages, it’s actually hard to do that well. You can define the times operator on your vector class to take a number on the right, but sometimes it’s hard to define the times operator on the number of class to take a vector on the left or take a vector on the right. So there’s this weird asymmetry, where the left-hand side gets treated specially by the language, multimethods give you a graceful way to do that. You can just say, “I’m just going to define a new star method that takes a vector and an int, and a star method that takes an intent vector.”

And there was already a star method that takes an int and an int, and that’s fine. And these are all totally separate methods. And then at runtime, it will pick the right one.

Adam: It’s like pattern matching, but across a bunch of methods.

Bob: Yeah. It’s exactly what it is. So my hobby language, Magpie, what it did was it took pattern matching from languages like SML and use that same structure for defining multimethods. So you can think of a set of multimethods as being an implicit series of pattern matches that it would assemble together itself. A cool feature. I do a decent amount of game programming and this comes up a lot in games because a lot of the game is entities interacting with each other. Think about collision detection where you have different kinds of entities do different things when they run into each other, maybe two enemies, they don’t touch each other when they collide because you don’t want to deal with friendly fire.

But if the hero collides with an enemy, maybe that’s an attack. And if an enemy collides with the hero, that’s like a parry. And so you have all these pairs of objects and you wanted to find how they interact. That’s really difficult to do in a single dispatch language, but in a language with multiple dispatch, with multimethods, you can just walk all those cases and say, “For this pair, do this. For this pair, to do this.” And the language will just pick the right one at runtime based on the actual objects. It’s really cool.

The Expression Problem and Game Programming

Adam: Is this what they call the expression problem? Is that the same thing?

Bob: Yeah. So your expression problem is part of that. So you can think of multimethods as being a way to solve the expression problem.

Adam: I think like C# and Scala, you can add extension methods to things, is it?

Bob: Yeah. You can add extension methods. C#, you can add new operators where the left hand side is a type that you don’t control, but all of those are all statically dispatched. So it works because like a compile time, it can configure out which one you’re trying to call. Multimethods are a little more interesting because they let you do the actual runtime dispatch. For example, in my case, talking about a game where you’re doing collision detection, you don’t know it compile time when two entities collide, what kind of entity they are. That’s the whole point is like, it’s the game’s interaction at runtime. So a multimethod would let you call the right method based on what they actually are.

Adam: Yeah. Very cool.

Bob: Multimethods has been my pet feature for a long time, and then Julia came out and that is one of Julia’s core features. So when Julia came out, I was like, “Man, that looks awesome.” And my half-baked language Magpie is not going anywhere. So I’m just going to consider them having solved this problem, and I don’t need to worry about it anymore.

Adam: Do you think that we have too many programming languages or not enough? Is there more solutions space to explore?

More Languages?

Bob: I don’t think we have too many programming languages, but I just like them a lot. So I’m not an unbiased person to ask that question. I generally think that the software field is growing in terms of the number of different domains that we touch and the different kinds of problems we solve. So I think it’s reasonable for this set of tools that we use to grow over time. So from that perspective, I think we should have a variety of languages because the best language for writing an operating system is probably not the best language for writing a web application.

Adam: Yeah. I feel like there’s two pressures with languages. One is, the table stakes for language are really high, people expect some library system that you can import things

Bob: Yeah. They get higher over time too.

Adam: Yeah. And they expect so much from a standard library, and they expect, I don’t know, this and that packaging systems, but there’s another opposing trend I’ll say which is, at my work, everything I ship is like some service that runs inside Kubernetes with some load balancer in front of it. In fact, if there’s not something in a standard library, it may be available as an external service, so your standard library could actually be smaller for some language if the things you need to call aren’t libraries, but external calls.

Bob: Yeah. That’s interesting. You’re definitely right that over time, the table stakes go up for what it takes to make a language that people are willing to use. This is something we experienced firsthand on Dart where we’re like, “Okay, we need to have an IDE, and syntax highlighters, and the library ecosystem, and a package manager, and format, and all this stuff before. The surface areas are really large, so from that perspective, it’s harder to get in the game now than it used to be, but you’re also right that we’ve moved to a model that enables polyglot programming more than it used to in the past.

In the past, decade or so where, I don’t know if it’s just because performance has gotten good enough, we can afford the overhead of RPCs, and that means that it’s lower cost to have parts of your system written in different languages. And that gives you a way to incrementally adopt a new language and to take advantage of libraries in other languages that are already out there. And to some degree, maybe that does lower the on-ramp cost, which is for me, someone who’s excited about seeing new languages, is a good thing.

Adam: Yeah, definitely. There’s benefits too depending on something via RPC versus a library, because you don’t have to worry so much about giant trees of dependencies. Or maybe you’re just externalizing that, I’m not sure now that I think about it.

Bob: Yeah. It’s like any abstraction layer, you get some benefit from, maybe it’s lowering the cognitive load of what you need to know to interact with that system, but if that abstraction prevents you from knowing things that it turns out you practically do need to know, then that ends up becoming a regretful abstraction.

The Dart Programming Language

Adam: Yeah. You mentioned you work on the Dart language, in this book, you build this locks language, I found that you had another language called Wren and now you’ve mentioned one called Magpie.

Bob: So many languages.

Adam: How does the Dart implementation differ from this book like an approach?

Bob: Pretty wildly different. The Dart team is a big, giant industrial scale operation. I’ve done bits and pieces of work on some of the parts of the actual language implementation, but I mostly don’t work on the language implementations for Dart. We have a couple of compilers to JavaScript, an IDE and then VM that jets and compiles to machine code. And I spent some time working on one of the JavaScript compilers, but I mostly don’t work on the language implementations, but all of those are a lot more, I don’t want to say sophisticated because that has a moral character, but those are bigger and more complex than what we do in the book.

But if you were to read the book and implement locks and implement all the various pieces of things and then go over to the Dart implementation and start waiting through these millions of lines of code, if I did my job right, you should still find stuff that looks familiar. The parser may be 10,000 lines instead of 200 lines, but you can still say like, “Oh yeah, that’s a parser. I can tell that it’s doing recursive descent there. I can see where it’s handling operator precedence.” The backend is a lot more sophisticated on these because in the book, we go through lexing and parsing and then a bit of internal representation.

And we do compiling to byte code, but there’s not a lot of complicated backend optimizations and we’re not compiling directly to machine code. So there’s a whole bunch of techniques there that we don’t go into. So if you poke around in the Dart implementations for that side of things, then it’s like, “Okay, I don’t know what’s going on anymore.” And part of that because like, I don’t know that stuff or at least I don’t know it yet, so I wouldn’t be the right person to write that book.

Adam: Well, I think it’s one reason the book is so valuable because you could say, “Yeah, if you want to learn how to write a compiler, go look at GCC, but I don’t think it’s going to get you anywhere, it’s just an insurmountable mountain.”

Bob: Yeah. It’s overwhelming. And even to some degree, just the programming language textbooks. The Dragon Book… Actually I like the Dragon Book, but it’s a lot to take in at once, and it’s not clear. It starts off pretty theory heavy, so it’s hard to know what to focus on, what matters, what to care about. It’s also impenetrable in some ways. So my hope with my book is… My intent is for it to not be the last programming language book you read, but hopefully, it’s a good first one for you to read.

Adam: Oh, that’s great. And I think it’s probably a great place to end it. I have a lot of more questions like you’re compiling to… It’s an interpreter, but it compiles to byte code, and how do you do garbage collection, but maybe let’s save those for some other time.

Bob: Yeah. Maybe so.

Adam: Thank you for joining me, Bob. It’s been great.

Bob: Awesome. Thanks, Adam.

Adam: That was the show. I hope you enjoyed it. That Sudoku Solver solution was by Peter Norvig and it was something we were discussing in the Slack channel a little while ago. Comparing it to Ron Jeffries’ solution is super interesting, and maybe shows some of the limits of test-driven development. I think it also shows how valuable it can be to be familiar with existing solutions. The reason Ron Jeffries didn’t arrive at a solution was just he didn’t know about constraint propagation.

Multimethods was super interesting as well, and I’m wondering if there is a type class-based solution to multimethods. If you know, let me know. Until next time.

CORECURSIVE #032

How to Build a Programming Language

And Why With Bob Nystrom

Transcript