CoRecursive #037

Learning About Compilers

and Bytecode With Thorsten Ball

What do compilers do? What is the runtime of a language? What does it mean to compile something down to bytecode and what executes the bytecode. Throsten Ball answers these questions in this interview with Adam.

Transcript

A Speech in Two Languages

Thorsten: A compiler is basically an optimization of an interpreter that there’s theoretically speaking. And again, I hope that nobody listening falls off the chair. There is no difference in their capabilities. The example I always use is imagine that the two of us, we speak English and we have a friend that only speaks … I speak German, so my friend speaks only German, but he doesn’t speak English. And you were saying something to me and I want to translate it for him. So what I could do is after every sentence that you tell me, I could take his sentence translate it in my head and then say to my friend, “Oh, he just said this,” or “He just said that.” Or what I could do … This would be interpretation. What I could instead do would be to just listen to what you have to say paragraph wise, or maybe you have prepared a whole speech. I could listen to the whole speech and write down the German translation and then hand it to my friend and just say, “Read this. This is now German. You can understand this.”

And then there wouldn’t be this interpretation step because I just translated. I just compiled what you said into this other language. And now there is no overhead of me interpreting all the time. That’s a really simple example, but that’s a gist of it. And to be honest, as soon as you start getting into the real world concerning compilers and interpreters, the lines get blurry.

Adam: Hello, this is out of Adam Gordon Bell. Join me as I learn about building software. This is CoRecursive.

Today I talked to Thorsten Ball. He wrote this book called Building a Compiler in Go. We talk about virtual machines and bytecode and compilers, basically things that were always just kind of magic to me. This interview is fun and it’s casual. And we hit on a lot of interesting low-level topics. Like how does CPUs work and why does Java and C-sharp target a virtual machine based runtime. We also talk about how do you write readable code? Like so readable that it can stand alone in a book format. Thorsten has the answers for all of these questions and because he comes from the perspective of a web developer, he’s really good at explaining these things in a way that I can understand.

Thorsten, I’m pretty excited to have you here because I really liked your first book. And now you have … You’ve had a second book for a while actually, haven’t you? But I just started reading the second book.

Thorsten: Yeah. Thanks. It’s been out since August, like a year now, basically, August last year.

Why Learn Something You’ll Never Build?

Adam: And this is our second interview I previously interviewed you on SE Radio. So I think I’m on a theme here. So I interviewed you on Software Engineering Radio about interpreters, then I talked to Bob Nystrom about interpreters and so this is the third installment. So I want to talk about compilers. So your book is called Building a Compiler in Go. So my first question, I guess, is like, why build a compiler?

Thorsten: Because it’s fun. Because compilers are cool, fascinating, mysterious, powerful. I don’t know. That’s what drew me to compilers. But I think in the practical sense, there’s two answers. The first one is, if you have to build a programming language and you’re working on a team that builds a programming language, then of course you should know or learn how to build a compiler.

And for me, the actual reason was to learn, to basically get rid of a huge blind spot that I had. As a college dropout, self-taught developer, I never had visited a compiler course. So I never learned this. And this was always this black spot for me, like this mysterious thing. And everybody speaks with a lot of respect when talking about compilers and I wanted to learn how they work. And then I read a bunch of blog posts actually that recommended that it’s a really valuable thing to do, because if you learn how compilers work, you actually learn how a lot of things work in other tools. That means you learn how parsing works and you learn what Alexa is. You learn what an AST is, an Abstract Syntax Tree. You learn what an immediate representation is, IR, like, let’s say another form of AST. And you also learn how computers work kind of, because you have to think about what a compiler does, which is translating from one language into the other, that gives you a lot of context.

And that’s the main thing, like learning for the sake of learning. That’s why I did it, but there’s also a really practical aspect to it. And that is … I kind of touched upon this just now, but if you learn how to build a compiler or learn about compilers, you’ll see a lot of different things in a different light in your day-to-day work. Like you recognize these things that you built previously or that you need to build now, you now recognize that, “Oh, that is actually a parser. Oh, that is actually a form of a Immediate Representation.” And you also gain a lot of understanding of how languages work, which helps tremendously if you’re debugging, profiling, if you’re doing performance optimizations.

And this was two weeks ago, actually when me and two colleagues we were profiling and doing performance optimizations on a database that we use. And while looking at the profiling data, and this was a program written in Go, one of my colleagues said, “You know what? Like you need to know about how the runtime works, how the language works. If you want to make any sense of what you’re looking at.” And I feel that all the time, it might be this, I don’t know what the word for it is, but as soon as you learn about a concept, you can see it everywhere and everything is in a different light. And you think, “Oh, how could I have known this without knowing this other thing that I just learned about?” But I do feel that there is a great value in learning about compilers and programming language, even if you’ve never built an actual general purpose programming language.

Adam: That’s very cool. What was the example where you needed to understand how the runtime works?

Thorsten: It’s an open source project called Zoekt built by Google and it’s an database that indexes source code. And I work at SourceGraph and we index source code a lot because we’re building a search engine and code intelligence platform for source code. So we use Zoekt, and we also extend it for our purposes and we had to analyze and profile it. And then it’s like, you’re looking at the profiling there, you’re looking at flame graphs, you’re looking at graphs. That means you have to understand what is a stack? What is a call stack? If you don’t know what to call stack is it’s really tough to make sense of flame graphs. And if you don’t know what a garbage collector is, it makes a lot harder to understand what the runtime … What is it called [mullet GC call 00:06:54] that go has. This is basically what shows up as the GC in your profiling thing.

And then if you go deeper and try to understand what is actually happening, then you kind of start reasoning about, “Oh wait, we allocating so much memory here. This is why this profile is actually dominated by the garbage collector.” It’s not about speed. We’re not slow, and runtime is not slow. It’s just, the runtime has so much stuff to do because we produce so much memory garbage that it has to collect. And it’s this reasoning about and understanding of what is actually happening that allows you to then take a step back and draw the right, hopefully right conclusion about what to optimize and what not to optimize.

The Runtime That Barely Shows Up

Adam: Yeah. Because like certain things seem like magic to me. And I guess that since I started looking through your book, that’s one thing that always seemed magical. What does a compiler do? Or what does a runtime do? And I guess it’s not as magical as maybe you think, right? Like it’s actually fairly simple standard engineering stuff, right?

Thorsten: I was just going to say it is magical. To be honest, if I look at, let’s say the Go runtime to me, it’s still magical in how fast it is. Like there’s your program and you optimize it for speed and you want it to be fast, but then there’s also the runtime that is still faster. It’s the thing that still has to keep up with the thing that you’re doing. It just shows up barely in the profiles, because it just keeps collecting your memory and allocating stuff. And for example, a runtime also has to do scheduling. So if you have green threat, for example, as Go does, which are not processor threats or operating system threats, but a virtual construct mapped onto them, then the scheduler has to actually put them and schedule them onto actual threats. And if you think about it, that’s kind of amazing.

Like this is all happening, you’re trying to make your program go as fast as possible. And the overhead of the schedule is basically nil. It is there but you can’t ignore it because it’s just so fast. You trust it to do the right thing. And for me, that is actually still fascinating, magical.

Adam: No, that’s a good point. I feel like though, maybe I’m naive because I haven’t looked at the Go runtime or very many runtimes at all, but I’ve looked at the kind of example you built in your book. And I feel like that there’s just a lot of engineering hours between these two, right?

Thorsten: Yeah.

Adam: Like you have the basic structure and then everything else is just time, right?

Thorsten: I thought about this a lot, actually in the past few weeks that building a language or a runtime is a rather straightforward thing. And most of the complexity comes from making it faster, more efficient, more performance. You can build a Java interpreter or something. And I don’t know, let’s ignore all the special cases and all of that stuff, but it’s not a lot of code. If you want to make it scale, then it’s a ton of engineering work and a ton of code.

And if you look at, for example, Google’s V8, the JavaScript engine, that is insane. Java script itself is a fairly simple language, I would say. Nowadays, there’s a lot of stuff added onto it, but it’s still a fairly simple language compared to let’s say C++ or Rust or what have you. But if you look at V8, this is crazy. There’s so much work put into it, and there’s so much engineering skill visible. If you just dip your toes in it, it’s mind-blowing. They have multiple interpreters, they have multiple compilers in it. They have so many optimizations and it’s all this simple thing of let’s make JavaScript faster.

People run JavaScript on their pages. Let’s analyze how to write the Java script and let’s make that as fast as we can. And to me, it sounds like a lot of fun, but it’s also, it takes a lot of skill.

Adam: Yeah. Coming at this from a naive perspective. It sounds great that the use case is so simple. I feel like I deal with a lot of things where it’s like, there’s complicated requirements and then this is like, just make it faster. I mean, squeezing that Speedo is a challenge. But it’s like pure engineering, right? Rather than like … So how big does your compiler implementation end up?

Thorsten: I think at the end of the second book we at around 6K lines of code in total, including the test. So it’s tiny, tiny compared-

Adam: Wow.

Thorsten: … To the other. I actually looked it up before this and the compiler comes in at around 1,700, JVM is 1,100. And then the whole front end, which is the thing that takes in basically strings and turns it into a data structure, the parser, the lexer, including the AST, that’s nearly 2000 lines, including tests.

Adam: That’s crazy.

Thorsten: Yeah. It’s fine.

Adam: It’s like crazy small, I guess.

Thorsten: Yeah. Yeah. But if you write a book, you’ll learn to shed weight basically, like every line counts. And I personally think writing these two books, it sounds grand as it made me a better programmer, but it’s really this, how can I get rid of this line? How can I make this even simpler? Do I need this function? Do I need this method? The added requirement is, it has to be readable. It has to be easy to understand. So you cannot do the officication thing where you go, “No, no, no. Let’s just put it in one line and use a ternary operator,” or whatever, and make it as small and tiny as possible. Let’s just cut the variable names. No, it has to be really simple, easy to understand and as small as possible.

Code That Reads Like a Book

That’s also why I, for example, skipped a bunch of features that a lot of people have asked for, for example, For loops, While loops, why loops, I feel that I could have added this, but it doesn’t add a new thing because once you have functions and once you have conditionals, that means, that you can conditionally execute code and jump around, then you basically have all the tools available to build loops.

Adam: It’s interesting. In some ways, you’re describing like the truest readability test. Which is like, instead of focusing on lines of code, it’s like, no, can somebody pick this up in a book and read it?

Thorsten: Yeah. That is so challenging. And I don’t want to sound smart, but a lot of articles or blog posts, what I often see is when people use code examples, they have variable names that refer to something else. If you basically rip out a [inaudible 00:13:23], but out of your existing code base from work, it refers to, I don’t know, administrator or user, which is easy to understand. Everybody knows what this is, but oftentimes you read through code samples and you go, “Why is it called like this? Am I missing something? Is something happening before, did I miss another code?” Stuff like this, and all of this, you need to think about like, does every example build upon the previous one, does every snippet of code standalone, can a person look at this and understand it. And that was really hard, like really, really hard.

Adam: Have you ever played around with the Literate code or whatever?

Thorsten: Literate code is this idea that Donald’s Knuth invented, I think so, right?

Adam: Yeah.

Thorsten: Isn’t that why he invented Lex, I guess or Tex?

Adam: It could be, it could be. Yeah.

Thorsten: The idea that you basically write prose as in comments and then just inject the code in between. And this is kind of what I do in a book. The book starts with zero lines of code, and then you explain, let’s write down the first function in this file. And then you do this. In Literate program and would be, if you could then take the prose and the code and hit compile, and it would take the code and put it in the right places and then execute the program. So I could have used the tools that exist for Literate programming as in Org mode, for example, and Emacs that allows you to do this and then use that to basically-

Adam: [crosstalk 00:14:47] book.

Thorsten: … Produce my book. But I guess that would also be … That could be a rabbit hole that you never get out of.

Adam: So I think like we’re way off on a tangent, but like the thing that requires, I think the Literate code is that your language, it doesn’t have any constraints on ordering basically. Because to do the true literate code, you want to be able to introduce things in the order that makes sense to the-

Thorsten: Exactly yeah.

Adam: … Reader, right?

Thorsten: That is actually really hard. The order in which you introduce things that is, I’ve revised several chapters of the book when it’s like, “Oh, this seems like a simple feature that I can build in chapter two.” And then it’s like, “Oh, in order to build that feature, we have to explain this first. And in order to explain this, we have to do this.” So you try to find your way to the top of this or the bottom of this tree and try to find out what is really the starting point so that you can explain the whole thing from first principles. You need to have every thing built on top of the other thing. And then at the end, the compiler comes out and that was really hard.

And Bob Nystrom, who you mentioned, he also said on Twitter, that that is one of the hardest things about his book too. And I think Steve Klabnik who wrote the Rust book, he also said that this was a challenging part, especially in Rust, where Rust is really complex.

Turning Strings Into Trees

Adam: And you mentioned building on things like, so this book builds on your previous book and you mentioned a little bit, the front end of a compiler. What’s the front end of a compiler?

Thorsten: The front end is in my wording, the part of the compiler that takes in the source code and parses it and turns it into a form that is easy to work with in rest of the compiler. And that form is usually a tree because if you look at source code and if you look really closely, it’s actually structured like a tree. If you look at indentation, for example, if you know about tree structures, you can see how you can represent this as a tree. As in, hey, and if conditional has conditioned, and it has a child node, that is the consequence or the alternative, for example, and then you can see how you construct it as a tree.

And the front-end texts in the source code, a string or a file of full of strings, and then runs it through Alexa, which turns the source code into so-called tokens or Lexemes. And that is just think about it as the things that you want to work with, stripped of all the details. So instead of working with characters, you work with tokens, for example, instead of saying, Oh, this is L-E-T for let, for example, in a Let statement, you have a let token, or you have an [inaudible 00:17:31] token that is one, two, three, four. Instead of having the single characters. Gets rid of all the white space and stuff like this.

Adam: Like if you start with a string, let X equals seven and then your tokenizer turns it into like a let token, an X, it’s probably a token of type variable or with value equals X and then-

Thorsten: Equals sign.

Adam: Yeah.

Thorsten: And then a number with the value of seven and then semi-colon or whatever you have. And yeah, that’s basically it. So you get out of let X equals seven, you get four tokens, for example. And then you take these that’s what the lexer does, and then you’d put them into the parser and the parser that turns this into an AST or a syntax tree. You don’t have to go into the details here, but a tree structure that represents this. And for example, for the letter X equals seven, this could be a tree node called Let statement that has one field called identifier, which would point to the token X and then one field called value, which it binds to the identifier. And that would point to the expression that is seven.

And then after that, you can do stuff and turn this AST into a cleaner form or whatever you want to do. And then it goes through the rest of the compiler.

Just a Dictionary and a Switch

Adam: So like with the previous book and with how far I got into Bob’s book, like you basically, once you have that structure, then you’re like, okay, let’s run it. So you take that Let statement and you have like a case you’re like, Oh, case it’s a Let statement then grab out the name and grab what the value and throw that in a dictionary. And then like onto the next, right?

Thorsten: Yeah, that is a really succinct description of what interpretation means. And you cannot find that in any book like this microphone that you’re seeing here, this is on top of multiple compiler books and bonafide any paragraph in it that says, and then you just take a dictionary and put the val- … This is really good description because yeah, this is what you do. If you run it, like if you interpret it, you take the AST, you put it in a function called Eval. Most of the times for evaluate, then you say switch, okay. So whatever your language has. My current node, what is it? Is it a Let statement? Okay. If it’s a Let statement, let’s open a dictionary and say X equals seven, and then let’s interpret the next statement. If the next statement is X plus two, then we say, Oh, X, I know X, let’s look in the dictionary and take seven from the dictionary and add two to it. And then the wrapping statement might be print or puts that means, now we have to output nine or something. And that’s interpretation.

Adam: Yeah. It is very succinct. I think. We have just gotten rid of a whole bunch of books. We just replace them.

Thorsten: Yeah, just the last two minutes put it for free online. People can learn a ton.

Adam: So why do we have compilers and stuff? Why a second book even?

Thorsten: Again, performance, efficiency, optimizations. A compiler is basically an optimization of an interpreter. And I guess a lot of people that know about compiler, they just get goosebumps, “No, no, no. You can say that.” But you could theoretically do all of the things that you do with an interpreter, with a compiler or vice versa that there’s theoretically speaking. And again, I hope that nobody listening falls off the chair, there is no difference in their capabilities. But if you take a compiler, for example, you produce an intermediate result, you translate the code and translate it into another language. And that allows you to maybe make it faster and get rid of stuff. And if you think about the example that we just mentioned, where it says, let X equals seven and then X plus two, let’s say that’s in a loop for example. In the code, it says, run this, I don’t know, 500 times let X equal whatever.

Then every time you run the loop in your interpreter, you again, have to look at every node and say, “Oh, is this a Let statement is this? I don’t know, arithmetic expression or something? Is this an identifier? Is this a For loop?” Whatever. You have to [crosstalk 00:21:44] every time.

And then you can start doing optimizations and you can add caching to it and say, this AST node, actually it caches something that I can then run again, because I know what it is now. And then as soon as you start getting into it, you realize, “Wait a second, I’m just going for a program and look at it and analyze it and translate it into another form that can be executed.” So now you’re already doing some form of translation. And compilation is then taking this further and saying, “I’m going to look at this program and output it in another language.” And if I output it in another language, I’m just going to get rid of all the stuff that I don’t need. And I also have to analyze it only once. I don’t have to look at everything all the time, especially not when you then run the program, that’s the main thing.

Paying the Cost Before You Run

Compilers can be slow. I grew up running Linux and I had Gentoo running. So had to let my computer run for the whole night to compile my window manager. So I do know how slow compilers can be. And the reason they’re slow is because you’re paying the upfront cost that an interpreter pays at runtime. You’re paying a little bit more because you’re doing additional optimizations, but you’re shifting the cost. And you’re shifting the cost to a point in time, but you’re happy to pay it. But if you want to execute your program, you don’t want to pay this cost, you want to just run it as fast as possible.

Adam: And you only have to compile it once and you run it like a whole bunch of times, right?

Thorsten: Exactly. Yeah.

Adam: In theory.

Thorsten: Exactly. Just the example I always use is imagine that the two of us, we speak English and we have a friend that only speaks … I speak German. So my friend speaks only German, but he doesn’t speak English. And you’re saying something to me and I want to translate it for him. So what I could do is after every sentence that you told me, I could take a sentence, translate it in my head and then say to my friend, “Oh, he just this, or “He just said that.” Or what I could do, this would be interpretation. What I could instead do would be to just listen to what you have to say paragraph wise, or maybe you have prepared a whole speech. I could listen to the whole speech and write down the German translation and then hand it to my friend and just say, “Read this. This is now German. You can understand this.” And then there wouldn’t be this interpretation step because I just translated, I just compiled what you said into this other language. And now there is no overhead of me interpreting all the time.

That’s a really simple example, but that’s a gist of it. And to be honest, as soon as you start getting into the real world concerning compilers and interpreters, the lines get blurry. There’s is no … It’s blurry.

Adam: I like this metaphor. The advantage of this metaphor would be like, if you have a whole bunch of German friends, because you could hand the paper out to each of them, right?

Thorsten: Exactly.

Adam: Yeah. Because I’m thinking like if I had some optimizations I could do to Ruby code, so I’m going to start up my Rails app. And first I want to go through and find all these optimizations, but let’s say the optimizations take minutes to find, right?

Thorsten: Yeah.

Adam: If Ruby’s not compiled, then maybe the trade is not worth it because my startup time becomes so slow, right?

Thorsten: The whole point, as in, when are you prepared to pay the cost for making your program run faster?

How Ruby Got a Secret Compiler

Adam: At some point Ruby worked like we were describing. It just executed statements. And then at some point this could be wrong, but to my Googling at some point Ruby admitted C code and then executed it.

Thorsten: I think this is actually recently. So what happened is Ruby is now in version … It’s in 2.6 and a few years ago with Ruby 1.8, it worked exactly like we just said, it basically went through Ruby source code and said, “Oh, you have a class here, class called Fubar.” So I’m going to say Fubar the class in my dictionary. And then you want to say fubar.new. … I don’t know, print first name. I’m going to look in the dictionary. What is fubar? And it parsed the input, it build an AST and then interpreted the AST.

And then with Ruby 1.9, which is also, this was around I think 2012 or ‘11, ‘12, ;13, something run out and they switched to a virtual machine. That means we can go into detail, but for now let’s just say virtual machine is a different form of interpreter, but a much fast one. And what they did is they compiled the Ruby AST into something called bytecode, which is what the virtual machine runs on. And then that contains a few optimizations and then have the VM execute that. And that was seven years ago, roundabout. And what they have working on right now is what you just said, that they emit C code and then run that.

And that is actually incredibly interesting to me when I first saw this, because this is called JIT compilation and a JIT compilation is set for just-in-time compilation, which means instead of what we said, like, you compile by writing down the translation and then you can run it a bunch of time later on, which is also what we do when we compile a program. For example, let’s say you write a C program, you compile it, you put it in a compiler that you get out of the binary and then you can run it, I don’t know, hours, days, weeks, months, years later. JIT compilation means you feed your program to the programming language, and it says only when I need to execute this part of the program, I’m going to compile it. And then I’m going to execute the output of that compilation process, the result. And that is just in time, just when I need to execute it.

Compiling Just in Time

And the JVM, the Java Virtual Machine was one of the first big industrial implementation of program languages that did this. And actually this was seen as a joke. Like you’re never going to get this fast. That’s not going to happen. But as it turns out now that there’s a lot of time that the JVM can use to first run your program slowly and then compiles it in the background and then executes it.

What Ruby is now doing is they’re doing this JIT compilation by basically in their interpreter, opening a file and writing in the C code that you would run if you were to interpret the code in your Ruby interpreter, then they take that file, pass it to the C compiler, produce a dynamic library, like an object file, load that into memory and then just call the function they just wrote into a file, just compile and just load it into memory. And that gives you-

Adam: It doesn’t sound like it.

Thorsten: It sounds really, do it yourself at home, like a home-built solution, but it actually provides a speed up. You pay this cost only once and you get all of the benefits that the C compiler that you use offers.

Adam: It seems like the takeaway is that this is all a mess, like compilers versus interpreters. So we just described several different ways. Ruby is compiled, but it’s generally considered to be interpreted. JVM considered compiled, but I think even the JVM does like it’ll back up sometimes like it will compile and then uncompile or something, it will go back to the bytecode version.

Thorsten: That might be. Yeah, yeah, yeah. I know what you mean. I totally agree, 100%, the lines of blurry. And it’s this the more, you know the less you’re able to basically give a straight up answer. Which it’s really sad asking like, “Hey, what’s the difference between an interpreter and a compiler.” And you start out by saying, “Hey, an interpreter runs your program as soon as you start interpreter and feed it a program. You’re compiler translates it. And then later on you can run it. But in the actual real world, things are converging and interpreters have compilers built in. Compilers actually interpret stuff and then compile the result. And one thing that the compiler can do for example, is to-

Adam: Like a constant?

Thorsten: Exactly. Exactly so-

Adam: If I am like that X equals five times three.[crosstalk 00:29:48].

Thorsten: Exactly. Yeah. Exactly. So I’m pretty sure every C compiler that is used in production does this soil, if you ride in your C program into X equals 99 plus one, then it’s not going to do that calculation. When the program is running, it’s just going to put the 100 in there and say X equals 100. And that is also a form of interpretation. It’s an optimization, but it’s also interpreting the input. And then-

Adam: That’s interesting.

Thorsten: Yeah. Then if you look at what I mentioned earlier, a V8, they have, if I remember correctly, it’s changing all the time, but they have multiple interpreters. If you start a JavaScript program, they start out with one interpreter that runs your thing immediately. Because if you think about it, what you as a user want is you want to open a website and have the JavaScript run as fast as possible. So that is a hard requirement.

V8’s Tower of Compilers

So they run this, but at the same time in the background, they start a compiler that compiles the JavaScript into bytecode, which is the first form of compilation. And then they have their virtual machine running that takes the bytecode and executes it. But then the virtual machine too says, “Oh, I’ve now run this 10 times. And the user seems to be staying on this page for the foreseeable future. So I’m now going to optimize it.” And then it’s going to this other compiler and it’s going to compile it into native machine code.

Adam: It’s funny because a colleague was telling me about this and I probably have it wrong, but there’s even a layer below assembly, in the CPU, it can be doing these types of optimizations using microcode. And the level of abstractions is kind of insane.

Thorsten: Yeah. That is … You know the saying turtles all the way down? As in you’re staring into the abyss and it’s just only gets darker and deeper. But yeah, even the machine code that you admit, like let’s say you’re writing assembly language and you have your assembler and … I don’t know, adding two numbers and outputting the numbers, and then you compile that. Maybe stupid example because it’s such a simple thing. But you would think that your computer executes this, but then it turns out the computers nowadays or CPUs more specifically are so complex they’re systems of their own. I don’t want to say they have their own brain, but they have so much stuff building that they actually decide when and how to run your program.

So if you have a CPU with multiple codes, let’s say eight codes, then the CPU basically decides which code is running. And what it also does is just one speculative execution where the CPU basically looks at the code that you want to run. And let’s say your code is X equals get number from user. And then if X greater than, or equals 10 output fubar, else output, no, whatever. And the CPU will run this code and it will basically detect that there’s the If coming, right?

Adam: Mm-hmm (affirmative).

Thorsten: And then, okay, bad example was that it outputs stuff. But let’s say the thing that you want to do is just add other numbers, like no side effects, then the CPU will say, “Oh, there’s a 90% chance that this will be true. So I’m going to execute the true part before I even evaluated the actual condition, before I even look that the user input, I’m going to evaluate the true as if the user set it’s greater than 10. And then if the user doesn’t, I’m just going to throw away the results.

Adam: Because it’s so fast, it’s just like, I have extra time. So what’s the cost. Yeah. It’s funny for years we had a TiVo and I just switched cable companies had to get rid of the TiVo. It’s like a PVR. But TiVo, I don’t even know if they still exist. They’re very smart with taping things. And I remember 15 years ago we first got our TiVo like plugged it in, it’s like a Linux box with a hard drive and it just starts taping shows. Like it tapes the Ghostbusters movie. Because I don’t know, I got extra hard drive space and people like Ghostbusters. Right?

Thorsten: Yeah.

Adam: It’s like the CPUs, like I got nothing to do. So maybe we’ll hit the sift statement and I’ll just start executing things.

Thorsten: Yeah, exactly. The problem is the CPU has to be really smart about what it executes because it might have side effects.

From Trees Back to a Flat List

Adam: I’m going to bring it way back. So we did the front end, right?

Thorsten: Yeah.

Adam: So like what happens? So I have my AST. So in your book, what’s next? So I guess you’re saying, Hey, I want to make this more performance. So instead of executing statement, by statement, what do I do?

Thorsten: You compile it to another language, which means translating it to another language. That means we take in the AST that the parse have produced, and then we translate the AST to another language. And in the case of the book, that is a language called bytecode for a virtual machine. And just to give you a short explanation of virtual machine, a virtual machine is basically … And this is really again, sorry, simplified version, is a computer built in software, a CPU built in software.

We basically started this the wrong way. I was just going to say a CPU is really simple and what it does, you can’t really say that after what we just said, but the idea is simple. The CPU takes a instruction, a piece of program code from memory. It fetches it. That’s what it’s called. Then it decodes it. That means it looks at it and says, is this … I don’t know, an add instructions. Do I need to add a number? Or is this a sub instructions? Do I need to subtract numbers? And then once it does that, this is called the decoding step. It executes it. That means it talks to memory. It talks to the ALU or whatever, and executes the statement.

Now a virtual machine is a piece of software that has this in software. This is called the CPU cycle, the fetch-decode-execute-cycle and a virtual machine is doing the same thing in software. Just imagine a loop that says, Hey, give me a instruction, what is this instruction? Execute this instruction. Do the thing that the instruction says. That’s basically it. It’s a piece of software that does this. Fetch- decode-execute. It’s modeled after a CPU. That means it also has all of the advantages that a CPU has, which is universal computation, Turing, completeness, stuff like that.

A CPU Built in Software

And the machine code that the virtual machine is running on is often called bytecode, which are these instructions as an add one, add two whatever. And this is what we build in the book. We built a virtual machine that is an simplified form of a CPU that runs on bytecode. And the compiler takes the AST that the parse produces and turns it into bytecode and then feeds it to the virtual machine to execute it.

And this sounds like more work, is this faster? Like you translating it to another thing and then you have execute it again, even after you parse it. But even if we do this while running the program, or before we start running the program, the program gets faster. At the end of the second book, the benchmark is, this is the Fibonacci call, is actually three times faster without any other optimizations, without any profiling, without any memory optimizations, whatever. Just by changing the architecture from the interpreter to a virtual machine, it got three times faster. And the reason is that we touched on this just earlier that instead of going through the AST and saying, “Oh, is this a Let statement. Is this an identifier? Is this an expression? Is this a While loop? Is this a return statement?” Instead of doing this all the time and throwing away the result, recomputing this information all the time we compute it once and then, encode this information into bytecode, a really simple format that allows us to express the whole program, but in a much simpler language.

And then we feed it to the virtual machine who doesn’t even have to do the whole dance of, “Oh, what is a Let statement? What is this?” It doesn’t know about Let statements anymore. All it knows about is things like load the number on the stack, load another number on the stack, add these two together, safe to result in slot one, slot two. It’s a really, really simplified version of the program we had earlier where all the fluff is basically gone and it’s down to just the computation.

Adam: So you’ve made up another language that’s even simpler that you’ve mapped to is that one way of thinking of it?

Thorsten: Yeah, exactly. That’s what a compiler does, translating. And this simpler language, bytecode is a simpler language. It’s a machine language for a virtual machine. And let’s say you have a compiler that compiles natively, that means you don’t need any other thing, like a C compiler, it outputs a binary. You can run the binary, you can send it to your friend that also runs Linux, computer or Mac, and can also run it. What this compiler does is it takes the C language and translates it into machine language, which is the machine that your CPU and your operating system speak. And what we do in the book, in the compiler for the virtual machine is we translate our AST into the language that the virtual machine speaks, which is the bytecode that we defined.

Adam: Just the same as like Java works, right?

Thorsten: The day one version of the JVM, basically. Yeah, yeah. But yeah, this is the basic idea behind having a virtual machine.

Stack Machines vs Register Machines

Adam: So what happens if you have … Like, if I want to add two numbers together, what does that end up looking like?

Thorsten: In bytecode you mean?

Adam: Yeah.

Thorsten: There’s two categories, broad categories of virtual machines or computers in general, the one is stack machines and the other one is registered machines. And the difference is how these machines use memory. Stack machines say we do all of our computation in this part of memory that we call the stack and you push values on it and you pop values of it. And if you want to add two numbers together, you push the first number. Then you push the second number. Now you have two values on the stack, and then you say, add, and then we put the result on the stack, and then you can pop that off the stack and put it in somewhere else on a hard drive. I don’t know.

And a register machine has this additional region in memory that is addressable. That allows you to say, put this number in register A, put this number in register two, put this number in register three and there’s pros and cons of each approach, that’s a huge discussion, but generally speaking people are saying, stack machines are easier to build.

Adam: If I was writing something that compiled to X86 or something like it’s easy because … Maybe it’s not easy, but it can execute on the actual computer architecture. But in this case you have to build that virtual machine, right?

Thorsten: Yeah. But if I build a virtual machine in the book and I write a compiler for this virtual machine, and I compile my input programs for this virtual machine and the virtual machine itself is written in Go, which can be compiled for different architectures, AMD, X86, ARM, whatever you have 32-bit, 64-bit, Go already has that. And if I built my virtual machine in Go, I can run my virtual machine and all of these architectures, and I only have to compile my code for my virtual machine.

Now, if you compile to machine Go to native, then things get hairy. Because for example, I’m currently writing, actually in my spare time, I’m writing a scheme to X86 compiler, I’m following resources, and X86, you’re not going to write the machine code in binary. What you’re going to do is you’re going to emit assembly language and then assemble into machine code.

So you decide for an architecture, then you had to make the decision 32 or 64. Then you have to decide which assembler to target. And then also, even if you target the correct assembler, you also need to make sure that the architecture that you want to target is supported by this assembler. For example, I’m emitting currently X86, 32 bit code in assembly language for the GCC new compiler collection assembler. And the code that I emit is perfectly fine to compile on Linux, but not on Mac, right?

Adam: Yeah.

Thorsten: Because it’s different architectures. So then again, you have this, “Oh, no, it’s a different thing.” Even though both computers have like AMD 64s, I have the same assembler but it’s still different architecture. So it doesn’t work on both. And it’s just a lot of work. This is also … I think if you look at the biggest compilers or the most famous compilers, like GCC, LLVM, a lot of the code is in these so-called back-ends, which I didn’t expand yet. The backend of a competitor is the thing that actually takes the AST or another representation of the input source code and emits it into this other language, like outputs it, prints it, writes it to file.

Adam: So your backend admits your bytecode-

Thorsten: Exactly.

Adam: … But GCC emits.

Thorsten: GCC has a backend that basically fans out to multiple backends where it says, Oh, we’re running on free BSD. And this is a MIPS machine.

Adam: In your bytecode, how many instructions do you end up with?

Thorsten: I would say, I don’t know, 30 something, I guess that’s too much 20, maybe 20, 30. So, you have things like add, subtract, multiply, equals, equals not, jump, jump if equals, stuff like that. But that is tiny compared to real-world stuff. I think the JVM has 200 or something. There’s an upper limit basically. And that is the number of different instructions that you can encode in a byte. And because you want this to be as small as possible, you want the virtual machine to be able to say, to fetch and decode only one byte and be able to say, this is this instruction. If you had to fetch four bytes, to be able to say, oh, this is an add instruction, that would be catastrophic.

Why One Byte Changes Everything

Adam: Okay. I never thought of this before. So it’s called bytecode because it’s basically using a byte to store all the possible instructions?

Thorsten: Yeah. If you have a jump instruction that says jump to, you need to say where, where do you want to jump? And then you can say instruction 10, 15 to 100 whatever. And what you want, ideally, do I get the best performance, is you want this to be in one byte. So the machine only has to fetch one byte and let’s say use only four bits in a byte to encode the type of instruction, that means you have four bits left in which you can possibly encode the first operant, all right?

So you could have specialized versions of these instructions where you say, Oh, if it’s a jump instruction and we call this one, this is the instruction with the number 12, this is the small jump, which is able to jump to relative positions up to 128 which is what we can encode in four bits, then it’s just in one byte.” We take the first four bits, look at them and say, “Oh, it’s a jump, tiny,” whatever you want to call it. That means in the other four bytes is the target of this jump instruction. Then we don’t have to fetch another thing and have to decode it.

Adam: Wow. That’s where it gets messy.

Thorsten: If you think about it, this is where the performance gets one stuff like this. Because this loop, this fetching, decoding, this is basically all that the VM does. And your goal is to train that so down that the time that your program or that the VM and your program spend is only spend on your program, not in the overhead of the machine. Which brings us back to the beginning where we talked about to go run time and how it’s so magical that it’s basically just this 1% of execution time is spent in the runtime. And that’s your goal. You want the VM to disappear and only have your program at secure.

Adam: Before you rewind, when we were talking about like executing off the tree, so we would evolve by looking at our current node and figuring out what I should do and execute it. The VM implementation ends up similar, except with a much smaller instruction set.

Thorsten: Exactly.

Adam: I’ll grabbed the instruction, if it’s an ad, then I have to pop twice and then add them together and then push it back.

Thorsten: Yeah. Right. You’re 100%. Right. Which is also why sometimes people, and this is really not helpful to the whole thing. People say, VMs are bytecode interpreters.

Adam: Oh, okay.

Thorsten: So that is also another level of confusion added there because, they’re interpreting bytecode. And you can also say CPU is a machine code interpreter.

It All Comes Down to Simple Blocks

Adam: Yeah. That’s true. Yeah. That’s true. So your book is called Grading a Compiler, but also you have to write this implementation. You have to build a virtual machine that can execute the instructions of your compiler?

Thorsten: And that didn’t fit at time. Writing your compiler and a virtual machine in Go.

Adam: So one thing that if I think about it, logically, there’s something weird where like, so I take a string, I take a string, I turn it into a series of tokens, like a linear list, right?

Thorsten: Mm-hmm (affirmative).

Adam: I take that list, I turn it into a tree and I take that tree, I turn it back into a linear list of instructions, right?

Thorsten: Yeah. Yeah. But what you’re doing is called you’re lowering your program from one language to another. We all have this idea of a high level language and a low level language. And what you basically do is when you translate from your high level language to this bytecode thing is, you get rid of all the high level language parts.

Adam: It makes sense. Because if you think about it, the computer executes this very low level language, and then we’ve come up with all these ways to write things more human readable. And so like the compiler is the place where this mapping between the low level and the high level has to live.

Thorsten: Yeah. It’s funny that in … Let’s say you were programmer in the 60s and I wasn’t, but if you were, I’m pretty sure that you would be knowledgeable in the low level instructions that a computer is able to execute. And you could then, if you look at the other language like Lists, for example, which looks higher level, as in you have recursive function called first class functions, stuff like this, you can see how it’s made up out of these building blocks because you can then say, Oh, C for example, has For loops. Or this is just an if and it’s a conditional, they don’t even say if it’s a conditional and a jump instruction. I can see how that is build up from these smaller, simpler building blocks.

And nowadays all of us, basically that haven’t learned assembly language in college or at university, we’re doing the reverse process. We’re discovering like, “Oh, my high level language. That’s just made up out of these symbol building blocks.” And it’s a really humbling and fascinating experience to be honest. Like if you can see, “Oh wait, it all comes down to just this set of instructions.” And then you realize, “Oh wait, this is what Turing complete is about.” That you can express all of the programs that you can basically come up with in your work with the simple Turing machine, which is what a CPU or a computer is. And then mind blown.

Adam: Yeah. Because people don’t know this anymore. Like I certainly didn’t. I mean, some people do who’ve been around or who do C programming or device sort of stuff.

Thorsten: Yeah. If you do a hardware programming, I guess, you know this. And to be honest, I wouldn’t even go so far and say, you need to know this. I think it certainly doesn’t hurt to know this because, as I mentioned earlier, if you do performance optimizations, for example, you, at some point going to touch something beneath the thing you’re working on, and then it’s good to know how this lower level basically serves your higher level.

Adam: This virtual machine that you write, it’s a proxy for this go runtime or the JVM or whatever, right? It gives you a way to think about what’s happening.

They Couldn’t Build My Website

Thorsten: Yeah, Exactly. Yeah. I guess there’s cases where people, they stumble onto things where they say they did run a Java program. Then they look at the bytecode that has been compiled and that is being executed. And then they look at it and say, “Oh, there’s an additional instruction that’s not needed because you’re going to throw it as a result away anyway.”

Adam: I had an interview with Bryan Cantrill and he spent a long time writing operating systems probably still does. Because I’m following on Twitter. And sometimes he posts things where he’s this C code equals this assembly code for GCC. Why is this generating this assembly and this, that, and when Rust turn out faster than C, then he’s looking at the assembly and saying like, “What are they doing here?”

Thorsten: Yeah, exactly. Yeah.

Adam: Yeah. I think it’s crazy cool.

Thorsten: I was intimidated by this for a long time as in, Oh my God. These people know so, so much about all of this cool and fascinating stuff. And nowadays that I know a little bit more about this stuff. And also know some of these people, I can with confidence say that these people that do compile optimizations and assembly, whatever you have, they, for example, couldn’t build a website that looks as nice as the ones that I built. It’s a different set of skills. Programming is a vast last field. And it’s the same as I don’t have the slightest clue about graphics programming or game programming. Absolutely no idea. My background, my expertise is that platforms and web applications optimizing, scaling them, programming languages, but no clue about graphics programming.

Adam: So then does that mean your next book is going to be graphics programming?

Thorsten: No, I’m not super interested in graphics programming. I don’t know why. I really like the faceless things as in, I like compilers programming languages, web servers, databases, operating systems. I don’t know why, but it’s just fascinating to me. As in, this basically goes back to me being 15 and being told you can run a computer without attaching a monitor to it and without attaching a keyboard and it’s called a server and I can connect it. And ever since then, this has been my fascination. So yeah.

Fifteen Million Lines of GCC

Adam: So I think that you mentioned that the whole thing, including tests, you’re under 6,000 lines of code for your front-end, backend and everything?

Thorsten: Yeah.

Adam: How does that compare to the real world? Are there bigger compilers, smaller compilers?

Thorsten: A lot of, lot of bigger compilers. I actually looked this up when writing the book to have some numbers to back up the claim of all compilers are these huge complex beasts. And last I checked GCC has 15 million lines of code.

Adam: Wow.

Thorsten: That is a lot, a lot. That is crazy. And a lot of VM is not too far behind. I think if I remember correctly, it’s slightly smaller, but it’s still in the millions, even more than 10 million, I guess. As soon as you start adding real world stuff, for example, error reporting. Your parser needs to output things like, oh, in this file at this location, this line, this column, there was an error. I expected this. This is it. This sounds, but I don’t know. It depends on the audience who’s listening, but it might sound easy, but it’s actually like, this is the brittle stuff that you need to get right. And as we can all imagine, I guess it’s hard to get right. And then what we touched upon earlier, like the different backends for different architectures, that’s a lot of code optimizations. That’s basically this huge thing in the middle of a compiler that takes up a lot of complexity.

And optimizations are basically programs that take in a program and look at it and say, “How can I make this faster?” And they do this by either looking at tiny part or they need more information and look at the whole thing and then look at the tiny part again. And then they optimize the tiny part and have to make sure that it’s still correct. You cannot remove the stuff that’s actually important. Stuff like this correctness, error reporting, I guess you can imagine how complex it gets once you add all of the little details. That’s the upper end, I guess, of compilers now.

Adam: What’s the lower end, are there languages as small as monkey?

Thorsten: Yeah. Depends on your definition of language, but I guess you can write a compiler in … You can probably ride C compiler in 100 lines of C, but then it’s just for really, really tiny subset of C. It is a compiler as in it takes C and it outputs assembly language, for example, but it doesn’t compile the whole program where language C only tiny subset. It doesn’t treat the undefined behavior correctly, stuff like this.

I guess in the real world, one of the most famous small compilers is TCC the tiny C compiler by Fabrice Bellard. He’s the guy who also built FFmpeg. And he also built Linux JS. This was, I don’t know, if you remember this, or if you ever saw it, it’s like implementation of Linux in JavaScript that runs a new browser. And this was 10 years ago if I remember correctly. This was crazy. And this guy is basically … He programs that he drops this huge bomb, but then he goes back and programs. And he basically built FFmpeg just incredibly fast, video transcoder, collection of utilities, comes back with Linux JS and he also built something called the Tiny C compiler. And this is a fairly complete C compiler as in it compiles basically the complete C language for, I think, multiple architectures. And it comes in at around 70,000 lines of code, which is-

Adam: Oh, wow.

Thorsten: … It’s still a lot, but compared to 50 million lines of code, it is tiny. But if you then say, no, we’re only going to support one architectural or just a subset of C. We are going to ignore the pre-processor for example. And just C itself, we are only going to compile that. Then you can go much lower.

And there’s two projects, one that I have in mind right now, the one is called C4. If I remember correctly, it is a C compiler in four functions of C, which is kind of joke as in like tongue in cheek, you probably shouldn’t have done it four functions. The thing is it’s really one file and you look at it and it has read function that reads in source code. It has a parse function that parses it, then I guess, compile and emit.

But there’s another project that is 10K lines of code. So slightly bigger. And it’s called 8CC. And I dunno where eight come from, but it’s a C compiler that can actually compile itself. So it’s self hosting, and that is 10,000 lines of code. And it’s pretty readable. It’s pretty easy to understand, it’s clean C code. There’s not a lot of fancy stuff going on. So if you’re looking for something like examples of how to build a C compiler, this is a good thing to look at.

The guy who wrote it. He actually bought my book and wrote me a message on Twitter saying, “Hey, I love your book. You’re the dude who wrote [8CC 00:57:04].” “Then why do you need to read my book?” And he said, “You’re my teacher when it comes to explaining code.”

Adam: Yeah, I really like your books. Somebody was asking me, “Oh, why this book? And not unlike the dragon book or something.” And I was, “It’s just like code and explaining how the code works and then here’s the code as well.” So I like this approach is very developer-centric.

Thorsten: Yeah. Thank you. I appreciate hearing that. And also the main reason why I wrote it, because I need this like a language that is not a toy language. It’s not a 50-line implementation of something, but it’s also not a huge complex compiler with lots of optimizations and stuff. But like the middle thing explained line by line with tests, which I really enjoy. Yeah. So I had to write it myself.

Adam: Nice. That was a great book.

Thorsten: Thanks.

Adam: So thank you so much for joining me. It’s been a lot of fun.

Thorsten: Yeah. Thanks for inviting me. This was a lot of fun, I really enjoyed it.

Where the Compiler Goes Next

Adam: That was the show. Thank you for listening to the CoRecursive podcast. I’m Adam Gordon Bell, your host. If you’re listening right now, reach out and let me know what you thought or just that you listen. In the Slack channel, we have recently been discussing great books on software design. As a guest, a couple episodes back mentioned James [Couple 00:58:20]. There really are very few great books on this topic, which is kind of crazy. Until next time, thank you so much for listening.

Support CoRecursive

Hello,
I make CoRecursive because I love it when someone shares the details behind some project, some bug, or some incident with me.

No other podcast was telling stories quite like I wanted to hear.

Right now this is all done by just me and I love doing it, but it's also exhausting.

Recommending the show to others and contributing to this patreon are the biggest things you can do to help out.

Whatever you can do to help, I truly appreciate it!

Thanks! Adam Gordon Bell

Support The Podcast

Learning About Compilers

and Bytecode With Thorsten Ball

A Speech in Two Languages

Why Learn Something You’ll Never Build?

The Runtime That Barely Shows Up

Code That Reads Like a Book

Turning Strings Into Trees

Just a Dictionary and a Switch

Paying the Cost Before You Run

How Ruby Got a Secret Compiler

Compiling Just in Time

V8’s Tower of Compilers

From Trees Back to a Flat List

A CPU Built in Software

Stack Machines vs Register Machines

Why One Byte Changes Everything

It All Comes Down to Simple Blocks

They Couldn’t Build My Website

Fifteen Million Lines of GCC

Where the Compiler Goes Next

Support CoRecursive

Episodes