Introduction
Adam: Welcome to CoRecursive, where we bring you discussions with thought leaders in the world of software development. I am Adam, your host.
Jim: It is asking them to give things up, but the thing is what you get in return is exactly this guaranteed memory safety, and this guaranteed freedom from data races. It’s just this huge win.
Adam: Writing secure, performant, multithreaded code is very difficult. Today I talk to Jim Blandy about Rust, a programming language that is trying to make this a lot easier. We also talk about why it’s so hard to write secure code. Jim works on Firefox, and his insights into the difficulty of writing secure code are super interesting. I also ask Jim about Red Bean Software, a software company that refuses to sell software at all.
Jim Blandy is the co-author of Programming Rust, among many other things. Jim, welcome to the show.
Jim: Hi, thanks for having me.
Adam: It’s great to have you. I have your book. I really hoped I would have made it a long way through it before I talked to you, but I can see the bookmark, it’s about a quarter of the way through.
Jim: It ended up a lot longer than we wanted it to be.
Adam: I don’t think it’s your fault that I haven’t made it very far.
Jim: What we had in mind originally, we wanted to do something that was more like K&R, like this slim volume that just covers exactly what you need, but basically Rust, it’s a big language that is trying to address the things that people need. It just took us a while to cover all of that.
Adam: It’s not a small language, I wouldn’t say.
Jim: Yeah.
Adam: What is Rust for? What is it targeting?
Rust Is Targeting Bitter C++ Programmers
Jim: Well, I mean the funny answer is that it’s targeting all those bitter C++ programmers who are sick and tired of writing security holes. Basically Jason and I, my co-author Jason and I, we both have worked on Mozilla’s SpiderMonkey JavaScript engine. That is a big pile of C++ code. It’s got a compiler front end. It’s got a byte code interpreter. It’s got two JITs. It’s got a garbage collector, which is compacting and incremental and generational.
Working on SpiderMonkey is kind of hair raising, because even though Firefox is not the leading browser, we still have hundreds of millions of users. That means that when we make a mistake in the code, we can expose a huge number of people to potential exploits. Of course, when you’re working in this field, and this is true, I think, of all the browsers, you get the CVEs, things get published. You find out about exploits. It really is very humbling.
Whether you are writing the code which is going to go out in front of all these millions of people, or whether you are reviewing the code, somebody has finished their patch and they flag you for review, if you’re the reviewer you’re like the last line of defense. This is the last chance for a bug to get caught before it becomes an exploit. You get used to it. You accept that this is the way things are done.
Then you start working in Rust. I was just curious about Rust because the original creator of the language, Graydon Hoare, is a personal friend of mine. We worked together at Red Hat. Modern Rust has gone far beyond what Graydon started with, so I wouldn’t say that it’s his language anymore. I’ve been following its development since the beginning, and so I was curious about it.
Once I really started getting into it, I realized that this is systems programming, where I as the programmer am in control of exactly how much memory I use. I have full control over how things are laid out in memory. The basic operations of the language correspond closely with the basic operations of the processor. As a systems programmer, I have all the control that I need, but I don’t have to worry about memory errors, and memory safety.
Like I say, when you’re working on a security-critical C++ code base like Jason and I have been, you get used to it, and you internalize that the standard you’re being held to is actually perfection, because that’s what it is. The smallest mistake counts. When you read about the exploits that people show off at Black Hat, it’s just amazing: the ingenuity, the work, the blood, sweat, and tears that people put into breaking things is really impressive.
You’ve internalized that, and then suddenly you work in Rust and that weight is lifted off your shoulders. It is like getting out of a bad relationship. You’ve just gotten used to being treated badly, and then suddenly somebody is reasonable to you, and you’re like, “Holy cow. I am never going to do that ever again.” Then the next thing is that when you get to work on concurrent code in Rust, actually trying to take a problem and distribute it across multiple cores, Rust is constructed so that once your program compiles, it is free of data races by construction, assuming that you’re not using unsafe code.
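To make the “free of data races once it compiles” claim concrete, here is a minimal sketch (the function name `parallel_count` is hypothetical): several threads increment one shared counter, and the only way the compiler lets them touch it is through a `Mutex`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `n_threads` threads that each increment a shared counter
/// `per_thread` times. Replacing the Mutex with a bare Arc<usize> would not
/// compile, because an Arc only hands out shared (read-only) references.
fn parallel_count(n_threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();
    for _ in 0..n_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // The data is reachable only through the lock guard.
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Bind the result so the guard is dropped before `counter`.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

The lock is the synchronization operation; the type system is what guarantees nobody skips it.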
In C++ everybody thinks that their multithreaded code is fine. Everybody understands what a mutex is and how it works. The primitives are not difficult to understand at all. Then you end up getting surprised by what’s actually going on in your code when you have to work on it. Firefox is a heavily multithreaded program. I think when you start it up there are 40 or 50 threads that get going. The garbage collector does stuff off-thread. The JavaScript compiler will push compilation work off to a separate thread.
We do I/O, like for example when a tab is trying to write something to local storage, that I/O is often pushed off to a worker thread and it’s handled asynchronously on the main thread. It’s a very heavily multithreaded program. We had an engineer here at Mozilla who decided that he was going to use TSan, the ThreadSanitizer tool, to look for data races, to actually observe how well we were doing at keeping data properly isolated and properly synchronized.
What he found was that in every case where Firefox uses threads we had data races. Not most, every single case.
What Is a Data Race?
Adam: That’s astounding. Let’s back up a while. What’s a data race?
Jim: A data race is when one thread writes to a memory location, and then another thread reads it, but no synchronization operation occurs between those two. Nobody releases a mutex that the other thread then acquires, the write isn’t atomic, no message is sent. The language provides any number of primitives that ensure memory synchronization, and in a data race none of them is used.
The reason this is an issue, it’s an issue for two reasons. One is that whenever you have any kind of non-trivial data structure, the way it’s always implemented is that any operation on that data structure, any method or function, will temporarily relax the invariants that the data structure is built on, do the work, and then put the invariants back in place. For example, if you’re just trying to push an element on the end of a vector, usually it will write the new element to the end of the vector, and then it will increment the vector’s length.
Well, at that midpoint between those two operations, the vector’s got this extra element that it’s supposed to own, but the length doesn’t reflect that. There’s this momentary relaxation of the type’s invariant that the length is accurate. It’s even more so if you are appending an element to a vector and the vector has to reallocate its buffer. First it’s going to allocate a larger buffer in memory. Next it’s going to copy over the existing elements to that new buffer, at which point there are actually two copies of every element, which is kind of strange. Which one is the owning copy? Which one is live?
Then it frees the old buffer, sets the vector’s pointer to point to the new buffer, and so on. That’s a more complicated operation, where in the midst of it the vector is actually in this wildly incoherent state.
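The length and capacity Jim describes are both observable from safe Rust, though only before and after each `push`, never at the incoherent midpoint. This sketch (the helper `growth_points` is hypothetical) records each point where the buffer was reallocated. The exact growth policy of `Vec` is unspecified, so the printed capacities may differ between Rust versions.

```rust
/// Pushes `n` strings and records (len, capacity) every time the vector
/// had to reallocate, i.e. whenever its capacity changed.
fn growth_points(n: usize) -> Vec<(usize, usize)> {
    let mut v: Vec<String> = Vec::new();
    let mut points = Vec::new();
    let mut last_cap = v.capacity();
    for i in 0..n {
        v.push(i.to_string());
        if v.capacity() != last_cap {
            points.push((v.len(), v.capacity()));
            last_cap = v.capacity();
        }
        // Once push returns, the invariant len <= capacity holds again.
        assert!(v.len() <= v.capacity());
    }
    points
}

fn main() {
    for (len, cap) in growth_points(20) {
        println!("len {len} -> capacity {cap}");
    }
}
```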
By the time the method returns, the vector is guaranteed to be back in shape and ready to use again. Getting back to data races, the problem with unsynchronized access is that you can have one thread observing the state of your vector, or really of any nontrivial type, while it is in the midst of being modified by some other thread. Whatever invariants the vector’s methods are counting on in order to function correctly may not hold. That’s the language-level view of things.
Then modern processors, of course, add further complication to the mix. Each processor has its own cache, and although they do have cache coherency protocols trying to keep everything synchronized, it turns out that even on Intel processors, which try to make fairly strong promises about memory coherence, it’s still visible that each processor queues up its writes to main memory. That is, if you are one core on a multicore machine and you’re doing a whole bunch of writes to memory, those writes actually get queued up, and if that same core tries to read a location that it just wrote to, it’ll say, “Oh, wait. I see in my store queue that I just wrote to this. I’m going to give you the value that I just wrote.” Even though the other cores will not have seen those writes yet.
The other thing that synchronization ensures is that, at the processor level, a read is guaranteed to see the values written by the processor that wrote them, assuming that both have executed the proper synchronization. So a data race is a write to a location by one thread, and then a read from that location by another thread, without proper synchronization. It can cause invariants to be violated, and you can encounter memory coherence errors.
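One way to see “proper synchronization” in code is the release/acquire pattern from Rust’s atomics (which follow the C++ memory model). In this sketch, with a hypothetical `Mailbox` type, a producer writes a value and then sets a flag with `Release`; once the consumer observes the flag with an `Acquire` load, it is guaranteed to see the earlier write to `data`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

struct Mailbox {
    data: AtomicU64,
    ready: AtomicBool,
}

/// Sends `value` from a producer thread and returns what the consumer saw.
fn send_and_receive(value: u64) -> u64 {
    let mb = Arc::new(Mailbox { data: AtomicU64::new(0), ready: AtomicBool::new(false) });
    let producer = {
        let mb = Arc::clone(&mb);
        thread::spawn(move || {
            mb.data.store(value, Ordering::Relaxed);
            // Release: writes made before this store become visible to any
            // thread that reads `true` here with an Acquire load.
            mb.ready.store(true, Ordering::Release);
        })
    };
    // Acquire: after we see `true`, the Relaxed write to `data` is visible.
    while !mb.ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = mb.data.load(Ordering::Relaxed);
    producer.join().unwrap();
    seen
}

fn main() {
    println!("{}", send_and_receive(42));
}
```

If both flag operations were `Relaxed`, the consumer could see `ready == true` and still read a stale `data`; the Release/Acquire pair is the synchronization that rules that out.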
Adam: The hardware thing you mentioned is interesting. Maybe it’s a bit of a digression, but how does that work? If there’s writes queued up to a certain sector or something, and you are reading from it, does it block until those writes go through? Is that what you’re saying?
Jim: This is something, so the processors change over time, and the different processors have different details about exactly how they do this. I’m not sure that I am going to be able to accurately describe current Intel processors, but this is as I remember it. What you’ve got, at the basic level you’ve got the caches that are communicating with each other about what they have cached locally. For example, if nobody has read a particular block of memory, then that’s fine.
When one core brings a particular block of memory into its cache, it’ll actually mark that and say, “I’ve got this, but I haven’t written to it yet. It’s okay for other cores to read that memory.” Maybe all the cores, maybe it’s a big block of read-only memory, maybe it’s, I don’t know, maybe it’s static strings or something like that, and so all the cores can bring copies of that memory into their caches, and then use it.
However, before a core is able to write to a block of memory, it says, “I need exclusive access to that.” It actually broadcasts out on this local bus for the purpose of this kind of communication, and says, “All you other cores, I’m about to write to this block of memory. Please evict it from your caches and mark it as exclusively mine.” All the other cores, they kick out that block from their caches. They say, “We don’t know what’s in this block of memory anymore. Only that guy knows what’s in it.”
Then that processor that’s obtained exclusive access to that block of memory can do what it pleases. Then in order for the other cores to actually even read from that memory now, they have to go and get a copy of it back from, or force the core that was writing to it to flush what it had back to main memory, and so then it goes back into the shared state. They call it the MESI protocol, it’s M-E-S-I, which is like, oh jeez, I can’t remember what it is.
What Is the MESI Protocol?
E stands for exclusive. The four letters are the names of the four states that a particular block can be in. E is exclusive access, which is when you’re writing to something. S is for shared access, when it’s actually just everybody has the same copies, and everybody’s just reading from it. I think I is invalid, where somebody else is writing to it, and so your copy of it is bogus. That’s just keeping the caches coherent, but then the other thing is that writes are a lot slower than reads, and so each core has a queue of the writes to memory that it has made that it is waiting to write out to main memory.
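For reference, in the textbook MESI protocol the four letters are Modified, Exclusive, Shared, and Invalid. There, Exclusive means “the only cached copy, still clean,” and a write moves the line to Modified (dirty). The sketch below is a deliberately simplified toy model of one cache line’s state from a single core’s point of view; the `Event` names are hypothetical, and real processors use more elaborate variants of the protocol.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mesi { Modified, Exclusive, Shared, Invalid }

#[derive(Debug, Clone, Copy)]
enum Event {
    LocalRead { others_have_copy: bool },
    LocalWrite,
    RemoteRead,  // another core reads this line
    RemoteWrite, // another core asks for exclusive access
}

fn next(state: Mesi, ev: Event) -> Mesi {
    use Mesi::*;
    match (state, ev) {
        (Invalid, Event::LocalRead { others_have_copy: true }) => Shared,
        (Invalid, Event::LocalRead { others_have_copy: false }) => Exclusive,
        // M, E and S can all satisfy a read from the local cache.
        (s, Event::LocalRead { .. }) => s,
        // From S or I, the other caches are told to invalidate first.
        (_, Event::LocalWrite) => Modified,
        // A Modified line writes its dirty data back before being shared.
        (Modified | Exclusive | Shared, Event::RemoteRead) => Shared,
        (Invalid, Event::RemoteRead) => Invalid,
        (_, Event::RemoteWrite) => Invalid,
    }
}

fn main() {
    let mut s = Mesi::Invalid;
    for ev in [
        Event::LocalRead { others_have_copy: false },
        Event::LocalWrite,
        Event::RemoteRead,
        Event::RemoteWrite,
    ] {
        s = next(s, ev);
        println!("{:?} -> {:?}", ev, s);
    }
}
```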
If you do a whole bunch of stores, your store queue will get filled up with a list of these things going out. If the core which has done the writes tries to read, then certainly it knows what the current values of things are. The other cores can’t see it yet, can’t see those writes yet, and so the way that you can get incoherence, the way that you can end up with different cores having a different idea of what order things happened in is when one core gets a result out of its store queue, and then the other core gets the results out of main memory, and so you can end up with different cores seeing writes to memory seem to happen in a different order.
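The store-queue effect is usually demonstrated with the “store buffering” litmus test: one thread writes `x` then reads `y`, the other writes `y` then reads `x`. If each core reads before the other core’s queued store has drained, both reads return 0. The sketch below (function names are hypothetical) runs it with Rust atomics. Under `Relaxed` ordering the (0, 0) outcome is permitted by the memory model, though you may rarely or never see it in practice; under `SeqCst` it is forbidden.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Runs the store-buffering test once and returns what each thread read.
fn store_buffering(seq_cst: bool) -> (u32, u32) {
    let order = if seq_cst { Ordering::SeqCst } else { Ordering::Relaxed };
    let x = AtomicU32::new(0);
    let y = AtomicU32::new(0);
    thread::scope(|s| {
        let a = s.spawn(|| {
            x.store(1, order);
            y.load(order)
        });
        let b = s.spawn(|| {
            y.store(1, order);
            x.load(order)
        });
        (a.join().unwrap(), b.join().unwrap())
    })
}

/// Counts how many of `runs` executions ended with both reads seeing 0.
fn count_both_zero(runs: usize, seq_cst: bool) -> usize {
    (0..runs).filter(|_| store_buffering(seq_cst) == (0, 0)).count()
}

fn main() {
    println!("Relaxed: {} of 1000 runs saw (0, 0)", count_both_zero(1000, false));
    println!("SeqCst:  {} of 1000 runs saw (0, 0)", count_both_zero(1000, true));
}
```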
The history of this is actually really interesting. For a long time, Intel would have sections of their processor manuals where they’d try to explain how this worked. They would make these very friendly promises like, “Oh yes, everything’s coherent. Don’t worry. You just do the writes you want to, and everybody sees them.” Then there was this group. I could look up the reference later if you’re curious, but there was this group at either Cambridge, I think, or Oxford.
A very theoretically inclined group who basically said, “We’re going to actually make a formal model of the memory. We’re actually going to formalize the memory model that Intel has given us here.”
Adam: Formalize in what, like in …
Jim: In a little logic they made up, which says which executions are permitted by this and which executions are not permitted by this. Now, again, the specification doesn’t say exactly what happens. It just says what the rules are. It says this could never happen, this might happen. It identifies a set of acceptable executions, not a specific one. It doesn’t tell you exactly which one the processor’s going to do. It just specifies a set of acceptable executions, or a predicate that you could run on an execution to say this is acceptable, or this is not acceptable.
What this research group did is they said, “Well, let’s take them at their word. We’re going to write tests. We’re going to use this specification that we’ve written, that we made up, because all we’ve got is English to work with and we’re going to generate a ton of tests that we will run on the actual processor to see if the processors actually behave the way they are claimed to behave in the manual.” You can tell obviously that the answer is no, that Intel themselves in their own documentation did not correctly describe the behavior of their own processors.
The great thing about it, and what was really powerful, was that their techniques allowed them to generate lots of tests, find the ones that failed, and then reduce them. When they published, they had very short examples: “If you run this sequence of instructions on one core, and this sequence of instructions on another core, you will observe these results, which are forbidden by the spec.” It was really nice. It was really just like, “Here’s your bug.”
Basically what they found was that in general, yes, the MESI protocols do work as advertised, but the thing that you really have to add, the thing that you add to the picture to make it accurate is the store queues, the write queues.
Adam: Because if you have a write that hasn’t happened yet, then you’re going to have this.
Jim: If you have a write that you’ve done, if you’ve just done a write, you will see that write before other cores will see it. This is the kind of thing, just to bring this back to Rust, this is the kind of thing where it raises, I think, a programmer’s macho hackles. You say, “Well, that seems pretty tough for most people, but I can handle it.” Everybody says that. I think that. I catch myself thinking that, and it’s not true. You’re not up to the task of being perfect.
To have a language where you can start pushing your algorithm out across multiple cores, pushing your code out to run in multiple threads and just know that you may have bugs, but they’re not going to be these bugs that depend on exactly the order in which memory writes happened, and the exact instructions that the compiler selected and things like that is just a huge win.
Adam: Data races, they’re out.
Jim: Data races are out.
Adam: How?
Jim: Well, so the key idea of Rust, which is something, and this was, I think, really the thing that most programmers get hung up on when they learn it, is that Rust takes control of aliasing. By aliasing, I mean the ability to reach the same piece of memory under two different names, under two different expressions. The example that I give in the book is, I actually give a C++ example. I say, this is C++, mind you, this is not Rust. Say you got int X, and it’s mutable. It’s not constant X, it’s just int X.
Rust Aliasing
Then you take a const pointer to it, so I say const int *P = &X. I’ve got a const int pointer to a non-const X. Now, the way C++ works, you cannot assign through P. If you try to assign a value to *P, or use the increment operator on *P or something like that, then that’s a type error. You’re forbidden from using P to modify the referent of the pointer. But you can assign to X, no problem. You can go ahead and change the value of X anytime you want, and so it’s not the case that just because P is a pointer to a constant int, the integer it points to is constant.
How perverse is that? What does const mean if it can change? Because the thing is, I want to make clear that there are uses for this kind of thing, and it is pretty useful to say, well, through this pointer I’m not going to change this value. I’m not saying it’s useless, but it is kind of not what you’d expect. If you think about what it would take to fix that, to say, well, if I’m going to say that this pointer to this thing that this is really a pointer to a constant thing, that would mean that for as long as that pointer P exists, a pointer to a const int, that all other access or that all other modification of the thing that it points to has to be forbidden.
As long as P is pointing to X, you have to make sure that X can’t be modified. That’s what I mean by aliasing: *P, that is, dereferencing the pointer P, and X are both things that you can write in your program that refer to the same location. This aliasing can arise under pretty much any circumstances. Anytime you have two paths through the heap that arrive at the same object, anytime you have a shared object in the graph of objects, that’s two ways to get to the same location. There will generally be two different expressions that you could write to refer to the same object. JavaScript lets you do this. Java lets you do this. Basically every language lets you create aliases.
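Here is a rough Rust counterpart of the C++ const-pointer example, as a sketch (the function name `freeze_demo` is hypothetical). While the shared reference `p` is still in use, `x` is frozen; the commented-out line is the kind of assignment the compiler rejects. Once `p` has been used for the last time, the borrow ends and `x` can change again.

```rust
fn freeze_demo() -> (i32, i32) {
    let mut x = 10;
    let p = &x; // shared reference: x cannot be modified while p is live
    // x += 1;  // rejected: cannot assign to `x` because it is borrowed (E0506)
    let seen_through_p = *p;
    // p is not used after the line above, so the borrow has ended here
    // and x may be modified again.
    x += 1;
    (seen_through_p, x)
}

fn main() {
    let (before, after) = freeze_demo();
    println!("read through p: {before}, x afterwards: {after}");
}
```

Unlike C++’s const int *, a shared reference guarantees that nobody, not just the holder of the reference, changes the value while the reference is in use.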
Shared and Mutable References
What Rust does is it actually restricts your ability to use pointers, such that it can tell when something is aliased, and it can say, “For this period, for this portion of the program, these objects are reachable by …” Basically there are two kinds of pointers: shared references and mutable references. It’ll say, “These objects are reachable by shared references, and thus they must not be changeable.”
You know not just that you can’t change those values through those shared references, but that nobody else can change them either. It’s really powerful. In Rust, if you have a shared reference to an int, you know that int will never change. If you have a shared reference to a string, you know that string will never change. If you have a shared reference to a hash table, you know that no entry in that hash table will change for as long as you have that shared reference.
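Because nothing can mutate a value while shared references to it exist, several threads can read the same hash table without any locking. This sketch uses scoped threads (available since Rust 1.63); `total_lengths` is a hypothetical helper that looks up several keys in parallel.

```rust
use std::collections::HashMap;
use std::thread;

/// Each key is looked up on its own thread through a shared reference.
/// The table cannot change while these borrows are live, so no lock is needed.
fn total_lengths(table: &HashMap<String, String>, keys: &[&str]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = keys
            .iter()
            .map(|k| s.spawn(move || table.get(*k).map_or(0, |v| v.len())))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let mut table = HashMap::new();
    table.insert("rust".to_string(), "systems language".to_string());
    table.insert("c++".to_string(), "also a systems language".to_string());
    println!("{}", total_lengths(&table, &["rust", "c++", "go"]));
}
```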
Adam: Once that reference goes out of scope, then changes could happen.
Jim: Exactly. Then the other kind of reference is a mutable reference, where what it says is you have the right to modify this but nobody else does. Nobody else has the right to even see it, and so a mutable reference, basically it’s a very exclusive kind of pointer. When you have a mutable reference to a hash table, nobody else can touch that hash table while you have it. That’s statically guaranteed. It’s part of the type rules. It’s guaranteed by the type rules around mutable references.
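The flip side is the mutable reference. In this sketch (the helpers `bump` and `word_counts` are hypothetical), `bump` receives `&mut HashMap`, which the type rules guarantee is the only live path to the table, so it can insert freely. The commented-out line shows the kind of overlap the compiler refuses.

```rust
use std::collections::HashMap;

/// Holding `&mut HashMap` means no other reference to the table is live.
fn bump(counts: &mut HashMap<String, u32>, word: &str) {
    *counts.entry(word.to_string()).or_insert(0) += 1;
}

fn word_counts(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for w in text.split_whitespace() {
        bump(&mut counts, w);
    }
    // let r = &counts; bump(&mut counts, "x"); println!("{}", r.len());
    // ^ rejected: cannot borrow `counts` as mutable because it is also
    //   borrowed as immutable (E0502)
    counts
}

fn main() {
    let c = word_counts("to be or not to be");
    println!("{:?}", c.get("to"));
}
```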
You can imagine that any type system which can guarantee this thing about, “Oh, there’s nothing else, there’s no other live way in the system to even refer to the referent of this mutable pointer,” that’s a pretty powerful type system. Working through the implications of that, I think, is where most people stumble learning Rust, that there is this strict segregation between shared references, where you have shared immutable access, and where you have mutable references where it is exclusive access. There’s this strict segregation between sharing and mutation.
The way that Rust accomplishes that is, I think, really novel, and it’s something that people aren’t used to. I was having lunch with a very accomplished programmer, just sort of old friend and we hadn’t talked in years. We were talking about Rust, and he says, “I can’t create cycles. I’m a programmer. I know exactly what I want to do with those cycles. I want to have data structures that are arbitrary graphs, and I need those data structures, and Rust won’t let me make them and so I’m not interested.”
I think he’s wrong, or I think he’s making a poor choice, but he is correct in his assessment that basically Rust really is asking you to give up something that is just such a fundamental tool that most programmers have just internalized, and they’ve learned to think in those terms. It is asking them to give things up. The thing is what you get in return is exactly this guaranteed memory safety, and this guaranteed freedom from data races. It’s just this huge win.
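As for cycles, one common way, though not the only one (shared-ownership types like `Rc` and `Weak` are another), is to have nodes refer to each other by index into a vector instead of by reference. This sketch, with a hypothetical `Graph` type, builds a directed graph that may contain cycles and detects them with a depth-first search, all in safe Rust.

```rust
/// A directed graph as an adjacency list. Nodes are named by index,
/// so cycles need no aliasing references at all.
struct Graph {
    edges: Vec<Vec<usize>>,
}

impl Graph {
    fn new(n: usize) -> Self {
        Graph { edges: vec![Vec::new(); n] }
    }

    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Depth-first search that reports whether any cycle exists.
    fn has_cycle(&self) -> bool {
        // 0 = unvisited, 1 = on the current DFS path, 2 = finished
        let mut state = vec![0u8; self.edges.len()];
        fn visit(g: &Graph, u: usize, state: &mut [u8]) -> bool {
            state[u] = 1;
            for &v in &g.edges[u] {
                if state[v] == 1 || (state[v] == 0 && visit(g, v, state)) {
                    return true; // found an edge back onto the current path
                }
            }
            state[u] = 2;
            false
        }
        (0..self.edges.len()).any(|u| state[u] == 0 && visit(self, u, &mut state))
    }
}

fn main() {
    let mut g = Graph::new(3);
    g.add_edge(0, 1);
    g.add_edge(1, 2);
    g.add_edge(2, 0);
    println!("cycle: {}", g.has_cycle());
}
```

The trade-off is that indices are not checked by the borrow checker: a stale index is a logic bug, though never a memory-safety bug.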
The way Rust works when it does work is when you can take that, I think I mentioned the programmer machismo. I want a gender neutral term for that, or basically the programmer’s pride, like that little bit of confidence that you’ve got, you want to flip that from people saying, “I can handle data races. I can handle unsynchronized memory access. No problem.” You want to flip them from thinking that to thinking, “Oh, I can write my code in this restricted type system.” You want to make them say, “I can get things done even though Rust is restrictive. I can overcome these things. I can take this limited, buttoned down system and make it sing.”
Adam: Maybe people just shouldn’t be so invested in their own pride. I don’t know.
Jim: I’m not optimistic about that ever happening.
Adam: One thing is it sounds like what you’re talking about is changing the relationship you have with the compiler. I think some people view a compiler as like …
Jim: Very much.
Adam: … a teacher with a ruler that hits you on your hands like, “Don’t do that.” There’s an alternative way where maybe it’s more like an assistant.
Jim: Yeah. What’s going on a lot with Rust is that your debugging time is getting moved from run time to compile time. The time that you would spend chasing down pointer problems in C++, you instead spend in a negotiation with the compiler about your borrows and your lifetimes. The difference is that tests only cover a selected set of executions. My tests cause the program to do this. They run it through its paces in this specific way, whereas types cover every possible execution.
Adam: Definitely.
Jim: That’s the property that really makes it wonderful for concurrency, which is that with concurrency you have to just give up on your tests really exercising all possible executions, because the rate at which different cores run the code, and how the threads get scheduled, and what else happened to be competing for your cache at the time, none of that stuff is really something that you can get a grip on. Having a type system that says all possible executions are okay is exactly what the doctor ordered.
The RustBelt
Adam: Are we at risk of there just being a problem with the type system?
Jim: Yeah, sure. If the type system isn’t sound, then you lose, or we lose. In fact, so one nice thing is that the people who are the lead designers of the type system right now, as I understand it, are Aaron Turon and Niko Matsakis. In particular, Niko is the one who had this insight about, “Hey, we have the possibility of really separating sharing and mutation and keeping those two things completely segregated.” That’s what I think is really the defining characteristic of Rust, or rather the defining novelty of Rust.
When they talk about type systems, they’re playing with PLT Redex, which is a system from the PLT group that made Racket and all that stuff, for playing with formal systems, and looking at derivations in formal systems. They’re not proving things about it. There is then a project called RustBelt. There’s also a conference called RustBelt, but RustBelt is a project at a German university where they’re actually trying to formalize Rust. It’s a research program where they say, “We are a group of people and we’re going to work on finding formal models of the Rust type system and Rust’s semantics.”
In particular, there’s a guy, Ralf Jung, who is really taking this on. He is working on machine-verified proofs of the soundness of Rust’s type system. Now, it turns out that there are aspects of Rust that make this very interesting and challenging, and turn it into something that has never been done before. In particular, all of Rust is built on a foundation of unsafe code. Unsafe code is code that uses primitive operations whose safety the compiler cannot check.
These operations, they still can be used safely. They just have additional rules in order to be used safely that you as the programmer can know.
Adam: What do you mean to say that it’s built on a foundation of unsafe code?
Jim: Well, the Rust vector type, for example, the vector type itself is safe to use. If you are writing Rust code and you use the vector type, it’s this fundamental type in the standard library. It would be the analog of Haskell’s list, or something like that. You can’t not use it. Basically if you are using vector, then you are at no risk. Any mistakes that you make using vectors will be caught by the type checker and the borrow checker.
Adam: At compile time.
Jim: At compile time. Vector is safe to use. Vec is safe to use, but the implementation of vec uses operations whose correctness the compiler itself cannot be sure of. In particular, when you want to push a value onto the end of a vector, what that’s doing is that’s taking this section of memory which like, again, you got to imagine the vector has a big buffer, and it’s got some spare space at the end of the buffer. You’re going to push a new value, say you’re going to push a string onto the end of that vector, so a vector of strings.
You’re transferring ownership of the string from whoever’s calling the push method to the vector itself, and so there’s a bit of uninitialized memory at the end of the vector’s buffer, or towards the end of the vector’s buffer, which is now having a string moved into it.
In order for that to be correct, in order to make sure that you don’t end up with two strings that both think they own the same memory, and in order to make sure that you don’t leak memory, it has to be true that the memory you’re moving the string into is uninitialized. Whether or not the location that something gets pushed onto is uninitialized depends on the vector being correct.
That is, the vector knows the address of its buffer, it knows its length, and it knows its capacity, the actual in-memory size of the buffer. First, the vector has to have checked that there is spare capacity, that the length is less than the capacity. Second, that length has to have been accurately maintained through all the other operations on the vector. If there is a bug in the vector code, and the length ends up being wrong, then this push operation, which transfers ownership, can end up, say, overwriting some existing element of the vector. Then that could be a memory flaw, a memory problem.
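The dependence on an accurate length can be sketched with a toy fixed-capacity vector. This is not how the standard library’s `Vec` is written (that one is heap-allocated and more involved); `TinyVec` is a hypothetical illustration. The invariant is that slots `0..len` are initialized and the rest are not, and every `unsafe` block below is sound only because `len` is maintained correctly.

```rust
use std::mem::MaybeUninit;

/// Invariant: buf[0..len] is initialized, buf[len..N] is not.
pub struct TinyVec<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> TinyVec<T, N> {
    pub fn new() -> Self {
        TinyVec { buf: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
    }

    /// Moves `value` into the first uninitialized slot, or hands it back if full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value); // no spare capacity
        }
        // If `len` were wrong, this could overwrite (and leak) a live element,
        // and `get`/`drop` below would then touch uninitialized memory.
        self.buf[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    pub fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // SAFETY: slots below `len` are initialized (the invariant).
            Some(unsafe { self.buf[i].assume_init_ref() })
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T, const N: usize> Drop for TinyVec<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.buf[..self.len] {
            // SAFETY: exactly the initialized slots are dropped, once each.
            unsafe { slot.assume_init_drop() };
        }
    }
}

fn main() {
    let mut v: TinyVec<String, 2> = TinyVec::new();
    v.push("hello".to_string()).unwrap();
    println!("{:?}, len {}", v.get(0), v.len());
}
```

Users of `TinyVec` only see safe methods; auditing the unsafe blocks against the invariant is the implementer’s job, which is the division of labor Jim describes next.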
The nice thing is that vec is a pretty simple type. It’s built on some modules which have a very simple job to do, and so that is a small piece of code that we can audit to ensure that the vector is using its memory correctly. Once we have verified by hand, by inspection, that the vector is using its memory correctly, then we can trust the types of the vector’s methods to ensure that the users will always use it correctly. The users have no concern. It’s only we who implement the vector who are responsible for this extra level of vigilance, and making sure that we’re getting the memory right.
Adam: The type system can be and is being formally verified, but the libraries need to be hand audited. What’s vector written in, is it written in Rust?
Jim: Vector is written in Rust, and that’s the key, is that unsafe code in Rust is this escape hatch that lets you do things that you know as the programmer, that you as the programmer know are correct but that the type system can’t recognize as correct. For example, vector is one of them. Vector itself is written in Rust. It uses selected unsafe code.
This is exactly what the RustBelt project is tackling, is that in order to really make meaningful statements about Rust, you’re going to have to actually be able to handle unsafe code. Because the primitive operations of Rust, like the synchronization operations, the stuff that implements mutexes, the stuff that implements interthread communication channels, or the basic memory management things that get memory, that obtain free memory for a vector’s buffer, or that free a vector’s buffer when the vector is disposed of, or the I/O operations that say, “Look, we’re going to read memory. We’re going to read the contents of a file, or data from a socket into this memory and without wiping out random other stuff around it.”
All of those things, they’re code that no type system can really, well, yeah, I think we can say that. They’re primitive operations, and so no type system can really say what they do. You can use unsafe code and make sure that you use them correctly, and then assuming that your unsafe code is right, you can build well-typed things on top of those that are safe to use. This two level structure of having unsafe code at the bottom, and then having typed code on the top is what allows people to have some confidence in the system.
The RustBelt people actually want to understand the semantics of unsafe code, and actually spell out what the requirements are in order to use these features safely. Then they want to verify that Rust’s standard library does indeed use them correctly. They’re really going for the whole enchilada. They want to really put all of Rust on a firm theoretical foundation. It’s really exciting.
Adam: The tradeoff as a user of the language, it seems to make sense to me. You’re saying rather than needing to audit my code to make sure these issues don’t exist, I can trust that the system’s been formally verified except for these unsafe primitives which have been audited themselves.
Jim: Yeah. Basically if you don’t use unsafe code, then the compiler prevents all undefined behavior, prevents all data races, prevents all memory errors. If you don’t use unsafe code, you are safe. If you do use unsafe code, or unsafe features, you are responsible for making sure that you meet the additional requirements that they impose above and beyond the type system.
Implementing Unsafe Code
Either you can figure out how to fit your problem into the safe type system. The nice thing about Rust is that the safe type system is actually really good and quite usable. Most programs do not need to resort to unsafe code. You can either work in that world, which is what I almost always try to do, or if you really need, if there’s a primitive that you really know is correct but that the type system can’t handle, then you can drop down to unsafe code and you can implement that.
One of the strategies that we emphasize in the unsafe chapter of the book (it’s the very last chapter, after we’ve presented everything else) is to design interfaces such that once the types check, you know that all of your unsafe code is A-okay. That means you’ve exported a safe interface to your users.
If you have an unsafe trick that you want to use, you isolate that unsafe trick in one module that has a safe interface to the outside world, and then you can go ahead and use that technique and not worry about its safety anymore, use the module, and then the module’s own types ensure that it’s okay.
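A minimal sketch of that pattern, assuming a hypothetical Ascii type: the module’s private field plus a checking constructor guarantee an invariant (every byte is ASCII), and the one unsafe call relies on that invariant. Code outside the module can’t construct an Ascii without going through the check, so the unsafe trick stays sealed inside.

```rust
mod ascii {
    /// Invariant: every byte is < 0x80, so the bytes are always valid UTF-8.
    /// The field is private, so only code in this module can break the invariant.
    pub struct Ascii(Vec<u8>);

    impl Ascii {
        /// The only public constructor, and it checks the invariant.
        pub fn from_bytes(bytes: Vec<u8>) -> Option<Ascii> {
            if bytes.iter().all(|&b| b.is_ascii()) {
                Some(Ascii(bytes))
            } else {
                None
            }
        }

        /// Safe to call from anywhere; the unsafety is confined to this module.
        pub fn as_str(&self) -> &str {
            // SAFETY: the constructor guarantees ASCII, and ASCII is valid UTF-8,
            // so skipping the UTF-8 check is sound.
            unsafe { std::str::from_utf8_unchecked(&self.0) }
        }
    }
}

use ascii::Ascii;
```

Auditing this design means reading only the ascii module: if its constructor and methods preserve the invariant, no outside caller can misuse as_str.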
Adam: The unsafe code doesn’t escape, right?
Jim: Exactly.
Adam: It sounds similar to the idea like people will be writing some Haskell function that claims to do no side effects, but for performance reasons maybe it’s actually generating a random number. Maybe that’s a bad example, but it’s totally hidden from the user. It acts pure from the outside, whatever may happen.
Jim: Yeah. That’s a good example, because the question arises: is it really pure from the outside? If they did it right, if they really kept all of the statefulness local and isolated so that you can’t tell from the outside, then everything’s fine. Whoever’s using it from the outside can use it without worrying about it; they get the performance, and they don’t have to worry about the details.
Rust Performance
Then inside, the people who wrote that code, they have extra responsibilities. The normal Haskell guarantees of statelessness don’t apply to them, because they’ve broken the rules, or they’ve stepped outside the rules, and they’re now responsible.
Adam: You mentioned the type system of Rust, and actually it has a lot of features that I guess you wouldn’t expect from something that’s, well, maybe I didn’t expect it. It has a lot of functional feeling features.
Jim: I’m really glad that you brought that up. I’ve talked about safety and I think I’ve talked about performance, but the really nice thing about Rust is that it is not by any means a hair shirt. It is actually really comfortable to use. It has a very powerful generic type system. The trait system is a lot like type classes in Haskell, if you’ve used type classes in Haskell. I mean everybody uses type classes in Haskell whether they know it or not. Rust has traits.
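For readers coming from Haskell, here is a rough sketch of the correspondence (the Describe trait and Point type are made up for illustration). A trait declares methods, possibly with defaults, like a type class; impl blocks play the role of instances; and a trait bound on a generic function works like a class constraint.

```rust
/// Roughly like `class Describe a where name :: a -> String`.
trait Describe {
    fn name(&self) -> String;

    /// A default method, like a default definition in a type class.
    fn describe(&self) -> String {
        format!("this is {}", self.name())
    }
}

struct Point {
    x: i32,
    y: i32,
}

// Like `instance Describe Point`.
impl Describe for Point {
    fn name(&self) -> String {
        format!("Point({}, {})", self.x, self.y)
    }
}

// We can also add an "instance" for a type we didn't define, such as bool,
// because the trait itself is defined locally.
impl Describe for bool {
    fn name(&self) -> String {
        format!("bool {}", self)
    }
}

/// Like `describeAll :: Describe a => [a] -> [String]`.
fn describe_all<T: Describe>(items: &[T]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}
```

One design difference worth knowing: Rust generics like describe_all are normally monomorphized, so the compiler emits a specialized copy per type, rather than passing a dictionary at run time.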
The whole standard library is designed around those generics and those traits, and it puts them to full use. It’s actually a super comfortable system to use. I did a tutorial at OSCON in Austin last May, where we went through, to the extent that you can in three hours, writing a networked video game. It involved 3D graphics, a little bit of networking, and some game logic. Obviously I had to have the game ready for the talk. I put it off, and so I had to do the last stages of development in a rush, and it was fantastic. It was like I had wings, or something.
Because once I’d gotten something written down, once I’d really gotten the types right, it was done. It was done. If I had been working in C++, I would have had to randomly take three hours out of the schedule to track something down and debug it. Because it was Rust, I just got to keep going forward, forward, forward, so it was really great progress. Rust has all these batteries-included kinds of things; for example, there’s a crate out there called Serde, a serializer/deserializer framework.
Utilizing Serializer and Deserializer
It is a very nice collection of formats. There’s JSON, there’s a binary format, there’s XML, there’s a bunch of other stuff, and then a set of Rust types that can be serialized, string, hash table, vector, what have you. Serde is very carefully constructed so that if you have any type which it knows how to serialize or deserialize then you can use that with any format that it knows how to read or write. You just pick something off of this list, and then pick something off of that list, and you’re immediately ready to go for sending something across a network.
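The “pick a type, pick a format” design can be sketched in a few dozen lines of plain Rust. The traits below (Format and ToWire) are hypothetical and far simpler than Serde’s real Serializer and Serialize traits. They only illustrate the idea: each type describes itself once in terms of abstract calls, each format implements those calls once, and every type/format pairing then works.

```rust
/// Hypothetical "format" side: how to write each primitive piece.
trait Format {
    fn write_int(&mut self, v: i64);
    fn write_str(&mut self, s: &str);
    fn begin_seq(&mut self, len: usize);
    fn seq_sep(&mut self);
    fn end_seq(&mut self);
}

/// Hypothetical "type" side: describe a value as a series of Format calls.
trait ToWire {
    fn to_wire<F: Format>(&self, out: &mut F);
}

impl ToWire for i64 {
    fn to_wire<F: Format>(&self, out: &mut F) {
        out.write_int(*self);
    }
}

impl ToWire for String {
    fn to_wire<F: Format>(&self, out: &mut F) {
        out.write_str(self);
    }
}

// One generic impl makes every Vec of serializable items serializable.
impl<T: ToWire> ToWire for Vec<T> {
    fn to_wire<F: Format>(&self, out: &mut F) {
        out.begin_seq(self.len());
        for (i, item) in self.iter().enumerate() {
            if i > 0 {
                out.seq_sep();
            }
            item.to_wire(out);
        }
        out.end_seq();
    }
}

/// A JSON-like text format. Strings are quoted with Rust's Debug formatting,
/// which matches JSON for printable ASCII but is not a full JSON escaper.
struct Json(String);

impl Format for Json {
    fn write_int(&mut self, v: i64) { self.0.push_str(&v.to_string()); }
    fn write_str(&mut self, s: &str) { self.0.push_str(&format!("{:?}", s)); }
    fn begin_seq(&mut self, _len: usize) { self.0.push('['); }
    fn seq_sep(&mut self) { self.0.push(','); }
    fn end_seq(&mut self) { self.0.push(']'); }
}

/// A compact binary format: little-endian i64s, u32 length prefixes
/// for strings and sequences (lengths above u32::MAX are not handled).
struct Binary(Vec<u8>);

impl Format for Binary {
    fn write_int(&mut self, v: i64) { self.0.extend_from_slice(&v.to_le_bytes()); }
    fn write_str(&mut self, s: &str) {
        self.0.extend_from_slice(&(s.len() as u32).to_le_bytes());
        self.0.extend_from_slice(s.as_bytes());
    }
    fn begin_seq(&mut self, len: usize) { self.0.extend_from_slice(&(len as u32).to_le_bytes()); }
    fn seq_sep(&mut self) {}
    fn end_seq(&mut self) {}
}

fn to_json<T: ToWire>(value: &T) -> String {
    let mut out = Json(String::new());
    value.to_wire(&mut out);
    out.0
}

fn to_binary<T: ToWire>(value: &T) -> Vec<u8> {
    let mut out = Binary(Vec::new());
    value.to_wire(&mut out);
    out.0
}
```

With three types and two formats, only five small impls were needed, yet all six combinations work, including nested ones such as to_json(&vec![vec![1i64], vec![]]).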
Naturally, if you define your own types, you can specify how they should be serialized or deserialized. You define your own custom struct and say, “Well, here’s how.” The thing is, that’s real boilerplate stuff. There’s this magic thing that you can slap on top of your own type definition: you can say derive Serialize and Deserialize. What that does (I guess Haskell has something like this too) is look at the type and automatically generate the methods to serialize and deserialize it. It is super easy to get your stuff ready to communicate across a network.
For communicating people’s moves and communicating the state of the board, it was just a blast, because there was all of this boilerplate stuff that I didn’t have to worry about. Those are just the kind of power tools that are wonderful.
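With Serde the attribute is written #[derive(Serialize, Deserialize)] and needs the serde crate with its derive feature enabled. To keep this sketch std-only, it uses the standard library’s built-in derives instead, which look the same from the user’s side: one attribute line, and the compiler generates the trait impls from the type definition. The GameMove type is hypothetical.

```rust
/// A hypothetical message type for a networked game.
/// The derive line asks the compiler to generate four trait impls
/// (Debug, Clone, PartialEq, Default) from the struct definition.
/// With the serde crate, adding Serialize and Deserialize to this list
/// would generate the serialization code in the same way.
#[derive(Debug, Clone, PartialEq, Default)]
struct GameMove {
    player: u8,
    from: (u8, u8),
    to: (u8, u8),
}
```

None of the generated methods (clone, eq, fmt, default) had to be written by hand, and they stay in sync automatically if fields are added later.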
Is Rust for Non-C++ Developers?
Adam: Just as a callback, I think that’s generic derivation. I had Miles on the show earlier; he wrote something similar for Scala. Haskell has it too. I think it was originally called Scrap Your Boilerplate. A very cool feature. A lot of boilerplate can be removed by things like that.
Jim: Scrap Your Boilerplate is done within the Haskell type system, if I remember that paper right. Serde is doing a little bit of procedural macro kind of, “I’m actually going to …” [inaudible 00:49:03] looks at your type definition and decides what to do with it. I wonder maybe that stuff could be done in a Scrap Your Boilerplate style. I don’t understand Scrap Your Boilerplate well enough to say. It’s that style of thing. Those are just wonderful power tools.
Adam: I think you’re making a good argument. Rust, apparently hard to learn. This is what I’ve heard. However, once you learn it, there’s a superpower. Is this superpower applicable to non-C++ devs? Is this a useful skill for somebody who’s throwing up web services?
Jim: I think so. You had Edwin on talking about Idris, and Edwin made a comment that I want to push back on. He said, “I don’t think that types really help people eliminate bugs that much, because unit tests are still useful.” Right now I work in the developer tools part of Mozilla, and we have a JavaScript front end. The user interface for the developer tools in Firefox, they’re written themselves in JavaScript.
It’s a React Redux app, basically, that talks over a JSON-based protocol to a server that runs in the debuggee and looks at the webpage for you. I’m proud to say that my colleagues are enthusiastic about the potential for types. They really see the value of static typing. We’re bringing Flow types into our JavaScript code base more and more. It’s not done; we haven’t pushed them all the way through. There’s plenty of untyped code still, because Flow lets you type one file and leave the rest of the system untyped. You can gradually introduce types to more and more of your code as you go. We’re in that process.
How Errors Get Missed
Of the bugs that I end up tracking down, I think, I don’t want to put a number to it, because I haven’t been keeping statistics, but it feels like at least half of them would have been caught immediately by static typing.
Adam: I’ve heard people say this about moving to TypeScript, which is similar, that often …
Jim: Yeah. Same idea.
Adam: … they found not a super obscure bug, but a little bit, like corners where things would go wrong that the type system was like, “What are you doing here?”
Jim: The thing is I think the people who work in Haskell or certainly somebody who works on Idris, I don’t think they really know what the JavaScript world is like. It’s just insane what people do. In JavaScript, if you typo the name of a property on an object, it’s just a typo, you capitalize it wrong or something, that’s not an error. JavaScript just gives you undefined as the value, and then undefined just shows up at some random point in your program.
Until you have complete unit test coverage of every single line, you don’t even know whether you’ve got typos. That’s crazy. That is just not what humans are good at, and it’s exactly what computers are good at. To put that on the human programmers’ shoulders doesn’t make any sense.
Adam: Now, to be fair to Edwin, he does have t-shirts that say, “It compiled, ship it.”
Jim: I thought that was a really good podcast. Like I say, we’re all fans or we’re all really curious about Idris. I think that we don’t want to undersell the benefits of static typing. Back to your question, for people who aren’t doing systems programming, why would they be interested in Rust? Rust is just a really good productive language to work in. It will make you worry about a bunch of things that maybe you thought you shouldn’t have to think about, but in retrospect I feel like I’m happy to have those things brought to my attention.
Rust for Single-Threaded Code
For example, at the very beginning I talked about how a data structure’s methods will take it out of a coherent state and then put it back into a coherent state. You want to make sure that nobody observes it in the midst of that process. You can get those kinds of bugs even in single-threaded code. You can have one function which is modifying something, and then it calls something else, which calls something else, goes through a callback, and now you have several call frames. Then suddenly something tries to use the very data structure that you were operating on at the beginning, but you weren’t aware of it.
Basically, nobody knows that they’re trying to read from a data structure that they’re in the middle of modifying. That’s called iterator invalidation. In C++ it’s undefined behavior; in Java, you get a ConcurrentModificationException. I just mentioned this to a Java programmer, and he was like, “Oh yeah. CMEs.” They had a name for them. They knew. That’s totally a single-threaded programming error, and it’s also prevented by Rust’s type system.
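A small sketch of how this looks in Rust (append_halves is a made-up example): the direct loop that pushes onto a vector while iterating over it is rejected at compile time, so the usual fix is to compute the changes first and apply them afterwards.

```rust
/// Hypothetical example: for every even number in `v`, append half of it.
fn append_halves(v: &mut Vec<i32>) {
    // The direct version does not compile, because `v.iter()` holds a shared
    // borrow of `v` for the whole loop while `v.push` needs a mutable one:
    //
    //     for x in v.iter() {
    //         if x % 2 == 0 { v.push(x / 2); }
    //     }
    //
    // In C++ the equivalent push_back can reallocate and leave the iterator
    // dangling; in Java it would throw ConcurrentModificationException.

    // Compute the new elements first, while only reading `v`...
    let halves: Vec<i32> = v.iter().filter(|&&x| x % 2 == 0).map(|&x| x / 2).collect();
    // ...then mutate once the shared borrow has ended.
    v.extend(halves);
}
```

The same rule covers the callback scenario above: while one function holds a mutable borrow of a structure, no other code path can obtain a reference to it, so the conflict shows up as a compile error instead of a CME at run time.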
I feel like Rust’s types actually have a lot of value even for single threaded code, which is not performance sensitive, but it’s just really nice to have. It’s really got your back in terms of letting you think, or making sure that your program works the way you think it does. I think it has a lot of applicability as a general purpose programming language.
Adam: The one thing we didn’t talk about, but I think that you touched on briefly at the beginning was to do with security. We talked about data races, but you also mentioned security.
Jim: Most security, well, sorry. There are lots of different kinds of security holes. According to the collected data, there are a few people who collect statistics on the published security vulnerabilities and what category they fall into. Is it SQL injection? Is it cross-site scripting? They categorize them. The category that I’m interested in, for this particular case, is the memory corruption, memory errors. Those have been a consistent 10%, 15% of all the security vulnerabilities being published all together. They’re still a very big issue.
Catching Security Holes
Most of the time, almost all the time what’s happening there is you’ve got a buffer overrun, or you’ve got a use after free, or you’ve got some other kind of dynamic memory management error, which normally would result in a crash, but in the hands of a skilled exploit author can be used to take control of the machine. After you have seen enough of these attacks, you start to feel like pretty much any bug could be exploited with enough ingenuity.
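As a sketch of what this means for the buffer-overrun case (read_at and copy_field are hypothetical helpers), safe Rust checks every index and slice range. A bad length field in, say, a file parser yields None here (or a panic, with plain indexing) instead of a read past the end of the buffer.

```rust
/// Hypothetical helper: reads byte `i`, or None if `i` is out of range.
fn read_at(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

/// Hypothetical helper: copies `len` bytes starting at `start`, as a parser
/// might do after reading a length field from untrusted input.
fn copy_field(buf: &[u8], start: usize, len: usize) -> Option<Vec<u8>> {
    // checked_add guards against `start + len` wrapping around.
    let end = start.checked_add(len)?;
    // Range access is bounds-checked: an out-of-range field gives None.
    buf.get(start..end).map(|field| field.to_vec())
}
```

In C, the equivalent memcpy with an attacker-controlled len would happily copy whatever lies past the buffer; here the worst case is a rejected input.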
It turns out, I can’t find this post anymore, but the Chrome security team, Google’s Chrome browser’s security team had a blog post just about security errors caused by integer overflows. Integer overflow sounds so innocent, but it turns out that if that integer is the size of something you are 90% of the way to perdition, because basically you can make it think that something is much bigger than it actually is in memory, and then you’ve got access to all kinds of stuff you shouldn’t have access to. You’ve broken out of jail.
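Rust doesn’t make integer overflow impossible, but it gives you explicit tools for size arithmetic. A sketch with hypothetical image_bytes and alloc_image helpers: checked_mul returns None instead of a wrapped-around size. Plain arithmetic panics on overflow in debug builds and wraps by default in release builds, but even a wrapped size can’t become an out-of-bounds access in safe code, because indexing is still bounds-checked.

```rust
/// Hypothetical helper: bytes needed for a width x height image with
/// `bpp` bytes per pixel, or None if the product doesn't fit in usize.
fn image_bytes(width: usize, height: usize, bpp: usize) -> Option<usize> {
    width.checked_mul(height)?.checked_mul(bpp)
}

/// Allocates the pixel buffer only when the size computation succeeded,
/// so a huge width can't be turned into a tiny (wrapped) allocation.
fn alloc_image(width: usize, height: usize, bpp: usize) -> Option<Vec<u8>> {
    image_bytes(width, height, bpp).map(|n| vec![0u8; n])
}
```

Where wrapping really is intended, wrapping_mul says so explicitly, which makes the unusual case stand out in code review.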
Having a type system which prevents memory errors, and which makes sure statically that your program doesn’t behave in an undefined way, really does close off a very significant opportunity for security holes. One of the chapter quotes we open the book with was a tweet by Andy Wingo, who is a great JavaScript hacker and a great free software hacker. His tweet was, basically, that there was a bug in a TrueType parser, a font parser, and that was one of the bugs used to break into the machines controlling the Iranian uranium enrichment facilities.
The Root Cause
Adam: I didn’t know that. What was that called, the Stuxnet, right?
Jim: Yeah. Stuxnet. Basically it was built around a flaw in a TrueType font parser. A font parser is security-sensitive code, so basically all code is security sensitive. You can no longer say, “Oh, you know, it’s just graphics code.” It’s not true. If you’re writing C++ code and it’s got control of memory and it’s doing pointer arithmetic, you’ve got to be on your toes. The standard is perfection.
Adam: Rust, same as a data race, it takes a certain class of these vulnerabilities off the table.
Jim: Yeah. In Rust without unsafe code, if your program type-checks, then we are saying it will not exhibit undefined behavior.
Adam: Undefined behavior is often the-
Jim: Is often the root of the security hole.
Adam: Awesome. We’re reaching the end of our time here. One thing when I was Googling you that I found is your Red Bean Software site. I actually ended up forwarding this to a couple of my friends. It says on it, to all intents and purposes, it appears you have a consulting company that does not sell software. Is that correct?
Jim: First of all, that’s really, really old. My friend Karl Fogel and I ran a little company called Cyclic Software, and we were selling CVS support. We were the first group to distribute network-transparent CVS. We didn’t write it; somebody else wrote it, and they said they didn’t want responsibility for it, so we were the ones distributing it. We’re kind of proud of that, because network-transparent CVS was really the first version control system that open source used to collaborate on a large scale, before it got replaced by Subversion, Git, and Mercurial. Network-transparent CVS was really how things got started.
We had Cyclic Software, and then we decided we didn’t want to run it anymore, we couldn’t run it anymore, and so we sold it to a friend, and we realized we had to change our email addresses. We had to tell everybody don’t email us at jimb@cyclic anymore. That’s kind of a bummer. We realized that we were going to be changing our email addresses every time we changed jobs. We resolved to create a company whose sole purpose was never to have any monetary value. We would never have to sell it, and so we could keep a stable email address for the rest of our lives.
It’s a vanity domain, and lots and lots of people have vanity domains. Our joke is that it’s a company whose purpose is never to have any value.
Adam: I found on the front page it says, let me read this, “By buying our products you will receive nothing of value, but on the other hand we will not claim that you have received anything of value. In this, we differ from other software companies, who insist in the face of abundant evidence to the contrary that they’ve sold you a usable and beneficial item.”
Jim: That’s Karl.
Adam: Well, it’s been a lot of fun, Jim. Thank you so much for your time.
Jim: Thanks for having me. It was fun.
Adam: I enjoyed your book. I’ll get through it eventually. I think I’m on chapter four. I need to keep working.
Jim: Yeah. Stick with it through the traits and generics chapter, because once you’ve gotten to traits and generics, then that’s really where you’ve really got everything you need to know to really read the documentation and understand what you’re looking at.
Adam: Awesome. Thank you.
Jim: We’re really sorry it’s chapter 11. We tried to make it as fast as possible.
Adam: No. It’s all good. Take care.
Jim: Take care.