CORECURSIVE #116

Godbolt's Rule

When Abstractions Fail

What do you do when your code breaks and the only fix is to dig into the runtime below?

Matt Godbolt lives for that. Tile-based renderers, color-coded scanlines, zero-copy NICs—each story is a clue that leads past the abstraction to the real machine. He shares the rule that guides him: master your layer, learn the one below, and know the outline of the layer under that.

Matt Godbolt’s journey proves the real breakthroughs are hidden behind the abstractions where you are comfortable and familiar.

Transcript

Note: This podcast is designed to be heard. If you are able, we strongly encourage you to listen to the audio, which includes emphasis that’s not on the page.

Three Motorcycles In A Trench Coat

Adam: Hello, this is CoRecursive, and I’m Adam Gordon Bell.

Today I want to talk about abstractions. And when they break and how to see past them.

Like you have this idea of how a network request works or how memory works. And it’s very simple and concise, but it’s also a lie, or at the very least a simplification. I was reading this book called Database Internals and about half of the book digs into this core challenge of database design, which is how to spend less time reading and writing to disc. There are all these tricky trade-offs, and a lot of the big breakthroughs in the backend of a database come from finding smarter ways to handle disc input and output for a particular type of workload so that things work fast and you don’t lose any data.

But here’s the thing, it’s all very tied to how discs work, right? Whether a spinning old fashioned disc or a modern SSD disc, how they perform and what they’re good at, and what they tell the computer they can do.

But here’s the thing that surprised me, kind of blew my mind. ’Cause once you dig into how a hard drive actually works, especially a solid state drive, it only gets more confusing, because the disc interface, it’s actually a lie. It’s such a simplification. It’s just this convenient abstraction, but it doesn’t match what’s actually happening under the hood in some pretty important ways.

It’s like realizing that car you’ve been tuning to get faster is actually three motorcycles bolted together under a plastic shell. And the reason that some of your clever tuning worked and some of it didn’t, is because you didn’t know about that illusion. You didn’t know about the reality underneath. You weren’t debugging the real thing. You were debugging its shadow. Abstractions help us. They give us an easy concept to hold onto, but sometimes they trip us up.

And so, as you can tell, I was excited about this concept and I, I had to talk to somebody about this and I knew just the person because the, the craziest abstraction, the one that blew my mind is on AWS a lot of data lives in RDS where it’s running Postgres or, or MySQL, but that disc that it’s optimized to write to is, is actually another machine.

The abstraction is such a lie that you don’t even realize that every time you write to disc in your database, it’s actually a network request.

Matt: It’s a, it’s like a SCSI hard disc where you, where you rip out the SCSI controller and just plug it into the network and say, okay, as long as someone sends a packet, a SCSI looking packet to this, I can read and write a block. It’s like bonkers stuff. Yeah.

Adam: Yeah. Because that is often like the bottleneck on, like, so many systems, and they just put a network in the middle of it. I mean, it’s, it’s, they’re pretty good. It’s a fast network. It’s not the same as having, and that’s, it always used to confuse me, and there’s this provisioned IOPS and whatever.

Matt: You’re like, why does this matter? It’s in the machine, surely. It’s like, well, this one is because it’s, you know, a, a physical disc that you are provisioning in a machine that happens to have a disc in it. But everything else is virtualized through a file system that’s really writing to what it thinks is local storage, but is actually a network blast off an iSCSI packet or whatever.

It’s actually using, behind the scenes, racks and racks and racks of discs. Yeah. So, and, and then, you know, you can take the abstraction even further. Those discs, right? They present somewhere like that. Here’s a number of cylinders, sectors, whatever, but it’s like, that’s not really what’s happening either.

They, if it’s, if it’s SSD backed, then the SSD has a mapping table between where you say to put the data and where it’s actually gonna put the data because it has to do wear leveling. It says you only write over and over and over stuff. I don’t wanna keep writing to the same place over and over, over again.
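The mapping table Matt describes can be sketched in a few lines. This is a toy model, not how any real drive firmware works: the class name `TinyFTL`, its fields, and the "pick the least-worn free block" policy are all assumptions made for illustration, and it ignores erase blocks and garbage collection entirely. It also assumes at least two physical blocks, so there is always somewhere new to write.

```python
class TinyFTL:
    """Toy flash translation layer (hypothetical): logical block -> physical block."""

    def __init__(self, physical_blocks):
        self.data = [None] * physical_blocks   # what each physical block holds
        self.wear = [0] * physical_blocks      # how many times each block was written
        self.mapping = {}                      # logical block number -> physical block
        self.free = set(range(physical_blocks))

    def write(self, logical, payload):
        # Pick the least-worn free block instead of overwriting in place.
        target = min(self.free, key=lambda block: (self.wear[block], block))
        self.free.remove(target)

        # The block that used to hold this logical address becomes free again.
        old = self.mapping.get(logical)
        if old is not None:
            self.data[old] = None
            self.free.add(old)

        self.data[target] = payload
        self.wear[target] += 1
        self.mapping[logical] = target

    def read(self, logical):
        # The caller only ever sees logical addresses.
        return self.data[self.mapping[logical]]
```

Writing the same logical block over and over (the "boot sector" case) spreads the wear across every physical block, while `read` keeps returning the latest payload as if nothing had moved.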

I need to keep moving where I’m writing around the physical characteristics of the system in order to not have the boot sector get completely screwed because you keep rewriting whatever you do, you know, some sector like that. And so that’s a lie as well. And then, you know, back to your point about how, you know, a traditional database, you write a file and say flush, and the, and the operating system says it’s good, and then you’re like, but it isn’t, is it?

No, because it’s, it’s not yet been flushed to the disc controller. Oh yeah, it has been flushed to the disc controller. Oh yeah. But the, the disc controller has a cache. Oh, that’s cool. Alright. But it’s been written to the drive? Yes. Yeah. No, because the drive has several CPUs on there that are doing some of this, like scheduling of IO and whatever, and they’ve stored it for now and they’re waiting for the disc head to come back round before they actually put it on disc to be faster. You know, when does it actually get to the disc?

Just tell me. I almost need to know.
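From a program's point of view, the layers in that exchange look roughly like this. It is a minimal sketch using Python's standard `open`, `flush`, and `os.fsync`; the function name `durable_write` is made up for illustration. Each call only pushes the data one layer down, and what the controller or the drive does with its own cache afterwards is outside the program's view.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and push them as far down the stack as the program can."""
    with open(path, "wb") as f:
        f.write(data)          # sits in the process's own buffer
        f.flush()              # handed to the operating system's cache
        os.fsync(f.fileno())   # ask the OS to push it out to the storage device
    # Past this point, controller and drive caches are the hardware's business.


def demo():
    # Hypothetical usage: write a record into a scratch directory and read it back.
    path = os.path.join(tempfile.mkdtemp(), "record.bin")
    durable_write(path, b"commit")
    with open(path, "rb") as f:
        return f.read()
```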

The Abstraction Hunter

Adam: That’s Matt Godbolt and yeah, to me, he is the king of pulling back these abstractions. You know, I knew when I realized that discs were more complicated than I thought, metaphorically, you know, I was discovering my car is actually three motorcycles that Matt would already know this, and he would’ve more facts like that.

One of the motorcycle engines was not in fact a motorcycle engine, but really a lawnmower engine running sideways. That’s just how curious he is. He’s the person who gets into these details.

And that’s the same curiosity that led him to build Compiler Explorer, a tool that shows you exactly what assembly your compiled code turns into. It lets you sort of pull back the veil on the compiler abstraction. I feel like Matt is pure curiosity in motion when you, when you talk to him, you can’t help but feel energized.

Matt: I, you know, everyone knows I get excited about it and, you know, my, my eyes light up and, and, uh, curious thing, my, the inside of my nose gets itchy when I’m excited.

There is something about what we do as software engineers that is magical. We can use magic words to make a computer do something that we couldn’t do or to make it do something.

Uh, I, I dunno, sort a list of things, thousands of things that would take you forever. Uh, it just feels powerful and it feels exciting.

But also just like the constraints you have in these systems where it’s like, well, it’s not really supposed to work this way, or it’s, it’s hard to get it to work at all, but you kind of fight through those constraints. Well, you use the constraints as a positive and kind of come out of it and go like, oh, this is, and when you’re down really deep down in the hardware, there’s loads of funny little constraints that you can take advantage of.

And I think there’s just an intellectual, uh, like a riddle or a puzzle to solve there where you’re like, Hey, wait a second, if we use this thing, we can, we’ve already squared, it’s already in the R2 register. Don’t touch the R2 register. We’re gonna do something else over here. I dunno, it feels like you’re solving some cool Sudoku puzzle.

You get that endorphin rush. Ooh, you know how Ethernet works? We can do this clever thing with it.

And it contributes to something that’s fun and interesting that you could explain to, I dunno, your mom, and say like, Hey, I’ve made a game. And she’s like, that’s nice dear. Right?

Adam: So that’s today’s show. Matt is back, if you haven’t listened before, he was here in episode 57, which was five years ago. Crazy enough. And there Matt talked about pulling things apart and about building games, and we’re gonna do the same a bit today, but this time we’re taking curiosity all the way down to the metal to where hardware and software blur.

Sometimes with Matt, I can get a bit out of my depth, but I always find it inspiring and I always find it interesting. So let’s do it. Matt’s story starts when he was in university.

From Bedroom Coders To Real Studios

Matt: And towards the end of ’95, beginning of ’96, I started looking for a job and there was this thing called IRC, which is Internet Relay Chat. And I was in one of the many chat rooms and I happened to mention it in passing and somebody said, Hey, you should try, you know, apply to where I work.

It’s a, it’s a cool game development place. So I, I did, got in, and they were like, when can you start? I’m like, well, no, I haven’t actually finished university yet. So I don’t think I can start until I’ve graduated. They said, okay, well how about you come in in the summer, do some work for us, and then come back again once you’ve graduated?

So that was, that was what I did. And so when I joined in 1996, they didn’t know what to do with me. Although I had applied for and got a programming job, there wasn’t really any intern-like work, so I was just a games tester.

Adam: A game tester at Argonaut Games. Argonaut was a small studio in North London and they were the studio that could mix hardware and software in ways that most places didn’t. They built the Super FX chip that made 3D possible on the Super Nintendo, for instance, and it was an interesting office place.

Matt: It was essentially what you would imagine that you would get if you took a whole bunch of people who had been programming in their bedrooms when, from when they were like teenagers, 12, 13, 14 upwards. And as soon as they had reached the age where you could legally employ them, you had transported them from their bedroom into this building.

The, the environment was a converted car dealership. So it was a very weird, very long, thin building that was designed to have, you know, you could walk in and there were cars that would’ve been parked down the side in inside.

And so it was, I mean, you know, programmers, artists, designers, level, uh, uh, creators, um, animators. And so everyone was self-taught. There wasn’t really any formal training. There was no software engineer. Very limited software engineering ideas at the time. And it was, yeah, it, it was pretty fast and loose.

The hours were very, very, very long. Um, the, the pressure from publishers was high and, uh, as you might imagine in a more liberal, chill, artsy environment, there was a lot of things going on whose legality were questionable, especially in the evenings once things, once the management had gone home.

I mean, they were very, very incredibly motivated people. They believed in what they were doing. They knew what they had to do, and they went on and, and, and got it done. And then I think, you know, we, we did pick up software engineering practices as time ticked on.

Adam: But at the time, the big project at Argonaut was Croc: Legend of the Gobbos for the original PlayStation, and Matt was seated in the Croc room.

Matt: The Croc Room was a room that had probably 12 people in it, 12, 14 people, something like that. In sort of dotted around there was like the, the animator with the one precious SGI machine that was, that ran the animation software. And so she was like set up with that.

Everyone else had, you know, some form of like 486-ish era, uh, DOS computer, um, desks were covered with crap, frankly, you know, anything and everything that was around, uh, you could tell where the animators were and where the artists were because they would have every kind of manga doll or, you know, you know, transformer or it was, you know, like a shrine to the things that they found really interesting and, and, and exciting.

And, and it was a real eye-opener for me having sort of come from a very boring background. Um, that, you know, wow, there are people out there that are like arty and I’m used to nerds and these are different, these are nerds in a different way. They’re a very different kind of nerd, but they are extremely talented and a very non-overlapping with me until that point. And so that was a huge, uh, uh, eye-opening experience.

Adam: Croc actually started as a Yoshi game for the Nintendo 64, but, after some deal fell through, the dinosaur got turned into a crocodile, and the game shifted to the PlayStation, and Matt’s first job was testing this PlayStation version of this little crocodile.

Matt: That was my first introduction to Argonaut Games, was playing through a VHS recorder, which was an amazing transformation in the testing ability. If you imagine, you know, like you’re playing a game and it crashes and you tell somebody and they’re like, what did you do? And you’re like, oh, I didn’t do anything. It just crashed. And they’re like, no, I bet you did something. Oh, no, I didn’t do anything. And then so it became an argument and just like you went to the VHS, you rewound the tape, right? You went, no, look, you see, I was just standing there, nothing was going on. And then it just froze.

From Game Tester To Programmer

Adam: That summer, Matt got thrown into the deep end of game development. It was hands-on and it was unpredictable, and he loved it. And then when he came back after college, he wasn’t a tester anymore. He got his first real programming job, which was to take that same little green crocodile who honestly looks a lot like Yoshi and make him run on the PC.

Matt: One of the things that we had was a tie-in deal with Intel. And they wanted to sort of put us in the, uh, ship the game alongside the motherboards for certain of their new chips, upcoming chips, as like a, a, you know, hey look, it runs great on. And of course back then, you know, PlayStation was like head and shoulders above what most PCs were doing, or at least, you know, without spending a huge amount of money on these newfangled GPUs of which, you know, how different the world is these days. So they were like, no, no, the CPU is fast enough to do this work, being Intel. Um, let’s show you them by, by showcasing it. So I was very lucky to be given one of the engineering samples of the then new Pentium II. And so I got to play with this beast, uh, 266 megahertz, if I remember right.

And we had, you know, uh, if you put a bunch of game developers in a room and, and connect all their computers up, what’s gonna happen? Every lunchtime you’re gonna play Quake, right? Or Doom or anything. And so I was, I, for a short period of time, I was the best Quake player in the entirety of the, of Argonaut, but only because my frame rate was four times quicker than everyone else’s.

And so I, I have no skill. I was, it was just that it was like shooting fish in a barrel, right? They couldn’t keep up with me. Uh, that changed pretty quickly. But yeah, so, uh, the, the day in day out was take fairly grotty C code that has been beaten to work in on the PlayStation, along with tons of assembly code that they’d also written for the various, uh, GPU, GPU-like chips that are in there, uh, and port it to, to, um, work in Visual Studio and compile in Visual Studio and then connect it to DirectX, which was this relatively new thing back then.

So I, I learned far too much about how DirectInput works. I did the front end. Uh, so I had to deal with all of the keyboard remapping and reading the mouse and the joysticks and all of those things along with a bunch of other bits and pieces that were just bits in between the bits.

The Hubris Of Youth

Adam: By then, Matt and Argonaut had grown up. The company had moved into a more normal office space, and for the first time it felt like a real studio, not just a bunch of bedroom coders all working together.

At the same time, the industry was changing. PCs were slowly growing in importance, and John Carmack and others were showing what was possible with 3D technology with games like Doom and Quake. Studios started to realize that they might need their own in-house engines, that they shouldn’t just be coding every game from scratch.

Games were getting too complex, and for Argonaut, this meant trying something new: building a 3D engine called BRender, short for Blazing Render. But then the Dreamcast was about to launch, and everybody was excited, and a game producer pitched an idea for a new game and a new engine that would be built just for this Dreamcast, Sega’s new, exciting console.

Matt: And I, my ears pricked up because by complete coincidence, the producer of that game was, who was pitching the idea to Sega happened to have a spare, it was the only seat spare in the, in the office.

And so he was sat in the same room as the ATL group who were doing technical support for this BRender product, and then me in the corner doing Croc. And I heard him talking about this new project and I said, oh, I could write an engine. I mean, honestly, the hubris of youth, right? And so I was like 22, didn’t really know what I was doing. I, again, no formal training in this, but then neither did anybody else. I didn’t know what I didn’t know about 3D or matrix transformations or high performance anything.

Right. I was, I was, uh. I’d done a ton of programming as a kid growing up and, you know, it was all in assembly. Up until that point I’ve sort of learnt C kind of as a begrudging, I guess I need to learn this thing. So that’s my, my background. So I felt I was well positioned and this guy, Nick Clark, his name was, gave me the break and said, yeah, sure.

And then he threw me in with a couple of other programmers who just finished a Sega Saturn Port of Croc. Like we’d all suffered through the same pain of porting Croc to a new platform.

Adam: That project became the game Red Dog, the Dreamcast project that gave Matt his first real shot at building a game engine. And before long he was all in. Here’s what made the Dreamcast stand out: its graphics pipeline didn’t use a traditional full-screen frame buffer. Most systems, they render each frame of a game into this offscreen frame buffer.

It’s like a big block of memory that stores the color values for every pixel. And then once that frame is complete, the buffer is swapped onto the display where you can see it.
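The traditional scheme Adam describes, drawing into an offscreen buffer and then swapping it onto the display, can be sketched as below. The class and method names are hypothetical, and the "display" is just a list of rows; this is only meant to show the front and back buffers trading places.

```python
class DoubleBufferedDisplay:
    """Toy model: draw into a hidden back buffer, then swap it onto the screen."""

    def __init__(self, width, height):
        self.front = [[0] * width for _ in range(height)]  # what the viewer sees
        self.back = [[0] * width for _ in range(height)]   # where drawing happens

    def draw_pixel(self, x, y, color):
        # All drawing goes to the buffer that is not on screen.
        self.back[y][x] = color

    def swap(self):
        # Frame complete: the finished back buffer becomes the visible one.
        self.front, self.back = self.back, self.front
```

Nothing drawn with `draw_pixel` is visible until `swap` is called, which is why a half-finished frame never appears on screen.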

When Hardware Lies To You

Matt: Which you know, is, has for every pixel. It has what color that pixel is stored in memory. And then a typical graphics accelerator draws triangles into that by being given triangles one at a time. And as it gets that triangle, it goes, okay, well here’s the extent of that triangle.

I’m just gonna fill in those pixels now. Okay, next triangle. And there may be queues and there may be, you know, other bits and pieces, but broadly that’s what’s happening. Every time a triangle is drawn, the frame buffer has been updated. And that frame buffer may be the, the shadow, the, the, the back buffer.

And then you may switch it around to sort of suddenly go, haha, here’s all the things I’ve just drawn. But that’s it. The, the PowerVR chip that lives inside the, uh, Dreamcast was different. Every time you gave it a triangle, it goes, that’s lovely, thank you for this. I will note that down. Uh, and you’re like, well, did you draw it?

It goes, Nope, but I remember to draw it later.

And then at the end, provided you didn’t run the graphics unit out of its memory for storing all these extra triangles you’re storing and the linked list and everything, and it would look through the list of triangles that happen to overlap that 16 by 16 grid, and it would only draw them into a tiny 16 by 16 buffer, which could be in like chip cache, right?

Rather than being on the main video memory of the system. It was in a, and it could be done in floating point, because why the heck not, I’m gonna store 32 bits per pixel, or 16 bits per pixel in floating point. I’m gonna render all of them. And when I finished, I’ve got the perfect 16 by 16 tile that needs to go in the top left hand corner of the frame buffer.

And now, and only now do I dither it down to the horrible 16 bit that is all we can afford to store our screen in. And so then I send it out to screen and I start working on the next one. And so it was this deferred rendering thing. It was fascinating.
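The deferred, tile-at-a-time idea can be sketched in software. This is a simplified illustration under stated assumptions, not the PowerVR hardware's actual algorithm: every function name here is hypothetical, each triangle carries a single flat depth and color, hidden surfaces are resolved by keeping the nearest triangle per pixel, and the final step is plain rounding to 16-bit RGB565 rather than the dithering Matt mentions.

```python
TILE = 16  # tile edge in pixels, matching the 16 by 16 tiles described above


def edge(ax, ay, bx, by, px, py):
    # Signed area test: which side of the edge a->b the point p lies on.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covers(verts, px, py):
    (ax, ay), (bx, by), (cx, cy) = verts
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    # Inside if the point is on the same side of all three edges.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def bin_triangles(triangles, width, height):
    """'Note it down' phase: record which tiles each triangle might touch."""
    bins = {}
    for tri in triangles:
        xs = [x for x, _ in tri["verts"]]
        ys = [y for _, y in tri["verts"]]
        tx0 = max(0, int(min(xs)) // TILE)
        tx1 = min((width - 1) // TILE, int(max(xs)) // TILE)
        ty0 = max(0, int(min(ys)) // TILE)
        ty1 = min((height - 1) // TILE, int(max(ys)) // TILE)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins


def render_tile(tx, ty, tris):
    """Resolve one tile in a small floating-point buffer."""
    tile = [[(0.0, 0.0, 0.0)] * TILE for _ in range(TILE)]
    for y in range(TILE):
        for x in range(TILE):
            px, py = tx * TILE + x + 0.5, ty * TILE + y + 0.5  # pixel center
            nearest = None
            for tri in tris:
                if covers(tri["verts"], px, py) and (
                    nearest is None or tri["depth"] < nearest["depth"]
                ):
                    nearest = tri
            if nearest is not None:
                tile[y][x] = nearest["color"]  # only the visible surface is shaded
    return tile


def to_rgb565(color):
    # Reduce to 16 bits only at the very end (rounding here, not dithering).
    r, g, b = color
    return (round(r * 31) << 11) | (round(g * 63) << 5) | round(b * 31)


def render(triangles, width, height):
    framebuffer = [[0] * width for _ in range(height)]
    for (tx, ty), tris in bin_triangles(triangles, width, height).items():
        tile = render_tile(tx, ty, tris)
        for y in range(TILE):
            for x in range(TILE):
                fx, fy = tx * TILE + x, ty * TILE + y
                if fx < width and fy < height:
                    framebuffer[fy][fx] = to_rgb565(tile[y][x])
    return framebuffer
```

The point of the structure is that the full-precision working buffer is only ever one tile in size, and the frame buffer in main memory is written once per pixel, after visibility has been decided.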

Adam: In other words, Red Dog the game looked cleaner because of the engine, but also really because of the Dreamcast PowerVR chip, which didn’t render like other consoles. It processed each 16 by 16 tile in full detail, and then it could cull out all the hidden surfaces before doing the shading.

And then the dithering was only done when they wrote those final pixels. So by designing the engine around this tile based approach, Matt unlocked sharper colors and smoother gradients, and a quality that could rival the PlayStation two or even the Xbox.

But once that part was working, weird bugs started to appear and the development workflow on the Dreamcast started to become very top of mind.

Colors In The Dark

Matt: When we develop video games for a console, you get a very special version of the console. It may look like just a tower PC that just happens to have all the components of that console inside of it in some way. It might look like a retail version of the console with a different color, uh, or, or slightly taller or things like that.

And so the Dreamcast was like a mini tower PC and it connected back to the host through SCSI. It looked like a SCSI device as far as the PC was concerned.

And it actually, there was some clever stuff going on to communicate, both for the debugging of it, but also so the PC could pretend to be a CD-ROM drive to the Dreamcast so that you could boot your game off of the CD-ROM drive.

Adam: As they got closer to shipping the game, the team started burning real discs.

These were called GD-ROMs, but they were basically CD-ROMs. And these let them see how the game would actually run on real production Dreamcast hardware.

Matt: So you put it in the machine, you burn the disc, and now you can take that disc and for the first time you can put it into a nearly retail Dreamcast. This Dreamcast has had its copy protection taken out, so it will boot anything you put into it. So you put it into the disc drive, you close the lid, you turn it on and it spins up and nothing happens.

And you’re like, mm. But, but I, the game works on the emulator. It works fine, it works perfectly well on the emulator over here where I’m emulating this, sorry, this is the CD emulator on a real Dreamcast reading this, uh, CD that’s actually your, your PC, an ISO image you’ve built on your PC.

Why does it not work on a real Dreamcast? What the heck is going on here? And so how do you debug that? You know, normally you’d bring out your debugger, you put printfs in or whatever, but it’s a retail Dreamcast or near enough a retail Dreamcast. And you need to know where it crashed. So the solution I came up with was, so there is a single hardware register inside the Dreamcast that you could just poke to at any time.

You know, in C, star volatile char star blah, or DWORD star blah, equals some number. And that controlled the TV output color when there was no picture. So there is no picture when you haven’t configured the screen. There’s no picture around the border, the top and the sides of the picture.

But if you look at old, uh, video game hardware, you know, the 640 by 480 screen was actually sort of in the middle of the CRT screen and the bit around the outside was the border, right? And that was out of spec, really. And it was meant to be black and there was color. There’s some signal processing stuff, but you could set the color to be whatever you like there.

Taking an aside to this, even the story for the longest time profiling on these machines, right? How do you profile your code? You wanna have a really low overhead. How long did this part of the code take? So we would use that register while we, the game was running, we would say, okay, I’m about to do the, uh, transformation.

Then you would set the, the border color to be red, and you would do the transformation code. Then, say, about to do the AI, you’d set it to green, you’d do that code. And then because the beam of the CRT is tracing out the whole time and that write into that register takes instantaneous effect, that little border color around the outside, you’re using it as a time measurement, right?

How many scan lines of a 60 hertz TV screen did it take before the next thing happened? And you know as well that it’s very visually representative because as you get closer to the bottom of the screen, you’re getting closer to dropping a frame and being slower than a frame rate, a frame refresh, right?

So it’s a very visceral, visible thing. And so you would talk, any, any developer of like my vintage who worked in the game industry would refer to things in scan lines. That was your unit of time currency was, oh, I managed to shave a scan line off of that routine or whatever. You know, that was what you would do.
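The border-color trick can be modeled as a small simulation. This is a sketch under stated assumptions: there is no real register here, the function names are hypothetical, each phase's cost is given in microseconds rather than measured, and the figure of 525 scan lines per 60 Hz refresh is an assumed round number for illustration. The idea it shows is that the position of the beam at the moment of each register write is the timestamp.

```python
LINES_PER_REFRESH = 525        # assumption: total scan lines in one refresh
REFRESHES_PER_SECOND = 60


def beam_line(elapsed_us):
    """Which scan line the beam has reached after elapsed_us microseconds."""
    return elapsed_us * LINES_PER_REFRESH * REFRESHES_PER_SECOND // 1_000_000


def profile_frame(phases):
    """phases: list of (border_color, cost_in_microseconds), in execution order.

    Returns (scan lines spent in each color band, whether the frame overran).
    """
    writes = []          # (color, scan line at which the border register was written)
    elapsed_us = 0
    for color, cost_us in phases:
        writes.append((color, beam_line(elapsed_us)))  # "poke" the border color
        elapsed_us += cost_us                          # then run that phase's code
    end_line = beam_line(elapsed_us)

    bands = {}
    for i, (color, start) in enumerate(writes):
        stop = writes[i + 1][1] if i + 1 < len(writes) else end_line
        bands[color] = bands.get(color, 0) + (stop - start)
    return bands, end_line > LINES_PER_REFRESH
```

With a 4,000 microsecond transform phase marked red and a 2,000 microsecond AI phase marked green, the red band is 126 lines tall and the green band 63, and the second return value says whether the work ran past the bottom of the screen and dropped a frame.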

Your Unit Of Time Is Scanlines

Matt: And it would be one, I don’t know, 500th of a 60th of a second type of thing. It’s awesome ’cause it’s, it’s so visual, right? Like it really gives you an, and it’s super low overhead. ’Cause you’re just changing, it’s writing a single value somewhere. It’s brilliant. Anyway, so we knew that this existed and we’d used it for this purpose, and it was very valuable for, for that. But that’s what we ended up using to debug this issue. So I peppered the code, the initialization code, with different colors, and then we burnt one of these discs, and now this is at one-x speed and they’re like 850 meg discs because we packed it full to the rafters.

So you’re sat on tenterhooks waiting for the stupid thing to, to, to burn. And the technology of the time was incredibly sensitive to movement. So if you banged into it, it would go oh fault. The, the lens would lose track of where it was writing. And our, admittedly by this time we were in our nice headquarters in North London, there was like an area that we kind of marked out like, do not walk around here because the suspended floor is bouncy enough that it will knock the machine around and we’ll maybe lose a, uh, a, a burn.

So, so this was sort of behind my desk and I was like, frantically changing the code, building a new image, burning the disc, putting it in the machine. It’s yellow. Okay, now I kind of had a chart. What does yellow mean? Yellow means it got to this line of code, but clearly didn’t get to the thing that changes it to purple.

That’s afterwards. Okay. So now we put, we do a new build with the colors reset and wherever that yellow was, we start with red and we move through the spectrum and you kind of, I was gonna say binary search, but you know, however many colors were distinguishable search. Through, and eventually we found it and it was the absolute classic of C programs of the era, although there was a little bit of C++ in there as well.
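The "however many colors were distinguishable" search can be sketched as a simulation. Everything here is hypothetical: the six-color palette, the function names, and the model of a boot as a list of steps where a step returning `False` stands for the hang. Each pass through the loop corresponds to one disc burn.

```python
COLORS = ["red", "orange", "yellow", "green", "blue", "purple"]  # assumed palette


def boot(steps, markers):
    """Run steps in order; report the border color showing when one hangs.

    markers maps a step index to the color set just before that step runs.
    """
    showing = "black"
    for i, step in enumerate(steps):
        if i in markers:
            showing = markers[i]
        if not step():          # False stands for "the machine hung here"
            return showing
    return "done"


def find_hang(steps):
    """Narrow down the hanging step, one burn (boot) per iteration."""
    lo, hi = 0, len(steps)      # the hang is somewhere in steps[lo:hi]
    burns = 0
    while hi - lo > 1:
        n = min(len(COLORS), hi - lo)
        # Spread n color changes evenly across the suspect range.
        starts = [lo + (hi - lo) * k // n for k in range(n)]
        color = boot(steps, dict(zip(starts, COLORS)))
        burns += 1
        if color == "done":
            raise ValueError("no hang found in the suspect range")
        k = COLORS.index(color)
        # The hang is after this color was set but before the next one.
        hi = starts[k + 1] if k + 1 < n else hi
        lo = starts[k]
    return lo, burns
```

With six colors the suspect range shrinks by roughly a factor of six per burn, so a hang in step 37 of 100 is pinned down in three burns rather than the seven a two-way split would need.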

But from an empty, from a cold boot, like literally turning the machine off, putting the, the disc in and turning it on. Memory is just random numbers, right? Um, you hit this issue and I, I don’t even remember exactly which line it was. I’ve even, I have got the source code, it, we have open sourced the, the source code to this and I can’t find where it was, uh, for the life of me.

But it was a relatively like single line change of like, oh yeah, equals zero on the end of a line or something like that. And then that fixed everything and you’re like, crikey, you know?

Adam: Do you, do you remember what color it was?

Matt: I don’t remember what color it was. No, no, unfortunately not. No, I, I don’t, I don’t remember which colors we used or how many I could actually distinguish, you know, uh, unambiguously.

The Inside-Out Crocodile

Matt: So, and the irony was, um, that a very similar bug had happened on the Croc Saturn version that I alluded to before where a hardware register, so this wasn’t uninitialized memory in the CPU’s like domain. This was one of the many registers that was inside the GPU. The, the graphics unit was not being set and exactly the same thing.

If you booted it through the normal screen, that hardware register had been set up. But if you just booted from cold, it wasn’t set correctly. And the, the worry about this or the, the, the, the, the worst part of this is this got to the retail, right? They didn’t notice this until they’d already shipped them to stores.

And then it was only on cold boots with the ones that had come from the factory that the issue appeared. And the thing is almost worse, again: it’s not that it didn’t work, it’s the fact that actually some of the graphics were inside out. Which if you have a cute crocodile with a, a tongue and a tooth, you know, inside his head the wrong way round, that’s kind of disturbing. A whole generation of children were, were, were terrified of this sort of garish looking, uh, inside out crocodile head, which, you know, whoops.

They, they put it, they actually, for the first set of things, they put it, printed a little slip in saying, you know, Hey, to play Croc, turn it on with the lid open first so that it goes into the, Hey, insert a disc thing, which is enough to initialize the registers. Then put the disc in, close the lid.

So the same thing twice.

Weekend Miracles And Suspended Floors

Adam: Back then making a console game felt like chasing a moving target. That moving target being PCs, because PC hardware was evolving at warp speed. There was new 3D cards, there was new drivers, there was new buzzwords every few months, Quake II, and the Unreal Engine dropped.

And all of a sudden, overnight, everything else looked kind of dated and consoles couldn’t upgrade. You got one shot with the hardware you shipped on and that gap was starting to show. Especially for the Red Dog team.

Matt: So they said, look, I don’t think this game’s up to scratch. This is Sega who are publishing it for us. We need, we need dynamic lighting. And this was on like a Thursday, or, you know, I think we’re gonna have to reconsider whether this is okay to go forward or not.

And I didn’t get any sleep that weekend, but by Monday morning we had a full lighting system.

It wasn’t too bad to actually put in, I think we were kind of relenting on it ‘cause we’re like, well, we don’t have the, the time budget to spend doing extra lighting, so we wanna put more explosions and things in rather than light the, the scene all the time.

But, that was a fun Yeah. Time of being under the gun. But yeah, it, it came out, it was, it was reasonably well received. And you know, it’s, there’s something nice about going into a store and seeing something that you made, you know, obviously I’d seen Croc Croc had done really, really well.

But that wasn’t my game. It was a game I worked on. Whereas Red Dog, I was there from the beginning. You know, I felt like I was part of the team that came up with the ideas. We argued about what the arc should look like. I designed the engine, I designed all of the, uh, the tooling for it. I wrote the build system.

You know, there was a load of stuff around it that felt like, oh yeah, this is, this is mine. And it worked out. So very fortunate. And again, that was just because one guy happened to mention it loud enough and I, and I was, I put my hand up and said, I can do that. And he believed me, so I’m endlessly thankful to, to him.

Adam: Their next project after Red Dog was something fun.

Matt: We prototyped a sort of, um, real-time strategy-ish, team-based combat game using the Red Dog engine on the Dreamcast, which was wonderful ‘cause we could throw away everything and use all the stuff that we’d learned. And I was able to get something running at 60 frames a second at some beautiful thing, rather than the 30 frames that Red Dog was.

So, it was really sweet and it was lovely and whatever, but you know, the Dreamcast was probably dead on arrival when it, when it even, uh, shipped. You know, it was, it, it was not well received commercially overall. So unfortunately it was never to be on the Dreamcast. So anyway, we showed the, the company, and I’m forgetting who it was now, um, and they were like, yeah, that looks good, but we’ve got this SWAT license.

When The Console Dies

Matt: Uh, so we were lucky. We got one of the SWATs, you know, Special Weapons and Tactics, US police.

Can we kind of like put them together? And we, we’ve never done a SWAT game on a console before. So we pitched some ideas, we came and that became SWAT: Global Strike Team. And it was originally an Xbox exclusive.

Adam: Instead of reusing the Red Dog code base, the team built a new Xbox engine with one big focus.

Matt: Uh, so, we want really beautiful, sharp shadows and we want lights that are very dynamic so you can shoot out any light. ‘cause this is a, it’s a game, you know, like you want to be going stealthy, so maybe you wanna take out all the lights and then you go in with your night vision goggles and stuff.

So ideas we had for the engine would actually become gameplay elements later on. And so we had this really cool and convoluted system for doing our lighting, which I still think is fab. I dunno that anything else has ever done it, but it was definitely not the way the rest of the industry went. And so what we did is, for every light in the scene, we effectively shot out rays and worked out what it could see, right?

What would be the geometry that it could see. And then we stored a separate mesh of the weird spidery mass of where the light would land on all of the surfaces. So instead of storing it as a texture, we stored it as actual geometry. So you would, uh, have this, uh, this area of lighting. Now that was hugely computationally expensive at the time.

You’re taking scene geometry that’s got thousands and thousands of polygons. You’re taking every light and you’re saying, what, what can, what geometry can this light reach and then cut out the shadows effectively where the light can’t be reached? Store that as well. But it meant that we could store each light individually.

And so we had this, it was, it was great.

Adam: Turning lighting into geometry is challenging, but it could be done with an understanding of the Xbox hardware. It totally paid off. They ended up getting crisp shadows, shootable lightbulbs, and scenes that reacted to the players in real time. The line between engine design and game design was blurring together.

Matt: Anyway, this was fab. We were having a whale of a time. We learned so much about how big, big, big C++ projects end up looking because we were writing one, but then a spanner came or a wrench, uh, depending on your side of the Atlantic in, in the works and said, well, this Xbox exclusive title, the Xbox, eh, not doing so well.

It’s looking like a Dreamcast all over again. We should probably also do this for PlayStation 2. And that was when the sort of the bottom fell out of our world, because we’re like, but all of these tricks that we’re doing aren’t really PlayStation 2.

Lying To The Hardware

Adam: The PlayStation 2 was powerful, but it was super low level. Developers had to juggle its DMA engines and vector units manually, with little support for the kind of per-pixel effects that the Xbox could do. The Xbox GPU could handle all the fancy lighting out of the box because it had shaders. The PS2 could push pixels fast, but it lacked the shaders.

Matt: But we were able to get the same lighting system in, which is like a really difficult thing. We were able to come up with a way of remapping the 3D texture into 2D dynamically, uh, so that we could get the same kind of light fall-off. And then the blending, oh, so the blending was a real trick. Uh, the, we essentially lie to the hardware and say, Hey, you see this screen over here?

You think it’s the screen, right? We’ve just drawn all these lights to it, but actually it’s not a 32 bit per pixel screen like you thought it was. We’re just gonna tell you that it’s an eight bit per pixel, um, alpha image. Right? So you’re basically telling it, it’s the wrong format now because of some really strange hardware characteristics.

And for convenience of being able to, when it’s in 32 bit mode, write out red, green and blue to separate RAM chips in the system so that that blending was fast, so that it could effectively be reading and writing, uh, red and green and blue in. They’re on literal, physical, different chips. When you changed it from one mode to the other, you suddenly became exposed to that.

So if you could somehow get a microscope and peer into the memory of that 32 bit color image, you would see that what you would see is blocks of all the red pixels grouped together, then all of the blue pixels and all of the green pixels, but in weird banding and alternating again, because this was really a hardware optimization that was meant to be hidden from you.

But you’d lied to it and said it’s an eight bit. But what that meant was you could pluck out the red pixels by drawing thousands and thousands and thousands of tiny triangles that drew out the regions where you knew the red pixels were gonna be. And that gave you an alpha image of the red color, and then you could just draw using that as the source texture.

So this is where, this is where I’m reading from. You draw a big red triangle over the main screen with this texture on it, and that effectively gives you that, that multiply. But just for the red component. Then you do it again for the green, and then you do it again for the blue. And each of these is like thousands. But again, luckily there were these funny little DSPs. I mean, you could write a program that generates these millions of stupid little pointless triangles just to draw. Again, without a photograph or an image of me pointing at it, it’s difficult to convey. So I hope that your listeners have got a decent visual understanding of what I mean.

Adam: Basically they’re hacking the frame buffer so that they could pull out this red and green and blue as separate layers and rebuild the lighting step by step.

It is a classic Matt move. Ignore the rules. See how things actually work under the covers, and talk straight to the hardware.
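
Note: The per-channel multiply can be sketched in plain C. This is a deliberately simplified illustration. It assumes a linear buffer with bytes in R, G, B, A order, whereas the PlayStation 2 layout Matt describes was banded across memory and the work was done by drawing triangles, not by a CPU loop. The function names are invented for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of each channel inside one 4-byte pixel (assumed R,G,B,A order). */
enum { CH_RED = 0, CH_GREEN = 1, CH_BLUE = 2, CH_ALPHA = 3 };

/* View a 32-bit-per-pixel image as raw bytes and copy one channel
 * out into its own 8-bit-per-pixel plane. */
void extract_channel(const uint8_t *rgba, size_t pixel_count,
                     int channel, uint8_t *plane)
{
    assert(channel >= CH_RED && channel <= CH_ALPHA);
    for (size_t i = 0; i < pixel_count; i++) {
        plane[i] = rgba[i * 4 + (size_t)channel];
    }
}

/* Multiply one channel of the target image by an 8-bit plane,
 * treating 255 as 1.0 (rounded to nearest). Other channels are untouched. */
void modulate_channel(uint8_t *rgba, size_t pixel_count,
                      int channel, const uint8_t *plane)
{
    assert(channel >= CH_RED && channel <= CH_ALPHA);
    for (size_t i = 0; i < pixel_count; i++) {
        unsigned product = (unsigned)rgba[i * 4 + (size_t)channel] * plane[i];
        rgba[i * 4 + (size_t)channel] = (uint8_t)((product + 127u) / 255u);
    }
}
```

Running `extract_channel` and `modulate_channel` once each for red, green, and blue mirrors the three passes in the story: the light buffer supplies the plane, and the scene is darkened one color component at a time.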

Matt: But like in principle, it was a massive trick and it was one of those like, oh, that’s clever.

And it wasn’t my trick, it wasn’t Nick’s trick. We found it. It was one of the things that actually, I think it was Mike Abrash, uh, who was working, I think for Sony at the time.

Adam: And that’s why Matt is proud of this. There was a business constraint and it forced them to do something uncomfortable.

The Dreamcast prototype became this SWAT project on the Xbox, where they built this new engine, but then they had to stretch it and find ways to do it on the PlayStation 2 through a deep, hardware-specific hack. And the payoff wasn’t just getting these ports working, although that was a big thing.

It was also this lesson about how knowing about the next layer below you gives you options when the rules change.

The Network Card That Knew Too Much

Adam: After games, Matt carried the same instinct for pulling back the curtain into a new world, high-speed finance. The problems looked different, but the pattern was the same. When something felt off, he couldn’t stop until he understood what was happening.

Why was this abstraction breaking down? In game engines, that might mean poking hardware registers and debugging with colors, but in trading systems, it would be more like chasing down timing bugs, asking what the machine was really doing underneath all of the abstractions. Here’s a real example: a weird issue started cropping up at market open. An expensive, high-performance network card kept dropping packets. It wasn’t the most urgent problem, but something about it had that same feel as debugging the Dreamcast.

Something about it implied that the operating system abstraction wasn’t quite working the way it said it should, and so it grabbed Matt’s attention and he started digging in. The first thing he noticed was that this network card needed a huge chunk of memory ready to go.

Matt: And it had a flag that let us, uh, pre-fault in all of the memory that it was gonna use so that it was never, um, gonna be demand paged. So normally when you allocate a big slab of memory, uh, in, uh, in Linux, uh, Linux goes yes, and it hands you back an area of memory that has no memory in it. And then as you start reading and writing to it, it actually starts finding the memory and giving it to you.

So that means that when you allocate, you know, 20 gig, it’ll go, yep. And it’s like, but I haven’t got 20 gigs. It’s like, yeah, hopefully you won’t look at all 20 of those gigabytes, because if you do, we’re in trouble. But if you don’t, no harm, no foul. Right. You know? Uh, so, but that’s really important because that process of faulting in those pages takes time.

And it’s also magic, invisible time. You know, as far as you are concerned, you just read some bytes from the network, but actually, or you wrote some bytes in this instance, but actually in this instance that caused the operating system to go, whoops, I lied to you. There’s no memory there.

I’m gonna come in right now. You know, nobody expects the operating system. It comes in like the Spanish Inquisition and starts like taking up loads of your time. And you’re like, I’m in the middle of doing something really time critical and now you are allocating memory for me. This is bonkers.
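
Note: The lazy allocation Matt describes can be observed from user space. The sketch below is a minimal illustration, assuming Linux with glibc. The helper names `reserve`, `resident_pages`, and `touch_every_page` are made up for this note and are not from the vendor's code. It reserves anonymous memory with `mmap`, asks the kernel through `mincore` how many pages are actually resident, then writes one byte per page to fault them in.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and mincore in glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve len bytes of anonymous memory. On Linux this normally hands back
 * address space only; physical pages arrive later, on first touch. */
void *reserve(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count how many pages in [addr, addr + len) the kernel reports as resident.
 * addr must be page aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    long count = -1;
    if (vec != NULL && mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++) {
            count += vec[i] & 1; /* low bit set means the page is in RAM */
        }
    }
    free(vec);
    return count;
}

/* Write one byte to each page so the kernel has to back all of them now,
 * not later in the middle of time-critical work. */
void touch_every_page(void *addr, size_t len)
{
    assert(addr != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = addr;
    for (size_t off = 0; off < len; off += page) {
        p[off] = 0;
    }
}
```

Calling `resident_pages` right after `reserve` typically reports zero pages on Linux, and after `touch_every_page` it should report all of them. Each of those first writes is a page fault, which is the invisible time in the story.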

Adam: The vendor saw this problem coming. That’s why they had added this flag. Their idea was to avoid page faults by reading all the memory upfront. They were basically poking through this virtual memory abstraction to force the system to preallocate everything. But even with this workaround, things were still breaking, and the way it was breaking is what caught Matt’s attention because it just didn’t make sense.

Matt: Why the hell, you know, this network thread is the one that’s actually having the problem and it should be doing nothing other than looking at these bits of memory. Very long story. But we discovered a tool called SystemTap, which allows us to like really hook in the lowest level of the operating system and go, why are you doing this?

What’s going on? And we discovered that the, the, the network process when it was under heavy load was trying to take out a lock and we’re like, what are you doing, network code? You’re literally like zero-copy network code with zero lock, you know, lock-free code and everything. Where the hell is this lock coming from?

Adam: Skipping over some steps, here’s what Matt found: the flag for preallocating memory tried to trigger page faults by reading a byte from each block in that 20 gig chunk.

Matt: Okay? Seems reasonable. That definitely causes it to become faulted in, except that it had been written in C and when you read from memory, there’s no side effect for reading from memory. And so a compiler, an optimizing compiler will go, well, you just read that and did nothing with it. We optimized it away, but it worked for years because it had, you know, compilers had been crap.

Luckily, there was this website that I could use where I could submit a bug report and the patch to say like, think you’ll find that you are, you are not actually faulting in the memory after all, and there’s a much better way to do it anyway. But, you know, but we dug and we dug and we dug and we found that issue.
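
Note: The bug Matt describes has roughly the following shape. This is a sketch, not the vendor's actual code, and the function names are hypothetical. The first loop reads a byte per page and discards it, so an optimizing compiler is allowed to delete the whole loop. The second reads through a `volatile` pointer, which the compiler must actually perform.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy shape: the value read is never used and a plain read has no side
 * effect, so at higher optimization levels this loop may vanish entirely. */
void prefault_naive(const char *buf, size_t len, size_t page)
{
    assert(page > 0);
    for (size_t i = 0; i < len; i += page) {
        char c = buf[i];
        (void)c;
    }
}

/* One possible fix: a volatile access counts as observable behavior,
 * so every read below has to be emitted. Returns the pages touched. */
size_t prefault_volatile(const char *buf, size_t len, size_t page)
{
    assert(page > 0);
    const volatile char *p = buf;
    size_t touched = 0;
    for (size_t i = 0; i < len; i += page) {
        (void)p[i];
        touched++;
    }
    return touched;
}
```

On Linux, a read fault on untouched anonymous memory can be satisfied with a shared zero page, so writing to each page, or asking the kernel to populate the mapping up front with the `MAP_POPULATE` flag to `mmap`, are other options. The episode does not say which fix the patch used.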

And I learnt what SystemTap was, and that has been a, you know, regular part of my armory for this kind of like tool going forward. And we solved a problem, which was not just in our code, it was in, I mean, we solved our actual business goal, which was like, don’t drop packets during the open. And we made the open source software better.

So, you know, everyone was a winner in this and it took a long time to get here. And, you know, there’s that, there’s that famous XKCD, well there are many famous XKCDs, right? But you know, there’s an XKCD that talks about automating your tools and you know, like if it takes you one minute and it takes you two minutes to build a tool, it’s not worth it.

Those kinds of things, right? That makes you, and I fundamentally disagree unfortunately, with that particular thing because it does not take into account the lessons you learn along the way. And so while you could sort of say to me, you know, you know, if this, this, this issue wasn’t quite as, you know, business limiting as it was, it was actually causing us trouble, but it was just one of those niggling problems.

Like, why is that? And I’d left there. Okay. You could have said like, it’s equivocal whether it was actually better or not. Sure. But I learnt SystemTap and I learned how kernel bypass worked and I learned how the, the operating system does this and I learned a compiler thing. You know, all of these things are more important than actually solving the problem.

And so I think the journey is worth it. Even if the, the thing you get to is not, you don’t even, maybe you don’t even get that, right. It’s just you’re learning all the way.

Godbolt’s Rule

Matt: Maybe you don’t know how the operating system itself exactly works, but know that page tables exist because that may be the reason why your thing doesn’t work.

You don’t, again, it’s gonna be like somewhere in your head, um, that there’s a place, there’s a starting point for me to look at, to know that these things exist. Not being familiar with it is fine, but just knowing that it’s there. And similarly, like if you’re up in the web world, like what layers are between the webpage that I’m writing in this format and the actual HTML that goes to a browser, how does a browser actually understand the HTML?

Right? There’s a lot of subtleties in there. Why does the browser make these stupid decisions sometimes? Well, if you thought about how a browser has to work, well maybe you should think about that. Or, you know, why doesn’t my thing work this way? You know, like when I had a friend who’d always say like, oh, this can’t be that hard.

How hard can this be? And it would be like package management or build system. And then I’d say, go give it a go. And he comes back and says, it was really, really hard. I’m glad we used something else. I’m like, yes. You know? But those kinds of experiences I think are part of what makes you understand like where the boundaries are.

And all you need to do is be able to look, see beyond that boundary, and then be aware of the boundary beyond that.

So like, we all work on a convenient level of abstraction, right? The, the, like the, the floor will not fall out underneath me.

I don’t have to understand exactly why, but I, I take it as read that I’m not gonna fall down, uh, anytime soon. And when you’re working in software, it’s so convenient to be able to say like, I’m just gonna call this function and it does the thing that it says it’s gonna do and nothing more or nothing less.

And I don’t have to look at it anymore. I don’t have to understand any more than like string copy or printf. How does printf work? I’ve never stopped to think, but it doesn’t matter because it works, right? So you live at that abstraction and it gives you so much power because it frees up brain space to do loads of other things that are important.

But what I would like to, this is my thesis now, is that like, while it’s, you should have a layer of abstraction you are, you are familiar with and you’re comfortable with, you should also have a decent understanding of the layer beneath. If you’re a C programmer, you should probably understand how the C runtime works to some level.

And you should understand how the operating system interaction works. You don’t have to know exactly, but like have some knowledge because one day printf will not work. And then you’ll be like, what? And if it’s a complete mystery to you, you won’t know how to take the first level off and look at it.
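
Note: As one illustration of looking a layer below printf: on a POSIX system, formatted output eventually reaches the kernel through the `write` system call. The sketch below does by hand, roughly, what a stdio implementation does internally, minus the buffering: format into a buffer, then loop on `write` until every byte is accepted. `write_formatted` is a hypothetical helper for this note, not a library function.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Format "value: N\n" and push it to fd with the write system call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_formatted(int fd, int value)
{
    char buf[64];
    int len = snprintf(buf, sizeof buf, "value: %d\n", value);
    assert(len > 0 && (size_t)len < sizeof buf);

    ssize_t done = 0;
    while (done < len) {
        /* write may accept fewer bytes than asked, so keep going. */
        ssize_t n = write(fd, buf + done, (size_t)(len - done));
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry */
            }
            return -1;
        }
        done += n;
    }
    return done;
}
```

Calling `write_formatted(STDOUT_FILENO, 42)` prints the line to standard output. The short-write loop and the `EINTR` retry are the kind of detail the printf abstraction normally hides, and the kind of place to start looking on the day printf does not work.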

So I think you should know one level well, have a working knowledge of the level beneath you. And then fundamentally, this is like the, probably the, the, the strongest thing is be aware of the shape of the layer beneath that.

Write This Down

Adam: That was the show. That story is a good reminder that real progress happens when you stay curious, ‘cause if you’re curious, you can dig into tough problems and you won’t be afraid to relearn when you get things wrong.

Because if you really understand the layer you work in, and if you understand what’s below, you can turn these limits into skills. Let’s call that Matt Godbolt’s rule. You should know your layer well, but you should also know one layer below it a little bit, and you definitely need to know the shape of the layer that’s beneath that.

So yeah, write that down. That’s Godbolt’s rule, and probably also something to be said for paying attention to what’s exciting to you, right? For Matt, it’s this digging into the layers below. For you, maybe it’s something different.

Thank you to everybody who’s been sending me emails lately. I do very much appreciate that.

Emails, LinkedIn messages, Twitter messages, or Bluesky. I appreciate it all.

Sometimes I’m super motivated, and sometimes I’m not motivated at all, and somebody telling me that they appreciate what I do definitely raises the bar. So seriously, thank you, and thank you so much to my supporters. You can go to corecursive.com/supporters if you wanna join them. And thank you to Matt.

And until next time, thank you so much for listening.
