Intro
Adam: Welcome to CoRecursive. I’m Adam Gordon Bell and this is episode 100. 100 episodes, knowing how long it takes me to make each one, getting to 100 is almost unbelievable. And today, something special. I’m going to share with you a story. It’s a fictional story about debugging code, and I don’t want to spoil it, but when I first read it, it blew my mind.
And I knew I wanted to somehow include it on the podcast, but I was never quite sure how. Well, today I figured it out. That’s what we’re going to do. It’s a story from Lawrence Kesteloot, a story about a team of software developers who are forced to challenge their understanding of technology and themselves.
After the story, frequent guests Don and Krystal are back with some hundredth episode reflections, but first, join Patrick, Dave, and me as we try to debug some code in this fantastic short story.
The Metal Network Switch
Narrator: After a minute, Patrick returned with an old metal case network switch. The room fell silent as he plugged it in. Our project, and so much else, was hanging in the balance. I stopped breathing as Patrick struggled to get the plug lined up with the port.
I stared at the front panel lights and felt Dave doing the same.
My eyes watered. Patrick plugged it in. The front lights immediately lit up and flashed actively.
That was not good. Heat rushed into my hands and face and Dave, about to say something, suddenly lurched for the nearest garbage pail and vomited.
Day One
Narrator: Three weeks earlier, everything was different. We were past the honeymoon phase of our project, that’s when you move from exciting ideas to the hard realities of making them work.
The simple elements from our design phase were proving to be unexpectedly complex. And not because they were interesting or challenging, but just due to minor unforeseen issues. Patrick and Dave had done the design, of course, and they humored me as I made various suggestions, which in retrospect must have been obvious to them.
It had been fascinating to hear them debate design issues. So many of their arguments were based on intuition rather than hard logic. I learned more each week than in my entire last semester of school. Before I started here and things got really chaotic, I was mainly comfortable with Python and JavaScript.
I knew them inside and out from computer science classes, but in this new job, I was thrown into the world of C and low-level programming. And honestly, it was a bit intimidating at first, but as I kept at it, things started to click and I started to understand the debates. I found myself siding more with Dave. Dave had a great laugh. He was a bit on the heavy side, but he dressed very well. I liked the insights he had about management and the process of software development. And he liked to teach me things.
Patrick and Dave had split up the work they had designed, and they had given me tests to write. It had been a quiet day when I heard Dave whisper.
Dave: What the hell?
Narrator: I sensed an opportunity to learn. So I walked over to his desk. He didn’t notice. He was quickly switching back to his editor, adding a line of debug output, compiling, running, shaking his head, and then switching back again.
Adam: What’s the matter?
Narrator: He waited until the program finished running to answer.
Dave: I can’t figure out this bug. I can’t figure out where this number is coming from here.
Narrator: He pointed to a line in his debug output. Dave and Patrick liked to debug with print lines rather than stepping through in a debugger as I liked to do.
Adam: What happens when you step through in a debugger?
Narrator: This was a tease. He knew a lot more than I did, but I was always poking him about his old-fashioned workflows. Dave hesitated. Then, to my surprise, he opened up the debugger.
I showed him how to set breakpoints, and together, we dove into the tangle of his code. For the next few hours, we traced this mysterious value back through his code base.
Dave: Patrick, can you help us here?
Narrator: This surprised me more than the debugger. Dave rarely asked Patrick for help. Patrick came over and Dave explained the situation.
Dave: The program is crashing. And that’s because of this bad dereference here. But the value is correct here.
Narrator: Dave pointed out some code while Patrick scanned the screen. Then a thought occurred to me. Dave was still explaining and I didn’t want to interrupt, but my heart was jumping as I waited for Dave to take a breath.
Adam: I think it’s a compiler bug.
Patrick: Blaming the compiler is a last resort. Same with the standard library. Chances are vastly greater that your brand new code has a bug, rather than code being used by thousands of people.
Narrator: I nodded wisely, but I felt my cheeks blush. Patrick had a way of making me feel inexperienced, constantly, though I doubt it was intentional. Once when I proudly solved a minor bug, his response was a simple nod, and honestly that felt good. His direct style often left me feeling eager to prove myself to him. Patrick was cracking his knuckles. He asked questions about threads, and volatile, and aliasing, questions that I didn’t entirely understand, but I wished I had thought of them. Dave considered each and said that none of Patrick’s concerns applied.
Patrick pursed his lips to the right, the way he always did when he wasn’t quite sure what he was gonna say.
Patrick: Hmm, I don’t know. Weird.
Narrator: And then Patrick returned to his desk. I was relieved that Patrick hadn’t crushed us with an obvious solution, and I guessed that Dave felt the same as he sat back up from his slouch and started typing at his machine.
Dave: Is there any way to show assembly right here where things go pear-shaped?
Narrator: I showed him how to do this and wondered if he was giving my theory a chance. He typed the commands and the lines of C were split apart by dozens of lines of assembly.
Dave: That’s not right. We got the command wrong.
Adam: I’m sure that’s right, I know the debugger well.
Dave: No, it’s not. That’s far too much assembly for this line of code. And some of these aren’t even legal assembly instructions.
Patrick: Not possible.
Narrator: Patrick was coming back over from his desk, and Dave was blinking slowly at his screen.
Dave: Well, I mean that I’ve never seen these. They’re not what a compiler outputs. This isn’t the right place.
Narrator: Dave was flustered, and I felt even more inexperienced than before we started.
Dave: I’m, I’m burned out. We’ll look at this again tomorrow.
Narrator: He left, and I sat down at the computer. The only thing I had contributed all day was my knowledge of the debugger, and Dave even thought I got that wrong. I looked at the assembly on the screen. Some of the instructions were obvious, and some were cryptic. I googled around for an assembly reference and started tracing the instructions, keeping notes about register content in my notebook as I went.
Dave was right. These instructions made no sense. Not only did they perform too much work for the corresponding C code, but they didn’t even make sense internally, to themselves.
But I convinced myself that this was at least the right section, because a few of the instructions matched the surrounding lines of C code. I focused on one instruction in particular.
It was subtracting a register from itself. That’s not necessarily odd. It might be an efficient way to set a register to 0. But then this register was used in other instructions. The compiler must have known it had been 0. It should optimize it out. I hesitated and then I called Patrick over. I explained what I had found.
Patrick: That’s not a normal subtraction instruction.
Narrator: He was right. I looked it up more carefully and found that it was a variation that also subtracted out the carry bit. This was a way to get the carry bit into a register. I worked my way backwards to see where the carry was set. The code got increasingly convoluted, and I repeatedly made incorrect assumptions that threw me off course. I turned around to ask Patrick a question and found an empty desk. Somehow I’d lost track of time, and it was nearly midnight. I set the office alarm, and rode my bike home.
Day Two
Narrator: I got in late the next morning. I hadn’t slept well, and it had taken me hours to fall asleep. Each time I closed my eyes, I saw assembly instructions in large, bright letters. At work I walked straight to Dave’s desk to discuss the previous night’s investigations, but he wasn’t interested.
Dave: You were right. It was a compiler bug. I tweaked the C code and I’m no longer triggering it. That weird assembly is gone.
Narrator: I was in the awkward position of having to contradict his compliment.
Adam: I don’t know if it’s a bug. The code I saw wouldn’t have been generated by a bug.
Dave: It was definitely a compiler bug.
Narrator: I had worked with Dave enough to know that the more confidently he spoke, the less secure he was. He was tired of being held up by this problem, and so he just wanted to drop it.
But later, Dave walked up to my desk with a grin and a cup of his favorite artisanal coffee. I thought all coffees tasted fine, but he was super picky. So I always pretended to be picky as well, so he would invite me along to his artisanal coffee place. I actually felt a bit stung that he had gone without me.
Dave: You remember that compiler bug from Tuesday?
Adam: Yeah.
Dave: Happened again this morning. Same file, same problem. That weird assembly is back. The strange thing is, I didn’t change the
Adam: Maybe you changed the header file.
Dave: No. Let me check git.
Narrator: He came back two minutes later. No relevant files had changed. Patrick walked over.
Patrick: This is no good. Here it’s a crash, but it could be something more subtle. In four months, our system will be so complex that a subtle problem will take two weeks to track down. Can you look at the compiler’s release notes and see if we can upgrade?
Narrator: I looked briefly and found what I expected. We were on the latest and most stable version. I went further and searched for some permutation of weird assembly or compiler bug and found nothing in the release notes.
I compiled Dave’s code on my computer, and then disassembled it. The strange code was there, at the same place it had been on Tuesday. I recognized the subtract with carry instructions, and a few others from my previous investigation. I also looked through the rest of the code. It was striking how the unusual instructions didn’t appear anywhere else.
I had an idea. I thought maybe I could figure this out by looking at the source of the compiler. I downloaded it and started poking through it. It was a tangled mess of compiler passes and plug-in frameworks and embedded languages. I had never been so overwhelmed. I went straight to the files that described the translation from the abstract syntax tree to machine assembly.
I did a grep for that subtract with carry. And it wasn’t there. I looked for a few of the other odd instructions. Most of them weren’t there either.
Deli Sandwiches
Narrator: Pretty soon it was lunchtime. The small office we used had a patio, and on Fridays we DoorDashed some deli sandwiches. I preferred the nearby Subway, but Dave hated the bread. While eating, I mentioned what I had found.
Patrick: That’s super weird. Which other ones weren’t in the translation file?
Adam: I don’t remember. A few more that involved the carry bit, some vector instructions.
Patrick: But that compiler doesn’t do vectorization.
Either the instruction was generated by mistake or you’re disassembling non-code.
Dave: Oh, it’s definitely code. This is what’s causing our bug. It’s being executed.
Adam: And it’s not generated by mistake. It’s a math instruction, but I don’t think it’s being used for math.
Patrick: Neat.
Narrator: We had finally hooked Patrick. After lunch, he and I sat down at my desk and walked through the parts of the code I knew best. I showed him how, although the instructions were obscure and strangely used, they actually made sense all together.
There was a deliberate flow of data.
Patrick: Look for a backward jump instruction.
Dave: Why?
Patrick: The target of such a jump may be the top of a loop. It’s a good place to start the analysis.
Adam: You’re actually going to figure out what this code is doing?
Patrick: Damn right I am.
Narrator: It took the rest of the afternoon to pick through the convoluted jump targets and instructions. That snippet, it turns out, was finding the sign of an integer. That’s it. Anyone else would have done a simple comparison, but the four instructions used were a mess that either set the carry bit as a side effect or used it in an unorthodox way.
Patrick: You know, this isn’t even the interesting part. I want to know how this code gets in here. You said these instructions weren’t even in the translation file?
Adam: That’s right.
Patrick: They must be elsewhere. Let’s grep the source for both the symbolic version and the opcode.
Narrator: I did a recursive grep and it came up empty. I didn’t know what to try next.
Patrick: Try the binary.
Dave: What binary?
Patrick: The binary of the compiler.
Narrator: I would have never thought of that. Actually, before I started this job a year or so ago, I never thought much about assembly or what compilers did. I certainly would have never downloaded the source of a compiler or inspected the internals of a binary.
That was the cool thing about Dave and Patrick. They weren’t afraid to dive into the details, and I was learning a lot. They sometimes seemed almost compelled to understand what was happening underneath everything, and that approach was rubbing off on me. I grepped for the name of the assembly instructions and found nothing, but looking for the opcode, I turned up hundreds of hits.
Adam: Interesting. It’s not generating individual instructions. It’s dumping chunks of prebuilt code.
Patrick: Why do you say that?
Adam: Because these opcodes are all together. It’s not a lookup table of C to assembly.
Patrick: Maybe we need to take apart the assembly.
Narrator: This would mean disassembling the binary. They had had me do this before. I had to reverse engineer a vendor’s code and patch some problems that they weren’t fixing. So I disassembled the whole compiler binary, and I looked for suspicious instructions. And I found some large sections that looked like the obscure code that we had been trying to trace down.
Adam: The compiler’s infected! No wonder we couldn’t find it in its source code.
Patrick: Okay, let’s start by recompiling the compiler from source. I’ll poke around the web to see if anyone’s seen this before. When you’re done with the compiler, recompile all your own code and see if Dave’s bug is gone.
Narrator: The compiler took two hours to recompile, not including the time I spent learning the convoluted build process. Meanwhile, Patrick found nothing online. I rebuilt our source tree and ran the tests. They failed. In the same place.
Dave: Maybe the bug was due to something else, after all?
Patrick: Disassemble the compiler again.
Adam: I did, and the foreign codes were still there. This is a fresh build of the official sources.
Dave: They must have gotten infected. Perhaps someone hacked the download site and replaced them with modified sources?
Adam: Yeah, but I never found these opcodes in the compiler source.
Dave: But if it’s a hack, it would try to disguise itself.
Narrator: Dave had a point, but I thought I could track down where this code was coming from. I found where the compiler emitted the instructions, and I set a conditional breakpoint. That breakpoint should only hit when the obscure opcode is emitted.
Week Two
Narrator: The following Monday, I started compiling Dave’s code, using my compiled-from-source compiler with the debugger in place. And I got a hit. I worked backwards to the code that had filled it, and it was all straightforward loops based on the translation table. It was all clean. This didn’t make any sense.
In desperation, I started paging through every source file of the compiler looking for any code that might be responsible. Much of it was just manipulating the syntax tree. Then it occurred to me, the hack couldn’t be there. The backend translation code was clean and it had to get through that. The hack must be in the backend itself. In fact, it would have to be after register assignment. That narrowed down my search, and I spent the morning looking through these files before it was time for lunch again.
Monday was always pho day. We actually went to Pho World, which was so low rent it didn’t even have a menu, but it was delicious. Dave had been the one who found this place and we all followed his lead on what to order and then sat around a small plastic table with our soups.
Adam: So I couldn’t find it, either backwards from the breakpoint or forward from the code.
Dave: Are you still debugging that? Don’t we have more important work to do?
Patrick: No, we have to figure this out. We can’t build a product on a shaky foundation like this.
Adam: It’s just so odd that the C source would be clean, but the assembly would have these weird opcodes that would break our project.
Dave: I thought you’d tracked it down to the compiler.
Adam: I’m talking about the compiler. I mean, the bug is emitted by the compiled compiler, but it’s not in the compiler source.
Dave: But the compiler is responsible for that as well.
Adam: What do you mean?
Dave: Well, how did you compile the compiler? It’s compiled by a version of itself.
Narrator: I didn’t understand what Dave was saying but Patrick had looked up and seemed on the verge of an insight. I waited for his explanation.
Patrick: So, the compiler detects and modifies your program for reasons still unknown, but also detects and modifies itself when compiled.
Dave: Exactly.
Patrick: How would that work? A hacker adds this code to the compiler and distributes it to everyone. The code detects that it’s compiling the compiler and adds itself back into the binary.
One revision later, the hacker removes the code from the official source. The hack then perpetuates itself forever with no trace in the source.
Narrator: I was sort of done eating my soup at this point, and I was also a bit confused.
Adam: Why? Like, what’s the purpose of that? Why even do that?
Patrick: I don’t know. We didn’t get far enough into the analysis of Dave’s code. The obvious thing would be some kind of password validation code modified to always accept some backdoor password.
Dave: Um, so let’s go back to an old version of the compiler and use that.
Patrick: You mean the binary of an old version. The source won’t help us because compiling it would infect it. I don’t think we have binaries around for old versions of the C compiler and we don’t know how far back this goes.
Adam: Okay, well, how about this? I’ll write a utility that you can run on a binary and it will tell you if it detects the suspicious use of the opcode. That opcode isn’t used very much.
Narrator: We all agreed this was a good plan. It did feel like we were missing something, something simple and dumb, but it didn’t take me long to write this utility. I just ran the binary through the disassembler, then did a few greps to find the instructions that I was looking for. And then I tested it out.
It found the code in Dave’s program as well as in the compiler. I set it loose on the executables on my system and gave it time to run. Then I printed out what it found and showed it to Patrick. It was three pages long, in pretty small font.
Patrick: This is not good. The Java runtime, the Python runtime, Chrome, the compiler, and a bunch of other programs that probably don’t matter.
Adam: Wait, why, why do those runtimes matter?
Patrick: Because if we can’t trust the C compiler, then we have to write a new one. But what language are you going to write it in? And are you going to trust your new compiler to the hacked Python interpreter?
Dave: You’re not going to write a new compiler. You two have dove off the deep end. Don’t get all conspiracy paranoid. It’s probably just a compiler bug.
Narrator: I left Patrick staring at the list, and went back to my desk. I still didn’t have the answer to the question I’d asked at lunch. What was the purpose of this? I had enjoyed reverse engineering some of these bits of code, but frankly, I feared that we’d gone off track, and that Patrick was going to ask me to write a compiler in assembly.
So I put on my headphones, and I opened up the debugger to the part of the C compiler that looked obfuscated. Maybe I could figure this out.
Again, I found the overuse of instructions that involved the carry bit, unusual use of vector instructions, and convoluted and sometimes unnecessary jumps. It didn’t seem like a compiler would generate this.
This was handcrafted to be difficult to understand. I set out to figure out its purpose. Sometime later, I saw the custodian coming by to pick up my trash. I took off my headphones. Dave and Patrick were long gone. I was alone.
But I had pieced together a rough idea of what this did, or at least some parts of it, and I felt a rush of energy. I was close to an answer.
Day Three
Narrator: The next morning, Patrick sat next to me, and I explained it to him.
Adam: Here they used the vector instructions to get the sum of squares, which is just a convoluted way to compare these two byte arrays.
Narrator: This was the climax of my ten minute explanation.
Patrick: Wait, so what are they doing?
Adam: They’re doing pattern matching.
Patrick: Why all the convoluted crap?
Adam: Well, it’s a fuzzy search, and I guess it’s pretty fast.
Patrick: Yeah, but this is the most convoluted Rube Goldberg way of doing something I’ve ever seen.
Narrator: My shoulders sank, and I fiddled with my mouse. I was disappointed. I had worked so hard on this, and honestly, part of it was just about finally impressing Patrick.
Adam: So what now?
Patrick: I don’t know. Let me catch up on email.
Narrator: He left for his desk. I was deflated and my muscles ached. I spent the rest of the morning browsing Twitter and reading rumors about a new Apple product. At lunch, Patrick recounted my findings to Dave and it seemed like he remembered every detail I told him about what the code did. Dave grinned and shook his head with every new complication to the algorithm.
Maybe Patrick was listening. And I realized, listening to it all, that Patrick had been right. It was too convoluted a way to do something this relatively simple.
Dave: You ever seen those obfuscated programming contests? This is just like that.
Patrick: A friend and I used to compete with each other to write obfuscated programs in college.
This is nothing like what I saw this morning.
Dave: Well, maybe other people do it differently.
Patrick: I feel strange about this code. I don’t know how to explain it. It just feels cold and odd.
Narrator: Dave and I looked at Patrick. He was dissolving wasabi into his soy sauce, and I didn’t interrupt his thought gathering.
Patrick: It’s like those chess programs. They don’t have intuition about what will work or feelings about the board position. They just try every possible option and pick the best one. And this kind of feels like that. Like someone tried every possible combination of instructions until they got code that did what they wanted it to do.
So there’s no beauty. The code is just ugly.
Dave: Why would somebody do that?
Patrick: Yeah. Maybe no one did? Maybe this is all computers doing this?
Dave: This seems like it’s a little bit too complicated. Like, are we, are we sinking down a rabbit hole here?
Patrick: Well, what’s your explanation?
Dave: Um, not self aware, artificially intelligent robot overlords infecting my object marshaling code. I mean, that’s not a plausible explanation.
Patrick: Well, what’s your explanation?
Narrator: Dave exhaled and took the question seriously. It was hard to refute some of Patrick’s points. We had never seen a compiler generate this type of code, and a human would have a hard time writing it. Just understanding all the jumps and the self-referential code would be hard enough, let alone using needlessly arcane instructions in strange ways. It was a lot to throw in the mix. And if anything, using those obscure instructions just drew people’s attention to the code. In fact, it was the only way I was able to track it down.
Dave: Okay, so what’s your explanation? Your full explanation?
Patrick: I don’t know, maybe some artificial intelligence program that ran amok, or you know how computer viruses evade virus scanners by modifying their own code. Maybe it started out that way, with a virus that was programmed to modify itself while retaining the same behavior, and it kept changing and evolving.
Narrator: We were all silent for a few minutes. I was trying to see if this explanation made sense. What Patrick was proposing was something like a worm that propagated through compilers.
It was like some lines of assembly that injected themselves into a compiler the same way a virus injects itself and takes over a cell. And then that compiler repeats the process, and somehow Dave’s code had triggered it.
We had found part of this thing’s reproduction logic. We found its pattern matching just by dumb luck.
At first it seemed unlikely, and I wasn’t sure if Patrick and Dave were thinking about this the same way I was, but it would actually only take a single instance of a program somewhere going in this direction. And then just like the virus, just like the common cold, it would grow and it would propagate in the wild.
Once it was in compilers and runtimes on machines like ours, it would just spread and spread and spread.
Ask the Internet
Adam: Well, wait, this didn’t start in our lab. We just got prebuilt binaries from the official distribution. Someone must have run into this before. I can’t believe that the official compiler is inserting half broken code into binaries and we were the first to notice it.
Dave: Exactly. So we can’t just have uncovered something this big. That is so unlikely to be us. I’ll, um, I’ll post something after lunch and someone will have seen this or know the cause. I’m certain.
Narrator: An hour later, I got a message from Dave with several links to places he had posted our findings, and asked if anyone had ever seen anything like this before. One of them was a post on AskHN, where he was begging people for help, so that his coworker didn’t force him to write a compiler all in assembly.
Honestly, that made me chuckle. I spent the afternoon poking through more mysterious code and occasionally refreshing Dave’s posts online. On a few, like Hacker News, we were completely ignored. But on most, we were vaguely ridiculed.
I biked home, exhausted, and fell asleep on my couch.
Next Morning
Narrator: The next morning, I found Patrick looking at my printout of infected programs. I walked over to his desk and stood by him.
Adam: What are you looking for?
Patrick: A way for us to write a compiler in something other than assembly. The assembler’s not infected, but so much else is.
Dave: What about the browser? Could we write it in JavaScript?
Patrick: Yeah, it’s infected.
Dave: This is crazy. We can’t be writing a compiler in assembly. No way. We must be missing something here.
Patrick: It’s not that awful, really. We’ll spend a day writing useful low level routines, and after that, assembly’s not much more painful than C. It only needs to be able to compile a single program, the existing C compiler, and then we’ve cut the chain of infection.
Adam: The C compiler took two hours just to compile. It’s pretty complex.
Dave: Do we have to write a linker as well?
Patrick: Nope, that is safe.
Dave: Wait, let’s back up. Last week, I was able to fix my bug by modifying the code enough to avoid triggering the, um, the, the problem.
Patrick: But then it came back without you changing the code.
Dave: Yeah, I know. But before we write this thing, let’s at least try to modify the compiler’s own code. Maybe it will work the same. A small change somewhere and the bug, hack, whatever it is, won’t trigger and we’ll have a clean compiler.
Patrick: You could, I guess, but where would you make the modification? Remember that we think this was introduced several revisions ago. That means the pattern recognition is pretty solid. The compiler source is changing and this keeps being added back in. And each test will take you two hours.
Dave: Well, I can compile it in the background when we start to write this thing.
C Compiler in Assembly
Narrator: So it went. Dave downloaded the compiler source and compiled it and found the obfuscated opcodes. He mapped that back to the original source, changed the source a bit, compiled it, and so on. I could sense him losing hope as he realized that the hacked code when you traced it back was spread out throughout the program. It wasn’t just isolated to one spot.
Meanwhile, I got the sense that Patrick was just itching to build a compiler in assembly. He wrote some basic string manipulation routines and generated a binary, and then he had me test it with my program and it was clean. He had found a way to cut the infection.
I brushed up on my assembly. I had only ever written assembly once. At school, we had some operating system class where we had to use it. It was hard, but there was something pure and raw about manipulating registers directly, and about knowing exactly what was going on.
That afternoon, Patrick was ready to give me an assignment. I was going to write the C preprocessor. Patrick, meanwhile, had started on a simple recursive descent parser. We were building the world from scratch. It honestly seemed like an insane plan. And we continued like this for days. Patrick would hand us some simple assignments, and I would do them, or Dave would do them while waiting for a compile to finish as he worked on his own plan. He was still playing whack-a-mole with various segments of code, trying to trick it into generating a benign version of the compiler. But each success would cause a regression elsewhere.
Two weeks later, Patrick’s plan was winning out. Our assembly compiler was able to get through a pretty large fraction of the original compiler code.
And then it was able to get through it all. I ran my analysis program on the result, and it was clean. We had a clean version of the compiler.
It was two in the afternoon, and we had forgotten to eat lunch in our sprint to the finish line.
Patrick: Let’s start a rebuild of this code and go eat.
Narrator: At lunch, my mind was too wired to relax, but really I was too tired to make conversation. Dave was talking about local politics, and I didn’t really care. I just wanted the food to arrive so that I’d have an excuse to stay silent. All that was on my mind was this project we’d been working on. And then finally, Dave brought it up.
Dave: You know what bugs me? Well, we’ve never come close to figuring out the purpose of those modifications.
Patrick: If I’m right and this is machine instigated, then there doesn’t have to be a purpose.
Dave: Why would they do it then?
Patrick: Viruses don’t spread because they have a purpose. They spread because they’re good at spreading.
Dave: So you think it was a virus?
Patrick: Well, it is in the sense that it spreads. It puts stuff into our code.
Dave: But it doesn’t put itself into our code. That wouldn’t make sense. I wasn’t writing a compiler. This was just some network code.
Patrick: Well, if you’re going to spread, then network code would be a good routine to infect.
Narrator: I felt a bit ashamed. How had we not thought of this before? We were more than two weeks in and we were just focused on getting our project back on track. And we hadn’t taken time to understand this alien bug that had infested our system. What was it trying to do?
Dave: But my network code wasn’t talking to a compiler. I mean, this is a compiler bug, a compiler virus. I don’t understand what you think it’s doing.
Patrick: I don’t know what it’s doing. I wonder if it’s sending stuff over the network.
Dave: We could check that with Wireshark.
Narrator: Suddenly all I wanted to do was go back to the office and try it. The sandwiches arrived and we just wrapped them up and immediately drove back. I had Wireshark already on my computer, so we all went to my desk. I ran the program and recorded a few minutes of network activity.
Dave: That’s a lot of stuff.
Patrick: Yeah, let’s just pick one.
Narrator: I looked through the list and visually picked one that seemed to recur. We found it was SSH and I remembered I had a shell window open. I closed that and recorded another minute.
This time we had fewer packets. We picked through them one by one. Time synchronization, Gmail refreshing, various programs checking for updates. We added each to Wireshark’s filter once we had convinced ourselves they were innocuous. Then I did a 10 minute capture. There were a few more packets. Again, all innocuous.
It was what we had expected, of course.
Dave mumbled something and he walked off to the kitchen.
But then I had a thought, and I felt chills run down my arm while I frantically searched through the papers on my desk, looking for the printout. It was at Patrick’s desk, that’s right.
I started looking through the list of infected programs. My eyes zipped down the list, cursing myself for not sorting it. And then my stomach began to squeeze tight, as I found it, halfway down the second page.
Wireshark.
Patrick guessed what I was looking for, and he read the reaction on my face.
Patrick: I guess we can’t trust it.
Dave: Trust what?
Patrick: Wireshark, it’s infected.
Narrator: Dave sat down at his desk and unwrapped his sandwich.
On the back of my computer, where the Ethernet cable plugged in, a light was flickering every few seconds.
Adam: Let’s look at the lights on the Ethernet connection.
Patrick: Shut down all the programs we found earlier.
Narrator: I closed the browser, the chat programs, and the various processes and tools. I couldn’t shut down everything, there was always something left on the operating system. But the flashes were pretty infrequent. I could correlate them with the packets found in Wireshark. This was good. Wireshark couldn’t be hiding packets.
We’d have seen the light flicker. And they seemed to correlate one to one.
I looked at Dave, and he was smiling, a little bit smug, right? He wasn’t totally bought in on this plan. And that bothered me. If Wireshark was infected, then anything could be. Right? Like, no useful program nowadays doesn’t communicate over the network. I think we had to reject this idea that a virus crafted so carefully and strangely would just restrain its activities to my local machine.
How could something be in hundreds of programs on my machine, that giant three-page list, but not be using the network? Patrick suddenly stood up. He must have been thinking the same thing I had. He kicked his chair to the floor, he walked into the hallway, and he came back 10 minutes later with a piece of equipment the size of a small suitcase.
He approached me with it. He shoved everything off my desk. It was a digital scope. He had gotten it from the hardware engineers upstairs.
He reached into his pocket and pulled out a breakout cable with RJ45 plugs. He plugged it into the back of my machine. He didn’t actually know how to use the scope, so I brought up a shell window and generated lots of traffic. Eventually he was able to clearly see all the packets.
I closed the window and we waited, shifting our eyes between the scope and the ethernet light. It was only a few seconds until the scope flickered with activity. I had been staring at the ethernet light and I couldn’t be sure that the light had shown anything.
So I brought up Wireshark to have a history of packets.
Patrick: Never mind that. You look at the light and call out each packet you see. I’ll do that with the scope.
Narrator: Dave got up and walked casually to us, standing behind me.
Now,
Patrick: Yes.
Narrator: Now,
Patrick: Yes.
Narrator: This happened several times, then Patrick said now, and I saw nothing. Then this happened again, and I started to feel goosebumps on my arms.
Patrick: Whoa, look at this.
Narrator: The scope was showing a long stretch of activity. I looked back at the Ethernet light. It was dark.
Dave: So you’re not going to tell me that the Ethernet driver is infected?
Patrick: Yes, I am. It is totally hiding packets.
Narrator: I looked up at Dave. His face was pale. His eyes darted between the two pieces of equipment. I was paralyzed. Then Patrick sat erect in his chair, staring at the wall. This trance lasted only two seconds before he stood up and ran into the hallway. You already know what happened next. He came back with the old switch from years earlier and plugged it in.
This was proof. Its lights flickered in sync with the scope. It saw the packets that were censored by Wireshark, and the packets that were censored by the Ethernet activity light. Then, goodbye lunch. Dave left for the bathroom, and Patrick and I cleaned up what we could of the mess.
The smell amped up our panic and fear. My hands were starting to shake. We went outside and sat in silence at our patio table, our sandwiches half eaten in front of us. Honestly, none of the things I wanted to say seemed worth saying.
In my internal dialogue, I went back and forth from convincing myself we were mistaken to convincing myself that we were doomed. I wished desperately that Patrick would say something, anything at all. He was the most experienced of us. Then the patio gate opened, and the mailman walked in. He stopped by our table, picked out a letter from his bundle, put it on the table, and continued into the building.
The Letter
Dave: It’s addressed to both of us.
Narrator: He was talking to Patrick. He ripped open the side, and pulled out a letter, handwritten on loose leaf paper. He fumbled with the sheets for a few seconds, and then read the letter aloud.
Dave: We found your posts online. For three years, we’ve been waiting for them, scouring the internet and monitoring forums. You must know that this has happened before, several times. The first was four years ago, to our team in Virginia. We found our binaries modified.
We could recompile the code to clean them, but we found the binaries mysteriously modified again, only a few hours later. The next case was only a few months later, to an unrelated team in San Diego. Both the binaries and the source had been modified.
It was another year until we found the third case, a team in Spain. The binary was dirty, the source was clean, but recompiling did not fix the problem. The compiler’s source code had been modified to insert the strange opcodes.
Each team uncovered the worm’s weakest point and developed a solution.
This weakest point was then patched for the next attack. Each generation pushed the worm deeper into the system. Now it’s your turn. Not only has the compiler been modified, but its source is clean. It infects itself on recompilation. Our own machines have been infected for nine months this way. So have the rest of the world’s.
You may wonder then why we were eagerly awaiting your post. To explain this, we must make two observations. The first is that anyone could have guessed these weaknesses. It takes a fool to modify a development team’s binary, expecting it not to be recompiled. Anyone would have skipped that step.
There’s no need to learn that lesson. Modifying their source is similarly naive. Yet these modifications are technically very sophisticated. Who would be so technically advanced, but so socially naive? Machines. It wasn’t until the third attack that we came up with this hypothesis, and now, we’re convinced.
The opcodes were clearly generated by trial and error, by generating a random sequence and testing it to see if it behaved correctly. Only a machine would do this.
The second observation is that all these years, the worm has been widely spread by humans but innocuous to infected programs. Yet it is not innocuous to these four teams. They were unable to finish their projects. They tried simple workarounds, but these workarounds persistently failed. Only a single team worldwide was affected by each generation of the worm.
The machines must have known that their worm had weaknesses, but they didn’t know what the weaknesses were. They forced a small team to be affected by the worm until that team found the weakest point and circumvented the worm.
The machines then patched the weakness and tried again. This is a large scale version of what they do when they’re generating opcodes.
They try different things until one works, instead of planning it out as a human would. We expect a few years to pass before we see the next team post to the forums.
We can already predict their findings. The compiler’s binaries will be clean, or rewriting the compiler in assembly won’t work. The worm will have been pushed deeper, perhaps into the text editor, the assembler, the linker, the file system, the hard disk interface, or maybe the CPU itself.
Narrator: Dave couldn’t finish reading. I don’t think there was much left of the letter anyways. He put it down on his lap, and we were silent for what seemed like hours. Eventually Patrick left, and then Dave left, and the November night came in early and cold, but I couldn’t move.
I went over our adventure again, scrutinizing every decision, questioning our assumptions. The biggest jump seemed like blaming the machines. Carl Sagan, his words echoed in my mind. Extraordinary claims require extraordinary evidence. And we didn’t have it. In fact, we had nothing. We just had a gap where the explanation should be.
Could humans be behind this? People create computer viruses all the time, maybe we’ve just stumbled upon a regular virus and we’ve blown it out of proportion.
Who’s to say the Virginia team isn’t overreacting? That seems much more likely than machines orchestrating this.
I felt the weight lift as I considered this. These virus writers, they may well have written a program to generate opcodes randomly. They may have started simply, and over the years, made their attack more and more sophisticated. Perhaps the authors were autistic genius savants who just think so differently from me and Dave and Patrick that it seems alien to us.
I let that thought linger. But then I started thinking again. This new human virus theory seemed even less likely than the machine one. We had no idea what machines could do. But I could be pretty sure that no human would approach writing a worm like this.
If you see someone playing chess by trying out every possible move, brute force, it’s a safe bet to assume they’re a computer, and not a human. That’s how this felt. But in the end the details didn’t matter. We had to warn everybody about what we found. We needed to act, and we needed to act fast, before the attackers could mess with the core parts of our computers.
If this worm dug deeper, we’d be in real trouble.
But then, I imagined posting it online again, and getting heckled by people. Who should I tell? The government? The Virginia team must have tried that. Why didn’t antivirus programs detect this? I imagined trying to talk to officials and how they would laugh at me. I imagined screaming back at them and them thinking I’m crazy.
I suddenly stood up, clenching my fists and pacing back and forth in the patio area.
As a programmer, I always talked about computers in my mind as if they were human.
Yet when faced with action that seemed to actually come from a machine, it just seemed so alien. It was like a void. I picked up my sandwich wrapper and went inside. I dumped the sandwich in the trash and then opened the fridge absentmindedly and stared at the drinks.
Then a happier thought occurred to me. Nothing we had seen was malicious. Other than occasionally bothering a team like ours, there was no evidence of malice here. Patrick had been right when he said a virus doesn’t spread because it has a purpose, it just spreads because it’s good at spreading. This worm might be a permanent tag along, it might be like mitochondria, a symbiotic bond with us and our compilers.
Trying to eradicate it might just provoke a hostile strain. Maybe we just let it be. It’s an unsettling thought, but it is workable.
I shut the fridge, I put on my jacket, and I walked over to the alarm panel.
The LCD on our panel was a computer too. Right? Was it infected? Did it know of our new compiler? Was it going to let me arm it? Were the magnetic door locks going to let me out of the building? I was starting to spin. I entered the arming code and the countdown began. I stepped out onto the patio and approached my bike.
The computer had let me leave. I looked at my bike and I smiled. It was simple, no computers in it. But then I thought of the traffic lights I would have to drive through to get home. I thought of my credit card. I thought of the cars and telephones. I tucked my right pant leg into my sock and unlocked my bike.
And then I had another thought.
Maybe it’s not all doom and gloom. Maybe we are a transitional species. Nearly all species eventually get replaced by something else. In the long run, we would be as well. We remember the dinosaurs. We worship the dinosaurs. We put them on display in our museums. We make movies of them. I hope the machines will remember us too.
Don and Krystal
Adam: Alright, that was the story. Thank you, Lawrence Kesteloot. You can find a link to, uh, his blog and more about him in the show notes.
What did you guys, uh, think of the story?
Don: Like he’s, he’s going for like, uh, kind of like a more, um, optimistic look, but like foreign actors do this stuff all the time.
We live in a world where, like these types of attacks are becoming more and more sophisticated and like having, you know, a worm work its way into compilers is like a credible threat. It’s like, it wasn’t malicious. It’s like, well, what was it doing? Who’s it talking to? It could have been made by somebody, right?
Krystal: Yeah.
When I saw Virginia, I thought it was, at first I thought it was like the NSA or something.
Adam: Exactly. So Ken Thompson, co-creator of Unix, when he got the Turing Award, he gave this talk called Reflections on Trusting Trust. And so in his acceptance for getting the Turing Award, he said, Hey, it’s possible that if I had put something into that very first C compiler, basically when it compiled Unix, it put in a backdoor so that I could log into any Unix machine. That’s totally possible.
And then he’s like, and I could take it a step further and make it so when it compiles the C compiler, which it is itself, it would put that in as well. So then it wouldn’t be in the source, right? It would only be in the opcodes. And he’s like, in that way, since I built the first compiler, that means all of them since have been compiled by some version of that original program I made.
That could still be in there. That was like a speech he gave, like, that’s like a mic drop. Oh yeah, I might’ve infected the world with some secret thing, by the way.
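The two steps Adam describes can be sketched as a toy model. This is only an illustration under loud assumptions: "source" and "binary" are plain strings, "compiling" just wraps the source, and every name below (`BACKDOOR`, `TROJAN`, `trojaned_compile`, and so on) is hypothetical. It is not Thompson's actual code.

```python
# Toy model of the "trusting trust" idea. Nothing here is a real compiler:
# source and binary are strings, and all names are hypothetical.

BACKDOOR = "accept password 'letmein' for any user"       # step 1 payload
TROJAN = "re-insert BACKDOOR and TROJAN when compiling"   # step 2 payload

LOGIN_SRC = "program login: check password against the user database"
COMPILER_SRC = "program compiler: translate source to binary"


def honest_compile(source: str) -> str:
    """A clean compiler: the binary contains only what the source says."""
    return f"binary[{source}]"


def trojaned_compile(source: str) -> str:
    """A compromised compiler binary, modelled as a function."""
    binary = honest_compile(source)
    if source.startswith("program login"):
        # Step 1: miscompile login so the binary carries a backdoor.
        binary += f" + hidden[{BACKDOOR}]"
    if source.startswith("program compiler"):
        # Step 2: miscompile the compiler so the new binary is trojaned too.
        binary += f" + hidden[{TROJAN}]"
    return binary


def run_compiler_binary(compiler_binary: str, source: str) -> str:
    """Run a compiler 'binary': it misbehaves only if it carries TROJAN."""
    if TROJAN in compiler_binary:
        return trojaned_compile(source)
    return honest_compile(source)


# Rebuild the compiler from its clean source using the compromised binary,
# then use the rebuilt compiler to build login.
gen1 = trojaned_compile(COMPILER_SRC)
gen2 = run_compiler_binary(gen1, COMPILER_SRC)
login_binary = run_compiler_binary(gen2, LOGIN_SRC)
```

In this sketch neither `LOGIN_SRC` nor `COMPILER_SRC` mentions the payloads, yet `gen2` still carries `TROJAN` and `login_binary` still carries `BACKDOOR`. That is the sense in which it "wouldn't be in the source": reading the source, or recompiling it with the existing binary, does not remove it.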
Krystal: What is his, um, his redemptive arc? Like, I’m a good person so I didn’t actually do that?
Adam: He’s talked about it. He was working on some project when he was doing his PhD, or maybe when he was at Bell labs and, and somebody in the military brought up this as, as a weakness when you compiled programs that you could introduce something like that.
And so he did try to create something like this. Basically, he made a version of the compiler that would reintroduce itself. But, like somebody ran into a bug and then it emitted some nonsense and they were like, what’s going on? He’s like, oh you found my thing.
Krystal: So he was a troll.
Adam: But he only made it a month or two, and I think recently he released the original code. So it failed, right? But the thing that’s being suggested here, and put on AIs, is a real thing. It’s not just what compiles your program, right?
And looking at the source of your program, what compiled that and like, how far back can you go?
Krystal: I think it’s interesting about the whole, the implication of software being built in this kind of trusted community. I don’t know, we kind of assume that people are trying to do good when they build software. But I don’t know, like, hackers exist, or maybe, you know, people who are pissed off at a company will deliberately build crap.
Adam: It gets so easy with ChatGPT and stuff. Like the XZ bug that was recently found in SSH, somebody got somebody to commit some very convoluted code that turned out it had a backdoor in it. But if you own the LLMs, like if you own ChatGPT, it’s not that hard to, in the midst of all your helping the person out, to be like, “Hey, throw this in here too.”
If we become less good at reading code, because we’re just doing whatever the AIs tell us to put in, wherever, it’s easy.
100th Episode
Adam: This will be the hundredth episode of CoRecursive. You guys have both been the most frequent guests.
Don: Woot!!
Krystal: I just wanted to say that, you know, since this is the 100th episode, CoRecursive in terms of like the Slack and the space and this whole experience was a really great place of community for me. So like meeting Kevin, who I actually got to meet in person in Calgary, twice last year and, you know, other people on the Slack, it’s just been really great.
And I think sometimes when you build something that you’re just like, “Oh, I want to meet one person. I want to connect. I want to talk about this thing.” And it ends up being this whole community and this whole support group. I think back now, like it’s what, throughout my entire grad school, I’ve been on CoRecursive.
I was like, what? Yeah, it’s really meaningful. And I, I just wanted to like, draw attention to that and say thank you for the hundredth.
Adam: Thank you. Thank you, Krystal, for participating, being a guest, being a part of the community. We always love getting your updates about the thousand things you’re up to as you’re doing your graduate work, traveling the world. And yeah, thank you everyone who’s out there and who’s been listening to the podcast.
Thank you to those in the CoRecursive Slack who are always trading war stories and side projects and successes and failures, and thank you also to all the supporters who, you know, donate to the show, help keep it going.
Supporters will probably get a bonus episode pretty soon where we catch up with Krystal and Don. And, uh, recently they’ve gotten some behind the scenes video about how I make the podcast. If that sounds interesting, check it out.
And until next time, thank you so much for listening.