CORECURSIVE #109

HATETRIS

Obsession, Friendship, and World Records

HATETRIS

What if a simple game became a gateway to computational breakthroughs? David Freiberg and Felipe set out on a journey to conquer Hatetris, a notoriously difficult JavaScript game. Their interest ignited when a new world record was set, showing that surpassing the game’s high score was possible.

Their journey was full of challenges, from building an emulator in different programming languages to tackling complex algorithms. They pushed the boundaries of what’s possible but the story didn’t end there.

Collaborating with fellow enthusiasts, including a Japanese Tetris expert, led to further breakthroughs. By sharing insights and building on each other’s work, they set a records after records. Their story highlights the power of curiosity, collaboration, and the joy of discovery.

Transcript

Note: This podcast is designed to be heard. If you are able, we strongly encourage you to listen to the audio, which includes emphasis that’s not on the page

Intro

Adam: Hello, this is CoRecursive, and I’m Adam Gordon Bell.

We’re diving into the people behind a piece of software because sometimes what looks like a simple puzzle can unravel into a web of complexity. Today, we’re exploring the story of two people who got caught in a JavaScript game from 2010.

It seemed like just another Tetris clone, but it quickly became an obsession that would challenge their skills, their wallets, and even their relationships.

David: My name is David Freiberg. I am a material science PhD and I am very very obsessed with a JavaScript game that came out in 2010.

Felipe: software engineer at HubSpot. I’m not quite as passionate about this problem as Dave, but I’ve gotten dragged kicking and screaming into it, so I now guess I’m a domain expert on it.

The Creation of the Game

David: Back in 2010, Sam Hughes created hatetris, just published it on his blog as a novelty. About a month after the game was published, a Japanese Slashdot commenter named Diyasuke got 30 points. And it stayed there for years and years and years.

In 2015, I took a class in general game playing on MIT OpenCourseWare. This was an interesting class. Instead of writing a program to play chess, checkers, or Connect Four, you were writing a program that could play any game given the rules. Instead of just being given a position, you were given a position along with all of the relevant rules that went with it.

I started reading QNTM and I started looking at Hatetris. I messaged Felipe a couple of times asking him if we could apply it to Hatetris, could we apply this game playing? Felipe quite wisely pointed out that my general-purpose game player could barely beat me at Kinect 4. It couldn’t beat me at checkers.

Hatetris is way, way harder than pretty much any human game. Yeah, Felipe was right about that, so we shelved it.

Computational Challenges

David: So what happens next is that in June of 2021, a Japanese computational Tetris expert, and yes, there is such a person, named KnewJade, sets a new world record.

We thought it might genuinely be impossible to get past 30.

Adam: This max score belief was based on simulations that they had run. Unlike Tetris, hats is totally deterministic. You always get the worst piece based on the well you have. And even in narrow wells, for that place where the pieces fall, even in narrow wells, the game state ballooned faster than you’d think.

So 30 points and 30 lines cleared. It felt like an absolute ceiling. But then KnewJade hit 32, so clearly they had missed something.

Felipe: Then Dave started working on an emulator, and I was like, okay, we’re doing this, I guess, because Dave has started working on an emulator, and I’m not going to be able to gently guide him away from the idea of doing this.

That’s what kicked off the project in my brain. That’s where it went from an impractical idea, we’re discussing, which is 90 percent of our conversations, 80 percent of our conversations, to the idea we are doing.

Pair Dynamics

Adam: This is normal for these two. But before we get deeper into how they tackle this challenge, you need to understand something about them. Dave and Felipe. They’re buddies from back in high school, but they are not casual in their approach to these challenges. Dave is the instigator of projects. Felipe is the one who reigns them in with practical concerns. But whatever they’re working on, they’re always knocking around ideas.

Felipe: Dave will send me some idea like, I was thinking about how we can use machine learning algorithms to solve a game. And I will say, Dave, that is wildly impractical. How do we make your idea practical?

Because I like the core of it, right? I really want to, like, maybe we can’t solve every game; maybe we can focus on one game. And we’ll narrow the idea, we’ll refine it, we’ll talk about it, and we’ll come up with sort of a practical approach. And that’s usually where the conversation will end. I’ll be like, yes, we can’t predict all stock markets forever, but you know, you could maybe focus on predicting one stock, and you could use all these various factors to figure it out.

And I read this good paper. Maybe you go look at that. And then, if Dave actually likes the idea, two days later to a week later, he will show up and say, I wrote a Mathematica script for this; here is the data that I got from it, here’s the parts that don’t work. I didn’t use a database because I don’t believe in those, and all the data is sitting in text files in my hard drive right now, and it’s full of these. And I’ll be like, Dave, I gave you like four practical ideas here; why did you do it this way?

Pair Programming

Felipe: And so, honestly, it’s most, most of this stuff goes as a snippets of conversation had over Discord, right? It’s like, Dave sends me a message that said something along the lines of, ‘I just tested the algorithm sorter and I got these results.’

And I say something incisive like, ‘Well, those results aren’t very meaningful, Dave. What can we actually learn from these?’ And he’s like, ‘Well, let me pull up Mathematica. I’ll make a graph.’ And we’d look at the graph and make some more commentary on it. We’d have sort of a discussion about it, right? And the conversation would pick up on and off throughout the day as we’re available, so I’ll be like, ‘Okay, I have to go to make dinner.’ I’ll go make dinner, and I’ll come back, and Dave will have posted five more paragraphs in my Discord, and I go read them, and I make a bunch of commentary, and Dave’s off swimming because he’s, he’s done a, he’s practicing for an Iron Man.

Were you practicing for an Iron Man at that time? I think you were. And so he’s like, ‘I just got back from swimming 8,000 miles. Let me post some more thoughts.’ And so like,

David: 8,000 meters, come on.

Felipe: Dave. Your magical exercise numbers are meaningless to me.

David: I do a lot of biking, and I do a lot of swimming. So, a lot of this is, I’ll look at a whole bunch of data right before I leave. And then, as I’m biking to the river, or when I’m in the river, I have nothing else to think about or focus on, so I’m just sort of mulling over the data in my head. When I get back, I send Felipe messages in the same way that cannonballs are sent from cannons sometimes, and so he’ll be hit with a barrage of a couple thousand words based on whatever I was mulling over.

Any reference to practical stuff like VS Code is going to be Felipe. He forced me to use things like Git and version control and backup copies. Still haven’t forgiven him entirely.

Felipe: I mean, we, we only lost most of a project due to not keeping any backups and not using version control. So, you know, it’s not like I, it’s not like I’ve managed to get you there until we did that.

The Projects

David: Yeah, this is very true.

Adam: So this is a thing that Dave and Felipe do. They work on projects together. They send around messages, and then at night, depending if they’re busy or not, they might jump on a Discord voice call. They might get VS Code Live Share running. They might work on some coding. Sometimes the projects get somewhere; sometimes they wouldn’t.

Sometimes at the end, they’d write up their work and share it on the blog. Most of the time, they wouldn’t get around to that. But yeah, Hatetris was always one of these topics on the back burner. It would come up now and again. Dave wanted to tackle it; Felipe pointed out it seemed so out of reach. So, other projects would get priority.

But like many backburner ideas, Hatetris was quietly preparing to take center stage. Because little did they know, once they truly committed to beating this very hateful game, there’d be no going back. Because when a project grabs your interest, especially when there’s a group of you, the effort you pour into it over countless evenings and weekends can be surprising.

It can add up. So yeah, KnewJade hit a new record: 32. And Dave thought maybe they could beat it. But first, they needed to build an emulator.

David: So we built not one emulator. We built six total, starting in Mathematica just to get something that worked and to get an idea of how slow things were. In Mathematica, a single move took about a tenth of a second to calculate, and there’s a fair amount of computation that goes into Hatetris. Because the Hatetris algorithm doesn’t give you a piece at random. It looks at the well and it calculates which piece is going to be worst according to a minimax algorithm. It makes sense that it would take a little bit. Our current one runs approximately 200,000 times faster. So, currently, we run about 5 microseconds per move per core. So even if, even on a single thread, that’s still tens of thousands of times improvement. The very basic emulator consists of just trying every possible spot that a piece could go and doing the minimax algorithm from there. It was not very fast in Mathematica.

So, what Felipe eventually convinced me to do was rewrite it from Mathematica to Python, which was barely any faster, and then from Python to Rust. And this, I will blame Felipe on.

Felipe: I do take the blame for that particular approach.

David: As background, we had no other experience with the language. We did six AOC problems, and our seventh thing was re-implementing Google’s AlphaZero algorithm, complete with GPU neural nets, from scratch.

It was a bit of a leap.

Alpha Zero

Adam: Yeah, AlphaZero, that DeepMind project out of Google, that was their inspiration. I mean, there is a published paper with formulas, and stuff in it, so how hard could it be for the two of them to reimplement it all?

Felipe: At the time when he brought up Alpha Zero, I was playing a lot of Go, and I had seen what Alpha Go could do. And so I was like, Oh, actually, I do think this has a lot of potential. Let’s talk through the actual practical implications of doing this. And at that point is when I knew this was the idea we were going to do, right?

We took the parts that we understood and implemented them, the key element being the parts that we understood. Because at some point, I remember we were talking about this, and then the topic of Dichroic noise came up, and I’m like, I don’t know what that is, Dave.

And Dave looks at me and says, I also don’t know what that is. And I’m like, So are we using it? And I’m like, I don’t think we actually implemented that anywhere. Does it matter? And then we look and we’re like, It looks like it could be important.

David: But the problem is that when we actually implemented it, our results got worse, because they were trained without the Dirichlet noise.

Felipe: Yes, because we didn’t understand what we’re doing, Dave, because we didn’t understand the paper.

Running Zero

Adam: Even if they didn’t totally get all the details, they got an AlphaZero approach up and running. And then they could run it, and they could watch the logs, and they could see if it was actually learning anything and getting better.

David: So you would see a text file. And it was a text file because Felipe never felt like writing a proper log, and so I did it. And what you see there is when we’re in the AlphaZero phase, you would see a printout of a game. And every new game that was played, you would get a list of lists of numbers, where each list of numbers represents well written as binary.

You would get logs of these, and each well would come with a score if it had gotten any lines. It would come with the network’s heuristic estimation and a couple other bits of metadata like how many children it had and so forth.

And you would see tens of thousands of these. We would get one every few seconds, and we would control F through to manually look for the high scoring ones. Most of them were not terribly high scoring, of course.

Felipe: I have a very fond memory of sitting with Dave on a voice call, actually, and just looking at games that had been played through and like trying to figure out if there were commonalities or traits and what the network was trying to think of as we went through these things. If I remember correctly, you made a little emulator that let us look at the games in Mathematica, and it was like clicking through and seeing, okay, what happened here?

Why did it do this? And then looking at a movement being like, that move is terrible, why would it do that? Why can’t the network figure out that it shouldn’t be doing this? It was probably like one in the morning or something as we’re doing this.

The Problem

Adam: But yeah, there was a problem with this AlphaZero approach.

Felipe: What we needed was a data center’s worth of TPUs hooked up to a bunch of hard drives and. Computers running all day so that we can change meta parameters and test things. Um, so that was a big factor our implementation was the implementation of two people who read and half understood a paper and then skipped over some fuzzy details that clearly didn’t matter that, um, turns out probably kind of mattered, they were probably important and , I think we just fully didn’t grasp the scope of what needed to happen in order to make this an effective solution.

So we were kind of like. Stuck in that circle of tweak a thing, it gets worse, but did it get worse because we tweaked the thing or did it get worse because we’re in a local minima and don’t know how to fix it. And then we don’t can’t iterate on that because we don’t have the hardware in order to iterate on it properly.

So it was just a thing where it was like, that feeling you get when you know, you’re approaching a problem in the incorrect direction and that yes, you can keep tweaking things, but nothing is going to get you out of this hole is I think the sentiment.

Adam: Imagine climbing a mountain and then realizing that the peak in front of you you’ve been trying to scale is not the top but just a false peak. The real peak is behind it. That’s where they were. Resources were running thin, and each passing day it felt like a warning. This problem was much bigger than they thought.

Felipe: You’d get some obscure error. That is like some rust backtrace error. That is a, that is a segment overflow somewhere. You’re like, what does this even mean? Why is this segfaulting? And we just look at it and either fix it or say, actually, we won’t use this library anymore. It’s fine. We don’t need to use sparse matrices. That won’t help anyway.

Adam: But yeah, there was also moments of hope.

David: My personal record for playing Hatricks myself is 10 points. So the moment when it first beat me, that was exciting, and there was a couple weeks period where it slowly rose from single-digit points to 22 points. That did feel like it was getting better, and then it just stalled. No matter what we did, we couldn’t improve it. But there were moments in there where we were really happy.

There were moments in there where we found ways of speeding up our emulator and getting more games per second. I’m not saying it was all bad. It’s just the bad is the part we remember because the bad, chronologically speaking, was the majority of that time.

TPU Grant

Adam: For months, they kept pushing on this problem, working late nights, exchanging code. But now, on the edge of a breakthrough, they, they just felt so much doubt.

Felipe: This is kind of one of the moments that happens in any given redirection point, right? Where you’re like, I’m still interested in the core idea behind this. I still want to solve this problem. But whatever it is I’m doing isn’t working for whatever reason, and honestly I can pretend that this is a measured decision where you sit down and you make a list of pros and cons, but really it’s how annoyed am I at this and do I still want to keep fighting with this specific set of problems, right?

I think we had a conversation where I basically said, I don’t think at the scale we have, unless we’re going to shell out a ton of money to Google, we can do this. Is that when we applied for the TPU grant? Because we looked at this, we looked at like, what would it cost us cloud wise if we wanted to do this properly?

David: The answer was it would take about a month’s worth of a thousand TPUs running full time. So, we applied for a grant to get a month’s worth of a thousand TPUs running full time, and we got the grant.

Think about the Merchant of Venice: the pound of flesh, not including any blood. That’s more or less the scenario we encountered here, where we had the ability to use for free thousands of TPUs for about a month. We did not have the ability to use for free any of the CPUs that we would need to coordinate those TPUs.

Felipe: For the hard drives.

David: Granted, the TPUs were still doing the majority of the calculation, vast majority, but the cost, even of that, of the ancillary controls and logging and so on and so forth, would have been well over our budget.

Felipe: Our budget of 0 in a shoestring, to be clear.

Knewjade

Adam: So frustrated, and realizing that this neural network approach they had been throwing themselves at was just beyond the compute resources they could afford, they were ready to give up.

But then KnewJade appeared on GitHub and started shaking things up.

Felipe: He posted a gist that included his entire approach in Japanese and Google Translate was sufficient with pictures to figure out What was going on? We’ve been looking at Hatetris for quite a while at this point, so we, like, there was nothing in the explanation that we needed to have explained to us as far as, like, how does the game work?

What mechanism am I using? We were mostly interested in the broad strokes, like, What are you looking at? And he made some really good graphics actually that really like captured the idea of the shapes of things that he was looking at and that was very helpful and it was a pretty succinct explanation.

It didn’t get into a lot of like the technical implementation details which was exactly what we needed so it was just a really good bright up job on his part even in Japanese because which we don’t understand.

BeamSearch

Felipe: We’re like, none of this is that hard compared to what we’re trying to do. I bet we could come up with a better heuristic. We’re good at looking at problems. And I think that idea sort of overtakes everything else. And we’re like, okay, what does a heuristic look like if we’re starting to think about this problem? What are the things? We should consider that KnewJade hasn’t thought about looking at this problem because sure he’s a computational Tetris expert That’s been studying this problem longer than we have and come up with a better solution But obviously we can do that.

We can do better than that. We can come up with a better idea and I think that’s what starts things. Is that right Dave?

David: Yep. We decided that good artists borrow and great artists steal. And that is precisely what we did.

Adam: And then, while they were re-implementing, or stealing, or whatever you want to call it, KnewJade starts posting new high scores.

Felipe: I think Dave sent me a message that said something like the new record is 66 points, and I was like, that, come on, you can’t be right, Dave. The last record was 45 points, and we’ve speculated that the roof is at like 50, maybe 60, because of course, every time there’s a new record, right, we raise the roof by about 10 points. Like, there’s no way it’s humanly possible. At this point, I say something along the lines of famous last words, oh, the maximum score is probably around 100. Then I think, that makes sense.

And so we say, okay, let’s just start by recreating the work. See if we can get something similar. Then we can start tweaking the heuristic to make it more suitable with our understanding. What we’ll bring to the table is we’ll do a lot more data analysis, and we’ll have a better idea of a heuristic or things we can add to it.

David: From there, it took us about five months because we made two crucial mistakes, in our re-implementation. And each mistake cost us literally months of runtime.

The Record Attempt

Adam: Re-implementing KnewJade’s solution isn’t super easy, especially if you don’t have a PhD in computational Tetris, or you’re not familiar with computational game searches.

Imagine exploring multiple paths of a maze simultaneously. You go 10 different ways through the maze, and when some of them dead end or look like they’re not promising, that’s when you can branch off on some of the ones that do look promising so that you always have your best 10, your best ones running in parallel forward through the game space.

You drop the rest and you focus on these winners, but you’re always moving through multiple paths at once. And how many of those you move through at once, that is your beams width. That’s what makes it beam search. But simulating all of these demands a lot of computing power.

Felipe: The conversations with these things always go like this. How long could this take? I don’t know, a day? Sure, a day seems reasonable. And then it’s a day later, and I’m like, how far has it gotten?

Well, it finished the first beam. Okay. So, two weeks?

Very hopeful, I was going to run for less than a month, and then it was just much slower than we had expected.

And so, like, each run takes a month, and it’s a month of looking at the logs, seeing the score slightly increase, and me like, “Oh God, I hope we didn’t have a bug in this portion of the code, because if we did, we’re gonna get two weeks in and have to start over.”

And we’re looking at log files, and we’re discussing what’s going on. Obviously, it’s a lot of just refreshing log files and talking back and forth, being like, “Well, the newest highest score is this.” I remember I would sign off for a call on Discord.

I’d see a message from Dave that would be like, “The current score is 30.” And then a message an hour later, “The current score is 32.” And we’d hop off a working session, right? And I’d hop back onto the console, connect to look at the logs, and I’d send an echo message to anyone else on the server, being like, “Dave, are you still online looking at the logs?”

I’d get a message back being like, “Yep.” Like, “Yeah, okay.”

Adam: And so you did climb to 32. Well, the record was what at that point?

David: The record at this point was 66. And we eventually got our way up to 53 over the best one of these searches, which was still very good. It was the second best in the world. It was better than anybody else except KnewJade, but it was not quite good enough.

And so that’s when Felipe got the idea. Instead of using the single desktop computer that was at this point already five or six years old, let’s actually purchase some computer time and just get this over with in a day.

Adam: This is where things got serious. Instead of running this program at home and it’s just something fun, they were about to rent a 72 core machine in the cloud. The clock would be ticking. This was on AWS. If anything went wrong, that money would be gone for good, leaving them with nothing but an unfinished beam search.

Now this puzzle wasn’t just about pride; it wasn’t just exploring. Their wallets were now at stake as well.

Felipe: Yeah, so I’ve done a bunch of DevOps stuff, but AWS is my most familiar cloud provider. I was like, AWS seems like a reasonable choice for this. And so I went out and I priced it and I looked at various solutions. I said, okay. What we’re looking for is a large number of threads. I forgot the instance type. This one has a lot of threads. If we’re going to run this for 24 hours, this feels like a reasonable price to pay to get this project done.

At this point, we have a shiny brand new heuristic. We’ve added our own new magical fields to it that will totally make everything better. We’ve gotten 53 points, so we know that it can be done. I think the math we did was like four cores take a month, divide the month in, and we need this amount of cores. It was like, I forget how many it was—a few hundred.

Do you remember, Dave? I think it was like…

David: I think it was 72. Yes,

Felipe: 72 cores, a lot more RAM, we can make this run quickly. And we did some back of the napkin math and came up with a budget. And then I said, this is how much it’s going to cost, Dave. And Dave was like, okay, we can go have these on it, that feels reasonable. Never trust my budgeting decisions ever. I have always, every single time, wildly underestimated the cost of doing the thing. And so we get it set up. I set up the AWS account. None of this is like very challenging because it’s all just set up to be click and go.

Set up the instance, SSH into the instance, copy your code over because now we’re using version control so we can just git clone the stuff over. You’re welcome, Dave. Install the dependencies, get everything going, and kick it off. Right? And honestly, it’s blazingly fast. But you’re like, wow, this is going very quickly. Exactly as we predicted.

The Run

Adam: By midnight, they were glued to the log file, hoping each new entry would push them past 66 points. Each beam would make progress, and one would pass the world record.

David: So then problems start to come up. We actually shoot past 66 points. And then we start to realize, well, wait a minute, if the game lasts longer than we predicted, we’re going to have to pay more than we predicted.

Adam: This was at the 48 hour mark. This is twice the expected run time, but their code still has to finish. They have to let it run because it’s cleared 66 lines, but until they lose the game, until they finish the search, the program won’t finalize; it won’t print out the score.

So you’d hit a new world record, but you couldn’t see it, you couldn’t log it as a world record. You just knew that if there were no bugs in your program, it existed.

So the better your program does, as you’re sitting there watching it, the worse your bill gets. And then there’s Amdahl’s Law.

David: Because we forgot, multithreading only works for the actual beam. When you’re saving everything, when you’re saving the progress from one time step to another, the file read and write, that’s single threaded. That’s not sped up at all. That was a negligible amount of time when we only have four cores, but when we have 70, that’s now the majority of the time.

And so, our cloud computing budget sort of retroactively increased until finally the game finished just before midnight and we got 86 points. A 20 point improvement on the record. Huge, huge, huge, milestone.

Felipe: This is a Friday night, right? My wife is like, are we going to do anything? I’m like, I have to stare at this screen and see if we’re getting a record. She’s like, okay, we have plans tomorrow. You remember? I’m like, yeah, this is fine. Dave and I will be done by this, by like, two in the morning at the latest, right? And so now it’s midnight. We have a world record. We can see the world record, right? And we’re like very confident there are no bugs.

David: Yes, so the issue is, at the end of the game, we now have a position, and that position is the final position of the winning game. That’s not the entire game. We need every position that every piece has gone in for the entire history of the game. To do that, we then reference all those save timesteps we made earlier, which is also a single threaded process, which is also something we didn’t anticipate.

This is even slower than going forward. We thought it was going to take at most an hour, but an hour later it was barely started. We did some back of the envelope calculations, and we predicted that it would finish by the time we woke up in the morning. It did not. By the time we woke up in the morning, it turned out our back of the envelope calculations were based on the smaller time steps at the end because the beam was running out of options; the number of wells stored at each time step was smaller. When we woke up, it was going to take us another 12 hours to finish.

Felipe: We’re looking at our budget, like. I guess we can’t stop it at this point, right?

Adam: Cuz it’s all gone, right? If like you have no way of putting it together, if you stop.

David: Correct, if we stop at any point before it finishes, then there was no point to the entire run. We need every position of every well, or else we just can’t get there.

Felipe: Mind you, this is peak software design, right? You should always design your processes so that if stopping them at any point destroys all your work, that is how you are supposed to design software.

David: No backups. It keeps you on your toes, I think. But finally, we had the record. It was 86 points. We immediately downloaded it. We got the files transferred off AWS. We stopped the money spigot going into AWS.

Adam: But they’re not done yet. Their wallets are taking a break, but the simulator outputs its internal state for each move, which doesn’t match the format that the highest scoreboard needs. That requires a move by move, replay.

David: We took our list of numbers and had a Mathematica script that turns it into a list of pictures. Each picture represents where the piece is going to go. Then we sit there on a screen share for an hour, manually hitting arrow keys to move the piece into the appropriate position. We, uh, we vowed to automate it before the next record, and then we didn’t, and the next record took even longer.

Fortunately, Hatetris has an undo button, so it was possible to fix mistakes. It also has a replay code button where you can enter the replay code of a game.

And that’s very much preferable to doing it by hand.

Felipe: Because there’s a deterministic replay code, so you can generate it by just knowing what the moves are. We just never wrote the script for it, so we were like, how long could this take?

David: Though we had it. And by this point, we had found KnewJade. We found him hanging out on a computational Tetris Discord server, and we were able to get in touch with him that way. He sent us his congratulations. He mentioned offhand that he had his eyes set on a new approach, but that, that approach was going to take a backseat for now.

The Record

Adam: They post their world record of 86 on the Hatreus site. It’s a big jump! And they’re finally landing a world record at Sam Hughes to share the news.

David: Yeah, he, he congratulated and confirmed that the record was legitimate. He was extremely impressed at the level of effort we’d apparently gone through, because we mentioned at the time that it had taken us 11 months to get this. And, yeah, from there, we write up a blog post.

Felipe: And what reasonable human beings would do at this point is stop thinking about this problem you have already solved and gotten the world record for.

Adam: Of course, reasonable is an interesting word to use when you’re talking about dedicating the resources of a small engineering team basically to an obscure game from 2010’s leaderboard. Reasonable is not the point, and so they, in fact, did not move on. In fact, they planned to give a talk about all the work they had done.

Bird in the Hand

Felipe: So, to do the DC Rust talk, we spend like three weeks planning this talk and making slides. Our talk is way too long initially, and we cut it down. There’s a whole conversation of, does our audience want to see this fancy infographic I made? Maybe not; maybe we just cut that.

And all this time we’re talking about this talk and about our project, and you look back at the project, and you look at your code base, and you’re like, well, we could have done that slightly better. Well, that’s a tweak we could have made. Now that we’re looking at it, Dave says, well, if we’re going to give this talk, it would make sense to open source our code so people can go look at our mistakes. And I’m like, well, I don’t want to do that without a README. Dave says we should probably clean up this horrible spaghetti mess over here, where we have one giant method that does everything. We could add some unit tests, maybe, just so people don’t think we don’t ever write unit tests. We don’t ever write unit tests.

While pondering all this, right, back steeped in their hatred’s work, Dave gets an idea for a new heuristic where you just look a little bit more ahead to see if you can clear a line in the next couple of moves. That gives you much more grounding for your heuristic about whether you’re in a good place. He calls it the bird in the hand heuristic, I believe, at the time. We have a whole conversation about how, when this gets published, we can name it after him because that would make a lot of sense.

Then I get a message like two hours later that says, “I explained the idea to my dad.” He says, “Oh, that’s a quiescent look ahead. That’s used in chess engines all the time.” I’m like, okay, that fits.

Adam: Your dad is a, is a chess engine person?

David: He’s a programmer, amateur mathematician, and the world’s foremost Petri nets enthusiast. I do not say world’s foremost Petri nets expert, there are people who know more than him. There is nobody who loves Petri nets more than he does, though. I grew up listening to cautionary tales about how if you’re training a genetic algorithm, you’ve got to remember to include mutations or else you’ll get stuck in a local maximum.

You learn a lot through osmosis, even if you’re not trying. And so I still, we’ll frequently call him up and talk about this sort of thing. He also attended our Rusty C talk. He’s always interested in new ongoing programming projects.

Adam: That’s wild. It’s funny to me that you could say, like, Oh yeah, and then I remember what my dad told me when I was 12, like, Consider a beam search before you go into a monte carlo approach.

David: Yeah, but that is actually the kind of advice you would have given me when I was 12. But we, for many years, did Project Euler problems together.

We’re both in the top 0.1 percent of Project Euler solvers, and Felipe can attest to getting dragged into that as well. He and I have actually solved the hardest problem on the entire site. It took us about a year on and off.

Adam: But what happens next?

David: This was basically the first and maybe only really fist-in-the-air moment. Where you’re just, you can’t believe how well something’s working. Our huge beam search, the one that caught, took days to run, which was a width of 25 million. So we were looking at 25 million positions in parallel.

Adam: So they test their new heuristic. They try it with the width of 100,000. That’s 250 times fewer games in parallel than their big AWS run. But they have this better heuristic, right? So they’re searching through this tree of possible games, but they’re navigating towards these longer runs. They’re simulating better games. And so, even though it’s 250 times fewer games, they get a score of 82 points. Nearly their world record.

Felipe: And, to be clear, I did not believe this number. Dave told me this number; I was like, Dave, we have a bug somewhere. That can’t be right. It’s impossible that this heuristic got that good that fast.

David: And I disagreed until we ran it for a million. And for a million, it gave us 148 points. Now, I am also convinced that there’s a bug in the emulator somewhere, because after Rust DC, we did a bunch of optimizations to make it faster, and I am convinced some optimization we did caused there to be a bug and that bug caused the game to be easier.

And it wasn’t helped by the fact that we actually did have a bug earlier on in the implementation that we didn’t catch for a while; it had been giving us false games.

Adam: So the thing to do once again, they need to put together the gameplay. They need to verify the record so.

Felipe: We jump on a zoom call, and we said, we really, really should have written that script last time. Why didn’t we do that? Dave, very sensibly, said because we thought we were done. We weren’t going to do any additional work and never ever again visit this project again. And I said, yeah, okay, let’s, let’s make sure we write it next time. Um,

So you just got a screenshot, but you just got an image that shows where the piece has to go. The shape of the well, essentially.

David: But it doesn’t show how to get the piece there. There’s a bunch that require complex rotations to get past tight corners and all that sort of thing. And our script did not include any of that because if we’d done that, we would have just written an automatic replay code generator, which we after this we actually sat down and did because that was too much.

It takes us about 2 hours to actually input the whole thing.

Felipe: Mind you, every time we make a move, I’m like, this is clearly where we’re going to find the bug, right? Like we’re just joking about that every few minutes where it’s like, obviously this move is illegal. This is not going to work.

David: And so, the old record was 86, and this was 148, which was the largest increase in Tetris history, either in absolute terms or proportionally.

There is no way anybody will top that. That’s beyond any of our predictions for the maximum possible score for this game. It’s over. There’s no way.

Felipe, what happens a week later?

Felipe: Someone gets an even higher score, and this one’s kind of on us. Dave has a friend that he does Advent of Code with. Sometimes Dave hasn’t said this, but he’s often on the Advent of Code leaderboard because he’s very good at that type of programming. I blame his Mathematica skills and his desire to be up at midnight solving a problem.

Dave is having a chat with his friend. Did you show him the Rusty Ctalk, is that right?

David: Yes, he saw the Rusty C talk independently, and then he messaged me.

Felipe: Yeah, so this is Tim. They’re talking, and then says, Hey, I saw your Rusty Ctalk, and I actually know how to write Rust really well. So, I made my own version of the emulator. And then he gets 157 points. After Dave tells him that the bird in the hand here is sick, because that’s just the type of people we are.

Adam: It feels like your dad should have had a recommendation for this. If you’re in a competitive coding contest, don’t tell the other people on the leaderboard your tricks.

Felipe: I can’t endorse that because every single time we’ve worked on a project and we’ve been doing something really hard, and we’ve talked to someone in the same field, they’ve been overwhelmingly happy to share whatever it is they’re working with in amounts of detail that I consider like actual work. Like, it was like, here’s what I’m doing, here’s how I’m doing it, here’s what I’m thinking. Let me know if you want me to look at your code type stuff.

So we take the same attitude. If someone asks us what we’re doing, we’re going to tell them in as much detail as we can, because yeah, it’s a competition, but the goal is not to win. The goal is to defeat hatred at its core. Right, and like the break the will of the game, because that’s what we’re all after.

Dave and I now spend the week sitting on various discord chats as 170 gets posted and then 232 gets posted. And we’re like, we really need to come up with a better way of doing this.

Adam: Are these all your friend’s?

Felipe: It’s all Tim. And Tim has a better computer than we do; I will give him that part. He’s also better at Rust than we are. He’s to ask Dave, “Hey, I see you guys did, what did we do? Used a priority queue, but you don’t need to do that.”

You can just do this. And we’re like, “Oh yeah, I guess we could have done it that way. That would be much faster. Yes, thank you, Tim.” And so he’s tweaking his heuristic. He’s changing it. He’s posting more scores, and Dave and I are seething is not the right term, but we’re definitely sitting in Zoom chats.

We’re like, “Okay, what do we do to like, we can’t beat them on technical skill or on? We need to do something to change things up, and we got nothing. Our conversations are like, ‘Maybe we can do an analysis and do a thing?’ So, do you remember what day of the week it was? Tim posts 232 points?

And we’re like, “Oh, gosh. Okay. Well, I don’t know if we can break the 200-point barrier. Then, 12 hours later, KnewJade shows up on Twitter and posts a score of 289 points!

We talked to him, we shared our heuristic, and he did something radically new, right? He used an entirely new set of techniques, but basically he posts, “Alright guys, this is it, this is the definitive score,” and we say, “Yeah, okay, I guess we’re done working on Hatetris.”

Knewjade’s New Breakthrough

David: So, we started with neural networks and we abandoned it for KnewJade’s heuristic approach. And Tim also used the heuristic approach; he was just better at it than we were. While we were doing that, KnewJade was busy abandoning his heuristic approach in favor of a neural net approach.

But the neural net approach was very different. So, AlphaZero is designed by Google. It’s designed by people who have millions of dollars of hardware. There is another and objectively better algorithm for neural net learning of chess that was developed by a Japanese Shogi enthusiast. Shogi does not have anywhere near the resources that chess and Go do, and this is the algorithm. It’s called NNUE. It uses sparse neural networks to dramatically accelerate how much effective compute you have. If you have a huge neural network, and 99 percent of the inputs to that neural network are zero, your network is effectively running a hundred times faster than it otherwise would be. And he used this to make a world record-setting Shogi program.

And the Shogi program has a name that comes right out of anime. It’s called the End of Genesis TNK Evolution Turbo Type D. That’s the actual name of the program, and we had never heard of it. This came out in 2018. We had never heard that this was a possibility.

Adam: So yeah, KnewJade documented this approach for the record, which is how Dave and Felipe learned about it. He incorporated David’s ‘bird in the hand’ strategy with his new neural network approach.

And this all led to the discovery of the very first loop. And we’ll explore loops shortly, but by incorporating Knewjade’s insights back into their code, David and Felipe quickly hit an even higher record, 366, taking the record back from him.

And then around this time, I pinged them about maybe doing a podcast episode.

Loops

David: So in the same way that preparing for the Rusty C talk got us talking about the problem again, when you messaged us, that also got us talking about the problem again.

And so we started looking, not necessarily at a new approach but a new way of analyzing the data we already have.

Okay. And we actually have a graph of what are called loops. Now, in theory, if you could have a position in Hatetris and then get back to that same position, you could run forever. And to prevent that, the Hatetris algorithm has a built-in loop prevention rule which, if it sees that you’re going to get to a well you’ve seen before, will give you some other piece so you can’t get there. KnewJade’s 289-point record was also the very first record to ever discover a loop. And soon after, Tim discovered his own loop. And soon after, we discovered some of ours.

These were moves which, if that loop prevention rule wasn’t there, would get infinite points. You could just play forever.

And so then the question we’ve been asking ourselves for the last couple of weeks, and what we’ve been investigating, is how many loops are there? And what’s the structure of these loops? What kind of well leads you to a loop? Because obviously, if you can figure that out, you are a very good way towards understanding the game entirely.

Adam: This shifts Hatetris from just being a machine learning, a game search problem, to something that David and Felipe are quite good at: pattern analysis.

You can only take each loop once. But if multiple loops start at the same spot, if you can force Hatetris to send you on a second loop, the game changes.

Learning from Each Other

Adam: It’s exciting to see these ideas sparking off one another. Sharing leads to learning, and everyone’s growing from each other’s insights. Each published idea is building on the last.

David: Our current approach, holding the world record, is based on KnewJade’s approach. KnewJade’s is based on Tim’s. Tim’s is based on ours again. Ours is based on KnewJade’s again. KnewJade’s approach, for 32 points, all the way at the beginning of our involvement, actually did not come up with the idea of using a beam search for Hatetris. That was another programmer by the name of 3Pipes.

3Pipes, in 2019, for a class project, decided to write a Hatetris solver. He made an attempt at it. He extracted the Hatetris emulator from the original JavaScript rather than writing his own, which was a very interesting approach. He made a Beam Search implementation, and it didn’t get very far. He got a score of 16 or 17 points. He finished it for his class. He never touched it again. He moved on with his life. But he wrote down what he did, and he published it online. And KnewJade found it in 2021. And that’s what gave him the idea to start the beam search that led to his world records, and that got this whole thing started off.

It’s not all about flashy thousand-word blog posts; it’s just about coming up with some idea, implementing the idea, and then telling people if it worked or not. Hatred does not matter in a very real sense. This record is completely irrelevant. Nobody but us cares. The concept of keeping a continual chain of knowledge going, of reading what other people have done, and then writing down what you’ve done differently and what you’ve done the same, and whether it worked, and passing that on for the next person, that’s something that’s universal, I think.

Adam: David and Felipe are impressive. It’s amazing how they tackled a specific game problem, pushing the boundaries and making unique discoveries. And yet, the thing that they want to make clear is they are not, in fact, specially qualified to take on tasks like this.

Felipe: It boils down to, if you’re working on a project and you are not the expert on the project, there is a large community of people somewhere, or a small community of people somewhere, who’s probably thought about this problem before. They probably put time and energy into this problem before.

And they’re delighted that someone cares. And so if you show up and you send them a nice message and you say, “Hey, I saw your post about, Oh, let’s say TrackMania. I see you tried this on TrackMania and we’re trying a similar thing. And I was curious about your approach.” They will reply to you. And if they haven’t touched this project in three years, they will tell you as much as they can about it.

And those types of like interpersonal connections with other people across communities, working on things, ties into the idea Dave just gave, but I think it’s even more fundamental in the fact that like people who care about projects are happy to share their knowledge, and if you just seek them out, they will share it with you freely, happily, without any particular complaints.

Adam: And so, yeah, this is the story. This is the story of how two friends who couldn’t understand the AlphaZero paper, but thought, hey, we’ll implement it anyways, got to the frontier, got to the coalface of game search trees, or whatever this field is called. They just kept pushing it, even if occasionally, Felipe had to check in with his wife.

Felipe: Oh, she was, she indulges my, I am doing a project with Dave, because she doesn’t want to hear anymore about Hatred at the Dinner Table, so it tends to be a lot of, are you doing stuff with Dave? Yes, okay, I will leave you to it.

Adam: Hey, yeah, I don’t need the details. On your side, Dave? Is there, uh, I don’t know what your situation’s like. Is there any relationship strife ever caused by these projects?

David: No, I can’t, I can’t say that there is. Uh, I, I, I expect it could happen at some point because I can’t see giving up doing these projects at 3 in the morning anytime soon. Uh, but, I’ll cross that bridge when I come to it.

Adam: Where’d you work, David? Like I missed it.

David: Oh, currently nowhere. I am looking for work. The most recent place I interviewed with, I had five interviews over the course of two months. If you have five interviews, including one where you get a full site tour and it’s a three-hour process, you really are getting your hopes up, even though you’re telling yourself not to get your hopes up. They said, basically, hey, within a week we will get back to you one way or another on the final offer. And that was three weeks ago. I’ve emailed them back. I just never heard anything.

So that’s why we do this sort of thing to distract us. At least that’s why I do it. I can’t talk for Felipe.

Adam: Beyond hitting 366, Hatetris offered David and Felipe a rare escape. Right? It was a place to bond and experiment and swap stories late into the night.

I feel like there’s something deeper here. The reason we chase these seemingly trivial puzzles, I’ve been known to do it myself, maybe not quite as far, but the reason we can pour months of our lives into side projects, into a JavaScript game from 2010. It’s because there’s something fundamentally human about this process of striving. It’s about curiosity and the joy of discovering, and about the thrill of pushing up against the boundaries of what seems possible.

And sometimes it’s simply about finding a good excuse to crack open Discord late at night and tackle something impossible side by side with a very close friend.

Outro

Adam: That was the show. Thank you, Felipe and David. I hope your quest continues. After the interview, I got to hang out with them for a bit. I was shown some of their unpublished work on loops. You know, David showed me a Mathematica map of these loops, and it looks like some demented subway map. But they’re working on something that uses that a new way to break ground.

I’m sure if they do, they’ll post the results on their blog, and by the time you hear it, uh, it might already be up. So check out Hall of Impossible Dreams to see their latest projects. Sometimes it’s hatetris, sometimes it’s the dead internet theory, sometimes it’s Superman fan fiction.

There’s always something they’re sharing and something they’re working on. And in my short time with them, I saw how unique they are, that they’re both very different people, but in a way that makes them greater than the sum of their parts.

And if you’re listening this far, I’d be remiss to not mention that you should join as a supporter of this podcast.

I know there’s somebody out there working on a side project who needs to hear the story of these two. Maybe it’s you. But stories can be powerful, and stories can be something that sticks with you and allows you to learn from somebody else’s experience. So that’s what I’m trying to do here.

If you enjoy it, support me, or just listen. That’s why I always end by saying until next time, thank you so much for listening.

Support CoRecursive

Hello,
I make CoRecursive because I love it when someone shares the details behind some project, some bug, or some incident with me.

No other podcast was telling stories quite like I wanted to hear.

Right now this is all done by just me and I love doing it, but it's also exhausting.

Recommending the show to others and contributing to this patreon are the biggest things you can do to help out.

Whatever you can do to help, I truly appreciate it!

Thanks! Adam Gordon Bell

Audio Player
00:00
00:00
48:11

HATETRIS