CORECURSIVE #053

Unproven Technology Case Study

Choosing the Right Tool for the Job with Sean T. Allen

Choosing the programming language or framework for a project can be critical to the success of the project.

In today’s episode, Sean Allen shares a story of picking the right tool for a job. The tool he ends up picking will surprise you.

His problem: make a distributed stream processing framework, something that can take a fire hose of events and perform customer-specific calculations on them. But the latency needs to be less than a millisecond and the calculations might be CPU-intensive. Who would need something like this? The initial use case was risk systems for Wall Street banks.

Transcript

Note: This podcast is designed to be heard. If you are able, we strongly encourage you to listen to the audio, which includes emphasis that’s not on the page.

Intro

Adam: Why don’t you state your name and what you do?

Sean: I’m Sean Allen. Until a couple months ago, I was the VP of engineering at Wallaroo Labs. There’s a standing fan about seven feet away. I assume you’re not hearing that?

Adam: I don’t hear a standing fan. It does sound echoey. It’s just-

Sean: I have no non-echoey rooms. It’s basically a couple large open spaces here in Brooklyn, right by the bridge.

Adam: Sean held his laptop up to the window and showed me what I assume is the Brooklyn Bridge. Oh nice. Sean is also recovering from COVID.

Sean: I had all of the symptoms except for the smell thing, but it was relatively mild. The highest my temperature ever got was 99.9. I felt like I had a bad case of the flu for the most part. Now I feel like I have a mild case of bronchitis.

Adam: Hello and welcome to CoRecursive. I’m Adam Gordon Bell. Today, Sean shares a story of picking the right tool for a job. The tool he ends up picking will surprise you. What happened is, Sean wrote a book about real-time data processing. The book is called Storm Applied: Strategies for Real-time Event Processing. The details of Storm don’t really matter here, except to know it’s an Apache big data project. It is written in Clojure and runs on the JVM.

What happened is, after he wrote the book, a company called Wallaroo Labs hired him to build a system like Storm, but much faster. You started at Wallaroo Labs and then what happened next?

Sean: We went through a couple iterations of stuff with them and then decided that in order to meet the needs, we were going to have to build something from scratch. Build a framework which was designed for these low latency type use cases where as part of it as well, you want it to be as efficient as possible.

Adam: His problem, make a distributed stream processing framework, something that can take a fire hose of events and perform customer-specific calculations on them. But the latency needs to be less than a millisecond and the calculations might be CPU-intensive. Who would need something like this? The initial use case was risk systems for Wall Street banks.

Building A Risk System

Sean: A risk system could be one which runs alongside automated systems and analyzes the trading output coming out of the automated system to make sure it looks within some realm of reasonable and could then shut down trading, for example. There’s a whole bunch of different types of risk things. Perhaps the most famous is … Have you ever heard the Knight Capital story?

Adam: No.

Sean: Knight Capital went out of business because an automated system started doing trades that it wasn’t supposed to do due to a configuration push to production that went wrong. In the space of 45 minutes, it put them out of business.

Adam: This stream processor needs to answer queries in a millisecond 99.99% of the time. Median response time in the microseconds, and it needs to be able to receive hundreds of thousands of requests per second. It needs to be fast enough to pull the plug on a high frequency trading system that’s gone off the rails, so what would you do? What language or framework would you think about using? Let’s play along and see if you end up where Sean and his team did.

Sean: We spent a good amount of time spec’ing out, “This is what we think this needs to be able to do.” And looking at, what is the language or libraries that we want to build this on top of? I mean, the number one way to do low latency is to cut down on network hops. That’s one of the first big things. Even though it’s a distributed stream processor, we wanted to be able to do as much processing as possible on each machine and cut down on the number of network hops you’d have to have.

Adam: Network calls are just slow compared to direct memory access or inter-process communication. You can’t scale out to speed up latency as you’re just adding more network hops. The more you can do on one machine, the faster your distributed processing system is going to be. Maybe this is the reason that Storm isn’t a fit here. How come Storm can’t handle this?

Sean: Storm wasn’t designed for these systems which basically had to be very efficient and very low latency. Storm and a lot of the other Apache big data projects, but particularly Storm, are designed more as a parallelization engine than a low-latency data processing system. If you look at the early things of Storm you’d be like, “Here I have some algo, some compute transformation I need to do, and I need to be able to run it on 50 machines because it won’t run fast enough on my machine.”

In some ways, being a bit of a real-time replacement for something like Hadoop, right? Which again is the same type of thing, which is you build that very differently when you’re mostly concerned about, “I just want to parallelize this so I can get it done.” As compared to, “Hey, I need to get this done within a couple of milliseconds.” Right? A lot of the things we do for bank clients that we’re looking at, 99.99% of requests had to be processed in one millisecond or less, right?

Adam: Wow.

Sean: Right. I mean, you’re talking about systems like that and that’s just not something that Storm was built to do. Storm was built to do stuff where you’re talking about probably depending on your thing, your median latencies are going to be in tens of milliseconds probably. If it’s a small thing, it might be single digit milliseconds, but you’re not really looking at, “Hey, we want to have like 15 microseconds be our median latency for a lot of the stuff that we’re doing.”

I mean, it just wasn’t designed for that kind of thing, but I mean, if you were designing it for that type of low latency, you wouldn’t use Clojure for example. Writing low latency stuff for the JVM in general is difficult. If you write it like a normal Java program, which in a lot of ways Storm was written like a normal Java program, you’re going to have a lot of object allocations, which is going to involve a lot of memory pressure. You’re going to involve a lot of garbage collection. Clojure, for how it does stuff, makes that worse.

Those are going to be problems for building something like what we built with Wallaroo, for a variety of reasons. It’s that they didn’t set out with a goal of building a system like that, so they made choices which wouldn’t result in a system like that.

High-Performance JVM Applications

Adam: Yeah. I guess I don’t know this area, but a lot of these projects that are on the JVM, they all end up manually managing memory with some sort of off-heap tricks, I understand.

Sean: If latency is a real concern, you’re doing stuff like that. Yeah. I mean, if people are interested in high-performance Java stuff in a code base which is pretty easy to follow, there’s the Aeron project, a message broker which does stuff over its own custom UDP protocol, and Martin Thompson is one of the big people behind it. That’s a great project to look at for doing that type of stuff on the JVM, but it’s definitely not normal JVM programming at that point.

Adam: I think like if I were … Let me take the sample of backend developers out there and I pull one out of the hat and I ask them to build this. I think depending on the age, probably a lot of people would go with C++ to build something like this.

Sean: We definitely considered C++. I’ve done a lot of C++ programming in my life. I don’t think I’m good enough to do it with C++. I was really learning and doing a lot of high-performance C++ stuff in the 90s, and high-performance C++ stuff then is entirely different than now. What we were doing that needed high performance with C++ then, you could just write in plain old Java fashion now and be completely fine. The definition of high performance has definitely changed over the course of time.

But doing this stuff in C++, it was all multithreaded and in order to go fast, you need to not copy memory, which is also where you get into trouble because you can have data races, et cetera, where you need to be careful about how you’re sharing memory. And to make sure that you don’t corrupt the memory, to make sure that thread one over here doesn’t free memory that thread two is still using. There are a variety of ways you go about doing this, usually with locks, et cetera, these types of things.

Still, even with doing those, an awful lot of people … The number one way that you see these bugs come about is segfaults: program crashes from segmentation faults. A lot of people develop different rules for how you can share memory or what you can’t. I had this whole set of rules, they were in my head, but you can use tools like Valgrind, you can use sanitizers and everything to try and find where you violated those rules. In the end, it’s on you to carefully follow those rules.

If you don’t and you built an awful lot of code, and you’re not regularly testing it with stuff where it’s going to find your one little mistake that you made at some point, the further in the past that that mistake was, the harder and harder it’s probably going to be to find. We didn’t think that we could hire enough good C++ people to be able to do that. So while we still kept C++ around as an option, we really wanted to have something that had more memory safety baked into it, where the compiler itself would say, “Hey, that’s problematic.” Right?

Adam: Yeah.

Sean: That’s something that Rust is definitely, in part, one particular approach to trying to solve.

Considering Rust

Adam: Yeah. That’s what I was going to say. I think when people start talking about data races, Rust people talk about this all the time, right?

Sean: Yeah.

Adam: This is a feature they bring to the table. Did you consider Rust?

Sean: We did and Rust was a strong consideration. The issue there was that there was no high-performance runtime to do what we want to do. We would have to write that runtime. Our estimate was … And again, this is an estimate where we spent probably a week to two weeks trying to spec out roughly in … not even t-shirt size, bigger than t-shirt size, how long we thought it would take. We thought it would probably take anywhere from 12 to 18 months to have a really good, solid runtime.

Adam: What’s a runtime? A scheduler?

What Is A Runtime?

Sean: Every language has a runtime. They just don’t necessarily know it. I mean, a runtime is a number of things. It’s memory management. It’s a scheduler. What your particular runtime provides might vary, but yes, definitely scheduling and memory management are probably the two biggest ones. If you’re doing high-performance stuff, then also you’re probably going to be doing stuff asynchronously in some type of thing or some type of message passing type thing so you can hand stuff off, maybe you’ll be using channels like Go does or something like that.

Then, okay, what’s the communication mechanism between threads as well? Having something for that.

Adam: Yeah, because it seems like you need a runtime for handling concurrency?

Sean: The thing that people don’t think about usually when they’re first starting out with stuff, until they’ve worked in an ecosystem for a language that doesn’t have a set concurrency model that comes with it, is that you end up having different communities that develop where they have a concurrency model and there are libraries that work with that concurrency model. Like Rust has Tokio now, which has a specific concurrency model built into it. Libraries that might be written to use an entirely different concurrency model are not going to work with Tokio probably, and vice versa. You see this in C++ where there’s a whole bunch of different concurrency libraries and they don’t work well together, you know? If they do, they’re usually stepping on each other and that becomes a problem for high performance.

Adam: Yeah. Like I think of … I’m a Scala developer day-to-day mainly, and there’s Akka people who do actor stuff and then there’s other people who do other stuff. There’s a number of communities, I guess.

Sean: Like the JVM is a runtime and it provides a memory model for how memory works. It provides a basic concurrency model, which is, hey, you build on top of threads and then you use locking primitives in order to do this. Akka wants to, in the end, have a different model that they want to build on top of that, but I don’t know, have you done much Akka programming?

Adam: I haven’t actually.

Sean: Okay. One of the things that comes up a lot is, “Ooh, be aware of this when you’re doing Akka stuff”: make sure that you don’t inadvertently capture values or references to objects and send them from one actor in Akka to another. Because now you can have two actors that are both able to modify the same data and you now can have data races, et cetera, and everything, which is a problem. There’s not a lot that Akka can do about this because, in the end, a single global memory space is something which the JVM allows.

You would need a special Akka compiler in order to prevent programs that do that inadvertently from compiling, which if you’re building a library, you don’t really want to have to have, “Hey, here’s my compiler for it.” This is a thing where they’re trying to overlay a somewhat different idea of concurrency and a runtime idea on top of a different runtime and running into some issues there.
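To make that failure mode concrete: the bug Sean describes is sending a reference to mutable state from one actor to another. For contrast, here is the same shape sketched in Pony, the language the episode gets to shortly. The names are illustrative, not code from the episode; the point is that anything sent between actors must be sendable by type:

  // A mutable object. Its default capability is ref: readable and
  // writable, but not sendable across actor boundaries.
  class Counter
    var n: U64 = 0

  actor Worker
    // Behavior parameters must be sendable: iso, val, or tag.
    be take(c: Counter iso) =>
      c.n = c.n + 1

  actor Main
    new create(env: Env) =>
      let w = Worker
      let shared: Counter ref = Counter
      // w.take(shared)            // compile error: ref is not sendable
      w.take(recover Counter end)  // fine: a unique iso can be handed over

The special Akka compiler Sean says you would need is, in effect, what Pony’s type checker already does.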

Adam: In other words, Akka runs on the JVM, which doesn’t have first-class support for actors. You can make it work, but the runtime is thread-based rather than actor-based. Rust on the other hand tries to have a very minimal runtime environment. Sean feels that that means he needs to build these things himself, like a scheduler or error handling, or maybe even garbage collection, which makes me want to ask about garbage collection itself.

Trash Day

Sean: There’s an awesome paper. I was thinking before this. I’m like, “Am I going to get through this without mentioning it?” No, I can’t get through anything that’s on this topic without mentioning it. It’s a paper, it’s called Trash Day. I don’t know if you’re familiar with it, but folks who are listening might not be, which is really about how do you get maximum performance out of a Hadoop cluster?

Adam: Why is it called Trash Day? The paper.

Sean: Well, because it’s about handling garbage, right? When you put out your garbage that’s trash, like when you live in the suburbs and there’s like three days a week when you have to put the garbage out? It’s trash day.

Adam: Yeah.

Sean: Then it gets taken away.

Adam: In other words, in a distributed system on the JVM, a GC pause causes a slowdown: work piles up or work slows down. The paper makes Hadoop faster by having everyone GC at the same time. That’s your trash day. You get more throughput, but you still have latency issues when that GC happens. The point for us is, the JVM and its runtime won’t work for this use case, even with a performant actor system like Akka. All right. So far we’ve crossed C++ off the list, Rust off the list.

Now it sounds like anything JVM off the list. Sean, did bring up actors though, which gives me a clue about the direction he’s thinking.

Considering Erlang and BEAM

Adam: Then I assume your concurrency model is going to involve actors of some sort, I guess, right?

Sean: The concurrency model that I really like is that you have something … You start with, how many CPUs do you have? You have a single thread that does work per CPU. You lock it to those CPUs and you don’t let anything else use the CPU. If you want to go really fast, you can use something like cset to actually set those CPUs apart so that Linux won’t schedule anything on those at all. They’re purely for your program. It can start up and it can have like 12 CPUs that are all for itself. You have one thread per CPU, which will be responsible for running work.

You have something, and it could be actors, whatever your abstraction is over the top of it. But you give people some unit of parallelism or concurrency that they can program to. That’s the model because I’m particularly interested in making things go fast, but yes, I happen to like actors. For me, it’s a really good conceptual model. Although, I’ve seen that lots of people definitely struggle with trying to figure out how to model things for actors, which in a lot of ways I think is because actors are really all about doing things asynchronously, and the way most folks have been taught to do programming is in a very synchronous fashion.

Really thinking about concurrency where things are happening asynchronously can be really difficult for a lot of folks.

Adam: All right. That comment makes my mind go straight to this runtime that was built from the ground up to use actors for concurrency. I guess if you’re going to embrace actors, I mean, Erlang must’ve been a consideration?

Sean: Yes. Erlang was a consideration. We didn’t think that we could get the performance that we needed out of Erlang. Erlang was designed more for consistent latencies rather than consistently low latencies with lots of throughput, which is slightly different. I mean, one of the great things about Erlang is if you graph what your latencies normally are, they’re just flat in a way that you don’t get from the JVM in general, because of garbage collection strategies that are commonly used on the JVM.

Whereas, the garbage collection strategy on Erlang is very different with the message passing and everything. It results in very consistent performance all the time. It’s just that Erlang was not designed to be a high-performance language. The throughput isn’t there, but yeah, Erlang was something that we definitely considered. More than one person who was on the team had prior, in some cases, large amounts of Erlang experience.

Adam: It just won’t hit your one millisecond.

Sean: You can. You can, but it might not hit your thing where we were doing one millisecond at the 99.9 percentile while processing 3 million messages a second in a virtualized environment in AWS. We probably wouldn’t be doing that amount of throughput with that latency. That wasn’t going to happen. The per core amount of computation that you could do with Erlang is in general going to be less than what you would do with C or C++, because again, it had different goals when it was designed.

Adam: It seems like Erlang might not be a fit, but there is this company called Basho that makes a really fast distributed database, all using Erlang.

Sean: For us when we were looking at doing stuff for a Wallaroo, we really liked actors. Actors worked well for us, for how we think about things and modeling them and so Erlang was a natural thing that we were interested in. It was, let’s go talk to the folks that we know at Basho and go, “Here’s what we want to do. Do you think we’ll be able to easily get Erlang to do that?” The answer that came back was, “We love Erlang but no. No, we don’t think you’re going to be able to make Erlang do that easily.” You know?

Introduction To Pony

Adam: All right. Not C++, not Erlang, not Rust, not Java, not Scala, not Akka. I’m running out of guesses. Let’s just cut to the chase.

Sean: Very little of interest has ever happened to me on LinkedIn, but Sylvan, I’ve known Sylvan since he was … He was 16 and I was 17 when we met, but we hadn’t talked for a number of years because Sylvan is very bad at email. I sent him an email and he never replied so I assumed I had done something to irritate him. I didn’t hear from him until he sent me a LinkedIn message that said, “Hey, look what I built.”

Adam: What he had built, what Sylvan Clebsch had built, was Pony. The love child of Erlang and Rust: no data races, shared memory without copying, and all based around first-class actors and something called reference capabilities. The first thing I think of is I’ve never heard of this language Pony. It cannot be a legit choice to bet a company on, but Sean sees it differently. I’m imagining this, right? I’m imagining the story and I just imagine you’re like, “Erlang, it’s used in production. Lots of people use it.”

The people who really know it say it doesn’t fit, but you’re like, “Actually, a guy I knew when I was 16 built something that I’ve never heard of, let’s use that.”

Sean: If I hadn’t known Sylvan then I wouldn’t have heard of Pony and it wouldn’t have been a consideration, but I mean, one of the other serious considerations was that we use C++, or we use Rust, and so in a lot of ways, I mean, we were very nervous about picking it, but we dipped our toes in. It was Pony compared to Rust and writing our own runtime from scratch; that was the biggest consideration there. Rust had a bigger community at the time, but Rust was still a very, very, very small community then. Really small.

It’s picking up now, but even though it’s got a huge amount of mindshare on things like Hacker News or whatever, the actual community itself is really small when you compare it to a lot of languages. I’m pretty sure that I know way more Scala programmers than I do Hacker News programmers … I’m sorry. Rust programmers at this point.

Adam: Hacker News programmer that is for sure a Freudian slip. Anyways, Sean chose Pony, a language written by his high school friend. Some might say that is a huge risk, especially since the whole company was this product. I think we need to learn a little bit about Pony to understand this choice. Then we’ll come back around to, did this work out for Sean and Wallaroo Labs? What did you get out of Pony?

Sean: We got a compiler which won’t let you create data races and will allow you to share memory in a concurrent system in a way that’s safe, so that you don’t have to do copies, which allows you to go faster. We got a runtime which met our general idea of how we would want to go about writing both the runtime in terms of scheduling and basics for memory allocation, so that we didn’t have to spend that 12 to 18 months writing our own.
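Concretely, that safe sharing without copies comes from Pony’s reference capabilities. A minimal sketch of the idea, with illustrative names rather than Wallaroo code: an iso reference is guaranteed globally unique, so the compiler lets one actor hand it to another without copying, provided the sender gives up its own alias with consume.

  actor Worker
    let _env: Env
    new create(env: Env) => _env = env

    be process(data: Array[U8] iso) =>
      _env.out.print("got " + data.size().string() + " bytes")

  actor Main
    new create(env: Env) =>
      let buf: Array[U8] iso = recover Array[U8] end
      buf.push(42)
      // consume transfers ownership; any later use of buf is a
      // compile-time error rather than a data race at runtime.
      Worker(env).process(consume buf)

Because the sender’s alias is dead at compile time, the runtime can skip both the copy and the lock on that path.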

Embracing New Tech

Adam: You mentioned fixing compiler bugs.

Sean: Yes.

Adam: I mean, that would frighten me from wanting to take on a language, I guess.

Sean: I think that is a thing that should frighten you. It should certainly be. You should go into that with eyes wide open. All of us who worked on Wallaroo in the early days have a bit of scar tissue where even though none of us had hit a compiler bug in forever, we were still like, “Is that a bug in my code or is that a bug in the compiler?” That thought would cross your mind all the time because it’d gotten in there. At least part of the way I look at it is yes, Pony was definitely unproven technology and for whatever your definition of unproven is, an awful lot of things are still unproven that a lot of people are comfortable with now.

One of the things that people don’t think about when deciding like, “Oh, I don’t want to use that thing because it’s unproven.” Is that if their alternative is build it yourself, your thing that you’re building is also unproven, right?

Adam: Yeah.

Sean: It becomes a matter of certainly building it yourself, you’re going to probably understand the thing much better if you build it yourself, which is why when we took on building in Pony, we considered that the language, the compiler and the runtime were part of our project. This was code that we were starting from and we were looking at it as, “Imagine that we’re starting our thing right here, are we comfortable with this being part of our code base?” The fact that it was such, and still is such a really nice clean C code base for the core implementation of stuff was something that made us comfortable.

There are an awful lot of things that I’ve worked in the code bases of over the years where I would not be able to make that statement where it’s just a jumbled mess. It would be a bad idea to take that thing on as a core part of your thing. I mean, that’s really dependent there, but it’s like, hey, if like part of the choices is we’re going to do this in Pony and we’re going to potentially have compiler bugs versus we’re going to build an entirely new runtime in Rust.

Lord knows how many bugs we’re going to have in our runtime. The likelihood of compiler bugs no longer becomes as much of an issue when you look at it as a trade-off between those things. Yeah.

Adam: It’s interesting that you successfully embraced Pony because I have to assume that there’s limited packaging support in Pony?

Sean: Oh, yeah. I mean, it’s right there on the website. Like, “Hey, batteries not included.” You’re writing almost everything. If you’re concerned with performance, you’re probably going to write almost everything anyway, at least anything that’s going to be in a hot path. That becomes much less of an issue. If you just want your machine learning thing up and running it’d be the wrong thing to use.

Beautiful Code

Adam: One of the things Pony is famous for is this quote from Sylvan Clebsch, Sean’s LinkedIn buddy. Let’s paraphrase it.

Sean: Basically, it’s: programming languages are tools. It’s not about ergonomics. It’s not about developer experience. It’s not about all the things that we normally talk about. It’s about getting the job done, for whatever that means. It’s a means to an end.

Adam: Yeah. It’s an interesting perspective, right?

Sean: Nobody when we’re designing Pony or anything is like, “Oh, let’s make it ugly.” For whatever we think ugly might be. Ooh, whatever it is that gets … I almost made fun of the developer experience UX people there, which is bad. I just would have fallen into making fun of it, because I just don’t understand it. There is something that happens for those people when they’re using a language that they love in this type of way, like Ruby, that I just don’t understand. In the same way that I have a friend who Ruby drives him up the wall and he just can’t stand it.

He finds it horrifying to work with, for reasons I also don’t understand. To me, it’s just like, “Well, I wish it had more things to tell me upfront that I was making an error.” But yeah. The tool.

Adam: I get it. I do get the beauty perspective. There’s the Haskell definition of quicksort, and it’s really small and it just looks like a spec. Then they’re like, “Well, this actually doesn’t work at the performance level.” Then there’s an optimized version where they have to do a bunch of stuff, and then it becomes much less readable as a human, right? I suspect when people talk about beauty, they’re talking about like, “Hey, this very concisely reflects what I would like the machine to do.”

Sean: Perhaps. Yes. I think then that … I mean, that’s certainly in the eye of the beholder, right? Because one person is just like, “I want it to sort a list.” The other person is, “I want this to be sorted by as efficient a means as possible. Therefore, I have a pretty good idea that doing it like this, this and this, rather than letting a compiler decide, will result in better stuff.” I mean, at that point, that could be beauty. You know? I mean, it’s a matter of context, from where you are on the ladder of abstraction or whatever it is, for what you’re really interested in.

Adam: I think that there’s even a bigger point wider than performance if you have a really hard problem, you have to optimize for solving that hard problem. Does that make any sense?

Sean: I believe I understand what you’re saying. Yes. I mean, your hard problem is your primary thing. You want to solve the hard problem, ergonomics is going to be somewhere down the line, right? Ergonomics is never the top thing for probably anyone. There are other things that are first. For me, beautiful is, I’ve written you, you work.

Who Should Use Pony?

Adam: Hey, what about who should use Pony? You’re behind Pony now. I believe you’re … Are you like … You’re invested in the language? Who should use it?

Sean: I mean, Pony particularly is good at doing things which are operating over TCP over a network. If you were in a bank and you needed to tap into a network card in order to monitor stuff that’s flowing by, to make sure that there’s not something unusual happening on your network, Pony is great for that. If you’re building network servers that need to be high performance, then Pony is excellent for that. Because of the concurrency model, once you get over the hump that a lot of people have of having to do everything asynchronously, the performance is usually much easier to get in Pony than in an awful lot of other languages that I’ve ever worked with.
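That hump is easy to show: a Pony behavior cannot return a value, so you get answers back as more messages. A small sketch of the style, with hypothetical names:

  actor Averager
    var _sum: F64 = 0
    var _n: U64 = 0

    be observe(x: F64) =>
      _sum = _sum + x
      _n = _n + 1

    // Behaviors are asynchronous and return nothing; to "ask" an actor
    // something, you hand it somewhere to send the reply.
    be mean(who: Reporter) =>
      who.report(if _n > 0 then _sum / _n.f64() else F64(0) end)

  actor Reporter
    let _env: Env
    new create(env: Env) => _env = env
    be report(m: F64) => _env.out.print(m.string())

  actor Main
    new create(env: Env) =>
      let a = Averager
      a.observe(1.5)
      a.observe(2.5)
      a.mean(Reporter(env))

Nothing in Main blocks; it just fires off three messages and lets the scheduler threads do the work.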

I do also think that from a not trying to get stuff done at work standpoint, that Pony is an excellent language for people who want to learn language runtime stuff, or just … Because anybody who comes into the Pony community right now and wants to contribute, we will happily accept them, as long as they’re not a dick. We will help them and we will teach them and get them so that they’re productive. I spend most of my time at this point, not working on new features and stuff for everything, but trying to figure out, “What can I do to make it easier for people to be able to contribute to Pony?”

That’s where I spend most of my Pony time these days, is, look if I can eventually, over the course of a year, make it so that five new people came in and they’re contributing stuff, eventually that’s going to be better than my spending all of that time just being an individual contributor.

Adam: In other words, Pony is great literally if you want Sean himself mentoring you. I jumped on the Pony chat. Sean is just there answering people’s questions, helping them out, along with several others. That’s the beauty of a small community. It’s also great if you want to work on a real, but understandable compiler or runtime. If you’ve built a toy language in the past, or played around with runtimes and are looking to continue that learning, it seems like honestly, a great fit.

Sean: It is a really clean code base for implementing compiler features, for implementing runtime features. We have an RFC system where people can bring up ideas for changes that they would want and have them discussed. I don’t expect that a lot of people would sit down and be like, “Pony is the perfect thing for what I need to do for my job.” Because it’s designed to do things which the vast majority of programmers are not getting paid to do. They’re not getting paid to write reasonably low down on the stack type stuff that needs to be handling a lot of stuff concurrently and do it safely, easily and in an efficient fashion.

That’s just not what most people were paid to do. Even when people were writing backend system stuff, if that’s what people were being paid to do, then Rails wouldn’t have taken off.

Adam: Yeah. There are some fun problems down there, low in the stack, I guess.

Sean: There are, and a lot of people really enjoy working on it, but in the end, compared to the broader, like some of what everybody’s doing, it is a niche problem. There will never, ever, ever be a Pony community that’s as big as JavaScript. It just won’t happen.

Adam: Yeah. That makes sense. How do I know if something that seems unproven is worth the risk?

Taking the Risk With An Unproven Language

Sean: A lot of engineers that I know, I don’t think that they follow a very good approach when they’re picking tools in general. I don’t think that they really stop and think about what their goals are and what they really need in order to accomplish those goals. This isn’t just in picking tools, but it’s like I have a feature to implement. Most people usually don’t think through, what really are the goals of this feature? What are we trying to accomplish? What are we willing to trade off? What is important to this? What is not important to this?

Adam: We started with a problem, a problem of making something like an order of magnitude faster than the existing solution, Apache Storm. We chose our tech stack and it was built. There’s one thing that we’re missing to wrap up our case study. All right, sorry. Back to Pony. I feel like-

Sean: Tangents. Do you like tangents?

Adam: Yeah.

Sean: I like tangents.

The End Result

Adam: Did it work? You guys took Pony and built, or you were going to build, something better than Storm, right? Lower latencies? How did it go?

Sean: I don’t like to use the word better because we had different goals, right? Also, for everything I’ve said earlier on about languages, note I didn’t say, “Oh, that’s a bad language or anything.” It’s the goals were different. For what we were trying to accomplish, it was a much better tool for those types of scenarios that we built it for than Storm was. Going back to what I find beautiful, right?

Adam: Yeah.

Sean: About a year ago, when we were at Wallaroo Labs, we put a system into production for PubMatic. They’re an ad tech company. That was the first system to go into production that was going to be taking a ton of data, lots and lots and lots of data for the system we built. We were all like … We all worried about like, “What’s our on-call thing going to be for this?” Et cetera and everything. It’s almost a year later, not a single issue.

Adam: Oh, wow.

Sean: There was one issue and that was when somebody went to upgrade something and didn’t follow the upgrade instructions. For the stuff that we built, it was processing at peak about 80 million calculations a second, handling hundreds of thousands of incoming HTTP JSON requests a second packed full of data, which would blow up into like 80 million calculations a second. Running for a year, not one teeny tiny little issue. To me, that’s beautiful. That’s beauty to me, right?

Adam: Sean focused on the features of his hard problem that led to a seemingly crazy solution using Pony, but it actually made sense and it worked out. He had to minimize latency and maximize throughput so he needed something very performant. He needed to minimize network hops and copying, and he had to do everything async. Maybe you have a use case for Pony, maybe not, but I bet you have to make technology decisions where the right choice could save or sink a project. That’s what this story was all about. Choosing the right tool for the job.

I hope you like this case study. If you have a case study about a project you worked on, let me know, adam@corecursive.com. I think we need more of these case studies so that we can all learn from them. Until next time, thank you so much for listening.

Support CoRecursive

Hello,
I make CoRecursive because I love it when someone shares the details behind some project, some bug, or some incident with me.

No other podcast was telling stories quite like I wanted to hear.

Right now this is all done by just me and I love doing it, but it's also exhausting.

Recommending the show to others and contributing to this Patreon are the biggest things you can do to help out.

Whatever you can do to help, I truly appreciate it!

Thanks! Adam Gordon Bell
