GraphQL and Sangria with Oleg Ilyenko

Oleg: I think a GraphQL API is quite helpful if you have a fast-evolving data model that you would like to expose to specific clients, and you would like to give those clients a lot of flexibility.

Adam: Today, I talk to Oleg Ilyenko about GraphQL and his GraphQL implementation, Sangria. Welcome to the podcast.

Oleg: It’s nice to meet you. Thank you for having me here.

Adam: Yeah. It’s great to meet you. About a month ago, I was working on an API, and I had to add some new endpoints to it, because they were building a new React client and wanted some new endpoints that returned slightly different data. As I started to dig into what seemed like a small thing, I came to the realization that API design is maybe a lot more complicated than I thought.

Around that time, I started to look into GraphQL, and then I came across Sangria, which you made, a GraphQL tool for Scala. I thought we could start by talking about some of these API difficulties, then transition into GraphQL, and finally your product.

Let me describe the problem I had, because I think it’s an interesting way to illustrate where GraphQL could be a solution. There was a user endpoint that returned some sort of user object with just basic user information, and it was being used by certain clients. Now we have a new client who’d like to call this API, and either they have to make extra requests, because not all the information is there, or I need to add more information to this call. How would I solve this problem without using GraphQL?

Oleg: Without GraphQL, there are different approaches. In many cases, if you have just one REST API, you would probably introduce this change in the backend and try to think about how it will look for all the existing clients, which may not necessarily need this information. Another approach is to build a dedicated endpoint specifically for this particular client. This is especially important for mobile applications, which are very sensitive to the type and the amount of data returned to them.

With GraphQL, it’s a little bit more flexible, because every client always needs to describe the data it needs in the form of a GraphQL query. You can think of a GraphQL query as JSON with the values left out: just the field names and the structure. The client sends it to the server, and the server fulfills those requirements by providing the missing parts, the values. The shape of the response the client gets back is the same as the initial query, except that it now has values.
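To make the “same shape in, same shape out” idea concrete, here is a toy Python model. It is not a GraphQL parser or any real library: the query is represented as a plain nested dict (None marks a leaf field), and the user data is made up.

```python
# Toy model of GraphQL's core idea: the client sends only field names
# (nested), and the server fills in the values. None marks a leaf field.


def execute_selection(selection, data):
    """Return a result with exactly the shape of `selection`."""
    if isinstance(data, list):
        # Lists are projected element by element.
        return [execute_selection(selection, item) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        if sub_selection is None:
            result[field] = value  # leaf: copy the scalar value
        else:
            result[field] = execute_selection(sub_selection, value)
    return result


# Hypothetical server-side data.
user = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "organization": {"name": "Acme", "country": "DE"},
}

# Roughly the query:  { user { name organization { name } } }
query = {"name": None, "organization": {"name": None}}

print(execute_selection(query, user))
# {'name': 'Ada', 'organization': {'name': 'Acme'}}
```

A client that selects only name keeps getting only name, no matter how many fields the server adds to the user later.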

Adam: For example, if I was using GraphQL, rather than building two endpoints, one client could just specify that they want the organization name as part of the request, and they would get that back. The other client could leave it out, and then they wouldn’t receive it.

Oleg: Exactly. With a GraphQL API, you would still need to add this capability to the server, but it will not affect any existing clients, because if a client doesn’t know about a new field like organization, it will never ask for it. The new clients can now ask for this field. As a side effect, this also makes it easier to track API usage. We often talk, especially in the context of mobile applications, about different versions of an API and about maintaining those versions.

Because as soon as you deploy a mobile application, as soon as it has gone through the review process and landed on, let’s say, Android or iPhone, it is very hard to make users update. This means you either need to tolerate those older clients that ask for older versions, or you just stop supporting them.

With GraphQL, you’re no longer thinking in terms of versions. You’re thinking in terms of data requirements. The client says, “Okay, I need an organization,” and it doesn’t describe the version of the API. It just states exactly which fields it needs, and this is pretty much it.

Adam: Well, what if, for instance, in a traditional setup, I was returning some field as part of the V1 API, and then in V2 I’m no longer returning it? How would I handle that in GraphQL?

Oleg: I can describe what is often done, and what we do, for example, in our API. GraphQL itself has, first, a query language, which you normally use to say, “Okay, give me the user with the name and organization.” In this case, you will get a user back. But in addition to that, GraphQL also has a type system. At the same time as you ask for the user, you can also ask the introspection API to give you information about the structure of the user type.

This means the user type, and scalar types like Int and String, are all part of the type system, which is exposed together with the actual data in the actual API. And this metadata includes things like documentation. Every single field and type has a description, and every field can be marked as deprecated.

I think the fields are called deprecated and deprecation reason. This means that when you would like to remove, for example, the organization field in the future, you would first deprecate it. This information is understood and used by different clients, for example GraphiQL, which is an in-browser IDE for writing your queries. It will automatically hide those fields: you can still use them, but they are explicitly marked as deprecated and not shown to users who don’t know about them.
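A minimal Python sketch of that metadata, assuming nothing beyond what the GraphQL specification defines: fields carry isDeprecated and deprecationReason, and introspection’s fields(includeDeprecated:) argument defaults to false. The classes, the User type, and the replacement field organizations are hypothetical illustrations, not Sangria’s API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy model (not a real GraphQL library) of the deprecation metadata that
# GraphQL exposes through introspection. In the spec, each field has
# isDeprecated and deprecationReason, and __Type.fields(includeDeprecated:)
# defaults to false, so deprecated fields are hidden unless asked for.


@dataclass
class FieldDef:
    name: str
    description: str
    deprecation_reason: Optional[str] = None

    @property
    def is_deprecated(self) -> bool:
        return self.deprecation_reason is not None


@dataclass
class ObjectTypeDef:
    name: str
    fields: List[FieldDef]

    def introspect_fields(self, include_deprecated: bool = False):
        """Mimic fields(includeDeprecated: ...) on an introspected type."""
        return [
            {
                "name": f.name,
                "description": f.description,
                "isDeprecated": f.is_deprecated,
                "deprecationReason": f.deprecation_reason,
            }
            for f in self.fields
            if include_deprecated or not f.is_deprecated
        ]


# Hypothetical User type: "organization" is being phased out.
user_type = ObjectTypeDef("User", [
    FieldDef("name", "Full name of the user"),
    FieldDef("organization", "Name of the user's organization",
             deprecation_reason="Use organizations instead"),
    FieldDef("organizations", "All organizations of the user"),
])

# By default, only the non-deprecated fields are listed...
print([f["name"] for f in user_type.introspect_fields()])
# ['name', 'organizations']

# ...but the deprecated field still exists and can still be queried.
print([f["name"] for f in user_type.introspect_fields(include_deprecated=True)])
# ['name', 'organization', 'organizations']
```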

Given this, and given that the server always knows what clients use, we can track the usage of these particular fields. For example, in our case, all of our metrics go to InfluxDB and are presented in Grafana. We actually built a dashboard that shows us which fields are in use, and especially which deprecated fields are still in use. We know precisely when a particular field is no longer used by any client.

We can define thresholds, and we can define the timeframe for how long we wait until we drop a field. But when we drop it, we have a lot of confidence that it is not used by any client.
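The removal rule Oleg describes (drop a deprecated field only after it has gone unused for some timeframe) can be sketched in a few lines. This is an in-memory stand-in, not an InfluxDB or Grafana integration; the class name, the 30-day quiet period, and the dates are hypothetical.

```python
from datetime import datetime, timedelta

# In-memory stand-in for a metrics pipeline (nothing here talks to InfluxDB
# or Grafana). Because each GraphQL request names its fields explicitly,
# the server can record exactly which fields every request touched.


class FieldUsageTracker:
    def __init__(self):
        # (type name, field name) -> time of the most recent request using it
        self.last_used = {}

    def record(self, type_name, field_names, at):
        for field in field_names:
            self.last_used[(type_name, field)] = at

    def safe_to_remove(self, type_name, field, now, quiet_period):
        """True once the field has gone unused for at least quiet_period."""
        last = self.last_used.get((type_name, field))
        return last is None or now - last >= quiet_period


tracker = FieldUsageTracker()
start = datetime(2018, 1, 1)
tracker.record("User", ["name", "organization"], at=start)
tracker.record("User", ["name"], at=start + timedelta(days=40))

now = start + timedelta(days=45)
print(tracker.safe_to_remove("User", "organization", now, timedelta(days=30)))  # True
print(tracker.safe_to_remove("User", "name", now, timedelta(days=30)))          # False
```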

Adam: That is very cool, because otherwise you end up with all these endpoints that you’re not sure are still being used. I guess you could do this kind of monitoring if you just had a V1 endpoint as well.

Oleg: Surely. One of the issues I have with a global API version is that it’s very imprecise. You declare version 1 of your whole API, but sometimes you just want to change or remove one field. You need to be very cautious about introducing changes, because you can’t maintain a lot of versions. Maybe you release one version every six months, or something like this.

You can also be more precise about it. We also have a REST API, and there we try to be more precise and avoid versioning as much as possible, simply because of maintenance, and we’re not a big team.

What we did is introduce a header. As soon as a user or client sends us something that is deprecated, like a deprecated query parameter, we add an additional header. I think it’s called X-Deprecated, or something like this. This header says, “Okay, you’re using this query parameter, but it’s no longer supported, or it will be dropped in the future.” We do the same for our responses, but it’s much, much harder with responses, because we always need to send this information whether or not the client actually needs the field; we simply don’t know.

Adam: Mm-hmm (affirmative). And how do you send it? It’s just included in the response?

Oleg: Exactly. It’s a response header, and it just contains the list of all the things that are deprecated.
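Here is a minimal sketch of that REST-side workaround. The header name X-Deprecated follows the transcript’s own uncertain recollection, and the deprecated parameter names are hypothetical.

```python
# Sketch of the REST-side workaround: if a request uses a deprecated query
# parameter, the response carries a header that lists it. The header name
# "X-Deprecated" and the parameter names are assumptions for illustration.

DEPRECATED_PARAMS = {"expand_all", "legacy_sort"}  # hypothetical


def deprecation_headers(query_params):
    """Return extra response headers for any deprecated parameters used."""
    used = sorted(DEPRECATED_PARAMS & set(query_params))
    if not used:
        return {}
    return {"X-Deprecated": ", ".join(used)}


print(deprecation_headers({"limit": "10", "legacy_sort": "name"}))
# {'X-Deprecated': 'legacy_sort'}
```

For request parameters the server knows exactly what was used. For response fields it cannot know what the client reads, so a header like this has to be sent either way, which is the asymmetry described above.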

Adam: Yeah, it makes sense. I agree with you about global versioning. Say you want to make just one change to the API, and now you need a whole new version. That does discourage innovation on the API and keeps it very static, whereas with GraphQL, I suppose, if you’re just adding a new field to an existing type, it doesn’t matter to clients who aren’t using it. They won’t be asking for it, and they won’t get it back.

Oleg: Exactly.

Adam: You mentioned overfetching. To my understanding, overfetching is when more data is returned than the client actually needs.

Oleg: Yes.

Adam: One way I’ve seen this dealt with in a non-GraphQL API is that you actually specify the fields. The client gives you a list: “Hey, get me user ID seven, and I want the name and, I don’t know, some other field.” What do you think of that?

Oleg: Indeed, I’ve seen different approaches to this. In fact, our REST API includes this kind of capability as well. We don’t have whitelisting or blacklisting of fields, but we have field expansion: you can expand references. I’ve also seen, for example, JSON API. JSON API is a kind of specification for REST APIs, and it includes a lot of concepts like pagination, field whitelisting and blacklisting, and so on. It has structure; it has semantics.

One of the problems is that there’s often no single way of doing this. In my personal experience, different APIs implement it in different ways, which are not always compatible with each other. Every single time you see a new REST API, you need to learn the concepts this API uses, including the way it lets you whitelist and blacklist fields.

It becomes more complicated in particular when we talk about nested structures. For example, in the GitHub API, you have a user with a list of repositories, and in each repository you have a list of projects or a list of commits. In this case, you need a special language to express yourself. You need something like paths, where you say, “Inside of a user, inside of the list of repositories, I would like to select a particular field.”

GraphQL provides an advantage here, because it is a specification that describes precisely those concepts. This allows you to do it in a standard, well-specified way, and all GraphQL APIs work in exactly the same way. They implement exactly the same semantics, which means you can build very powerful tools, especially considering that a GraphQL API also exposes full introspection. You not only know that this is a GraphQL API and that it follows particular semantics; you can also discover everything there is to know about this API, including documentation.

Adam: So you start with the idea of making your API better for each client, so you add these options, like expand this field, don’t include that field. But a solution to this problem already exists, which is GraphQL. Facebook already went through this and came up with this protocol for deciding what to return in a uniform way.

Oleg: Exactly. And there’s another important aspect. When Facebook started working on GraphQL, around 2012, funnily enough, they actually started using it in the context of mobile applications, even though now, if you look at GraphQL and the whole ecosystem, it is often used in the context of web applications, together with React, with Angular 2, with [inaudible 00:14:26], and so on.

But in fact, the origins of GraphQL are in the mobile space, in iOS and Android applications. At Facebook, they actually tried to use a REST API first, but unfortunately it was not a very good fit as a mobile API. Conceptually, it was also a different approach. In most cases, when we do data modeling, think about Scala code: in Scala, we define all those concepts like a user, an organization, a repository, and so on. All those things are connected and somehow related to each other. So we define the data model in terms of types and relationships.

Then, when we go to define a REST API, we suddenly start to think about resources, flat resources that encapsulate a particular part of the API but don’t have much notion of relationships between data, especially deep and more complicated relationships. And this is where GraphQL helps a lot, because it allows you to expose not only the data itself, but also the relationships between the data.

So you can ask for a user and the user’s organizations at the same time, and it’s just a normal field. You’re no longer working with references. The conceptual model with GraphQL is about types and relationships, instead of tables and foreign keys, for example, or resources and links.

Adam: And this is a graph in the computer science sense: nodes connected by edges that you can follow from one node to another.

Oleg: Exactly. Conceptually, you can think of the whole GraphQL API, the data inside of it, as a graph. But when you query it, you need to start somewhere, and this is a kind of entry point. With GraphQL, you normally start with a query type. And what you actually get back is a tree.

You get a tree, a traversal, so to say, of this data. Because graphs often have loops and recursive data structures, when you as a client ask for particular data, you don’t get the [inaudible 00:17:00] shape; you get a tree-shaped projection of the graph.
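The “graph in, tree out” point can be shown with a small Python toy. The data contains a cycle (a user belongs to an organization whose members include that user), yet because a query is a finite nested selection, the result is always a finite tree. Types, field names, and data are hypothetical.

```python
# Toy illustration: the data is a graph with a cycle
# (user -> organization -> members -> user ...), but a query is a finite
# nested selection, so the result is always a finite tree.

USERS = {1: {"name": "Ada", "org_id": 10}, 2: {"name": "Bob", "org_id": 10}}
ORGS = {10: {"name": "Acme", "member_ids": [1, 2]}}


def resolve_user(user_id, selection):
    user = USERS[user_id]
    out = {}
    for field, sub in selection.items():
        if field == "name":
            out["name"] = user["name"]
        elif field == "organization":
            out["organization"] = resolve_org(user["org_id"], sub)
        else:
            raise KeyError(f"User has no field {field!r}")
    return out


def resolve_org(org_id, selection):
    org = ORGS[org_id]
    out = {}
    for field, sub in selection.items():
        if field == "name":
            out["name"] = org["name"]
        elif field == "members":
            out["members"] = [resolve_user(uid, sub) for uid in org["member_ids"]]
        else:
            raise KeyError(f"Organization has no field {field!r}")
    return out


# { user(id: 1) { name organization { name members { name } } } }
query = {"name": None, "organization": {"name": None, "members": {"name": None}}}
print(resolve_user(1, query))
```

The recursion stops exactly where the selection stops, however deep the cycle in the data could go.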

Adam: Yeah. Because you could say, for instance, get me these four users, these users’ organizations, and the organizations’ repositories, or something like that, right? Does that make sense?

Oleg: Yeah, exactly. You can not only ask for, let’s say, a user’s list of organizations; you can also ask for a particular page. You can do pagination inside of the data structures, because the only things GraphQL defines are fields, possibly with a nested selection of fields, and every single field can have arguments.

And this is how you do pagination, searching, and filtering. For example, you can have a list of users, like the top 10 users, and then inside of a user object, you can ask for a list of organizations, but not all of them: only, let’s say, the last three organizations the user had activity in.
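A field argument is just another input to that field’s resolve function. Below is a toy Python resolver for a hypothetical organizations field with a last argument, mirroring the “last three organizations” example; the names and data are invented for illustration.

```python
# Toy resolver for a field with an argument, as in a query such as
#   { topUsers(first: 10) { name organizations(last: 3) { name } } }
# The field names, the argument and the data are all hypothetical.

# Organizations per user, ordered by that user's activity (oldest first).
ACTIVITY = {
    "ada": ["Acme", "Globex", "Initech", "Umbrella", "Hooli"],
}


def resolve_organizations(user_name, last=None):
    """Resolve User.organizations, honouring an optional `last` argument."""
    orgs = ACTIVITY.get(user_name, [])
    if last is None:
        return orgs
    if last < 0:
        raise ValueError("last must be non-negative")
    # orgs[-0:] would return everything, so treat last=0 explicitly.
    return orgs[-last:] if last else []


print(resolve_organizations("ada", last=3))
# ['Initech', 'Umbrella', 'Hooli']
```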

Adam: Makes sense. So I think I kind of understand the motivating examples for GraphQL. So what is GraphQL? It’s not a library. It’s not an implementation. What is it?

Oleg: Exactly. GraphQL is a query language for an API. It was first developed by Facebook in 2012, and in 2015 it was publicly released as a specification. This means that GraphQL is a specification, which you can think of as just a PDF document describing the syntax and execution semantics of this query language.

In my opinion, this is an important point, because it was a key factor in the emergence of the whole GraphQL ecosystem. If you look now, the GraphQL specification is implemented in more than 15 different programming languages, including Scala, with Sangria, and languages like Ruby, JavaScript, Python, [Alexa 00:19:32], [inaudible 00:19:32], and so on.

So all major programming languages have an implementation, and I think the specification was the key to this success. So yes, GraphQL is a specification, and there are different implementations of it to help people build GraphQL servers. There is also a [inaudible 00:19:59] implementation written in JavaScript, which is maintained by Facebook.

Adam: From the sound of it, being a query language, it seems like it’s somehow a database front end. Is it a database front end?

Oleg: It’s often misrepresented, or perceived as similar to SQL. Many people, when they hear about GraphQL, think about SQL, a more complex and very expressive language. But the fact is that GraphQL is not really similar to SQL; it’s quite different. It’s more similar to a REST API than to SQL, because with GraphQL, whatever you allow your clients to ask for needs to be explicitly specified.

This means that every time you, for example, provide a list of organizations, it is explicitly specified that the user has a field, and this field is called organizations. You can ask for it on a user, but not in any other way. Also, if you would like to sort this list of organizations in a particular way, you need to explicitly define an argument for this field, so that the client can specify a limited set of sort criteria.

All of this is done by the server developer, and you need to explicitly make this field available on the user type. GraphQL doesn’t have any generic kind of aggregation like SQL does. This means you can anticipate what clients will be able to ask for, and you can optimize for it, so you can be sure that everything you expose in your GraphQL API is optimized in some way.
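A minimal sketch of why “everything needs to be explicitly specified” matters: the server checks every requested field and argument against its schema and rejects anything undeclared before executing. The schema below is hypothetical, and real GraphQL validation covers many more rules than this toy.

```python
# Sketch of schema-based validation: the server accepts only fields and
# arguments its schema declares. The schema below is hypothetical, and real
# GraphQL validation covers far more rules; this shows only the core idea.

# type -> field -> (result type, allowed argument names)
SCHEMA = {
    "Query": {"user": ("User", {"id"})},
    "User": {
        "name": ("String", set()),
        "organizations": ("Organization", {"first", "orderBy"}),
    },
    "Organization": {"name": ("String", set())},
}


def validate(type_name, selection, path=""):
    """Return error strings; selection maps field -> (args, sub-selection or None)."""
    errors = []
    fields = SCHEMA[type_name]
    for field, (args, sub) in selection.items():
        where = f"{path}.{field}" if path else field
        if field not in fields:
            errors.append(f"Unknown field {where!r} on type {type_name}")
            continue
        field_type, allowed_args = fields[field]
        for arg in args:
            if arg not in allowed_args:
                errors.append(f"Unknown argument {arg!r} on {where!r}")
        if sub:
            errors.extend(validate(field_type, sub, where))
    return errors


# { user(id: 7) { name } }
ok = {"user": ({"id": 7}, {"name": ({}, None)})}
# { user(id: 7) { salary organizations(groupBy: "x") { name } } }
bad = {"user": ({"id": 7}, {
    "salary": ({}, None),
    "organizations": ({"groupBy": "x"}, {"name": ({}, None)}),
})}

print(validate("Query", ok))  # []
for error in validate("Query", bad):
    print(error)
```

Because nothing outside the declared fields and arguments can ever reach execution, the server developer knows in advance every access pattern that needs to be fast.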

Adam: So GraphQL is a query language, but the person on the server side is responsible for writing the evaluation of this query.

Oleg: Yes, exactly. Another thing that I haven’t mentioned yet is that GraphQL is completely agnostic of the protocol. It can be HTTP. It can also be [TCP 00:22:30], just a simple TCP connection. And it also doesn’t care about the actual data format. It can be JSON, it can be XML, it can be a binary data format. For example, I’ve used GraphQL with MessagePack and Amazon Ion, which are binary data formats. And when you implement a GraphQL API, for every single field you need to specify a function, and this is your function.

So it’s up to you what you do when a particular field is requested. Of course, there are tools that help you build those resolve functions, like macros that help you derive the structure and implementation based on case classes.

But in some cases, you actually need to go to the database and fetch the data, and since it’s just a normal function, you need to do that yourself. This means GraphQL also has no notion of any kind of data storage; it’s up to you how you implement it. A nice side effect of this is that you can query not only a single store, but multiple data stores or databases at once.

For example, in our API, we use MongoDB and Elasticsearch, and a single GraphQL query can be fulfilled using data loaded from both MongoDB and Elasticsearch. I think part of it can even come from some internal REST API.
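Since every field has its own resolve function, one query can be stitched together from several backends. In this Python sketch, two dicts stand in for MongoDB and Elasticsearch and a plain function stands in for an internal REST service; all of it is hypothetical and in-memory, and the executor is a toy, not Sangria’s.

```python
# Sketch: every field has its own resolve function, so one query can be
# served from several backends. The dicts stand in for MongoDB and
# Elasticsearch, the function for an internal REST service (all hypothetical).

PRODUCTS_DB = {"p1": {"id": "p1", "name": "Lamp", "price_cents": 1999}}  # "MongoDB"
SEARCH_INDEX = {"lamp": ["p1"]}                                          # "Elasticsearch"


def fetch_stock_from_rest_service(product_id):  # "internal REST API"
    return {"p1": 12}.get(product_id, 0)


# (type, field) -> resolve(parent, args)
RESOLVERS = {
    ("Query", "search"): lambda _parent, args: [
        PRODUCTS_DB[pid] for pid in SEARCH_INDEX.get(args["text"], [])
    ],
    ("Product", "name"): lambda product, _args: product["name"],
    ("Product", "priceCents"): lambda product, _args: product["price_cents"],
    ("Product", "stock"): lambda product, _args: fetch_stock_from_rest_service(product["id"]),
}
FIELD_TYPES = {("Query", "search"): "Product"}


def execute(type_name, parent, selection):
    """selection maps field -> (args, sub-selection or None)."""
    out = {}
    for field, (args, sub) in selection.items():
        value = RESOLVERS[(type_name, field)](parent, args)
        if sub is not None:
            child_type = FIELD_TYPES[(type_name, field)]
            if isinstance(value, list):
                value = [execute(child_type, v, sub) for v in value]
            else:
                value = execute(child_type, value, sub)
        out[field] = value
    return out


# { search(text: "lamp") { name priceCents stock } }
query = {"search": ({"text": "lamp"},
                    {"name": ({}, None), "priceCents": ({}, None), "stock": ({}, None)})}
print(execute("Query", None, query))
# {'search': [{'name': 'Lamp', 'priceCents': 1999, 'stock': 12}]}
```

Only the requested fields are resolved, so a query that skips stock never calls the REST stand-in at all.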

Adam: Yeah. Or, if you had some sort of microservice architecture, there could be a whole bunch of different services, and your API endpoint gathers their data to fulfill the request.

Oleg: Yeah.

Adam: So one of these tools for implementing the server side of GraphQL requests is Sangria, which you created. What led you to create it?

Oleg: Yes, exactly. Sangria is a Scala GraphQL implementation. It helps you build a server and expose a GraphQL API, and it provides a lot of tools to help you work with GraphQL queries, to validate them, and maybe to build tools on top of it. For me, the motivation was a lot of problems I personally faced with REST APIs. I had been building REST APIs for a long time, and the first time I heard about GraphQL and saw how it works and how it feels, I immediately got excited about this technology, even before it was publicly released.

It was released in 2015, I think in July 2015. It was announced at [inaudible 00:25:27]. I was there, and I was excited about the specification. I immediately started working on a Scala implementation, because of course none was available for Scala. And it was a huge help that GraphQL has not only the specification, but also a reference implementation.

I’m not a very good JavaScript developer, and the reference implementation is written in JavaScript. But it is very helpful when you’re implementing a specification in a different language, because if there is some ambiguity, if something is not quite clear from the specification text, you can always go and look at the algorithms in a concrete implementation. You can try it out in practice and just implement the same semantics. So this was a huge help. It took about a month, I think, and after that month it was kind of the first feature-complete implementation of a GraphQL library.

Adam: How did you find combining Scala with GraphQL? Did the technology seem to fit well together, or not?

Oleg: I think it fits quite well. What always concerned me is that we think through our data model and define case classes, with very precise relationships between all our types, to help us reason about our type system and about the application. But then, as soon as we start to implement a REST API, we suddenly throw away all of those types [inaudible 00:27:15], all of this data [inaudible 00:27:16]. What we expose is just a JSON blob: we just expose endpoints and JSON blobs.

Some people do expose things like OpenAPI, the schema, but those schemas are often maintained separately, so they are often out of date, and maybe not as precise as one would prefer. For me, it was personally a huge selling point of GraphQL that it has a type system. It’s not as powerful as Scala’s type system, but I think it’s also not a good idea to expose something like Scala’s type system through an API.

So it needs to be something simpler, something that is usable by many different clients. In many scenarios, companies actually implement their server side in GraphQL, in Scala with Sangria, and on the client side they have a whole team of front-end developers who build React applications, React Native applications, and maybe iOS apps and so on. So it is important that those people can take advantage of this information about the type system, including all those [inaudible 00:28:27] that they use.

Adam: The OpenAPI standard, that’s Swagger, right? It produces the Swagger document.

Oleg: Exactly.

Adam: And it does tell you the types, and how you would call things and so on, but it’s not enforced. There’s nothing to say that it’s correct. And it’s also, as you were saying, completely outside of your API: I maintain some giant YAML document that produces it, and it can lag. So how do types work in GraphQL? How do I get the type of a certain call I’d like to make?

Oleg: Yeah. The introspection API is no different from anything else. You just make a normal GraphQL request and ask for __schema, with two leading underscores. This field is always available on every GraphQL API, and you get the full introspection of the type system. It includes things like which types are available, like user or profile, or maybe organization, and which fields those types have. You have information about interfaces and union types, and about scalar types like int, string, long, and so on. Then you also have enum types and their values.

And I think the last thing is input types. There’s a difference between output and input types. An input type is a complex [inaudible 00:30:05] kind of object. You can think of it as a JavaScript object, or a JSON object, and it can be provided as an argument.

Adam: For like updating something, or adding something?

Oleg: Exactly. I think it’s also an important point that GraphQL is not read-only. It also has mutations, and subscriptions. When you write just a query, it means you just want to read the data, so in this case you are working with [inaudible 00:30:26] types. But you can also send a mutation query, which looks very, very similar: it’s just a list of fields with arguments, but these fields are intended as mutations of your data. So you can have fields like add new product, create user, change username, and so on.
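A toy Python sketch of a mutation: the request looks like a query (fields with arguments and sub-selections), but the resolvers change data. The GraphQL specification executes top-level mutation fields serially, in the order written, which this executor imitates; aliases are also a real GraphQL feature. The field names createUser and changeUserName are hypothetical.

```python
# Sketch: a mutation looks just like a query (fields with arguments), but
# its top-level fields change data. The GraphQL spec executes top-level
# mutation fields serially, in order, which this toy executor imitates.
# Field names and storage are hypothetical.

users = {}


def create_user(args):
    user = {"id": len(users) + 1, "name": args["name"]}
    users[user["id"]] = user
    return user


def change_user_name(args):
    user = users[args["id"]]
    user["name"] = args["name"]
    return user


MUTATION_FIELDS = {"createUser": create_user, "changeUserName": change_user_name}


def execute_mutation(fields):
    """fields is an ordered list of (alias, field_name, args, selected_keys)."""
    result = {}
    for alias, name, args, keys in fields:  # serial, in the order given
        value = MUTATION_FIELDS[name](args)
        result[alias] = {k: value[k] for k in keys}  # snapshot of selected fields
    return result


# mutation {
#   a: createUser(name: "Ada") { id name }
#   b: changeUserName(id: 1, name: "Ada L.") { name }
# }
result = execute_mutation([
    ("a", "createUser", {"name": "Ada"}, ["id", "name"]),
    ("b", "changeUserName", {"id": 1, "name": "Ada L."}, ["name"]),
])
print(result)
# {'a': {'id': 1, 'name': 'Ada'}, 'b': {'name': 'Ada L.'}}
```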

Adam: So does that mean I can send an update that affects like a whole graph, several objects that could be in several data stores?

Oleg: It’s possible, because GraphQL doesn’t know anything about the business logic of your application. It doesn’t try to prescribe a specific way of modeling the data. It just provides tools to model data and to describe its structure, the type system. So in this case, you just define a field called, for example, subscribe user, and on the server side you provide a function, a resolver; it’s often called a resolve function. In it, it’s up to you what you would like to do and which data stores you would like to talk to in order to subscribe the user. It might be just a single data store, or, for example, you can communicate with [inaudible 00:32:05], create the user, and subscribe the user to a particular mailing list.

Adam: And this is the beauty of it just being a protocol, right? The details behind it can vary.

Oleg: Yeah, exactly. I think this is a very powerful concept. For example, Twitter uses GraphQL; they started using it a while back. I think this demonstrates the power of this independence from the transport and from the data format. On the surface, you can have an HTTP API that returns JSON. But internally, you can take the same query and delegate its execution to multiple different microservices, which also speak GraphQL but use maybe a [GCP 00:32:51] protocol or something like that, or Thrift, and a more bandwidth-efficient data format.

Adam: And is Twitter using Sangria to do this?

Oleg: Yes, they use Sangria. There’s actually a very interesting talk from GraphQL Summit 2017. I think it was in October or November, and the last talk was from Twitter. They talked about how they implemented subscriptions with GraphQL, so GraphQL subscriptions with Sangria.

Adam: That’s awesome. I’ll have to check it out. What were your thoughts when you found out this library you built was being used by Twitter?

Oleg: I was definitely really excited, and maybe a little bit scared, because you never know. Maybe something is wrong and will cause a disaster. It has been a while now, and we actually had some communication, and they contributed some improvements to the library. A lot of companies now use GraphQL, and a lot of them also use Sangria. I’m very happy about it, and I think it helps improve the library, because all those companies contribute back: not only code, but also feedback, and reports of problems when something goes wrong or when there are performance issues, so we can figure them out and discuss them. This helps a lot.

Adam: Yeah. I can understand why you’d be a little fearful at first. But at this point, if Sangria is used by a number of big companies, including Twitter, then given the number of GraphQL requests it has served, it must be pretty battle-hardened.

Oleg: I guess so. I remember the last time we talked about it, I think it was about two billion requests per day.

Adam: Wow.

Oleg: Which, on a Twitter scale, is not that big. But in terms of a new technology, or [inaudible 00:35:17] technology, it is quite amazing to see. So yes, it’s definitely good proof that this works and scales. Maybe there are some issues along the way, but I’m pretty sure there’s nothing that we cannot solve [inaudible 00:35:36].

Adam: Yeah, wow. Those are some big numbers, at least from my perspective. So who else? Are there other exciting companies using Sangria?

Oleg: Definitely. I know of Twitter and the New York Times; they also use Sangria. [inaudible 00:35:54]. The [inaudible 00:35:56] provide educational videos, and they also have courses about the function of GraphQL and the [inaudible 00:36:02].

Adam: Interesting. So if I were to use Sangria, getting into the details, and I have my user case class and want to return it as part of my GraphQL responses, what do I have to do?

Oleg: If you just have a case class, for example, and you have a way to get this case class from somewhere, like a database, what you need to do is define an object type. The object type is a kind of meta information about your type. It contains the name, a description, and a list of fields, and every field has a name, a description, and a resolve function. As soon as you have this type, [inaudible 00:36:50] define additional meta information, so you describe the user type in more detail. Then you can create the schema, still using just a simple case class, and you can execute queries against this schema and this type. So it’s actually very little work to expose a particular type via GraphQL.

Adam: So say I have a service, whatever it’s called, get user, that takes a [inaudible 00:37:22] and returns this user case class. Then I write an object type, which basically contains the documentation for my user object, like what types the fields are? Is that right?

Oleg: Exactly. It defines the names of the fields, possibly the documentation, and the type of each field. You also need to define this [inaudible 00:37:47] function, which tells GraphQL how, given a user case class, to get a field like name, for example. It’s just a simple function. What you can also do is use a macro. Sangria provides [inaudible 00:38:03]. Sorry, it’s a macro that looks at the structure of the case class, or any class, and generates this meta information for you. So you just derive the structure from a case class and expose it via the GraphQL API.
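In Sangria this is written in Scala with ObjectType and Field definitions. The Python below is only an analogy of the idea Oleg describes, not Sangria’s API: an object type is metadata (name, description, fields) around an existing class, and each field carries its own resolve function. The User class and get_user service are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Python analogy (not Sangria's API): an object type is metadata wrapped
# around an existing class: a name, a description, and a list of fields,
# each with its own resolve function.


@dataclass
class User:  # stands in for the Scala case class
    id: int
    name: str
    password_hash: str


@dataclass
class Field:
    name: str
    type_name: str
    resolve: Callable[[Any], Any]
    description: str = ""


@dataclass
class ObjectType:
    name: str
    description: str
    fields: List[Field]

    def resolve(self, value, requested):
        """Run each requested field's resolve function on `value`."""
        by_name = {f.name: f for f in self.fields}
        return {n: by_name[n].resolve(value) for n in requested}


UserType = ObjectType("User", "A registered user", [
    Field("id", "Int", lambda u: u.id, "Unique id"),
    Field("name", "String", lambda u: u.name, "Display name"),
])


def get_user(user_id):  # hypothetical service call
    return User(user_id, "Ada", "not-exposed")


print(UserType.resolve(get_user(7), ["id", "name"]))
# {'id': 7, 'name': 'Ada'}
```

Note that password_hash exists on the class but is never exposed, because no field for it was declared.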

Adam: Because in most cases, I’m just returning this object. It’s just a straight translation: I have this object, and now I want an object type that describes its fields with some descriptions. So I can generate that with a macro.

Oleg: Exactly.

Adam: But what if I really want to… I’m thinking of cases where I don’t want to return a particular value. Is that when I would skip the macro and use the more explicit style?

Oleg: In fact, you can [inaudible 00:39:01] this macro, because I personally believe that macros, or macro-based derivation, shouldn’t be all or nothing [inaudible 00:39:10]. In many cases, when I see a macro, it’s kind of all or nothing: if the macro does what you like, you can just use it, but if you want to make some small customizations to its result, you need to switch to the more explicit style.

This is not the case with Sangria, because you can, for example, derive the structure of a case class but still provide descriptions for fields, or exclude or include particular fields. You can even replace fields or add new fields, while still deriving the base structure from the case class. And it’s all type-safe, because the macro executes at compile time. This means that if you, for example, exclude a field that doesn’t exist, you get a compilation error.
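Sangria’s deriveObjectType macro accepts settings such as DocumentField, ExcludeFields, AddFields, and ReplaceField. The Python below is only a runtime-reflection analogy of that “derive, then customize” idea, using dataclasses.fields; where Sangria reports a compilation error for an unknown field name, this sketch raises a ValueError at runtime. The User class and the added initials field are hypothetical.

```python
import dataclasses

# Python analogy (runtime reflection, not a compile-time macro) of deriving
# an object type from a class and then customizing it: document a field,
# exclude fields, add extra ones. Unknown names raise an error, loosely
# mirroring the compilation error Sangria's macro reports.


@dataclasses.dataclass
class User:
    id: int
    name: str
    password_hash: str


def derive_object_type(cls, exclude=(), docs=None, add=None):
    docs = docs or {}
    add = add or {}
    names = [f.name for f in dataclasses.fields(cls)]
    for unknown in (set(exclude) | set(docs)) - set(names):
        raise ValueError(f"{cls.__name__} has no field {unknown!r}")
    fields = {
        n: {"description": docs.get(n, ""),
            # n=n binds the current name; a bare closure would see the last one
            "resolve": (lambda obj, n=n: getattr(obj, n))}
        for n in names if n not in exclude
    }
    fields.update(add)  # extra (or replacement) fields defined explicitly
    return {"name": cls.__name__, "fields": fields}


UserType = derive_object_type(
    User,
    exclude=["password_hash"],
    docs={"name": "Display name"},
    add={"initials": {"description": "First letter of the name",
                      "resolve": lambda u: u.name[:1]}},
)
print(sorted(UserType["fields"]))
# ['id', 'initials', 'name']
```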

But if you have something completely custom, something new that doesn’t have a corresponding case class, then yes, you can use the explicit style. It’s not that much more code, and you can be very explicit. Actually, some people prefer the explicit style, and one of the big reasons is that you often want to keep your API data model, or API type system, separate from your internal data model. Your internal data model represents how you implement your business logic, but what you expose is what you would like users to see.

Adam: Yeah. Because I suppose if I have this user object and just use a macro to expose it, and then somebody else adds a new field to the user object, they might not realize that they’re exposing it via the API. If it’s not explicit, I could see that happening.

Oleg: Exactly.

Adam: I think you touched on this slightly, but what if I want to return something in my user object that isn’t actually a field, but more like a function? Say I have an employee salary, which is actually a calculation based on some other properties. I don’t want that calculation to be redone on the client; I’d rather return it. How would I expose that?

Oleg: Exactly. For example, if you’re using the macro, you use deriveObjectType for the user, and as arguments you provide a list of settings, which are analyzed at compile time. There you can say something like, “Add a new field.” That field you define explicitly, which means you provide its name, description, and [inaudible 00:41:56] function. Inside this [inaudible 00:41:57] function, you can do whatever you want. You can, for example, call an external service and fetch the data from somewhere else, maybe from some existing RESTful API.
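
Adam’s salary example can be sketched the same way: every field is backed by a function, so a computed value is just a field whose function does the calculation on the server. Python stands in for Scala here, and `Employee`, `hourly_rate`, `hours_per_year`, `EMPLOYEE_FIELDS`, and `resolve_selection` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Employee:
    name: str
    hourly_rate: float
    hours_per_year: int


# Field name -> function that produces its value. "name" just reads an
# attribute; "salary" is computed on the server, so clients never redo it.
EMPLOYEE_FIELDS: Dict[str, Callable[[Employee], Any]] = {
    "name": lambda e: e.name,
    "salary": lambda e: e.hourly_rate * e.hours_per_year,
}


def resolve_selection(obj: Employee, selection: List[str]) -> Dict[str, Any]:
    """Return only the fields the client asked for, like a GraphQL selection set."""
    return {field: EMPLOYEE_FIELDS[field](obj) for field in selection}
```

A client that selects only `name` never triggers the salary calculation; the same function could just as well call an external service.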

Adam: And is that the same way I would follow references? Like if I wanted to expose something hanging off my user, say the organization name, but I actually have to go out and perform a query on the back end to get it, would I expose it the same way?

Oleg: Every single field is just a function. Most of those functions will just return data that’s already loaded, but some will need to load data from a data store. In particular, there’s a root type in GraphQL, the root query type. This is your entry point. The fields available to you when you open the curly braces and start typing your fields belong to the query type. Those are the top-level fields you expose to the client.

For most of those fields, you will need to go to the database or some other data store to load the data, like loading the user case class. Then the user type works with its fields, and those fields are already loaded. References work the same way. It’s recursive: on a user you can have an organizations field, and that field can itself go to the database, fetch the list of organizations for this user, and give it back.
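
To see why the next question comes up, here is a hypothetical naive executor in Python: the root query field loads the users, and each user’s `organization` field runs its own lookup. The in-memory tables and the `query_log` counter are invented for illustration and stand in for a real database.

```python
from typing import Dict, List

# Hypothetical in-memory tables standing in for a database.
USERS = [{"id": i, "first_name": "Bob", "org_id": 100 + i % 3} for i in range(10)]
ORGS = {100: "Acme", 101: "Globex", 102: "Initech"}
query_log: List[str] = []  # one entry per simulated database query


def load_users(first_name: str) -> List[dict]:
    query_log.append("users")
    return [u for u in USERS if u["first_name"] == first_name]


def load_org(org_id: int) -> str:
    query_log.append(f"org {org_id}")
    return ORGS[org_id]


def execute_naive(first_name: str) -> List[Dict[str, object]]:
    # Root query field: one query for the matching users.
    users = load_users(first_name)
    # Nested field: each user's organization resolver issues its own query.
    return [{"id": u["id"], "organization": load_org(u["org_id"])} for u in users]


result = execute_naive("Bob")
print(len(result), "users,", len(query_log), "queries")  # 10 users, 11 queries
```

With ten matching users this issues one query for the users plus one per user for organizations: the N+1 pattern.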

One of the questions many people ask after seeing this is: won’t you end up with the N+1 problem? You load the same thing over and over again, or you issue a lot of N+1 queries to the database. Sangria has an option for this: deferred value resolution. Instead of loading the data right away, you can return a deferred value.

This says: in this place, I would like to load an organization. The execution engine collects all of those deferred values, and at the right moment, when there is nothing more to collect, it calls another function. That function gets the list of deferred values as an argument and can load all the data very efficiently, at once. So at the end of the day, you don’t end up with an N+1 problem; you load all the data at once, in a single place.
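
A minimal sketch of the deferred-value idea in Python, using hypothetical in-memory data. Resolvers return a `Deferred` placeholder instead of querying; once nothing is left to collect, one batch function loads every pending ID at once. This only mirrors the concept Oleg describes; `Deferred`, `load_orgs`, and `execute_batched` are illustrative names, not Sangria’s deferred-resolution API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical in-memory tables standing in for a database.
USERS = [{"id": i, "first_name": "Bob", "org_id": 100 + i % 3} for i in range(10)]
ORGS = {100: "Acme", 101: "Globex", 102: "Initech"}
query_log: List[str] = []  # one entry per simulated database query


@dataclass(frozen=True)
class Deferred:
    """Placeholder a resolver returns instead of loading the organization now."""
    org_id: int


def load_users(first_name: str) -> List[dict]:
    query_log.append("users")
    return [u for u in USERS if u["first_name"] == first_name]


def load_orgs(org_ids: Iterable[int]) -> Dict[int, str]:
    """One bulk lookup, e.g. a single WHERE id IN (...) query."""
    ids = sorted(set(org_ids))
    query_log.append(f"orgs {ids}")
    return {i: ORGS[i] for i in ids}


def execute_batched(first_name: str) -> List[Dict[str, object]]:
    users = load_users(first_name)
    # Pass 1: resolvers only record what they will need.
    rows = [{"id": u["id"], "organization": Deferred(u["org_id"])} for u in users]
    # Pass 2: nothing left to collect, so resolve every deferred value in one call.
    pending = [v.org_id for row in rows for v in row.values() if isinstance(v, Deferred)]
    loaded = load_orgs(pending) if pending else {}
    for row in rows:
        for key, value in list(row.items()):
            if isinstance(value, Deferred):
                row[key] = loaded[value.org_id]
    return rows


result = execute_batched("Bob")
print(len(result), "users,", len(query_log), "queries")  # 10 users, 2 queries
```

This is the “two requests instead of 11” outcome from the Bob example: one query for the users and one bulk query for all their organizations.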

Adam: Yeah, I think this is a really cool feature. Say a person makes a request and gets back all the users with the first name Bob, but they also want to include each Bob’s organization. If 10 records come back, in theory that means 10 requests for organizations. But I believe what you can do in Sangria is take all the organization IDs and feed them as a list into a single request, like “get me all these organizations.” Then it builds the graph back for you, right? So you end up with two requests instead of 11.

Oleg: Exactly. If you’re familiar with libraries like Fetch, I think it’s from [inaudible 00:45:44], or [Clump 00:45:48], this is a very similar concept. You give back IDs and say, “Eventually use these IDs to load this data, but do it in bulk, or in a batch.” And people actually use it this way. The [inaudible 00:46:04], this batch SQL query. Or maybe they have REST APIs that accept a list of IDs and give back a list of JSON objects. So yes, exactly, you can do it all at once.

In terms of structure, your execution can branch, but in many cases, if you have the same type of data, you end up with one query per nesting level. So if you have a deeply nested query, say you ask for a user, then the organization of that user, and then the users of that organization, you will end up with at most three SQL queries, or maybe [HTTP 00:46:49] requests to internal microservices.
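
Extending the same idea to the nested query Oleg mentions (user, then the user’s organization, then that organization’s users): batching each level separately gives one query per nesting level, so three queries however many users are requested. The loaders and data below are hypothetical Python illustrations, not a real schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical data: nine users spread across three organizations.
USERS = {i: {"id": i, "name": f"user{i}", "org_id": 100 + i % 3} for i in range(9)}
ORG_NAMES = {100: "Acme", 101: "Globex", 102: "Initech"}
query_log: List[str] = []  # one entry per simulated database query


def load_users(ids: Iterable[int]) -> List[dict]:
    query_log.append("users by id")
    return [USERS[i] for i in ids]


def load_orgs(ids: Iterable[int]) -> Dict[int, str]:
    query_log.append("orgs by id")
    return {i: ORG_NAMES[i] for i in ids}


def load_members(org_ids: Iterable[int]) -> Dict[int, List[str]]:
    query_log.append("users by org id")
    wanted = set(org_ids)
    members: Dict[int, List[str]] = defaultdict(list)
    for user in USERS.values():
        if user["org_id"] in wanted:
            members[user["org_id"]].append(user["name"])
    return members


def execute(user_ids: List[int]) -> List[dict]:
    """Resolve { user { organization { users } } } with one batched query per level."""
    users = load_users(user_ids)  # level 1
    orgs = load_orgs({u["org_id"] for u in users})  # level 2
    members = load_members(orgs.keys())  # level 3
    return [
        {
            "name": u["name"],
            "organization": {"name": orgs[u["org_id"]], "users": members[u["org_id"]]},
        }
        for u in users
    ]


result = execute([0, 1, 2, 3, 4])
print(len(query_log), "queries")  # 3 queries
```

Each loader takes a whole set of keys, which is also what the “service that gets organizations in bulk by a list of IDs” needs to offer.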

Adam: And as an implementer, it just means I have to write a service that can get a user, a service that can get a list of users, a service that can get an organization by ID, and a service that can get organizations in bulk by a list of IDs, right? And then Sangria builds the data model. It seems like, effectively, you’re doing a SQL join, but you’re doing it in memory using maps. Is that right?

Oleg: Yeah, it’s similar. In fact, there are different approaches to this. In many cases, it’s actually not very efficient to make giant SQL queries with a lot of joins. So what people often do is split one big SQL query into multiple smaller queries that work in bulk. For example, you ask for a user, and you load the user information. The user has a list of organizations, so you just say, “Give me all of those organizations by ID.” In many cases, this is actually more efficient than a huge SQL query with nested joins.

Adam: Makes sense. It probably scales better in some ways, too, because you can have a whole bunch of GraphQL-serving boxes sitting out there, right?

Oleg: Yeah, exactly. It’s much more flexible, because you are not tied to a specific data store, especially if you are working with [inaudible 00:48:37] SQL databases. In our case, we work with MongoDB and ElasticSearch. Part of the data comes from ElasticSearch, part of it comes from MongoDB, and part of it might come from external or internal microservices. You need to orchestrate the fulfillment of the GraphQL query across all those data storage engines, and it’s quite flexible that you can separate the API from the data storage.

Adam: That gives you a little give. You could move things from one data store to another, and it wouldn’t even matter. Well, it would matter, but not as much. Do you think there are cases where GraphQL isn’t an appropriate solution for an API?

Oleg: I think so. GraphQL API is quite helpful if you have a fast-evolving data model that you would like to expose to specific clients, and you would like to give those clients a lot of flexibility. But in some cases, you have a well-established data model and a huge number of clients, and the actual storage of the data is distributed. Just think about the [Wikipedia 00:49:59] API. The Wikipedia API doesn’t change that often. It provides [inaudible 00:50:03] media. It needs to be very cacheable; for example, it is very helpful to have a cache for the JSON of a particular Wikipedia [Azure com 00:50:14].

In this case, I think a REST API might fit much better, because it’s so tightly coupled with HTTP and the way HTTP works, so you can use all of the reverse proxies and caches along the way, and they all understand ETags, cache headers, and so on. So the caching part is much easier with a REST API, because you expose [inaudible 00:50:44], and it’s much easier to cache it.

With a GraphQL API exposed via HTTP, you normally have just a single endpoint, and what you would like to get, you specify in the GraphQL query. There are ways to cache it, but it’s more complicated than with a REST API, which is tightly coupled to HTTP and the way HTTP works.

Adam: So I guess there are some advantages to the API protocol being tied to the transport protocol.

Oleg: Exactly, there are definitely advantages to this. But GraphQL clients like Apollo Client try to mitigate this problem. For example, Apollo Client maintains a normalized cache on the client side, which is a big help if data caching is important in your application. Otherwise, you can’t really take much advantage of HTTP caching with a GraphQL API.
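
A simplified sketch of what a normalized client-side cache does: each object that has a type name and an ID is stored once, under a key like `User:1`, and nested occurrences are replaced with references, so different queries that touch the same object share one entry. Apollo Client’s real cache is considerably more involved; `normalize`, the `Store` type, and the `__ref` convention here are a hypothetical Python illustration.

```python
from typing import Any, Dict

Store = Dict[str, Dict[str, Any]]


def normalize(value: Any, store: Store) -> Any:
    """Flatten a GraphQL response into `store`, replacing identifiable objects with refs."""
    if isinstance(value, list):
        return [normalize(item, store) for item in value]
    if not isinstance(value, dict):
        return value
    flat = {key: normalize(child, store) for key, child in value.items()}
    if "__typename" in value and "id" in value:
        cache_key = f"{value['__typename']}:{value['id']}"
        # Merge rather than overwrite, so queries selecting different fields
        # of the same object end up sharing one entry.
        store.setdefault(cache_key, {}).update(flat)
        return {"__ref": cache_key}
    return flat


store: Store = {}
first = normalize(
    {"user": {"__typename": "User", "id": 1, "name": "Bob",
              "organization": {"__typename": "Organization", "id": 7, "name": "Acme"}}},
    store,
)
normalize({"organization": {"__typename": "Organization", "id": 7, "country": "DE"}}, store)
print(store["Organization:7"])
```

After the second response, the single `Organization:7` entry holds both `name` and `country`, and the cached user points at it by reference.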

Adam: So what do you find is a stumbling block for people learning about GraphQL and about Sangria?

Oleg: Maybe just the different data model. With a REST API, you model data in a very different way than with a GraphQL API. This can be a big roadblock: people try to apply the same concepts, like resources, to a GraphQL API, and they don’t take advantage of the more expressive data model that GraphQL provides. All of those relationships and connections between different parts of the data, the relationships between types, may not be modeled.

Adam: If you’re just thinking in terms of resources and the properties that hang off those resources, you might miss that GraphQL is actually more about how these things relate, and the ways you can model those relationships.

Oleg: Yeah, exactly. People don’t even realize that GraphQL has a type system, and this type system, including those relationships, is exposed via the introspection API, which is always available for every GraphQL API out there. I think it’s a very, very powerful notion. For me personally, one of the biggest [inaudible 00:53:18] of GraphQL API is that it has a type system with all those nice qualities.

Adam: And that leads to discoverability as well, right? With a REST API, there might be Swagger documentation, or some documents somewhere describing things. But a GraphQL endpoint is always going to be self-documenting.

Oleg: Exactly. And you can rely on it; I think that’s also an important point. If you have a GraphQL endpoint, just a URL, you know precisely how to figure out what types it provides and what you can do with it. Because of this, we see a lot of tools built for GraphQL APIs. For example, there was recently a small Scala library that loads the schema from two different places and compares the types and fields between them.

It uses a helper in Sangria and just prints a list of breaking changes between the two schemas. This is very helpful. From talking to people who use it in production, what they actually do is integrate it as part of the [CI 00:54:36] or [inaudible 00:54:38] pipeline and compare schema changes between the staging and production environments.

Adam: So it’s a tool that takes a GraphQL endpoint and says, “Okay, get me the types. Now get all the fields of the types.” Then it says, “Okay, the last time I looked, there was this field, and now it’s gone, so this is a breaking change.”

Oleg: Exactly.
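
The schema check Adam describes can be sketched as a diff over simplified introspection results. In this hypothetical Python version, a schema is just a map from type name to field names, and only removed types and fields count as breaking; a fuller comparison would also look at things like changed field types and arguments, which this sketch ignores.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # type name -> field names (simplified introspection result)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Report removed types and fields; additions are treated as safe."""
    changes: List[str] = []
    for type_name, old_fields in sorted(old.items()):
        if type_name not in new:
            changes.append(f"Type '{type_name}' was removed")
            continue
        for field in sorted(old_fields - new[type_name]):
            changes.append(f"Field '{type_name}.{field}' was removed")
    return changes


production = {"User": {"id", "name", "email"}, "Organization": {"id", "name"}}
staging = {"User": {"id", "name", "avatar"}}
for change in breaking_changes(production, staging):
    print(change)
```

In a CI job, the two inputs would come from introspection queries against staging and production, and a non-empty result would fail the build.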

Adam: That’s very cool. I never thought of that; that’s a neat idea. Well, Oleg, I want to be considerate of your time. Thank you so much for talking with me. I’ve learned a lot about GraphQL. It’s been a lot of fun.

Oleg: Yeah. Thank you very much. Thank you for having me.

CORECURSIVE #011
