
GraphQL has some important differences from REST, and if we consider them, we pretty quickly realize that cost analysis is important.

This is the first post in a planned series about Cost Analysis.

  • If you spend a lot of your time on GraphQL then you already know that Cost Analysis is important and can skip to a later post.
  • If you’re used to REST and you don’t understand why Cost Analysis is essential for GraphQL, then you’re not alone and hopefully this post can help.

I’ve talked to various companies around the industry that are just starting their GraphQL journeys. They’re beginning to appreciate the advantages of GraphQL, but it isn’t yet obvious to them that those advantages have a flip side: if they expose public GraphQL endpoints, some kind of cost analysis becomes critical to their business.


REST APIs

Imagine a small business which retrieves data from various backend databases, legacy systems, and third-party REST calls.

They have exposed some of this data externally via REST endpoints. Over time, they’ve developed hundreds of these RESTful endpoints.

They secure and govern each endpoint in a number of ways:

  • Threat Protection makes sure request data isn’t too big and isn’t requesting too much response data.
  • Rate Limiting governs how many transactions per second are allowed per consumer on an endpoint.
  • Monetization of an endpoint charges an API consumer per transaction.

GraphQL APIs

Now assume that this small business adds GraphQL in front of their various backends. GraphQL seamlessly combines all of their backend data sources, both in-house and third-party, into a single graph. This allows API consumers to access all of that data with flexible queries of the client’s choosing, all directed at a single endpoint.

This change can simplify and optimize the data traffic and future development on both the backend and the frontend, but there is a big problem with securing and managing our new endpoint:

That single endpoint is flexible enough to accept dramatically different kinds of transactions. Many transactions might need only one REST call on the backend, while another single transaction might, on its own, result in thousands of REST calls on the backend.

As a real-world example, consider GitHub’s public API. In one transaction, you can ask for your username:

query { 
  viewer { 
    login
  }
}

In a different single transaction:

  • You can ask for the most recent 100 PRs on the GraphQL JavaScript reference implementation
  • For each PR, you could get its 100 most recent commits
  • For each of those commits, you could get the user who committed it
  • For each of those users, you could get their 100 most recent gists
  • For each of those gists, you could get the 100 most recent comments on it

query {
  repository(owner: "graphql", name: "graphql-js") {
    pullRequests(last: 100) {
      edges {
        node {
          commits(last: 100) {
            edges {
              node {
                commit {
                  author {
                    user {
                      login
                      gists(last: 100) {
                        edges {
                          node {
                            comments(last: 100) {
                              totalCount
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

For how GitHub responds, see below.

This flexibility is at the heart of the advantages of GraphQL, but it breaks all of our prior management models.

  • Threat Protection is problematic because a single query can be short, shallow, and narrow and still be extremely expensive to run on the backend. The example above is fairly deep, but asking for just 100 PRs is already far more expensive than asking for only your login name, so limits on nesting depth or query length alone won’t catch expensive queries.
  • Both Rate Limiting and Monetization clearly can’t be fair or reasonable while treating all of these transactions the same. It’s not fair to the backend, which has to do far more work for some queries, and it’s not fair to the client, whose tiny, simple queries count against their rate limit just as much as the huge ones.
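
To make the first point concrete, here is a minimal sketch of why purely structural checks don't separate cheap queries from expensive ones. The limits `MAX_DEPTH` and `MAX_LENGTH`, both helper functions, and the `nodes { title }` selection are assumptions made up for this illustration, and the brace counting and regex are deliberately crude (a real gateway would parse the query properly).

```python
import math
import re

# Hypothetical structural limits a REST-style threat-protection layer might apply.
MAX_DEPTH = 10
MAX_LENGTH = 500

def max_depth(query: str) -> int:
    """Deepest level of nested selection sets, found by counting braces."""
    depth = deepest = 0
    for ch in query:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest

def page_size_product(query: str) -> int:
    """Product of every first/last argument; 1 when the query has none."""
    sizes = re.findall(r"\b(?:first|last):\s*(\d+)", query)
    return math.prod(int(size) for size in sizes)

def passes_structural_limits(query: str) -> bool:
    return max_depth(query) <= MAX_DEPTH and len(query) <= MAX_LENGTH

cheap = "query { viewer { login } }"
costly = (
    'query { repository(owner: "graphql", name: "graphql-js") '
    "{ pullRequests(last: 100) { nodes { title } } } }"
)

for q in (cheap, costly):
    print(passes_structural_limits(q), max_depth(q), page_size_product(q))
```

Both queries pass the depth and length checks, yet the second one can return up to 100 times as many nodes as the first.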

In short, threat protection, rate limiting, and monetization are all broken.

So what can we do?

Solution

The only real solution to this problem is to not treat all GraphQL transactions the same, even when they all hit our single new endpoint. Instead, we calculate the cost of each transaction. For small (inexpensive) transactions, we let many through each second and don’t charge much money on paid API plans. For large (expensive) transactions, we let fewer through each second and charge API consumers more money per transaction. For out-of-control queries, we reject the transaction with an error.
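
As a rough sketch of that policy, the snippet below keeps a per-consumer budget measured in cost points rather than in transactions. The `CostBudget` class, its field names, and the numbers are all hypothetical, and it assumes the cost of each query has already been calculated somewhere else; it is one possible shape for the idea, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class CostBudget:
    """Hypothetical per-consumer budget, refilled once per time window."""
    points_per_window: int   # total cost a consumer may spend per window
    max_query_cost: int      # any single query above this is rejected outright
    remaining: int = 0       # points left in the current window
    billed: int = 0          # points to charge for on a paid plan

    def __post_init__(self) -> None:
        self.remaining = self.points_per_window

    def refill(self) -> None:
        """Call at the start of each window (for example, once per second)."""
        self.remaining = self.points_per_window

    def admit(self, cost: int) -> str:
        if cost > self.max_query_cost:
            return "rejected"    # out-of-control query: return an error, never run it
        if cost > self.remaining:
            return "throttled"   # acceptable query, but the budget is spent for now
        self.remaining -= cost
        self.billed += cost      # expensive queries are billed more than cheap ones
        return "allowed"

budget = CostBudget(points_per_window=1_000, max_query_cost=500)
results = [budget.admit(1) for _ in range(200)]   # 200 cheap queries
results.append(budget.admit(400))                 # one expensive query
results.append(budget.admit(450))                 # another: only 400 points are left
results.append(budget.admit(1_000_000))           # out-of-control query
print(results[-3:], budget.billed)
```

Here all 200 cheap queries and the first expensive one are allowed, the second expensive one is throttled until the next refill, and the out-of-control one is rejected regardless of how much budget remains.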

Remember that out-of-control query we sent to GitHub above? It should come as no surprise by now that GitHub uses Cost Analysis for Threat Protection and rejects that query with this message:

“By the time this query traverses to the gists connection, it is requesting up to 1,000,000 possible nodes which exceeds the maximum limit of 500,000.”
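
The 1,000,000 figure in that message can be reproduced by multiplying the page sizes along the nested path of the query. The sketch below is a simplified model for illustration, not GitHub's actual algorithm; the helper names are made up, and only the page sizes and the 500,000 limit come from the query and the message above.

```python
from itertools import accumulate
from operator import mul

# Page sizes copied from the nested query above, outermost connection first.
PAGE_SIZES = [("pullRequests", 100), ("commits", 100), ("gists", 100), ("comments", 100)]

# Limit quoted in GitHub's error message.
NODE_LIMIT = 500_000

def worst_case_nodes(page_sizes):
    """Return (connection, cumulative worst-case node count) pairs."""
    names = [name for name, _ in page_sizes]
    totals = accumulate((size for _, size in page_sizes), mul)
    return list(zip(names, totals))

def first_over_limit(page_sizes, limit):
    """Return the first connection whose worst case exceeds the limit, if any."""
    for name, total in worst_case_nodes(page_sizes):
        if total > limit:
            return name, total
    return None

for name, total in worst_case_nodes(PAGE_SIZES):
    print(f"{name}: up to {total:,} nodes")

print(first_over_limit(PAGE_SIZES, NODE_LIMIT))  # ('gists', 1000000)
```

Under this model the running product is 100 at pullRequests, 10,000 at commits, and 1,000,000 at gists, which is the first level above the limit and the connection the error message names.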

We’ve maintained the advantages of GraphQL, while restoring our ability to enforce Threat Protection, Rate Limiting and Monetization.

Many questions remain: where, when, and how should we calculate the cost of a GraphQL transaction? What is clear is that some form of cost analysis restores our key security and governance features while maintaining the flexibility advantages for GraphQL APIs. There’s no other way we know of to effectively manage GraphQL APIs while maintaining the benefits of GraphQL: We need to use some form of GraphQL Cost Analysis.

Do you agree? If you can think of a way to avoid cost analysis entirely, let me know.