Methods of GraphQL Cost Analysis

There are three basic methods of GraphQL Cost Analysis: Static Query Analysis, Dynamic Query Analysis, and Query Response Analysis.

Introduction

Here’s a short video version of this section of the post, introducing the three methods of GraphQL Cost Analysis:


Background

A GraphQL transaction:

  • Starts in a client, which might use a client library like Apollo, and sends a GraphQL query over the network.
  • The query might go through a proxy, such as Akamai or an API Management system, on its way to the backend server.
  • The server might run some GraphQL middleware, and then a GraphQL execution engine, before the middleware returns the JSON response.

This response returns through the proxy and back to the client software.

In another post, we discussed why GraphQL cost analysis is important. But where and how can we calculate the cost?

Methods of Cost Analysis

The three methods of calculating the cost are:

  • Static Query Analysis, which performs a static analysis on the GraphQL Query. It can be applied wherever we have the query — at the client, in the proxy, or at the server.
  • Dynamic Query Analysis, which sums up the cost in the GraphQL execution engine, as it is evaluating the query.
  • Query Response Analysis, which uses the JSON response to figure out how expensive the query was. Response Analysis can be applied wherever we have the response data, usually at the proxy or the client.

Why do we need more than one kind of GraphQL Cost Analysis?

Imagine a request at a retail store for up to 100 products in a special category, along with some details about each product. A static analysis of the query tells us that gathering those details might mean calling the corresponding resolver functions 100 times. But if the search criteria match fewer than 10 products, the server calls each of those resolver functions fewer than 10 times.

Both costs are important:

  • For Threat Protection, we need to know how expensive a query might be before we execute it.
  • For Monetization, we can authorize based on how expensive a query might be, and charge money based on how expensive it actually was.
  • For Rate Limiting, either cost is appropriate. We could even charge a user's rate limit based on the upper bound, and then refund the difference after the transaction.

Which of these methods you’ll use depends on why you’re calculating the cost, and where you are calculating it. While there are dramatically different options for how to calculate cost, we believe that these are the three fundamental Methods of GraphQL Cost Analysis.

Static Query Analysis

  • Inputs: Schema, Query
  • Accuracy: Upper-Bound
  • Primary Use: Threat Protection
  • Location: Middleware

Static Query Analysis predicts how expensive a query might be. This makes it useful in security-sensitive settings: by predicting the cost in advance, we can prevent dangerous queries from running in the first place. It could be used in client software as a kind of assertion, to make sure the client does not accidentally form an ‘out of control’ query. Its primary use, however, is in a proxy/gateway or in the server’s middleware.

As explained previously, a key advantage of GraphQL is putting the client in control of the contract, which inherently means that the server has no control over what the client sends in the query. This risks abuse of the backend, unless we put in some kind of control on the cost of the query.

Static Query Analysis prevents the server from even starting to run the GraphQL execution engine on queries that we deem unsafe, thus restoring safety to the backend while maintaining the flexibility of GraphQL for the client.
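
To make the idea concrete, here is a minimal sketch of a static upper-bound calculation in Python. It is not taken from any particular library. The `Field` class is a hypothetical, simplified stand-in for a parsed query, and the `limit` and `weight` values are assumptions that a real implementation would derive from the query arguments and the schema.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One selected field in a query (hypothetical, simplified representation)."""
    name: str
    limit: int = 1   # assumed maximum number of items the field can return
    weight: int = 1  # assumed cost of one call to this field's resolver
    children: list = field(default_factory=list)


def static_upper_bound(f: Field) -> int:
    """Worst case: one resolver call, plus every subfield for every possible item."""
    per_item = sum(static_upper_bound(child) for child in f.children)
    return f.weight + f.limit * per_item


def is_safe(query: Field, max_cost: int) -> bool:
    """Threat Protection check, run before the execution engine sees the query."""
    return static_upper_bound(query) <= max_cost


# Roughly: products(first: 100) { name price }
query = Field("products", limit=100, children=[Field("name"), Field("price")])
upper_bound = static_upper_bound(query)  # 1 + 100 * (1 + 1)
```

With these assumed weights the bound is 201 no matter how many products actually match, which is exactly why it is an upper bound and not an exact cost.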

Dynamic Query Analysis

  • Inputs: None per se, it augments your GraphQL execution engine
  • Accuracy: Exact
  • Primary Use: Rate Limiting, Monetization
  • Location: GraphQL Execution Engine

Dynamic Query Analysis is the most accurate form of cost analysis because it is calculated directly in the GraphQL execution engine. It does not require any extra inputs, but it does require building code into your execution engine. If only some resolver functions or GraphQL types are expensive, you could augment only those resolver functions. Alternatively, you could inject code into all resolver functions, or simply modify the base implementation of the execution engine.

The main advantage here is accuracy. The main disadvantages are that you must modify the execution engine, and that the cost is not known until execution is already under way and backend data has already been read or modified. Therefore, it is not well suited to Threat Protection, but it is often used after the fact to adjust the number of tokens in a bucket for Rate Limiting.
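
One way to augment individual resolver functions is to wrap them so that each call charges a per-request meter. The following is a sketch under stated assumptions: `CostMeter`, `metered`, and the two resolvers are hypothetical names, and passing the meter as the first argument stands in for whatever request context your execution engine provides.

```python
import functools


class CostMeter:
    """Accumulates the cost of one request while resolvers run (hypothetical helper)."""

    def __init__(self, budget=None):
        self.total = 0
        self.budget = budget  # optional ceiling; None means "just count"

    def charge(self, amount):
        self.total += amount
        if self.budget is not None and self.total > self.budget:
            raise RuntimeError("query exceeded its cost budget")


def metered(weight=1):
    """Decorator that charges `weight` to the request's meter on every call."""
    def wrap(resolver):
        @functools.wraps(resolver)
        def inner(meter, *args, **kwargs):
            meter.charge(weight)
            return resolver(*args, **kwargs)
        return inner
    return wrap


@metered(weight=1)
def resolve_products(category):
    # Stand-in for a backend lookup; here only 7 products match.
    return [{"id": i} for i in range(7)]


@metered(weight=1)
def resolve_name(product):
    return f"product-{product['id']}"


meter = CostMeter()
products = resolve_products(meter, "special")
names = [resolve_name(meter, p) for p in products]
```

After this run `meter.total` is 8: one call for the list plus one per matching product. Constructing the meter with a `budget` makes it raise partway through execution, which is the limited form of interruption this method can offer.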

Query Response Analysis

  • Inputs: Schema, Query, Response
  • Accuracy: Near-Exact Upper-Bound
  • Primary Use: Rate Limiting, Monetization
  • Location: Middleware

Query Response Analysis, also called Query Result Analysis, computes a cost that is extremely close to the exact cost computed by Dynamic Query Analysis, because it can determine exact sizes of all returned lists. However, it is still an upper bound because for some queries and schemas it is possible to have response data that cannot uniquely determine which resolver functions were executed and which types are being returned.

While Dynamic Query Analysis can interrupt processing and thus still provide some measure of Threat Protection, Query Response Analysis only sees the result, so it cannot save any backend resources for the current transaction. However, we can still apply the cost to Rate Limiting decisions, which then affect subsequent transactions.

This method is usually chosen when a more exact cost is required, but it is difficult or impossible to modify the execution engine, such as when one group at a company controls the GraphQL server but a different group at that company controls API Governance.
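
As a rough illustration, the sketch below walks a JSON response and counts one unit per field value. This is a deliberate simplification: it assumes a uniform weight of 1 and uses only the response, whereas the method described above also consults the schema and the query to assign per-type or per-field weights. The function name `response_cost` and the sample payload are made up for the example.

```python
import json


def response_cost(value):
    """Count one unit per field in every returned object, recursing through lists."""
    if isinstance(value, dict):
        return sum(1 + response_cost(v) for v in value.values())
    if isinstance(value, list):
        return sum(response_cost(item) for item in value)
    return 0  # scalars and nulls add nothing beyond the field that holds them


# A response in which 7 products matched, each with two fields.
response_text = json.dumps({
    "data": {
        "products": [{"name": f"product-{i}", "price": 10 + i} for i in range(7)]
    }
})

data = json.loads(response_text)["data"]
cost = response_cost(data)  # 1 for `products`, plus 7 * 2 for the item fields
```

Here the cost is 15, far below the static worst case for a query that allows up to 100 products. A `null` in the response shows where the ambiguity comes from: the payload alone does not say how much work the server did before returning it.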

Multiple Methods

As described earlier, you don’t have to pick only one method in your enterprise. You can easily use Static Query Analysis for up-front Threat Protection, and either Dynamic Query Analysis or Query Response Analysis for Rate Limiting or Monetization. Furthermore, you can apply the upper bound from the initial static analysis to your Rate Limiting token bucket at the beginning of the transaction and then refund the difference based on the actual backend data at the end of the transaction.
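
The charge-then-refund pattern can be sketched as follows. `TokenBucket` and `run_with_refund` are hypothetical names, the `execute` callback is assumed to return the result together with the actual cost (from Dynamic Query Analysis or Query Response Analysis), and refilling the bucket over time is omitted to keep the example short.

```python
class TokenBucket:
    """A very small token bucket without time-based refill (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = capacity

    def try_charge(self, amount):
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True

    def refund(self, amount):
        # Never hand back more than the bucket can hold.
        self.tokens = min(self.capacity, self.tokens + amount)


def run_with_refund(bucket, upper_bound, execute):
    """Charge the static upper bound, run the query, then refund the unused part."""
    if not bucket.try_charge(upper_bound):
        return None  # rate limited; the query never reaches the backend
    result, actual_cost = execute()
    bucket.refund(max(0, upper_bound - actual_cost))
    return result


bucket = TokenBucket(capacity=1000)
# Static analysis said 201; the executed query actually cost 15.
result = run_with_refund(bucket, 201, lambda: ("ok", 15))
```

After the call the bucket holds 985 tokens, so the client ends up paying only for the actual cost while the backend stays protected against the worst case during execution.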

What do you do today?

How/where/when do you think cost analysis should be done?