Caching in Web APIs

Jon Humble
Engineering at Depop
9 min read · Apr 23, 2021


There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

“Server room at CERN” by torkildr is licensed under CC BY-SA 2.0

It’s clear from its name that the World Wide Web scales well. One of the reasons for this is HTTP caches. At Depop we want to make sure we maximize the benefits of this technology for our users.

Getting caching right is notoriously difficult, so here’s an overview of the options and ideas on how to get the best out of them.

Types of cache

We will differentiate between two types of cache:

Local caches reside on the same device as the user agent — typically a mobile phone or computer. They store responses privately, just for the user of the device.

Shared caches on the other hand are found somewhere on the network between the user’s device and the origin server. Indeed, there may be several of these along the way. They store many users’ responses and these responses can be used to satisfy requests from other users if the requests are the same.

Figure 1: A simple view of how caches may look between a Depop user and our servers.

Local caches are the closest to the user, so will result in the least latency. However, they are less likely to have responses cached as they see only the responses for a single user. Shared caches on the other hand may have a response cached from another user that this user can use, so are more likely to result in a cache hit.

The benefits of caching

  • Reduced Latency: The client will receive a cached response quicker than if the request had to go to the origin server. This is because the cache is closer to the client on the network and does not have to regenerate the response.
  • Reduced Server Load: Any response served by the cache saves the server having to do any work to service the request.
  • Reduced Network Load: Any response served by the cache means the network between the cache and the origin server is not utilized.

What’s the cost?

All these benefits have to come with a cost and with caching the cost is consistency. The more aggressively we take advantage of caching, the more likely it is we will serve the client a stale response. We will look at how to balance these concerns shortly.

What gets cached?

For our purposes, we will focus on GET requests resulting in 200 responses. RFC 7234 covers HTTP caching and section 2 has some notes on other, less common, caching scenarios.

It’s worth noting that to a client, a cache looks like it is the origin server. A client does not distinguish between a response from the origin server and one from any cache on the way¹.

Configuring how responses get cached

Caching is configured by headers in the responses from the origin server.² There are three main factors that can be controlled — cache selection, cached item freshness and cached item validation.

Cache Selection

Cache selection covers where the response can be cached.

If the item must not be cached anywhere, we use the cache-control directive thus²:

cache-control: no-store

To allow only local caches to cache a response, we use :

cache-control: private

Shared caches should respect this and not store any such decorated response.

Finally, we can use:

cache-control: public

to allow all caches to store the response. This is usually the default, but there are circumstances involving authenticated requests in which we need to set it explicitly, as we will see later.
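
To make these directives concrete, here is a minimal origin-server sketch in Python using Flask. The endpoints and payloads are hypothetical, invented for illustration, not Depop's actual API:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/session")
def session():
    resp = jsonify({"token": "..."})
    resp.headers["Cache-Control"] = "no-store"  # must not be cached anywhere
    return resp

@app.route("/me/settings")
def settings():
    resp = jsonify({"theme": "dark"})
    resp.headers["Cache-Control"] = "private"  # local caches only
    return resp

@app.route("/products")
def products():
    resp = jsonify({"products": []})
    resp.headers["Cache-Control"] = "public"  # any cache may store this
    return resp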

Freshness

Caches are allowed to serve cached responses to clients so long as the cached item remains fresh. What is considered fresh is determined by the response headers³ outlined below:

expires: Wed, 21 Oct 2020 00:00:00 GMT

The expires header sets an absolute point in time before which a response is considered fresh.

cache-control: max-age=600

The max-age and s-maxage directives set a relative time, in seconds, for which the response is considered fresh, measured from when it was generated on the origin server. s-maxage applies to shared caches only and, where present, overrides max-age, which applies to both types of cache.
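
A sketch of how an origin server might emit these freshness headers (hypothetical endpoint and values):

from datetime import datetime, timedelta, timezone

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/products")
def products():
    resp = jsonify({"products": []})
    # Fresh for 10 minutes in local caches; shared caches may treat it
    # as fresh for a full hour.
    resp.headers["Cache-Control"] = "max-age=600, s-maxage=3600"
    # The equivalent absolute form; ignored wherever max-age is present.
    expires = datetime.now(timezone.utc) + timedelta(minutes=10)
    resp.headers["Expires"] = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
    return resp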

Validation

When a cached response is requested but found to be stale, the cache will, by default, reach out to the origin server for a fresh copy⁴ ⁵, which it will then store and serve to the client.

However, HTTP has a better mechanism — a means to re-validate the freshness of the item without having to make the full request/response cycle.

This mechanism is known as a conditional GET and depends upon the presence of a validator in the response. A validator is one of these headers:

last-modified: Wed, 21 Oct 2020 10:19:02 GMT

etag: 345312-af5445-eaf43-12eec

The former header conveys the time the resource was last modified. When such a response expires, the cache can issue a conditional GET using the request header:

if-modified-since: Wed, 21 Oct 2020 10:19:02 GMT

If the response has not been updated, the server will respond with 304 Not Modified and the cache can serve the cached item to the client and refresh the freshness value it has for this response.

If the response has been updated, the server will respond with a typical 200 response and a new response will be sent. This replaces the old one in the cache and is then served to the client.

Entity Tags (etag) work in a similar manner, but have the advantage that they are not limited to the one-second resolution of HTTP dates. The conditional header for etags is if-none-match: 345312-af5445-eaf43-12eec

All of this is transparent to the original calling client — the conditional GET headers are only put on requests from the cache by the cache.
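
Here is a minimal sketch of the origin-server side of a conditional GET, again in Flask, with the etag derived by hashing the response body (one possible scheme among many):

import hashlib
import json

from flask import Flask, request

app = Flask(__name__)

PRODUCT = {"id": 42, "price": 30}  # stand-in for a real data store

@app.route("/products/42")
def product():
    body = json.dumps(PRODUCT)
    # Derive a validator from the representation (etags are quoted strings).
    etag = '"%s"' % hashlib.sha1(body.encode()).hexdigest()

    # A conditional GET from a cache: if its copy is still current,
    # answer 304 with no body rather than resending the full response.
    if request.headers.get("If-None-Match") == etag:
        return "", 304

    return body, 200, {
        "Content-Type": "application/json",
        "ETag": etag,
        "Cache-Control": "max-age=600",
    }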

Invalidation

When a cache sees a request using one of the unsafe verbs (DELETE, PUT, POST), it must remove any item from its cache that matches the URI associated with this request.

Similarly, it must also invalidate any cached item that matches the location or content-location headers in the responses to these requests.

Sadly, in practice this can be a slow and unreliable process (Some CDNs can take 20 minutes or so to reach global consistency). For this reason, it’s best not to rely on this form of invalidation as a means of cache management. This puts more onus on the correct choice of caching freshness parameters.
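
For illustration only, here is a toy in-memory cache showing the invalidation rule. Real shared caches and CDNs implement this internally, and this sketch ignores freshness entirely:

import requests

class TinyCache:
    # A toy shared cache: demonstrates invalidation only, ignores freshness.
    UNSAFE = {"POST", "PUT", "DELETE", "PATCH"}

    def __init__(self):
        self.store = {}  # uri -> cached GET response

    def handle(self, method, uri, **kwargs):
        if method in self.UNSAFE:
            self.store.pop(uri, None)  # invalidate the request URI
            resp = requests.request(method, uri, **kwargs)
            # Also invalidate anything named by location headers.
            for header in ("Location", "Content-Location"):
                if header in resp.headers:
                    self.store.pop(resp.headers[header], None)
            return resp
        if method == "GET" and uri in self.store:
            return self.store[uri]  # cache hit
        resp = requests.request(method, uri, **kwargs)
        if method == "GET":
            self.store[uri] = resp
        return resp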

Trade Offs

Favouring Consistency

For the ultimate consistency, caching can be switched off altogether using the no-store directive². However, there is a better mechanism.

Provided the response contains a validator, the cache can be directed to revalidate every request. This uses a conditional GET rather than a full GET and should reduce bandwidth and server utilisation:

cache-control: no-cache, public, must-revalidate

The no-cache directive² often causes confusion: it means that the cache can store the response, but must not serve it without first checking with the origin server.
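
A sketch of a consistency-favouring endpoint combining these directives with a validator (the endpoint and etag value are hypothetical):

from flask import Flask, request

app = Flask(__name__)

@app.route("/balance")
def balance():
    etag = '"v7"'  # hypothetical version-based validator
    if request.headers.get("If-None-Match") == etag:
        return "", 304  # revalidated: the cache may serve its stored copy
    return '{"balance": 100}', 200, {
        "Content-Type": "application/json",
        "ETag": etag,
        # Caches may store this, but must check back before serving it.
        "Cache-Control": "no-cache, public, must-revalidate",
    }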

Favouring Low Latency (also favours low network and server load)

Low latency is best achieved by getting the most cache hits. To achieve this, we can use a relative or absolute cache expiry time, e.g.

cache-control: max-age=3600

Increasing the value of max-age will improve latency at the possible expense of consistency.

Once the cached item expires, the cache will make a call to the origin server for a fresh copy.

We can improve this by adding a validator to the response, e.g.

etag: ae91-bb6c-9632-00d0

Now, the cache makes a conditional GET when the cached response expires, potentially saving bandwidth and server resources.

This will be done at the time a request comes into the cache for the resource in question, blocking that request while it does so. We can improve this by specifying background updating of the items in the cache thus:

cache-control: stale-while-revalidate=1800

With this directive in place, if a request arrives within 1800s of the cached response expiring, the currently cached (now stale) response is served to the client while a revalidation request is made to the origin server in the background. The next request should then find a freshly validated copy in the cache.

By balancing the values for max-age and stale-while-revalidate we can achieve background updating of resources while minimising latency.
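
Putting these pieces together, a sketch of a latency-favouring response (the values are illustrative, not a recommendation):

from flask import Flask

app = Flask(__name__)

@app.route("/feed")
def feed():
    return '{"items": []}', 200, {
        "Content-Type": "application/json",
        "ETag": '"feed-v12"',  # enables cheap conditional revalidation
        # Serve from cache for an hour; for 30 minutes after that, serve
        # the stale copy immediately while refreshing in the background.
        "Cache-Control": "max-age=3600, stale-while-revalidate=1800",
    }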

Maximising Resilience

Caches also come in useful if the origin server suffers an outage. In these circumstances, a cache can be directed to continue to serve stale responses for a period while the origin server is returning errors:

cache-control: stale-if-error=3600

In this example, were the origin server to go down for an hour, the cache would continue to serve clients with cached responses even if they had gone stale.
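
stale-if-error combines naturally with the freshness and revalidation directives above. A hypothetical combined policy, expressed as a constant an origin server might reuse across endpoints:

# Fresh for an hour, background refresh for the next 30 minutes, and
# keep serving stale copies for a further hour if the origin is erroring.
RESILIENT_CACHE_CONTROL = (
    "max-age=3600, stale-while-revalidate=1800, stale-if-error=3600"
)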

Security Considerations

Encrypted data over SSL is not cacheable: the random jumble of bits produced by encryption is different every time, and only the original sender and recipient have the keys.

Responses to requests that carry authorisation headers are not cached by default, for security reasons.

Together, these facts would seem to be bad news for caching!

There are, thankfully, things we can do.

Remember the diagram in Figure 1:

SSL termination may happen before the origin server, giving a space for unencrypted data to be cached. In the diagram above, that might mean the ISP cache cannot be used, but the CDN one can.

With authorised content, we can enable caching, but must do so with caution.

One option is private caching:

cache-control: private, max-age=600

Here, we enable caching on the user’s device only.

An alternative is public caching with forced validation:

cache-control: public, max-age=0, must-revalidate

This allows the cache to store the item, but forces it to make a conditional GET to the origin server on every request. That GET must carry valid authorisation or it will result in a 401. This ensures the client is allowed to view the cached content, without the origin server having to regenerate it each time.

For this to work effectively, the response will also have to have a validator in it — e.g. an etag or a last-modified.
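
Pulling the security pieces together, here is a sketch of an authenticated endpoint using public caching with forced validation. check_auth, the token, and the etag are hypothetical stand-ins:

from flask import Flask, request

app = Flask(__name__)

def check_auth(header):
    # Hypothetical stand-in for real token validation.
    return header == "Bearer secret-token"

@app.route("/orders")
def orders():
    # Every request, including the cache's conditional GET, must carry
    # valid credentials; otherwise the cache receives a 401 to pass on.
    if not check_auth(request.headers.get("Authorization", "")):
        return "", 401

    etag = '"orders-v3"'  # hypothetical validator
    if request.headers.get("If-None-Match") == etag:
        return "", 304  # authorised: the cached copy may be served

    return '{"orders": []}', 200, {
        "Content-Type": "application/json",
        "ETag": etag,
        "Cache-Control": "public, max-age=0, must-revalidate",
    }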

Deciding how to cache resources

A cache represents a trade-off between latency and consistency. The more risk you take over consistency, the better your latency. Here are some things to consider when setting up the caching for your resources:

Volatility — how likely is the resource to go stale in time period X? This helps with choosing a freshness period.

Frequency — how often is the resource accessed? This points to the cost of having cache misses and having to engage the origin server.

Security — are we happy to have this in a shared cache in a publicly accessible part of the network? Or should it only be cached privately? Or not at all?

URI to Resource uniqueness

Shared caches depend on URIs uniquely identifying resources⁶. If a different resource is returned for the same URI based on another variable (often the calling user), caching won't work!
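
As a hypothetical illustration (these routes are invented for the example):

# Bad: one URI, a different resource per caller; a shared cache could
# serve one user's feed to another.
#   GET /feed                (user inferred from the auth token)
#
# Better: one URI per resource, so cached responses are unambiguous.
#   GET /users/123/feed
#   GET /users/456/feed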

Don’t get carried away with server side optimisations. Let the caches take the load.

In Conclusion

Caching is an imperfect science. Getting it right requires monitoring your cache hit rates and watching out for stale data being served. Hopefully this article has given you some ideas and pointers to get the most out of this important technology.

¹ Caches add specific headers to responses, so if you look carefully you can tell if a response came from a cache.

² Not all cache implementations respect this directive; Varnish before version 4 is a notable example.

³ Clients can also put headers in their requests to modify these settings.

⁴ If responses do not contain freshness directives, caches can use heuristics to decide on freshness.

⁵ If another cache en route to the origin server has a fresh copy, that can be used instead.

⁶ Multiple URIs can point to the same resource, but a single URI must only ever point to one resource.
