18th October 2006
In this post: a discussion of some caching-related HTTP request and response headers.
Modern websites are “dynamic” by nature – content you get depends on a number of variables and conditions. The simplest example – being a “guest” or a “registered” user of some forum; in both cases you get content generated by the same script/program, but it differs because of your “registered” state.
Another example is a web photo gallery. By design, it is wise to always keep the original photo (the largest and presumably of the highest quality). When a gallery visitor’s browser requests a smaller version of the photo, we can dynamically resize (usually downsize) the original and feed the resulting image to the browser. This scheme works fine until your gallery attracts more and more visitors: CPU load climbs, and so does the wait time when accessing your gallery. Evidently, caching is needed.
Caching is the process of storing an integral (independent, atomic) piece of data so that it can be retrieved quickly later, in cases where creating or accessing that piece of data without storing it would take more time. Caching provides a means to bypass calculation-intensive steps of content creation; it thus decreases server load and allows your server to generate content faster and deliver it to more visitors.
Probably the most important parameter of caching is the “cache period”: the time during which a cached piece of data is considered “fresh” and is read from the cache. If the cached piece of data has existed for longer than the “cache period”, it is considered “stale” and should be replaced with a newly created piece of data.
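To make the “fresh”/“stale” distinction concrete, here is a minimal Python sketch of a cache with a cache period. The class name TimedCache, its get() method and the injectable clock are my own illustrative choices, not part of any particular library, and a real cache would also need size limits and locking.

```python
import time

class TimedCache:
    """Tiny in-memory cache whose entries go stale after `period` seconds."""

    def __init__(self, period, clock=time.monotonic):
        self.period = period
        self.clock = clock        # injectable, so the behaviour is easy to test
        self._entries = {}        # key -> (stored_at, value)

    def get(self, key, create):
        """Return the cached value if it is fresh; otherwise recreate and store it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self.period:
            return entry[1]       # fresh: read from the cache
        value = create()          # missing or stale: do the expensive work again
        self._entries[key] = (now, value)
        return value
```

In the photo gallery case, create would be the function that resizes the original photo, and key would identify the photo and the requested size.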
When a web server sends some content to the browser, there are three main “RFC-compliant” headers it may use to instruct the browser how to cache that content. These headers are Last-Modified, ETag, and Cache-Control. Two of these, Last-Modified and ETag, prompt a compliant browser to send specific request headers: If-Modified-Since and If-None-Match. The Cache-Control header instructs the browser’s cache on how to handle data. For a complete technical description, please refer to RFC2616.
Last-Modified informs the browser about the date when the document (or image) was last updated (modified). A correct Last-Modified header must be expressed in GMT, and might look like this example:
- Last-Modified: Wed, 10 May 2006 08:24:39 GMT
Sending this response header makes the browser add the corresponding If-Modified-Since request header the next time it needs to load the resource. Following our example, the browser’s request would include:
- If-Modified-Since: Wed, 10 May 2006 08:24:39 GMT
Note: browsers are recommended to send the time back exactly as it was sent by the server, without modifications. At the moment, I do not know whether and which browsers follow this recommendation. In any case, if you do send the time as GMT, you are more likely to get the desired caching effect than if you send an arbitrary string in the Last-Modified header.
If a web server receives an If-Modified-Since header, it may reply with either HTTP/1.1 200 OK (followed by the resource data) or HTTP/1.1 304 Not Modified (headers only, no data following). The response depends on whether the resource has changed since the time given in the header: if it has not, the browser’s copy is still “fresh” and 304 is enough; if it has, the copy is “stale” and the server sends 200 with the new content.
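The Last-Modified / If-Modified-Since exchange can be sketched in Python roughly as follows. The function names http_date and respond, and the (status, headers, send_body) return shape, are assumptions made for illustration; only the email.utils helpers come from the standard library. Treat it as a sketch, not as how any particular server does it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def http_date(dt):
    """Format an aware datetime as an HTTP date in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)

def respond(last_modified, if_modified_since=None):
    """Return (status, headers, send_body) for a resource changed at `last_modified`."""
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": http_date(last_modified)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since = None          # unparseable header: behave as if it was absent
        if since is not None and last_modified <= since:
            return 304, headers, False   # not modified: headers only
    return 200, headers, True            # modified or first request: send content
```

With last_modified set to 10 May 2006 08:24:39 UTC, a request carrying the If-Modified-Since value from the example above gets 304, while a request without that header gets 200.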
Time is not always a good measure of resource “freshness”, especially in the case of dynamic sites, which send and process headers themselves, “bypassing” server behaviours. In order to identify some resource, an ETag response header can be used:
- ETag: "some-double-quoted-string"
ETag is used as a unique resource identifier, which is expected to change if the resource changes. (Here, cyclic redundancy checks come to mind, and that is correct: checksum and hashing algorithms like CRC32 and md5 can be applied to the resource content to generate its ETag.) When a browser receives an ETag header, it should send an “If-None-Match” header on the next request:
- If-None-Match: "some-double-quoted-string"
Obviously, the string sent by the browser must be equal to the one received from the server. Server behaviour in this case is determined by whether the local (server) tag equals the tag received from the browser in If-None-Match: if they are equal, the server sends only the 304 Not Modified header; if they differ, it sends 200 OK followed by the content.
The ETag header value also has weak/strong modifiers, and If-None-Match is not the only request header for handling ETags, but I will not cover those topics here. Please refer to RFC2616 for more.
Finally, the Cache-Control response header is used to tell the browser how to cache (and whether to cache at all) the resource being sent. Cache-Control has a number of possible values (comma-separated if there is more than one), but I will review only some of them.
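As an illustration of the ETag / If-None-Match exchange, here is a small Python sketch that derives the tag from an md5 hash of the content. The function names make_etag and respond_with_etag are hypothetical, and the comparison is the plain string equality described above: weak tags and comma-separated tag lists are deliberately not handled.

```python
import hashlib

def make_etag(content):
    """Build a double-quoted ETag value from an md5 hash of the content bytes."""
    return '"' + hashlib.md5(content).hexdigest() + '"'

def respond_with_etag(content, if_none_match=None):
    """Return (status, headers, body) depending on whether the tags match."""
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, headers, b""      # tags are equal: header only
    return 200, headers, content      # tags differ or no tag sent: full content
```

Hashing the content means the tag changes whenever the content does, but the server still has to produce the content to compute it; a site that wants to avoid that work would need a cheaper source for the tag, such as a stored version number.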
no-cache: the resource may be stored, but must be re-validated with the server on each request before it is used.
no-store: (applies to both request and response) forces the resource not to be stored at all; this is usually important for sensitive data.
max-age: (in seconds) the length of time for which the resource stays “fresh”, for example max-age=3600 for one hour.
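must-revalidate: once the resource becomes “stale”, the cache must re-validate it with the server before using it again; unlike no-cache, it does not affect a resource that is still “fresh”.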
no-transform: forbids any resource transformations by proxies/servers in the chain to the end user (don’t think it’s still relevant, though).
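For completeness, here is a minimal Python sketch of assembling a Cache-Control value from the directives listed above. The helper name cache_control and its keyword arguments are my own invention, and the fixed output order is only a convention of this sketch.

```python
def cache_control(max_age=None, no_cache=False, no_store=False,
                  must_revalidate=False, no_transform=False):
    """Join the selected directives into a comma-separated Cache-Control value."""
    directives = []
    if no_store:
        directives.append("no-store")
    if no_cache:
        directives.append("no-cache")
    if max_age is not None:
        directives.append("max-age=%d" % int(max_age))   # value is in seconds
    if must_revalidate:
        directives.append("must-revalidate")
    if no_transform:
        directives.append("no-transform")
    return ", ".join(directives)
```

For a resized gallery photo that may be reused for an hour, cache_control(max_age=3600, must_revalidate=True) gives the value to send; for a page with sensitive data, cache_control(no_store=True) would be the likely choice.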
That is it. The next post will continue the caching topic with a discussion of a universal approach to caching for websites.