Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    HTTP caching: universal approach and sample code

    9th December 2006

    As described in my previous post, a few rather simple mechanisms let a visitor’s browser cache content and so avoid unnecessary load on your servers. In this post I’ll look at parts of a practical server-side implementation of that caching mechanism in PHP.

    I think of caching as a set of relatively independent procedures:

    1. Process the browser’s request: find out whether the browser is asking for the content for the first time or is checking for an updated version. If the browser’s copy is still fresh, just reply that it is fresh. Otherwise (including first-time requests), proceed to step 2.
    2. Check whether we have fresh locally cached content and, if so, send it in response, bypassing de novo content generation. If the local cache is stale or missing, proceed to step 3.
    3. Generate a new (fresh) cache item and send it to the browser, together with cache-enabling headers.
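
    The three steps above can be summed up as one small decision. This is only a sketch: the function name, its parameters, and the returned labels are my own illustration (not part of the code that follows), and it assumes PHP 7.1 or later for the nullable type.

```php
<?php
// Hypothetical helper: decides which of the three steps applies.
// $cacheMtime is null when no local cache file exists yet.
function choose_caching_step(bool $browserCopyIsFresh, ?int $cacheMtime, int $now, int $cachePeriod): string
{
    $cacheIsFresh = $cacheMtime !== null && ($now - $cacheMtime) <= $cachePeriod;

    if (!$cacheIsFresh) {
        return 'regenerate';   // step 3: build new content and cache it
    }
    if ($browserCopyIsFresh) {
        return 'not-modified'; // step 1: reply that the browser's copy is fresh
    }
    return 'send-cached';      // step 2: send the locally cached content
}
```

    Note that the local cache is checked first here: a stale or missing cache item always leads to regeneration, whatever the browser sent.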

    I decided to put all the header-related processing into one function, and all the cache creation and sending into another. If you prefer, you can make an object with two or more methods, but I see no need to do so.

    Here’s the headers part:

    function HTTP_caching($timestamp)
    {
        /*
         * HTTP_caching(v0.3)
         * checks request headers for an ETag or Last-Modified validator,
         * and replies with HTTP 304 Not Modified
         * if $timestamp matches the requested time/tag
         * returns TRUE if 304 was sent, FALSE otherwise
         * produces Last-Modified and ETag headers from $timestamp
         */

        $latest = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
        // ETag values must be quoted strings
        $etag = '"' . md5($latest) . '"';

        $not = false;

        if ( isset( $_SERVER['HTTP_IF_MODIFIED_SINCE'] ) )
        {
            if ( $_SERVER['HTTP_IF_MODIFIED_SINCE'] == $latest )
            {
                header("HTTP/1.1 304 Not Modified");
                header("Status: 304 Not Modified");
                $not = true;
            }
        }
        elseif ( isset( $_SERVER['HTTP_IF_NONE_MATCH'] ) )
        {
            if ( $_SERVER['HTTP_IF_NONE_MATCH'] == $etag )
            {
                header("HTTP/1.1 304 Not Modified");
                header("Status: 304 Not Modified");
                $not = true;
            }
        }
        if (!$not)
        {
            // the browser has no fresh copy: send the validators
            header("Last-Modified: " . $latest );
            header("ETag: " . $etag);
        }
        return $not;
    }

    This function checks two caching-related request headers: If-None-Match and If-Modified-Since. Using both is redundant, but I did it with the idea of providing different caching methods for different content types in the future. The timestamp-based approach should be sufficient in most cases, but having ETag as well does not hurt.
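
    For comparison, here is a stricter variant of the same check, written as a pure function so it can be tried without sending any headers. It is a sketch, not a replacement for HTTP_caching: the name browser_copy_is_fresh and the $server parameter (a stand-in for $_SERVER) are my assumptions. Unlike the function above, it gives If-None-Match precedence when both headers are present (as the HTTP specification asks), accepts several comma-separated tags, and treats any If-Modified-Since date at or after $timestamp as fresh.

```php
<?php
// Hypothetical helper: true when the browser's copy matches the validators
// derived from $timestamp. $server stands in for $_SERVER.
function browser_copy_is_fresh(array $server, int $timestamp): bool
{
    $latest = gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
    $tag = md5($latest); // same tag derivation as in HTTP_caching

    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        // If-None-Match wins over If-Modified-Since when both are sent.
        foreach (explode(',', $server['HTTP_IF_NONE_MATCH']) as $candidate) {
            $candidate = trim($candidate);
            if (strpos($candidate, 'W/') === 0) {
                $candidate = substr($candidate, 2); // drop the weak-validator prefix
            }
            $candidate = trim($candidate, '"');     // accept quoted or bare tags
            if ($candidate === '*' || $candidate === $tag) {
                return true;
            }
        }
        return false;
    }

    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        return $since !== false && $since >= $timestamp;
    }

    return false; // first-time request: no validators sent
}
```

    In real use you would call it as browser_copy_is_fresh($_SERVER, $filetime) and send the 304 headers when it returns true.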

    The parameter this function takes, $timestamp, is the modification date of the locally cached content item. We should check that the cached item exists before calling HTTP_caching; if it does not exist, we should create it and then call HTTP_caching with the timestamp of the newly generated item.

    The function is very simple, but a few points need explaining:

    • There is a convenient function, “apache_request_headers”, but as its name implies, it works only under Apache. Thus I had to use PHP’s global $_SERVER variables to fetch specific request headers. However, this approach also doesn’t work on all hosting providers: you simply do not get those headers, even though the browser sends them. You still benefit from local caching; the only drawback of being unable to process the request headers is wasted traffic, sending data that did not need to be sent.
    • I’m also sending two equivalent headers: the standard HTTP/1.1 status line and the outdated Status header. This is redundant (the Status header might be removed), but depending on how your PHP is installed (as an Apache httpd module or as CGI) you may want to leave both in place, or experiment and keep the one that actually works.
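
    If you want to try both sources of request headers, a lookup along these lines might help. It is a minimal sketch: request_header and its parameters are names I made up, and it assumes getallheaders() (an alias of apache_request_headers()) is only present under some server setups, which is why its existence is checked first. It cannot help on hosts that strip the headers before PHP sees them.

```php
<?php
// Hypothetical helper: look up a request header by its HTTP name.
// $server stands in for $_SERVER; $allHeaders may be passed in for testing.
function request_header(string $name, array $server, ?array $allHeaders = null): ?string
{
    // 1. CGI-style variable, e.g. "If-None-Match" -> HTTP_IF_NONE_MATCH
    $key = 'HTTP_' . strtoupper(str_replace('-', '_', $name));
    if (isset($server[$key])) {
        return $server[$key];
    }

    // 2. Fall back to the server API's own header list, when one exists.
    if ($allHeaders === null && function_exists('getallheaders')) {
        $allHeaders = getallheaders();
    }
    if (!is_array($allHeaders)) {
        return null;
    }
    foreach ($allHeaders as $headerName => $value) {
        // header names are case-insensitive
        if (strcasecmp((string) $headerName, $name) === 0) {
            return $value;
        }
    }
    return null;
}
```

    For example, request_header('If-Modified-Since', $_SERVER) returns the header value, or null when neither source has it.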

    Now, let’s move on to the actual caching. The simplest and quite reliable way to identify any object within your cache is md5(url), that is, the hash of the request URL. Note that you might want to hash not the complete URL (starting with http://), but only the part after the slash that follows the domain name. For example, for the complete URL http://bogdan.org.ua/2006/10/27/xnameorg-down-largest-ddos-they-ever-had.html you would hash only the “xnameorg-down-largest-ddos-they-ever-had.html” part (or “2006/10/27/xnameorg-down-largest-ddos-they-ever-had.html”, if the filename part of the path might be non-unique). This saves you from generating separate cache items for “http://www.bogdan.org.ua/2006/10/27/xnameorg-down-largest-ddos-they-ever-had.html” and “http://bogdan.org.ua/2006/10/27/xnameorg-down-largest-ddos-they-ever-had.html”, which differ only in the “www.” part.
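
    A host-independent cache key can be sketched like this. The function name cache_key is hypothetical, and keeping the query string in the key is my own choice: it assumes pages that differ only in their query string have different content.

```php
<?php
// Hypothetical helper: derive a cache key that ignores scheme and host.
function cache_key(string $url): string
{
    // parse_url() handles both absolute URLs and bare request URIs
    $path = parse_url($url, PHP_URL_PATH);
    $query = parse_url($url, PHP_URL_QUERY);

    $normalized = is_string($path) ? ltrim($path, '/') : '';
    if (is_string($query) && $query !== '') {
        $normalized .= '?' . $query;
    }
    return md5($normalized);
}
```

    With this, the “www.” and non-“www.” URLs from the example above yield the same key, and so does the bare path found in $_SERVER['REQUEST_URI'].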

    Here’s what we need to do (some evident micro-tunings dropped for clarity):

    1. Hash the request URL.
    2. Check whether that cache item exists. If yes, go to step 4. If not, go to step 3.
    3. Create the cache item (cache file), set the file name to the hash of the request URL, and store the file in your cache directory (e.g. ‘/cache’).
    4. Read the cache file’s modification timestamp.
    5. Call HTTP_caching with that timestamp. If it returns true, do nothing more. Otherwise, continue to step 6.
    6. Read the cache file and send it to the browser.

    To wrap this part of the caching system in a single, universally applicable function, you need to define a “content-generating” handler function, which is called when the local cache file must be regenerated. “Content generation” might be as simple as reading a file from disk; or you could wrap your index.php in a cache-checking block, so that page generation occurs only if there is no readily available cached page.
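
    One way to build such a handler from code that prints its output (rather than returning it) is output buffering. The helper below is a sketch under that assumption; capture_output is a made-up name, and it needs PHP 7 or later for Throwable.

```php
<?php
// Hypothetical helper: run $generator and return everything it printed.
function capture_output(callable $generator): string
{
    ob_start();
    try {
        $generator();
    } catch (Throwable $e) {
        ob_end_clean(); // do not leak a half-built page or an open buffer
        throw $e;
    }
    return (string) ob_get_clean();
}
```

    A handler for an existing page script could then be written as function () { return capture_output(function () { include 'page_generation.php'; }); }, where page_generation.php stands for whatever script produces your page.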

    function caching($handler_function)
    {
        // first, check if we have an existing cache
        // of the currently requested resource in the 'cache' directory
        $cachedfile = 'cache/' . md5($_SERVER['REQUEST_URI']);
        if (file_exists($cachedfile) && is_file($cachedfile))
        {   // cache exists. now check if it is up-to-date
            $filetime = filemtime($cachedfile);
            $modif = time() - $filetime;
            // CACHE_PERIOD is a constant defined somewhere else,
            // e.g. define('CACHE_PERIOD', 300)
            if ($modif <= (int)CACHE_PERIOD)
            {
                if ( HTTP_caching($filetime) === false )
                {
                    // we need to send cached content to the browser: there were no
                    // matching if-modified-since or if-none-match request headers
                    $expires = gmdate('D, d M Y H:i:s', ($filetime + CACHE_PERIOD) ) . ' GMT';
                    send_cached_file($cachedfile, CACHE_PERIOD, $expires);
                }
                // if HTTP_caching returned 'true' - we are done with the '304' header.
                // finally, ensure nothing else happens after sending contents...
                exit();
            }
        }
        // if the cache is missing or old - just call $handler_function,
        // assuming that it returns all the content
        $new_cache = $handler_function();
        // if instead of using $handler_function you wrap some code:
        // ob_start();
        // include "page_generation.php";
        // $new_cache = ob_get_contents();
        // ob_end_clean();
        $fp = fopen($cachedfile, "w");
        fwrite($fp, $new_cache);
        fclose($fp);
        // filemtime() results are cached within a request; reset before re-reading
        clearstatcache();
        $filetime = filemtime($cachedfile);
        // emit Last-Modified and ETag for the newly generated item
        if ( HTTP_caching($filetime) === false )
        {
            $expires = gmdate('D, d M Y H:i:s', ($filetime + CACHE_PERIOD) ) . ' GMT';
            send_cached_file($cachedfile, CACHE_PERIOD, $expires);
        }
    }

    Below is a function which sends cached files to the browser. Note that it lacks a content-type parameter (or content-type detection code).

    function send_cached_file($cachedfile, $cache, $expires)
    {
        header("HTTP/1.1 200 OK");
        header("Status: 200 OK");
        header("Pragma: cache");
        header("Cache-Control: max-age=".$cache.", min-fresh=".$cache.", no-transform");
        // Content-Type needs adjustment on a per-content-item basis
        header("Content-Type: text/html");
        header("Content-Length: " . filesize($cachedfile));
        header("Expires: $expires");
        readfile($cachedfile);
    }

    (Note: the function 'caching' was not tested; if you find any errors, or if it just doesn't work for you, let me know via the contact form and I’ll try to fix my error.)
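
    Since send_cached_file always announces text/html, a simple way to add content-type detection is to guess the type from the extension of the requested path. The sketch below is only an illustration: guess_content_type is a hypothetical name, the extension table is deliberately short, and it assumes PHP 7 or later and that your URLs end in meaningful extensions.

```php
<?php
// Hypothetical helper: guess a Content-Type from the requested path.
function guess_content_type(string $requestUri, string $default = 'text/html'): string
{
    // a deliberately small table; extend it for the content you serve
    $types = [
        'html' => 'text/html',
        'htm'  => 'text/html',
        'css'  => 'text/css',
        'js'   => 'application/javascript',
        'xml'  => 'application/xml',
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'gif'  => 'image/gif',
    ];

    // look only at the path, so a query string cannot confuse the guess
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $types[$extension] ?? $default;
}
```

    The result could be passed to send_cached_file as a fourth parameter and used in its Content-Type header, e.g. guess_content_type($_SERVER['REQUEST_URI']).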

    For the explanation of what headers mean, refer to my previous post.

    That is it for now. Coming next might be some useful information on the differences in HTTP header handling between PHP installed as a module and PHP as CGI.

    As always, comments/suggestions are welcome.


