Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Radical Alternative to caching: On-the-fly Content-Regeneration

    23rd May 2007

    While refreshing my scant knowledge of Apache’s mod_rewrite, I read through the mod_rewrite guide and found an extremely interesting section titled

    On-the-fly Content-Regeneration

    Here’s the theoretical problem:

    1. we are building a high-traffic site with lots of items that are updated only once per hour or once per day
    2. we have a CMS with exactly the features we need, but it is slow and heavy on the CPU and the database (sound familiar? :) )
    3. so we need to serve static files instead of hitting the CMS on every request

    And here’s the ‘radical alternative’ solution:

    1. install the CMS of choice
    2. tweak the CMS’s output layer so that it both writes (or updates) static HTML files on disk and sends those same pages directly to the browser
    3. use the “On-the-fly Content-Regeneration” mod_rewrite ruleset

    That’s it, in short. The “On-the-fly Content-Regeneration” rules serve the static files if they exist; otherwise they pass the request to the CMS, which creates or updates the static file and outputs the requested page. You can also set up a cron job to remove all static files older than XX minutes, to force a content refresh.
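    The cron-driven cleanup could look roughly like the sketch below. It is only an illustration: the function name `remove_stale_pages`, the `.html` filter and the flat (non-recursive) directory layout are my assumptions, not something the mod_rewrite guide prescribes.

```python
import os
import time


def remove_stale_pages(directory, max_age_minutes, now=None):
    """Delete *.html files in `directory` last modified more than
    `max_age_minutes` ago. Returns the sorted names of removed files.

    Hypothetical helper: the name and the flat layout are assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60

    # Snapshot the listing first so we never delete while iterating.
    with os.scandir(directory) as it:
        entries = list(it)

    removed = []
    for entry in entries:
        if not entry.is_file() or not entry.name.endswith(".html"):
            continue  # leave sub-directories and non-HTML files alone
        if entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

    Run from cron every few minutes, a call such as `remove_stale_pages("/path/to/static", 60)` (the path is a placeholder) would make the next request for each expired page fall through to the CMS again.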

    Below is a copy of the “On-the-fly Content-Regeneration” section from the mod_rewrite guide.

    Here comes a really esoteric feature: Dynamically generated but statically served pages, i.e. pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless one (or a cronjob) removes the static contents. Then the contents get refreshed.
    This is done via the following ruleset:

    RewriteCond %{REQUEST_FILENAME} !-s
    RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]

    Here a request to page.html leads to an internal run of a corresponding page.cgi if page.html is still missing or has a file size of zero. The trick here is that page.cgi is a usual CGI script which (in addition to its STDOUT) writes its output to the file page.html. Once it has run, the server sends out the data of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).
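    To make the “writes to STDOUT and to page.html” trick concrete, here is a minimal Python sketch of what such a page.cgi might do. The names `needs_regeneration` and `generate_and_cache`, the `render` callback standing in for the CMS, and the write-to-temp-file-then-rename step are all my assumptions; the guide only says that the script writes its output to both places.

```python
import os
import sys
import tempfile


def needs_regeneration(path):
    """Roughly mirror the `!-s` condition: missing or zero-size file."""
    try:
        return os.path.getsize(path) == 0
    except OSError:
        return True  # the file does not exist (or cannot be read)


def generate_and_cache(path, render, out=sys.stdout):
    """Render a page, save it to `path`, and echo it as a CGI response.

    `render` is a hypothetical zero-argument callable returning HTML;
    in practice it would be the CMS's output layer.
    """
    html = render()

    # Write to a temporary file in the same directory, then rename it,
    # so the web server never serves a half-written page.html.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    os.chmod(tmp, 0o644)  # mkstemp creates the file owner-only
    os.replace(tmp, path)

    # CGI response: header, blank line, then the same body.
    out.write("Content-Type: text/html\n\n")
    out.write(html)
    return html
```

    The first request for a missing page.html would then be answered by the script itself, while every later request is served straight from disk until the file is removed.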

