Radical Alternative to caching: On-the-fly Content-Regeneration
23rd May 2007
Refreshing my scant knowledge of Apache’s mod_rewrite, I read through the mod_rewrite guide and found an extremely interesting section, titled
On-the-fly Content-Regeneration
Here’s the theoretical problem:
- we are building a high-traffic site with lots of once-per-(hour|day) updated items
- we have a CMS with all the features we need, but it’s really CPU/DB-hungry and slow (sound familiar?)
- there’s a need to serve static files
And here’s the ‘radical alternative’ solution:
- install the CMS of choice
- tweak the CMS’s output layer so that it both writes (or updates) static HTML files on disk and sends those same pages directly to the browser
- use the “On-the-fly Content-Regeneration” mod_rewrite ruleset
That’s it, in short. With the “On-the-fly Content-Regeneration” rules in place, Apache serves the static file if it exists; otherwise it passes the request to the CMS, which creates or updates the static file and outputs the requested page. You can also set up a cron job to remove all static files older than XX minutes, to force a content refresh.
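The cron-driven cleanup could look roughly like the sketch below. It is only an illustration under my own assumptions: the function name `remove_stale_pages`, the idea of a single cache directory, and the `*.html` pattern are not from the guide, so adjust them to wherever your CMS writes its static files.

```python
import time
from pathlib import Path


def remove_stale_pages(cache_dir, max_age_minutes, now=None):
    """Delete *.html files under cache_dir that are older than max_age_minutes.

    Hypothetical helper: the directory layout and file pattern are assumptions.
    Returns the list of removed paths so the caller can log them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    removed = []
    for path in Path(cache_dir).rglob("*.html"):
        # Compare the file's last-modification time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A small wrapper script that calls `remove_stale_pages` with your cache directory and the chosen age can then be run from cron every few minutes; the next request for a removed page falls through to the CMS and regenerates it.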
Below is the copy of “On-the-fly Content-Regeneration” from the mod_rewrite guide.
Description:
Here comes a really esoteric feature: Dynamically generated but statically served pages, i.e. pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless one (or a cronjob) removes the static contents. Then the contents gets refreshed.
Solution:
This is done via the following ruleset:

    RewriteCond %{REQUEST_FILENAME} !-s
    RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]

Here a request to page.html leads to an internal run of a corresponding page.cgi if page.html is still missing or has a file size of zero. The trick here is that page.cgi is a usual CGI script which (in addition to writing to its STDOUT) writes its output to the file page.html. Once it has run, the server sends out the data of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cron job).
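To make the page.cgi side concrete, here is a minimal sketch of a script that sends the page to the browser and also saves it as the static file. The names `render_page` and `regenerate` are hypothetical stand-ins for the real CMS call, and writing to a temporary file before renaming is my own precaution (so Apache never serves a half-written page.html), not something the guide prescribes.

```python
import os
import sys
import tempfile


def render_page():
    # Hypothetical stand-in for the slow, DB-heavy CMS rendering step.
    return "<html><body><h1>Hello</h1></body></html>\n"


def regenerate(static_path, out=None):
    """Render the page, save it as static_path, and emit it as a CGI response."""
    out = sys.stdout if out is None else out
    html = render_page()

    # Write to a temp file in the same directory, then rename it into place.
    # os.replace is atomic on POSIX, so readers see the old state or the
    # complete new file, never a partial one.
    directory = os.path.dirname(os.path.abspath(static_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(html)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the web server must be able to read it
    os.replace(tmp_path, static_path)

    # CGI response: headers, a blank line, then the same body we just saved.
    out.write("Content-Type: text/html; charset=utf-8\r\n\r\n")
    out.write(html)
    return html
```

In a real page.cgi you would call something like `regenerate("page.html")` at the end of the script. Because the RewriteCond uses `-s`, an empty page.html counts as missing, so a failed write that leaves a zero-byte file simply triggers another regeneration on the next request.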