nonplussed/geek: Caching

Caching Jun 11 2003 | Development | PHP | Perl | Email

Application performance can be often dramatically increased by caching the results of expensive operations instead of performing the same operations over and over again. The classic example is to calculate loop-invariant values before a loop instead of inside of a loop. These types of optimizations are easy to see and easy to correct.

However, often the expensive operation is not localized withing a loop. It may be a function or a database query that is called with the same parameters from different places in your program. There not always a clear place where—and whether—the function’s value should be precomputed and stored. One simple and self-contained solution is for the expensive function to perform the caching of its results on behalf of the caller.

Example in PHP

The Text DB API is a flat-file SQL base database. It’s implementation of LIKE operator comparisons—which include SQL wildcard characters ‘%’ and ‘_’—was rather inefficient. If a query has a WHERE clause with a LIKE comparison, the function compare_like would be called for every row—with the same $value2 parameter. The function works by converting the $value2 parameter from SQL syntax into a regular expression syntax and then calls preg_match to perform the actual comparison.

function compare_like($value1,$value2) {
     $pat = ... // expensive code that converts $value2 to preg pattern
     return preg_match ("/^" . $pat . "$/" . $mod, $value1);
}

The expensive part of this function turns out the be the construction of the $pat pattern. Since this function is called many times with the same $value2 parameter, it makes sense to cache the calculated $pat value.

Here is the modified version of the compare_like function. It declares a static array $PATTERNS that is indexed by the SQL comparison string. When a new $value2 is encountered, the regular expression $pat value is computed and stored in $PATTERNS. On subsequent call to the function, the pattern is found in the $PATTERNS array and the expensive computation step can be skipped.

function compare_like($value1,$value2) {
     static $PATTERNS = array();
     // Lookup precomputed pattern
     if(isset($PATTERNS[$value2])) {
          $pat = $PATTERNS[$value2];
      } else {
          $pat = ... // expensive code that converts $value2 to preg pattern
          // Stash precomputed value
          $PATTERNS[$value2] = $pat;
      }       
     return preg_match ($pat, $value1);
}

Example in Perl

The MovableType MT::Template::Context module contains a routine that is called whenever a MovableType template uses the MTInclude tag to include another template. This function performs two database queries to retrieve the template and associated blog information, compiles the template into a token array, and finally builds the text from the token array.

## Example has been simplified for clarity
sub _hdlr_include {
     my($blog, $builder, $arg) = @_;
     if (my $tmpl_name = $arg->{module}) {
          require MT::Template;
          # Expensive operations
          my $tmpl = MT::Template->load(...);
          my $tokens = $builder->compile($tmpl->text);
          return $builder->build($tokens)
      }
}

The expensive part of this function is the loading of the template from the database and compiling it into tokens. When MovableType is rebuilding lots of entries, this function is called over and over with the same module names.

Here is a modified version of this function that caches pre-loaded and pre-compiled templates. The first time this function is called for a given module, the module is loaded and compiled. On subsequent calles the cached module is used for building.

## Example has been simplified for clarity
my %MODULES_CACHE = ();
sub _hdlr_include {
     my($blog, $builder, $arg) = @_;
     if (my $tmpl_name = $arg->{module}) {
          my $key = $blog->id . ':' . $tmpl_name;
          my $tokens = $MODULES_CACHE{$key};
          if(!$tokens) {
               require MT::Template;
               # Expensive operations
               my $tmpl = MT::Template->load($blog, $tmpl_name);
               $MODULES_CACHE{$key} = $tokens = $builder->compile($tmpl->text);
           }
          return $builder->build($tokens)
      }
}

Caveats

Using caching introduces overhead: the memory required to store cached information and the time it takes to look it up in the cache. It only makes sense to use caching when this overhead is outweighed by the time savings it provides. For example, if an expensive operation is performed every time with different parameters, caching does not provide any benefits at all.

Caching cannot be used if the information that you cache may become stale. For example, you can only cache results from database queries if you know that their results are static across calls (as happens to be the case in the Perl example above).