The QuerySet Cache

QuerySet caching is the automatic caching of all database reads. Conceptually, it works very similarly to the built-in write-invalidate query caching that is present in your RDBMS.

When a read (SELECT) is made from one or more tables, that read is cached. When a write (INSERT, UPDATE, etc.) is made against one of those tables, the read cache built up for that table is invalidated. Johnny achieves this through the use of generational keys:

  • every table in your application gets a “key” associated with it that corresponds to the current generation of data
  • reads are cached with keys based on the generations of the tables being read from
  • when a write is performed, the generation key for all involved tables is modified

When the generation key is modified, any cache keys built against prior generations are no longer recoverable, since the old generation key is now lost. This means that on an LRU cache (like memcached, which maintains an LRU per slab), you can cache reads forever, and entries from old generations will naturally be evicted sooner than any “live” cache data.
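The generational scheme above can be sketched in a few lines. This is a minimal illustration, not Johnny's actual implementation: the dict stands in for memcached, and the helper names (`generation`, `bump_generation`, `cached_read`) and the key layout are assumptions made for the example.

```python
import uuid

cache = {}  # stand-in for memcached; a real LRU cache would evict old entries


def generation(table):
    """Return the current generation key for a table, creating one if absent."""
    key = "gen:" + table
    if key not in cache:
        cache[key] = uuid.uuid4().hex
    return cache[key]


def bump_generation(table):
    """Invalidate a table by replacing its generation key."""
    cache["gen:" + table] = uuid.uuid4().hex


def cached_read(sql, tables, run_query):
    """Return (result, was_hit) for a read that touches the given tables."""
    gens = ".".join(generation(t) for t in sorted(tables))
    key = "query:%s:%s" % (gens, sql)
    if key in cache:
        return cache[key], True
    result = run_query()
    cache[key] = result
    return result, False


sql = "SELECT * FROM book"
first = cached_read(sql, ["book"], lambda: ["row"])   # miss, result is stored
second = cached_read(sql, ["book"], lambda: ["row"])  # hit
bump_generation("book")                               # simulates a write
third = cached_read(sql, ["book"], lambda: ["row"])   # miss, old entry is orphaned
```

Note that nothing is ever deleted: after the write, the entry stored by the first read is still in the dict, but no reader can reconstruct its key.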

The QuerySet Cache supports Django versions 1.1 and 1.2.

class johnny.cache.QueryCacheBackend(cache_backend=None, keyhandler=None, keygen=None)

This class is the engine behind the query cache. It inspects the queries passing through the Django Query class and returns results from the cache using the generation keys; on a miss, it fetches the results from the database and caches them. Each time a model is updated, the table keys for that model are re-created, invalidating all cached querysets for that model.

There are different QueryCacheBackend classes for different versions of Django; call johnny.cache.get_backend to get the proper class automatically.

johnny.cache.get_backend()

Gets a QueryCacheBackend class for the current version of Django.

The main goals of the QuerySet Cache are:

  • To cache querysets forever
  • To be as simple as possible but still work
  • To not increase the conceptual load on the developer

Invalidation

Because queries are cached forever, it’s absolutely essential that stale data is never accessible in the cache. Since keys are never actually deleted, but merely made inaccessible by the progression of a table’s generation, “invalidation” in this context is the modification of a table’s generation key.

The query keys themselves are based on as many uniquely identifying aspects of a query as we could think of. This includes the SQL itself, the params, the ordering clause, the database name (1.2 only), and of course the generations of all of the tables involved. The following are two queries, not one:

MyModel.objects.all().order_by('-created_at')
MyModel.objects.all().order_by('created_at')

Avoiding the database at all costs was not a goal, so different ordering clauses on the same dataset are considered different queries. Since invalidation happens at the table level, a write to any table involved in a query makes the cached query inaccessible:

# cached, depends on `publisher` table
p = Publisher.objects.get(id=5)
# cached, depends on `book` and `publisher` table
Book.objects.all().select_related('publisher')
p.name = "Doubleday"
# write on `publisher` table, modifies publisher generation
p.save()
# the following are cache misses
Publisher.objects.get(id=5)
Book.objects.all().select_related('publisher')

Because invalidation is greedy in this way, it makes sense to test Johnny against your site to see if this type of caching is beneficial.
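To make the key ingredients listed above concrete, here is a rough sketch of how a query key could be derived. The function name, the MD5 digest, and the key layout are assumptions for illustration only; Johnny's real key generator may differ. The ordering clause is treated here as part of the SQL string.

```python
import hashlib


def query_key(sql, params, db, generations, prefix="jc"):
    """Build a cache key from the SQL, params, database name and table generations."""
    parts = [db, sql, repr(tuple(params))]
    # Sort by table name so the key does not depend on dict ordering.
    parts.extend("%s=%s" % (table, gen) for table, gen in sorted(generations.items()))
    digest = hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
    return "%s_query_%s" % (prefix, digest)


# Same dataset, different ordering: two distinct keys.
desc = query_key("SELECT * FROM mymodel ORDER BY created_at DESC", (), "default", {"mymodel": "g1"})
asc = query_key("SELECT * FROM mymodel ORDER BY created_at ASC", (), "default", {"mymodel": "g1"})

# A write to `publisher` changes its generation, so the join's key changes too.
join_sql = "SELECT * FROM book INNER JOIN publisher ON book.publisher_id = publisher.id"
before = query_key(join_sql, (), "default", {"book": "g1", "publisher": "g1"})
after = query_key(join_sql, (), "default", {"book": "g1", "publisher": "g2"})
```

Because every involved table's generation feeds the digest, a join is invalidated by a write to any one of its tables, which is the greedy behavior described above.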

Transactions

Transactions pose an interesting problem for caches like Johnny. Because the generation keys are invalidated on write, and a transaction commit does not go down the same code path as our invalidation, there are a number of scenarios involving transactions that could cause problems.

The most obvious one is a write and a read within a transaction that gets rolled back. The write invalidates the cache key, the read puts new data into the cache, but that new data never actually sees the light of day in the database. There are numerous other concurrency-related issues with invalidating keys within transactions, regardless of whether or not a rollback is performed, because the generational key change is in memcached and thus not protected by the transaction itself.

Because of this, when you enable Johnny, the django.db.transaction module is patched in various places to place new hooks around transaction rollback and committal. When you are in what Django terms a “managed transaction”, i.e. a transaction that you are managing manually, Johnny automatically writes any cache keys to the LocalStore instead. On commit, these keys are pushed to the global cache; on rollback, they are discarded.
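The buffering behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Johnny's code: plain dicts stand in for both the LocalStore and the global cache, and the class and its `managed` flag are hypothetical names.

```python
class TransactionalCache(object):
    """Buffer cache writes locally while inside a managed transaction."""

    def __init__(self, global_cache):
        self.global_cache = global_cache  # stand-in for memcached
        self.local = {}                   # stand-in for the LocalStore
        self.managed = False              # True while a managed transaction is open

    def set(self, key, value):
        if self.managed:
            self.local[key] = value       # not yet visible to other processes
        else:
            self.global_cache[key] = value

    def get(self, key, default=None):
        # The current transaction sees its own pending writes first.
        if key in self.local:
            return self.local[key]
        return self.global_cache.get(key, default)

    def commit(self):
        """Push buffered keys to the global cache."""
        self.global_cache.update(self.local)
        self.local.clear()

    def rollback(self):
        """Discard buffered keys; the global cache is untouched."""
        self.local.clear()


shared = {}
tc = TransactionalCache(shared)
tc.managed = True
tc.set("k1", "rolled back")
tc.rollback()                 # "k1" never reaches `shared`
tc.set("k2", "committed")
tc.commit()                   # "k2" is now in `shared`
```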

Using with TransactionMiddleware

Django ships with a middleware called django.middleware.transaction.TransactionMiddleware, which wraps all requests within a transaction and then rolls back when exceptions are thrown from within the view. Johnny only pushes transactional data to the cache on commit, but the TransactionMiddleware will leave transactions uncommitted if they are not dirty (if no writes have been performed during the request). This means that if you have views that don’t write anything, and also use the TransactionMiddleware, you’ll never populate the cache with the querysets used in those views.

This problem is described in Django ticket #9964, but unfortunately fixing it isn’t straightforward because the “correct” thing to do here is in dispute. Starting with version 0.3, Johnny includes a middleware called johnny.middleware.CommittingTransactionMiddleware, which is the same as the built-in version, but always commits transactions on success. Depending on your database, there are still ways to have SELECT statements modify data, but for the majority of people, committing after every request, even when no UPDATEs or INSERTs have been done, is likely harmless and will make Johnny function much more smoothly.
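In practice the swap is a one-line change in the settings file. The fragment below is a sketch for the Django 1.1/1.2-era MIDDLEWARE_CLASSES setting; the position of the entry relative to your other middleware is an assumption and depends on your stack.

```python
# settings.py fragment (sketch)
MIDDLEWARE_CLASSES = (
    # ... your other middleware ...
    # Replaces 'django.middleware.transaction.TransactionMiddleware' so that
    # read-only requests are committed and their querysets reach the cache.
    'johnny.middleware.CommittingTransactionMiddleware',
)
```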

Savepoints

Preliminary savepoint support is included in version 0.1. More testing is needed (and welcomed). Currently, the only Django backend that has support for savepoints is the PostgreSQL backend (MySQL’s InnoDB engine supports savepoints, but its Django backend doesn’t). If you use savepoints, please see the Manual Invalidation section.

Usage

To enable the QuerySet Cache, enable the middleware johnny.middleware.QueryCacheMiddleware. This middleware uses the borg pattern; to remove the applied monkey patch, you can call johnny.middleware.QueryCacheMiddleware().unpatch(), but the middleware will attempt to install itself again unless you also set settings.DISABLE_QUERYSET_CACHE to True.
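A minimal sketch of both halves of this: enabling the middleware in settings, and switching the cache off again at runtime. The middleware ordering shown and the helper name `disable_query_cache` are assumptions; only the dotted paths, `unpatch()` and `DISABLE_QUERYSET_CACHE` come from the text above.

```python
# settings.py fragment (sketch)
MIDDLEWARE_CLASSES = (
    'johnny.middleware.QueryCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
)


def disable_query_cache():
    """Hypothetical helper: switch the QuerySet Cache off at runtime."""
    # Imports are local so the snippet only touches Django and Johnny when called.
    from django.conf import settings
    from johnny.middleware import QueryCacheMiddleware

    # Set the flag first so the middleware does not re-install the patch.
    settings.DISABLE_QUERYSET_CACHE = True
    QueryCacheMiddleware().unpatch()
```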

Manual Invalidation

To manually invalidate a table or a model, use johnny.cache.invalidate:

johnny.cache.invalidate(*tables, **kwargs)

Invalidate the current generation for one or more tables. The arguments can be either strings representing database table names or models. Pass the keyword argument ‘using’ to select the database.
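For example, after rolling back to a savepoint you might invalidate the affected tables by hand. The sketch below assumes a hypothetical table named `catalog_book` and a database alias `default`; the wrapper function is only there to keep the example importable without Johnny installed.

```python
def invalidate_catalog():
    """Hypothetical helper: bump the generation of one table by name."""
    from johnny.cache import invalidate

    # Table names are passed as strings; model classes are accepted as well.
    invalidate('catalog_book', using='default')
```

Every cached query that reads from the named table becomes a miss on its next execution.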

Settings

The following settings are available for the QuerySet Cache:

  • DISABLE_QUERYSET_CACHE
  • JOHNNY_MIDDLEWARE_KEY_PREFIX
  • JOHNNY_MIDDLEWARE_SECONDS
  • MAN_IN_BLACKLIST
  • JOHNNY_CACHE_BACKEND

DISABLE_QUERYSET_CACHE will disable the QuerySet cache even if the middleware is installed. This is mostly to make it easy for other modules to disable the queryset cache without re-creating the entire middleware stack and then removing the QuerySet cache middleware.

JOHNNY_MIDDLEWARE_KEY_PREFIX, default “jc”, sets the prefix for Johnny’s cache keys. If you are running multiple apps in the same memcached pool, it’s very important that you use this setting on each app so that tables with the same name in each app (like those of Django’s built-in contrib apps) don’t clobber each other in the cache.

JOHNNY_MIDDLEWARE_SECONDS, default 0, is the period that Johnny will cache both its generational keys and its query cache results. Since the design goal of Johnny was to be able to maintain a consistent cache at all times, the default behavior is to cache everything forever. Note that if you are not using one of Johnny’s custom backends, the default value of 0 will work differently on different backends.

MAN_IN_BLACKLIST is a user-defined tuple that contains table names to exclude from the QuerySet Cache. If you have no sense of humor, or want your settings file to be understandable, you can use the alias JOHNNY_TABLE_BLACKLIST. We just couldn’t resist.

JOHNNY_CACHE_BACKEND is a cache backend URI similar to what is used by Django by default, but only used for Johnny. This allows you to separate the cache that is used by Johnny from the caching backend of the rest of your site.
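Put together, a settings fragment might look like the sketch below. All values are illustrative assumptions: the prefix, the blacklisted table, and the memcached address are placeholders, and the URI follows the style of Django's own CACHE_BACKEND setting.

```python
# settings.py fragment (sketch; values are placeholders)
DISABLE_QUERYSET_CACHE = False            # leave the QuerySet Cache enabled
JOHNNY_MIDDLEWARE_KEY_PREFIX = 'jc_shop'  # unique per app sharing a memcached pool
JOHNNY_MIDDLEWARE_SECONDS = 0             # 0 means cache forever (the default)
MAN_IN_BLACKLIST = ('django_session',)    # tables excluded from caching
JOHNNY_CACHE_BACKEND = 'memcached://127.0.0.1:11211/'  # used by Johnny only
```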

Signals

The QuerySet Cache defines two signals:

  • johnny.cache.signals.qc_hit, fired after a cache hit
  • johnny.cache.signals.qc_miss, fired after a cache miss

The sender of these signals is always the QueryCacheBackend itself.
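These signals make it easy to measure how well the cache is working on your site. The sketch below counts hits and misses; the `HitMissCounter` class and `connect_counter` helper are hypothetical names, and the signals are reached through the `johnny.cache.signals` path given above, which may differ between Johnny versions.

```python
class HitMissCounter(object):
    """Count cache hits and misses reported by the QuerySet Cache."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def on_hit(self, sender, **kwargs):
        # `sender` is the QueryCacheBackend that served the query.
        self.hits += 1

    def on_miss(self, sender, **kwargs):
        self.misses += 1


counter = HitMissCounter()


def connect_counter():
    """Attach the counter to Johnny's signals (requires Johnny to be installed)."""
    import johnny.cache

    johnny.cache.signals.qc_hit.connect(counter.on_hit)
    johnny.cache.signals.qc_miss.connect(counter.on_miss)
```

The receivers accept `**kwargs` because Django passes additional keyword arguments, such as the signal itself, to every receiver.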

Customization

There are many aspects of the behavior of the QuerySet Cache that are pluggable, but no easy settings-style hooks are yet provided for them. More ability to control the way Johnny functions is planned for future releases.
