One of the key reasons that low_card_tables works as well as it does is that it simply caches the entire low-card table in memory, as a set of model objects. This means that almost all operations happen incredibly quickly, with no database queries whatsoever; it only needs to hit the database when it creates a new row, or if it sees a reference to a row that has been created since the cache was loaded — and this all happens automatically, and transparently.
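To make that concrete, here is a minimal sketch of that read-through pattern, assuming an ActiveRecord model class for the low-card table. The class and method names are invented for illustration; this is not the gem's actual implementation:

    # Illustrative only -- not low_card_tables' actual code. It sketches the
    # read-through pattern described above: the entire low-card table is held
    # in memory, and the database is touched only when a combination is missing.
    class LowCardCacheSketch
      def initialize(model_class)
        @model_class = model_class
        reload!
      end

      # Re-read every row of the (small) low-card table into memory.
      def reload!
        @rows_by_attributes = @model_class.all.index_by do |row|
          row.attributes.except("id")
        end
      end

      # Return the ID for a combination of low-card attributes. On a miss,
      # refresh the cache first (another process may have created the row),
      # and only create the row ourselves if it is still absent.
      def id_for(attributes)
        key = attributes.stringify_keys
        row = @rows_by_attributes[key]
        return row.id if row

        reload!
        row = @rows_by_attributes[key] ||= @model_class.create!(key)
        row.id
      end
    end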

The Problem with Caching

There is one caveat, though, and it concerns queries. In short:

  • Caching can, in certain cases, cause queries against low-card attributes to miss rows that have been recently created; you will see a query not return rows that it should. These cases range from uncommon to nonexistent, depending on your application.
  • low_card_tables provides caching policies that can eliminate this completely (by turning off caching) or greatly reduce the risk of it happening.
  • Surprisingly, for many use cases, this is not a problem.

The default caching policy greatly reduces the risk of encountering this issue. If you want to be extra-paranoid, simply say LowCardTables.low_card_cache_expiration 0 and caching will be turned off completely, as in the sketch below. But read on for more details.
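A minimal example of the paranoid setting, placed where such configuration usually lives in a Rails app (the file path is an assumption, not something the gem requires):

    # config/initializers/low_card_tables.rb -- the file path is just a
    # common Rails convention, not required by the gem.
    #
    # A cache expiration of zero means every low-card lookup consults the
    # database, so stale-cache query misses cannot happen.
    LowCardTables.low_card_cache_expiration 0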

What is the problem, exactly?

Say you have two processes (for example, two Rails Unicorn or Passenger processes), A and B, that each access the low-card table. Now, say you have the following rows:

user_statuses
+----+---------+--------+
| id | deleted | gender |
+----+---------+--------+
| 1  | 0       | female |
| 2  | 0       | male   |
| 3  | 1       | male   |
+----+---------+--------+

(This state would arise only if, for example, no woman had ever deleted her account.)

Imagine that process A and process B have both cached this table. Now a woman deletes her account, and process B handles that request. When saving her user record, low_card_tables will see a request for a row with { :deleted => true, :gender => 'female' }, not find it, and create it as ID 4; process B's cache will then contain all four rows.

Now, someone accesses an admin page that displays a list of all deleted users, and process A serves it. It will run a query like User.where(:deleted => true). Internally, low_card_tables uses its cache to translate this into the list of user_statuses IDs that have { :deleted => true }. However, because process A has not refreshed its cache yet, it doesn't know about the row with ID 4, so it issues a query like SELECT * FROM users WHERE user_status_id IN (3). Because the newly-deleted user has user_status_id = 4, that user is incorrectly missing from the page.
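Put as a hypothetical code walkthrough (the attribute delegation on User and the variable names here are assumed purely for illustration; only the User.where(:deleted => true) query form comes from the example above):

    # Process B: a woman deletes her account. The combination
    # { :deleted => true, :gender => 'female' } isn't in user_statuses yet,
    # so low_card_tables inserts it as ID 4; process B's cache now has 4 rows.
    user = User.find(deleted_user_id)   # deleted_user_id is a placeholder
    user.deleted = true
    user.save!                          # => user.user_status_id == 4

    # Process A, whose cache still holds only IDs 1-3, serves the admin page:
    User.where(:deleted => true)
    # Using its stale cache, process A translates the condition to roughly:
    #   SELECT * FROM users WHERE user_status_id IN (3)
    # The newly-deleted user (user_status_id = 4) is silently missing.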

How big is the problem?

This example also does a good job of explaining why the surface area of the problem, in many use cases, is vanishingly small: in a large database (which is where low_card_tables is really useful), the chance is extremely high that there already are records with all valid combinations of the low-card attributes — in which case, the low-card table will already be populated with all possible rows. This means that the cache will be populated with all rows at startup, and no new rows will ever need to be created — so there's no problem at all.

(Another way of looking at it: the problem only occurs when both of the following are true: rows with new attribute combinations are created at runtime, and there are queries that test against some of those attributes that absolutely, positively cannot miss those rows. In some applications, like financial trading systems, this may well occur. In many consumer-oriented web sites, however, missing a few rows for a few minutes is generally not a problem.)

What's the default caching policy?

Caching policies can be changed in low_card_tables; the default one is this:

LowCardTables.low_card_cache_expiration :exponential, :zero_floor_time => 3.minutes, :min_time => 10.seconds, :exponent => 2.0, :max_time => 1.hour

Here is what that means:

  • For the first three minutes after your process starts, no caching of low-card tables will happen at all. This is so that when you deploy code that uses new low-card tables or attributes, and they get very rapidly populated based on user requests, queries cannot possibly be affected by caching.
  • After those three minutes, the first cache is valid for 10 seconds.
  • After those 10 seconds expire, the next cache is valid for 20 seconds; the cache's lifetime keeps doubling at each expiration thereafter.
  • The cache expiration is capped at 1 hour.

This caching policy deliberately starts out very conservative and lengthens its cache lifetime over time; this is because any newly-deployed low-card tables or attributes are likely to get populated very quickly right after startup, with things settling down rapidly thereafter.
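To see how that schedule plays out over a process's lifetime, here is a small illustrative calculation. It mirrors the policy options above, assumes each cache is reloaded exactly when the previous one expires, and is not the gem's internal code:

    # Illustrative only -- not the gem's internal implementation.
    ZERO_FLOOR = 3 * 60    # :zero_floor_time => 3.minutes (no caching at all)
    MIN_TIME   = 10        # :min_time => 10.seconds (first cache after the floor)
    EXPONENT   = 2.0       # :exponent => 2.0 (each cache lasts twice as long)
    MAX_TIME   = 60 * 60   # :max_time => 1.hour (cap on cache lifetime)

    # How long a cache loaded at this moment would remain valid, assuming each
    # previous cache was reloaded exactly at its expiration.
    def cache_lifetime(seconds_since_startup)
      return 0 if seconds_since_startup < ZERO_FLOOR

      lifetime = MIN_TIME
      elapsed  = ZERO_FLOOR
      while elapsed + lifetime < seconds_since_startup && lifetime < MAX_TIME
        elapsed += lifetime
        lifetime = [(lifetime * EXPONENT).to_i, MAX_TIME].min
      end
      lifetime
    end

    cache_lifetime(60)     # => 0     (still inside the three-minute zero floor)
    cache_lifetime(185)    # => 10    (first cache after the floor)
    cache_lifetime(200)    # => 20    (second cache: 10 seconds, doubled)
    cache_lifetime(7_200)  # => 3600  (long-running process: capped at 1 hour)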
