Querying Cloudant: what are stale, update and stable?
tl;dr If you are using stale=ok in queries to Cloudant or CouchDB 2.x, you most likely want to be using update=false instead. If you are using stale=update_after, use update=lazy instead.
CouchDB originally used stale=ok on the query string to specify that you were okay with receiving out-of-date results. By default, CouchDB lazily updates indexes upon querying them rather than when JSON data is changed or added. If up-to-date results are not strictly required, using stale=ok provides a latency improvement for queries as the request does not have to wait for indexes to be updated before returning results. This is particularly useful for databases with a high write rate.
As an aside, Cloudant automatically enqueues indexes for update when primary data changes, so this problem isn’t so acute. However, in the face of high update rate bursts, it’s still possible for indexing to fall behind so a delay may occur.
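As a rough illustration of what this looks like on the wire, the sketch below builds two view query URLs, one with the default behaviour and one with stale=ok. The account, database, design document and view names are made up for the example; only the /_design/.../_view/... path shape and the stale parameter come from CouchDB itself.

```python
from urllib.parse import urlencode


def view_url(base, db, ddoc, view, **params):
    """Build a view query URL; keyword arguments become the query string."""
    path = f"{base}/{db}/_design/{ddoc}/_view/{view}"
    return f"{path}?{urlencode(params)}" if params else path


# Hypothetical account, database, design document and view names.
BASE = "https://example.cloudant.com"

# Default: the index is brought up to date before results are returned.
fresh = view_url(BASE, "orders", "reports", "by_date", limit=10)

# stale=ok: return whatever the index currently holds, without waiting.
stale = view_url(BASE, "orders", "reports", "by_date", limit=10, stale="ok")

print(fresh)
print(stale)
```

The only difference between the two requests is the extra query string parameter, which is why it is easy to add without thinking about its second, clustering-related meaning.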
When using a single node, as in CouchDB 1.x, this parameter behaved as you’d expect. However, when clustering was added to CouchDB, a second meaning was added to stale=ok: also use the same set of shard replicas to retrieve the results.
Recall that Cloudant and CouchDB 2.x store three copies of each shard and by default will use the shard replica that starts returning results fastest for a query request. This latter fact helps even out load across the cluster: heavily loaded nodes will likely respond more slowly and so won’t be picked to respond to a given query. When using stale=ok, the database will instead always use the same shard replicas for every request to that index. The use of the same replica to answer queries has two effects:
- stale=ok could drive load unevenly across the nodes in your database cluster because certain shard replicas would always be used for the queries to the index that specify stale=ok. This means a set of nodes could receive outsized numbers of requests.
- If one of the replicas was hosted on a heavily loaded node in the cluster, this would slow down all queries to that index using stale=ok. This is compounded by the tendency of stale=ok to drive imbalanced load.
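The load effect described above can be seen in a toy simulation. This is a deliberately simplified model, not how the database picks replicas internally: it assumes a single shard range with three replicas on hypothetical nodes, draws a random response latency per node per request, and either takes the fastest responder or always uses the same replica.

```python
import random
from collections import Counter


def simulate(requests, stable, seed=0):
    """Count how many requests each node serves under a toy selection model."""
    rng = random.Random(seed)
    nodes = ["node1", "node2", "node3"]  # hypothetical replica hosts
    served = Counter()
    for _ in range(requests):
        if stable:
            # Pinned behaviour: the same replica answers every request.
            chosen = nodes[0]
        else:
            # Default behaviour (simplified): the fastest responder wins.
            latencies = {n: rng.expovariate(1.0) for n in nodes}
            chosen = min(latencies, key=latencies.get)
        served[chosen] += 1
    return served


print("fastest responder:", dict(simulate(3000, stable=False)))
print("pinned replica:   ", dict(simulate(3000, stable=True)))
```

In the pinned case one node serves every request, while the fastest-responder case spreads requests roughly evenly. A real cluster has many shard ranges and uneven node load, so the actual skew will differ, but the direction of the effect is the same.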
The end result is that using stale=ok can, counter-intuitively, cause queries to become slower. Worse, they may become unavailable during cluster split-brain scenarios because of the forced use of a certain set of replicas. Given that most people use stale=ok to improve performance, this wasn’t a great state to be in.
As stale=ok’s existing behaviour needed to be maintained for backwards compatibility, the fix for this problem was to introduce two new query string parameters, each of which sets one of the two stale=ok behaviours:
- update=true/false/lazy: controls whether the index should be up to date before the query is executed.
  - true: the index will be updated first.
  - false: the index will not be updated.
  - lazy: the index will not be updated before the query, but will be enqueued for update after the query is completed.
- stable=true/false: controls whether the same set of shard replicas is used for every request.
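To make the split concrete, here is a minimal sketch that decomposes a legacy stale value into the two newer parameters. The function name is invented for this example, and the mapping reflects my reading of the CouchDB documentation (stale=ok as stable=true plus update=false, stale=update_after as stable=true plus update=lazy), so check it against the docs for your version.

```python
def split_stale(stale):
    """Return the update/stable pair equivalent to a legacy stale value."""
    # Assumed equivalences, taken from the CouchDB view API documentation.
    mapping = {
        "ok": {"update": "false", "stable": "true"},
        "update_after": {"update": "lazy", "stable": "true"},
    }
    if stale not in mapping:
        raise ValueError(f"unsupported stale value: {stale!r}")
    return dict(mapping[stale])  # copy so callers can modify the result


print(split_stale("ok"))
print(split_stale("update_after"))
```

Seeing the two settings side by side makes it easier to decide which of them a given query actually needs.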
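The main use of stable=true is that queries are more likely to appear to “go forward in time”. Each shard replica may update its indexes in a different order, so switching between replicas from one request to the next can make results appear to move backwards; always using the same replicas avoids that. However, this isn’t guaranteed, so the availability and performance trade-offs are likely not worth it.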
The end result is that virtually all applications using stale=ok should move to instead use update=false.
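For applications that build query URLs as strings, the migration can be as small as rewriting one parameter. The helper below is a sketch under that assumption: its name and the example URL are hypothetical, it only handles stale=ok, and it deliberately does not add stable=true, following the recommendation above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def migrate_query(url):
    """Replace stale=ok with update=false, leaving other parameters alone."""
    parts = urlsplit(url)
    new_params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "stale" and value == "ok":
            # Keep the "don't wait for the index" behaviour only.
            new_params.append(("update", "false"))
        else:
            new_params.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(new_params)))


# Hypothetical URL for illustration.
old = "https://example.cloudant.com/orders/_design/reports/_view/by_date?limit=10&stale=ok"
print(migrate_query(old))
```

Note that the query string is re-encoded on the way out, so characters such as quotes in key or startkey values may come back percent-encoded; the request is equivalent, but the string is not byte-for-byte identical.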