TODO list

Viktor Söderqvist edited this page Oct 1, 2020 · 3 revisions

Internal TODO list (Nordix)

More detailed error responses

{error, no_connection} => {error, {no_connection, SubReason}}

Most error cases currently return {error, no_connection}. To let the application distinguish between errors and act on them differently, more detail is needed. The proposal is to extend the no_connection atom to a 2-tuple, where the added element SubReason can be a POSIX error code, tcp_closed, reconnecting, etc.
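As a rough illustration of what the extended tuple would let an application do, here is a minimal sketch of a caller matching on SubReason. The module name, the function name and the returned actions (retry_later, give_up) are hypothetical and only show the shape of the pattern match. The sub-reasons are the ones named above, plus econnrefused as one example of a POSIX error code.

```erlang
-module(no_connection_example).
-export([action/1]).

%% Map a query result to an application-level action.
%% The actions returned here are hypothetical placeholders.
action({error, {no_connection, reconnecting}}) -> retry_later;
action({error, {no_connection, tcp_closed}})   -> retry_later;
action({error, {no_connection, econnrefused}}) -> give_up;
action({error, {no_connection, _Other}})       -> give_up;
%% Current form without a sub-reason: nothing to distinguish on.
action({error, no_connection})                 -> retry_later;
action({error, _Reason})                       -> give_up;
action(Result)                                 -> {done, Result}.
```

With the current {error, no_connection} return value, only the fifth clause can ever match for connection errors, which is the limitation this item is about.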

Status: TODO

Make it configurable which errors cause a query to be retried

Some errors may indicate that the application should do something else rather than retry a query.

Add an option e.g. {retry_errors, [...]} or {no_retry_errors, [...]}.
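One possible reading of these options, as a sketch only: the semantics below are an assumption, since the option is not designed yet. Here retry_errors is treated as an allow-list that takes precedence, no_retry_errors as a deny-list, and when neither is set every error is retried. The module and function names are hypothetical.

```erlang
-module(retry_policy).
-export([should_retry/2]).

%% Decide whether a failed query should be retried, given the error
%% reason and an option list. Assumed semantics (not decided yet):
%%   {retry_errors, List}    -> retry only the reasons in List
%%   {no_retry_errors, List} -> retry everything except the reasons in List
%%   neither option          -> retry everything
should_retry(Reason, Options) ->
    case proplists:get_value(retry_errors, Options) of
        undefined ->
            NoRetry = proplists:get_value(no_retry_errors, Options, []),
            not lists:member(Reason, NoRetry);
        RetryErrors ->
            lists:member(Reason, RetryErrors)
    end.
```

For example, should_retry(no_connection, [{no_retry_errors, [no_connection]}]) returns false, so the application gets the error back immediately and can do something else.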

Status: TODO

Handle ASK redirects

An ASK redirect is returned while a slot is being migrated: the requested key has already been moved to another Redis node, but not all keys in the same slot have. Currently, the same Redis node is retried over and over until we get a MOVED redirect instead. Then, the slot mapping is updated and the new Redis node is queried.
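A sketch of how an ASK redirect could be followed, assuming the error arrives as a binary such as <<"ASK 3999 127.0.0.1:6381">>. In the Redis Cluster protocol the client sends ASKING followed by the original command to the node named in the redirect and leaves the slot mapping unchanged. The module name and SendFun are hypothetical: SendFun(Host, Port, Commands) stands in for whatever sends a pipeline to that node, for example via eredis:qp/2 on a client connected to it.

```erlang
-module(ask_redirect).
-export([parse/1, follow/3]).

%% Parse <<"ASK Slot Host:Port">> into {ask, Slot, Host, Port}.
parse(<<"ASK ", Rest/binary>>) ->
    [SlotBin, Addr] = binary:split(Rest, <<" ">>),
    %% Split on the last colon so the host part may itself contain colons.
    [Host, PortBin] = string:split(Addr, <<":">>, trailing),
    {ask, binary_to_integer(SlotBin),
     binary_to_list(Host), binary_to_integer(PortBin)};
parse(_Other) ->
    not_ask.

%% If Reply is an ASK redirect, resend Command to the target node,
%% preceded by ASKING. Any other reply is returned unchanged.
follow({error, Reason} = Reply, Command, SendFun) when is_binary(Reason) ->
    case parse(Reason) of
        {ask, _Slot, Host, Port} ->
            case SendFun(Host, Port, [["ASKING"], Command]) of
                [_AskingReply, Result] -> Result;
                Other -> Other   % e.g. {error, no_connection}
            end;
        not_ask ->
            Reply
    end;
follow(Reply, _Command, _SendFun) ->
    Reply.
```

ASKING and the command are sent as one pipeline so that no other query can be interleaved between them on the same connection.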

Status: TODO

MOVED redirections and async refresh mapping

A MOVED redirection includes the address of the node that now owns the slot, so the query can be sent to that node directly while the slot mapping is refreshed in the background. Currently, the query is not retried until the slot mapping has been updated.
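The idea could look roughly like the sketch below. It assumes the redirect arrives as a binary such as <<"MOVED 3999 127.0.0.1:6381">> and that a connection to the new node is available, which may not be the case before the mapping has been refreshed. The module name, QueryFun and RefreshFun are hypothetical stand-ins for the real query and refresh functions.

```erlang
-module(moved_redirect).
-export([handle/4]).

%% On a MOVED redirect, start the slot mapping refresh in a separate
%% process and immediately run Command against the node named in the
%% redirect. Any other reply is returned unchanged.
handle({error, <<"MOVED ", Rest/binary>>}, Command, QueryFun, RefreshFun) ->
    [_SlotBin, Addr] = binary:split(Rest, <<" ">>),
    [Host, PortBin] = string:split(Addr, <<":">>, trailing),
    %% Fire and forget: the caller does not wait for the refresh.
    _Pid = spawn(RefreshFun),
    QueryFun(binary_to_list(Host), binary_to_integer(PortBin), Command);
handle(Reply, _Command, _QueryFun, _RefreshFun) ->
    Reply.
```

A real implementation would probably also need to avoid starting many concurrent refreshes when several queries receive MOVED at the same time.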

Status: Idea

Make use of the pipelining feature of eredis

Poolboy ensures that each eredis client has only one query in flight: a reply must be received before the next query is sent on that client. However, the eredis client supports pipelining, so multiple callers could share the same eredis client instance. For example, eredis_cluster could keep a smaller number of eredis clients per slot and simply use round-robin when sending queries.
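The round-robin selection itself is simple. Below is a minimal, purely functional sketch; the module name and the state representation are hypothetical, and a real implementation would need to keep this state somewhere shared between callers (an ETS table or a counter, for instance).

```erlang
-module(round_robin).
-export([new/1, next/1]).

%% Create a selector over a non-empty list of clients.
new(Clients) when Clients =/= [] ->
    {list_to_tuple(Clients), 0}.

%% Return the next client and the updated selector state.
next({Clients, Index}) ->
    Client = element(Index + 1, Clients),
    {Client, {Clients, (Index + 1) rem tuple_size(Clients)}}.
```

Each selected client would then be passed to eredis:q/2 without being checked out exclusively, relying on eredis to pipeline the concurrent queries.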

Status: Optimization idea, needs to be benchmarked.

Done

Don't refresh mapping when poolboy:transaction/2 times out

There is no need to refresh slot mapping in this case.

Status: Done. Commit: https://github.com/Nordix/eredis_cluster/commit/d7f86927651970f01f5a404fb7561663ca7d9073

Note: If poolboy:transaction/2 times out because the gen_server call to poolboy:checkout/3 times out, there is no problem. But if poolboy:transaction/2 times out because of a timeout in a call to eredis:q/2, the eredis instance is still working on the query in the background when it is returned to the pool, and it may still be busy when it is checked out again later. Perhaps it would be good to kill the eredis client in this case to prevent it from returning to the pool.

Use existing connection when refreshing slot mapping

Historically, eredis_cluster_monitor created a new connection to one of the init nodes every time the slot mapping was to be updated. This is only needed to load the mappings the first time. Later, any existing connection can be used. Use a connection from one of the pools.

Status: Done.

Try next node if eredis:q/2 times out in eredis_cluster_monitor:get_cluster_info/5

Add a catch exit:{timeout, _GenServerCall} and handle it using a recursive call, as for other errors in this function.
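The pattern can be sketched as follows. This is not the code from the PR: the module name, function name, QueryFun and the no_reachable_node reason are hypothetical and only illustrate catching the timeout exit and recursing over the remaining nodes. QueryFun stands in for the eredis:q/2 call made against each node.

```erlang
-module(try_next_node).
-export([first_reply/2]).

%% Try each node in turn and return the first successful reply.
first_reply([], _QueryFun) ->
    {error, no_reachable_node};
first_reply([Node | Rest], QueryFun) ->
    try QueryFun(Node) of
        {ok, Reply} ->
            {ok, Reply};
        {error, _Reason} ->
            first_reply(Rest, QueryFun)
    catch
        %% A gen_server:call timeout exits the caller with
        %% {timeout, {gen_server, call, Args}}.
        exit:{timeout, _GenServerCall} ->
            first_reply(Rest, QueryFun)
    end.
```

Only the timeout exit is caught here, so other exits (for example noproc if the client process is gone) still propagate to the caller.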

Status: Done. PR: https://github.com/Nordix/eredis_cluster/pull/29

Note 1: If the gen_server call times out, the eredis client process will continue to work on the query in the background.

Note 2: If eredis:q/2 times out here, it also means that the gen_server call in eredis_cluster_monitor:refresh_mapping/1 times out, which results in an exception in the caller process. Should we increase the timeout for this gen_server call?