
Add documentation from README to file headers
attipaci committed Sep 14, 2024
1 parent 56ed0c2 commit 5816e78
Showing 2 changed files with 179 additions and 0 deletions.
109 changes: 109 additions & 0 deletions src/smax-queue.c
@@ -8,6 +8,115 @@
* Functions to support pipelined pull requests from SMA-X.
 * Because they don't require a sequence of round-trips, pipelined pulls can
* be orders of magnitude faster than staggered regular pull requests.
*
* ## Pipelined pulling (high volume queries)
*
* - [Synchronization points and waiting](#lazy-synchronization)
* - [Callbacks](#lazy-callbacks)
* - [Finishing up](#lazy-finish)
*
 * The regular pulling of data from SMA-X requires a separate round-trip for each and every request. That is, successive
 * pulls are sent only after the response from the prior pull has been received. A lot of the time is spent waiting
 * for responses to come back. With round-trip times in the 100 μs range, this means that this method of fetching data
 * from SMA-X is suitable for obtaining at most a few thousand values per second.
*
 * However, sometimes you want to access a large number of values faster. This is what pipelined pulling is for.
 * In pipelined mode, a batch of pull requests is sent to the SMA-X Redis server in quick succession, without waiting
 * for responses. The values, when received, are processed by a dedicated background thread. The user then has the option
 * of either waiting until all data is collected, or asking for a callback when the data is ready.
*
 * It works much like regular pulling, except that you submit your pull request to a queue with
 * `smaxQueue()`. For example:
*
* ```c
* double d; // A value we will fill
* XMeta meta; // (optional) metadata to fill (for the above value).
*
* int status = smaxQueue("some_table", "some_var", X_DOUBLE, 1, &d, &meta);
* ```
*
 * Pipelined (batched) pulls have dramatic effects on performance. Rather than being limited by round-trip times, you will
 * be limited by the performance of the Redis server itself (or the network bandwidth on some older infrastructure). As
 * such, instead of thousands of queries per second, you can pull 2-3 orders of magnitude more in a given time, reaching
 * hundreds of thousands or even millions of pulls per second this way.
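 *
 * As a rough, back-of-the-envelope illustration (a simplified model, not a benchmark of SMA-X): serial pulls pay a full
 * round trip per value, so their rate is bounded by 1 / RTT, while a pipelined batch of `n` requests pays roughly one
 * round trip plus the server's per-request processing time for each request. The helper names below and the assumed
 * 1 μs per-request server cost are hypothetical.
 *
```c
// Simplified throughput model (an illustrative assumption, not measured SMA-X
// performance). All times are in seconds; results are values per second.

// Serial pulls: each value costs one full round trip.
static double serial_rate(double rtt_s) {
  return 1.0 / rtt_s;
}

// Pipelined pulls: a batch of n requests costs one round trip plus the
// server's processing time (server_s) for each request in the batch.
static double pipelined_rate(int n, double rtt_s, double server_s) {
  return n / (rtt_s + n * server_s);
}
```
 *
 * With a 100 μs round trip, `serial_rate(100e-6)` gives at most 10,000 values per second (before any client or server
 * overhead), while `pipelined_rate(1000, 100e-6, 1e-6)` gives roughly 909,000 values per second under the assumed 1 μs
 * server cost: the round trip is amortized over the whole batch, and the server becomes the bottleneck.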
*
* <a name="lazy-synchronization"></a>
* ### Synchronization points and waiting
*
 * After you have submitted a batch of pull requests to the queue, you can create a synchronization point as:
*
* ```c
* XSyncPoint *syncPoint = smaxCreateSyncPoint();
* ```
*
 * A synchronization point is a marker in the queue that we can wait on. After the synchronization point is created, you
 * can submit more pull requests to the same queue (e.g. for another processing block), or do some other things for a bit
 * (since it will take at least some microseconds before the data is ready). Then, when ready, you can wait on the
 * specific synchronization point to ensure that data submitted prior to its creation is delivered from SMA-X:
*
* ```c
* // Wait for data submitted prior to syncPoint to be ready, or time out after 1000 ms.
* int status = smaxSync(syncPoint, 1000);
*
* // Destroy the synchronization point if we no longer need it.
* xDestroySyncPoint(syncPoint);
*
* // Check return status...
 * if(status == X_TIMEOUT) {
 *   // We timed out...
 * }
 * else if(status < 0) {
 *   // Some other error...
 * }
* ```
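 *
 * To make the "wait for everything queued before the marker" semantics concrete, here is a minimal sketch that models a
 * synchronization point as a counter, assuming the queue just tracks how many requests were submitted and how many
 * responses have been processed. The `demo_` names are hypothetical, and this is not how smax-clib implements
 * `XSyncPoint`; it only shows the idea using a POSIX condition variable with a timeout (`pthread_cond_timedwait()`).
 *
```c
#define _POSIX_C_SOURCE 200809L  // for clock_gettime()

#include <errno.h>
#include <pthread.h>
#include <time.h>

// Hypothetical model of a pull queue: requests are counted in submission
// order, and a background reader advances 'completed' as responses arrive.
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t progress;  // broadcast whenever 'completed' advances
  long submitted;           // requests queued so far
  long completed;           // responses processed so far
} demo_queue;

#define DEMO_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

// A synchronization point only remembers how many requests preceded it.
typedef struct {
  demo_queue *q;
  long mark;
} demo_sync_point;

static void demo_submit(demo_queue *q) {
  pthread_mutex_lock(&q->mutex);
  q->submitted++;
  pthread_mutex_unlock(&q->mutex);
}

static demo_sync_point demo_create_sync_point(demo_queue *q) {
  pthread_mutex_lock(&q->mutex);
  demo_sync_point s = { q, q->submitted };
  pthread_mutex_unlock(&q->mutex);
  return s;
}

// Called by the (background) reader for each processed response.
static void demo_complete_one(demo_queue *q) {
  pthread_mutex_lock(&q->mutex);
  q->completed++;
  pthread_cond_broadcast(&q->progress);
  pthread_mutex_unlock(&q->mutex);
}

// Returns 0 once all requests before the marker are done, or -1 on timeout.
static int demo_sync(const demo_sync_point *s, int timeout_ms) {
  struct timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);  // default clock for condvar waits
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (long) (timeout_ms % 1000) * 1000000L;
  if(deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec++;
    deadline.tv_nsec -= 1000000000L;
  }

  pthread_mutex_lock(&s->q->mutex);
  int rc = 0;
  while(s->q->completed < s->mark && rc != ETIMEDOUT)
    rc = pthread_cond_timedwait(&s->q->progress, &s->q->mutex, &deadline);
  int ready = (s->q->completed >= s->mark);
  pthread_mutex_unlock(&s->q->mutex);
  return ready ? 0 : -1;
}
```
 *
 * A background reader would call `demo_complete_one()` as each response is processed, while `demo_sync()` blocks the
 * caller until the responses preceding the marker are in, or the deadline passes. Requests submitted after the marker
 * do not delay it.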
*
* <a name="lazy-callbacks"></a>
* ### Callbacks
*
 * The alternative to synchronization points and waiting is to provide a callback function, which will process your data
* as soon as it is available, e.g.:
*
* ```c
* void my_pull_processor(void *arg) {
 *   // Say, we expect a string tag passed along to identify what we need to process...
 *   const char *tag = (const char *) arg;
 *
 *   // Do what we need to do with the data identified by 'tag'...
* }
* ```
*
 * Then, after queuing the variables it requires, submit this callback routine to the queue with:
*
* ```c
* // We'll call my_pull_processor, with the argument "some_tag", when prior data has arrived.
* smaxQueueCallback(my_pull_processor, "some_tag");
* ```
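 *
 * A queued callback can safely use the data queued before it because responses are processed in submission order. The
 * single-threaded sketch below simulates that ordering with a hypothetical `demo_entry` FIFO (not the smax-clib queue):
 * when processing reaches a callback entry, every pull ahead of it has already been delivered.
 *
```c
#include <stddef.h>

// Hypothetical FIFO entry: either a pull whose (simulated) response is
// written to *dest, or a user callback queued behind earlier pulls.
typedef struct {
  double *dest;              // destination of a pull (NULL for callbacks)
  double response;           // simulated value returned by the server
  void (*callback)(void *);  // user callback (NULL for pulls)
  void *arg;                 // argument passed to the callback
} demo_entry;

// Processes entries strictly in submission order, as a background reader
// would: a callback runs only after every pull ahead of it was delivered.
static void demo_process(demo_entry *queue, size_t n) {
  for(size_t i = 0; i < n; i++) {
    if(queue[i].callback) queue[i].callback(queue[i].arg);
    else *queue[i].dest = queue[i].response;
  }
}

// Example callback: adds up two values that were queued before it.
typedef struct {
  const double *x, *y;  // values the callback depends on
  double sum;           // result computed by the callback
} demo_sum_arg;

static void demo_sum(void *arg) {
  demo_sum_arg *a = (demo_sum_arg *) arg;
  a->sum = *a->x + *a->y;
}
```
 *
 * In the real library the processing happens on the background thread as responses stream in; here `demo_process()`
 * simply walks a pre-filled array to show the ordering guarantee.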
*
* <a name="lazy-finish"></a>
* ### Finishing up
*
 * If some pipelined pulls may still be awaiting responses, you may want to wait until all previously submitted
 * requests have been collected. You can do that with:
*
* ```c
* // Wait for up to 3000 ms for all pipelined pulls to collect responses from SMA-X.
* int status = smaxWaitQueueComplete(3000);
*
* // Check return status...
 * if(status == X_TIMEOUT) {
 *   // We timed out...
 * }
 * else if(status < 0) {
 *   // Some other error...
 * }
* ```
*
*
*/

#include <stdio.h>
70 changes: 70 additions & 0 deletions src/smax-resilient.c
@@ -16,6 +16,76 @@
* It's not especially meaningful for simple executables, which are run for limited
* time without persistence.
*
*
* ## Lazy pulling (high-frequency queries)
*
 * What happens if you need the data frequently? Do you pound on the database at some high frequency? No, you probably
 * do not want to do that, especially if the data you need is not necessarily changing fast. There is no point in wasting
 * network bandwidth only to return the same values again and again. This is where 'lazy' pulling excels.
*
 * From the caller's perspective, lazy pulling works just like regular SMA-X pulls, e.g.:
*
* ```c
* int data[10][4][2];
* int sizes[] = { 10, 4, 2 };
* XMeta meta;
*
* int status = smaxLazyPull("some_table", "some_data", X_INT, 3, sizes, data, &meta);
* ```
*
* or
*
* ```c
 * double value = smaxLazyPullDouble("some_table", "some_var");
* ```
*
 * Under the hood, however, it does something different. The first time a new variable is lazy pulled, it is fetched from
 * the Redis database just like a regular pull. It also caches the value, and watches for update notifications from the
 * SMA-X server. Thus, as long as no update notification is received, successive calls will simply return the locally
 * cached value. This can save a lot of network usage, and also provides orders of magnitude faster access so long as the
 * variable remains unchanged.
*
 * When the variable is updated in SMA-X, our client library will be notified, and one of two things can happen:
*
* 1. it invalidates the cache, so that the next lazy pull will again work just like a regular pull, fetching the
* updated value from SMA-X on demand. And again the library will cache that value and watch for notifications for
* the next update. Or,
*
 * 2. it triggers a background update of the cached value with a pipelined (high-throughput) pull. However, until the
 * new value is actually fetched, it will promptly return the previously cached value.
*
 * The choice between the two is yours, and you can pick whichever suits your needs best. The default behavior for lazy
 * pulls is (1), but you may call `smaxLazyCache()` after the first pull of a variable, to indicate that you want to enable
 * background cache updates (2) for it. The advantage of (1) is that it will never serve you outdated data even if there
 * are significant network latencies, but you may have to wait a little to fetch updates. On the other hand, (2) will
 * always provide a recent value with effectively no latency, but this value may be outdated if there are delays on the
 * network updating the cache. The difference is typically at the microsecond level on a LAN. However, (2) may be
 * preferable when you need to access SMA-X data from timing-critical code blocks, where it is more important to ensure
 * that the value is returned quickly than whether it is a millisecond too old or not.
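 *
 * The two policies can be summarized with a small, single-threaded model. Everything here is hypothetical
 * (`demo_lazy`, `demo_server` and friends are not smax-clib types), and the "background" refresh is simulated by an
 * explicit call instead of a separate thread; the point is only to show when each policy goes back to the server and
 * which value it returns in the meantime.
 *
```c
#include <stdbool.h>

// Stand-in for SMA-X: holds the current value and counts round trips.
typedef struct {
  double value;  // value currently stored in the database
  int fetches;   // number of (simulated) round trips made
} demo_server;

static double demo_fetch(demo_server *srv) {
  srv->fetches++;
  return srv->value;
}

typedef enum {
  DEMO_INVALIDATE,  // option (1): refetch on the next pull (the default)
  DEMO_BACKGROUND   // option (2): refresh in the background, serve old value
} demo_policy;

typedef struct {
  double value;          // locally cached value
  bool valid;            // false until first fetched, or after invalidation
  bool refresh_pending;  // option (2): a background refresh was requested
  demo_policy policy;
} demo_lazy;

// Lazy pull: goes to the server only when the cache is not valid.
static double demo_lazy_pull(demo_lazy *v, demo_server *srv) {
  if(!v->valid) {
    v->value = demo_fetch(srv);
    v->valid = true;
  }
  return v->value;
}

// Update notification received from the server.
static void demo_on_update(demo_lazy *v) {
  if(v->policy == DEMO_INVALIDATE) v->valid = false;
  else v->refresh_pending = true;
}

// Simulates the background thread completing a pending refresh.
static void demo_background_refresh(demo_lazy *v, demo_server *srv) {
  if(v->refresh_pending) {
    v->value = demo_fetch(srv);
    v->refresh_pending = false;
  }
}
```
 *
 * Under (1), a notification costs one round trip on the next pull; under (2), pulls never wait, but they may return
 * the old value until the refresh completes.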
*
 * In either case, when you are done using lazy variables, you should let the library know that it no longer needs to watch
 * for updates to them, by calling either `smaxLazyEnd()` on specific variables, or `smaxLazyFlush()` to stop watching
 * updates for all lazy variables. (A subsequent lazy pull will automatically resume watching for updates, in case you
 * wish to re-enable it.)
*
* ```c
* // Lazy pull a bunch of data (typically in a loop).
 * for(...) {
 *   smaxLazyPull("some_table", "some_var", ...);
 *   smaxLazyPull(...);
 *   ...
 * }
*
* // Once we do not need "some_table:some_var" any more:
* smaxLazyEnd("some_table", "some_var");
*
* ...
*
 * // And to stop lazy access to all variables:
* smaxLazyFlush();
* ```
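 *
 * One way to picture this is as a set of watched variables: a lazy pull adds a variable to the set, `smaxLazyEnd()`
 * removes one, and `smaxLazyFlush()` empties it. The bookkeeping sketch below uses a hypothetical fixed-size registry
 * keyed by "table:key"; it is not the library's actual data structure, and it omits the notification handling itself.
 *
```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_WATCH 16

// Hypothetical registry of variables whose update notifications we watch,
// keyed by "table:key".
static char demo_watched[DEMO_MAX_WATCH][64];
static int demo_nwatched = 0;

static void demo_id(const char *table, const char *key, char *id, size_t size) {
  snprintf(id, size, "%s:%s", table, key);
}

static int demo_find(const char *id) {
  for(int i = 0; i < demo_nwatched; i++)
    if(strcmp(demo_watched[i], id) == 0) return i;
  return -1;
}

// Called on every lazy pull: starts watching the variable if not already.
static bool demo_watch(const char *table, const char *key) {
  char id[64];
  demo_id(table, key, id, sizeof(id));
  if(demo_find(id) >= 0) return true;
  if(demo_nwatched >= DEMO_MAX_WATCH) return false;
  strcpy(demo_watched[demo_nwatched++], id);
  return true;
}

// Analogous to smaxLazyEnd(): stop watching one variable.
static void demo_end(const char *table, const char *key) {
  char id[64];
  demo_id(table, key, id, sizeof(id));
  int i = demo_find(id);
  if(i < 0) return;
  demo_nwatched--;
  // Move the last entry into the freed slot to keep the array compact.
  if(i != demo_nwatched) strcpy(demo_watched[i], demo_watched[demo_nwatched]);
}

// Analogous to smaxLazyFlush(): stop watching everything.
static void demo_flush(void) {
  demo_nwatched = 0;
}
```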
*
* \sa smaxSetResilient()
* \sa smaxIsResilient()
*/
