By default, both rate limiting and speculative executions are disabled in DSBulk.
If both are enabled, we observed that during writes, the rate limit is not honored from the server's perspective.
This is because rate limit permits are acquired per row written. More specifically, each call to `session.executeAsync()` will need to acquire permits.
If the request is retried internally by the driver, that's fine, because a retry request is only sent when the initial request has finished, so the invariant `acquired <= available` is respected and the server never sees more than `available` concurrent requests.
However, if we enable speculative executions, the driver may trigger a speculative request while the initial request is still in-flight. This is done without acquiring more permits, since only one call to `executeAsync()` was made. From the client's perspective, the invariant `acquired <= available` looks respected (we are writing one row only, but with 2 requests), but from the server's perspective, 2 requests were received, and the server may find itself processing more than `available` concurrent requests.
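To make the accounting mismatch concrete, here is a minimal back-of-envelope simulation. Nothing in it is real driver or Guava code: `permitsAcquired` stands in for DSBulk's rate-limiter accounting, and `serverRequests` for what the server actually sees.

```java
public class SpeculativePermitLeak {

    /** Returns {permitsAcquired, serverRequests} when writing `rows` rows. */
    static int[] simulate(int rows) {
        int permitsAcquired = 0;
        int serverRequests = 0;
        for (int row = 0; row < rows; row++) {
            permitsAcquired++; // one permit per executeAsync() call, i.e. per row
            serverRequests++;  // initial request reaches the server
            serverRequests++;  // speculative execution: extra request, no extra permit
        }
        return new int[] {permitsAcquired, serverRequests};
    }

    public static void main(String[] args) {
        int[] r = simulate(10);
        // The client-side invariant "acquired <= available" holds (10 permits
        // for 10 rows), but the server saw twice as many requests.
        System.out.println("permits acquired = " + r[0] + ", server requests = " + r[1]);
    }
}
```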
The immediate consequence of such a setup is that Astra starts returning `OverloadedException`s even after #435 was implemented.
I don't have any immediate solution for that. I think we would need to change how permits are acquired for writes: each internal request that the driver sends needs to acquire permits, not only the initial one.
We can achieve this in a few ways – but all of them involve extending driver classes:
- Move Guava's `RateLimiter` to `CqlRequestHandler`, and acquire the permits each time a message is written to the Netty channel, see here.
- Use the driver's built-in `RateLimitingRequestThrottler`. But we'll need to improve this mechanism:
  - The `RequestThrottler` interface will need to access the statement being executed, in order to compute the number of permits;
  - The throttler is currently not invoked for speculative executions anyway. This is probably a bug, btw.
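For illustration, here is roughly the permit computation a statement-aware throttler could perform: one permit per row written, so a batch costs as many permits as it has children. `Simple` and `Batch` are placeholder types for this sketch, not the driver's actual `Statement` hierarchy.

```java
import java.util.List;

public class PermitComputation {

    interface Stmt {}
    record Simple(String cql) implements Stmt {}
    record Batch(List<Stmt> children) implements Stmt {}

    /** One permit per row written: a batch costs the sum of its children. */
    static int permits(Stmt s) {
        if (s instanceof Batch b) {
            return b.children().stream().mapToInt(PermitComputation::permits).sum();
        }
        return 1;
    }

    public static void main(String[] args) {
        Stmt batch = new Batch(List.of(
            new Simple("INSERT 1"), new Simple("INSERT 2"), new Simple("INSERT 3")));
        System.out.println(permits(batch)); // 3
    }
}
```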
I will open driver Jiras for improving the throttling mechanism. But even so I'm reluctant to try the above changes:
- Moving Guava's `RateLimiter` inside `CqlRequestHandler` means that we would be calling a blocking operation on a driver I/O thread. This is considered bad practice and could have undesired consequences.
- Using the driver's built-in `RateLimitingRequestThrottler` would avoid blocking operations, but it instead uses an internal queue to park requests; when the queue is full, it throws an error. This might also be undesirable for DSBulk.
Note: I think that reads are not a problem. For reads, permits are acquired per row emitted, after the results page has been received. So speculative executions won't pose any problem here.
Note 2: it might be simpler to just give up on rate limiting + speculative executions and document this limitation.
Note 3: to mitigate this, we could look into something simpler: implementing application-level retries when a write request ends with a `DriverTimeoutException`. I will create a separate issue for that.
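A minimal sketch of what such application-level retries could look like, using `CompletableFuture`. `TimeoutException` stands in for the driver's `DriverTimeoutException`, and `withRetries()` is an illustrative helper under these assumptions, not DSBulk code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RetryOnTimeout {

    /** Re-issues the whole operation when it fails with a timeout, up to maxRetries times. */
    static <T> CompletableFuture<T> withRetries(Supplier<CompletableFuture<T>> op, int maxRetries) {
        return op.get().handle((result, error) -> {
            if (error == null) {
                return CompletableFuture.completedFuture(result);
            }
            Throwable cause = error instanceof CompletionException ? error.getCause() : error;
            if (cause instanceof TimeoutException && maxRetries > 0) {
                return withRetries(op, maxRetries - 1); // retry the whole request
            }
            return CompletableFuture.<T>failedFuture(cause);
        }).thenCompose(f -> f);
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Fake write: times out on the first attempt, succeeds on the second.
        Supplier<CompletableFuture<String>> write = () ->
            attempts.incrementAndGet() == 1
                ? CompletableFuture.failedFuture(new TimeoutException("write timed out"))
                : CompletableFuture.completedFuture("OK");

        System.out.println(withRetries(write, 3).join() + " after " + attempts.get() + " attempts");
    }
}
```

Note that each retry here is a fresh `executeAsync()`-style call, so in the permit-per-call model it would acquire its own permits, unlike a driver-internal speculative execution.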
> This is considered bad practice and could have undesired consequences.
I'd note that Guava's `RateLimiter` is currently invoked inside Reactor operators and (for reads only) also in driver I/O threads: both are bad practices. So, all in all, we are already drowning in muddy waters with regard to using blocking code in non-blocking contexts.