Resilient cache #279
Conversation
Force-pushed from 2c4aa81 to 08a3be2
TBH this is very hard to follow. I can only say that I am happy to test this whenever it is ready to test. |
Force-pushed from e4e0e34 to 12a72f4
I guess this is the important change here: cb98725. If you want to test this, you can use the limitador server built from this branch using:
This will require your Redis to always respond in at most 1ms. You can also shut the Redis down entirely... I use my limitador-driver to put load on Limitador during these tests and... well, you shouldn't notice any difference, and should always observe single-digit millisecond latency to Limitador from the driver. I've added some logging for when a network partition to Redis occurs and resolves:
|
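To make the approach above concrete, here is a hedged, self-contained sketch of the general technique (time-limit the remote call, fall back to cached counters). The names (`LocalCache`, `is_rate_limited`, `remote_hit`) are invented for illustration and the Redis round trip is simulated with a sleep, so this is not Limitador's actual implementation:

```rust
// Minimal sketch: ask the remote store, but never wait longer than a small
// response timeout; on timeout or error, answer from locally cached counters.
// All names are illustrative -- this is NOT Limitador code.
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Duration;
use tokio::time::{sleep, timeout};

/// Locally cached counters used as the fallback authority.
struct LocalCache {
    counters: Mutex<HashMap<String, u64>>,
}

impl LocalCache {
    fn new() -> Self {
        Self { counters: Mutex::new(HashMap::new()) }
    }

    /// Increment the cached counter and report whether it exceeds the limit.
    fn hit(&self, key: &str, limit: u64) -> bool {
        let mut counters = self.counters.lock().unwrap();
        let value = counters.entry(key.to_string()).or_insert(0);
        *value += 1;
        *value > limit
    }
}

/// Stand-in for a Redis round trip; it just sleeps to simulate latency.
async fn remote_hit(_key: &str, _limit: u64, latency: Duration) -> Result<bool, ()> {
    sleep(latency).await;
    Ok(false)
}

/// The remote store is consulted first, but the caller never blocks past
/// `response_timeout`; on error or timeout the local cache decides.
async fn is_rate_limited(
    cache: &LocalCache,
    key: &str,
    limit: u64,
    response_timeout: Duration,
    simulated_latency: Duration,
) -> bool {
    match timeout(response_timeout, remote_hit(key, limit, simulated_latency)).await {
        Ok(Ok(limited)) => limited,                   // remote answered in time
        Ok(Err(_)) | Err(_) => cache.hit(key, limit), // error or timeout: use the cache
    }
}

#[tokio::main]
async fn main() {
    let cache = LocalCache::new();
    // A "partitioned" remote taking 50ms while the budget is 1ms, as in the
    // test described above: the decision comes from the local cache instead.
    let limited = is_rate_limited(
        &cache,
        "test_namespace/POST",
        5,
        Duration::from_millis(1),
        Duration::from_millis(50),
    )
    .await;
    println!("limited = {limited}");
}
```

With a 1ms budget, a Redis that is slow or gone entirely never adds latency to the caller; decisions are simply served from the cache until connectivity returns.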
@eguzki if you have that integration test for |
2,
100,
6,
Maybe it could be more informative if we extracted these into const default values (?)
They are... they actually are inlined from the redis crate, because I can't reuse theirs... they're not public.
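For illustration, a hedged sketch of the extraction the reviewer is suggesting. The constant names, and the assumed meaning of the three literals as reconnect backoff parameters copied from the redis crate (whose own defaults are not public), are assumptions, not code from this PR:

```rust
// Hypothetical constant names; the values mirror the inlined literals in the
// hunk above. They are assumed to be the redis crate's reconnect backoff
// parameters, re-declared by value because the crate's defaults are not public.
const DEFAULT_EXPONENT_BASE: u64 = 2;
const DEFAULT_RETRY_FACTOR_MS: u64 = 100;
const DEFAULT_NUMBER_OF_RETRIES: usize = 6;

fn main() {
    println!(
        "reconnect backoff: base={DEFAULT_EXPONENT_BASE}, \
         factor={DEFAULT_RETRY_FACTOR_MS}ms, retries={DEFAULT_NUMBER_OF_RETRIES}"
    );
}
```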
Looking good codewise. Before giving my 👍 , I want to run some tests. |
Unfortunately it did not pass.
failures:
---- test::check_rate_limited_and_update_load_counters_with_async_redis_and_local_cache stdout ----
thread 'test::check_rate_limited_and_update_load_counters_with_async_redis_and_local_cache' panicked at limitador/tests/integration_tests.rs:698:13:
assertion failed: !result.limited
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---- test::check_rate_limited_and_update_with_async_redis_and_local_cache stdout ----
thread 'test::check_rate_limited_and_update_with_async_redis_and_local_cache' panicked at limitador/tests/integration_tests.rs:657:13:
assertion failed: !rate_limiter.check_rate_limited_and_update(namespace, &values, 1,
false).await.unwrap().limited
failures:
test::check_rate_limited_and_update_load_counters_with_async_redis_and_local_cache
test::check_rate_limited_and_update_with_async_redis_and_local_cache
test result: FAILED. 196 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.39s
error: test failed, to rerun pass `-p limitador --test integration_tests`
I will be looking into this as well |
That's probably that math I had missed... |
Force-pushed from 1d92ae1 to 04116a0
Before running partitioning tests, I ran a simple test of sending 1 request per second against a counter with a limit of max 5 requests every 10 seconds. The results are not the expected ones.
Test
cd limitador-server/sandbox/
make build
The applied configuration is:
diff --git a/limitador-server/sandbox/docker-compose-limitador-redis-cached.yaml b/limitador-server/sandbox/docker-compose-limitador-redis-cached.yaml
index 6db3a4d..eb56293 100644
--- a/limitador-server/sandbox/docker-compose-limitador-redis-cached.yaml
+++ b/limitador-server/sandbox/docker-compose-limitador-redis-cached.yaml
@@ -21,11 +21,11 @@ services:
       - /opt/kuadrant/limits/limits.yaml
       - redis_cached
       - --ttl
-      - "3"
+      - "30000"
       - --ratio
-      - "11"
+      - "1"
       - --flush-period
-      - "2"
+      - "100"
       - --max-cached
       - "5000"
       - redis://redis:6379
diff --git a/limitador-server/sandbox/limits.yaml b/limitador-server/sandbox/limits.yaml
index 0f7b90b..4d553b0 100644
--- a/limitador-server/sandbox/limits.yaml
+++ b/limitador-server/sandbox/limits.yaml
@@ -1,13 +1,13 @@
 ---
 - namespace: test_namespace
-  max_value: 10
-  seconds: 60
+  max_value: 5
+  seconds: 10
   conditions:
   - "req.method == 'GET'"
   variables: []
 - namespace: test_namespace
   max_value: 5
-  seconds: 60
+  seconds: 10
   conditions:
   - "req.method == 'POST'"
   variables: []
make deploy-redis
Limitador reports the following configuration as expected:
make grpcurl
while :; do bin/grpcurl -plaintext -d @ 127.0.0.1:18081 envoy.service.ratelimit.v3.RateLimitService.ShouldRateLimit <<EOM; sleep 1; done
{
"domain": "test_namespace",
"hits_addend": 1,
"descriptors": [
{
"entries": [
{
"key": "req.method",
"value": "POST"
}
]
}
]
}
EOM
I would expect, every 10 seconds, 5 requests to be accepted and 5 requests to be rejected. Instead I see this:
It could be due to the missing store increments I fixed in #283, specifically https://github.com/Kuadrant/limitador/pull/283/files#diff-d91ad49639aa408abab791b66f7d3bca5ea05d4ffff21d6afd7f5034b378774aR159 |
Force-pushed from 9462288 to f52b328
Force-pushed from 7e0e86b to 0ef11c6
@eguzki take it for another round, this should be good. |
Now looking good.
- namespace: test_namespace
  max_value: 5
  seconds: 10
  conditions:
  - "req.method == 'POST'"
  variables: []
The result is as expected: every 10 secs, 5 accepted and 5 rejected.
And integration tests for the redis cached datastore passing 🆗 |
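As an aside, the expected pattern can be reproduced with a tiny self-contained simulation (plain Rust, not Limitador code): a fixed 10-second window with max_value = 5, hit once per second, accepts the first 5 hits of every window and rejects the next 5.

```rust
// Simulate 1 request per second against a fixed 10-second window, max 5 hits.
fn main() {
    let max_value = 5u64;
    let window_secs = 10u64;
    let mut hits_in_window = 0u64;

    for second in 0..30u64 {
        if second % window_secs == 0 {
            hits_in_window = 0; // the window rolls over, counter resets
        }
        hits_in_window += 1;
        let limited = hits_in_window > max_value;
        println!("t={second:>2}s limited={limited}");
    }
}
```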
I have run the same test multiple times:
What I see is that when redis is down, limitador starts returning "always". In one of the tests (I could not reproduce it), limitador entered a "poison" state and it was returning "always"
I do not know what happens when redis is back, because I screwed up the environment when trying to get redis back on, and I could not reproduce the issue. |
Force-pushed from 0ef11c6 to 1afbaf8
Damn... seconds aren't milliseconds! 🤦 see e871dbe (tho I'm planning on 🔥 all that code and having it all properly encapsulated in a
😉 |
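A hedged illustration of the class of bug being owned up to here, and the kind of encapsulation that prevents it (assuming raw integers were being passed around; this is not the code from e871dbe):

```rust
use std::time::Duration;

fn main() {
    // Bare integers invite exactly this mistake: a value meant as seconds
    // ends up where milliseconds are expected, or vice versa.
    let ttl_seconds: u64 = 3;
    let wrong = Duration::from_millis(ttl_seconds); // 3ms, not 3s!
    let right = Duration::from_secs(ttl_seconds);

    // Carrying `Duration` values instead of raw integers makes the unit part
    // of the type, so the mix-up cannot happen silently.
    assert_ne!(wrong, right);
    println!("wrong = {wrong:?}, right = {right:?}");
}
```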
…the limit's TTL for now
also, on the poisoning, it means something went wrong while holding a lock… e.g. an |
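For context on the "poison" state mentioned above: in Rust, a `std::sync::Mutex` becomes poisoned when a thread panics while holding the lock, and every subsequent `lock()` then returns a `PoisonError`; a caller that just `unwrap()`s it will keep failing. A minimal, generic illustration (standard-library behaviour, not Limitador-specific code):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    // A thread that panics while holding the guard poisons the mutex.
    let poisoner = Arc::clone(&counter);
    let _ = thread::spawn(move || {
        let _guard = poisoner.lock().unwrap();
        panic!("something went wrong while holding the lock");
    })
    .join();

    // Every later lock() now returns Err(PoisonError).
    match counter.lock() {
        Ok(_) => println!("lock acquired normally"),
        Err(poisoned) => {
            // One way to recover: take the inner guard anyway and keep serving.
            let guard = poisoned.into_inner();
            println!("mutex was poisoned, recovered value = {}", *guard);
        }
    }
}
```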
Working like a charm.
When redis is down, requests keep being rate limited according to the configured limits. And when redis is respawned, after some "over rejecting" period (rejecting more requests than expected), the behavior is consistent with the limits. The "over rejecting" must be due to limitador flushing cached counters from past-period hits.
builds on #276
fixes #273