Conversation
{
    Name: "test_global_token_limit",
    UniqueKey: "account:12345",
    Algorithm: guber.Algorithm_TOKEN_BUCKET,
Instead of hardcoding the algorithm here you can do something like

    algorithms := []guber.Algorithm{guber.Algorithm_TOKEN_BUCKET, guber.Algorithm_LEAKY_BUCKET}
    for _, algo := range algorithms {
        t.Run(algo.String(), func(t *testing.T) {
            // Send two hits that should be processed by the owner and the peer and deplete the remaining
            sendHit(algo, guber.Status_UNDER_LIMIT, 1)
        })
    }

And you can remove the other test (assuming that everything else is the same).
Personally, I think it makes sense to keep the tests split for clarity, but it's a good point that we could merge them if we expect the results to always be the same.
Always avoid DRY in tests; functional tests should be clear and verbose, and should generally not change once written.
//
// We cannot perform this for LEAKY_BUCKET as we don't know how much time or what other requests
// might have influenced the leak rate at the owning peer.
// (Maybe we should perform the leak calculation here?????)
As a newbie to this codebase, I read this comment as suggesting that we are breaking the single responsibility principle and need to move this logic elsewhere. My thought was that we need a new API called PeekRateLimit
that doesn't affect the hit count and only calculates the right status.
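For illustration, here is a rough standalone sketch of the idea (the names, types, and fields below are simplified stand-ins, not Gubernator's real API, and PeekRateLimit is not an existing endpoint):

```go
package sketch

import "time"

// Hypothetical, simplified stand-ins for a cached bucket and its status; the
// real project types and any actual PeekRateLimit RPC would look different.
type TokenBucket struct {
	Limit     int64
	Remaining int64
	ExpireAt  time.Time
}

type Status int

const (
	StatusUnderLimit Status = iota
	StatusOverLimit
)

// PeekRateLimit reports the current status of a bucket without consuming any
// hits, so repeated peeks never change the remaining count.
func PeekRateLimit(b *TokenBucket, limit int64, now time.Time) (Status, int64) {
	if b == nil || now.After(b.ExpireAt) {
		// No bucket yet, or it has expired: the full limit is available again.
		return StatusUnderLimit, limit
	}
	if b.Remaining <= 0 {
		return StatusOverLimit, 0
	}
	return StatusUnderLimit, b.Remaining
}
```

The point is that a peek only reads the cached state and derives the status, so it can be called as often as needed without skewing the hit count.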
100% agree with this, as I had the exact same thought. However, I'm planning on removing GRPC so I decided to wait for a future PR.
I could be convinced otherwise!
I would avoid major refactors that are outside the context of this PR, which I consider to be a bugfix. A follow-up might make sense though.
I agree that sounds like a bit too much logic leaking into the peer. @thrawn01 thanks for taking this up. The changes make sense and lgtm, as best as I can understand the flow.
lgtm - thanks again :)
Sorry, I'm taking back my approval because we tested this and, at large scale, it doesn't work. We will send a PR soon with our findings. See #218
This PR is replaced by #219
Over Limit Behavior
The way it's supposed to work is that the owner of the rate limit will broadcast the computed result of the rate limit to other peers, and that is what will be returned to clients who submit hits to a peer. The owner of the rate limit should be setting the `OVER_LIMIT` status before broadcasting.

However, we don't store `OVER_LIMIT` in the cache (see this comment), because it is expected that every time the functions in `algorithms.go` are called, `OVER_LIMIT` will be calculated and set properly. Again, because of this, the owning peer must perform those calculations and set the `OVER_LIMIT` status before broadcasting to other peers.

As these wonderful contributors have noticed (@miparnisari and @philipgough), it is currently not doing this! As was pointed out by @miparnisari, I do not know why this has not been caught before now. (Probably because of the lack of testing around the global functionality 😥)
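To illustrate why the status must be recomputed on every call, here is a simplified sketch (not the actual `algorithms.go` code; `guber` is the Gubernator package as imported in the tests, and the module path in the import is an assumption that may differ by version):

```go
package sketch

import (
	// The import path is assumed; adjust to the version of Gubernator in use.
	guber "github.com/mailgun/gubernator/v2"
)

// Sketch only. The cached bucket stores Remaining but never a Status, so
// OVER_LIMIT has to be derived from Remaining and the requested Hits on
// every call into the algorithm.
func tokenBucketStatus(remaining, hits int64) guber.Status {
	if hits <= remaining {
		// Note that zero hits always "fit": a request with Hits = 0 can never
		// come back OVER_LIMIT, which is exactly the short circuit described
		// in the next paragraph.
		return guber.Status_UNDER_LIMIT
	}
	return guber.Status_OVER_LIMIT
}
```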
At one time, I had envisioned the broadcaster would only send hits to the peers, and then each peer would calculate the correct `OVER_LIMIT` status based on the number of hits received from the owning peer. However, this plan would require more logic in the peers for when things like `RESET_REMAINING` or a negative hit occur, etc. Instead, I decided the owning peer would broadcast a `RateLimitResp` to each peer, which in turn would be returned to the client as the most up-to-date status of the rate limit. It looks like I goofed when I made that transition and decided to make a call to `getLocalRateLimit()` for each queued broadcast. As a consequence, I had to set `rl.Hits = 0`, which short-circuits the `OVER_LIMIT` calculation, so the broadcasted responses do not have the proper `OVER_LIMIT` set. I think the correct way to handle this is to have the broadcaster queue responses and send them without recalculating the current status before broadcasting to peers. (Included in this PR)

Also, I noted during testing that when using `TOKEN_BUCKET`, the non-owning peer would respond with `UNDER_LIMIT` on the third hit (which is kinda correct, because it will only respond with what the owner has broadcast to it), but that isn't what most users would expect when using `TOKEN_BUCKET`. So I added some code for the peer to return `OVER_LIMIT` if the broadcast response has a remaining count of 0.

I think this works for `TOKEN_BUCKET` because the expiration time on the owner and the peer is the same, such that when the bucket expires, both the peer and the owner will correctly start over with 0 hits at the same time. However, for `LEAKY_BUCKET` the peer doesn't really know what tokens have leaked since we last checked... I guess we could have the peer calculate the leak... but that sounds icky... Maybe someone can convince me otherwise?
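For reference, here is a rough sketch of the peer-side check described above. This is only an illustration of the idea, not the exact code added in this PR; `guber` is the Gubernator package as in the sketch above, and the `ResetTime` comparison is an assumption about how the reset is detected:

```go
// Illustrative sketch only, not the exact change in this PR.
func peerStatus(resp *guber.RateLimitResp, algo guber.Algorithm, nowMillis int64) guber.Status {
	switch algo {
	case guber.Algorithm_TOKEN_BUCKET:
		// The owner and the peer share the same reset time, so until the
		// bucket resets, a broadcast remaining count of 0 means the limit
		// is exhausted.
		if resp.Remaining == 0 && nowMillis < resp.ResetTime {
			return guber.Status_OVER_LIMIT
		}
	case guber.Algorithm_LEAKY_BUCKET:
		// The peer can't know how much has leaked (or what other hits the
		// owner has seen) since the last broadcast, so it simply trusts the
		// owner's broadcast status until the next broadcast arrives.
	}
	return resp.Status
}
```

If the peer ever did estimate the leak locally, it would presumably look something like `remaining = min(limit, remaining + elapsed * limit / duration)`, which is exactly the kind of owner logic I'd rather not duplicate in the peers.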