Skip to content
This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

Fix global behavior #216

Closed
wants to merge 4 commits into from
Closed

Conversation

thrawn01
Copy link
Contributor

Over Limit Behavior

The way it's supposed to work is that the owner of the rate limit will broadcast the computed result of the rate limit to other peers, and that is what will be returned to clients who submit hits to a peer. The owner of the rate limit should be setting the OVER_LIMIT status before broadcasting.

However, we don't store OVER_LIMIT in the cache, See this comment because it is expected that every time the functions in algorithms.go are called that OVER_LIMIT will be calculated and set properly. Again, because of this, the owning peer must preform those calculations and set the OVER_LIMIT status before broadcasting to other peers.

As these wonderful contributors have noticed (@miparnisari and @philipgough) It is currently not doing this! As was pointed out by @miparnisari, I do not know why this has not been caught before now. (Probably because of the lack of testing around the global functionality 😥)

At one time, I had envisioned the broadcaster would only send hits to the peers, and then each peer would calculate the correct OVER_LIMIT status based on the number of hits received from the owning peer. However, this plan would require more logic in the peers for when things like RESET_REMAINING or when a negative hit occurs, etc.... Instead I decided the owning peer would broadcast a RateLimitResp to each pear which in turn would be returned to the client as the most up to date status of the rate limit. It looks like I goofed when I made the transition and decided to make a call to getLocalRateLimit() for each queued broadcast. As a consequence I had to set rl.Hits = 0 which short circuits the OVER_LIMIT calculation, as a result the broadcasted responses do not have the proper OVER_LIMIT set. I think the correct way to handle this is to have the broadcaster queue responses and send them without recalculating the current status before broadcast to peers. (Included in this PR)

Also, I noted during testing that when using TOKEN_BUCKET the non owning peer would respond with UNDER_LIMIT on the third hit. (which is kinda correct, because it will only respond with what the owner has broadcast to it) but isn't what most users would expect when using TOKEN_BUCKET. So I added some code for the peer to return OVER_LIMIT if the broadcast response has a remainder of 0.

I think this works for TOKET_BUCKET because the expiration time on the owner and peer are the same, such that when the bucket expires, both the peer and owner will respond with 0 Hits correctly at the same time. However, for LEAKY_BUCKET the peer doesn't really know what tokens have leaked since we last checked.... I guess we could have the peer calculate the leak... but that sounds icky... Maybe someone can convince me otherwise?

functional_test.go Outdated Show resolved Hide resolved
functional_test.go Outdated Show resolved Hide resolved
{
Name: "test_global_token_limit",
UniqueKey: "account:12345",
Algorithm: guber.Algorithm_TOKEN_BUCKET,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding the algorithm here you can do something like

        algorithms := []guber.Algorithm{guber.Algorithm_TOKEN_BUCKET, guber.Algorithm_LEAKY_BUCKET}
	for _, algo := range algorithms {
		t.Run(algo.String(), func(t *testing.T) {
			// Send two hits that should be processed by the owner and the peer and deplete the remaining
			sendHit(algo, guber.Status_UNDER_LIMIT, 1)

And you can remove the other test (assuming that everything else is the same)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep the tests split personally for clarity but its a good point that we could merge if we expect the results to always be the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always avoid DRY in tests, functional tests should be clear and verbose, and should generally not change once written.

functional_test.go Outdated Show resolved Hide resolved
functional_test.go Outdated Show resolved Hide resolved
functional_test.go Show resolved Hide resolved
//
// We cannot preform this for LEAKY_BUCKET as we don't know how much time or what other requests
// might have influenced the leak rate at the owning peer.
// (Maybe we should preform the leak calculation here?????)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a newbie to this codebase, this comment suggests that we are breaking the single responsibility principle and we need to move this elsewhere. My thought was that we need a new API called PeekRateLimit that doesn't affect the hit count and only calculates the right status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree with this, as I had the exact same thought. However, I'm planning on removing GRPC so I decided to wait for a future PR.

I could be convinced otherwise!

Copy link
Contributor

@philipgough philipgough Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid major refactors that are outside the context of this PR, for which I consider to be a bugfix. A follow up might make sense though.

@philipgough
Copy link
Contributor

I guess we could have the peer calculate the leak... but that sounds icky... Maybe someone can convince me otherwise?

I agree that sounds like a bit too much logic leaking into the peer.

@thrawn01 thanks for taking this up. Changes make sense and lgtm from as best as I can understand the flow.

Copy link
Contributor

@philipgough philipgough left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - thanks again :)

Copy link
Contributor

@miparnisari miparnisari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i'm taking back my approval because we tested this and, at big scale, it doesn't work. We will send a PR soon with our findings. See #218

@thrawn01 thrawn01 mentioned this pull request Feb 10, 2024
@thrawn01
Copy link
Contributor Author

This PR is replaced by #219

@thrawn01 thrawn01 closed this Feb 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants