-
Notifications
You must be signed in to change notification settings - Fork 99
test: Add test for global rate limiting with load balancing #207
Conversation
9074374
to
a48a45d
Compare
@@ -859,6 +864,62 @@ func TestGlobalRateLimits(t *testing.T) { | |||
}) | |||
} | |||
|
|||
func TestGlobalRateLimitsWithLoadBalancing(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a non-maintainer of this repo, it looks like a code smell to me that the easiest way to assert this behavior is through a functional test and not a unit test. 🤔
What are we testing here? GPRC's load balancing or the ability for gubernator to distribute the hits to the rest of the cluster? In reference to unit testing vs functional. If we can have a unit test which tests the ahem functionality without violating public interfaces, then we can do so. Any test which violates the public interface is suspect as it limits your ability to refactor in the future. Functional Tests > Unit Tests. You can read my thoughts on this here https://wippler.dev/posts/Testing-private-methods |
ah, I didn't realize this PR was related to #208 👍 |
client := guber.NewV1Client(conn) | ||
|
||
sendHit := func(status guber.Status, assertion func(resp *guber.RateLimitResp), i int) string { | ||
ctx, cancel := context.WithTimeout(context.Background(), clock.Hour*5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this timeout is waaaay too high. Github actions will not run for more than 10 minutes by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to be clear, this test wasnt ready to be merged as is, hence why it was in draft :) - it was simply to reproduce the culprit behaviour that was reported in the bug.
return gotResp.GetMetadata()["owner"] | ||
} | ||
|
||
// Send two hits that should be processed by the owner and the peer and deplete the limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be processed by the owner
are you asserting this anywhere? I don't see it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no in this case im relying on the grpc load balancer we passed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we can and should check in order to harden the test
address := fmt.Sprintf("static:///%s,%s", owner, peer) | ||
conn, err := grpc.DialContext(context.Background(), address, dialOpts...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify what this is doing? it's not clear to me. are you creating a grpc connection to the owner or the peer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is creating a static pool of servers to communicate with and creating the connection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the connection is to which server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
between the owner and the peer
// sleep to ensure the async forward has occurred and state should be shared | ||
time.Sleep(time.Second * 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (not sure) you can control this via setting globalBatchTimeout = 1
and globalSyncWait = 1ms
, so you don't need to wait so long here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, use github.com/mailgun/holster/v4/clock
to freeze or advance system time during the test.
I didn't write the test but the goal should be to assert that the gossiping of the rate limiting from owner to peers is correct in that the
I don't know what you meant by this (and i can't access the URL you provided on my work laptop) but if a bug happens in a confined place (which is the case here), you should not have to write a functional test to demonstrate it. The test should be laser-focused on what is broken. Functional tests cover a lot more code. That being said, I don't know the codebase enough to say whether a unit test makes sense here. |
@miparnisari re functional testing. I understand that most developers are taught to write unit tests and thus are surprised when they encounter a project which is mostly functional tests. This is a common issue when we on boarding new developers at Mailgun. Please have confidence when making changes to |
@philipgough Thank you very much for your contribution. I was able to reproduce the error using your test. And since the PR has a merge conflict and it's been some time since created, in a separate branch I updated from I found some improvement. Sometimes the test passes:
Other times I get 2 failed assertions:
But this is better than the original code that failed 5 assertions. We're on the right path. You can refer to my branch to cherry-pick my commits if needed: master...Baliedge/global-lb |
Closed in favor of #224. |
Draft PR to replicate the behaviour/bug seen in #208