-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(plugin):router_limits
plugin added
#6598
Conversation
There's a long way to go, but some interesting aspects of the required changes are implemented here. I noted that a significant number of errors and glitches and tests are resolved by making these changes. Mainly places where we were calling a service without readying it first or places where we were cloning an inner service without doing the memory "replace" dance. Because we are using a "trick" to make the router service cloneable right now, there are a few tests which don't work properly. I think for the "full" work, we'll need to make the router service properly cloneable (without requiring a mutex). This will require some fairly substantial re-working of a wide variety of services and layers. On the plus side, once we've done that work we'll be able to retire a bunch of code that we've written that we will no longer require. I'll pick this up in the New Year...
That test is just old on dev/next, so fixing it makes sense Also: add a note to coprocessor test changes to remind me that I need to understand what is happening there before this branch can merge.
re-order use statements to keep cargo fmt happy
To see how far off passing we are in CI
IMPORTANT: This change modifies the supergraph method invocation test to be a router_service service invocation test. Amongst other important details (such as we are now really testing the service through the full pipeline), it's important to note that we can't use `oneshot()` and just re-create the service every time we want to call it. If we do, then the rate limiting details are lost. So, we must re-use our service to make sure that state isn't lost.
Also: - update snapshot so it knows about concurrency - replace a bad supergraph test with a broken router test - re-order traffic shaping so that timeouts > concurrency > rate-limit
The code added yesterday was creating errors as data and using numeric codes (rather than the magic strings in 1.x). Re-instated the 1.x behaviour for reporting errors and also fixed the new timeout test.
Since we now fail traffic shaping tests at the router_service, they are not counted as graphql_errors (which are only processed, as they should be at the supergraph_service). IMO, these should never have been counted as graphql errors anyway, since they clearly aren't graphql errors but are traffic shaping (rate limit, timeout, etc...) errors. We'll still report them to the user as a 504 or a 503 or whatever, but they won't count towards the graphql_error metric. I've also updated a snapshot to reflect the error message we now provide.
Not required for GA. Implement later as a separate project.
This breaks all the http_post_mutation tests because of changes in expectation.
The AsyncCheckpoint layer was using `oneshot` and wasn't calling the prepared service. I've fixed that. This affects some of the tests, so I've fixed them as well.
Make it pass until subgraph rate limiting is changed. We'll need to update the test agains at that point.
I've been adding `buffer(50_000)` across the code base. Now replacing with `buffered` (one of our existing layers from our ServiceBuilderExt along with a helpful comment that it still needs some work.
It's just about possible to extend backpressure from the router service to the qp service with the caveat that batches may subvert backpressure. Beyond the qp, it's difficult to control because of the way the execution engine operates.
This comment has been minimized.
This comment has been minimized.
apollo-router/src/router_factory.rs
Outdated
@@ -795,10 +823,31 @@ mod test { | |||
|
|||
register_plugin!("test", "always_fails_to_start", AlwaysFailsToStartPlugin); | |||
|
|||
async fn create_service( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving this above the tests for (at least to me) cleaner organization; down to move it back if that annoys folks
apollo-router/src/router_factory.rs
Outdated
@@ -795,10 +821,31 @@ mod test { | |||
|
|||
register_plugin!("test", "always_fails_to_start", AlwaysFailsToStartPlugin); | |||
|
|||
async fn create_service( | |||
config: Configuration, | |||
license: Option<LicenseState>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add license to this as none of the tests use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right! fixed
pub(crate) struct TpsLimit { | ||
pub(crate) capacity: usize, | ||
|
||
#[serde(deserialize_with = "deserialize_instant", rename = "durationMs")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use #[serde(deserialize_with = "humantime_serde::deserialize", default)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it's a duration not an instant :)
Approved, make sure you have adequately manually tested! |
router limits plugin for limiting the router based on what's in the user's license, starting with tps limits
NB: this includes the work from #6561
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩