feat(plugin):`router_limits` plugin added #6598

aaronArinder · 2025-01-21T00:51:26Z

Adds a router limits plugin that restricts the router according to what's in the user's license, starting with TPS limits.

NB: this includes the work from #6561

There's a long way to go, but some interesting aspects of the required changes are implemented here. I noted that a significant number of errors, glitches and test failures are resolved by making these changes, mainly in places where we were calling a service without readying it first, or where we were cloning an inner service without doing the memory "replace" dance. Because we are using a "trick" to make the router service cloneable right now, there are a few tests which don't work properly. I think for the "full" work, we'll need to make the router service properly cloneable (without requiring a mutex). This will require some fairly substantial re-working of a wide variety of services and layers. On the plus side, once we've done that work we'll be able to retire a bunch of code that we no longer require. I'll pick this up in the New Year...
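The two failure modes mentioned above (calling a service that was never readied, and cloning an inner service without the "replace" dance) can be sketched with plain standard-library Rust. This is only an illustration under stated assumptions: `Inner` and `Wrapper` are hypothetical stand-ins for a tower-style service and a layer wrapping it, not types from the router, and readiness is modelled as a simple flag that a clone does not inherit.

```rust
use std::mem;

/// Hypothetical stand-in for a tower-style service: `ready` must be
/// called on an instance before `call` is invoked on that same instance.
struct Inner {
    is_ready: bool,
}

impl Clone for Inner {
    /// Assumption for this sketch: a clone never inherits readiness.
    fn clone(&self) -> Self {
        Inner { is_ready: false }
    }
}

impl Inner {
    fn ready(&mut self) {
        // In a real service a permit or slot might be reserved here.
        self.is_ready = true;
    }

    fn call(&mut self, req: &str) -> Result<String, String> {
        if !self.is_ready {
            return Err(format!("called before ready: {req}"));
        }
        self.is_ready = false; // readiness is consumed by the call
        Ok(format!("handled {req}"))
    }
}

/// Hypothetical wrapper (layer) around `Inner`.
struct Wrapper {
    inner: Inner,
}

impl Wrapper {
    fn ready(&mut self) {
        self.inner.ready();
    }

    /// Buggy: the request goes to a fresh clone that was never readied.
    fn call_with_clone(&mut self, req: &str) -> Result<String, String> {
        let mut unreadied = self.inner.clone();
        unreadied.call(req)
    }

    /// The "replace" dance: leave a fresh clone behind in `self` and
    /// take out the instance that was actually readied.
    fn call_with_replace(&mut self, req: &str) -> Result<String, String> {
        let clone = self.inner.clone();
        let mut readied = mem::replace(&mut self.inner, clone);
        readied.call(req)
    }
}
```

In this sketch, after `Wrapper::ready`, `call_with_clone` fails because the clone was never readied, while `call_with_replace` succeeds. A second `call_with_replace` without another `ready` fails, because the instance left behind is the unreadied clone.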

That test is just old on dev/next, so fixing it makes sense. Also: add a note to coprocessor test changes to remind me that I need to understand what is happening there before this branch can merge.

re-order use statements to keep cargo fmt happy

To see how far off passing we are in CI

IMPORTANT: This change modifies the supergraph method invocation test to be a router_service service invocation test. Amongst other important details (such as we are now really testing the service through the full pipeline), it's important to note that we can't use `oneshot()` and just re-create the service every time we want to call it. If we do, then the rate limiting details are lost. So, we must re-use our service to make sure that state isn't lost.

Also:
- update snapshot so it knows about concurrency
- replace a bad supergraph test with a broken router test
- re-order traffic shaping so that timeouts > concurrency > rate-limit
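The point about not re-creating the service for every call can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: `RateLimited` is a hypothetical fixed-window limiter, not the router's traffic shaping code, and the two helper functions only contrast "fresh service per call" with "one service re-used across calls".

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window rate limiter; the counters live inside
/// the service instance, so they disappear when the instance does.
struct RateLimited {
    capacity: usize,
    interval: Duration,
    window_start: Instant,
    seen: usize,
}

impl RateLimited {
    fn new(capacity: usize, interval: Duration) -> Self {
        Self { capacity, interval, window_start: Instant::now(), seen: 0 }
    }

    fn call(&mut self, now: Instant) -> Result<(), &'static str> {
        // Start a new window once the interval has elapsed.
        if now.duration_since(self.window_start) >= self.interval {
            self.window_start = now;
            self.seen = 0;
        }
        if self.seen >= self.capacity {
            return Err("rate limited");
        }
        self.seen += 1;
        Ok(())
    }
}

/// A fresh service per call: the limiter never sees more than one request.
fn rejected_when_recreated(calls: usize) -> usize {
    (0..calls)
        .filter(|_| {
            let mut svc = RateLimited::new(1, Duration::from_secs(60));
            svc.call(Instant::now()).is_err()
        })
        .count()
}

/// One service re-used for every call: the limiter state accumulates.
fn rejected_when_reused(calls: usize) -> usize {
    let mut svc = RateLimited::new(1, Duration::from_secs(60));
    (0..calls)
        .filter(|_| svc.call(Instant::now()).is_err())
        .count()
}
```

With a capacity of one request per 60 seconds, three back-to-back calls are never rejected when the service is re-created each time, but two of the three are rejected when the same service is re-used, which is the behaviour a rate-limiting test needs to observe.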

The code added yesterday was creating errors as data and using numeric codes (rather than the magic strings in 1.x). Re-instated the 1.x behaviour for reporting errors and also fixed the new timeout test.

Since we now fail traffic shaping tests at the router_service, they are not counted as graphql_errors (which are only processed, as they should be, at the supergraph_service). IMO, these should never have been counted as graphql errors anyway, since they clearly aren't graphql errors but are traffic shaping (rate limit, timeout, etc...) errors. We'll still report them to the user as a 504 or a 503 or whatever, but they won't count towards the graphql_error metric. I've also updated a snapshot to reflect the error message we now provide.

Not required for GA. Implement later as a separate project.

This breaks all the http_post_mutation tests because of changes in expectation.

The AsyncCheckpoint layer was using `oneshot` and wasn't calling the prepared service. I've fixed that. This affects some of the tests, so I've fixed them as well.

Make it pass until subgraph rate limiting is changed. We'll need to update the test again at that point.

I've been adding `buffer(50_000)` across the code base. Now replacing with `buffered` (one of our existing layers from our ServiceBuilderExt) along with a helpful comment that it still needs some work.

It's just about possible to extend backpressure from the router service to the qp service with the caveat that batches may subvert backpressure. Beyond the qp, it's difficult to control because of the way the execution engine operates.

aaronArinder · 2025-01-28T14:06:43Z

apollo-router/src/router_factory.rs

@@ -795,10 +823,31 @@ mod test {

    register_plugin!("test", "always_fails_to_start", AlwaysFailsToStartPlugin);

+    async fn create_service(


moving this above the tests for (at least to me) cleaner organization; down to move it back if that annoys folks

BrynCooke · 2025-01-28T14:37:00Z

apollo-router/src/router_factory.rs

@@ -795,10 +821,31 @@ mod test {

    register_plugin!("test", "always_fails_to_start", AlwaysFailsToStartPlugin);

+    async fn create_service(
+        config: Configuration,
+        license: Option<LicenseState>,


No need to add license to this as none of the tests use it?

right! fixed

apollo-router/src/plugins/router_limits/mod.rs

BrynCooke · 2025-01-28T17:53:41Z

apollo-router/src/uplink/license_enforcement.rs

+pub(crate) struct TpsLimit {
+    pub(crate) capacity: usize,
+
+    #[serde(deserialize_with = "deserialize_instant", rename = "durationMs")]


Use `#[serde(deserialize_with = "humantime_serde::deserialize", default)]`

Also it's a duration not an instant :)
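The type distinction raised in this review comment can be sketched with the standard library alone. This is a hypothetical mirror of the struct, not the merged code: the field name `interval` and the `from_millis` constructor are assumptions, and serde is deliberately left out so the sketch stays self-contained. A limit's window length is a `Duration`; an `Instant` only appears once the limit is applied to a concrete window start.

```rust
use std::time::{Duration, Instant};

/// Hypothetical mirror of the license limit. The field name `interval`
/// is an assumption; only `capacity` appears in the reviewed diff.
struct TpsLimit {
    capacity: usize,
    interval: Duration,
}

impl TpsLimit {
    /// Builds the limit from a millisecond count, such as a value
    /// carried under a `durationMs` key.
    fn from_millis(capacity: usize, duration_ms: u64) -> Self {
        Self { capacity, interval: Duration::from_millis(duration_ms) }
    }

    /// An `Instant` is a point in time; it is derived from a window
    /// start plus the configured `Duration`, never deserialized itself.
    fn window_end(&self, window_start: Instant) -> Instant {
        window_start + self.interval
    }
}
```

For example, `TpsLimit::from_millis(10, 1500)` holds a window length of 1.5 seconds, and `window_end(start)` lies exactly that far after `start`.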

BrynCooke · 2025-01-29T10:19:01Z

Approved, make sure you have adequately manually tested!

garypen and others added 30 commits December 18, 2024 17:03

Obey the linter...

6d80932

Merge branch 'next' into garypen/next-backpressure

7491e1e

Merge branch 'next' into garypen/next-backpressure

718a870

Merge branch 'next' into garypen/next-backpressure

4c93a96

Fix the rhai integration test

716f7e5

fix lint complaint

8b7653b

Add the new rhai testing config file

746d2c0

temporarily comment out one test

330f969

still experimenting to see how far away this approach is

1b724d5

Move limits to traffic shaping

f932806

Rename http_server to router

3b8ed61

Fix formatting errors reported by lint

81faed0

Rename some stuff to minimise change from 1.x

3155440

Try to restore the existing behaviour for reporting errors

9bd0386

Fix lint complaints

7432306

Remove 1/2 implemented little loadshedder

d05619f

Merge branch 'next' into garypen/next-backpressure

69e36f1

POC: Make supergraph creator clone a BoxCloneService

b85ca4f

Fix AsyncCheckpoint and update tests for correct behaviour

d5718f2

Fix the xtask lint complaints

5c72744

POC: Make supergraph creator clone a BoxCloneService (#6540)

a8a8950

Modify subgraph rate-limiting test to pass for now

21dee20

Merge branch 'next' into garypen/next-backpressure

00c1689

Small tidying up to use buffered

6c239aa

Document tower layers in router and supergraph services (#6549)

f1cd40a

List the plugin tower layers

6788915

Merge branch 'dev' into aaron/BGS-188/router_limits

e887b45

aaronArinder added 12 commits January 27, 2025 11:54

gourd

241ccac

gourd

9462d91

gourd

10c3e21

gourd

586640f

gourd

ee93f15

gourd

f38ad7f

gourd

6616208

gourd

63f2251

Merge remote-tracking branch 'origin/dev' into aaron/BGS-188/router_l…

e0e06e0

…imits

gourd

a4435c3

gourd

e43d948

gourd

a075723

aaronArinder commented Jan 28, 2025

gourd

a80761a

BrynCooke requested changes Jan 28, 2025

aaronArinder added 3 commits January 28, 2025 11:54

gourd

7606d0b

gourd

1afe8b0

Merge branch 'dev' into aaron/BGS-188/router_limits

9721ae8

BrynCooke requested changes Jan 28, 2025

aaronArinder added 2 commits January 28, 2025 14:31

gourd

8bd67f0

Merge branch 'dev' into aaron/BGS-188/router_limits

38ea8a3

BrynCooke approved these changes Jan 29, 2025

aaronArinder mentioned this pull request Jan 29, 2025

feat(plugins): RouterLimits into plugin ecosystem #6561

Closed

6 tasks

Merge branch 'dev' into aaron/BGS-188/router_limits

dcfb538

aaronArinder enabled auto-merge (squash) January 29, 2025 13:20

aaronArinder merged commit 419f2bd into dev Jan 29, 2025
15 checks passed

aaronArinder deleted the aaron/BGS-188/router_limits branch January 29, 2025 13:29
