
feat: rate limiting #600

Merged 8 commits into master on Apr 1, 2024

Conversation

@geekbrother (Contributor) commented Mar 29, 2024

Description

This PR applies the shared rate-limiting crate from utils-rs to the blockchain-api endpoints.
The combination of the IP address and the endpoint is used as the identification key for the token bucket.

The following configuration environment variables were added:

  • RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_READ and RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_WRITE: Redis endpoints for the L2 cache.
  • RPC_PROXY_RATE_LIMITING_MAX_TOKENS: token bucket size.
  • RPC_PROXY_RATE_LIMITING_REFILL_INTERVAL_SEC: token refill interval in seconds.
  • RPC_PROXY_RATE_LIMITING_REFILL_RATE: token refill rate per interval.

To enable rate limiting, all of the required variables must be set; otherwise, rate limiting is disabled and a warning message is logged at startup.
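The all-or-nothing enabling rule can be sketched as follows. This is a hypothetical, std-only sketch (the struct and function names are my own, not the PR's); `get` abstracts over `std::env::var` so the rule is easy to test:

```rust
// Hedged sketch: rate limiting is enabled only when every variable is
// present and parses; otherwise None is returned so the caller can log a
// warning and skip installing the middleware.
#[derive(Debug, PartialEq)]
pub struct RateLimitingConfig {
    pub redis_addr_read: String,
    pub redis_addr_write: String,
    pub max_tokens: u32,
    pub refill_interval_sec: u64,
    pub refill_rate: u32,
}

pub fn rate_limiting_config(get: impl Fn(&str) -> Option<String>) -> Option<RateLimitingConfig> {
    // The `?` operator makes any missing or unparsable variable disable the
    // whole feature at once.
    Some(RateLimitingConfig {
        redis_addr_read: get("RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_READ")?,
        redis_addr_write: get("RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_WRITE")?,
        max_tokens: get("RPC_PROXY_RATE_LIMITING_MAX_TOKENS")?.parse().ok()?,
        refill_interval_sec: get("RPC_PROXY_RATE_LIMITING_REFILL_INTERVAL_SEC")?.parse().ok()?,
        refill_rate: get("RPC_PROXY_RATE_LIMITING_REFILL_RATE")?.parse().ok()?,
    })
}
```

A caller would invoke this as `rate_limiting_config(|k| std::env::var(k).ok())`, logging the startup warning and running without rate limiting when it returns `None`.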

Rate-limited counter metrics were added in e8a5cc4 to Prometheus and a Grafana panel to track how many rate-limited entries the L1 in-memory cache holds, so we can see how many users we are rate limiting at any moment and over time.

Axum middleware was implemented and added to handlers in 3d0e989.

The default rate-limiting configuration was added to the Terraform variables in fd1e605:

  • Maximum tokens in bucket: 100
  • Refill rate: 2
  • Refill interval: 1 second
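With these defaults a client can burst 100 requests and then sustain 2 requests per second. A minimal token-bucket sketch of the defaults above (hypothetical and std-only; the real implementation lives in the shared utils-rs crate and uses Redis plus an in-memory cache). Time is passed in explicitly, in seconds, to keep the refill logic testable:

```rust
// Hedged sketch of a token bucket with the PR's default parameters.
pub struct TokenBucket {
    max_tokens: u32,           // bucket size (burst capacity)
    refill_rate: u32,          // tokens added per interval
    refill_interval_secs: u64, // refill interval in seconds
    tokens: u32,
    last_refill_secs: u64,
}

impl TokenBucket {
    pub fn new(max_tokens: u32, refill_rate: u32, refill_interval_secs: u64) -> Self {
        assert!(refill_interval_secs > 0, "interval must be positive");
        Self { max_tokens, refill_rate, refill_interval_secs, tokens: max_tokens, last_refill_secs: 0 }
    }

    /// Returns true if the request is allowed (a token was consumed).
    pub fn try_consume(&mut self, now_secs: u64) -> bool {
        // Credit whole elapsed intervals, capped at the bucket size.
        let intervals = now_secs.saturating_sub(self.last_refill_secs) / self.refill_interval_secs;
        if intervals > 0 {
            let refill = self.refill_rate.saturating_mul(intervals.min(u32::MAX as u64) as u32);
            self.tokens = self.tokens.saturating_add(refill).min(self.max_tokens);
            self.last_refill_secs = now_secs;
        }
        if self.tokens > 0 {
            self.tokens -= 1;
            true
        } else {
            false
        }
    }
}
```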

How Has This Been Tested?

  • New integration tests in 9cafe6c:
    • Flood an endpoint with more requests than the maximum tokens and check that it IS rate-limited,
    • Flood another endpoint with fewer requests than the maximum tokens and check that it is NOT rate-limited.

Due Diligence

  • Breaking change
  • Requires a documentation update
  • Requires an e2e/integration test update

@geekbrother geekbrother self-assigned this Mar 29, 2024
@geekbrother geekbrother changed the base branch from master to feat/update_latest_utils_rs March 29, 2024 00:01
src/lib.rs Outdated
Comment on lines 84 to 86
const RATE_LIMIT_MAX_TOKENS: u32 = 300;
const RATE_LIMIT_INTERVAL: Duration = Duration::from_secs(1);
const RATE_LIMIT_REFILL_RATE: u32 = 100;
Member

Looks like this is 100 req/s per IP? Seems really high. What about 2 req/s with burst of 100?

Plus I think there should be a second rate limit for the project, e.g. 1k/s.

Suggested change:
-const RATE_LIMIT_MAX_TOKENS: u32 = 300;
-const RATE_LIMIT_REFILL_RATE: u32 = 100;
+const RATE_LIMIT_MAX_TOKENS: u32 = 100;
+const RATE_LIMIT_REFILL_RATE: u32 = 2;
 const RATE_LIMIT_INTERVAL: Duration = Duration::from_secs(1);

Contributor Author

Makes sense, it's probably too high.
I've moved the configuration to env variables in 09d6730, so we don't need to rebuild everything just to change it.
Let's start with a burst of 100 and 2/sec, and then I'll check the metric from e8a5cc4 to see how many users are affected.

Contributor Author

Plus I think there should be a second rate limit for the project, e.g. 1k/s.

Regarding project ID limiting, I think that should be a follow-up PR. Let's ship this incrementally.

.is_rate_limited(
path.as_str(),
&network::get_forwarded_ip(headers.clone())
.unwrap_or_else(|| connect_info.0.ip())
@chris13524 (Member) commented Mar 29, 2024

I think defaulting to the connecting IP is unsafe. If something breaks with the forwarding headers, we should not fall back to the connecting IP, because that's supposed to be the load balancer (and rate limiting the load balancer would cause major unavailability). We should at least log this as an error, not silently default. Possibly we should not use a default at all and instead disable the rate limit, or disable it in some other way.

In Notify Server, for analytics, if the forwarded header wasn't available I didn't look up the connecting IP; instead I recorded empty geolocation values and emitted an error!().

Contributor Author

Thanks for the comment, this is a good suggestion! If we skip the connecting IP whenever X-Forwarded is absent, we break local testing (since we don't have X-Forwarded headers in local tests).
My assumption is that if the load balancer doesn't report a forwarded IP, the LB is broken, and that will break a lot more than rate limiting. I totally agree that we should log an error when the forwarded IP is missing, so we can quickly spot the case where we have an LB but no forwarded IP.
I've added an error log for the case where we don't have a forwarded IP.
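The behavior under discussion can be sketched as follows (function name hypothetical; the real middleware works with Axum's header map and ConnectInfo): prefer the first address in X-Forwarded-For, and on a missing or unparsable header log an error and fall back to the connecting IP, which behind a load balancer is usually the LB itself.

```rust
use std::net::IpAddr;

// Hedged sketch of forwarded-IP extraction with a logged (not silent) fallback.
pub fn client_ip(forwarded_for: Option<&str>, connect_ip: IpAddr) -> IpAddr {
    forwarded_for
        // X-Forwarded-For is a comma-separated list; the first entry is the client.
        .and_then(|v| v.split(',').next())
        .and_then(|s| s.trim().parse::<IpAddr>().ok())
        .unwrap_or_else(|| {
            // In the real middleware this would be an error!() log.
            eprintln!("error: no forwarded IP; falling back to connecting IP {connect_ip}");
            connect_ip
        })
}
```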

Member

What's the issue with local testing? Can't you set an X-Forwarded header?

state.clone(),
headers.clone(),
connect_info,
path.clone(),
Member

I think using the HTTP request path as the key is a bit naive, and I would prefer an enum. For e.g. the proxy, there are a number of endpoints that resolve to the same handler. I'm not sure about history; we might want one rate limit counter for the whole history API, but not sure.

Contributor Author

Thanks for catching this!
The bucket key is a composite of the IP and the matched path. The idea is that a legitimate user shouldn't exceed our max tokens anyway, so we could even use IP-only rate limiting.
If there are malicious floods and we key on the exact handler path instead of the matched path, we give attackers an additional way to flood us: max tokens * each path * path parameters. The matched path was added to the key composition mainly for testing facilities, not to grant users max tokens on every path and parameter combination.
So from my point of view we should be OK with this, and no legitimate users should be affected.

Member

I think I assumed this was the matched path (which it was). Still, my suggestion was to not use the path as the rate-limit key and to prefer something more explicit. E.g. both the /v1 and /v1/ matched paths resolve to the proxy handler, and in the future we might want to group them. But it's no big deal right now; we can always refactor later if needed.

We should be aware, though, that clients can get 2x the rate limit we might expect for the proxy handler.
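The composite key and the /v1 vs /v1/ grouping concern above can be sketched like this (function name hypothetical; shown with trailing-slash normalization so both matched paths would share a single bucket rather than doubling the limit):

```rust
// Hedged sketch: composite rate-limit bucket key from IP and matched path.
pub fn bucket_key(ip: &str, matched_path: &str) -> String {
    // Normalize trailing slashes so "/v1" and "/v1/" map to the same bucket.
    let path = matched_path.trim_end_matches('/');
    let path = if path.is_empty() { "/" } else { path };
    format!("{ip}:{path}")
}
```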

let app = app.with_state(state_arc.clone());

info!("v{}", build_version);
info!("Running RPC Proxy on port {}", port);
info!("Running Blockchain-API server on port {}", port);
let addr: SocketAddr = format!("{host}:{port}")
.parse()
.expect("Invalid socket address");


Can this be handled as an error instead of expecting?

Contributor Author

Thanks for the comment! This is the initial bootstrap; if the server can't bind to the listening address, we must stop execution with a fatal error. So the expect is appropriate here.

let redis_pool = Arc::new(
deadpool_redis::Config::from_url(redis_addr)
.create_pool(Some(deadpool_redis::Runtime::Tokio1))
.expect("Failed to create redis pool for rate limiting"),


can this error be handled?

Contributor Author

Since this is bootstrapping and skipping shouldn't hurt, it's better to disable rate limiting and still start the server if the Redis address in the configuration is wrong. Updated in 7ffe453, thanks!
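The fail-open behavior described here can be sketched generically (function name hypothetical): if building the rate-limiting Redis pool fails at boot, log a warning and run without rate limiting instead of panicking. In the real code the `Result` would come from `deadpool_redis::Config::from_url(..).create_pool(..)`.

```rust
// Hedged sketch: convert a boot-time pool-creation Result into an Option,
// logging and disabling the feature on failure rather than panicking.
pub fn pool_or_disable<P, E: std::fmt::Display>(result: Result<P, E>) -> Option<P> {
    match result {
        Ok(pool) => Some(pool),
        Err(e) => {
            eprintln!("warning: failed to create rate-limiting Redis pool ({e}); rate limiting disabled");
            None
        }
    }
}
```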

.time_to_live(
interval
.to_std()
.expect("Failed to convert duration for rate limiting memory cache"),


can this error be handled?

Contributor Author

This is a conversion from chrono's Duration to std's, because Moka uses a different Duration type. If this fails, something is really wrong with the app, and we probably should expect here.

@nopestack left a comment

I noticed some expects/unwraps in what appear to be boot-time operations, which is not egregious, but I'd prefer those errors to be handled where it makes sense. All OK apart from that.

@geekbrother geekbrother merged commit e0a9fe2 into master Apr 1, 2024
16 checks passed
@geekbrother geekbrother deleted the feat/rate_limiting branch April 1, 2024 13:48