feat: rate limiting #600
Conversation
src/lib.rs (outdated)
```rust
const RATE_LIMIT_MAX_TOKENS: u32 = 300;
const RATE_LIMIT_INTERVAL: Duration = Duration::from_secs(1);
const RATE_LIMIT_REFILL_RATE: u32 = 100;
```
Looks like this is 100 req/s per IP? Seems really high. What about 2 req/s with a burst of 100?
Plus I think there should be a second rate limit for the project, e.g. 1k/s.
Suggested change:
```rust
const RATE_LIMIT_MAX_TOKENS: u32 = 100;
const RATE_LIMIT_INTERVAL: Duration = Duration::from_secs(1);
const RATE_LIMIT_REFILL_RATE: u32 = 2;
```
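For context, here is a minimal, self-contained token-bucket sketch (a hypothetical illustration, not the utils-rs crate this PR uses) showing how the two suggested values interact: the max tokens bound the burst size, while the refill rate bounds the sustained request rate.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory token bucket, only to illustrate how
/// MAX_TOKENS (burst) and REFILL_RATE (sustained rate) interact.
struct TokenBucket {
    max_tokens: u32,   // burst capacity, e.g. 100
    refill_rate: u32,  // tokens added per interval, e.g. 2
    interval: Duration,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(max_tokens: u32, refill_rate: u32, interval: Duration) -> Self {
        Self {
            max_tokens,
            refill_rate,
            interval,
            tokens: max_tokens as f64,
            last_refill: Instant::now(),
        }
    }

    /// Returns true if the request is allowed (a token was available).
    fn try_acquire(&mut self) -> bool {
        let elapsed = self.last_refill.elapsed().as_secs_f64();
        let refill = elapsed / self.interval.as_secs_f64() * self.refill_rate as f64;
        self.tokens = (self.tokens + refill).min(self.max_tokens as f64);
        self.last_refill = Instant::now();
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 2 req/s sustained with a burst of 100, as suggested above.
    let mut bucket = TokenBucket::new(100, 2, Duration::from_secs(1));
    let allowed = (0..150).filter(|_| bucket.try_acquire()).count();
    println!("allowed {allowed} of 150 immediate requests"); // ~100
}
```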
> Plus I think there should be a second rate limit for the project, e.g. 1k/s.

Speaking about project-ID limiting, I think that should be a follow-up PR. Let's ship this incrementally.
Force-pushed d8a4163 to ccd0532
src/handlers/mod.rs (outdated)
```rust
.is_rate_limited(
    path.as_str(),
    &network::get_forwarded_ip(headers.clone())
        .unwrap_or_else(|| connect_info.0.ip())
```
I think this defaulting to the connecting IP is unsafe. If something breaks with the forwarding headers, we should not be using the connecting IP, since that's supposed to be the load balancer (and it would result in major unavailability due to rate limiting the load balancer). We should at least be logging this as an error, not silently defaulting. And possibly we should not use a default at all and instead disable the rate limit, or some other kind of disable.
In Notify Server, when the forwarded header wasn't available for the analytics, I didn't look up the connecting IP; I recorded empty geolocation values and emitted an error!() instead.
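A minimal sketch of that alternative, assuming the standard X-Forwarded-For header (the helper name and exact behavior are illustrative, not the PR's code): log an error and skip rate limiting when no forwarded IP is present, so the load balancer's address never lands in a bucket.

```rust
use std::net::{IpAddr, SocketAddr};

use axum::http::HeaderMap;
use tracing::error;

/// Hypothetical sketch of the suggested behavior: only rate limit when a
/// forwarded client IP is present; otherwise log an error and skip, so the
/// load balancer's own IP never ends up in a rate-limit bucket.
fn rate_limit_ip(headers: &HeaderMap, connect_info: SocketAddr) -> Option<IpAddr> {
    let forwarded = headers
        .get("X-Forwarded-For")
        .and_then(|v| v.to_str().ok())
        // The first entry in X-Forwarded-For is the original client.
        .and_then(|v| v.split(',').next())
        .and_then(|v| v.trim().parse::<IpAddr>().ok());

    match forwarded {
        Some(ip) => Some(ip),
        None => {
            error!(
                "X-Forwarded-For missing or unparsable; connecting IP {} is \
                 likely the load balancer, skipping rate limiting",
                connect_info.ip()
            );
            None
        }
    }
}
```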
Thanks for the comment, this is a good suggestion! If we skip the connecting IP when the X-Forwarded header is absent, we will break local testing (since we don't have the X-Forwarded header in local tests).
My assumption is that if the load balancer doesn't report a forwarded IP, the LB is broken, and that will break a lot more than just rate-limiting. I totally agree that we should log an error when we don't have a forwarded IP, so we can quickly and easily spot the issue if we have an LB but no forwarded IP.
I've added an error log for the case where we don't have a forwarded IP.
What's the issue with local testing? Can't you set an X-Forwarded header?
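For what it's worth, setting the header in a local integration test is straightforward; a sketch assuming a reqwest-based test against a locally running server (URL, port, and query parameter are placeholders):

```rust
// Hypothetical integration test: supply the forwarding header explicitly
// so local tests exercise the same code path as requests behind the LB.
#[tokio::test]
async fn rate_limited_request_with_forwarded_header() {
    let client = reqwest::Client::new();
    let resp = client
        .get("http://localhost:3000/v1?projectId=test")
        .header("X-Forwarded-For", "203.0.113.7")
        .send()
        .await
        .expect("request failed");
    // 200 while tokens remain, 429 once the per-IP bucket is exhausted.
    assert!(resp.status() == 200 || resp.status() == 429);
}
```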
src/handlers/history.rs (outdated)
```rust
state.clone(),
headers.clone(),
connect_info,
path.clone(),
```
I think using the HTTP request path as the key is a bit naive and I would prefer an enum. For the proxy, for example, there are a number of endpoints that resolve to the same handler. For history, we might want one rate-limit counter for the whole history API, but I'm not sure.
Thanks for catching this!
The bucket key is a composite key of the IP and the matched path. The idea is that a legitimate user shouldn't exceed our max tokens anyway, so we could even keep only IP-based rate-limiting.
If there are malicious floods, using the exact handler path instead of the matched path would give an additional way to flood us. The matched path was added to the key composition for things like testing facilities, not to give users max tokens on every combination of path and path parameters.
So from my point of view we should be OK with that, and no legitimate users should be affected.
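For reference, the composite key described here boils down to something like the following (hypothetical key format, not necessarily the one the utils-rs crate builds):

```rust
use std::net::IpAddr;

// Hypothetical illustration: one token bucket per (client IP, matched route).
fn bucket_key(ip: &IpAddr, matched_path: &str) -> String {
    format!("rate_limit:{ip}:{matched_path}")
}
```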
I think I assumed this was the matched path (which it was). Still, my suggestion was to not use the path as the rate-limit key and to prefer something more explicit. E.g. both the /v1 and /v1/ matched paths resolve to the proxy handler, and in the future we might want to group them. No big deal right now, we can always refactor later if needed.
But we should be aware that clients can get 2x the rate limit we might expect for the proxy handler.
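A sketch of the more explicit keying suggested above, assuming a hypothetical endpoint enum (variant names are illustrative), so that every route served by the same handler shares one bucket group:

```rust
use std::net::IpAddr;

/// Hypothetical explicit rate-limit key: routes served by the same handler
/// (e.g. `/v1` and `/v1/`) map to a single bucket group.
#[derive(Clone, Copy, Debug)]
enum RateLimitEndpoint {
    Proxy,
    History,
}

impl RateLimitEndpoint {
    fn as_key(self) -> &'static str {
        match self {
            Self::Proxy => "proxy",
            Self::History => "history",
        }
    }
}

fn grouped_bucket_key(ip: &IpAddr, endpoint: RateLimitEndpoint) -> String {
    format!("rate_limit:{}:{ip}", endpoint.as_key())
}
```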
Force-pushed 09d6730 to 9cafe6c
Force-pushed 9cafe6c to 4d2a9da
```diff
 let app = app.with_state(state_arc.clone());

 info!("v{}", build_version);
-info!("Running RPC Proxy on port {}", port);
+info!("Running Blockchain-API server on port {}", port);
 let addr: SocketAddr = format!("{host}:{port}")
     .parse()
     .expect("Invalid socket address");
```
can this be handled as an error instead of expecting?
Thanks for the comment! This is the initial bootstrap, and if the server can't bind to the listening address we must stop execution with a fatal error, so the expect is intentional here.
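For comparison only, propagating the failure instead of panicking could look like this (hypothetical function, anyhow-based error handling; the point that a panic is acceptable at bootstrap still stands):

```rust
use std::net::SocketAddr;

use anyhow::Context;

// Hypothetical alternative: return the parse error to the caller instead of
// panicking with `expect` during bootstrap.
fn bind_addr(host: &str, port: u16) -> anyhow::Result<SocketAddr> {
    format!("{host}:{port}")
        .parse()
        .with_context(|| format!("Invalid socket address {host}:{port}"))
}
```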
src/utils/rate_limit.rs (outdated)
```rust
let redis_pool = Arc::new(
    deadpool_redis::Config::from_url(redis_addr)
        .create_pool(Some(deadpool_redis::Runtime::Tokio1))
        .expect("Failed to create redis pool for rate limiting"),
```
can this error be handled?
Since this happens at bootstrap and shouldn't hurt, it's better to skip rate-limiting and start the server if the Redis address in the configuration is wrong. Updated in 7ffe453, thanks!
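A minimal sketch of that graceful degradation, assuming the same deadpool-redis calls as the original snippet (the function name and Option-based wiring are illustrative):

```rust
use std::sync::Arc;

use tracing::warn;

// Hypothetical bootstrap sketch: fall back to `None` (rate limiting disabled)
// when the Redis pool for the L2 cache cannot be created.
fn build_rate_limiting_pool(redis_addr: &str) -> Option<Arc<deadpool_redis::Pool>> {
    match deadpool_redis::Config::from_url(redis_addr)
        .create_pool(Some(deadpool_redis::Runtime::Tokio1))
    {
        Ok(pool) => Some(Arc::new(pool)),
        Err(e) => {
            warn!("Failed to create Redis pool for rate limiting, disabling it: {e}");
            None
        }
    }
}
```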
```rust
.time_to_live(
    interval
        .to_std()
        .expect("Failed to convert duration for rate limiting memory cache"),
```
can this error be handled?
This is a conversion from chrono's Duration to std's Duration, because Moka uses a different Duration type. If this fails, something is really wrong with the app, and we should probably expect here.
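For context, chrono's to_std() only fails for negative durations; a sketch of handling it without panicking anyway (the one-second fallback is an arbitrary illustrative default):

```rust
// Hypothetical sketch: chrono::Duration::to_std() only fails when the
// duration is negative, so falling back to a sane default (or expecting)
// at boot time are both defensible.
fn cache_ttl(interval: chrono::Duration) -> std::time::Duration {
    interval
        .to_std()
        .unwrap_or_else(|_| std::time::Duration::from_secs(1))
}
```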
Noticed some expects/unwraps at what appear to be boot-time ops, which is not egregious, but I'd prefer the errors to be handled where it makes sense. All OK apart from that.
Force-pushed 125e71f to fd1e605
Force-pushed 7ffe453 to cc44946
Description
This PR applies the shared rate-limiting crate from utils-rs to the blockchain-api endpoints.
The combination of the IP address and the endpoint is used as the identification key for the token bucket. The following configuration environment variables were added:
- `RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_READ` and `RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_WRITE`: Redis endpoints for the L2 cache.
- `RPC_PROXY_RATE_LIMITING_MAX_TOKENS`: token bucket size.
- `RPC_PROXY_RATE_LIMITING_REFILL_INTERVAL_SEC`: token refill interval in seconds.
- `RPC_PROXY_RATE_LIMITING_REFILL_RATE`: token refill rate per interval.

To enable rate-limiting, all of the required variables must be set; otherwise, rate-limiting is disabled with a warning message at start (see the sketch below).
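A sketch of that all-or-nothing enablement rule, assuming a hypothetical helper that reads the variables straight from the environment (the real service reads them through its config layer):

```rust
use std::env;

// Hypothetical illustration of the enablement rule: rate limiting is only
// configured when every required variable is present and parses.
struct RateLimitingConfig {
    max_tokens: u32,
    refill_interval_sec: u64,
    refill_rate: u32,
    redis_addr_read: String,
    redis_addr_write: String,
}

fn rate_limiting_from_env() -> Option<RateLimitingConfig> {
    Some(RateLimitingConfig {
        max_tokens: env::var("RPC_PROXY_RATE_LIMITING_MAX_TOKENS").ok()?.parse().ok()?,
        refill_interval_sec: env::var("RPC_PROXY_RATE_LIMITING_REFILL_INTERVAL_SEC")
            .ok()?
            .parse()
            .ok()?,
        refill_rate: env::var("RPC_PROXY_RATE_LIMITING_REFILL_RATE").ok()?.parse().ok()?,
        redis_addr_read: env::var("RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_READ").ok()?,
        redis_addr_write: env::var("RPC_PROXY_STORAGE_RATE_LIMITING_CACHE_REDIS_ADDR_WRITE").ok()?,
    })
}
```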
Rate-limited counter metrics were added in e8a5cc4 to Prometheus and the Grafana panel to track how many rate-limited entries the L1 in-memory cache holds, so we can see how many users we are rate limiting at the moment and over time.
Axum middleware was implemented and added to handlers in 3d0e989.
Default rate-limiting configuration was added to the Terraform variables in fd1e605.
How Has This Been Tested?
Due Diligence