-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gateway caching proposal #1773
Gateway caching proposal #1773
Conversation
Important Auto Review SkippedDraft detected. Please check the settings in the CodeRabbit UI or the To trigger a single review, invoke the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Nice, it'd be interesting to have a think about which of these we'd actually see any hits on. Distinguishing between encrypted (sensitive) and non-encrypted methods might be interesting too. Like caching getBlockByHash probably more useful than caching getTransactionByHash because the latter is only useful to one user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all calls are cache-able.
The TTL differs.
Also, would be good to touch for "eth_call" - for example, what would be the key for caching
design/gateway/gateway_caching.md
Outdated
|
||
## 1. Why cache requests on Gateway? | ||
|
||
Currently, all `eth_` requests that hit the gateway are forwarded to our Sequencer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sequencer is temporary. Let's say to the Ten Nodes and are executed by the Ten enclave
design/gateway/gateway_caching.md
Outdated
- `eth_sign` | ||
- `eth_signTransaction` | ||
|
||
Non-cacheable request methods are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are also cache-able, but with a TTL until the next batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. What do you think about eth_sendRawTransaction
we can probably also catch with TTL until next batch.
(Assuming the key is the whole transaction and submitting the same tx twice in short time period should give the same response).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we're expecting all user traffic to come through this gateway, we might want to consider a standalone cache for testnet/ mainnet. This would reduce the memory overhead but you'd just get a bit of additional infrastructure complexity. We can add specific rate-limiting for the non-cacheable methods to prevent a targeted DDoS via those.
If this is something we'd want to consider you would want to define a cache interface with an in-memory implementation for dev and a redis/ memcached implementation that we could use for production which would be specified in the application configuration.
It would be nice to have some indication of the value of a cache before investing too much time and resources. Intuitively, it feels to me like we're going to have a tiny cache hit ratio overall (like I'd guess < 1%). Maybe I'm wrong, or might be that even the savings from occasional cache hits for certain methods make it worthwhile. But I'd be interested to hear any hunches or see some back-of-an-envelope maths or finger-in-the-air stats to motivate a cache. Feels a bit like premature optimisation to me. |
Are there any prometheus/ grafana dashboards we can use to track the requests being made? That way we could quantify the % of hits and extrapolate out to anticipated traffic |
I think it might be viable to just give a go on a OOB caching package before building a custom one out. Just to see what sane defaults might accomplish and get quick functionality. Like this one https://github.com/dgraph-io/ristretto will even export metrics it seems. So it can just bolt on and run for a bit and see how the metrics play out. |
Here are the stats from Sepolia Ten Gateway in the past ~6 days.
There are a few methods that get called a lot, more than half of the methods were never called in the past week. |
I looked into |
Thanks @zkokelj those numbers are really useful, much easier to reason about with some data. Caching on those more public values seems worthwhile for sure, but worth noting that the methods we're likely to cache only hit the host and not the enclave, so might not help much with the main bottlenecks. |
Good stuff, and an important proposal.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Why this change is needed
To discuss if and how to best implement caching on Ten Gateway.
https://github.com/ten-protocol/ten-internal/issues/2851
What changes were made as part of this PR
Please provide a high level list of the changes made
PR checks pre-merging
Please indicate below by ticking the checkbox that you have read and performed the required
PR checks