chore(rfc): operation cache warmer #1115

StarpTech · 2024-08-25T12:44:42Z

Motivation and Context

TODO

Tests or benchmark included
Documentation is changed or added on https://app.gitbook.com/
PR title must follow conventional-commit-standard

jensneuse · 2024-08-25T19:28:18Z

rfc/distributed-operation-cache.md

+The User can push individual operations to the distributed operation cache by using the CLI:
+
+```bash
+wgc router cache add -g mygraph operations.json


Should this be under the namespace "router"?
At which level do we apply cache warming?
Org? Namespace? Graph? Router?

As an alternative, you could consider creating a "cache-warming-bucket", which you can then use to add operations, and which can be used in the router config, e.g.:

    version: "1"
    cache_warmup:
      enabled: true
      interval: 5m
      bucket: myBucket

I'm not sure though if this is the right direction.
The cache warming bucket is closely tied to a Graph, because the operations it contains need to be compatible with the client schema of the graph.

jensneuse · 2024-08-25T19:33:57Z

rfc/distributed-operation-cache.md

+]
+```
+
+The cli command is idempotent and always updates the cache with the latest operations. This doesn't trigger the computation of the Top-N operations which is done periodically by the Cosmo Platform.


Another aspect to think about is pre-compiling operation plans.
We could make plans serializable, generate and serialize them at composition time.
Once the router wants to "load" them, it doesn't have to plan everything but can load the plan much faster.

vasylruban-nw

Thank you so much for drafting this proposal, it looks great, can't wait to see it live!

Left a few questions I had during reading.

vasylruban-nw · 2024-09-05T22:23:12Z

rfc/operation-cache-warmer.md

+The User can push individual operations to the operation cache by using the CLI:
+
+```bash
+wgc federated-graph operation-cache add --graph mygraph --file operations.json


Curious about manual cache invalidation.

What will happen in the following scenario:

First, we run:

wgc federated-graph operation-cache add --graph mygraph --file operationA.json

Then we run:

wgc federated-graph operation-cache add --graph mygraph --file operationB.json

What will we have in the cache after the second operation? Will it be just Operation B or both Operation A and Operation B?

I'm asking because if it is both Operation A and Operation B, the cache will eventually fill up with manually pushed operations, and there will be no space left for automatic operations unless we can invalidate entries explicitly.

That's right. Both operations will be in the "batch". This is the part where customers manage the cache themselves and are responsible for it. At some point, we could also introduce a percentage limit on how much space can be reserved for manually pushed operations. In the first version, we want to keep it simple.
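
The concern raised in this thread can be illustrated with a minimal sketch. It assumes that manually pushed operations are placed before automatically computed ones and that the cache has a fixed capacity; the function name `fill_cache` and its parameters are hypothetical and do not come from the RFC.

```python
def fill_cache(manual: list[str], automatic: list[str], capacity: int) -> list[str]:
    """Select operations for the cache: manual first, then automatic (assumed order)."""
    selected: list[str] = []
    seen: set[str] = set()
    # Manual operations are visited first, so they win when capacity runs out.
    for op in manual + automatic:
        if len(selected) >= capacity:
            break
        if op not in seen:
            seen.add(op)
            selected.append(op)
    return selected


# With room to spare, automatic operations fill the remaining slots.
print(fill_cache(["A", "B"], ["C", "D"], capacity=3))
# Once manual operations reach the capacity, automatic ones are crowded out.
print(fill_cache(["A", "B", "C"], ["D"], capacity=3))
```

A percentage limit for manual operations, as mentioned above, would amount to capping how many entries from `manual` are considered before moving on to `automatic`.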

vasylruban-nw · 2024-09-05T22:32:46Z

rfc/operation-cache-warmer.md

+The User can push individual operations to the operation cache by using the CLI:
+
+```bash
+wgc federated-graph operation-cache add --graph mygraph --file operations.json


I wonder if this command could be executed automatically (like part of CI/CD) or if we have to call it manually?

Yes, the idea is to run this locally as well as part of the CI/CD process.

Good point. I'd recommend making the operation idempotent so that we can run it over and over again on each deployment without creating duplicates.

That was the idea. Operations are identified by their hash and appended to the "batch" until the cache is full. A user can re-push the changes in another pipeline run to re-prioritize the list.
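
As a rough sketch of the idempotent push described here: operations are keyed by a hash, a repeated push does not create a duplicate, and new entries are rejected once the batch is full. The `OperationBatch` class, the choice of SHA-256 over the operation text, and the `capacity` parameter are assumptions for illustration only. Re-prioritizing on a re-push is not modeled.

```python
import hashlib


class OperationBatch:
    """Hypothetical batch of manually pushed operations, keyed by content hash."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # A dict preserves insertion order, which stands in for priority here.
        self._ops: dict[str, str] = {}

    @staticmethod
    def op_hash(operation: str) -> str:
        # Assumption: the identifying hash is SHA-256 over the operation text.
        return hashlib.sha256(operation.encode("utf-8")).hexdigest()

    def add(self, operation: str) -> bool:
        """Add an operation; return False if it is dropped because the batch is full."""
        key = self.op_hash(operation)
        if key in self._ops:
            return True  # already present, so pushing again changes nothing
        if len(self._ops) >= self.capacity:
            return False
        self._ops[key] = operation
        return True

    def operations(self) -> list[str]:
        return list(self._ops.values())


batch = OperationBatch(capacity=2)
batch.add("query A { a }")
batch.add("query B { b }")
batch.add("query A { a }")  # repeated pipeline run: no duplicate is stored
print(len(batch.operations()))
```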

vasylruban-nw · 2024-09-05T22:48:24Z

rfc/operation-cache-warmer.md

+
+### Cosmo UI integration
+
+A User can disable the operation cache in the Cosmo UI. The User can see the current operations in the cache and remove them if necessary. The User can also see the current status of the cache and the last computation time.


Dumb question:

what will happen if we disable the operation cache in the Cosmo UI but keep cache_warmup enabled in the router configuration?

In that case, Cosmo would no longer compute the latest Top-N operations for you, and your router would eventually fetch an outdated list of operations. It won't break anything.

I'd say that in this case, the Router would try to fetch operations for warmup, but the CP won't return anything, so nothing will be warmed up.

The latest batch of operations has already been pushed to S3. The router fetches it on startup and on schema changes. If nothing changes, the router will always fetch the latest available batch. There is no interaction with the control plane.

This is one possibility. It would be the "best-effort" approach because there might be operations that can still benefit from the stale cache.

Another solution is to delete the cache after the user has disabled it in the Studio. In that case, the router won't find any artifact and will skip the warm-up. I think this is the case that @jensneuse described.

This could be made configurable.

I think it's reasonable to delete the warming data in S3 when the feature is disabled.

I'm fine with both ways. 👍 Let's purge it.
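
The outcome agreed on here (purge the warming data when the feature is disabled, so the router finds no artifact and skips the warm-up) can be sketched as follows. The in-memory dict stands in for the S3 bucket, and both function names are hypothetical.

```python
def purge_on_disable(store: dict[str, list[str]], graph_id: str) -> None:
    """Delete the warming artifact for a graph when the feature is disabled."""
    store.pop(graph_id, None)


def operations_to_warm(store: dict[str, list[str]], graph_id: str) -> list[str]:
    """Return the latest batch, or an empty list if no artifact exists."""
    return store.get(graph_id, [])


# The dict is a stand-in for the artifact storage (assumption for this sketch).
store = {"mygraph": ["query A { a }", "query B { b }"]}
print(operations_to_warm(store, "mygraph"))  # batch found: warm these operations
purge_on_disable(store, "mygraph")
print(operations_to_warm(store, "mygraph"))  # nothing found: warm-up is skipped
```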

joshlevinson · 2024-09-05T23:08:04Z

rfc/operation-cache-warmer.md

+- Total operation pre-execution time: Normalization, Validation, Planning
+- Total request count
+
+The Top-N computation is done for a specific time interval e.g. 3-72 hour (configurable). The operations are sorted by the pre-execution time and request count. The Top-N operations are then pushed to the operation cache. Manual operations have a higher priority than automatic operations. This means when the cache capacity is reached, manual operations are moved to the cache first and automatic operations are removed.


How will Top-N prioritize between request count and pre-execution time? I could see wanting to prioritize slowest planned queries and then sort by request count or vice versa.

The ultimate goal is to prepare operations in advance, focusing on those that take the most time. We will only sort by request count when two items have the same pre-execution time.

I think @joshlevinson is right in that we should prioritize by planning time and then request count.

We mean the same thing 😄
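
The ordering both sides agree on, slowest pre-execution time first with request count used only as a tie-breaker, can be sketched like this. The `OperationStats` type, its field names, and the `top_n` function are made up for the example and are not taken from the RFC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationStats:
    name: str
    pre_execution_ms: float  # normalization + validation + planning
    request_count: int


def top_n(stats: list[OperationStats], n: int) -> list[OperationStats]:
    # Sort by pre-execution time (descending); request count only breaks ties.
    ranked = sorted(stats, key=lambda s: (-s.pre_execution_ms, -s.request_count))
    return ranked[:n]


stats = [
    OperationStats("a", pre_execution_ms=10.0, request_count=5),
    OperationStats("b", pre_execution_ms=50.0, request_count=1),
    OperationStats("c", pre_execution_ms=10.0, request_count=9),
]
print([s.name for s in top_n(stats, 2)])  # prints ['b', 'c']
```

Operation `b` wins despite its low request count because it is the slowest to prepare. Operations `a` and `c` tie on pre-execution time, so the higher request count puts `c` ahead.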

vasylruban-nw · 2024-09-09T21:01:05Z

rfc/operation-cache-warmer.md

+
+### Platform integration
+
+For containerized environments like Kubernetes, users should use the readiness probe to ensure that the router is ready to accept traffic. Setting not to small values for the readiness probe timeout is recommended to ensure that the router has enough time to prepare the cache. For schema updates after startup, this process is non-blocking because the new graph schema isn't swapped until the cache is warmed up.


To double-check:
/health/ready will be successful only when the router is warmed up, right?

That's right!
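
A minimal sketch of that readiness behavior, assuming the handler behind `/health/ready` reports success only after the warm-up has finished. The `ReadinessGate` class and the 503 status for the "not ready" case are assumptions for illustration, not the router's actual implementation.

```python
import threading


class ReadinessGate:
    """Hypothetical gate that flips to ready once the cache warm-up is done."""

    def __init__(self) -> None:
        self._warmed = threading.Event()

    def mark_warmed(self) -> None:
        # Called once when the warm-up has completed.
        self._warmed.set()

    def ready_status(self) -> int:
        # Assumption: 200 when ready, 503 while the cache is still warming.
        return 200 if self._warmed.is_set() else 503


gate = ReadinessGate()
print(gate.ready_status())  # prints 503
gate.mark_warmed()
print(gate.ready_status())  # prints 200
```

With a gate like this, the readiness probe timeout needs to be long enough to cover the warm-up, which is the point the RFC text makes.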

github-actions · 2024-09-25T05:19:19Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2024-10-09T05:19:39Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions · 2024-10-24T05:19:18Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2024-11-08T05:19:22Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2024-11-23T05:20:08Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

chore(rfc): distributed operation cache

115d1b7

StarpTech requested review from devsergiy, AndreasZeissner, jensneuse, thisisnithin, Aenimus and JivusAyrus August 25, 2024 12:44

StarpTech added 14 commits August 25, 2024 14:45

chore: add code highlight

0a709eb

chore: add comment

ac09b1c

chore: improve

4db8584

chore: improve

1f5a7e1

chore: improve

5eb656f

chore: improve

90d8a1a

chore: improve

675b5d7

chore: improve

b970a75

chore: improve

b84bf31

chore: improve

db443c8

chore: improve

cf19728

chore: correct typo

9960125

chore: improve

aabc645

chore: add example

e890cfe

jensneuse reviewed Aug 25, 2024

chore: add example

bc988b7

jensneuse reviewed Aug 25, 2024

jensneuse reviewed Aug 25, 2024

StarpTech added 2 commits August 25, 2024 21:34

chore: fix typo

076fb7f

chore: rename rfc

d3e5984

StarpTech changed the title ~~chore(rfc): distributed operation cache~~ chore(rfc): operation cache warmer Aug 25, 2024

chore: address feedback

27a3604

vasylruban-nw reviewed Sep 5, 2024

joshlevinson reviewed Sep 5, 2024

vasylruban-nw reviewed Sep 9, 2024

github-actions bot added the Stale label Sep 25, 2024

github-actions bot closed this Oct 9, 2024

StarpTech removed the Stale label Oct 9, 2024

StarpTech reopened this Oct 9, 2024

github-actions bot added the Stale label Oct 24, 2024

StarpTech removed the Stale label Oct 24, 2024

github-actions bot added the Stale label Nov 8, 2024

StarpTech removed the Stale label Nov 8, 2024

github-actions bot added the Stale label Nov 23, 2024

StarpTech removed the Stale label Nov 23, 2024
