[indexer]: Investigate schema changes needed for optimistic indexing #5670

Open
tomxey opened this issue Feb 27, 2025 · 0 comments
Labels
infrastructure Issues related to the Infrastructure Team

tomxey commented Feb 27, 2025

As a continuation of #5151

The transaction data will come from the node in CheckpointTransaction format:

pub struct CheckpointTransaction {

It contains almost all the information that we need to index a tx.
What is missing is:

  • tx_sequence_number
  • cp_sequence_number
  • cp_timestamp

This data is not available before the checkpoint is created.
The cp/tx sequence numbers are the most problematic, since they are used extensively throughout the indexer DB schema as primary/foreign keys.
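
To make the gap concrete, here is a rough sketch of how a record could look if the checkpoint-derived fields were optional. All type, field, and method names (`CheckpointInfo`, `IndexedTx`, `cp_timestamp_ms`, `finalize`) are hypothetical and not taken from the indexer code. It only illustrates that a value which can be missing cannot serve as a primary key as-is.

```rust
/// Checkpoint-derived fields that are unknown before the checkpoint exists.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CheckpointInfo {
    pub tx_sequence_number: u64,
    pub cp_sequence_number: u64,
    pub cp_timestamp_ms: u64,
}

/// A transaction as an indexer might hold it (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexedTx {
    /// Known from the start, so it can identify the row.
    pub digest: String,
    /// `None` while the transaction is only optimistically indexed.
    pub checkpoint: Option<CheckpointInfo>,
}

impl IndexedTx {
    /// Create a record for a transaction seen before its checkpoint.
    pub fn optimistic(digest: &str) -> Self {
        IndexedTx { digest: digest.to_string(), checkpoint: None }
    }

    pub fn is_checkpointed(&self) -> bool {
        self.checkpoint.is_some()
    }

    /// Fill in the checkpoint data once. Returns false if it was already set.
    pub fn finalize(&mut self, info: CheckpointInfo) -> bool {
        if self.checkpoint.is_some() {
            return false;
        }
        self.checkpoint = Some(info);
        true
    }
}
```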

We could try to store the new transactions/objects/events/... in separate tables to avoid the need to assign sequence numbers to them.
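
A minimal in-memory sketch of the separate-tables idea, assuming the optimistic table is keyed by digest and a row is moved to the normal table once its tx_seq is known. `TxStore`, `insert_optimistic`, and `promote` are made-up names, and the real tables of course hold far more than a digest.

```rust
use std::collections::{BTreeMap, HashSet};

/// Toy model of two "tables" (hypothetical, not the real schema).
#[derive(Default)]
pub struct TxStore {
    /// Normal table: tx_sequence_number -> digest.
    pub checkpointed: BTreeMap<u64, String>,
    /// Optimistic table: digests only, no sequence number needed.
    pub optimistic: HashSet<String>,
}

impl TxStore {
    /// Record a transaction that has no checkpoint yet.
    pub fn insert_optimistic(&mut self, digest: &str) {
        self.optimistic.insert(digest.to_string());
    }

    /// Called when the checkpoint arrives: move the row to the normal table.
    /// Returns true if the transaction had been optimistically indexed.
    pub fn promote(&mut self, digest: &str, tx_sequence_number: u64) -> bool {
        let was_optimistic = self.optimistic.remove(digest);
        self.checkpointed.insert(tx_sequence_number, digest.to_string());
        was_optimistic
    }

    /// Lookup by digest has to consult both tables.
    pub fn contains(&self, digest: &str) -> bool {
        self.optimistic.contains(digest)
            || self.checkpointed.values().any(|d| d.as_str() == digest)
    }
}
```

Lookups by digest work across both tables, but anything that needs an ordering still has to decide where the optimistic rows go.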

Regardless of the schema design, we would most likely need to provide a common ordering for objects from the normal and "optimistic" tables to get consistent API responses.
If we allow the tx_seq number of a tx to change when it is moved from the "optimistic" to the normal tables, a client that uses paging could miss an object in the response: the object can initially belong to a future page and then be moved to a page the client has already fetched, so the client never sees it.
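
The paging problem can be reproduced with a toy model. The sketch below assumes cursor-based paging on tx_seq and a filtered query (so the sequence numbers have gaps). The provisional number 1000 and the final number 15 are arbitrary values picked for the example, and `page_after` is a hypothetical helper, not an existing API.

```rust
/// One result row: (tx_seq, digest).
pub type Row = (u64, &'static str);

/// Return up to `limit` rows with tx_seq greater than `cursor`, ascending.
pub fn page_after(rows: &[Row], cursor: Option<u64>, limit: usize) -> Vec<Row> {
    let mut sorted = rows.to_vec();
    sorted.sort();
    sorted
        .into_iter()
        .filter(|(seq, _)| cursor.map_or(true, |c| *seq > c))
        .take(limit)
        .collect()
}

/// Simulate a client paging through results while tx "c" is renumbered
/// between the first and the second page.
pub fn digests_seen_by_client() -> Vec<&'static str> {
    // "c" is optimistic and carries a provisional tx_seq of 1000.
    let mut rows: Vec<Row> = vec![(10, "a"), (20, "b"), (30, "d"), (1000, "c")];
    let mut seen = Vec::new();

    // The client fetches the first page and remembers the last tx_seq.
    let page1 = page_after(&rows, None, 2);
    let mut cursor = page1.last().map(|(seq, _)| *seq);
    seen.extend(page1.iter().map(|(_, digest)| *digest));

    // The checkpoint arrives and "c" gets its final tx_seq, which lies
    // before the client's cursor.
    for row in rows.iter_mut() {
        if row.1 == "c" {
            row.0 = 15;
        }
    }

    // The client keeps paging from its cursor until no rows are left.
    loop {
        let page = page_after(&rows, cursor, 2);
        if page.is_empty() {
            break;
        }
        cursor = page.last().map(|(seq, _)| *seq);
        seen.extend(page.iter().map(|(_, digest)| *digest));
    }
    seen
}
```

In this run the client ends up with a, b and d, and never sees c, even though c existed the whole time.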

One way to keep tx_seq numbers consistent is to assign them as we go, the first time the indexer sees a tx (that's how the node handles it), and never change the number again, even after the checkpoint arrives.
The downside is that tx_seq numbers in the indexer will then not be consistent with the ordering in the checkpoint, and the ordering may differ between indexer instances (txs optimistically indexed by a given indexer will always come first).
Would that be acceptable?
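
A small sketch of the assign-on-first-sight approach and its downside. `LocalSequencer` is a hypothetical name and the in-memory map stands in for whatever the indexer would persist.

```rust
use std::collections::HashMap;

/// Assigns a local tx_seq the first time a digest is observed and never
/// changes it afterwards (hypothetical helper).
#[derive(Default)]
pub struct LocalSequencer {
    next: u64,
    assigned: HashMap<String, u64>,
}

impl LocalSequencer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the number for `digest`, assigning the next free one if the
    /// digest has not been seen before.
    pub fn observe(&mut self, digest: &str) -> u64 {
        if let Some(&seq) = self.assigned.get(digest) {
            return seq;
        }
        let seq = self.next;
        self.next += 1;
        self.assigned.insert(digest.to_string(), seq);
        seq
    }

    /// Observe a whole checkpoint, in checkpoint order.
    pub fn observe_all(&mut self, digests: &[&str]) -> Vec<u64> {
        digests.iter().map(|d| self.observe(d)).collect()
    }
}
```

If indexer A sees "c" optimistically and then processes a checkpoint containing a, b, c in that order, it ends up with c=0, a=1, b=2. Indexer B, which only sees the checkpoint, ends up with a=0, b=1, c=2. Each indexer is stable on its own, but they disagree with each other and A disagrees with the checkpoint order.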

One possibility here may be to stop using tx_seq (and cp_seq?) internally to order objects, and order them e.g. by digest, or some other value that is known from the start.
This would completely break the API property that objects are ordered by creation time, though.
Is that an option?
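
To illustrate the trade-off, a sketch of ordering by digest. The 4-byte digests are shortened for readability (real digests are longer), and `Tx`, `created_order`, and the helper functions are hypothetical.

```rust
/// Minimal transaction record for the example (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tx {
    /// Shortened digest, known as soon as the tx is seen.
    pub digest: [u8; 4],
    /// Position in creation order, kept only to show what gets lost.
    pub created_order: u32,
}

/// Sort by digest bytes. The result does not depend on arrival order.
pub fn sort_by_digest(txs: &mut [Tx]) {
    txs.sort_by(|a, b| a.digest.cmp(&b.digest));
}

/// Read back the creation positions in the current slice order.
pub fn creation_order(txs: &[Tx]) -> Vec<u32> {
    txs.iter().map(|t| t.created_order).collect()
}
```

Every indexer would produce the same order no matter when it saw each tx, but that order has nothing to do with creation time: three txs created in order 0, 1, 2 with digests starting with 9, 1, 5 come out as 1, 2, 0.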

Another possibility would be to support only a subset of queries, e.g. only get_object queries and transaction builder queries, and skip endpoints that need ordering to return objects.
Would it be acceptable to have such an inconsistent API, with tx effects available in some endpoints but not in others?

Could GraphQL support data retrieval from 2 different tables? Or is it bound to a single table per object type?

GraphQL seems to use the concept of checkpoint_viewed_at when retrieving data. Would that cause issues with "optimistic" data that does not yet have a checkpoint?

If we decide to store optimistically indexed transactions in the already existing tables, we would also need to fill in the checkpoint number for them somehow.

tomxey added the infrastructure label Feb 28, 2025