[indexer]: Investigate schema changes needed for optimistic indexing #5670

tomxey · 2025-02-27T16:27:56Z

As a continuation of #5151

The transaction data will come from the node in CheckpointTransaction format:

iota/crates/iota-types/src/full_checkpoint_content.rs

Line 77 in ae815a9

pub struct CheckpointTransaction {

It contains almost all the information that we need to index a tx.
What is missing is:

tx_sequence_number
cp_sequence_number
cp_timestamp

It's not possible to get this data before the checkpoint is created.
The cp/tx sequence numbers are the most problematic, since they are used extensively throughout the indexer DB schema as primary/foreign keys.
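
To make the gap concrete, here is a minimal sketch of an indexer-side record in which the three missing values are optional until the checkpoint arrives. `IndexedTx`, `CheckpointInfo` and their methods are hypothetical names for illustration only; they are not part of `CheckpointTransaction` or the existing indexer schema.

```rust
/// Hypothetical values that only become known once the checkpoint exists.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CheckpointInfo {
    pub tx_sequence_number: u64,
    pub cp_sequence_number: u64,
    pub cp_timestamp_ms: u64,
}

/// Hypothetical indexer-side record (not the real `CheckpointTransaction`).
#[derive(Debug, Clone, PartialEq)]
pub struct IndexedTx {
    pub digest: [u8; 32],
    /// `None` while the tx is only optimistically indexed.
    pub checkpoint: Option<CheckpointInfo>,
}

impl IndexedTx {
    /// Record a tx as soon as it is seen, before any checkpoint data exists.
    pub fn optimistic(digest: [u8; 32]) -> Self {
        Self { digest, checkpoint: None }
    }

    /// Fill in the checkpoint-derived values once the checkpoint arrives.
    pub fn finalize(&mut self, info: CheckpointInfo) {
        self.checkpoint = Some(info);
    }

    pub fn is_optimistic(&self) -> bool {
        self.checkpoint.is_none()
    }
}
```

A nullable column like this cannot be used directly as a primary/foreign key, which is the core of the problem described above.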

We could try to store the new transactions/objects/events/... in separate tables to avoid the need to assign sequence numbers to them.

Regardless of the schema design, we would most likely need to provide a common ordering for objects from the normal and "optimistic" tables to keep API responses consistent.
If we allowed the tx_seq number of a tx to change when it is moved from the "optimistic" to the normal tables, a client that uses paging could miss an object entirely: the object could initially belong to a future page and then be moved to a page the client has already fetched.
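
The paging problem can be reproduced with a small simulation. This is only a sketch under assumed conditions: rows are `(tx_seq, name)` pairs, the pager is a simple "seq greater than cursor" query, and the provisional sequence number 100 and final number 2 are made-up values. None of this reflects the real indexer queries.

```rust
/// Return up to `limit` rows with a sequence number strictly greater than
/// `cursor`, ordered by sequence number (a simple cursor-based pager).
pub fn page(rows: &[(u64, &'static str)], cursor: u64, limit: usize) -> Vec<(u64, &'static str)> {
    let mut out: Vec<(u64, &'static str)> =
        rows.iter().copied().filter(|(seq, _)| *seq > cursor).collect();
    out.sort_by_key(|(seq, _)| *seq);
    out.truncate(limit);
    out
}

/// Simulate a client paging through the table while "x" is renumbered.
pub fn simulate_missed_row() -> Vec<&'static str> {
    // "x" is optimistically indexed with a provisional sequence number.
    let mut rows: Vec<(u64, &'static str)> = vec![(1, "a"), (3, "b"), (5, "c"), (100, "x")];
    let mut seen = Vec::new();

    // The client fetches the first page and remembers the last seq as cursor.
    let first = page(&rows, 0, 2);
    let mut cursor = first.last().map(|(seq, _)| *seq).unwrap_or(0);
    seen.extend(first.iter().map(|(_, name)| *name));

    // The checkpoint arrives and "x" gets its final number, which is
    // lower than the client's cursor.
    for row in rows.iter_mut() {
        if row.1 == "x" {
            row.0 = 2;
        }
    }

    // The client keeps paging until no rows are left.
    loop {
        let next = page(&rows, cursor, 2);
        if next.is_empty() {
            break;
        }
        cursor = next.last().unwrap().0;
        seen.extend(next.iter().map(|(_, name)| *name));
    }
    seen
}
```

In this run the client sees "a", "b" and "c" but never "x", because "x" moved behind the cursor between two page requests.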

One way to make tx_seq numbers consistent is to assign them as we go, the first time the indexer sees a tx (that's the way the node handles it), and never change the number again, even after the checkpoint arrives.
The downside is that the tx_seq numbers in the indexer would then no longer match the ordering in the checkpoint, and the ordering could differ between indexer instances (txs optimistically indexed by a given indexer will always come first).
Would that be acceptable?
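
A minimal sketch of the "assign on first sight" idea, assuming an in-memory counter keyed by digest. `LocalSequencer` and `Digest` are hypothetical names used for illustration; a real implementation would need to persist the mapping.

```rust
use std::collections::HashMap;

/// Hypothetical digest type used only in this sketch.
pub type Digest = [u8; 32];

/// Assigns a sequence number the first time a tx is seen and never
/// changes it afterwards.
#[derive(Default)]
pub struct LocalSequencer {
    next: u64,
    assigned: HashMap<Digest, u64>,
}

impl LocalSequencer {
    pub fn assign(&mut self, digest: Digest) -> u64 {
        // A tx seen before (e.g. again via its checkpoint) keeps its number.
        if let Some(seq) = self.assigned.get(&digest) {
            return *seq;
        }
        let seq = self.next;
        self.next += 1;
        self.assigned.insert(digest, seq);
        seq
    }
}
```

Two instances that observe the same transactions in a different order end up with different numbers for them, which is exactly the downside described above.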

Another possibility is to stop using tx_seq (and cp_seq?) internally to order objects, and to order them instead by e.g. Digest, or some other value that is known from the start.
This would completely break the API property that objects are ordered by creation time, though.
Is that an option?
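
As a rough illustration of digest-based ordering, the sketch below sorts entries by their digest bytes. The function name and the tuple layout are assumptions made for this example only.

```rust
use std::collections::BTreeMap;

/// Order entries by digest bytes instead of by a sequence number.
/// The input slice is given in arrival order.
pub fn order_by_digest(arrivals: &[([u8; 32], &'static str)]) -> Vec<&'static str> {
    // BTreeMap iterates in key order, i.e. lexicographically by digest.
    let by_digest: BTreeMap<[u8; 32], &'static str> = arrivals.iter().copied().collect();
    by_digest.into_values().collect()
}
```

The resulting order is stable and identical on every indexer instance, but it is unrelated to arrival or creation time.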

Another possibility would be to support only a subset of queries, e.g. only get_object queries and transaction builder queries, and skip endpoints that need ordering to return objects.
Would it be acceptable to have such an inconsistent API, with tx effects available in some endpoints but not in others?
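
The subset approach could look roughly like the sketch below: point lookups fall back to the optimistic store, while ordered listings read only checkpointed data. `Stores`, its fields and both methods are hypothetical and use in-memory maps in place of the real tables.

```rust
use std::collections::HashMap;

/// Hypothetical digest type used only in this sketch.
pub type Digest = [u8; 32];

/// In-memory stand-ins for the normal and "optimistic" tables.
#[derive(Default)]
pub struct Stores {
    /// digest -> (tx_sequence_number, effects)
    pub checkpointed: HashMap<Digest, (u64, String)>,
    /// digest -> effects, no sequence number available yet
    pub optimistic: HashMap<Digest, String>,
}

impl Stores {
    /// Point lookup: needs no ordering, so it can consult both stores.
    pub fn get_effects(&self, digest: &Digest) -> Option<&str> {
        self.checkpointed
            .get(digest)
            .map(|(_, effects)| effects.as_str())
            .or_else(|| self.optimistic.get(digest).map(String::as_str))
    }

    /// Ordered listing: only checkpointed rows have a sequence number,
    /// so optimistic rows are left out.
    pub fn list_ordered(&self) -> Vec<&str> {
        let mut rows: Vec<&(u64, String)> = self.checkpointed.values().collect();
        rows.sort_by_key(|(seq, _)| *seq);
        rows.into_iter().map(|(_, effects)| effects.as_str()).collect()
    }
}
```

The inconsistency in question is visible here: a tx that `get_effects` already returns can still be absent from `list_ordered`.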

Could GraphQL support data retrieval from 2 different tables? Or is it bound to a single table per object type?

GraphQL seems to use the concept of checkpoint_viewed_at when retrieving data. Would that cause issues with "optimistic" data that does not yet have a checkpoint?

If we decide to store optimistically indexed transactions in the already existing tables, we would also need to fill in the checkpoint number for them somehow.

tomxey added the infrastructure Issues related to the Infrastructure Team label Feb 28, 2025

kodemartin assigned tomxey and kodemartin Feb 28, 2025
