indexer-alt: sum_obj_types pipeline #20054
Conversation
crates/sui-indexer-alt/migrations/2024-10-27-150938_sum_obj_types/up.sql
```sql
ON sum_obj_types (owner_kind, owner_id, object_id, object_version);

CREATE INDEX IF NOT EXISTS sum_obj_types_pkg
ON sum_obj_types (package, object_id, object_version);
```
Do we need to include the `object_version` in these indices, since this only tracks the live objects set?
Yeah, I think you're right, we can get rid of it -- I added it for ordering purposes, originally.
Actually, I take it back -- we need the version in the index because we are querying for it. By including the version in the index we can keep this as an index-only scan.
What about:

```sql
CREATE INDEX IF NOT EXISTS sum_obj_types_pkg
ON sum_obj_types (package, object_id) INCLUDE (object_version)
```
The only benefit of doing this is if the index is otherwise unique on `(package, object_id)`, which we do not need, so I prefer to use the form that doesn't require the extra SQL features.
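The index-only-scan reasoning above can be checked directly in Postgres. The following is a hypothetical query, not part of the PR (the column names come from the migration; the `package` literal is illustrative): when `object_version` is part of the index, the plan should report an Index Only Scan rather than a heap fetch.

```sql
-- Hypothetical check: with object_version in the index, this query can be
-- answered from the index alone (Postgres reports an "Index Only Scan").
EXPLAIN (ANALYZE, BUFFERS)
SELECT object_id, object_version
FROM sum_obj_types
WHERE package = '\x02';  -- illustrative literal
```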
crates/sui-indexer-alt/migrations/2024-10-27-150938_sum_obj_types/up.sql
## Description

The `sum_obj_types` sequential pipeline is the `sui-indexer-alt` equivalent of `objects` and `objects_snapshot`. It is a `sum`-mary of the object owner and type information as of a particular checkpoint (its watermark). It differs from `objects` and `objects_snapshot` in the following ways:

- It only contains owner and type information (no coin balance, content, digest, or dynamic field information), but has a more "complete" set of indices for this data.
- The type information has been standardised into the same style as used by event indices.
- It uses the painter's algorithm to avoid processing redundant object updates (process transactions in a checkpoint in reverse order).
- Updates to the underlying table are atomic (all writes are sent out within a single transaction, along with a watermark update).
- It avoids redundant writes across checkpoints as well (e.g. if an object is modified multiple times, only the last modification will be turned into a DB update).

This PR does not include the logic to delay updates (necessary to replicate the behaviour of `objects_snapshot`). This will be added at a later date.

## Test plan

Ran the indexer on the first 1M checkpoints in mainnet (produced 2481 live objects by the end):

```
sui$ cargo run -p sui-indexer-alt --release -- \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  --remote-store-url https://checkpoints.mainnet.sui.io \
  --last-checkpoint 1000000
```
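The painter's algorithm bullet above can be sketched in plain Rust. This is an illustrative sketch, not the crate's actual types: walk a checkpoint's updates newest-first and keep only the first update seen per object ID, so superseded earlier updates are never processed.

```rust
use std::collections::HashSet;

/// Illustrative update record; the real pipeline uses richer types.
#[derive(Debug, PartialEq)]
struct ObjectUpdate {
    object_id: u64,
    payload: &'static str,
}

/// Walk a checkpoint's updates in reverse (last transaction first) and keep
/// only the newest update per object, skipping redundant earlier ones.
/// The result is in newest-first order.
fn latest_updates(updates: &[ObjectUpdate]) -> Vec<&ObjectUpdate> {
    let mut seen = HashSet::new();
    updates
        .iter()
        .rev()
        // `insert` returns false for IDs we have already kept.
        .filter(|u| seen.insert(u.object_id))
        .collect()
}

fn main() {
    let updates = vec![
        ObjectUpdate { object_id: 1, payload: "v1" },
        ObjectUpdate { object_id: 2, payload: "v1" },
        ObjectUpdate { object_id: 1, payload: "v2" },
    ];
    let latest = latest_updates(&updates);
    // Object 1 keeps only its last update ("v2"); object 2 keeps "v1".
    assert_eq!(latest.len(), 2);
    assert_eq!(latest[0].payload, "v2");
}
```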
## Description

Trying to make the purposes of various tuning parameters clearer by renaming them.

## Test plan

:eyes:

## Stack

- #20050
- #20051
- #20052
- #20053
- #20054

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

- [ ] Protocol:
- [ ] Nodes (Validators and Full nodes):
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [ ] CLI:
- [ ] Rust SDK:
- [ ] REST API:
```rust
        .execute(conn)
});

let deleted: usize = try_join_all(delete_chunks).await?.into_iter().sum();
```
Here, updates and deletions are handled separately and sequentially, which dilutes the tuning power of `BATCH_SIZE` and will likely generate small DB commits, especially for deletions. Wdyt of splitting updates and deletions earlier?
I'm not sure I understand the suggestion (to split updates and deletions earlier), could you elaborate?

The reason why I ended up having to split them up here was because of access to the `conn`ection -- Rust did not like me sharing it between two closures (the `map` over update chunks, and the `map` over deletion chunks), because it is a `&mut`.

There are other ways around this, where we split the updates, chunk them up, and then recombine them into a single stream of updates. Then we can map over these updates in a single pass, using one closure that can take ownership of the `conn`. We can do that if we notice this is a bottleneck.
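The "recombine into a single stream" alternative can be sketched with mock types (`Conn` and `Chunk` here are hypothetical stand-ins, not the indexer's real types): tagging each chunk with its kind lets a single pass, and therefore a single closure, own the `&mut` connection.

```rust
/// Stand-in for the DB connection; the real code holds a `&mut db::Connection`.
struct Conn {
    committed: usize,
}

/// A chunk tagged with the kind of write it carries.
enum Chunk {
    Mutations(Vec<u64>),
    Deletions(Vec<u64>),
}

impl Conn {
    fn execute(&mut self, chunk: &Chunk) -> usize {
        let n = match chunk {
            Chunk::Mutations(m) => m.len(),
            Chunk::Deletions(d) => d.len(),
        };
        self.committed += n;
        n
    }
}

/// Interleave mutation and deletion chunks into one stream, so one pass
/// (and one closure) can hold the mutable connection.
fn commit_all(conn: &mut Conn, mutations: &[u64], deletions: &[u64], chunk_size: usize) -> usize {
    let chunks: Vec<Chunk> = mutations
        .chunks(chunk_size)
        .map(|c| Chunk::Mutations(c.to_vec()))
        .chain(deletions.chunks(chunk_size).map(|c| Chunk::Deletions(c.to_vec())))
        .collect();
    chunks.iter().map(|c| conn.execute(c)).sum()
}

fn main() {
    let mut conn = Conn { committed: 0 };
    let total = commit_all(&mut conn, &[1, 2, 3], &[9], 2);
    assert_eq!(total, 4);
    assert_eq!(conn.committed, 4);
}
```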
Yes, I saw the `&mut conn` sharing issue too. By splitting earlier, I meant having separate functions for `commit_mutations` and `commit_deletions`, so that we can batch the mutations and deletions "earlier" and avoid committing very small deletion chunks, because the deletion count is often smaller than the mutation count: https://metrics.sui.io/goto/OcJUs1WNg?orgId=1

It does make the trait more verbose, so it has pros and cons.
```rust
    }
}

async fn commit(values: &Self::Batch, conn: &mut db::Connection<'_>) -> anyhow::Result<usize> {
```
Can you remind me why you chose a `conn` instead of a `pool` here, as naively `try_join_all` will interleave on execution?
The connection is tied to a transaction -- all these commits are done atomically, alongside the watermark update. This is required for consistency, but it is something that the current indexer does not do.
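A toy model of the atomicity argument (the `Db` type and its `transaction` method are illustrative, not `diesel`'s actual API): the object writes and the watermark update either both land or both roll back.

```rust
/// Mock database: a table plus the pipeline's watermark.
#[derive(Default)]
struct Db {
    rows: Vec<u64>,
    watermark: u64,
}

impl Db {
    /// Run `f` transactionally: if it returns `Err`, restore the snapshot
    /// taken at the start, discarding all of `f`'s changes.
    fn transaction<E>(&mut self, f: impl FnOnce(&mut Db) -> Result<(), E>) -> Result<(), E> {
        let snapshot = (self.rows.clone(), self.watermark);
        f(&mut *self).map_err(|e| {
            (self.rows, self.watermark) = snapshot;
            e
        })
    }
}

fn main() {
    let mut db = Db::default();

    // Writes and the watermark update land together...
    db.transaction::<&str>(|db| {
        db.rows.extend([1, 2, 3]);
        db.watermark = 42;
        Ok(())
    })
    .unwrap();
    assert_eq!((db.rows.len(), db.watermark), (3, 42));

    // ...and a failure mid-transaction rolls both back.
    let res = db.transaction(|db| {
        db.rows.push(4);
        db.watermark = 43;
        Err("constraint violation")
    });
    assert!(res.is_err());
    assert_eq!((db.rows.len(), db.watermark), (3, 42));
}
```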