[iota-analytics-indexer] Add README.md #4470
Changes from 2 commits
9e109b0
193811b
986f859
2603f68
caa86bb
78615db
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,125 @@ | ||||||||
IOTA Analytics Indexer | ||||||||
======================= | ||||||||
|
||||||||
The IOTA Analytics Indexer is a service that exports data from the main IOTA network to a remote big object store (S3/GCS/Azure) for further analytical processing. It does not perform any analysis on its own. | ||||||||
|
||||||||
**Key Features** | ||||||||
---------------- | ||||||||
|
||||||||
* Exports data from the IOTA network to a remote big object store | ||||||||
* Provides BigQuery and Snowflake schemas for the exported data | ||||||||
|
||||||||
> Note: BigQuery and Snowflake are cloud-based data warehousing solutions.
> Once the data is available there, it can be analysed in the cloud using SQL queries.
>
> BigQuery is part of Google Cloud Platform: [https://cloud.google.com/bigquery]
>
> Snowflake is an independent vendor, not part of any large cloud provider: [https://snowflake.com]
|
||||||||
|
||||||||
**Relation to iota-indexer** | ||||||||
---------------------------- | ||||||||
|
||||||||
### iota-indexer | ||||||||
|
||||||||
Currently, `iota-indexer` computes and stores analytical metrics about:
- network statistics (number of transactions, transactions per second)
- (active) addresses (transaction senders/recipients)
- Move calls
|
||||||||
Those metrics are computed by a separate analytical worker instance of the indexer, which uses the same main DB as the main indexer instance.
|
||||||||
It seems that some of the values that `iota-indexer`'s `fullnode_sync_worker` stores in the main indexer tables (move calls, tx recipients) are kept there only for analytical purposes and might otherwise not need to be processed or stored at all.
|
||||||||
### iota-analytics-indexer | ||||||||
|
||||||||
The `iota-analytics-indexer` does not compute any analytical metrics directly.
It only exports data for further processing by external tools (BigQuery/Snowflake).
Functionality in `iota-indexer` that is not required to serve user JSON RPC/GraphQL requests could potentially be moved out of `iota-indexer` and served by another tool based on the data exported by the `iota-analytics-indexer`.
Review comment: Nit: Phrasing (if I get it right).
Reply: Done.
||||||||
The sync logic in `iota-indexer` could then be simplified to store only the data needed to serve user requests.
Review comment: Nit.
Reply: Done.
||||||||
|
||||||||
|
||||||||
|
||||||||
**Schemas** | ||||||||
----------- | ||||||||
|
||||||||
For the data it exports, the crate provides:
- [BigQuery Schemas](src/store/bq/schemas/)
- [Snowflake Schemas](src/store/snowflake/schemas/)
- [Rust struct representations](src/tables.rs)
|
||||||||
The tables covered by the schemas: | ||||||||
- CHECKPOINT | ||||||||
- EVENT | ||||||||
- MOVE_CALL | ||||||||
- OBJECT | ||||||||
- MOVE_PACKAGE | ||||||||
- TRANSACTION_OBJECT - input and output objects for given transactions | ||||||||
- TRANSACTION | ||||||||
|
||||||||
|
||||||||
> Note: The following Rust structs currently do not have DB schemas prepared:
Review comment: Nit: For further emphasis (see also GH alerts).
Reply: Thanks for the tip. Done.
||||||||
> - DynamicFieldEntry | ||||||||
> - WrappedObjectEntry | ||||||||
|
||||||||
**Architecture** | ||||||||
---------------- | ||||||||
|
||||||||
When running the indexer, you must specify the object type to be extracted from checkpoints and uploaded to the cloud.
|
||||||||
The following object types are supported: | ||||||||
- Checkpoint | ||||||||
- Object | ||||||||
- Transaction | ||||||||
- TransactionObjects | ||||||||
- Event | ||||||||
- MoveCall | ||||||||
- MovePackage | ||||||||
- DynamicField | ||||||||
- WrappedObject | ||||||||
|
||||||||
Only one object type can be passed in a given run. To process multiple object types, run multiple analytics indexer instances.
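The one-type-per-run rule can be illustrated with a minimal sketch. The `ObjectType` enum, the `UnknownObjectType` error, the `plan_instances` helper and the exact string spellings are assumptions made for this example; they mirror the list above and are not taken from the crate's actual API.

```rust
use std::fmt;
use std::str::FromStr;

/// Object types listed in this README. The enum name and the accepted
/// spellings are illustrative assumptions, not the crate's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjectType {
    Checkpoint,
    Object,
    Transaction,
    TransactionObjects,
    Event,
    MoveCall,
    MovePackage,
    DynamicField,
    WrappedObject,
}

/// Error returned when a name matches none of the supported object types.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownObjectType(pub String);

impl fmt::Display for UnknownObjectType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown object type: {}", self.0)
    }
}

impl FromStr for ObjectType {
    type Err = UnknownObjectType;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Checkpoint" => Ok(Self::Checkpoint),
            "Object" => Ok(Self::Object),
            "Transaction" => Ok(Self::Transaction),
            "TransactionObjects" => Ok(Self::TransactionObjects),
            "Event" => Ok(Self::Event),
            "MoveCall" => Ok(Self::MoveCall),
            "MovePackage" => Ok(Self::MovePackage),
            "DynamicField" => Ok(Self::DynamicField),
            "WrappedObject" => Ok(Self::WrappedObject),
            other => Err(UnknownObjectType(other.to_string())),
        }
    }
}

/// Each indexer instance handles exactly one object type, so covering
/// several types means planning one instance per requested type.
/// Returns the first unknown name as an error.
pub fn plan_instances(requested: &[&str]) -> Result<Vec<ObjectType>, UnknownObjectType> {
    requested
        .iter()
        .map(|name| name.parse::<ObjectType>())
        .collect()
}
```

For example, `plan_instances(&["Event", "Checkpoint"])` would yield two entries, one for each indexer instance that has to be started.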
|
||||||||
In general, the data flow is as follows: | ||||||||
|
||||||||
* Checkpoints are read via JSON RPC using code reused from `iota_data_ingestion_core`.
* Checkpoints are processed by an appropriate handler (e.g. `EventHandler`), which extracts relevant objects from each transaction of the checkpoint. | ||||||||
* Objects are passed to the Writer, which writes the objects to a local temporary store in CSV or Parquet format. | ||||||||
* The `AnalyticsProcessor` syncs the objects from the local store to the remote store (S3/GCS/Azure, or also local, for testing purposes). | ||||||||
* Every 5 minutes the last processed checkpoint ID is fetched from BigQuery/Snowflake and reported as a metric. | ||||||||
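The steps above can be sketched as a simplified, in-memory pipeline. Everything in this block is a hypothetical stand-in: `Checkpoint`, `Handler`, `EventRows`, `CsvBuffer` and `run` are names invented for illustration, the "upload" is just a returned `Vec<String>`, and the cut interval is a parameter rather than the crate's real configuration.

```rust
/// Simplified stand-in for a checkpoint; the real type carries far more data.
pub struct Checkpoint {
    pub sequence_number: u64,
    /// Event payloads, flattened across the checkpoint's transactions.
    pub events: Vec<String>,
}

/// A handler extracts the rows of one object type from a checkpoint.
pub trait Handler {
    fn process(&self, checkpoint: &Checkpoint) -> Vec<String>;
}

/// Hypothetical handler that emits one CSV row per event.
pub struct EventRows;

impl Handler for EventRows {
    fn process(&self, checkpoint: &Checkpoint) -> Vec<String> {
        checkpoint
            .events
            .iter()
            .map(|event| format!("{},{}", checkpoint.sequence_number, event))
            .collect()
    }
}

/// Buffers CSV rows in memory in place of the temporary local store.
#[derive(Default)]
pub struct CsvBuffer {
    rows: Vec<String>,
}

impl CsvBuffer {
    pub fn write(&mut self, rows: Vec<String>) {
        self.rows.extend(rows);
    }

    /// Closes the current "file" and returns its content, leaving the
    /// buffer empty. Returns `None` when there is nothing to send.
    pub fn cut(&mut self) -> Option<String> {
        if self.rows.is_empty() {
            return None;
        }
        Some(std::mem::take(&mut self.rows).join("\n"))
    }
}

/// Passes each checkpoint through the handler and the writer, cutting a
/// file every `checkpoints_per_file` checkpoints. The returned vector
/// stands in for the files synced to the remote store.
pub fn run<H: Handler>(
    handler: &H,
    checkpoints: &[Checkpoint],
    checkpoints_per_file: usize,
) -> Vec<String> {
    assert!(checkpoints_per_file > 0, "cut interval must be positive");
    let mut writer = CsvBuffer::default();
    let mut uploaded = Vec::new();
    for (index, checkpoint) in checkpoints.iter().enumerate() {
        writer.write(handler.process(checkpoint));
        if (index + 1) % checkpoints_per_file == 0 {
            uploaded.extend(writer.cut());
        }
    }
    // Flush whatever is left after the last full interval.
    uploaded.extend(writer.cut());
    uploaded
}
```

The sketch keeps the handler and the writer behind separate types so that swapping the object type (another `Handler`) or the file format (another writer) would not affect the loop in `run`.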
|
||||||||
**Note:** It is assumed that data from the big object store will be readable from BigQuery/Snowflake automatically; the indexer does not load the data into BigQuery/Snowflake tables explicitly.
|
||||||||
Here is a graph summarizing the data flow: | ||||||||
|
||||||||
```mermaid
---
config:
  look: handDrawn
  theme: neutral
---
flowchart TD
    FNODE["Fullnode/Indexer"] <-->|JSON RPC| CPREADER["`IndexerExecutor/CheckpointReader from the **iota_data_ingestion_core** package`"];
    subgraph "`**iota-analytics-indexer**`"
        CPREADER -->|"`Executor calls **AnalyticsProcessor** for each checkpoint, which in turn passes the checkpoint to appropriate Handler`"| HANDLER["CheckpointHandler/EventHandler etc., depending on indexer configuration"]
        HANDLER -->|"`**AnalyticsProcessor** reads object data extracted from the checkpoint by the Handler and passes it to the Writer`"| WRITER["CSVWriter/ParquetWriter"]
        WRITER -->|Writes objects to temporary local storage| DISK[Temporary Local Storage]
        DISK --> REMOTESYNC["`Task inside of **AnalyticsProcessor** that removes files from Local Storage and uploads them to Remote Storage (S3/GCS/Azure)`"]
        WRITER -->|"`Once every few checkpoints, **AnalyticsProcessor** calls cut() to prepare file to be sent, FileMetadata is sent to the Remote Sync Task which triggers the sync`"| REMOTESYNC
        REMOTESYNC -->|Some process outside of analytics indexer makes the newly uploaded data available via BigQuery/Snowflake tables| BQSF["BigQuery/Snowflake"]
        BQSF -->|"Every 5 minutes max processed checkpoint number is read from the tables"| METRICS[Analytics Indexer Prometheus Metrics]
    end

    linkStyle 6 color:red;
```
|
||||||||
**Metrics** | ||||||||
----------- | ||||||||
|
||||||||
The following Prometheus metrics are served by `iota-analytics-indexer` to monitor the indexer execution: | ||||||||
|
||||||||
- **total_received**: number of checkpoints processed in the current run
- **last_uploaded_checkpoint**: ID of the last checkpoint uploaded to the big object store
- **max_checkpoint_on_store**: ID of the last checkpoint available via BigQuery/Snowflake tables
Review comment: Nit: Phrasing.
Reply: Done.
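To show how the three metrics relate to each other, here is a minimal sketch that tracks them with plain atomics. It is an in-memory stand-in, not the service's Prometheus registration: the `IndexerMetrics` struct, its method names and the derived `warehouse_lag` value are assumptions introduced for this example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// In-memory stand-in for the three metrics described above.
/// Field names follow the metric names; everything else is hypothetical.
#[derive(Default)]
pub struct IndexerMetrics {
    pub total_received: AtomicU64,
    pub last_uploaded_checkpoint: AtomicU64,
    pub max_checkpoint_on_store: AtomicU64,
}

impl IndexerMetrics {
    /// Called once for every checkpoint processed in the current run.
    pub fn on_checkpoint_received(&self) {
        self.total_received.fetch_add(1, Ordering::Relaxed);
    }

    /// Called after a file has been uploaded to the big object store.
    pub fn on_upload(&self, checkpoint: u64) {
        self.last_uploaded_checkpoint
            .store(checkpoint, Ordering::Relaxed);
    }

    /// Called by the periodic task that reads the max checkpoint from
    /// the BigQuery/Snowflake tables.
    pub fn on_store_poll(&self, max_checkpoint: u64) {
        self.max_checkpoint_on_store
            .store(max_checkpoint, Ordering::Relaxed);
    }

    /// Hypothetical derived value (not a metric of the service): how many
    /// checkpoints are uploaded but not yet visible in the warehouse tables.
    pub fn warehouse_lag(&self) -> u64 {
        let uploaded = self.last_uploaded_checkpoint.load(Ordering::Relaxed);
        let on_store = self.max_checkpoint_on_store.load(Ordering::Relaxed);
        uploaded.saturating_sub(on_store)
    }
}
```

A growing gap between `last_uploaded_checkpoint` and `max_checkpoint_on_store` would suggest that the external process loading data into BigQuery/Snowflake is falling behind, which is why comparing the two can be useful when monitoring.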