Add support for ingesting via precomputed TxMeta in Horizon #4911

Closed
3 of 5 tasks
tamirms opened this issue Jun 15, 2023 · 6 comments · Fixed by #5374
Assignees
Labels
cdp-horizon-scrum horizon performance issues aimed at improving performance

Comments

@tamirms
Contributor

tamirms commented Jun 15, 2023

Horizon currently has two modes of ingestion:

  1. The deprecated DatabaseBackend which extracts ledgers from Stellar Core's postgres DB
  2. Captive Core

We need to add a third mode: ingesting from the precomputed TxMeta backend. This will require adding a new LedgerBackend implementation and configuration flags that allow Horizon operators to select the new ledger backend.

The initial scope is to enable precomputed TxMeta for reingestion only, not live/forward ingestion. We will implement the following changes to the Horizon reingest command to support the new mode of ingestion:

  • When using precomputed TxMeta we may need to adjust the minimum batch size, and we may not need to round the batch sizes to multiples of 64. This optimization is deferred for the initial pass.
  • The default value for the parallel-job-size parameter needs to be reduced. (Experimentally, I found 100 to be the most efficient on production hardware.)
  • Add command line / env flags that let Horizon toggle between ingestion via captive core and ingestion via precomputed TxMeta, for reingestion commands only.
  • Add command line / env flags that configure the BufferedStorageBackend, i.e. the GCS bucket and schema aspects such as ledgers-per-file.
  • Update the reingestion integration tests so that they exercise reingestion via both captive core and the BufferedStorageBackend.
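
The flag-related items above could look roughly like the following. This is a minimal sketch using only the Go standard library, not Horizon's actual flag handling: the flag names (`ledgerbackend`, `datastore-config`), the `reingestConfig` type, and the captive core default of 100000 are assumptions made for illustration. Only the value 100 for the precomputed TxMeta backend comes from the description above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Hypothetical values for the backend toggle.
const (
	backendCaptiveCore = "captive-core"
	backendDatastore   = "datastore"
)

// reingestConfig is an illustrative container for the options discussed above.
type reingestConfig struct {
	Backend         string
	DatastoreConfig string // path to a file describing the bucket, ledgers-per-file, etc.
	ParallelJobSize uint
}

// defaultParallelJobSize picks a per-backend default job size.
func defaultParallelJobSize(backend string) uint {
	if backend == backendDatastore {
		return 100 // reported above as most efficient on production hardware
	}
	return 100000 // placeholder standing in for the existing captive core default
}

// parseReingestFlags parses the flags and rejects inconsistent combinations.
func parseReingestFlags(args []string) (reingestConfig, error) {
	fs := flag.NewFlagSet("reingest", flag.ContinueOnError)
	backend := fs.String("ledgerbackend", backendCaptiveCore, "captive-core or datastore")
	dsConfig := fs.String("datastore-config", "", "datastore config file (datastore backend only)")
	jobSize := fs.Uint("parallel-job-size", 0, "ledgers per parallel job; 0 selects the backend default")
	if err := fs.Parse(args); err != nil {
		return reingestConfig{}, err
	}

	switch *backend {
	case backendCaptiveCore:
		if *dsConfig != "" {
			return reingestConfig{}, fmt.Errorf("datastore-config is only valid with ledgerbackend=%s", backendDatastore)
		}
	case backendDatastore:
		if *dsConfig == "" {
			return reingestConfig{}, fmt.Errorf("ledgerbackend=%s requires datastore-config", backendDatastore)
		}
	default:
		return reingestConfig{}, fmt.Errorf("unknown ledgerbackend %q", *backend)
	}

	cfg := reingestConfig{Backend: *backend, DatastoreConfig: *dsConfig, ParallelJobSize: *jobSize}
	if cfg.ParallelJobSize == 0 {
		cfg.ParallelJobSize = defaultParallelJobSize(cfg.Backend)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseReingestFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid flags:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Choosing the job-size default after the backend is known keeps an explicit operator override intact while still giving each backend its own default.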

As part of scope considerations, we will not address the following in the first deliverable; these can be done in later iterations:

  • CLI flag validation for captive core vs. the new precomputed TxMeta backend; several captive core settings are global.
  • Updating any ops-deployed environments or CI jobs (Jenkins) that run horizon db reingest.
@tamirms tamirms moved this from Backlog to Next Sprint Proposal in Platform Scrum Jun 15, 2023
@mollykarcher mollykarcher added performance issues aimed at improving performance and removed snapshots labels Jun 15, 2023
@mollykarcher mollykarcher moved this from Next Sprint Proposal to Backlog in Platform Scrum Jun 15, 2023
@mollykarcher mollykarcher moved this from Backlog to Next Sprint Proposal in Platform Scrum Jun 22, 2023
@mollykarcher
Contributor

1 (DatabaseBackend) will be removed on completion of #4855

@tamirms
Contributor Author

tamirms commented May 31, 2024

We have added a ledger backend implementation which reads precomputed TxMeta from a data lake:

https://github.com/stellar/go/blob/master/ingest/ledgerbackend/buffered_storage_backend.go

The work remaining to complete this issue is:

  • add command line / env flags to allow horizon to toggle between ingestion via captive core and ingestion via precomputed TxMeta
  • add command line / env flags for horizon to configure the BufferedStorageBackend
  • update reingestion integration tests so we exercise reingestion via both captive core and the BufferedStorageBackend

@mollykarcher mollykarcher moved this from Backlog to To Do in Platform Scrum Jun 3, 2024
@mollykarcher mollykarcher added this to the platform sprint 47 milestone Jun 3, 2024
@sydneynotthecity

sydneynotthecity commented Jun 4, 2024

We should streamline the flags that select captive core vs. data lake ingestion so that operators cannot configure invalid combinations. The flags should be simple and easy for the operator to understand.

We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.

@sydneynotthecity

Out of scope: productionizing the code so that Horizon can fetch files from the GCS bucket. This will require coordination with ops.

@urvisavla urvisavla self-assigned this Jun 26, 2024
@sreuland sreuland moved this from To Do to In Progress in Platform Scrum Jul 2, 2024
@urvisavla
Contributor

urvisavla commented Jul 2, 2024

> We should also add a flag warning if an operator decides to run real time ingestion using the datastore ledgerbackend. Running off a datastore will introduce a lag and might be too slow to work properly with synchronous transaction submission.

How about adding an option only to horizon db reingest|fill-gaps that allows switching between captive-core and precomputed TxMeta, thereby limiting the use of precomputed TxMeta to reingestion only?
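
To illustrate the idea, here is a minimal sketch using the standard library `flag` package, where the backend option is registered only on the reingestion subcommands. The flag name `ledgerbackend`, the helper names, and the use of `serve` as the example non-reingest command are assumptions for illustration, not Horizon's actual command wiring.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// newFlagSet builds the flag set for one subcommand. The backend flag is
// registered only for the reingestion commands, so any other command
// rejects it as an unknown flag and always uses captive core.
func newFlagSet(command string) (*flag.FlagSet, *string) {
	fs := flag.NewFlagSet(command, flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are reported by the caller
	backend := "captive-core"
	if command == "reingest" || command == "fill-gaps" {
		fs.StringVar(&backend, "ledgerbackend", backend, "captive-core or datastore")
	}
	return fs, &backend
}

// backendFor returns the ledger backend selected for the given subcommand.
func backendFor(command string, args []string) (string, error) {
	fs, backend := newFlagSet(command)
	if err := fs.Parse(args); err != nil {
		return "", fmt.Errorf("%s: %w", command, err)
	}
	return *backend, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sketch <command> [flags]")
		os.Exit(2)
	}
	backend, err := backendFor(os.Args[1], os.Args[2:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("ledger backend:", backend)
}
```

With this layout, live ingestion cannot be pointed at the datastore by accident, because the option does not exist outside the reingestion commands.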

@sreuland
Contributor

sreuland commented Jul 3, 2024

@urvisavla, I think your suggestion could be woven in as a toggle for live and reingest. I've updated the acceptance sub-tasks in the ticket description to capture this; please rewrite them if they're not on target. I can work on one of the tasks in parallel, let me know. Thanks.

sreuland added a commit to sreuland/go that referenced this issue Jul 12, 2024
sreuland added a commit to sreuland/go that referenced this issue Jul 15, 2024
@sreuland sreuland moved this from In Progress to Needs Review in Platform Scrum Jul 15, 2024
sreuland added a commit that referenced this issue Jul 16, 2024
sreuland added a commit to sreuland/go that referenced this issue Jul 17, 2024
sreuland added a commit that referenced this issue Jul 17, 2024
…ds for DataStore, BufferedStorageBackend, review feedback
sreuland added a commit that referenced this issue Jul 17, 2024
sreuland added a commit that referenced this issue Jul 18, 2024
…xec error, make sure close db conn regardless during test shutdown
sreuland added a commit that referenced this issue Jul 18, 2024
sreuland added a commit that referenced this issue Jul 18, 2024
…me parameters_test cases due to ingest bug stated in comments
@github-project-automation github-project-automation bot moved this from Needs Review to Done in Platform Scrum Jul 18, 2024