exp/services/ledgerexporter: Guide to installing and running ledger exporter #5355

Merged
merged 2 commits into from
Jun 27, 2024

@urvisavla urvisavla commented Jun 20, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This PR is part of hubble-382 which includes adding instructions on how to run the ledger exporter and explaining the command line options to the public docs. Currently, I am adding the installation and running guide to the README with the goal of keeping all necessary documentation for setting up and running the ledger exporter close to the code.

Additional docs, such as a developer guide, are included in a separate PR. There will also be a guide for creating consumer apps using the data exported by the ledger exporter. Once all the documentation is in place, we can discuss and decide what information should be included in the public documentation and what can remain in the GitHub repo.

Why

HUBBLE-382

Known limitations

N/A

@urvisavla urvisavla force-pushed the lexie-docs-update branch 3 times, most recently from eed32fb to 2f8b0ee Compare June 20, 2024 21:24
@urvisavla urvisavla changed the title from "exp/services/ledgerexporter: Updated README with step by step guide to installing and running ledger exporter" to "exp/services/ledgerexporter: Step by step guide to installing and running ledger exporter" Jun 20, 2024
@urvisavla urvisavla force-pushed the lexie-docs-update branch 4 times, most recently from 9613b99 to 53214a8 Compare June 25, 2024 06:40
@urvisavla urvisavla marked this pull request as ready for review June 25, 2024 06:40
@urvisavla urvisavla changed the title from "exp/services/ledgerexporter: Step by step guide to installing and running ledger exporter" to "exp/services/ledgerexporter: Guide to installing and running ledger exporter" Jun 25, 2024
@urvisavla urvisavla requested a review from a team June 25, 2024 17:21
@tamirms tamirms left a comment

In the CLI section I like how you have added sections describing the usage and arguments for each command. However, I think the descriptions for the fill-gaps and append commands would be better left as is. There is a lot of detail in the command descriptions that @sreuland wrote in #5335 that appears to be edited away in this PR.

For example, the original description for append mode mentions the precondition for resumability:

This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.

I think it's important we keep this info in the readme because that is the only place where we communicate the prerequisite.

@urvisavla urvisavla force-pushed the lexie-docs-update branch from e81bcf9 to cae76e0 Compare June 26, 2024 05:34
urvisavla commented Jun 26, 2024

You're right. It does leave out the following points for append mode, mainly because I was unsure of their meaning:

- This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.
What does this precondition mean? Does it mean that append mode cannot be used if there is no pre-existing data?

- It’s guaranteed that ledgers exported during append mode from start and up to the last logged ledger file Uploaded {ledger file name} were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between.
Regarding the phrase “and up to the last logged ledger file Uploaded {ledger file name},” does this refer to the console logs? Also, if there are no gaps then why is the scan-and-fill command required?

It might be more useful to provide guidelines on when to use each mode for the user but I am not entirely clear on them myself. Perhaps you or @sreuland could provide more insights into this? Thanks!

@urvisavla urvisavla force-pushed the lexie-docs-update branch from cae76e0 to b918d6f Compare June 26, 2024 06:09
tamirms commented Jun 26, 2024

What does this precondition mean? Does it mean that append mode cannot be used if there is no pre-existing data?

Running append on the range [s, e] (where s <= e) will efficiently export all the ledgers in the range by skipping ahead to the last exported ledger in the range. For example if the range is [2, 100] and we have already exported [2, 50], append will skip ahead to ledger 51 and start the export process from that point.

However, append is only guaranteed to export all the ledgers in the requested range if the data lake satisfies the following precondition:

This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.

I mentioned an example above where the requested range is [2, 100] and we have already exported [2, 50]. In this example, the already exported prefix is [2, 50] and the absent remainder is [51, 100], thus the precondition is satisfied.
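
The skip-ahead in this example can be sketched roughly as follows. This is only an illustration, not the exporter's actual code: the data lake is modeled as a Python set of ledger sequence numbers, and `resume_point` is a hypothetical helper. It uses a binary search, which is one possible way to skip ahead and is only valid when the precondition holds.

```python
def resume_point(present, start, end):
    """Return the first ledger in [start, end] that still needs exporting.

    Assumption: `present` satisfies the append precondition on [start, end],
    i.e. it covers some (possibly empty) prefix of the range and nothing after.
    Under that assumption "is this ledger present?" flips from True to False
    at most once, so a binary search finds the flip point.
    """
    lo, hi = start, end + 1  # hi == end + 1 means "nothing left to export"
    while lo < hi:
        mid = (lo + hi) // 2
        if mid in present:
            lo = mid + 1  # everything up to mid is already exported
        else:
            hi = mid      # the first missing ledger is at mid or earlier
    return lo


already_exported = set(range(2, 51))          # ledgers [2, 50]
print(resume_point(already_exported, 2, 100))  # 51
```

If the precondition does not hold, the search can land on the wrong ledger, which is why append gives no guarantee in that case.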

Here are some other scenarios where the precondition is satisfied:

  • the data lake is empty. The already exported prefix is empty because the data lake has no ledgers. The absent remainder is [2, 100] because all the ledgers from the requested range are missing from the data lake.
  • the data lake already contains ledgers [2, 100]. The already exported prefix is the entire requested range [2, 100]. The absent remainder is empty because all the ledgers have already been exported.

Here are some scenarios where the precondition is not satisfied:

  • the data lake contains ledgers [10, 70]. [10, 70] is not a prefix of [2, 100]. A prefix of [2, 100] must start at ledger 2.
  • the data lake contains ledgers [2, 70] and [95, 100]. [2, 70] is a prefix of [2, 100]. However, all the ledgers following the prefix must be absent in order for the precondition to be satisfied ("...and then absent for the (possibly empty) remainder"). The remainder [71, 100] is not absent because the data lake contains ledgers [95, 100].

Because the precondition is not satisfied in the 2 cases above, we cannot guarantee that append will export all the ledgers in the requested [2, 100] range. However, scan-and-fill will be able to guarantee that all the ledgers in the requested [2, 100] range will be exported because it will always iterate through every ledger in the requested range checking to see if each one is present.
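
To make the scenarios above concrete, here is a small sketch that checks the precondition and lists what a full scan would have to fill. The set-based model of the data lake and both function names are assumptions made for illustration; they are not part of the ledger exporter.

```python
def satisfies_append_precondition(present, start, end):
    """True if `present` is a (possibly empty) prefix of [start, end]
    followed by a (possibly empty) absent remainder."""
    seen_absent = False
    for seq in range(start, end + 1):
        if seq not in present:
            seen_absent = True
        elif seen_absent:
            # A present ledger after a missing one breaks the precondition.
            return False
    return True


def missing_ledgers(present, start, end):
    """Visit every ledger in the range, as scan-and-fill is described to do,
    and collect the ones that are absent."""
    return [seq for seq in range(start, end + 1) if seq not in present]


scenarios = {
    "empty data lake": set(),
    "already has [2, 100]": set(range(2, 101)),
    "has [10, 70]": set(range(10, 71)),
    "has [2, 70] and [95, 100]": set(range(2, 71)) | set(range(95, 101)),
}
for name, lake in scenarios.items():
    ok = satisfies_append_precondition(lake, 2, 100)
    print(f"{name}: precondition={ok}, missing={len(missing_ledgers(lake, 2, 100))}")
```

The first two scenarios pass the check and the last two fail it, while `missing_ledgers` reports the gaps correctly in all four because it never skips ahead.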


Regarding the phrase “and up to the last logged ledger file Uploaded {ledger file name},” does this refer to the console logs? Also, if there are no gaps then why is the scan-and-fill command required?

That part of the documentation is essentially stating that, given a requested ledger range [s, e], if the append precondition was satisfied on [s, e] before running append --start s --end e, it is guaranteed that [s, e] will still satisfy the append precondition after running append no matter how ledger exporter terminates. In other words, even if append crashes unexpectedly at any point during the export process, it will never introduce any gaps that would violate the append precondition.
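
A toy model of this crash-safety property, under the assumption that append uploads ledgers strictly in ascending order starting from the resume point (the thread implies this but does not spell it out). The helper names and the `crash_after` parameter are hypothetical and exist only for this sketch.

```python
def is_prefix_then_absent(present, start, end):
    """Check the append precondition on [start, end]."""
    flags = [seq in present for seq in range(start, end + 1)]
    # Valid shape: zero or more True values followed by zero or more False.
    return flags == sorted(flags, reverse=True)


def append_with_crash(present, start, end, crash_after):
    """Simulate append on [start, end] that dies after `crash_after` uploads."""
    lake = set(present)
    seq = start
    # Skip the already exported prefix.
    while seq <= end and seq in lake:
        seq += 1
    uploaded = 0
    # Upload in ascending order until the simulated crash or the end of the range.
    while seq <= end and uploaded < crash_after:
        lake.add(seq)
        seq += 1
        uploaded += 1
    return lake


initial = set(range(2, 51))  # [2, 50] already exported
survived = all(
    is_prefix_then_absent(append_with_crash(initial, 2, 100, k), 2, 100)
    for k in range(0, 60)
)
print(survived)  # True
```

Wherever the simulated crash happens, the lake is still a contiguous prefix of [2, 100], so a later append on the same range can resume safely.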

Also, if there are no gaps then why is the scan-and-fill command required?

This property of append only guarantees that the append precondition is maintained if the append precondition was satisfied before running append. There are many scenarios where scan-and-fill is not needed. But it's still possible to modify a data lake in such a way that the append precondition is not satisfied. For example:

  • Manually deleting a prefix of [s, e] ledgers from the data lake would break the precondition.
  • Manually deleting ledger(s) in the middle of [s, e] would break the precondition.
  • If you run append on [2, 100] and then on [150, 200], we know that the precondition is satisfied on [2, 100] and [150, 200]. However, if you consider the range [2, 200] the precondition is not satisfied because there is going to be a gap in the middle.
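
The three cases above can be reproduced with a short sketch that groups the exported ledgers into contiguous runs. As before, the set model and the function names are illustrative assumptions rather than anything from the exporter.

```python
def contiguous_runs(present):
    """Group ledger sequence numbers into inclusive (first, last) runs."""
    runs = []
    for seq in sorted(present):
        if runs and seq == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], seq)  # extend the current run
        else:
            runs.append((seq, seq))        # start a new run
    return runs


def holds_on(present, start, end):
    """The precondition holds if the ledgers present inside [start, end]
    are either empty or a single run that begins at `start`."""
    inside = {seq for seq in present if start <= seq <= end}
    runs = contiguous_runs(inside)
    return not runs or (len(runs) == 1 and runs[0][0] == start)


# append on [2, 100], then on [150, 200]
lake = set(range(2, 101)) | set(range(150, 201))
print(contiguous_runs(lake))      # [(2, 100), (150, 200)]
print(holds_on(lake, 2, 100))     # True
print(holds_on(lake, 150, 200))   # True
print(holds_on(lake, 2, 200))     # False, ledgers 101 to 149 are missing

# Manual deletions from a fully exported [2, 100]
full = set(range(2, 101))
print(holds_on(full - set(range(2, 11)), 2, 100))  # False, prefix deleted
print(holds_on(full - {40}, 2, 100))               # False, hole in the middle
```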

@urvisavla urvisavla force-pushed the lexie-docs-update branch from b918d6f to 2dae44b Compare June 26, 2024 20:02
@urvisavla

Oh, so the precondition is not necessarily for running in append mode; it's a precondition for guaranteeing an export without gaps. The phrase "This feature requires ledgers to be present" confused me into thinking that unless the condition is met, you cannot run in append mode. Now it makes sense. Thanks for explaining it in depth 🙏 I'll add the original description back.

@urvisavla urvisavla force-pushed the lexie-docs-update branch from 7d772d7 to bbf04f8 Compare June 27, 2024 16:28
@urvisavla urvisavla merged commit 6ebfa53 into stellar:master Jun 27, 2024
23 checks passed
@urvisavla urvisavla deleted the lexie-docs-update branch June 27, 2024 16:33
4 participants