Commit

exp/services/ledgerexporter: Updated README with step by step guide to installing and running ledger exporter
urvisavla committed Jun 20, 2024
1 parent 100dc4f commit 5b8e984
Showing 5 changed files with 207 additions and 81 deletions.
221 changes: 152 additions & 69 deletions exp/services/ledgerexporter/README.md
@@ -1,101 +1,184 @@
# Ledger Exporter (Work in Progress)
## Ledger Exporter: Installation and Usage Guide

The Ledger Exporter is a tool designed to export ledger data from a Stellar network and upload it to a specified destination. It supports both bounded and unbounded modes, allowing users to export a specific range of ledgers or continuously export new ledgers as they arrive on the network.
This guide provides step-by-step instructions on installing and using the Ledger Exporter, a tool that helps you export Stellar network ledger data to a Google Cloud Storage (GCS) bucket for efficient analysis and storage.

Ledger Exporter currently uses captive-core as the ledger backend and GCS as the destination data store.

# Exported Data Format
The tool allows for the export of multiple ledgers in a single exported file. The exported data is in XDR format and is compressed using zstd before being uploaded.
**Table of Contents**

```go
type LedgerCloseMetaBatch struct {
StartSequence uint32
EndSequence uint32
LedgerCloseMetas []LedgerCloseMeta
}
```
* [Prerequisites](#prerequisites)
* [Installation Steps](#installation-steps)
* [Set Up GCP Credentials](#set-up-gcp-credentials)
* [Create a GCS Bucket](#create-a-gcs-bucket)
* [Configuration](#configuration)
* [Create a Configuration File (`config.toml`)](#create-a-configuration-file-configtoml)
* [Running the Ledger Exporter](#running-the-ledger-exporter)
* [Pull the Docker Image](#pull-the-docker-image)
* [Run the Ledger Exporter](#run-the-ledger-exporter)
* [CLI Commands](#cli-commands)
* [scan-and-fill](#1-scan-and-fill)
* [append](#2-append)

## Getting Started
## Prerequisites

### Installation (coming soon)
* **Google Cloud Platform (GCP) Account:** You'll need a GCP account to create a GCS bucket for storing the exported data.
* **Docker:** Allows you to run the Ledger Exporter in a self-contained environment. The official installation guide: [https://docs.docker.com/engine/install/](https://docs.docker.com/engine/install/)

### Command Line Options
## Installation Steps

#### Scan and Fill Mode:
Exports a specific range of ledgers, defined by --start and --end. Will only export to remote datastore if data is absent.
```bash
ledgerexporter scan-and-fill --start <start_ledger> --end <end_ledger> --config-file <config_file_path>
```
### Set Up GCP Credentials

#### Append Mode:
Exports ledgers, starting the search from `--start` and looking for the next absent ledger sequence number following `--start` on the data store. If an absence is detected, the export range is narrowed to `--start <absent_ledger_sequence>`.
This feature requires ledgers to be present on the remote data store for some (possibly empty) prefix of the requested range and then absent for the (possibly empty) remainder.
Create application default credentials for your Google Cloud Platform (GCP) project by following these steps:
1. Download the [SDK](https://cloud.google.com/sdk/docs/install).
2. Install and initialize the [gcloud CLI](https://cloud.google.com/sdk/docs/initializing).
3. Create [application authentication credentials](https://cloud.google.com/docs/authentication/provide-credentials-adc#google-idp) and store them in a secure location on your system, such as `$HOME/.config/gcloud/application_default_credentials.json`.

In this mode, the `--end` ledger can be provided to stop the process once the export has reached that ledger; if it is absent or 0, the exporter continuously exports new ledgers emitted from the network.
For detailed instructions, refer to the [Providing Credentials for Application Default Credentials (ADC) guide.](https://cloud.google.com/docs/authentication/provide-credentials-adc)

It’s guaranteed that ledgers exported during `append` mode from `start` and up to the last logged ledger file `Uploaded {ledger file name}` were contiguous, meaning all ledgers within that range were exported to the data lake with no gaps or missing ledgers in between.
```bash
ledgerexporter append --start <start_ledger> --config-file <config_file_path>
```
### Create a GCS Bucket

### Configuration (toml):
The `stellar_core_config` supports two ways for configuring captive core:
- use prebuilt captive core config toml, archive urls, and passphrase based on `stellar_core_config.network = testnet|pubnet`.
- manually set the captive core config by supplying these core parameters, which override any defaults even when `stellar_core_config.network` is also present:
`stellar_core_config.captive_core_toml_path`
`stellar_core_config.history_archive_urls`
`stellar_core_config.network_passphrase`
1. Go to the GCP Console's Storage section ([https://console.cloud.google.com/storage](https://console.cloud.google.com/storage)) and create a new bucket.
2. Choose a descriptive name for the bucket, such as `stellar-ledger-data`.
3. **Note down the bucket name** as you'll need it later in the configuration process.

Ensure you have stellar-core installed and set `stellar_core_config.stellar_core_binary_path` to its path on the OS.
## Configuration

Enable a web service that is bound to a localhost port and publishes metrics by including `admin_port = {port}`.
### Create a Configuration File (`config.toml`)

The configuration file specifies details about your GCS bucket, stellar network and other settings.

Replace the placeholder values in the sample file with your specific information:

<details>
<summary> Sample TOML Configuration (config.toml) </summary>

An example config, demonstrating preconfigured captive core settings and gcs data store config.
```toml
# Admin port configuration
# Specifies the port number for hosting the web service locally to publish metrics.
admin_port = 6061

[datastore_config]
# Datastore Configuration
[datastore]
# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported.
type = "GCS"

[datastore_config.params]
destination_bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/"
[datastore.parameters]
# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization.
bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/"

[datastore.schema]
# Configuration for ledger and file storage.
ledgers_per_file = 64 # Number of ledgers stored in each file.
files_per_partition = 10 # Number of files per partition directory.

[datastore_config.schema]
ledgers_per_file = 64
files_per_partition = 10
# Stellar-core Configuration
[stellar_core]
# Use default captive-core config based on network
# Options are "testnet" for the test network or "pubnet" for the public network.
network = "testnet"

[stellar_core_config]
network = "testnet"
stellar_core_binary_path = "/my/path/to/stellar-core"
captive_core_toml_path = "my-captive-core.cfg"
history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"]
network_passphrase = "test"
# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set).

# Path to the captive-core configuration file.
#captive_core_config_path = "my-captive-core.cfg"

# URLs for Stellar history archives, with multiple URLs allowed.
#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"]

# Network passphrase for the Stellar network.
#network_passphrase = "Test SDF Network ; September 2015"

# Path to stellar-core binary
# Not required when running in a Docker container as it has the stellar-core installed and path is set.
# When running outside of Docker, it will look for stellar-core in the OS path if it exists.
#stellar_core_binary_path = "/my/path/to/stellar-core"
```
</details>

### Exported Files
## Running the Ledger Exporter

#### File Organization:
- Ledgers are grouped into files, with the number of ledgers per file set by `ledgers_per_file`.
- Files are further organized into partitions, with the number of files per partition set by `files_per_partition`.
### Pull the Docker Image

### Filename Structure:
- Filenames indicate the ledger range they contain, e.g., `0-63.xdr.zstd` holds ledgers 0 to 63.
- Partition directories group files, e.g., `/0-639/` holds files for ledgers 0 to 639.
Open a terminal window and run the following command to download the Stellar Ledger Exporter Docker image:

#### Example:
with `ledgers_per_file = 64` and `files_per_partition = 10`:
- Partition names: `/0-639`, `/640-1279`, ...
- Filenames: `/0-639/0-63.xdr.zstd`, `/0-639/64-127.xdr.zstd`, ...
```bash
docker pull stellar/ledger-exporter
```

### Run the Ledger Exporter

#### Special Cases:
The following command demonstrates how to run the Ledger Exporter:

- If `ledgers_per_file` is set to 1, filenames will only contain the ledger number.
- If `files_per_partition` is set to 1, filenames will not contain the partition.
```bash
docker run --platform linux/amd64 -d \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v ${PWD}/config.toml:/config.toml \
stellar/ledger-exporter <command> [options]
```

#### Note:
- Avoid changing `ledgers_per_file` and `files_per_partition` after configuration for consistency.
**Explanation:**

#### Retrieving Data:
- To locate a specific ledger sequence, calculate the partition name and ledger file name using `files_per_partition` and `ledgers_per_file`.
- The `GetObjectKeyFromSequenceNumber` function automates this calculation.
* `--platform linux/amd64`: Specifies the platform architecture (adjust if needed for your system).
* `-d`: Runs the container in detached mode (background process).
* `-v`: Mounts volumes to map your local GCP credentials and config.toml file to the container:
* `$HOME/.config/gcloud/application_default_credentials.json`: Your local GCP credentials file.
* `${PWD}/config.toml`: Your local configuration file.
* `-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json`: Sets the environment variable for credentials within the container.
* `stellar/ledger-exporter`: The Docker image name.
* `<command>`: The Stellar Ledger Exporter command (e.g., [scan-and-fill](#1-scan-and-fill), [append](#2-append))

## CLI Commands

The Stellar Ledger Exporter offers two primary commands to manage ledger data export:

### 1. scan-and-fill

**Purpose:**
Exports a specific range of Stellar ledgers, defined by the `--start` and `--end` options.

**Behavior:**
- Scans the specified ledger sequence range.
- Exports only missing ledgers to the remote datastore (GCS bucket).
- Avoids unnecessary exports if data is already present.

**Usage:**

```bash
docker run --platform linux/amd64 -d \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v ${PWD}/config.toml:/config.toml \
stellar/ledger-exporter \
scan-and-fill --start <start_ledger> --end <end_ledger> [--config <config_file>]
```

Arguments:
- `--start <start_ledger>` (required): The starting ledger sequence number in the range to export.
- `--end <end_ledger>` (required): The ending ledger sequence number in the range.
- `--config <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory.

### 2. append

**Purpose:**
Exports ledgers starting from `--start`, searching for the next missing ledger sequence number in the datastore. If a missing ledger is found, the export begins from that missing ledger.

**Behavior:**
- Starts searching from the provided `--start` ledger and identifies the first missing ledger sequence number after `--start` in the remote datastore (GCS bucket).
- Narrows the export range to include only missing ledgers from that point onwards.
- If the `--end` ledger is provided, it will stop the process once export has reached that ledger. If the `--end` ledger is absent or set to 0, the exporter will continuously export new ledgers as they appear on the network.

**Usage:**

```bash
docker run --platform linux/amd64 -d \
-v "$HOME/.config/gcloud/application_default_credentials.json":/.config/gcp/credentials.json:ro \
-e GOOGLE_APPLICATION_CREDENTIALS=/.config/gcp/credentials.json \
-v ${PWD}/config.toml:/config.toml \
stellar/ledger-exporter \
append --start <start_ledger> [--end <end_ledger>] [--config <config_file>]
```

Arguments:
- `--start <start_ledger>` (required): The starting ledger sequence number for the export process.
- `--end <end_ledger>` (optional): The ending ledger sequence number. If omitted or set to 0, the exporter will continuously export new ledgers as they appear on the network.
- `--config <config_file_path>` (optional): The path to your configuration file, containing details like GCS bucket information. Defaults to `config.toml` in the runtime working directory.
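
The search that `append` performs can be pictured with a small sketch. It assumes the ledgers already present in the datastore form an unbroken run beginning at `--start`, with everything after that run missing; under that assumption a binary search finds the resume point. The `firstMissing` function and its `exists` callback are hypothetical names for illustration, the sketch checks individual ledgers rather than ledger files, and nothing in this commit confirms that the exporter uses this exact strategy.

```go
package main

import (
	"fmt"
	"sort"
)

// firstMissing returns the lowest ledger sequence in [start, end] for which
// exists reports false, or end+1 if every ledger in the range is present.
// It assumes the present ledgers form a contiguous prefix of the range, so
// that "missing" is false for the prefix and true for the remainder.
func firstMissing(start, end uint32, exists func(seq uint32) bool) uint32 {
	n := int(end-start) + 1
	i := sort.Search(n, func(i int) bool {
		return !exists(start + uint32(i))
	})
	return start + uint32(i)
}

func main() {
	// Hypothetical datastore that already holds ledgers 100 through 149.
	stored := func(seq uint32) bool { return seq >= 100 && seq < 150 }
	fmt.Println(firstMissing(100, 200, stored)) // prints 150
}
```

If the datastore had gaps inside the stored run, a binary search could skip over them, which is why the contiguous-prefix assumption matters for this approach.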
44 changes: 44 additions & 0 deletions exp/services/ledgerexporter/config.example.toml
@@ -0,0 +1,44 @@

# Sample TOML Configuration

# Admin port configuration
# Specifies the port number for hosting the web service locally to publish metrics.
admin_port = 6061

# Datastore Configuration
[datastore]
# Specifies the type of datastore. Currently, only Google Cloud Storage (GCS) is supported.
type = "GCS"

[datastore.parameters]
# The Google Cloud Storage bucket path for storing data, with optional subpaths for organization.
bucket_path = "your-bucket-name/<optional_subpath1>/<optional_subpath2>/"

[datastore.schema]
# Configuration for ledger and file storage.
ledgers_per_file = 64 # Number of ledgers stored in each file.
files_per_partition = 10 # Number of files per partition directory.

# Stellar-core Configuration
[stellar_core]
# Use default captive-core config based on network
# Options are "testnet" for the test network or "pubnet" for the public network.
network = "testnet"

# Alternatively, you can manually configure captive-core parameters (overrides defaults if 'network' is set).

# Path to the captive-core configuration file.
#captive_core_config_path = "my-captive-core.cfg"

# URLs for Stellar history archives, with multiple URLs allowed.
#history_archive_urls = ["http://testarchiveurl1", "http://testarchiveurl2"]

# Network passphrase for the Stellar network.
#network_passphrase = "Test SDF Network ; September 2015"

# Path to stellar-core binary
# Not required when running in a Docker container as it has the stellar-core installed and path is set.
# When running outside of Docker, it will look for stellar-core in the OS path if it exists.
# If you want to override the path, you can do so here.
#stellar_core_binary_path = "/my/path/to/stellar-core"
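
The `ledgers_per_file` and `files_per_partition` settings decide where a given ledger ends up in the bucket. The following is a minimal sketch of that arithmetic, based only on the naming examples in the earlier README text (`/0-639/0-63.xdr.zstd` for 64 ledgers per file and 10 files per partition). The `schema` type and `objectKey` function are hypothetical and are not the exporter's own `GetObjectKeyFromSequenceNumber`; the sketch omits the leading slash and assumes both settings are greater than zero.

```go
package main

import "fmt"

// schema mirrors the two [datastore.schema] settings from config.toml.
type schema struct {
	LedgersPerFile    uint32
	FilesPerPartition uint32
}

// objectKey is a hypothetical helper that derives the partition directory and
// file name for a ledger sequence, following the naming shown in the README.
func objectKey(s schema, seq uint32) string {
	key := ""
	// With a single file per partition, the partition directory is omitted.
	if s.FilesPerPartition > 1 {
		partitionSize := s.LedgersPerFile * s.FilesPerPartition
		partStart := seq / partitionSize * partitionSize
		key = fmt.Sprintf("%d-%d/", partStart, partStart+partitionSize-1)
	}
	fileStart := seq / s.LedgersPerFile * s.LedgersPerFile
	// With a single ledger per file, the name holds only the ledger number.
	if s.LedgersPerFile == 1 {
		return key + fmt.Sprintf("%d.xdr.zstd", fileStart)
	}
	return key + fmt.Sprintf("%d-%d.xdr.zstd", fileStart, fileStart+s.LedgersPerFile-1)
}

func main() {
	s := schema{LedgersPerFile: 64, FilesPerPartition: 10}
	fmt.Println(objectKey(s, 100)) // prints 0-639/64-127.xdr.zstd
	fmt.Println(objectKey(s, 700)) // prints 640-1279/640-703.xdr.zstd
}
```

Because every key is derived from these two values, changing them after data has been exported would make previously written files unreachable by the same calculation.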

1 change: 0 additions & 1 deletion exp/services/ledgerexporter/config.toml
@@ -10,5 +10,4 @@ files_per_partition = 64000

[stellar_core_config]
network = "testnet"
stellar_core_binary_path = "/usr/local/bin/stellar-core"

8 changes: 4 additions & 4 deletions exp/services/ledgerexporter/internal/main.go
@@ -39,7 +39,7 @@ func defineCommands() {
RunE: func(cmd *cobra.Command, args []string) error {
settings := bindCliParameters(cmd.PersistentFlags().Lookup("start"),
cmd.PersistentFlags().Lookup("end"),
cmd.PersistentFlags().Lookup("config-file"),
cmd.PersistentFlags().Lookup("config"),
)
settings.Mode = ScanFill
return ledgerExporterCmdRunner(settings)
@@ -52,7 +52,7 @@ func defineCommands() {
RunE: func(cmd *cobra.Command, args []string) error {
settings := bindCliParameters(cmd.PersistentFlags().Lookup("start"),
cmd.PersistentFlags().Lookup("end"),
cmd.PersistentFlags().Lookup("config-file"),
cmd.PersistentFlags().Lookup("config"),
)
settings.Mode = Append
return ledgerExporterCmdRunner(settings)
@@ -64,14 +64,14 @@ func defineCommands() {

scanAndFillCmd.PersistentFlags().Uint32P("start", "s", 0, "Starting ledger (inclusive), must be set to a value greater than 1")
scanAndFillCmd.PersistentFlags().Uint32P("end", "e", 0, "Ending ledger (inclusive), must be set to value greater than 'start' and less than the network's current ledger")
scanAndFillCmd.PersistentFlags().String("config-file", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.")
scanAndFillCmd.PersistentFlags().String("config", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.")
viper.BindPFlags(scanAndFillCmd.PersistentFlags())

appendCmd.PersistentFlags().Uint32P("start", "s", 0, "Starting ledger (inclusive), must be set to a value greater than 1")
appendCmd.PersistentFlags().Uint32P("end", "e", 0, "Ending ledger (inclusive), optional, setting to non-zero means bounded mode, "+
"only export ledgers from 'start' up to 'end' value which must be greater than 'start' and less than the network's current ledger. "+
"If 'end' is absent or '0' means unbounded mode, exporter will continue to run indefintely and export the latest closed ledgers from network as they are generated in real time.")
appendCmd.PersistentFlags().String("config-file", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.")
appendCmd.PersistentFlags().String("config", "config.toml", "Path to the TOML config file. Defaults to 'config.toml' on runtime working directory path.")
viper.BindPFlags(appendCmd.PersistentFlags())
}
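
The flag descriptions above state the constraints on `start` and `end` only as help text. A minimal sketch of how those constraints could be checked follows; `validateRange` and its `allowUnbounded` parameter are hypothetical, the check against the network's current ledger is left out because it needs a live network, and this diff does not show where (or whether) the exporter enforces the rules in this form.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRange is a hypothetical check of the rules in the flag help text:
// start must be greater than 1, and a non-zero end must be greater than start.
// allowUnbounded models append mode, where end == 0 means "run indefinitely".
func validateRange(start, end uint32, allowUnbounded bool) error {
	if start <= 1 {
		return errors.New("start must be greater than 1")
	}
	if end == 0 {
		if allowUnbounded {
			return nil
		}
		return errors.New("end is required")
	}
	if end <= start {
		return errors.New("end must be greater than start")
	}
	return nil
}

func main() {
	fmt.Println(validateRange(4, 5, false)) // scan-and-fill with both bounds
	fmt.Println(validateRange(4, 0, true))  // append in unbounded mode
	fmt.Println(validateRange(4, 0, false)) // scan-and-fill without an end
}
```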

14 changes: 7 additions & 7 deletions exp/services/ledgerexporter/internal/main_test.go
@@ -29,12 +29,12 @@ func TestFlagsOutput(t *testing.T) {
}{
{
name: "no sub-command",
commandArgs: []string{"--start", "4", "--end", "5", "--config-file", "myfile"},
commandArgs: []string{"--start", "4", "--end", "5", "--config", "myfile"},
expectedErrOutput: "Error: ",
},
{
name: "append sub-command with start and end present",
commandArgs: []string{"append", "--start", "4", "--end", "5", "--config-file", "myfile"},
commandArgs: []string{"append", "--start", "4", "--end", "5", "--config", "myfile"},
expectedErrOutput: "",
appRunner: appRunnerSuccess,
expectedSettings: RuntimeSettings{
@@ -46,7 +46,7 @@ func TestFlagsOutput(t *testing.T) {
},
{
name: "append sub-command with start and end absent",
commandArgs: []string{"append", "--config-file", "myfile"},
commandArgs: []string{"append", "--config", "myfile"},
expectedErrOutput: "",
appRunner: appRunnerSuccess,
expectedSettings: RuntimeSettings{
@@ -58,13 +58,13 @@ func TestFlagsOutput(t *testing.T) {
},
{
name: "append sub-command prints app error",
commandArgs: []string{"append", "--start", "4", "--end", "5", "--config-file", "myfile"},
commandArgs: []string{"append", "--start", "4", "--end", "5", "--config", "myfile"},
expectedErrOutput: "test error",
appRunner: appRunnerError,
},
{
name: "scanfill sub-command with start and end present",
commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config-file", "myfile"},
commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config", "myfile"},
expectedErrOutput: "",
appRunner: appRunnerSuccess,
expectedSettings: RuntimeSettings{
@@ -76,7 +76,7 @@ func TestFlagsOutput(t *testing.T) {
},
{
name: "scanfill sub-command with start and end absent",
commandArgs: []string{"scan-and-fill", "--config-file", "myfile"},
commandArgs: []string{"scan-and-fill", "--config", "myfile"},
expectedErrOutput: "",
appRunner: appRunnerSuccess,
expectedSettings: RuntimeSettings{
@@ -88,7 +88,7 @@ func TestFlagsOutput(t *testing.T) {
},
{
name: "scanfill sub-command prints app error",
commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config-file", "myfile"},
commandArgs: []string{"scan-and-fill", "--start", "4", "--end", "5", "--config", "myfile"},
expectedErrOutput: "test error",
appRunner: appRunnerError,
},