diff --git a/README.md b/README.md
index 1501fa3c..544608ff 100644
--- a/README.md
+++ b/README.md
@@ -1,210 +1,123 @@
+# **Stellar ETL**
-# Stellar ETL
The Stellar-ETL is a data pipeline that allows users to extract data from the history of the Stellar network.
## **Table of Contents**
- [Exporting the Ledger Chain](#exporting-the-ledger-chain)
- - [Command Reference](#command-reference)
- - [Bucket List Commands](#bucket-list-commands)
- - [export_accounts](#export_accounts)
- - [export_offers](#export_offers)
- - [export_trustlines](#export_trustlines)
- - [export_claimable_balances](#export_claimable_balances)
- - [export_pools](#export_pools)
- - [export_signers](#export_signers)
- - [export_contract_data (futurenet, testnet)](#export_contract_data)
- - [export_contract_code (futurenet, testnet)](#export_contract_code)
- - [export_config_settings (futurenet, testnet)](#export_config_settings)
- - [export_ttl (futurenet, testnet)](#export_ttl)
- - [History Archive Commands](#history-archive-commands)
- - [export_ledgers](#export_ledgers)
- - [export_transactions](#export_transactions)
- - [export_operations](#export_operations)
- - [export_effects](#export_effects)
- - [export_assets](#export_assets)
- - [export_trades](#export_trades)
- - [export_diagnostic_events (futurenet, testnet)](#export_diagnostic_events)
- - [Stellar Core Commands](#stellar-core-commands)
- - [export_ledger_entry_changes](#export_ledger_entry_changes)
- - [export_orderbooks (unsupported)](#export_orderbooks-unsupported)
- - [Utility Commands](#utility-commands)
- - [get_ledger_range_from_times](#get_ledger_range_from_times)
+- [Command Reference](#command-reference)
+ - [Export Commands](#export-commands)
+ - [export_ledgers](#export_ledgers)
+ - [export_transactions](#export_transactions)
+ - [export_operations](#export_operations)
+ - [export_effects](#export_effects)
+ - [export_assets](#export_assets)
+ - [export_trades](#export_trades)
+ - [export_diagnostic_events](#export_diagnostic_events)
+ - [export_ledger_entry_changes](#export_ledger_entry_changes)
+ - [Utility Commands](#utility-commands)
+ - [get_ledger_range_from_times](#get_ledger_range_from_times)
- [Schemas](#schemas)
- [Extensions](#extensions)
- [Adding New Commands](#adding-new-commands)
-
+
+---
-# Exporting the Ledger Chain
+# **Exporting the Ledger Chain**
## **Docker**
+
1. Download the latest version of [Docker](https://www.docker.com/get-started)
-2. Pull the stellar-etl Docker image: `docker pull stellar/stellar-etl`
-3. Run the Docker images with the desired stellar-etl command: `docker run stellar/stellar-etl stellar-etl [etl-command] [etl-command arguments]`
+2. Pull the latest stellar-etl Docker image: `docker pull stellar/stellar-etl:latest`
+3. Run the Docker image with the desired stellar-etl command: `docker run stellar/stellar-etl:latest stellar-etl [etl-command] [etl-command arguments]`
## **Manual Installation**
-1. Install Golang v1.19.0 or later: https://golang.org/dl/
+1. Install Golang v1.22.1 or later: https://golang.org/dl/
2. Ensure that your Go bin has been added to the PATH env variable: `export PATH=$PATH:$(go env GOPATH)/bin`
-3. Download and install Stellar-Core v19.0.0 or later: https://github.com/stellar/stellar-core/blob/master/INSTALL.md
-
-4. Run `go get github.com/stellar/stellar-etl` to install the ETL
-
+3. If using captive-core, download and install Stellar-Core v20.0.0 or later: https://github.com/stellar/stellar-core/blob/master/INSTALL.md
+4. Run `go install github.com/stellar/stellar-etl@latest` to install the ETL
5. Run export commands to export information about the ledger
-## **Command Reference**
-- [Bucket List Commands](#bucket-list-commands)
- - [export_accounts](#export_accounts)
- - [export_offers](#export_offers)
- - [export_trustlines](#export_trustlines)
- - [export_claimable_balances](#export_claimable_balances)
- - [export_pools](#export_pools)
- - [export_signers](#export_signers)
- - [export_contract_data](#export_contract_data)
- - [export_contract_code](#export_contract_code)
- - [export_config_settings](#export_config_settings)
- - [export_ttl](#export_ttl)
-- [History Archive Commands](#history-archive-commands)
- - [export_ledgers](#export_ledgers)
- - [export_transactions](#export_transactions)
- - [export_operations](#export_operations)
- - [export_effects](#export_effects)
- - [export_assets](#export_assets)
- - [export_trades](#export_trades)
- - [export_diagnostic_events](#export_diagnostic_events)
- - [Stellar Core Commands](#stellar-core-commands)
- - [export_orderbooks (unsupported)](#export_orderbooks-unsupported)
- - [Utility Commands](#utility-commands)
- - [get_ledger_range_from_times](#get_ledger_range_from_times)
-
-Every command accepts a `-h` parameter, which provides a help screen containing information about the command, its usage, and its flags.
-
-Commands have the option to read from testnet with the `--testnet` flag, from futurenet with the `--futurenet` flag, and defaults to reading from mainnet without any flags.
-> *_NOTE:_* Adding both flags will default to testnet. Each stellar-etl command can only run from one network at a time.
-
-
-
-***
-
-## **Bucket List Commands**
-
-These commands use the bucket list in order to ingest large amounts of data from the history of the stellar ledger. If you are trying to read large amounts of information in order to catch up to the current state of the ledger, these commands provide a good way to catchup quickly. However, they don't allow for custom start-ledger values. For updating within a user-defined range, see the Stellar Core commands.
-
-> *_NOTE:_* In order to get information within a specified ledger range for bucket list commands, see the export_ledger_entry_changes command.
-
-
-
-### **export_accounts**
-
-```bash
-> stellar-etl export_accounts --end-ledger 500000 --output exported_accounts.txt
-```
-
-Exports historical account data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account information within a specified ledger range, see the export_ledger_entry_changes command.
-
-
-
-### **export_offers**
+## **Manual build for local development**
-```bash
-> stellar-etl export_offers --end-ledger 500000 --output exported_offers.txt
-```
+1. Clone this repo: `git clone https://github.com/stellar/stellar-etl`
+2. Build stellar-etl with `go build`
+3. Run export commands to export information about the ledger
-Exports historical offer data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get offer information within a specified ledger range, see the export_ledger_entry_changes command.
-
-
+> _*Note:*_ If using the GCS datastore, you can run the following to set the GCP credentials used in your shell:
-### **export_trustlines**
-
-```bash
-> stellar-etl export_trustlines --end-ledger 500000 --output exported_trustlines.txt
```
-
-Exports historical trustline data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get trustline information within a specified ledger range, see the export_ledger_entry_changes command.
-
-
-
-### **export_claimable_balances**
-
-```bash
-> stellar-etl export_claimable_balances --end-ledger 500000 --output exported_claimable_balances.txt
+gcloud auth login
+gcloud config set project dev-hubble
+gcloud auth application-default login
```
-Exports claimable balances data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get claimable balances information within a specified ledger range, see the export_ledger_entry_changes command.
+> _*Note:*_ Instructions for installing gcloud can be found [here](https://cloud.google.com/sdk/docs/install-sdk)
-### **export_pools**
+---
-```bash
-> stellar-etl export_pools --end-ledger 500000 --output exported_pools.txt
-```
-
-Exports historical liquidity pools data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get liquidity pools information within a specified ledger range, see the export_ledger_entry_changes command.
-
-
-
-### **export_signers**
-
-```bash
-> stellar-etl export_signers --end-ledger 500000 --output exported_signers.txt
-```
-
-Exports historical account signers data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get account signers information within a specified ledger range, see the export_ledger_entry_changes command.
+# **Command Reference**
-
-
-### **export_contract_data**
+- [Export Commands](#export-commands)
+ - [export_ledgers](#export_ledgers)
+ - [export_transactions](#export_transactions)
+ - [export_operations](#export_operations)
+ - [export_effects](#export_effects)
+ - [export_assets](#export_assets)
+ - [export_trades](#export_trades)
+ - [export_diagnostic_events](#export_diagnostic_events)
+ - [export_ledger_entry_changes](#export_ledger_entry_changes)
+- [Utility Commands](#utility-commands)
+ - [get_ledger_range_from_times](#get_ledger_range_from_times)
-```bash
-> stellar-etl export_contract_data --end-ledger 500000 --output export_contract_data.txt
-```
-
-Exports historical contract data data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract data information within a specified ledger range, see the export_ledger_entry_changes command.
-
-
-
-### **export_contract_code**
+Every command accepts a `-h` parameter, which provides a help screen containing information about the command, its usage, and its flags.
-```bash
-> stellar-etl export_contract_code --end-ledger 500000 --output export_contract_code.txt
-```
+Commands have the option to read from testnet with the `--testnet` flag or from futurenet with the `--futurenet` flag, and default to reading from mainnet when no network flag is set.
-Exports historical contract code data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get contract code information within a specified ledger range, see the export_ledger_entry_changes command.
+> _*NOTE:*_ Adding both flags will default to testnet. Each stellar-etl command can only run from one network at a time.
-### **export_config_settings**
+---
-```bash
-> stellar-etl export_config_settings --end-ledger 500000 --output export_config_settings.txt
-```
+## **Export Commands**
-Exports historical config settings data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get config settings data information within a specified ledger range, see the export_ledger_entry_changes command.
+These commands export information using the [Ledger Exporter](https://github.com/stellar/go/blob/master/exp/services/ledgerexporter/README.md) output files within a specified datastore (currently [datastore](https://github.com/stellar/go/tree/master/support/datastore) only supports GCS). This allows users to provide a start and end ledger range. The commands in this category export a list of everything that occurred within the provided range. All of the ranges are inclusive.
-
+> _*NOTE:*_ The datastore must contain the expected compressed LedgerCloseMetaBatch XDR binary files as exported from [Ledger Exporter](https://github.com/stellar/go/blob/master/exp/services/ledgerexporter/README.md#exported-files).
-### **export_ttl**
+#### Common Flags
-```bash
-> stellar-etl export_ttl --end-ledger 500000 --output export_ttl.txt
-```
+| Flag | Description | Default |
+| -------------- | --------------------------------------------------------------------------------------------- | ----------------------- |
+| start-ledger | The ledger sequence number for the beginning of the export period. Defaults to genesis ledger | 2 |
+| end-ledger | The ledger sequence number for the end of the export range | 0 |
+| strict-export | If set, transform errors will be fatal | false |
+| testnet | If set, will connect to Testnet instead of Pubnet | false |
+| futurenet | If set, will connect to Futurenet instead of Pubnet | false |
+| extra-fields   | Additional fields to append to output JSON. Used for appending metadata                        | ---                     |
+| captive-core | If set, run captive core to retrieve data. Otherwise use TxMeta file datastore | false |
+| datastore-path | Datastore bucket path to read txmeta files from | ledger-exporter/ledgers |
+| buffer-size | Buffer size sets the max limit for the number of txmeta files that can be held in memory | 1000 |
+| num-workers | Number of workers to spawn that read txmeta files from the datastore | 5 |
+| retry-limit | Datastore GetLedger retry limit | 3 |
+| retry-wait | Time in seconds to wait for GetLedger retry | 5 |
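The `retry-limit` and `retry-wait` flags can be read as a simple retry loop. A minimal sketch of the assumed semantics follows (the actual datastore code may differ, and `fetch_ledger` is a hypothetical placeholder command):

```shell
# Assumed semantics of --retry-limit / --retry-wait, for illustration only:
# run a command, and on failure retry up to "limit" more times, sleeping
# "wait_s" seconds between attempts.
with_retries() {
  limit=$1; wait_s=$2; shift 2
  n=0
  while ! "$@"; do
    n=$((n + 1))
    if [ "$n" -gt "$limit" ]; then
      return 1
    fi
    sleep "$wait_s"
  done
}

# Hypothetical usage mirroring the defaults (retry-limit=3, retry-wait=5):
#   with_retries 3 5 fetch_ledger
```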
-Exports historical expiration data from the genesis ledger to the provided end-ledger to an output file. The command reads from the bucket list, which includes the full history of the Stellar ledger. As a result, it should be used in an initial data dump. In order to get expiration information within a specified ledger range, see the export_ledger_entry_changes command.
+> _*NOTE:*_ Using captive-core requires a Stellar Core instance that is v20.0.0 or later. The commands use the Core instance to retrieve information about changes from the ledger. More information about the Stellar ledger can be found [here](https://developers.stellar.org/network/horizon/api-reference/resources).
+>
+> As the Stellar network grows, the Stellar Core instance has to catch up on an increasingly large amount of information. This catch-up process can add some overhead to the commands in this category. To avoid this overhead, prefer processing larger ranges instead of many small ones, or use unbounded mode.
+>
+> Recommended resources for running captive-core within a Kubernetes pod:
+> ```
+> {cpu: 3.5, memory: 20Gi, ephemeral-storage: 12Gi}
+> ```
-***
-
-## **History Archive Commands**
-
-These commands export information using the history archives. This allows users to provide a start and end ledger range. The commands in this category export a list of everything that occurred within the provided range. All of the ranges are inclusive.
-
-> *_NOTE:_* Commands except `export_ledgers` and `export_assets` also require Captive Core to export data.
-
-
+---
### **export_ledgers**
@@ -213,10 +126,12 @@ These commands export information using the history archives. This allows users
--end-ledger 500000 --output exported_ledgers.txt
```
-This command exports ledgers within the provided range.
+This command exports ledgers within the provided range.
+---
+
### **export_transactions**
```bash
@@ -228,6 +143,8 @@ This command exports transactions within the provided range.
+---
+
### **export_operations**
```bash
@@ -239,6 +156,8 @@ This command exports operations within the provided range.
+---
+
### **export_effects**
```bash
@@ -250,7 +169,10 @@ This command exports effects within the provided range.
+---
+
### **export_assets**
+
```bash
> stellar-etl export_assets \
--start-ledger 1000 \
@@ -261,7 +183,10 @@ Exports the assets that are created from payment operations over a specified led
+---
+
### **export_trades**
+
```bash
> stellar-etl export_trades \
--start-ledger 1000 \
@@ -272,7 +197,10 @@ Exports trade data within the specified range to an output file
+---
+
### **export_diagnostic_events**
+
```bash
> stellar-etl export_diagnostic_events \
--start-ledger 1000 \
@@ -283,15 +211,7 @@ Exports diagnostic events data within the specified range to an output file
-***
-
-## **Stellar Core Commands**
-
-These commands require a Stellar Core instance that is v19.0.0 or later. The commands use the Core instance to retrieve information about changes from the ledger. These changes can be in the form of accounts, offers, trustlines, claimable balances, liquidity pools, or account signers.
-
-As the Stellar network grows, the Stellar Core instance has to catch up on an increasingly large amount of information. This catch-up process can add some overhead to the commands in this category. In order to avoid this overhead, run prefer processing larger ranges instead of many small ones, or use unbounded mode.
-
-
+---
### **export_ledger_entry_changes**
@@ -302,82 +222,79 @@ As the Stellar network grows, the Stellar Core instance has to catch up on an in
This command exports ledger changes within the provided ledger range. Flags can filter which ledger entry types are exported. If no data type flags are set, then by default all types are exported. If any are set, it is assumed that the others should not be exported.
-Changes are exported in batches of a size defined by the `batch-size` flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points for the nodes on the network, so it is beneficial to export in multiples of 64.
+Changes are exported in batches of a size defined by the `--batch-size` flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points for the nodes on the network, so it is beneficial to export in multiples of 64.
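The checkpoint alignment described above can be sketched as arithmetic. This assumes Stellar's usual checkpoint convention, where a checkpoint ledger occurs at every sequence for which `(sequence + 1)` is a multiple of 64 (i.e. sequences 63, 127, 191, ...):

```shell
# For a given ledger sequence, compute the sequence of the checkpoint
# ledger that covers it, assuming checkpoints where (sequence + 1) % 64 == 0.
checkpoint_for() {
  seq=$1
  echo $(( (seq / 64 + 1) * 64 - 1 ))
}

checkpoint_for 63     # -> 63 (already a checkpoint)
checkpoint_for 1000   # -> 1023
```

Because each batch of 64 ledgers ends exactly on one of these sequences, exporting in multiples of 64 keeps batch boundaries aligned with checkpoints.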
This command has two modes: bounded and unbounded.
#### **Bounded**
- If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.
-
-#### **Unbounded**
-If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, the Stellar Core connects to the Stellar network and processes new changes as they occur on the network. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new Stellar Core instances.
-
-
-
-### **export_orderbooks (unsupported)**
-
-```bash
-> stellar-etl export_orderbooks --start-ledger 1000 \
---end-ledger 500000 --output exported_orderbooks_folder/
-```
-> *_NOTE:_* This is an expermental feature and is currently unsupported.
+If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.
-This command exports orderbooks within the provided ledger range. Since exporting complete orderbooks at every single ledger would require an excessive amount of storage space, the output is normalized. Each batch that is exported contains multiple files, namely: `dimAccounts.txt`, `dimOffers.txt`, `dimMarkets.txt`, and `factEvents.txt`. The dim files relate a data structure to an ID. `dimMarkets`, for example, contains the buying and selling assets of a market, as well as the ID for that market. That ID is used in other places as a replacement for the full market information. This normalization process saves a significant amount of space (roughly 90% in our benchmarks). The `factEvents` file connects ledger numbers to the offer IDs that were present at that ledger.
+#### **Unbounded (Currently Unsupported)**
-Orderbooks are exported in batches of a size defined by the `batch-size` flag. By default, the batch-size parameter is set to 64 ledgers, which corresponds to a five minute period of time. This batch size is convenient because checkpoint ledgers are created every 64 ledgers. Checkpoint ledgers act as anchoring points in that once they are available, so are the previous 63 nodes. It is beneficial to export in multiples of 64.
+If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, stellar-etl will block and wait for the next sequentially written ledger file in the datastore. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new stellar-etl instances.
-This command has two modes: bounded and unbounded.
+The following are the ledger entry type flags that can be used to export data:
-#### **Bounded**
- If both a start and end ledger are provided, then the command runs in a bounded mode. This means that once all the ledgers in the range are processed and exported, the command shuts down.
-
-#### **Unbounded**
-If only a start ledger is provided, then the command runs in an unbounded fashion starting from the provided ledger. In this mode, the Stellar Core connects to the Stellar network and processes new orderbooks as they occur on the network. Since the changes are continually exported in batches, this process can be continually run in the background in order to avoid the overhead of closing and starting new Stellar Core instances.
+- export-accounts
+- export-trustlines
+- export-offers
+- export-pools
+- export-balances
+- export-contract-code
+- export-contract-data
+- export-config-settings
+- export-ttl
-***
+---
## **Utility Commands**
+These commands aid in the usage of [Export Commands](#export-commands).
+
### **get_ledger_range_from_times**
+
```bash
> stellar-etl get_ledger_range_from_times \
--start-time 2019-09-13T23:00:00+00:00 \
--end-time 2019-09-14T13:35:10+00:00 --output exported_range.txt
```
-This command exports takes in a start and end time and converts it to a ledger range. The ledger range that is returned will be the smallest possible ledger range that completely covers the provided time period.
+This command takes in a start and end time and converts it to a ledger range. The ledger range that is returned will be the smallest possible ledger range that completely covers the provided time period.
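For intuition only: since the network closes a ledger roughly every 5 seconds, you can estimate how many ledgers a time window spans. The command itself derives the exact range from recorded close times, so treat this as a back-of-the-envelope check (GNU `date` syntax assumed):

```shell
# Estimate the number of ledgers closed between two timestamps, assuming an
# average close time of ~5 seconds.
start_epoch=$(date -u -d "2019-09-13T23:00:00+00:00" +%s)
end_epoch=$(date -u -d "2019-09-14T13:35:10+00:00" +%s)
echo $(( (end_epoch - start_epoch) / 5 ))  # roughly 10502 ledgers spanned
```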
-
+
+---
# Schemas
See https://github.com/stellar/stellar-etl/blob/master/internal/transform/schema.go for the schemas of the data structures that are outputted by the ETL.
-
+---
+
# Extensions
+
This section covers some possible extensions or further work that can be done.
## **Adding New Commands**
+
In general, in order to add new commands, you need to add these files:
- - `export_new_data_structure.go` in the `cmd` folder
- - This file can be generated with cobra by calling: `cobra add {command}`
- - This file will parse flags, create output files, get the transformed data from the input package, and then export the data.
- - `export_new_data_structure_test.go` in the `cmd` folder
- - This file will contain some tests for the newly added command. The `runCLI` function does most of the heavy lifting. All the tests need is the command arguments to test and the desired output.
- - Test data should be stored in the `testdata/new_data_structure` folder
- - `new_data_structure.go` in the `internal/input` folder
- - This file will contain the methods needed to extract the new data structure from wherever it is located. This may be the history archives, the bucket list, or a captive core instance.
- - This file should extract the data and transform it, and return the transformed data.
- - If working with captive core, the methods need to work in the background. There should be methods that export batches of data and send them to a channel. There should be other methods that read from the channel and transform the data so it can be exported.
+- `export_new_data_structure.go` in the `cmd` folder
+ - This file can be generated with cobra by calling: `cobra add {command}`
+ - This file will parse flags, create output files, get the transformed data from the input package, and then export the data.
+- `export_new_data_structure_test.go` in the `cmd` folder
+ - This file will contain some tests for the newly added command. The `runCLI` function does most of the heavy lifting. All the tests need is the command arguments to test and the desired output.
+ - Test data should be stored in the `testdata/new_data_structure` folder
+- `new_data_structure.go` in the `internal/input` folder
+ - This file will contain the methods needed to extract the new data structure from wherever it is located. This may be the history archives, the bucket list, a captive core instance, or a datastore.
+ - If working with captive core, the methods need to work in the background. There should be methods that export batches of data and send them to a channel. There should be other methods that read from the channel and transform the data so it can be exported.
- `new_data_structure.go` in the `internal/transform` folder
- - This file will contain the methods needed to transform the extracted data into a form that is suitable for BigQuery.
- - The struct definition for the transformed object should be stored in `schemas.go` in the `internal/transform` folder.
+ - This file will contain the methods needed to transform the extracted data into a form that is suitable for BigQuery.
+ - The struct definition for the transformed object should be stored in `schemas.go` in the `internal/transform` folder.
A good number of common methods are already written and stored in the `util` package.