feat: Testground tooling #2456

Merged
merged 90 commits into from
Dec 8, 2023
Changes from 82 commits
Commits
a008844
feat: add genesis test package for creating testnet genesises
evan-forbes Sep 9, 2023
6e7cc58
feat!: use a specific genesis creation package
evan-forbes Sep 10, 2023
86d492b
chore: cleanup
evan-forbes Sep 10, 2023
25d60cd
fix: update the default propose timeout
evan-forbes Sep 10, 2023
f4c8718
chore: cleanup
evan-forbes Sep 10, 2023
47e1a80
chore: linter
evan-forbes Sep 10, 2023
a383494
refactor!: change names to appease the linter
evan-forbes Sep 10, 2023
39b306a
Apply suggestions from code review
evan-forbes Sep 10, 2023
e855a67
fix: ci can't handle the club right now
evan-forbes Sep 10, 2023
e16be41
fix: ci can't handle the club right now
evan-forbes Sep 10, 2023
e8b4926
Merge branch 'evan/multi-validator-genesis-creation' of https://githu…
evan-forbes Sep 10, 2023
eda02ec
feat: flesh out celestia-app's testground tests
evan-forbes Sep 11, 2023
e2de741
feat: finish first draft of testground tooling
evan-forbes Sep 12, 2023
83f0f76
chore: cleanup
evan-forbes Sep 12, 2023
db8d2f2
fix: go mod by importing specific version of app
evan-forbes Sep 12, 2023
84f3980
chore: update to use three validators
evan-forbes Sep 12, 2023
96fb988
chore: update docker file
evan-forbes Sep 12, 2023
d39e44b
fix: pass halt height
evan-forbes Sep 13, 2023
0ca3996
chore: more fixes
evan-forbes Sep 13, 2023
cf50bb4
feat: sync statuses before beginning each test
evan-forbes Sep 13, 2023
d8e2fed
chore: deubbing
evan-forbes Sep 14, 2023
195383e
chore: cleanup and switch back mnemonic
evan-forbes Sep 14, 2023
cc6950e
chore: cleanup
evan-forbes Sep 14, 2023
5949133
chore: finally working
evan-forbes Sep 14, 2023
ef17302
chore: cleanup
evan-forbes Sep 14, 2023
42b578c
feat: add ability for leader to issue commands
evan-forbes Sep 15, 2023
507059a
Merge branch 'evan/testground-test' of https://github.com/celestiaorg…
evan-forbes Sep 15, 2023
baf266e
chore: add more params
evan-forbes Sep 15, 2023
6bc03cd
feat: record the blocks in the leader's retro
evan-forbes Sep 17, 2023
0b4bf7a
chore: some debugging
evan-forbes Sep 18, 2023
c0ab032
chore!: move to util
evan-forbes Sep 18, 2023
ce19605
Merge branch 'main' into evan/multi-validator-genesis-creation
evan-forbes Sep 18, 2023
71d6a8d
Merge branch 'main' into evan/testground-test
evan-forbes Sep 19, 2023
fa1952c
Merge branch 'evan/multi-validator-genesis-creation' into evan/testgr…
evan-forbes Sep 19, 2023
aadd52e
wip:
evan-forbes Sep 20, 2023
bc943c0
finish refactor
evan-forbes Sep 23, 2023
e4dbbf6
fix: some debugging and increase to 5 nodes using txsim
evan-forbes Sep 25, 2023
e1ccf1c
fix: debug for larger validator sets and set the default to 8MB
evan-forbes Sep 25, 2023
e397795
feat: add the testground const
evan-forbes Sep 25, 2023
4ba33a6
feat: add mempool to the config
evan-forbes Sep 25, 2023
112df3c
chore: BIG BLOCK
evan-forbes Sep 26, 2023
2a5185a
refactor: big blonk
evan-forbes Sep 26, 2023
b7a8597
chore: last update to go mod and plan
evan-forbes Sep 26, 2023
6a95382
feat: use seed nodes
evan-forbes Sep 28, 2023
b53e9b7
chore: bump to latest version of core with bigger block parts
evan-forbes Oct 3, 2023
1000798
chore: use version of celestia app with bumped core
evan-forbes Oct 3, 2023
28d3953
chore: bump core again to 256kb block parts
evan-forbes Oct 3, 2023
9d2369c
feat: add influx integration
evan-forbes Oct 3, 2023
9e4ceb7
chore: use latest verison of app
evan-forbes Oct 3, 2023
b03b38e
chore: latest changes to get seeds and introspection working
evan-forbes Oct 9, 2023
cb5414f
chore: bump to the latest version of tendermint with channel tracing
evan-forbes Oct 11, 2023
0ff161f
chore: update to latest core
evan-forbes Oct 11, 2023
d79f274
chore: use an older version without tracing and 1MB block parts
evan-forbes Oct 12, 2023
2022867
chore: update to 2MB block part
evan-forbes Oct 13, 2023
bb24860
Merge branch 'main' into evan/testground-test
evan-forbes Nov 10, 2023
de36333
chore: leftover merge conficts
evan-forbes Nov 10, 2023
41b960c
chore: stop modifying the square size test for hube blocks
evan-forbes Nov 10, 2023
06309a7
chore: incorporate review feedback
evan-forbes Nov 10, 2023
796eb34
refactor: use the genesis function to create the genesis
evan-forbes Nov 10, 2023
1cb7536
docs: add flow chart
evan-forbes Nov 10, 2023
e608781
refactor: minor name change
evan-forbes Nov 12, 2023
6305141
chore!: remove redundant mechanisms to submit pfbs
evan-forbes Nov 14, 2023
4a39b27
fix: get the network running how we expect
evan-forbes Nov 16, 2023
db0fdfd
chore: final touches
evan-forbes Nov 17, 2023
3a5fcdd
fix: coderabbit suggestions
evan-forbes Nov 20, 2023
5e3a71e
fix: more suggestions
evan-forbes Nov 20, 2023
1fce736
fix: remaining suggestions
evan-forbes Nov 20, 2023
c4a0261
fix: use v1 for latest version
evan-forbes Nov 20, 2023
64641d9
chore: remove some code
evan-forbes Nov 21, 2023
5fb00a8
fix: set the protocol version during integration tests
evan-forbes Nov 21, 2023
22e7107
fix: set the protocol version to the testground one
evan-forbes Nov 21, 2023
c680cf1
Merge branch 'evan/testground-test' of https://github.com/celestiaorg…
evan-forbes Nov 21, 2023
3cf0697
docs: clarify testnode usage in godoc
evan-forbes Dec 6, 2023
261214a
docs: add go docs to funcs
evan-forbes Dec 6, 2023
4fc5f8f
docs: typo
evan-forbes Dec 6, 2023
3104b5c
Merge branch 'main' into evan/testground-test
evan-forbes Dec 6, 2023
1c6c20a
chore: update the plan.toml to use "celestia" instead of core/app
evan-forbes Dec 6, 2023
07b0a18
chore: remove setting the app version via an app optin
evan-forbes Dec 6, 2023
651b85c
fix: rename versions appropriately
evan-forbes Dec 6, 2023
6d02f50
chore: update deps
evan-forbes Dec 6, 2023
d33b898
Merge branch 'main' into evan/testground-test
evan-forbes Dec 6, 2023
b10c6e5
fix: compiler
evan-forbes Dec 6, 2023
1b7ab87
docs: add info about testground in the instructions
evan-forbes Dec 6, 2023
d698845
fix: pin to ledger-cosmos-go v0.12.4 to fix ledger
rootulp Dec 6, 2023
93cb5d4
chore: reviewer feedback
evan-forbes Dec 8, 2023
39b36da
chore: sanity checking ci
evan-forbes Dec 8, 2023
3d69fc0
Merge branch 'main' into evan/testground-test
evan-forbes Dec 8, 2023
50bc61e
fix: test
evan-forbes Dec 8, 2023
ecbfb5e
fix: update the expected version
evan-forbes Dec 8, 2023
3b8d83f
fix: expected version
evan-forbes Dec 8, 2023
6 changes: 6 additions & 0 deletions go.work
@@ -0,0 +1,6 @@
go 1.21.1

use (
.
./test/testground
)
909 changes: 909 additions & 0 deletions go.work.sum

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions pkg/appconsts/testground/app_consts.go
@@ -0,0 +1,7 @@
package testground

const (
Version uint64 = 420420420
Collaborator

[nit] Version could use a GoDoc that explains why this number looks crazy

Member Author

resolved in 93cb5d4

SquareSizeUpperBound int = 512
SubtreeRootThreshold int = 64
Collaborator

[question] related to https://github.com/celestiaorg/celestia-app/pull/2456/files#r1417502424

why define this constant if it is never used?

Member Author

resolved in 93cb5d4

we should probably do the same for v2

)
11 changes: 9 additions & 2 deletions pkg/appconsts/versioned_consts.go
@@ -1,6 +1,7 @@
package appconsts

import (
"github.com/celestiaorg/celestia-app/pkg/appconsts/testground"
v1 "github.com/celestiaorg/celestia-app/pkg/appconsts/v1"
v2 "github.com/celestiaorg/celestia-app/pkg/appconsts/v2"
)
@@ -24,8 +25,14 @@ func SubtreeRootThreshold(_ uint64) int {
// SquareSizeUpperBound is the maximum original square width possible
// for a version of the state machine. The maximum is decided through
// governance. See `DefaultGovMaxSquareSize`.
func SquareSizeUpperBound(_ uint64) int {
return v1.SquareSizeUpperBound
func SquareSizeUpperBound(v uint64) int {
switch v {
case testground.Version:
return testground.SquareSizeUpperBound
// There is currently only a single square size upper bound.
default:
return v1.SquareSizeUpperBound
}
Comment on lines +33 to +35
Collaborator

[question] Why doesn't this have case statements for each version?

	case testground.Version:
		return testground.SquareSizeUpperBound
	case v1.Version:
		return v1.SquareSizeUpperBound
	case v2.Version:
		return v2.SquareSizeUpperBound

even if v1 and v2 have the same constants, this seems more readable and future proof in case we update the v2 constant prior to cutting the v2 release

Member Author

evan-forbes Dec 6, 2023

I actually disagree with this, as with each version, we'd have to go change each if statement across the entire codebase. Besides being a bit annoying, there's no way to tell if we missed one or are being inconsistent in places. IMO, just saying default reads as "all versions use the same constant", which feels more concise than having to read each version and compare.

Collaborator

IMO this thread shouldn't block this PR and instead we should create a new issue for this.

Go doesn't have exhaustive checking (like Rust) but I don't understand why silently defaulting to an old version is preferable over a panic. I see the options as:

Current

func SquareSizeUpperBound(v uint64) int {
	switch v {
	case testground.Version:
		return testground.SquareSizeUpperBound
	// There is currently only a single square size upper bound.
	default:
		return v1.SquareSizeUpperBound
	}
}

Proposal

func SquareSizeUpperBound(v uint64) int {
	switch v {
	case testground.Version:
		return testground.SquareSizeUpperBound
	case v1.Version:
		return v1.SquareSizeUpperBound
	case v2.Version:
		return v2.SquareSizeUpperBound
	default:
		panic(fmt.Sprintf("unsupported version %v", v))
	}
}

ref: https://rustc-dev-guide.rust-lang.org/pat-exhaustive-checking.html#pattern-and-exhaustiveness-checking

Member Author

evan-forbes Dec 7, 2023

> I don't understand why silently defaulting to an old version is preferable over a panic.

There's a thread or two discussing this in the version PRs, so apologies if you are already aware of this point, but essentially we're relying on there not being any human error to avoid a network-halting bug. There's also not really any benefit, since the appversion MUST be checked when verifying the header, both by Comet and the light client.

If the appversion is not expected or the node/LC doesn't know how to handle that version, it must halt. This means that there are no scenarios where the panic would serve a positive purpose. The only time that it would get hit is when we accidentally forget to update one of the ever increasing if statements, in which case the entire network halts for no reason, even if they upgraded.

imo we definitely should not introduce a panic. I'm also of the opinion that specifying only the changes for a constant is actually more readable. If we specify each value for each version and we want to find when something changed, we have to look up each constant and manually compare, which feels weird to me. That's just a readability opinion, and won't halt the network, so I don't hold that strong of a view on that.

I think we've casually discussed this synchronously, but since this has come up multiple times, I'm happy to schedule a sync or open a discussion specifically for it if needed.
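The two positions in this thread can be made concrete with a small standalone sketch of the default-based switch. It is not the code from the PR: the lower-case names are local stand-ins, and the v1 value of 128 is a placeholder assumption because the diff does not show it. It illustrates that with a `default` branch, any version without its own case inherits the shared constant, so introducing a new version requires no edit here.

```go
package main

import "fmt"

const (
	// Values taken from pkg/appconsts/testground/app_consts.go in this diff.
	testgroundVersion              uint64 = 420420420
	testgroundSquareSizeUpperBound        = 512

	// Placeholder assumption: the real v1 value is not shown in this diff.
	v1SquareSizeUpperBound = 128
)

// squareSizeUpperBound mirrors the shape of the switch under discussion:
// only versions that differ from the shared value get their own case.
func squareSizeUpperBound(v uint64) int {
	switch v {
	case testgroundVersion:
		return testgroundSquareSizeUpperBound
	default:
		// Every other version, including ones that do not exist yet,
		// falls through to the shared constant.
		return v1SquareSizeUpperBound
	}
}

func main() {
	for _, v := range []uint64{1, 2, 3, testgroundVersion} {
		fmt.Printf("version %d -> upper bound %d\n", v, squareSizeUpperBound(v))
	}
}
```

The trade-off raised by the reviewer is visible in the same sketch: version 3 is accepted silently, so correctness depends on the app version being validated elsewhere, as the author describes.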

}

var (
89 changes: 89 additions & 0 deletions test/testground/Dockerfile
@@ -0,0 +1,89 @@
# BUILD_BASE_IMAGE is the base image to use for the build. It contains a rolling
# accumulation of Go build/package caches.
ARG BUILD_BASE_IMAGE=docker.io/golang:1.21.0-alpine3.18
# This Dockerfile performs a multi-stage build and RUNTIME_IMAGE is the image
# onto which to copy the resulting binary.
#
# Picking a different runtime base image from the build image allows us to
# slim down the deployable considerably.
#
# The user can override the runtime image by passing in the appropriate builder
# configuration option.
ARG RUNTIME_IMAGE=alpine:3.18

#:::
#::: BUILD CONTAINER
#:::
FROM ${BUILD_BASE_IMAGE} AS builder

# PLAN_DIR is the location containing the plan source inside the container.
ENV PLAN_DIR /plan

ENV INFLUXDB_URL=http://influxdb:8086

# SDK_DIR is the location containing the (optional) sdk source inside the container.
ENV SDK_DIR /sdk

# Delete any prior artifacts, if this is a cached image.
RUN rm -rf ${PLAN_DIR} ${SDK_DIR} /testground_dep_list

# TESTPLAN_EXEC_PKG is the executable package of the testplan to build.
# The image will build that package only.
ARG TESTPLAN_EXEC_PKG="."

# GO_PROXY is the go proxy that will be used, or direct by default.
ARG GO_PROXY=https://proxy.golang.org

# BUILD_TAGS is either nothing, or when expanded, it expands to "-tags <comma-separated build tags>"
ARG BUILD_TAGS

# TESTPLAN_EXEC_PKG is the executable package within this test plan we want to build.
ENV TESTPLAN_EXEC_PKG ${TESTPLAN_EXEC_PKG}

# We explicitly set GOCACHE under the /go directory for more tidiness.
ENV GOCACHE /go/cache


# Copy only go.mod files and download deps, in order to leverage Docker caching.
COPY /plan/go.mod ${PLAN_DIR}/go.mod

RUN apk add gcompat

# Download deps.
RUN echo "Using go proxy: ${GO_PROXY}" \
&& cd ${PLAN_DIR} \
&& go env -w GOPROXY="${GO_PROXY}" \
&& go mod download


# Now copy the rest of the source and run the build.
COPY . /


RUN cd ${PLAN_DIR} \
&& go env -w GOPROXY="${GO_PROXY}" \
&& CGO_ENABLED=${CgoEnabled} GOOS=linux GOARCH=amd64 go build -o ${PLAN_DIR}/testplan.bin ${BUILD_TAGS} ${TESTPLAN_EXEC_PKG}

# Store module dependencies
RUN cd ${PLAN_DIR} \
&& go list -m all > /testground_dep_list

#:::
#::: (OPTIONAL) RUNTIME CONTAINER
#:::

## The 'AS runtime' token is used to parse Docker stdout to extract the build image ID to cache.
FROM ${RUNTIME_IMAGE} AS runtime
RUN apk add --no-cache bash gcompat curl
# PLAN_DIR is the location containing the plan source inside the build container.
ENV PLAN_DIR /plan


# HOME ENV is crucial for app/sdk -> remove at your OWN RISK!
ENV HOME /

COPY --from=builder /testground_dep_list /
COPY --from=builder ${PLAN_DIR}/testplan.bin /testplan

EXPOSE 9090 26657 26656 1317 26658 26660 26659 30000
ENTRYPOINT [ "/testplan"]
132 changes: 132 additions & 0 deletions test/testground/README.md
@@ -0,0 +1,132 @@
# Testground Experiment Tooling

## Test Instance Communication and Experiment Flow

```go
// Role is the interface between a testground test entrypoint and the actual
// test logic. Testground creates many instances and passes each instance a
// configuration from the plan and manifest toml files. From those
// configurations a Role is created for each node, and the three methods below
// are run in order.
type Role interface {
// Plan is the first function called in a test by each node. It is
// responsible for creating the genesis block, configuring nodes, and
// starting the network.
Plan(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error
// Execute is the second function called in a test by each node. It is
// responsible for running any experiments. This is the phase where commands are
// sent and received.
Execute(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error
// Retro is the last function called in a test by each node. It is
// responsible for collecting any data from the node and/or running any
// retrospective tests or benchmarks.
Retro(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error
}

var _ Role = (*Leader)(nil)

var _ Role = (*Follower)(nil)
Collaborator

[optional] This Go code may become stale with respect to the Role interface defined in test/testground/network.go. We may remove the code and replace with a permalink.

Member Author

resolved in 93cb5d4

```

```mermaid
sequenceDiagram
participant I as Initializer Node
participant L as Leader Node
participant F1 as Follower Node 1
participant F2 as Follower Node 2
participant Fn as Follower Node N

Note over I, Fn: Testground Initialization
I->>L: Create Leader Node Instance
I->>F1: Create Follower Node 1 Instance
I->>F2: Create Follower Node 2 Instance
I->>Fn: Create Follower Node N Instance

Note over L, Fn: EntryPoint(runenv *runtime.RunEnv, initCtx *run.InitContext)

Note over L, Fn: Plan(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext)
F1->>L: Send PeerPacket
F2->>L: Send PeerPacket
Fn->>L: Send PeerPacket

Note over L: Genesis Creation
L->>L: Collect GenTx

L->>F1: Send Genesis File
L->>F2: Send Genesis File
L->>Fn: Send Genesis File

Note over L: Configuration
L->>L: Configurators

L->>F1: Send Config Files
L->>F2: Send Config Files
L->>Fn: Send Config Files

Note over L, Fn: Start Network

Note over L, Fn: Execute(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext)

L->>F1: Send Arbitrary Commands
L->>F2: Send Arbitrary Commands
L->>Fn: Send Arbitrary Commands

L->>F1: Send EndTest Command
L->>F2: Send EndTest Command
L->>Fn: Send EndTest Command

Note over L, Fn: Retro(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext)

Note over L: Process log local data
```

## Configuring an Experiment

### Defining Topologies and Configs

Per the diagram above, the leader node initializes and modifies the configs used
by each node. This allows for arbitrary network topologies to be created.
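One way to picture this is as a function that receives every node's settings and rewrites the peer fields before the configs are sent out. The sketch below assumes a pared-down `NodeConfig` and a `Configurator` function type; both are hypothetical and much smaller than the real node config files, and the addresses are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeConfig is a hypothetical, minimal view of per-node settings.
type NodeConfig struct {
	ID              string
	Address         string
	Seeds           string // comma-separated "id@address" entries
	PersistentPeers string // comma-separated "id@address" entries
}

// Configurator rewrites the full set of node configs. The signature is an
// assumption made for this sketch.
type Configurator func(nodes []NodeConfig) []NodeConfig

func peerID(n NodeConfig) string { return n.ID + "@" + n.Address }

// SeedTopology makes the first node the only seed for all other nodes.
func SeedTopology(nodes []NodeConfig) []NodeConfig {
	if len(nodes) == 0 {
		return nodes
	}
	seed := peerID(nodes[0])
	for i := 1; i < len(nodes); i++ {
		nodes[i].Seeds = seed
	}
	return nodes
}

// FullyConnected lists every other node as a persistent peer.
func FullyConnected(nodes []NodeConfig) []NodeConfig {
	for i := range nodes {
		peers := make([]string, 0, len(nodes))
		for j := range nodes {
			if i != j {
				peers = append(peers, peerID(nodes[j]))
			}
		}
		nodes[i].PersistentPeers = strings.Join(peers, ",")
	}
	return nodes
}

func main() {
	nodes := []NodeConfig{
		{ID: "a", Address: "10.0.0.1:26656"},
		{ID: "b", Address: "10.0.0.2:26656"},
		{ID: "c", Address: "10.0.0.3:26656"},
	}
	// Swapping the configurator changes the topology without touching
	// any follower-side logic.
	var configure Configurator = SeedTopology
	for _, n := range configure(nodes) {
		fmt.Printf("%s seeds=%q\n", n.ID, n.Seeds)
	}
}
```

Because the leader owns this step, a new topology only needs a new function of the same shape.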

## Implemented Experiments

### Standard

The `standard` test runs an experiment that is as close to mainnet as possible.
This is used as a base for other experiments.

## Running the Experiment

```sh
cd ./test/testground
testground plan import --from . --name celestia

# This command should be executed in the 1st terminal
testground daemon

# This command should be executed in the 2nd terminal
testground run composition -f compositions/standard/plan.toml --wait

# After the test has completed, run this command to clean up remaining instance resources
testground terminate --runner cluster:k8s
```

## Collecting Data

### Grafana

All metrics data is logged to a separate testground-specific grafana/influx
node. To access that node, forward the port using kubectl.

```sh
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=tg-monitoring" -o jsonpath="{.items[0].metadata.name}")

kubectl --namespace default port-forward $POD_NAME 3000

# Contact members of the devops team or the testground admins to get the credentials for accessing this node.
```

### Tracing

The tracing infrastructure in celestia-core can be enabled by setting the
`tracing_nodes` plan parameter to a value greater than 0 and specifying the
tracing URL and tracing token as plan parameters in the `plan.toml`.
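The relationship between these three parameters can be sketched as a small validation step. This is an illustrative assumption about how the string parameters might be checked, not the tooling's actual parsing code: `TracingParams` and `parseTracingParams` are hypothetical names, and the URL and token in `main` are made-up values.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// TracingParams is a hypothetical container for the tracing plan parameters.
type TracingParams struct {
	Nodes int
	URL   string
	Token string
}

// parseTracingParams reads the string-valued plan parameters and checks
// that a URL and token are present whenever tracing is enabled.
func parseTracingParams(params map[string]string) (TracingParams, error) {
	nodes, err := strconv.Atoi(params["tracing_nodes"])
	if err != nil {
		return TracingParams{}, fmt.Errorf("tracing_nodes: %w", err)
	}
	if nodes < 0 {
		return TracingParams{}, errors.New("tracing_nodes must not be negative")
	}
	tp := TracingParams{
		Nodes: nodes,
		URL:   params["tracing_url"],
		Token: params["tracing_token"],
	}
	if nodes > 0 && (tp.URL == "" || tp.Token == "") {
		return TracingParams{}, errors.New("tracing_url and tracing_token are required when tracing_nodes > 0")
	}
	return tp, nil
}

func main() {
	// Parameter names match plan.toml; the values are examples only.
	params := map[string]string{
		"tracing_nodes": "2",
		"tracing_url":   "http://tracing.example:8086",
		"tracing_token": "example-token",
	}
	tp, err := parseTracingParams(params)
	if err != nil {
		fmt.Println("invalid tracing config:", err)
		return
	}
	fmt.Printf("tracing enabled on %d nodes via %s\n", tp.Nodes, tp.URL)
}
```

With the defaults in the `standard` composition (`tracing_nodes = "0"` and empty URL and token), a check of this shape passes and tracing stays disabled.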
54 changes: 54 additions & 0 deletions test/testground/compositions/standard/plan.toml
@@ -0,0 +1,54 @@
[metadata]
name = "standard"
author = "core-app"

[global]
plan = "celestia"
case = "entrypoint"
total_instances = 40
builder = "docker:generic"
runner = "cluster:k8s"
disable_metrics = false

[global.run.test_params]
chain_id = "standard-x"
timeout = "25m"
halt_height = "50"
latency = "0"
bandwidth = "1Gib"
validators = "40"
topology = "seed"
pex = "true"
timeout_propose = "10s"
timeout_commit = "11s"
per_peer_bandwidth = "5Mib"
blob_sequences = "2"
blob_sizes = "130000"
blobs_per_sequence = "1"
inbound_peer_count = "20"
outbound_peer_count = "10"
gov_max_square_size = "128"
max_block_bytes = "2000000"
mempool = "v1"
broadcast_txs = "true"
tracing_nodes = "0"
tracing_token = ""
tracing_url = ""

[[groups]]
id = "validators"
builder = "docker:generic"
[groups.resources]
memory = "8Gi"
cpu = "6"
[groups.instances]
count = 40
percentage = 0.0
[groups.build_config]
build_base_image = "golang:1.21.3"
enable_go_build_cache = true
enabled = true
go_version = "1.21"
[groups.build]
[groups.run]
artifact = ""