feat: Testground tooling #2456
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
go 1.21.1 | ||
|
||
use ( | ||
. | ||
./test/testground | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
package v1 | ||
|
||
|
||
const ( | ||
Version uint64 = 420420420 | ||
Review comment: [nit] Version could use a GoDoc that explains why this number looks crazy
Reply: resolved in 93cb5d4
|
||
SquareSizeUpperBound int = 512 | ||
SubtreeRootThreshold int = 64 | ||
Review comment: [question] related to https://github.com/celestiaorg/celestia-app/pull/2456/files#r1417502424 why define this constant if it is never used?
Reply: resolved in 93cb5d4. We should probably do the same for v2.
|
||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,12 @@ | ||
package appconsts | ||
|
||
import ( | ||
testground "github.com/celestiaorg/celestia-app/pkg/appconsts/testground" | ||
|
||
v1 "github.com/celestiaorg/celestia-app/pkg/appconsts/v1" | ||
v2 "github.com/celestiaorg/celestia-app/pkg/appconsts/v2" | ||
) | ||
|
||
const ( | ||
LatestVersion = v2.Version | ||
LatestVersion = v1.Version | ||
|
||
) | ||
|
||
// SubtreeRootThreshold works as a target upper bound for the number of subtree | ||
|
@@ -24,8 +24,13 @@ func SubtreeRootThreshold(_ uint64) int { | |
// SquareSizeUpperBound is the maximum original square width possible | ||
// for a version of the state machine. The maximum is decided through | ||
// governance. See `DefaultGovMaxSquareSize`. | ||
func SquareSizeUpperBound(_ uint64) int { | ||
return v1.SquareSizeUpperBound | ||
func SquareSizeUpperBound(v uint64) int { | ||
switch v { | ||
case testground.Version: | ||
return testground.SquareSizeUpperBound | ||
default: | ||
return v1.SquareSizeUpperBound | ||
} | ||
Review thread on lines +33 to +35:

Review comment: [question] Why doesn't this have case statements for each version?

    case testground.Version:
        return testground.SquareSizeUpperBound
    case v1.Version:
        return v1.SquareSizeUpperBound
    case v2.Version:
        return v2.SquareSizeUpperBound

Even if v1 and v2 have the same constants, this seems more readable and future-proof in case we update the v2 constant prior to cutting the v2 release.

Reply: I actually disagree with this, as with each version we'd have to go change each if statement across the entire codebase. Besides being a bit annoying, there's no way to tell if we missed one or are being inconsistent in places. IMO, just saying default reads as "all versions use the same constant", which feels more concise than having to read each version and compare.

Reply: IMO this thread shouldn't block this PR and instead we should create a new issue for this. Go doesn't have exhaustive checking (like Rust), but I don't understand why silently defaulting to an old version is preferable over a panic.

Current:

    func SquareSizeUpperBound(v uint64) int {
        switch v {
        case testground.Version:
            return testground.SquareSizeUpperBound
        // There is currently only a single square size upper bound.
        default:
            return v1.SquareSizeUpperBound
        }
    }

Proposal:

    func SquareSizeUpperBound(v uint64) int {
        switch v {
        case testground.Version:
            return testground.SquareSizeUpperBound
        case v1.Version:
            return v1.SquareSizeUpperBound
        case v2.Version:
            return v2.SquareSizeUpperBound
        default:
            panic(fmt.Sprintf("unsupported version %v", v))
        }
    }

Reply: There's a thread or two discussing this in the version PRs, so apologies if you are already aware of this point, but essentially we'd be relying on there not being any human error to avoid a network-halting bug. There's also not really any benefit, since the appversion MUST be checked when verifying the header, both by Comet and the light client. If the appversion is not expected, or the node/LC doesn't know how to handle that version, it must halt. This means that there are no scenarios where the panic would serve a positive purpose. The only time it would get hit is when we accidentally forget to update one of the ever-increasing if statements, in which case the entire network halts for no reason, even if they upgraded. IMO we definitely should not introduce a panic.

I'm also of the opinion that specifying only the changes for a constant is actually more readable. If we specify each value for each version and we want to find when something changed, we have to look up each constant and manually compare, which feels weird to me. That's just a readability opinion, and won't halt the network, so I don't hold that strong of a view on it.

I think we've casually discussed this synchronously, but since this has come up multiple times, I'm happy to schedule a sync or open a discussion specifically for it if needed.
|
||
} | ||
|
||
var ( | ||
|
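The "default" approach argued for in the thread above can be sketched in isolation. This is a minimal, self-contained illustration rather than the code in the PR: the lowercase names are hypothetical, the testground version and its 512 bound mirror the constants diff earlier on this page, and the v1/v2 version numbers and the 128 bound are assumptions chosen for the example.

```go
package main

import "fmt"

// Illustrative values only. testgroundVersion and the 512 bound mirror the
// constants diff above; v1Version, v2Version, and the 128 bound are assumptions.
const (
	testgroundVersion uint64 = 420420420
	v1Version         uint64 = 1
	v2Version         uint64 = 2

	testgroundSquareSizeUpperBound = 512
	v1SquareSizeUpperBound         = 128
)

// squareSizeUpperBound lists only the versions whose value differs and lets
// every other version, known or unknown, fall through to the shared default.
func squareSizeUpperBound(v uint64) int {
	switch v {
	case testgroundVersion:
		return testgroundSquareSizeUpperBound
	default:
		return v1SquareSizeUpperBound
	}
}

func main() {
	for _, v := range []uint64{v1Version, v2Version, testgroundVersion, 99} {
		fmt.Printf("version %d -> upper bound %d\n", v, squareSizeUpperBound(v))
	}
}
```

With this shape, adding a new app version requires no change to the function unless that version actually changes the constant. The trade-off raised in the thread is that an unknown version (99 here) also silently receives the default.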
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
# BUILD_BASE_IMAGE is the base image to use for the build. It contains a rolling | ||
# accumulation of Go build/package caches. | ||
ARG BUILD_BASE_IMAGE=docker.io/golang:1.21.0-alpine3.18 | ||
# This Dockerfile performs a multi-stage build and RUNTIME_IMAGE is the image | ||
# onto which to copy the resulting binary. | ||
# | ||
# Picking a different runtime base image from the build image allows us to | ||
# slim down the deployable considerably. | ||
# | ||
# The user can override the runtime image by passing in the appropriate builder | ||
# configuration option. | ||
ARG RUNTIME_IMAGE=alpine:3.18 | ||
|
||
#::: | ||
#::: BUILD CONTAINER | ||
#::: | ||
FROM ${BUILD_BASE_IMAGE} AS builder | ||
|
||
# PLAN_DIR is the location containing the plan source inside the container. | ||
ENV PLAN_DIR /plan | ||
|
||
ENV INFLUXDB_URL=http://influxdb:8086 | ||
|
||
# SDK_DIR is the location containing the (optional) sdk source inside the container. | ||
ENV SDK_DIR /sdk | ||
|
||
# Delete any prior artifacts, if this is a cached image. | ||
RUN rm -rf ${PLAN_DIR} ${SDK_DIR} /testground_dep_lists | ||
|
||
# TESTPLAN_EXEC_PKG is the executable package of the testplan to build. | ||
# The image will build that package only. | ||
ARG TESTPLAN_EXEC_PKG="." | ||
|
||
# GO_PROXY is the go proxy that will be used, or direct by default. | ||
ARG GO_PROXY=https://proxy.golang.org | ||
|
||
# BUILD_TAGS is either nothing, or when expanded, it expands to "-tags <comma-separated build tags>" | ||
ARG BUILD_TAGS | ||
|
||
# TESTPLAN_EXEC_PKG is the executable package within this test plan we want to build. | ||
ENV TESTPLAN_EXEC_PKG ${TESTPLAN_EXEC_PKG} | ||
|
||
# We explicitly set GOCACHE under the /go directory for more tidiness. | ||
ENV GOCACHE /go/cache | ||
|
||
|
||
# Copy only go.mod files and download deps, in order to leverage Docker caching. | ||
COPY /plan/go.mod ${PLAN_DIR}/go.mod | ||
|
||
RUN apk add gcompat | ||
|
||
# Download deps. | ||
RUN echo "Using go proxy: ${GO_PROXY}" \ | ||
&& cd ${PLAN_DIR} \ | ||
&& go env -w GOPROXY="${GO_PROXY}" \ | ||
&& go mod download | ||
|
||
|
||
# Now copy the rest of the source and run the build. | ||
COPY . / | ||
|
||
|
||
RUN cd ${PLAN_DIR} \ | ||
&& go env -w GOPROXY="${GO_PROXY}" \ | ||
&& CGO_ENABLED=${CgoEnabled} GOOS=linux GOARCH=amd64 go build -o ${PLAN_DIR}/testplan.bin ${BUILD_TAGS} ${TESTPLAN_EXEC_PKG} | ||
|
||
# Store module dependencies | ||
RUN cd ${PLAN_DIR} \ | ||
&& go list -m all > /testground_dep_list | ||
|
||
#::: | ||
#::: (OPTIONAL) RUNTIME CONTAINER | ||
#::: | ||
|
||
## The 'AS runtime' token is used to parse Docker stdout to extract the build image ID to cache. | ||
FROM ${RUNTIME_IMAGE} AS runtime | ||
RUN apk add --no-cache bash gcompat curl | ||
# PLAN_DIR is the location containing the plan source inside the build container. | ||
ENV PLAN_DIR /plan | ||
|
||
|
||
# HOME ENV is crucial for app/sdk -> remove at your OWN RISK! | ||
ENV HOME / | ||
|
||
COPY --from=builder /testground_dep_list / | ||
COPY --from=builder ${PLAN_DIR}/testplan.bin /testplan | ||
|
||
EXPOSE 9090 26657 26656 1317 26658 26660 26659 30000 | ||
ENTRYPOINT [ "/testplan"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
# Testground Experiment Tooling | ||
|
||
## Test Instance Communication and Experiment Flow | ||
|
||
```go | ||
// Role is the interface between a testground test entrypoint and the actual | ||
// test logic. Testground creates many instances and passes each instance a | ||
// configuration from the plan and manifest toml files. From those | ||
// configurations a Role is created for each node, and the three methods below | ||
// are run in order. | ||
type Role interface { | ||
// Plan is the first function called in a test by each node. It is | ||
// responsible for creating the genesis block, configuring nodes, and | ||
// starting the network. | ||
Plan(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error | ||
// Execute is the second function called in a test by each node. It is | ||
// responsible for running any experiments. This is the phase where commands are | ||
// sent and received. | ||
Execute(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error | ||
// Retro is the last function called in a test by each node. It is | ||
// responsible for collecting any data from the node and/or running any | ||
// retrospective tests or benchmarks. | ||
Retro(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) error | ||
} | ||
|
||
var _ Role = (*Leader)(nil) | ||
|
||
var _ Role = (*Follower)(nil) | ||
Review comment: [optional] This Go code may become stale with respect to the
Reply: resolved in 93cb5d4
|
||
``` | ||
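As a rough illustration of "the three methods are run in order", the sketch below drives a simplified role through its phases and stops at the first error. It is not the entrypoint from this PR: `phaseRunner`, `runPhases`, `recorder`, and `recordedOrder` are hypothetical names, and the methods take only a `context.Context` so the example runs without the testground SDK.

```go
package main

import (
	"context"
	"fmt"
)

// phaseRunner is a simplified, hypothetical stand-in for Role. The real
// methods also receive *runtime.RunEnv and *run.InitContext.
type phaseRunner interface {
	Plan(ctx context.Context) error
	Execute(ctx context.Context) error
	Retro(ctx context.Context) error
}

// runPhases calls Plan, Execute, and Retro in order and stops at the first
// error, wrapping it with the name of the failing phase.
func runPhases(ctx context.Context, r phaseRunner) error {
	phases := []struct {
		name string
		fn   func(context.Context) error
	}{
		{"plan", r.Plan},
		{"execute", r.Execute},
		{"retro", r.Retro},
	}
	for _, p := range phases {
		if err := p.fn(ctx); err != nil {
			return fmt.Errorf("%s phase: %w", p.name, err)
		}
	}
	return nil
}

// recorder is a toy implementation that records the order of the calls.
type recorder struct{ calls []string }

func (r *recorder) Plan(context.Context) error    { r.calls = append(r.calls, "plan"); return nil }
func (r *recorder) Execute(context.Context) error { r.calls = append(r.calls, "execute"); return nil }
func (r *recorder) Retro(context.Context) error   { r.calls = append(r.calls, "retro"); return nil }

var _ phaseRunner = (*recorder)(nil)

// recordedOrder runs a fresh recorder through all phases and returns the
// order in which its methods were called.
func recordedOrder() ([]string, error) {
	r := &recorder{}
	err := runPhases(context.Background(), r)
	return r.calls, err
}

func main() {
	calls, err := recordedOrder()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(calls) // prints "[plan execute retro]"
}
```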
|
||
```mermaid | ||
sequenceDiagram | ||
participant I as Initializer Node | ||
participant L as Leader Node | ||
participant F1 as Follower Node 1 | ||
participant F2 as Follower Node 2 | ||
participant Fn as Follower Node N | ||
|
||
Note over I, Fn: Testground Initialization | ||
I->>L: Create Leader Node Instance | ||
I->>F1: Create Follower Node 1 Instance | ||
I->>F2: Create Follower Node 2 Instance | ||
I->>Fn: Create Follower Node N Instance | ||
|
||
Note over L, Fn: EntryPoint(runenv *runtime.RunEnv, initCtx *run.InitContext) | ||
|
||
Note over L, Fn: Plan(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) | ||
F1->>L: Send PeerPacket | ||
F2->>L: Send PeerPacket | ||
Fn->>L: Send PeerPacket | ||
|
||
Note over L: Genesis Creation | ||
L->>L: Collect GenTx | ||
|
||
L->>F1: Send Genesis File | ||
L->>F2: Send Genesis File | ||
L->>Fn: Send Genesis File | ||
|
||
Note over L: Configuration | ||
L->>L: Configurators | ||
|
||
L->>F1: Send Config Files | ||
L->>F2: Send Config Files | ||
L->>Fn: Send Config Files | ||
|
||
Note over L, Fn: Start Network | ||
|
||
Note over L, Fn: Execute(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) | ||
|
||
L->>F1: Send Arbitrary Commands | ||
L->>F2: Send Arbitrary Commands | ||
L->>Fn: Send Arbitrary Commands | ||
|
||
L->>F1: Send EndTest Command | ||
L->>F2: Send EndTest Command | ||
L->>Fn: Send EndTest Command | ||
|
||
Note over L, Fn: Retro(ctx context.Context, runenv *runtime.RunEnv, initCtx *run.InitContext) | ||
|
||
Note over L: Process log local data | ||
``` | ||
|
||
## Configuring an Experiment | ||
|
||
### Defining Topologies and Configs | ||
|
||
Per the diagram above, the leader node initializes and modifies the configs used | ||
by each node. This allows for arbitrary network topologies to be created. | ||
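
One way to picture this is as a list of functions the leader applies to every node's config before sending the files out. The sketch below is only an assumption about the shape of such a hook: `nodeConfig`, `configurator`, `seedTopology`, and `applyAll` are hypothetical names and do not come from the PR.

```go
package main

import "fmt"

// nodeConfig is a hypothetical, pared-down view of a node's p2p settings.
type nodeConfig struct {
	Name  string
	Addr  string
	Seeds []string
}

// configurator is a hypothetical hook the leader could apply to all node
// configs before sending them to the followers.
type configurator func(nodes []nodeConfig) []nodeConfig

// seedTopology makes the first node the only seed for every other node.
func seedTopology(nodes []nodeConfig) []nodeConfig {
	if len(nodes) == 0 {
		return nodes
	}
	out := make([]nodeConfig, len(nodes))
	copy(out, nodes)
	seed := out[0].Addr
	for i := 1; i < len(out); i++ {
		out[i].Seeds = []string{seed}
	}
	return out
}

// applyAll runs each configurator in order over the node configs.
func applyAll(nodes []nodeConfig, cs ...configurator) []nodeConfig {
	for _, c := range cs {
		nodes = c(nodes)
	}
	return nodes
}

func main() {
	nodes := []nodeConfig{
		{Name: "leader", Addr: "10.0.0.1:26656"},
		{Name: "follower-1", Addr: "10.0.0.2:26656"},
		{Name: "follower-2", Addr: "10.0.0.3:26656"},
	}
	for _, n := range applyAll(nodes, seedTopology) {
		fmt.Printf("%s seeds=%v\n", n.Name, n.Seeds)
	}
}
```

Because each configurator only rewrites the configs it is given, a different topology would be a different function passed to `applyAll`, with no change to the nodes themselves.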
|
||
## Implemented Experiments | ||
|
||
### Standard | ||
|
||
The `standard` test runs an experiment that is as close to mainnet as possible. | ||
This is used as a base for other experiments. | ||
|
||
## Running the Experiment | ||
|
||
```sh | ||
cd ./test/testground | ||
testground plan import --from . --name celestia | ||
|
||
|
||
# This command should be executed in the 1st terminal | ||
testground daemon | ||
|
||
# This command should be executed in the 2nd terminal | ||
testground run composition -f compositions/standard/plan.toml --wait | ||
|
||
# After the test has been completed, run this command to cleanup remaining instance resources | ||
testground terminate --runner cluster:k8s | ||
``` | ||
|
||
## Collecting Data | ||
|
||
### Grafana | ||
|
||
All metrics data is logged to a separate testground-specific grafana/influx | ||
node. To access that node, forward the port using kubectl. | ||
|
||
```sh | ||
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=tg-monitoring" -o jsonpath="{.items[0].metadata.name}") | ||
|
||
kubectl --namespace default port-forward $POD_NAME 3000 | ||
|
||
# Contact members of the devops team or testground admins to get the credentials for accessing this node. | ||
``` | ||
|
||
### Tracing | ||
|
||
The tracing infrastructure in celestia-core can be enabled by setting the | ||
`tracing_nodes` plan parameter to a value greater than 0 and specifying the | ||
tracing URL and tracing token as plan parameters in the `plan.toml`. |
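
The relationship between the three tracing parameters can be sketched as a small validation step. This is a hedged illustration, not code from the PR: `tracingParams` and `parseTracing` are hypothetical, and the only facts taken from the source are the parameter names `tracing_nodes`, `tracing_url`, and `tracing_token` and that they are passed as strings in `plan.toml`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// tracingParams is a hypothetical parsed view of the tracing plan parameters.
type tracingParams struct {
	Nodes int
	URL   string
	Token string
}

// parseTracing reads the string-valued plan parameters and requires a URL and
// token whenever at least one node is asked to trace.
func parseTracing(params map[string]string) (tracingParams, error) {
	n, err := strconv.Atoi(params["tracing_nodes"])
	if err != nil {
		return tracingParams{}, fmt.Errorf("tracing_nodes: %w", err)
	}
	if n < 0 {
		return tracingParams{}, errors.New("tracing_nodes must not be negative")
	}
	tp := tracingParams{Nodes: n, URL: params["tracing_url"], Token: params["tracing_token"]}
	if n > 0 && (tp.URL == "" || tp.Token == "") {
		return tracingParams{}, errors.New("tracing_url and tracing_token are required when tracing_nodes > 0")
	}
	return tp, nil
}

func main() {
	// Mirrors the defaults in the standard composition: tracing disabled.
	tp, err := parseTracing(map[string]string{
		"tracing_nodes": "0",
		"tracing_token": "",
		"tracing_url":   "",
	})
	fmt.Println(tp.Nodes, err)
}
```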
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,54 @@ | ||||||
[metadata] | ||||||
name = "standard" | ||||||
author = "core-app" | ||||||
|
||||||
[global] | ||||||
plan = "core-app" | ||||||
Review comment: Following this comment, may want to consider renaming it to
|
||||||
case = "entrypoint" | ||||||
total_instances = 40 | ||||||
builder = "docker:generic" | ||||||
runner = "cluster:k8s" | ||||||
disable_metrics = false | ||||||
|
||||||
[global.run.test_params] | ||||||
chain_id = "standard-x" | ||||||
timeout = "25m" | ||||||
halt_height = "50" | ||||||
latency = "0" | ||||||
bandwidth = "1Gib" | ||||||
validators = "40" | ||||||
topology = "seed" | ||||||
pex = "true" | ||||||
timeout_propose = "10s" | ||||||
timeout_commit = "11s" | ||||||
per_peer_bandwidth = "5Mib" | ||||||
blob_sequences = "2" | ||||||
blob_sizes = "130000" | ||||||
blobs_per_sequence = "1" | ||||||
inbound_peer_count = "20" | ||||||
outbound_peer_count = "10" | ||||||
gov_max_square_size = "128" | ||||||
max_block_bytes = "2000000" | ||||||
mempool = "v1" | ||||||
broadcast_txs = "true" | ||||||
tracing_nodes = "0" | ||||||
tracing_token = "" | ||||||
tracing_url = "" | ||||||
|
||||||
[[groups]] | ||||||
id = "validators" | ||||||
builder = "docker:generic" | ||||||
[groups.resources] | ||||||
memory = "8Gi" | ||||||
cpu = "6" | ||||||
[groups.instances] | ||||||
count = 40 | ||||||
percentage = 0.0 | ||||||
[groups.build_config] | ||||||
build_base_image = "golang:1.21.3" | ||||||
enable_go_build_cache = true | ||||||
enabled = true | ||||||
go_version = "1.21" | ||||||
[groups.build] | ||||||
[groups.run] | ||||||
artifact = "" |
Review comment: [Question & Optional Suggestion] What is the impact of this if not used for testing purposes? Couldn't this alternatively be a function in the testground package?
Reply: The app version must be set internally, so unfortunately we can't do that since we don't have direct access to the baseapp, only the interface that wraps around it. However, we should be able to remove this now that we have the latest version of our fork of the sdk, since that version sets the appversion internally similar to this
To answer the question though, it could be used to change the app version. The app version is included in and checked when verifying the header, so doing this without coordination with the rest of the network would result in the node halting.