Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(swamp): use celestia-app instead of the kvstore #1160

Merged

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented Sep 23, 2022

refactors the swamp test to use celestia-app and a normal tendermint node instead of the kvstore and the rpc test tendermint node.

close #628

@evan-forbes evan-forbes self-assigned this Sep 23, 2022
@evan-forbes
Copy link
Member Author

evan-forbes commented Sep 26, 2022

the current refactor in this PR is using testutil/network from the cosmos-sdk, which could be removed or modified sometime soon cosmos/cosmos-sdk#12527.

fwiw, I could not find yet find the reason why the cosmos-sdk's testutil/network is incompatible with sub 1 second block times. For whatever reason, it continues to increase proposal time by one second per block if the timeout commit is set below 1s. If the timeout commit is set greater than 1s, then the times are as expected. This does not occur with a normal one node network using celestia-app, only when using the cosmos-sdk's testutil/network

there are a few weird things about this bug:

  • the block times decrease irl despite the block times saying that they are precisely 1 second
  • when the bug occurs, the precision of the block times are identical.
bug not occurring
timeout-commit = 1000ms
2022-09-26 23:06:00.547264861 +0000 UTC
2022-09-26 23:06:01.580626296 +0000 UTC
2022-09-26 23:06:02.614770553 +0000 UTC
2022-09-26 23:06:03.648607489 +0000 UTC
2022-09-26 23:06:04.684149147 +0000 UTC
bug occurring
timeout-commit = 950ms
2022-09-26 23:06:39.503282545 +0000 UTC
2022-09-26 23:06:40.503282545 +0000 UTC
2022-09-26 23:06:41.503282545 +0000 UTC
2022-09-26 23:06:42.503282545 +0000 UTC
bug occurring
timeout-commit = 100ms
2022-09-26 23:10:10.135204764 +0000 UTC
2022-09-26 23:10:11.135204764 +0000 UTC
2022-09-26 23:10:12.135204764 +0000 UTC
2022-09-26 23:10:13.135204764 +0000 UTC

Still not sure if we should fix this, or go with our own proprietary option that we have already started on.

@evan-forbes evan-forbes force-pushed the evan/refactor-swamp-to-use-celestia-app branch from cedaa8b to 4e77a0c Compare September 30, 2022 03:10
@evan-forbes
Copy link
Member Author

evan-forbes commented Sep 30, 2022

had some time to work on this, and the last bug ended up being that we needed to change a the consensus parameter TimeIotaMs, to configure the network to not have a minimum amount of time required to produce a block. By default this is 1000ms, and if the block time is shorter than this, then tendermint will just continually add the minimum time to each commit signature, which is used to determine the block time.

There was another issue where the cosmos-sdk will override this value while collecting the gentxs if using the this genutil func.

both of those are fixed and now we have block using 100ms. This did fix a few swamp tests, but I'm still seeing some errors the last test that it would be useful to get a second pair of eyes on. I'm getting either of these errors

--- FAIL: TestFullReconstructFromLights (51.22s)
    reconstruct_test.go:175: 
                Error Trace:    /home/evan/go/src/github.com/celestiaorg/celestia-node/nodebuilder/tests/reconstruct_test.go:175
                Error:          Received unexpected error:
                                header: not found
                Test:           TestFullReconstructFromLights
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x404114]

goroutine 408206 [running]:
bytes.Equal(...)
        /home/evan/.go/src/bytes/bytes.go:20
github.com/celestiaorg/celestia-node/service/share.isMinRoot(0xc0250aef40?)
        /home/evan/go/src/github.com/celestiaorg/celestia-node/service/share/cache_availability.go:95 +0x56
github.com/celestiaorg/celestia-node/service/share.(*CacheAvailability).SharesAvailable(0xc0250aef30, {0x3315418, 0xc010ed2480}, 0xc02b6eba28?)
        /home/evan/go/src/github.com/celestiaorg/celestia-node/service/share/cache_availability.go:52 +0x4e
github.com/celestiaorg/celestia-node/das.(*DASer).sample(0xc001ad2640, {0x3315418, 0xc010ed2480}, 0xc02ab84f00)
        /home/evan/go/src/github.com/celestiaorg/celestia-node/das/daser.go:153 +0x63
        ```

@Wondertan
Copy link
Member

@evan-forbes, I will checkout the branch and try to reproduce this

@evan-forbes
Copy link
Member Author

thanks! this PR is using this branch on the application to have a working version of the new testnode on v0.7.0-rc-1

@evan-forbes
Copy link
Member Author

update: I'll try and work on this tmrw/later this week. The branch of the app that it uses needs to be slightly updated to something closer to celestiaorg/celestia-app#824.

@Wondertan
Copy link
Member

@evan-forbes, why do we need 10k of funded accounts in the testnode setup? It makes test execution for me unbelievably slow

@Wondertan
Copy link
Member

Wondertan commented Oct 6, 2022

@evan-forbes, so the issue with header: not found happens because the App blocks production rate is unlimited. This relation is unobvious, but it is what it is.

It's unclear why timeouts from Swamp's config are set incorrectly, but I am digging into

@Wondertan
Copy link
Member

Ok, so I could slow down block production by changing SkipTimeoutCommit to false.
Also, I was able to make the Swamp work reliably, no matter what the block production rate is.

@Wondertan
Copy link
Member

Wondertan commented Oct 6, 2022

So I pushed fixes for the synchronization logic. Now the reconstruction tests cause issues. It succeeds, but it takes muuuch longer than it should. The reason is somewhere in PFDs and FillBlocks implementation.

EDIT: The issue is in testnode.FillBlock. Currently, FillBlocks runs block amount of testnode.FillBlocks, but I was able to spawn exactly the required amount of blocks by running testnode.FillBlock only once with block amount accounts. After this fact, it should be obvious what went wrong.

@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch from dc618fc to 11543be Compare October 6, 2022 16:29
@Wondertan
Copy link
Member

Ok, now the reconstruction tests work correctly, and the last one to fix are BEFP related

@Wondertan Wondertan self-assigned this Oct 6, 2022
@Wondertan
Copy link
Member

Cool, so this migration uncovered a bug related to services shutdown on FraudProof arrival

@codecov-commenter
Copy link

codecov-commenter commented Oct 6, 2022

Codecov Report

Merging #1160 (b199655) into main (23c77e8) will increase coverage by 0.10%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1160      +/-   ##
==========================================
+ Coverage   55.51%   55.61%   +0.10%     
==========================================
  Files         161      161              
  Lines        9582     9582              
==========================================
+ Hits         5319     5329      +10     
+ Misses       3735     3727       -8     
+ Partials      528      526       -2     
Impacted Files Coverage Δ
das/coordinator.go 92.64% <0.00%> (+1.47%) ⬆️
header/core/listener.go 58.49% <0.00%> (+5.66%) ⬆️
ipld/get_shares.go 100.00% <0.00%> (+10.00%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch 4 times, most recently from 9c5ac32 to 0604706 Compare October 6, 2022 21:47
@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch 3 times, most recently from a427e14 to 5316e63 Compare October 7, 2022 16:50
@Wondertan
Copy link
Member

I've included one commit that fixes CI on the main to see that all the tests here reliably pass. It will remove it before merging.

@Wondertan
Copy link
Member

I also rebased it on the main and desperately trying to make CI green. Locally everything works fine!

@evan-forbes
Copy link
Member Author

I should have time today or tmrw to continue debugging this. Will likely end up trying quite a few temporary commits to fix the CI. I can create another branch if we need to try different things in parallel.

@Wondertan
Copy link
Member

Hey @evan-forbes. Thanks. I will also continue debugging. I feel like I am very close

@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch from 8b9aa1e to 18ea261 Compare October 11, 2022 14:21
@Wondertan
Copy link
Member

Wondertan commented Oct 11, 2022

Rebased. Removing temporary commit

@Wondertan
Copy link
Member

Would like to have a last look from @renaynay and @Bidon15 before merging

Bidon15
Bidon15 previously approved these changes Oct 11, 2022
@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch 2 times, most recently from 87a9a06 to d44a893 Compare October 11, 2022 16:03
@Wondertan
Copy link
Member

The last change here, I hope. I updated timeouts and tide up the go.mod

vgonkivs
vgonkivs previously approved these changes Oct 11, 2022
Bidon15
Bidon15 previously approved these changes Oct 11, 2022
evan-forbes and others added 11 commits October 11, 2022 18:24
…ction rate

By setting the trustedHash with height 1 we make synchronization work with any block production rate.

`SkipTimeoutCommit` as `false` ensures `TimeoutCommit` for longer block times is respected

Also sort some imports
…eallocated accounts

FillBlocks was generating much more blocks than expected. Calling of testnode.FillBlock once with block amount of accounts fixes it.
Also, the amount of preallocated accounts was increased, to ensure enough blocks can be generated.
Additionally, the blocktime was increased to avoid over production of blocks.
…ase it to 5 mins

It's very likely that introduction of new App into Swamp brings more CPU usage subsequently increasing the time CI needs to finish tests
@Wondertan Wondertan dismissed stale reviews from Bidon15 and vgonkivs via f97750f October 11, 2022 16:24
@Wondertan Wondertan force-pushed the evan/refactor-swamp-to-use-celestia-app branch from 5529971 to f97750f Compare October 11, 2022 16:24
@Wondertan Wondertan merged commit d0eaef6 into celestiaorg:main Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

swamp: Integration tests with celestia-app instead of kvstore
7 participants