-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beat itest [1/3]: remove the usage of standby nodes #9306
base: yy-waiting-on-merge
Are you sure you want to change the base?
Conversation
This commit adds a new flag to shuffle all the test cases before running them so tests which require lots of blocks to be mined are less likely to be run in the same tranch. The other benefit is this approach provides a more efficient way to figure which tests are broken since all the differnet backends are running different tranches in their builds, we can identify more failed tests in one push.
To make sure each run is shuffled, we use the action ID as the seed.
Prepares the upcoming refactor. We now never call `ht.Alice` directly, instead, we always init `alice := ht.Alice` so it's easier to see how they are removed in a following commit.
This commit removes the usage of the standby nodes and uses `CreateSimpleNetwork` when applicable. Also introduces a helper method `NewNodeWithCoins` to quickly start a node with funds.
This commit removes the standby nodes Alice and Bob.
Since we don't have standby nodes anymore, we don't need to close the channels when the test finishes. Previously we would do so to make sure the standby nodes have a clean state for the next test case, which is no longer relevant.
Soemtimes the node may be hanging for a while without being noticed, causing failures in its following tests, thus making the debugging extrememly difficult. We now assert the node has been shut down from the logs to assert the shutdown process behaves as expected.
This commit fixes a misuse of `ht.Subtest`, where the nodes should always be created inside the subtest.
This bug was hidden because we used standby nodes before, which always have more-than-necessary wallet utxos.
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Pull reviewers statsStats of the last 30 days for lnd:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I like the idea of shuffling the tests. And lots of other goodies in this PR as well.
The PR body mentions splitting the sub tests into standalone tests to speed up. I assume that's been moved to a different part of the saga?
Looks like one of the tests fails everywhere but not sure if it's related to this PR or fixed in a later one. So non-blocking since this is being merged into a side branch.
--- FAIL: TestLightningNetworkDaemon/tranche02/53-of-192/bitcoind/open_channel_reorg_test (38.98s)
harness_node.go:403: Starting node (name=Alice) with PID=5580
harness_node.go:403: Starting node (name=Bob) with PID=5600
miner.go:631:
Error Trace: /home/runner/work/lnd/lnd/lntest/miner/miner.go:631
/home/runner/work/lnd/lnd/itest/lnd_open_channel_test.go:70
/home/runner/work/lnd/lnd/lntest/harness.go:297
/home/runner/work/lnd/lnd/itest/lnd_test.go:130
Error: Received unexpected error:
expected new miner(555) to be 5 blocks ahead of original miner(551)
Test: TestLightningNetworkDaemon/tranche02/53-of-192/bitcoind/open_channel_reorg_test
Messages: failed to assert block height delta
t.Logf("=========> tests finished for tranche: %v, tested %d "+ | ||
"cases, end height: %d\n", trancheIndex, len(testCases), height) | ||
//nolint:forbidigo | ||
fmt.Printf("=========> tranche %v finished, tested %d cases, mined "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we switch from t.Logf()
to fmt.Printf()
? Just to get rid of the indentation? Or also so it's always printed, no matter if -test.v
is specified?
// | ||
// NOTE: Because the parallel tests are initialized with the same seed (job | ||
// ID), they will always have the same order. | ||
func maybeShuffleTestCases() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, this is a very clever solution!
This PR is resulted from fixing the itest failures in
blockbeat
. Since it's a critical system, we need to make sure the CI is indicative of reporting failures. Check #9260 for the final result.Improvements
This PR does the following,
btcd
itest, previously it takes 45m and now it takes around 18m.Changes
standby nodes are removed. On one hand, this setup contributes to some of the flakes we've previously seen, e.g., checking num of edges. It creates quite a burden as all the tests are supposed to be independent, and this setup has created unnecessary states that need to be managed. On the other hand, this setup hides bugs, especially when it comes to how the wallet manages UTXOs since it always gives each standby node more than enough UTXOs to use, which is shown in one of the test failures found, in which the
ListCoins
gives incorrect result and is currently under investigation.tests are shuffled deterministically before running to maximize the tranche scheduler. Out of the 8 tranches in the CI, one of them takes much longer to run due to more blocks mined in this tranche. Since the time it takes to run the test is almost linear to the num of blocks mined, we now shuffle the test cases so it's more likely each tranche mines a similar num of blocks. This setup also gives a faster feedback loop during development, which was why I did it originally. Suppose one change caused 10 tests to fail, with shuffling, we are likely to catch all of them in one CI run, whereas we'd run the CI multiple times previously.
most of the table-driven tests are now broken into independent tests, mostly because we want them to be more maintainable, but also contribute to speeding up the CI as the more tests we have, the more likely we get even tranches.
Aside from the above structural changes, all the flakes are now documented, including the ones in windows (windows is less stable, unfortunately). We also assert the shutdown is clean (which contributes to some of the nasty flakes).
Dependent PRs
Issues fixed from this new itest setup, which are dependent PRs,
htlcswitch+routing: handle nil pointer dereference properly #9303
lnd: stop
graphBuilder
during shutdown #9292lnwallet: log the amounts in the same unit #9291
peer+lnd: fix peer blocking on node shutdown #9275
multi: fix rpcclient shutdown #9261
trivial: prepare itest for
blockbeat
#9259chainntnfs: fix missing notifications #9258
routing: fix missionControlStore blocks on shutting down #9249
lntest: fix edge assertion and reset min relay fee #9248
itest: fix flake in
payment_failure_reason_canceled
#9228