Add `chaos-mesh` for testing (BFT-392) #70

IAvecilla · 2024-02-27T21:22:18Z

What ❔

Incorporate chaos-mesh tools into the consensus testing framework.

Why ❔

We need tests that validate how the network behaves when nodes deviate from the expected (canonical) behavior. Chaos-mesh lets us inject network delays or node crashes, simulating real-world failure scenarios for more thorough testing.

Co-authored-by: Bruno França <[email protected]>

IAvecilla · 2024-03-11T18:27:11Z

@moshababo, I've implemented the initial approach for this integration here. It's likely to undergo several iterations and discussions to refine the testing flow and the introduction of chaos into the network. I may make some additional changes in the coming days, but please feel free to suggest any improvements or request changes as needed.

moshababo · 2024-03-11T18:34:42Z

Can you respond to my previous reviews so I'll know what changes to expect?

IAvecilla · 2024-03-14T14:41:47Z

@moshababo Done!

moshababo · 2024-03-17T23:45:45Z

node/tests/src/main.rs

+        .unwrap();
+    let last_voted_view: u64 =
+        serde_json::from_value(response.get("last_commited_block").unwrap().to_owned()).unwrap();
+    for _ in 0..5 {


What's the purpose of this loop?

IAvecilla · 2024-03-20

At first, my idea was to check a few times that the node was actually affected by the delay and unable to vote for views. Now I'm not sure that is useful at all. Maybe there's a better way to achieve this.

moshababo · 2024-03-21

I don't think it adds much value, and it makes the scenario non-deterministic, since the result depends on the response latency.
Checking that status while the chaos resources are deployed is good, but it is better done by having the test control the teardown explicitly.
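
To illustrate the last point, here is a minimal sketch of a test that owns the teardown instead of polling in a loop. `MockCluster`, `ChaosHandle`, `deploy_delay` and `is_active` are hypothetical names and an in-memory stand-in for the cluster, not the PR's actual `k8s` module or the Chaos Mesh API.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for the cluster: it only tracks
/// which chaos resources are currently deployed.
#[derive(Default)]
struct MockCluster {
    active: RefCell<HashSet<String>>,
}

/// Handle returned when a chaos resource is deployed. The test keeps it
/// and decides when the resource is removed.
struct ChaosHandle<'a> {
    cluster: &'a MockCluster,
    name: String,
}

impl MockCluster {
    /// Registers a delay resource and hands the caller a handle to it.
    fn deploy_delay(&self, name: &str) -> ChaosHandle<'_> {
        self.active.borrow_mut().insert(name.to_string());
        ChaosHandle { cluster: self, name: name.to_string() }
    }

    /// Reports whether the named chaos resource is still deployed.
    fn is_active(&self, name: &str) -> bool {
        self.active.borrow().contains(name)
    }
}

impl ChaosHandle<'_> {
    /// Explicit teardown: consumes the handle so it cannot be removed twice.
    fn teardown(self) {
        self.cluster.active.borrow_mut().remove(&self.name);
    }
}
```

With this shape, a test can assert on node status between `deploy_delay` and `teardown`, so the check no longer depends on how long a chaos resource happens to stay alive.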

moshababo · 2024-03-17T23:50:42Z

node/tests/src/main.rs

+pub async fn delay_test(test_result: Arc<Mutex<u8>>) -> anyhow::Result<()> {
+    let client = k8s::get_client().await.unwrap();
+    let target_node = "consensus-node-01";
+    let ip = k8s::get_node_rpc_address_with_name(&client, target_node)


It would be better to encapsulate the technical details further inside the test framework. The actual test scenario should be more declarative.
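
As a rough sketch of what "more declarative" could look like, the scenario below only states the target, the fault, and the expected observations, leaving Kubernetes clients and RPC addresses to the framework. All types here (`Scenario`, `Fault`, `Expectation`) and the 500 ms latency are hypothetical illustrations, not code from this PR.

```rust
use std::time::Duration;

/// Hypothetical fault kinds a scenario can request.
#[derive(Debug, Clone, PartialEq)]
enum Fault {
    NetworkDelay { latency: Duration },
    PodKill,
}

/// Hypothetical observations the framework would verify while the fault is active.
#[derive(Debug, Clone, PartialEq)]
enum Expectation {
    TargetStopsVoting,
    NetworkKeepsCommitting,
}

/// Declarative description of one chaos test. It carries no Kubernetes details.
#[derive(Debug, Clone, PartialEq)]
struct Scenario {
    target: String,
    fault: Fault,
    expectations: Vec<Expectation>,
}

impl Scenario {
    /// Starts a scenario that injects `fault` into the node named `target`.
    fn on(target: &str, fault: Fault) -> Self {
        Scenario { target: target.to_string(), fault, expectations: Vec::new() }
    }

    /// Adds one expected observation and returns the scenario for chaining.
    fn expecting(mut self, expectation: Expectation) -> Self {
        self.expectations.push(expectation);
        self
    }
}

/// The delay test expressed as data rather than as client calls.
fn delay_scenario() -> Scenario {
    Scenario::on(
        "consensus-node-01",
        Fault::NetworkDelay { latency: Duration::from_millis(500) },
    )
    .expecting(Expectation::TargetStopsVoting)
    .expecting(Expectation::NetworkKeepsCommitting)
}
```

A framework-side runner would take such a value, deploy the matching chaos resource, and check each expectation.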

moshababo · 2024-03-17T23:51:55Z

node/tests/src/main.rs

+/// We use unwraps here because this function is intended to be used like a test.
+pub async fn delay_test(test_result: Arc<Mutex<u8>>) -> anyhow::Result<()> {
+    let client = k8s::get_client().await.unwrap();
+    let target_node = "consensus-node-01";


The test framework should support the installation of chaos resources for a set of nodes, not just a single one.
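
One possible way to model this is a target selector that resolves to a list of node names before any chaos resource is created. The sketch below is an assumption for illustration: `Targets` and `resolve` are hypothetical names, and the node list is passed in rather than queried from the cluster.

```rust
/// Hypothetical selector for the nodes a chaos resource should affect.
#[derive(Debug, Clone, PartialEq)]
enum Targets {
    /// Every node known to the test framework.
    All,
    /// An explicit set of node names.
    Named(Vec<String>),
}

/// Resolves a selector against the nodes present in the cluster.
/// Returns an error if a requested node does not exist, so a typo in a
/// scenario fails fast instead of silently targeting nothing.
fn resolve(targets: &Targets, cluster_nodes: &[&str]) -> Result<Vec<String>, String> {
    match targets {
        Targets::All => Ok(cluster_nodes.iter().map(|n| n.to_string()).collect()),
        Targets::Named(names) => {
            for name in names {
                if !cluster_nodes.contains(&name.as_str()) {
                    return Err(format!("unknown node: {name}"));
                }
            }
            Ok(names.clone())
        }
    }
}
```

A single-node test then becomes the special case `Targets::Named` with one entry.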

moshababo · 2024-03-18T00:06:35Z

node/tests/src/main.rs

@@ -83,6 +83,15 @@ pub async fn sanity_test() {
    }
 }

+/// Sanity test for the RPC server.
+/// We use unwraps here because this function is intended to be used like a test.
+pub async fn delay_test() {


Yes, the duration of most Byzantine behavior will need to be "denominated" in views rather than clock time, so that we actually test liveness during the disruption (and not just quick recovery once it's over).

View-based timing might be needed only for the removal, with the chaos always applied at the beginning of the test, but it's better to have the test framework support chaos scheduling with `from_view` and `to_view` args.
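
A minimal sketch of such view-based scheduling follows. It assumes the framework can observe the current view and that the window is half-open, i.e. chaos is active for views in `[from_view, to_view)`. `ViewWindow`, `Action` and `next_action` are hypothetical names, not part of the PR.

```rust
/// Hypothetical view window during which a chaos resource should be active.
/// Assumed semantics: active for `from_view <= view < to_view`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ViewWindow {
    from_view: u64,
    to_view: u64,
}

/// What the scheduler should do after observing the current view.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Apply,
    Remove,
    Wait,
}

/// Decides the next step from the observed view and whether the chaos
/// resource is currently deployed. No clock time is involved.
fn next_action(window: ViewWindow, current_view: u64, deployed: bool) -> Action {
    let inside = current_view >= window.from_view && current_view < window.to_view;
    match (inside, deployed) {
        (true, false) => Action::Apply,
        (false, true) => Action::Remove,
        _ => Action::Wait,
    }
}
```

Because the decision depends only on observed views, a slow network stretches the disruption in clock time while keeping it the same length in consensus terms.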

brunoffranca · 2024-03-27T13:40:57Z

@moshababo given the developments with AttackNet, should we close this?

IAvecilla and others added 30 commits January 10, 2024 18:34

Add dockerfile for executable node

6bd6170

Add compose file for testing purposes

b2fb9ab

Add entrypoint for node consensus dockerfile

e768bc6

Add some comments and improve compose test file

e6b6487

Add new makefile commands to run dockerized consensus node

4ca7fdc

Update readme with docs to run consensus node in docker

28df58b

Delete unnecesary building dependency in compose file

3e4b881

Rename docker image

febcafc

Set container names manually in compose file

b2e7e9c

Separate config directory for nodes running in docker

5e27e6e

Fix node configuration for docker consensus example

6e2236a

Improve command to run a node in a container

810a4b5

Generate the node configs in release mode

4088c0a

Fix docker cleanup to force deletion

4475489

Remove unnecesary copies to container

689b6d5

Add target dir to docker ignore

99dc691

Move every docker related config file to the project root

a3b270d

Fix typo in README

f93890f

Co-authored-by: Bruno França <[email protected]>

Fix typo in makefile command

f3893cf

Make the path to makefile be the same for local and docker

1c721c8

disabled clipply lint

8cac863

fixed lint, updated deps

d24c8bc

Change file name for the docker entrypoint and add comment to the script

9520c4c

Move version to latest for rust image

54c72ce

Change name of command and dir generation for node config

13e61e1

Add command to stop dockerized nodes

3b8d299

Add example file with local address for node configuration

2d09c1d

Update README with new updates

c287c56

Write a general overview on the README

e66cec9

Add tonic as dependency

eb7077e

IAvecilla added 3 commits March 11, 2024 12:23

Fix clippy and update comments

478eef8

Add duration parameter for the chaos deployment

20b7c4d

Reorganize all k8s module with new chaos functions

44faceb

IAvecilla marked this pull request as ready for review March 11, 2024 18:24

IAvecilla requested a review from moshababo March 11, 2024 18:24

Delete outdated changes in the node rpc side

b024f3a

moshababo reviewed Mar 18, 2024

Base automatically changed from rpc_execution_connection to main March 19, 2024 20:00

IAvecilla added 9 commits March 20, 2024 17:10

Merge branch 'main' into chaos_mesh_integration

9c85559

Fix merge errors

0dff79f

Encapsulate test logic

8ab0747

Add new type for pod IP

96d0fe6

Fix new abstractions for rpc calls

490619c

Supports delay for many nodes

f0729da

Split logic for network chaos action into different module

e011eee

Add new function docs

43989ff

Add teardown function for delay deployment

61fb9c9

IAvecilla requested a review from moshababo March 22, 2024 20:32

IAvecilla added 6 commits March 22, 2024 17:33

Merge branch 'main' into chaos_mesh_integration

5c5629e

Fix typos

ffa2fe0

Fix last typos

2efcd44

Fix heck crate version in deny file

b5aadcf

Fix format

c2ecdad

Reorder files for chaos mesh functionality

0b353a6

brunoffranca closed this Sep 25, 2024

brunoffranca deleted the chaos_mesh_integration branch November 1, 2024 19:36
