🐛 Chain process crash when processing multiple blocks with tallies #297

Thomasvdam · 2024-07-05T10:45:35Z

🐛 Bug Report

When the chain is processing several DR tallies while new ones are being posted/committed the processes sometimes crashes during tally VM execution. When systemd restarts the SEDAD process it just resumes with the block that failed and completes it without issues, including the DR during which the crash happened. This makes it look like the block took a long time to produce, but that doesn't seem to be the case.

We looked into it briefly but the only notable thing is that when executing several tallies per block for a few blocks suddenly the CPU spikes and the process crashes with the last log being the ‘executing tally VM’ one, basically the same error we see on the ARM machine.

Steps to Reproduce

We were able to most reliable reproduce it by submitting 10-20 data requests without an executor running. Then we'd start the executor with a fast ticker time so it executes and commits/reveals multiple DRs per block.

All submitted DRs were identical apart from the memo field.

Stack trace & error message

Logs taken from journalctl:

Jul 04 14:44:30 cosmovisor[300924]: 2:44PM INF non-empty tally list - starting tally process module=x/wasm-storage
Jul 04 14:44:30 cosmovisor[300924]: 2:44PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=0caa807fdec96f2b88f697a5fe4ce86bb21a55da44f9f49a053cb33a609593cc tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:44:30 cosmovisor[300924]: 2:44PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:44:30 cosmovisor[300924]: 2:44PM INF tally flow completed module=x/wasm-storage post_result= request_id=0caa807fdec96f2b88f697a5fe4ce86bb21a55da44f9f49a053cb33a609593cc
Jul 04 14:44:30 cosmovisor[300924]: 2:44PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=55fec897776dc9c5c7450b7fdb13edbb3c3e5151f938ba4ad368852d4b31779f tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:51 cosmovisor[300916]: Error: signal: aborted (core dumped)
Jul 04 14:46:51 cosmovisor[300916]: 2:46PM ERR error="signal: aborted (core dumped)" module=cosmovisor

When the node is restarted it processes that block without issues:

Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF non-empty tally list - starting tally process module=x/wasm-storage
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=0caa807fdec96f2b88f697a5fe4ce86bb21a55da44f9f49a053cb33a609593cc tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF tally flow completed module=x/wasm-storage post_result= request_id=0caa807fdec96f2b88f697a5fe4ce86bb21a55da44f9f49a053cb33a609593cc
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=55fec897776dc9c5c7450b7fdb13edbb3c3e5151f938ba4ad368852d4b31779f tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF tally flow completed module=x/wasm-storage post_result= request_id=55fec897776dc9c5c7450b7fdb13edbb3c3e5151f938ba4ad368852d4b31779f
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=bd3941e5bece128e6f339dd27aaa6f72b6951e66ab1f636d723eb971d2b5868c tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF tally flow completed module=x/wasm-storage post_result= request_id=bd3941e5bece128e6f339dd27aaa6f72b6951e66ab1f636d723eb971d2b5868c
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=7e5fc45696f04f289004cdb9711f1b5818a30f9e12d3238b40b6bb4394c9212b tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF tally flow completed module=x/wasm-storage post_result= request_id=7e5fc45696f04f289004cdb9711f1b5818a30f9e12d3238b40b6bb4394c9212b
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF executing tally VM arguments=["","[{\"reveal\":[84,104,105,115,32,105,115,32,97,32,68,82,32,101,120,101,99,117,116,105,111,110,32,114,101,115,117,108,116],\"salt\":[],\"exit_code\":0,\"gas_used\":\"0\"}]","[0]"] consensus=true module=x/wasm-storage request_id=1b4f0cf1b8b8917012517ab8b609405d71c32bca212cbd318e0330575b275535 tally_wasm_hash=e728da7f1457fcf1110f8664e1962d0076d1f301cef09e5f130b9a7d60ccef06
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF posting execution results to SEDA contract execution_result={"ExitInfo":{"ExitCode":0,"ExitMessage":"Ok"},"Result":"W3sic2FsdCI6W10sImV4aXRfY29kZSI6MCwiZ2FzX3VzZWQiOiIwIiwicmV2ZWFsIjpbODQsMTA0LDEwNSwxMTUsMzIsMTA1LDExNSwzMiw5NywzMiw2OCw4MiwzMiwxMDEsMTIwLDEwMSw5OSwxMTcsMTE2LDEwNSwxMTEsMTEwLDMyLDExNCwxMDEsMTE1LDExNywxMDgsMTE2XSwiaW5Db25zZW5zdXMiOjB9XQ==","Stderr":[],"Stdout":["Hello\n[DEBUG]: encodedReve>
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF tally flow completed module=x/wasm-storage post_result= request_id=1b4f0cf1b8b8917012517ab8b609405d71c32bca212cbd318e0330575b275535
Jul 04 14:46:55 cosmovisor[321226]: 2:46PM INF finalized block block_app_hash=C4E3550300A6494D8DE9A934903B8518C2CBF8A802D1BD395BAB118CB102AB21 height=6026 module=consensus num_txs_res=5 num_val_updates=0

Expected Behavior

The node should not suddenly crash when executing tally WASM binaries.

Your Environment

./sedad version: 0.2.0-dev.2, taken from Github release (statically compiled)
Architecture: x86_64

The text was updated successfully, but these errors were encountered:

Thomasvdam added type: bug 🐛 Something isn't working priority: critical Must be fixed immediately labels Jul 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 Chain process crash when processing multiple blocks with tallies #297

🐛 Chain process crash when processing multiple blocks with tallies #297

Thomasvdam commented Jul 5, 2024

🐛 Chain process crash when processing multiple blocks with tallies #297

🐛 Chain process crash when processing multiple blocks with tallies #297

Comments

Thomasvdam commented Jul 5, 2024

🐛 Bug Report

Steps to Reproduce

Stack trace & error message

Expected Behavior

Your Environment