Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AlephBFT consensus code is too chatty #3598

Closed
dpc opened this issue Nov 14, 2023 · 12 comments · Fixed by #3670
Closed

AlephBFT consensus code is too chatty #3598

dpc opened this issue Nov 14, 2023 · 12 comments · Fixed by #3670
Assignees

Comments

@dpc
Copy link
Contributor

dpc commented Nov 14, 2023

While I'm debugging tests in the fallout of the fiat CI fiasco, the AlephBFT stands out as being way too chat with messages that tell me nothing useful.

2023-11-14T19:06:43.910094Z  INFO AlephBFT-runway: Runway status report: healthy. DAG size - 123; DAG height - 34; DAG top row - [(0, 33), (1, 34), (
2, 34), (3, 18)].

Now that we had been using it for some time it would be good time to revise them.

As a general rule anything at Info and above log level should be dumbed down for someone that doesn't understand anything about AlephBFT and avoid any consensus specific terms.

At info level I'd be interested mostly in:

"We processed a block with: (summary of items)" type of messages, when there's some consensus action going on.

At Warn about things going clearly wrong and most probably indicating a problem.

@dpc
Copy link
Contributor Author

dpc commented Nov 14, 2023

@joschisan Assigning to you for obvious reasons. :)

@dpc
Copy link
Contributor Author

dpc commented Nov 14, 2023

BTW. It might make sense to have some ./docs document explaining AlephBFT logs assuming limited knowledge or something. Donno, it seems important, but hard to understand and 99% of the time not the issue at hand.

@justinmoon
Copy link
Contributor

Deep dive: default tracing subscriber could be set to ERROR level for alephbft?

@dpc
Copy link
Contributor Author

dpc commented Nov 16, 2023

  • Report upstream
  • in fedimint-logging we can set levels for particular logging modules/targets to by prepending the RUST_LOG env var with some Aleph-xyz=error,Aleph-foo=error etc.

@dpc
Copy link
Contributor Author

dpc commented Nov 16, 2023

Reported Cardinal-Cryptography/AlephBFT#373

@dpc
Copy link
Contributor Author

dpc commented Nov 16, 2023

Will try to help with #3625

@elsirion
Copy link
Contributor

Fixed by #3625 to the best of my knowledge, reopen if still relevant.

@dpc
Copy link
Contributor Author

dpc commented Nov 20, 2023

When I do fedimint-cli ln-invoice --amount 100msat I get:

image

2023-11-20T22:36:08.699405Z  INFO connection{remote_addr=127.0.0.1:54390 conn_id=19}: jsonrpsee_server::server: Accepting new connection 3/1000
2023-11-20T22:36:08.724074Z ERROR connection{remote_addr=127.0.0.1:54390 conn_id=19}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 19
2023-11-20T22:36:22.245727Z  INFO consensus: Session completed
2023-11-20T22:36:22.293199Z  INFO consensus: Loaded aleph buffer with 0 bytes
2023-11-20T22:36:30.503555Z  INFO connection{remote_addr=127.0.0.1:53446 conn_id=20}: jsonrpsee_server::server: Accepting new connection 3/1000
2023-11-20T22:36:32.583353Z ERROR connection{remote_addr=127.0.0.1:53446 conn_id=20}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 20
2023-11-20T22:36:40.506115Z ERROR jsonrpsee_server::transport::ws: WS transport error: send ping failed: Networking or low-level protocol error: connection closed
2023-11-20T22:37:10.222280Z  INFO consensus: Session completed
2023-11-20T22:37:10.629388Z  INFO consensus: Loaded aleph buffer with 0 bytes
2023-11-20T22:37:36.294793Z  INFO connection{remote_addr=127.0.0.1:44744 conn_id=21}: jsonrpsee_server::server: Accepting new connection 3/1000
2023-11-20T22:37:38.577648Z ERROR connection{remote_addr=127.0.0.1:44744 conn_id=21}: jsonrpsee_server::transport::ws: WS transport error: i/o error: Transport endpoint is not connected (os error 107); terminate connection: 21
2023-11-20T22:37:46.303167Z ERROR jsonrpsee_server::transport::ws: WS transport error: send ping failed: Networking or low-level protocol error: connection closed
2023-11-20T22:37:58.099181Z  INFO consensus: Session completed
2023-11-20T22:37:58.154714Z  INFO consensus: Loaded aleph buffer with 0 bytes
2023-11-20T22:38:45.483987Z  INFO consensus: Session completed
2023-11-20T22:38:45.551832Z  INFO consensus: Loaded aleph buffer with 0 bytes

Seems like we re-connecting. Is this expected? If so - should we hide it?

Also the buffer with 0 bytes ... is it good that it's 0 bytes? Important? Should we lower it?

@joschisan

@dpc dpc reopened this Nov 20, 2023
@dpc dpc self-assigned this Nov 20, 2023
@joschisan
Copy link
Contributor

joschisan commented Nov 20, 2023

This is not a log from aleph bft but from fedimint-Server. The aleph buffer is the backup by aleph bft; it would be non-zero if we recover from a mid session crash. Not sure which level to put this though, feel free to change it. I myself just used it for debugging in the beginning, could also be removed.

@dpc
Copy link
Contributor Author

dpc commented Nov 20, 2023

@joschisan I don't want to remove it if it's useful. We could log conditionally if non-zero, etc. I just don't understand the implications here.

What about these reconnects? Are these expected to be happening?

@joschisan
Copy link
Contributor

@dpc So if we are not recovering from a mid session crash it will always log zero bytes, does not seem that helpful to me but I do not really have a opinion on this or effective logging in general so please change or remove the server logs I put in however you think is best.

I have no idea why the reconnects occur.

@elsirion
Copy link
Contributor

We could log conditionally if non-zero,

This is the way

I have no idea why the reconnects occur.

The really annoying logs come from the API. Every time a client disconnects by just disappearing/closing the TCP stream (as the CLI client always does) an error is logged afaik. We should also reduce jsorpsee logging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants