
Directory nodes constantly disconnecting #1435

Open
PulpCattel opened this issue Jan 18, 2023 · 9 comments · Fixed by #1436

Comments

@PulpCattel
Member

PulpCattel commented Jan 18, 2023

EDIT: as per #1436 (comment), the spam issue should be mostly solved. Renamed this issue to focus on the "disconnect" part. Hopefully the changes in #1436 will improve this already.

EDIT: g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad is constantly disconnecting too now.

As reported by many different users and contributors, the following two directory nodes are disconnecting in a repetitive/constant pattern:

  • 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222
  • bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222

The log is simply a constant repetition of:

[DEBUG]  We got a disconnect event: 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222
[INFO ]  Updating status to connected for peer 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222.

It started last week, from what I can gather from users' reports, but I can't confirm it wasn't happening earlier too.
At this point, it seems plausible to me that this is a bug in our code logic.

The simplest explanation that comes to my mind is that we are not correctly handling some connectivity edge case.

Curiously, the newly added directory node g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222 is unaffected. This node is not yet in a release, so it might simply be because of less network traffic (EDIT: this directory is now experiencing exactly the same problem).

FWIW, signet directories also appear to be working.

This issue might resolve itself, but IMHO it's good to look into it to prevent it from happening again.

AdamISZ added a commit that referenced this issue Jan 21, 2023
Fixes #1435.
Prior to this commit, if connections were immediately closed,
no backoff or jitter were applied to delay the next connection attempt,
resulting in too many connection attempts to directory nodes that the
client could not reach, and spam in the logs.
After this commit, we enforce that we always shut down the ClientService
on any disconnection trigger or connection failure, and only attempt to
reconnect with a manual backoff delay (4 seconds increasing by 50% for
20 attempts, plus a few seconds jitter), for directory nodes, while for
non-directory nodes we always give up on any failure.
@PulpCattel PulpCattel changed the title from "Log spammed by We got a disconnect event" to "Directory nodes constantly disconnecting" Jan 23, 2023
AdamISZ added a commit that referenced this issue Jan 23, 2023
See #1435.
Prior to this commit, if connections were immediately closed,
no backoff or jitter were applied to delay the next connection attempt,
resulting in too many connection attempts to directory nodes that the
client could not reach, and spam in the logs.
After this commit, we enforce that we always shut down the ClientService
on any disconnection trigger or connection failure, and only attempt to
reconnect with a manual backoff delay (4 seconds increasing by 50% for
20 attempts, plus a few seconds jitter), for directory nodes, while for
non-directory nodes we always give up on any failure. Max delay is a bit
above 3 hours.
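
As a rough illustration of the backoff schedule described in these commit messages (a sketch, not the actual #1436 code; the exact base, multiplier, jitter and cap there may differ):

import random

# Sketch of the reconnect schedule described above: start at 4 seconds,
# grow by 50% per attempt for 20 attempts, and add a few seconds of random
# jitter so many bots don't retry a directory node in lockstep.
def reconnect_delays(base=4.0, factor=1.5, attempts=20, max_jitter=3.0):
    delay = base
    for _ in range(attempts):
        yield delay + random.uniform(0.0, max_jitter)
        delay *= factor

for i, d in enumerate(reconnect_delays(), start=1):
    print(f"attempt {i:2d}: wait ~{d:8.1f}s")

Depending on exactly where the 50% increase is applied, the final wait lands somewhere in the 2.5 to 3.7 hour range, roughly matching the "a bit above 3 hours" note.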
@PulpCattel PulpCattel reopened this Jan 24, 2023
@AdamISZ
Member

AdamISZ commented Jan 27, 2023

Just dropping a note on what I've been seeing for the last few days since the #1436 merge: though we back off (with that code, of course; most users are on 0.9.8), we still seem to be basically permanently shut off from two of the three directory nodes. More work is required, even though that patch did have the desired effect, and bots can actually run normally without log spam.

Re: "more work", I think a good place to focus on is how the directory node onion services are configured. Perhaps somewhere buried there's a setting that can allow more connections? Or at least, figure out what the connection limits are (both in timing of requests, and total number, say)?

@PulpCattel
Member Author

Just focusing on Tor, this could be a start: https://community.torproject.org/onion-services/advanced/dos/

For lower-level details we might have to dig into the Tor source code and spec at some point.
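
For example, the kinds of per-service rate-limiting knobs that page discusses look roughly like this in a torrc (the option names are from the Tor documentation; the values are just the documented defaults, not tested recommendations for our directories, and since we create the onion service through txtorcon they would have to be applied to whatever tor instance actually hosts it):

# Illustrative torrc sketch for an onion service under heavy load.
HiddenServiceDir /var/lib/tor/jm_directory
HiddenServicePort 5222 127.0.0.1:5222
# Rate-limit introduction requests at the introduction points.
HiddenServiceEnableIntroDoSDefense 1
HiddenServiceEnableIntroDoSRatePerSec 25
HiddenServiceEnableIntroDoSBurstPerSec 200
# Optionally cap streams per rendezvous circuit (0 = unlimited).
HiddenServiceMaxStreams 0
HiddenServiceMaxStreamsCloseCircuit 0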

@AdamISZ
Member

AdamISZ commented Jan 29, 2023

Was reading docs for a related project, and noticed this: https://tunnelsats.github.io/tunnelsats/FAQ.html#tuning-tor ... perhaps worth investigating? I believe someone mentioned LongLivedPorts on this repo a while ago. Edit: yes, see fbcb9fd - hence 5222.

@AdamISZ
Member

AdamISZ commented Feb 5, 2023

Interesting point from man tor: ConnLimit defaults to 1000. Second, I observed earlier today that about 1000 !orderbook messages were propagated by my dir node. Third, the way we use tor is that a separate instance of the tor process is run for each directory node.

Though we see only perhaps a couple of hundred makers (at best), we could very easily be supporting 1000 takers + makers + obwatchers. I'll look into changing this. (Though I still don't actually understand why it would be 'connect then disconnect' and not just 'refuse', if indeed this is in any way related.)

Edit: hmm, on a second read, unfortunately that appears to be a minimum not a maximum. So probably irrelevant but may as well leave the note here.

@AdamISZ
Member

AdamISZ commented Feb 7, 2023

In case anyone else is investigating this like me, here's a point that might be very important: how to see detailed logs of the tor instance in which your directory node is running:

Note that we spawn a new tor process for it, and that process can be set to log to a particular file by editing path-to-jmcs/jmvenv/lib/python3.xx/site-packages/txtorcon/controller.py at line 314, extending this list:

config_args = ['-f', '/dev/null/non-existant-on-purpose', '--ignore-missing-torrc']

to something like:

config_args = ['-f', '/dev/null/non-existant-on-purpose', '--ignore-missing-torrc', 'Log', 'info file yourfilename.txt']

(Read man tor, but notice that while on the command line you need double quotes around the multi-word value for the Log option, here you don't need the double quotes.) Another thing: the log file is created in /tmp/tortmpxxxxx, where the random xxxxx can be seen from ps aux | grep tor or similar.

Log levels here are debug, info, warn, notice or similar as per docs/man page. I think info is already pretty verbose, debug looks overwhelming. I'm going to be tail -f-ing this to help me see what transpires with different setups.
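
To locate that log file without hunting around manually, something like this works (a small sketch; it just assumes the spawned tor's DataDirectory is the newest /tmp/tortmp* directory and that the Log line above used the relative filename yourfilename.txt):

import glob
import os

# Find the newest /tmp/tortmp* directory (the DataDirectory of the most
# recently spawned tor) and the log file created by the Log option above.
dirs = sorted(glob.glob("/tmp/tortmp*"), key=os.path.getmtime)
if dirs:
    datadir = dirs[-1]
    print("tor data directory:", datadir)
    print("log file:", os.path.join(datadir, "yourfilename.txt"))
else:
    print("no /tmp/tortmp* directory found; is the spawned tor running?")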

(You can also control the output of journalctl with options in the .service file, like StandardOutput=journal+console, but it doesn't help you with the tor log, hence the above).

Please let me know if you have other info about logging tor, here.

@AdamISZ
Member

AdamISZ commented Feb 7, 2023

An addendum: when tail-ing that output, it was still overwhelming (which itself may be interesting!) for my mainnet dnode, so you may find notice more realistic than info (as, in fact, the man page suggests). Let's see.

@CohibAA

CohibAA commented Mar 3, 2023

Seeing this pretty often still when launching bots. Curious why sometimes they will reconnect for very short periods of time.

2023-03-xxxxxxxxxxxxxxx [WARNING]  Failure to send message to directory: g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222.
2023-03-xxxxxxxxxxxxxxx [WARNING]  Failure to send message to directory: 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222.
2023-03-xxxxxxxxxxxxxxx [WARNING]  Failure to send message to directory: bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222.

So, I decided I should start my own directory node to dig into why. Tor is quite a bit more sophisticated than I am, so it was a lot of trial and error to even get going. Now that it's working, I guess it's time to try to break it.

If anyone wants to connect, here is the list of directory_nodes I am currently using, for reference. This includes the newest from the stopgap list of directory nodes (#1445) as of the date of this comment, as well as the new node from #1456.

directory_nodes = plq5jw5hqke6werrc5duvgwbuczrg4mphlqsqbzmdfwxwkm2ncdzxeqd.onion:5222,g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222,3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222,bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222,ylegp63psfqh3zk2huckf2xth6dxvh2z364ykjfmvsoze6tkfjceq7qd.onion:5222

@PulpCattel
Member Author

From what I can see, g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad is "down" too now, same pattern as the other directories. RIP.

It really seems like there's a magic threshold that we reach and things stop working.

@CohibAA

CohibAA commented Mar 27, 2023

2023-03-27 xxxxxxxxxxxx [WARNING]  Failure to send message to directory: g3hv4uynnmynqqq2mchf3fcm3yd46kfzmcdogejuckgwknwyq5ya6iad.onion:5222.
2023-03-27 xxxxxxxxxxxx [WARNING]  Failure to send message to directory: 3kxw6lf5vf6y26emzwgibzhrzhmhqiw6ekrek3nqfjjmhwznb2moonad.onion:5222.
2023-03-27 xxxxxxxxxxxx [WARNING]  Failure to send message to directory: bqlpq6ak24mwvuixixitift4yu42nxchlilrcqwk2ugn45tdclg42qid.onion:5222.

Seeing the same thing here still. I've been monitoring the tor logs for the directory node I set up, but nothing really unusual has presented itself yet, and it still hasn't been hit by the "magic threshold" as far as I can tell. Please report if you have issues with "wkd3kd73ze62sqhlj2ebwe6fxqvshw5sya6nkvrgcweegt7ljhuspaid.onion"* and I can possibly cross-reference the logs more directly.

edit: The "wkd3" onion referenced above has been replaced by plq5jw5hqke6werrc5duvgwbuczrg4mphlqsqbzmdfwxwkm2ncdzxeqd.onion due to unrelated server downtime.

kristapsk pushed a commit to kristapsk/joinmarket-clientserver that referenced this issue Jan 5, 2024
See JoinMarket-Org#1435.
Prior to this commit, if connections were immediately closed,
no backoff or jitter were applied to delay the next connection attempt,
resulting in too many connection attempts to directory nodes that the
client could not reach, and spam in the logs.
After this commit, we enforce that we always shut down the ClientService
on any disconnection trigger or connection failure, and only attempt to
reconnect with a manual backoff delay (4 seconds increasing by 50% for
20 attempts, plus a few seconds jitter), for directory nodes, while for
non-directory nodes we always give up on any failure. Max delay is a bit
above 3 hours.