Automatic reconnection on network failure #312

Open
6 of 9 tasks
Keruspe opened this issue Nov 13, 2020 · 11 comments

Comments

@Keruspe
Collaborator

Keruspe commented Nov 13, 2020

Now that we have the topology API, here are the steps required towards automatic reconnection:

  • add an Option<Consumer> parameter to the wrapped basic_consume, set it to None from the public wrapper, store the Option in the state, and, if we got Some, use it to restore everything into the given Consumer
  • introduce an InternalTopology that stores Connection/Channel/Consumer objects alongside the topology items
  • add a conversion from InternalTopology to Topology that drops the associated items
  • change the topology methods to return an InternalTopology, and make the current public method use that and convert it to the public Topology
  • in the same way, introduce a restore_internal, make restore use it, and pass the Options stored in the InternalTopology to basic_consume and friends
  • hook up basic_get in InternalTopology + restore_internal
  • add an Option<Channel> to channel creation to share internals with the Channel we want to restore; set it to None by default, but use it when finalizing if it's Some
  • add an Option<InternalTopology>, set to None, to the connection process, and when it's Some, pass it to restore_internal
  • detect network failure from the event loop and, instead of bubbling it up, call topology_internal and reinitiate the connection with Some(InternalTopology)

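
The core of the plan above (an internal topology that keeps live handles next to the declarative items, a lossy conversion to the public Topology, and a restore path that reuses stored Options) can be sketched in plain Rust. This is a minimal std-only sketch under stated assumptions: `Channel`, `Consumer`, `ConsumerDef`, `InternalTopology`, `Topology`, `basic_consume_internal` and `restore_internal` here are hypothetical stand-ins, not lapin's actual types or signatures, and "restoring" a Consumer is reduced to bumping a counter.

```rust
// Hypothetical stand-ins for live objects; NOT lapin's real types.
#[derive(Debug, Clone, PartialEq)]
struct Channel {
    id: u16,
}

#[derive(Debug, Clone, PartialEq)]
struct Consumer {
    tag: String,
    // Incremented each time this same object is restored after a failure.
    restored: u32,
}

// Declarative item, as the public Topology would describe it.
#[derive(Debug, Clone, PartialEq)]
struct ConsumerDef {
    queue: String,
    tag: String,
}

// Public topology: declarative items only.
#[derive(Debug, Clone, PartialEq)]
struct Topology {
    consumers: Vec<ConsumerDef>,
}

// Internal topology: declarative items plus the live handles they map to.
#[derive(Debug, Clone)]
struct InternalTopology {
    channel: Option<Channel>,
    consumers: Vec<(ConsumerDef, Option<Consumer>)>,
}

impl From<InternalTopology> for Topology {
    // The conversion drops the associated live objects.
    fn from(t: InternalTopology) -> Self {
        Topology {
            consumers: t.consumers.into_iter().map(|(def, _)| def).collect(),
        }
    }
}

// Wrapped variant: restore into the given Consumer when Some, else create one.
fn basic_consume_internal(def: &ConsumerDef, existing: Option<Consumer>) -> Consumer {
    match existing {
        Some(mut c) => {
            c.restored += 1; // the user's existing handle keeps working
            c
        }
        None => Consumer { tag: def.tag.clone(), restored: 0 },
    }
}

// Public wrapper: always passes None.
fn basic_consume(def: &ConsumerDef) -> Consumer {
    basic_consume_internal(def, None)
}

// Replay an InternalTopology on a new channel, reusing the stored handles.
fn restore_internal(topology: InternalTopology, new_channel_id: u16) -> InternalTopology {
    let channel = match topology.channel {
        Some(mut ch) => {
            ch.id = new_channel_id; // share internals with the old Channel
            ch
        }
        None => Channel { id: new_channel_id },
    };
    let consumers = topology
        .consumers
        .into_iter()
        .map(|(def, old)| {
            let c = basic_consume_internal(&def, old);
            (def, Some(c))
        })
        .collect();
    InternalTopology { channel: Some(channel), consumers }
}

fn main() {
    let def = ConsumerDef { queue: "jobs".into(), tag: "worker-1".into() };
    let consumer = basic_consume(&def);
    let internal = InternalTopology {
        channel: Some(Channel { id: 1 }),
        consumers: vec![(def.clone(), Some(consumer))],
    };

    // Simulated network failure: restore everything on a fresh channel.
    let restored = restore_internal(internal, 2);
    println!("channel id: {}", restored.channel.as_ref().unwrap().id);
    println!("restored: {}", restored.consumers[0].1.as_ref().unwrap().restored);

    // The public view only carries the declarative items.
    let public: Topology = restored.into();
    println!("public consumers: {}", public.consumers.len());
}
```

Threading `Option`s through the existing code paths, rather than writing a separate recovery path, means the normal and the recovery cases share one implementation; the public API is unchanged because the wrappers always pass None.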
@robo-corg

Would you be interested in a PR for this?

@Keruspe
Collaborator Author

Keruspe commented Mar 17, 2021

Sure.
Otherwise, I plan to work on it this summer, once 2.0 is out.

@kageru

kageru commented Sep 19, 2022

Any update on this?

Automatic reconnects would be really useful for me. I’d even try to contribute if something specific is missing.

@Ks89

Ks89 commented Dec 29, 2022

I'm also interested in this feature.

@Keruspe
Collaborator Author

Keruspe commented Dec 29, 2022

I'd be willing to take some sponsorship to work on this

@TroyKomodo

@carlhoerberg

> I'd be willing to take some sponsorship to work on this

We're willing to sponsor this, plz email me at [email protected]

@Keruspe
Collaborator Author

Keruspe commented Jul 11, 2024

Progress is being made on this front; an initial version should be coming this summer.

@Keruspe
Collaborator Author

Keruspe commented Aug 4, 2024

Small update on this front:
I've slightly reworked my approach now that I can actually spend time on this (thanks to @carlhoerberg and CloudAMQP support).
I fixed a few bugs in the TCP loop; those fixes are required for this to work properly.
I'm working on handling AMQP "soft" errors (i.e. errors local to a single channel) first, so that I can properly implement recovery of a single channel and test it more easily.
Once channel recovery is done, I'll move on to AMQP "hard" errors, which are global to the connection, to ensure we properly recover all channels too.
The last step will then be to trigger recovery for other errors too (such as TCP errors).
I will create the associated issues, but the Channel part (which is fundamental for the other parts to work properly) should be done before the end of summer. Issuing a passive queue declare for a non-existent queue on a channel will probably be the easiest way to test this, as it triggers a channel error.
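
The soft/hard split described above comes from AMQP 0-9-1: a soft error closes only the affected channel (channel.close), while a hard error closes the whole connection (connection.close). A passive declare of a missing queue gets a 404 NOT_FOUND channel.close, which is why it makes a convenient test. The sketch below maps a failure to a recovery scope using the soft-error reply codes listed in the AMQP 0-9-1 specification; `Recovery`, `Failure`, `is_soft` and `classify` are hypothetical names for illustration, not part of lapin.

```rust
use std::io;

/// Hypothetical recovery scope (not a lapin type).
#[derive(Debug, PartialEq)]
enum Recovery {
    /// Deliberate close (reply code 200): nothing to recover.
    Skip,
    /// Reopen and restore only this channel.
    Channel(u16),
    /// Reconnect and restore every channel.
    Connection,
}

/// Hypothetical failure as observed by the event loop.
#[derive(Debug)]
enum Failure {
    /// A close method (channel.close or connection.close) sent by the broker.
    Amqp { channel_id: u16, reply_code: u16 },
    /// The underlying TCP socket failed.
    Io(io::Error),
}

/// Soft-error reply codes from the AMQP 0-9-1 specification.
fn is_soft(reply_code: u16) -> bool {
    matches!(reply_code, 311 | 312 | 313 | 403 | 404 | 405 | 406)
}

fn classify(failure: &Failure) -> Recovery {
    match failure {
        // reply-success: a normal, requested close.
        Failure::Amqp { reply_code: 200, .. } => Recovery::Skip,
        // Soft errors only close the channel they happened on.
        Failure::Amqp { channel_id, reply_code } if is_soft(*reply_code) => {
            Recovery::Channel(*channel_id)
        }
        // In this sketch, any other code is treated as hard; TCP errors lose the connection.
        Failure::Amqp { .. } | Failure::Io(_) => Recovery::Connection,
    }
}

fn main() {
    // Passive declare of a missing queue: channel.close with 404 NOT_FOUND.
    let missing = Failure::Amqp { channel_id: 1, reply_code: 404 };
    println!("{:?}", classify(&missing)); // prints "Channel(1)"

    // 320 CONNECTION_FORCED arrives via connection.close on channel 0.
    let forced = Failure::Amqp { channel_id: 0, reply_code: 320 };
    println!("{:?}", classify(&forced)); // prints "Connection"

    let tcp = Failure::Io(io::Error::new(io::ErrorKind::ConnectionReset, "reset by peer"));
    println!("{:?}", classify(&tcp)); // prints "Connection"
}
```

Funnelling hard AMQP errors and TCP errors into the same `Connection` branch matches the plan of doing channel recovery first and then reusing it for every channel on a reconnect.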

@conioX

conioX commented Oct 4, 2024

Any news about this?

@Keruspe
Collaborator Author

Keruspe commented Oct 12, 2024

I'm sorry about this; the last two months were a lot... rougher than anticipated. All current progress can be tracked in #416.
I'm still first focusing on channel recovery, and I'll get to connection recovery once this is stabilized.
Currently, the publishing part works pretty well and I'm confident in the implementation.
I want to hook up some topology recovery (temporary queue recreation and so on).
The consumer part is trickier, but parts of it are already there.
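
One detail that makes temporary queue recreation tricky: a queue declared with an empty name gets a broker-generated name, and exclusive queues disappear with their connection, so after recovery the queue comes back under a different name. Anything recorded against the old name (bindings, consumers) then has to be rewritten. This is a minimal std-only sketch of that remapping, assuming a client-side record of declarations; `QueueDecl`, `Binding`, `RecordedTopology`, `recover` and the `declare` closure (standing in for a queue.declare round-trip) are hypothetical, not lapin's API.

```rust
use std::collections::HashMap;

/// A queue declaration as recorded by the client (hypothetical type).
#[derive(Debug, Clone)]
struct QueueDecl {
    /// Name the queue currently has on the broker.
    name: String,
    /// True when the broker generated the name (declared with "").
    server_named: bool,
}

/// A queue-to-exchange binding as recorded by the client.
#[derive(Debug, Clone, PartialEq)]
struct Binding {
    queue: String,
    exchange: String,
    routing_key: String,
}

#[derive(Debug, Default)]
struct RecordedTopology {
    queues: Vec<QueueDecl>,
    bindings: Vec<Binding>,
}

impl RecordedTopology {
    /// Replays declarations through `declare`, which returns the name the broker used.
    fn recover(&mut self, mut declare: impl FnMut(&str) -> String) {
        let mut renamed: HashMap<String, String> = HashMap::new();
        for q in &mut self.queues {
            // Server-named queues are redeclared with "" so the broker picks a fresh name.
            let requested = if q.server_named { "" } else { q.name.as_str() };
            let actual = declare(requested);
            if actual != q.name {
                renamed.insert(q.name.clone(), actual.clone());
                q.name = actual;
            }
        }
        // Point bindings at the new names before they are redeclared.
        for b in &mut self.bindings {
            if let Some(new_name) = renamed.get(&b.queue) {
                b.queue = new_name.clone();
            }
        }
    }
}

fn main() {
    let mut topo = RecordedTopology {
        queues: vec![
            QueueDecl { name: "amq.gen-old".into(), server_named: true },
            QueueDecl { name: "jobs".into(), server_named: false },
        ],
        bindings: vec![Binding {
            queue: "amq.gen-old".into(),
            exchange: "events".into(),
            routing_key: "#".into(),
        }],
    };

    // Fake broker: invents a name for "" and echoes explicit names.
    let mut counter = 0;
    topo.recover(|requested| {
        if requested.is_empty() {
            counter += 1;
            format!("amq.gen-new{}", counter)
        } else {
            requested.to_string()
        }
    });
    println!("{}", topo.bindings[0].queue); // prints "amq.gen-new1"
}
```

Consumers on a server-named queue would need the same remapping before basic_consume is replayed, which is part of what makes the consumer side harder than publishing.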

7 participants