Pubsub not configured Error #46
I am having the same issue.

This may be related to phoenixframework/phoenix#3179

Chris' reply to this is helpful but not the full story, since Absinthe.Phoenix right now is written to expect you to do […]. In the meantime I highly suggest implementing "readiness" checks alongside your health checks, where you don't route traffic to your node until it's fully booted.
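In case it helps, a readiness check can be as small as a plug that answers 503 until the application marks itself booted. This is only a sketch, not anything from Absinthe or Phoenix: the `/ready` path, the `:my_app` app name, and the `:ready` flag are all hypothetical names.

```elixir
defmodule MyAppWeb.ReadinessPlug do
  @moduledoc """
  Hypothetical readiness endpoint: returns 503 until the application
  sets a `:ready` flag at the end of its boot sequence, so a load
  balancer or Kubernetes readiness probe holds traffic back until then.
  """
  import Plug.Conn

  def init(opts), do: opts

  def call(%Plug.Conn{request_path: "/ready"} = conn, _opts) do
    # The app would run `Application.put_env(:my_app, :ready, true)`
    # as the last step of its supervision tree startup.
    if Application.get_env(:my_app, :ready, false) do
      conn |> send_resp(200, "ok") |> halt()
    else
      conn |> send_resp(503, "booting") |> halt()
    end
  end

  def call(conn, _opts), do: conn
end
```

Placed early in the endpoint's plug pipeline, this lets the orchestrator distinguish "process is up" (liveness) from "safe to route traffic here" (readiness).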
@benwilson512 are there any recent recommendations on this? Since I upgraded to Phoenix 1.5, I see the many messages like this in my logs when my server shuts down:
The error seems to happen for my already-connected clients, not new clients connecting while the app is being shut down. The client is connected to the socket, and the socket itself is fine. The highlighted line seems to be where it happens. Only when Phoenix.Endpoint closes the WebSocket connection does the client try to reconnect to a new socket, and there they get to a healthy server. Any guidance would be much appreciated. Thanks for Absinthe! :)

@sebastianseilund What order do you have things in your supervision tree? Have you considered using something like https://github.com/derekkraan/ranch_connection_drainer ? Ideally you don't keep routing traffic to a node that is shutting down.
Order:

If I put […] I'm not routing any new traffic to my nodes once shutdown starts. As described above, the error happens in the sockets that are already connected (via WebSocket). This is the order of things:

I haven't looked at https://github.com/derekkraan/ranch_connection_drainer before. Would you put that after […]?
We are seeing the exact same error that @sebastianseilund posted above, for very similar reasons.

@benwilson512 I looked into https://github.com/derekkraan/ranch_connection_drainer. It waits for connections to end before letting the Endpoint shut itself down, so I don't think it'll fix the issue here, since the WebSocket connections stay open.
Fair. I think the best option here would be to figure out a way to have the supervision tree look like this:

Then the endpoint would shut down before the pubsub. Let me see what I can do.
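For reference, an OTP supervisor stops its children in the reverse of their start order, so a tree shaped like the following (module names are placeholders, not taken from this thread) would stop the endpoint first and the pubsub last:

```elixir
children = [
  # Started first, stopped last: pubsub outlives the endpoint on shutdown.
  {Phoenix.PubSub, name: MyApp.PubSub},
  # Subscription bookkeeping, still alive while the endpoint drains.
  {Absinthe.Subscription, MyAppWeb.Endpoint},
  # Started last, stopped first: stops accepting traffic while pubsub is up.
  MyAppWeb.Endpoint
]
```

With this ordering, in-flight subscription publishes during shutdown can still reach a live pubsub process.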
I think that would be amazing 🙏

@benwilson512 Is there any movement on this? I am seeing ~150 of these errors daily in production.
@lstrzebinczyk the best solution here is to use a load balancer that routes traffic away from your node on deploy. Even if I split out the pubsub from the endpoint, it will simply result in different issues for your clients. If you aren't routing traffic away from your node on deploy then when they are disconnected they may try to reconnect to the same node, which will now fail. I will try to find time to make the proposed improvements here this week, but it is merely going to help with error messages; it isn't going to solve the root issue. The root issue is that you are not disconnecting traffic properly from a node that is going down.
This doesn't happen. Kubernetes does route new traffic away from pods before shutting down the Elixir container. The existing network connections are not terminated by anything in Kubernetes though. When the client reconnects, they'll be sent to one of the new pods (or one of the old ones that haven't begun shutdown yet).
I don't think you'd want to terminate active connections outside of Elixir's control. If you have a regular slow HTTP request, you don't want it to be terminated half-way. That could leave stuff in a bad state, or at least end up as an unnecessary 5xx error to the client. Just let it drain. When […]

That's fair, there are definitely shutdown patterns that make sense for Elixir to lead. OK, I'll prioritize working on that this week.
Any update on this?

Any updates?

I'm encountering this as well. I'm curious if there is an update on the horizon, similar to what @benwilson512 seemed to have in mind in #46 (comment).
Experienced this today, and fixed it by adding a supervisor to my application.ex (might have missed it when setting up). Hope this helps.

```elixir
def start(_type, _args) do
  import Supervisor.Spec

  children = [
    AppWeb.Telemetry,
    {Phoenix.PubSub, name: App.PubSub},
    AppWeb.Endpoint,
    supervisor(Absinthe.Subscription, [AppWeb.Endpoint]) # <- Added this
  ]

  opts = [strategy: :one_for_one, name: App.Supervisor]
  Supervisor.start_link(children, opts)
end
```
- Purpose is an experiment to decouple from Phoenix.Endpoint as described in absinthe-graphql#46
@alvnrapada's fix didn't seem to work on my application when I was testing by doing a rolling restart of a deployment in k8s, so I made this experimental "hack PR" #89, as outlined by #46 (comment). Prior to making the change indicated by this PR to my Kubernetes-deployed app, I too was receiving the "pubsub not configured" error whenever I triggered a rolling restart in Kubernetes as a test. I tried adding tests to the PR, but the Phoenix.ChannelTest helper pathways seemed to expect an ETS-backed endpoint configuration, and I wasn't quite sure of the best way to tackle this at the moment (open to suggestions). Testing notes:
I will try to put together a demo application, especially if people think it might help in any way with testing (I'd also be curious about the results others might get), but here is the outline of what I did:

- application.ex
- user_socket.ex
- endpoint.ex
- new decoupled pubsub implementation file
- whatever_file_where_a_subscription_is_published.ex
- config.exs
In terms of my k8s deployment.yaml config, it might also be worth noting I had a readiness probe and a terminationGracePeriodSeconds of 65, like this (abbreviated yaml):
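A sketch of what that abbreviated yaml might look like; only the readiness probe and `terminationGracePeriodSeconds: 65` come from the comment above, while the path, port, and timings are illustrative:

```yaml
spec:
  template:
    spec:
      # Give in-flight connections time to drain before SIGKILL.
      terminationGracePeriodSeconds: 65
      containers:
        - name: app
          # Hold traffic back until the node is fully booted.
          readinessProbe:
            httpGet:
              path: /ready   # hypothetical readiness endpoint
              port: 4000
            initialDelaySeconds: 5
            periodSeconds: 10
```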
We've just started seeing this issue in the last week or so. Wondering if we can help at all?

Has anyone been able to reproduce this issue in a locally running application? That might simplify testing any proposed fixes here.

It happens in our dev environment every time it restarts while running under […]
So here's what I'm trying to do to get the supervision tree to end with the endpoint.

The supervision tree setup:

```elixir
children = [
  {Phoenix.PubSub, [name: MyappWeb.PubSub, adapter: Phoenix.PubSub.PG2]},
  {Absinthe.Subscription, MyappWeb.Endpoint},
  MyappWeb.Endpoint,
]
```

The Endpoint for my app:

```elixir
defmodule MyappWeb.Endpoint do
  # Important that this goes before `use Phoenix.Endpoint`
  use MyappWeb.OverridePhoenixEndpoint, otp_app: :myapp
  use Phoenix.Endpoint, otp_app: :myapp
  use Absinthe.Phoenix.Endpoint

  # ...
end
```

The override for `pubsub_server!`:

```elixir
defmodule MyappWeb.OverridePhoenixEndpoint do
  defmacro __using__(otp_app: app_name) do
    quote do
      defp pubsub_server! do
        # Read from the endpoint's ETS config table if it exists;
        # otherwise fall back to the application environment.
        server =
          if :ets.whereis(__MODULE__) != :undefined do
            config(:pubsub_server)
          else
            Application.get_env(unquote(app_name), __MODULE__)
            |> Keyword.get(:pubsub_server)
          end

        server ||
          raise ArgumentError, "no :pubsub_server configured for #{inspect(__MODULE__)}"
      end
    end
  end
end
```

This seems to me to be working, but it also seems like it's not the greatest idea in the world. Does anyone have better suggestions, or reasons why this is dangerous?
I'm seeing this error as well in my local environment. I'm attempting to capture process exits to gracefully shut down. When I issue a […]

I was also able to reproduce it by sending a SIGTERM signal to the process running […]

For reference, SIGTERM is the signal sent by the AWS ECS system to shut down task containers. "Today, ECS always sends a SIGTERM" (https://aws.amazon.com/blogs/containers/graceful-shutdowns-with-ecs/)

I was also able to repro in our codebase by sending SIGTERM to the process while there's an active subscription.
Any news here? It seems that having Subscription and Endpoint coupled like this causes race conditions, and your supervision tree ends up being unpredictable.

This issue still occurs, though I only get one error every few days running in prod. Anyone found a "neat" fix?

There is a (bit dirty) fix that we use now in production: prosapient/absinthe@39e5a92. Note: it handles SIGTERM (perfect for k8s), just to understand when pubsub is not configured and when there is progress with app shutdown.
@benwilson512 Any movement on this issue? We have been implementing more subscription-based functionality in our app, and as a consequence we're seeing this PubSub failure a lot more now when users are connected to the socket during a restart. @bfad's solution technically does work, but we get a lot of warning logs like […]. With that solution, killing the app (with […]) logs these warnings before closing gracefully.
Sentry catches a lot of `RuntimeError`s saying `Pubsub not configured! Subscriptions require a configured pubsub module.` I could not find out in which cases it is failing; I just see around 600 errors over a timespan of 30 days. I've never experienced the error myself, and it only occurs in a low percentage of all the subscriptions.

Full Exception

Versions

Setup

- Absinthe.Subscription added to Application
- Absinthe.Phoenix.Socket in Socket implementation
- Absinthe.Plug (via ApolloTracing.Pipeline)