Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthcheck fails with Kamal 2 #1

Closed
kaka-ruto opened this issue Oct 8, 2024 · 6 comments
Closed

Healthcheck fails with Kamal 2 #1

kaka-ruto opened this issue Oct 8, 2024 · 6 comments

Comments

@kaka-ruto
Copy link

It does not seem to work with Kamal 2 - I get this error on deploy basecamp/thruster#42 (comment)

I tested with the endpoint pointing to /up and /healthcheck

But if I comment the mount, kamal deploy works well

@kaka-ruto
Copy link
Author

I think the issue I found is if any of the checks fails, you will not know that it was the cause, it just let's Kamal handle the error reporting as best as it understands - without context from Allgood.

Will be very helpful if the failing healthcheck will be passed onto kamal for reporting

@rameerez
Copy link
Owner

rameerez commented Oct 24, 2024

@kaka-ruto thanks for reporting! I'm testing this in a Rails 8.0.0.rc1 app using Kamal 2.2.2 right now, with the default allgood config, and I think I know why it failed for you by default.

Apparently, in Rails 8 ActiveRecord::Base.connection.active? returns false even for a healthy ActiveRecord connection, if it didn't .connect! before (lazy connection). It will do that even for a connection that is able to read and write to DB in another check (e.g.: ActiveRecord::Base.connection.execute("SELECT 1").any? succeeds)

This was the very first test in the suggested allgood default config in the README, and it fails by default now in Rails 8+. I will edit it.


RE: Kamal reading the allgood error -- unfortunately, HTTP codes is as much as I can return from allgood. 200 OK if all tests were healthy, 503 otherwise. I could add more semantics by returning a JSON payload with it, or a similar solution, but I'm afraid Kamal only reads the HTTP headers, and we'd need to PR some mechanism for Kamal to pick up on that payload and display it in the kamal deploy logs

Not sure how to proceed other than improving the first default test from the README and adding a warning for Kamal users. Do you have any ideas?

@kaka-ruto
Copy link
Author

Hi @rameerez , thanks for the check!

Yeah, it's a two way solution isn't it - Kamal needs a protocol of some sort to accept healthcheck payloads, and I doubt that that's the team direction at the moment.

I think for now the documentation update will suffice. Let's see if Kamal first comes up with a mechanism to accept healthcheck payloads of some sort.

@rameerez
Copy link
Owner

@kaka-ruto thanks for the follow up! I've just improved the docs a bit more with more explicit mentions to Kamal. Hope this is enough! Closing this issue for now, thanks again for reporting!

@nathanvda
Copy link

We recently started deploying with Kamal, and now we actually have a new feature request: it would be nice if we could have a kamal-specific /up endpoint (just the basics, has the deployment worked) and another more complete check for our uptime-robot (e.g. are api calls being processed --we do not need this complete test before the deployment).

I had a quick look at the documentation and code, and I am not sure how we could extend allgood to provide this? So either provide two end-points with specific checks, or make checks conditional based on a parameter or something similar? If you have an approach in mind, I am willing to help with the implementation.

@rameerez
Copy link
Owner

@nathanvda there's no need to have allgood override the Kamal default /up endpoint – you can have both!

What I have in the routes.rb of all my projects is:

get "up" => "rails/health#show", as: :rails_health_check
mount Allgood::Engine => '/healthcheck'

This way, I keep the default /up route provided by Rails to inform Kamal whether the deployment succeeded or not; and the /healthcheck route (allgood) also exists to monitor that the app business logic is working as expected.

Does this solve your question? Or am I missing some detail?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants