Installer fails with Bad Gateway #251

Open
aardjon opened this issue Nov 26, 2024 · 9 comments

aardjon commented Nov 26, 2024

I'm trying to install Consul on Ubuntu 24.04.1 LTS, but the installer fails with:

fatal: [consuldemo.example.com]: FAILED! => changed=false 
  connection: close
  content: |-
    <html>
    <head><title>502 Bad Gateway</title></head>
    <body>
    <center><h1>502 Bad Gateway</h1></center>
    <hr><center>nginx/1.24.0 (Ubuntu)</center>
    </body>
    </html>
  content_length: '166'
  content_type: text/html
  date: Tue, 26 Nov 2024 16:52:30 GMT
  elapsed: 0
  invocation:
    module_args:
      attributes: null
      body: null
      body_format: raw
      ca_path: null
      ciphers: null
      client_cert: null
      client_key: null
      creates: null
      decompress: true
      dest: null
      follow_redirects: safe
      force: false
      force_basic_auth: false
      group: null
      headers: {}
      http_agent: ansible-httpget
      method: GET
      mode: null
      owner: null
      remote_src: false
      removes: null
      return_content: true
      selevel: null
      serole: null
      setype: null
      seuser: null
      src: null
      status_code:
      - 200
      timeout: 30
      unix_socket: null
      unredirected_headers: []
      unsafe_writes: false
      url: http://127.0.0.1
      url_password: null
      url_username: null
      use_gssapi: false
      use_netrc: true
      use_proxy: true
      validate_certs: false
  msg: 'Status code was 502 and not [200]: HTTP Error 502: Bad Gateway'
  redirected: false
  server: nginx/1.24.0 (Ubuntu)
  status: 502
  url: http://127.0.0.1

See installer.log for more log output (with -vvvvvv). Running curl localhost also returns a 502.

I don't see any error in the output. Where can I find the corresponding nginx logfile (it's not in /var/log/)? Can you please support me in solving this problem?

Please note that the installer didn't reach this point in one run due to #233, but got cancelled and continued later (just in case this is important).

Thanks in advance for all support!

Member

javierm commented Nov 27, 2024

It looks like the application isn't running and nginx can't connect to it 🤔. Could you check the output of:

systemctl --user status puma_consul_production

Replace production with staging if you used the installer with the staging environment, and replace consul with your application name if you changed it when using the installer.

Also, could you check the application logs in ~/consul/current/log/?

Author

aardjon commented Nov 28, 2024

Thank you! Here's the output (I just re-ran the installer with the same error):

systemctl --user status puma_consul_production
$ systemctl --user status puma_consul_production
● puma_consul_production.service - Puma HTTP Server for consul (production)
     Loaded: loaded (/home/consul/.config/systemd/user/puma_consul_production.service; enabled; preset: enabled)
     Active: active (running) since Thu 2024-11-28 06:53:31 CET; 10h ago
TriggeredBy: ● puma_consul_production.socket
   Main PID: 1158 (ruby)
      Tasks: 17 (limit: 2261)
     Memory: 394.0M (peak: 398.0M)
        CPU: 2min 22ms
     CGroup: /user.slice/user-1002.slice/user@1002.service/app.slice/puma_consul_production.service
             ├─1158 "puma 5.6.9 (unix:///home/consul/consul/current/tmp/sockets/puma.sock)"
             ├─1356 "puma: cluster worker 0: 1158"
             └─1358 "puma: cluster worker 1: 1158"

Nov 28 06:53:31 consuldd systemd[1026]: Started puma_consul_production.service - Puma HTTP Server for consul (production).

(But only when I execute it as the deploy user, root gets Failed to connect to bus: No medium found.)
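For context, "Failed to connect to bus: No medium found" usually means that systemctl --user was started from an account that has no session bus for the target user; root would also be asking for its own user manager, not the one belonging to consul. A minimal sketch of how root could query the same unit, assuming the deploy user is called consul and its user manager is running. The helper name runtime_dir_for is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the runtime directory that systemd normally
# uses for a user's session bus (/run/user/<uid>).
runtime_dir_for() {
    printf '/run/user/%s\n' "$(id -u "$1")"
}

# Assumed usage as root (user and unit names taken from this thread):
#   sudo -u consul XDG_RUNTIME_DIR="$(runtime_dir_for consul)" \
#       systemctl --user status puma_consul_production
```

Setting XDG_RUNTIME_DIR by hand only works while the user's systemd instance is actually running, so this is a diagnostic shortcut rather than a replacement for logging in as the deploy user.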

Also, could you check the application logs in ~/consul/current/log/?

$ ls -l consul/current/log/
total 4
-rw-rw-r-- 1 consul consul 355 Nov 28 17:06 delayed_job.log
-rw-rw-r-- 1 consul consul   0 Nov 28 17:01 production.log

Content of delayed_job.log:

# Logfile created on 2024-11-28 17:01:33 +0100 by logger.rb/v1.6.1
I, [2024-11-28T17:06:34.448922 #51218]  INFO -- : 2024-11-28T17:06:34+0100: [Worker(delayed_job.0 host:consuldemo pid:51218)] Starting job worker
I, [2024-11-28T17:06:35.439041 #51228]  INFO -- : 2024-11-28T17:06:35+0100: [Worker(delayed_job.1 host:consuldemo pid:51228)] Starting job worker

Looks good to me 🤔

journalctl also just shows that the Puma service was started.

Author

aardjon commented Dec 8, 2024

I finally found the nginx logs, they are in /var/log/nginx 😎

access.log content:

127.0.0.1 - - [26/Nov/2024:17:17:59 +0100] "GET / HTTP/1.1" 502 166 "-" "ansible-httpget"
127.0.0.1 - - [26/Nov/2024:17:52:30 +0100] "GET / HTTP/1.1" 502 166 "-" "ansible-httpget"
127.0.0.1 - - [26/Nov/2024:17:58:00 +0100] "GET / HTTP/1.1" 502 166 "-" "Wget/1.21.4"
::1 - - [26/Nov/2024:18:16:58 +0100] "GET / HTTP/1.1" 502 166 "-" "curl/8.5.0"

These are my two tries running the installer, followed by two tries requesting localhost with wget and curl.

The corresponding error.log content:

2024/11/26 17:17:59 [crit] 37784#37784: *1 stat() "/home/consul/consul/current/public//index.html" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "127.0.0.1"
2024/11/26 17:17:59 [crit] 37784#37784: *1 stat() "/home/consul/consul/current/public/" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "127.0.0.1"
2024/11/26 17:17:59 [crit] 37784#37784: *1 connect() to unix:/home/consul/consul/shared/tmp/sockets/puma.sock failed (13: Permission denied) while connecting to upstream, client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", upstream: "http://unix:/home/consul/consul/shared/tmp/sockets/puma.sock:/", host: "127.0.0.1"
2024/11/26 17:52:30 [crit] 47010#47010: *1 stat() "/home/consul/consul/current/public//index.html" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "127.0.0.1"
2024/11/26 17:52:30 [crit] 47010#47010: *1 stat() "/home/consul/consul/current/public/" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "127.0.0.1"
2024/11/26 17:52:30 [crit] 47010#47010: *1 connect() to unix:/home/consul/consul/shared/tmp/sockets/puma.sock failed (13: Permission denied) while connecting to upstream, client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", upstream: "http://unix:/home/consul/consul/shared/tmp/sockets/puma.sock:/", host: "127.0.0.1"
2024/11/26 17:58:00 [crit] 47011#47011: *3 stat() "/home/consul/consul/current/public//index.html" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "localhost"
2024/11/26 17:58:00 [crit] 47011#47011: *3 stat() "/home/consul/consul/current/public/" failed (13: Permission denied), client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "localhost"
2024/11/26 17:58:00 [crit] 47011#47011: *3 connect() to unix:/home/consul/consul/shared/tmp/sockets/puma.sock failed (13: Permission denied) while connecting to upstream, client: 127.0.0.1, server: 192.168.178.110, request: "GET / HTTP/1.1", upstream: "http://unix:/home/consul/consul/shared/tmp/sockets/puma.sock:/", host: "localhost"
2024/11/26 18:16:58 [crit] 47011#47011: *5 stat() "/home/consul/consul/current/public//index.html" failed (13: Permission denied), client: ::1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "localhost"
2024/11/26 18:16:58 [crit] 47011#47011: *5 stat() "/home/consul/consul/current/public/" failed (13: Permission denied), client: ::1, server: 192.168.178.110, request: "GET / HTTP/1.1", host: "localhost"
2024/11/26 18:16:58 [crit] 47011#47011: *5 connect() to unix:/home/consul/consul/shared/tmp/sockets/puma.sock failed (13: Permission denied) while connecting to upstream, client: ::1, server: 192.168.178.110, request: "GET / HTTP/1.1", upstream: "http://unix:/home/consul/consul/shared/tmp/sockets/puma.sock:/", host: "localhost"
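The entries above repeat the same three paths for every request. A minimal sketch for boiling such a log down to the distinct paths that failed with "(13: Permission denied)", assuming the stat() and connect() line formats shown above. The function name denied_paths is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read nginx error log lines from stdin and print the
# distinct paths that failed with "(13: Permission denied)".
denied_paths() {
    grep -F '(13: Permission denied)' |
        sed -n -e 's/.* stat() "\([^"]*\)" failed.*/\1/p' \
               -e 's/.* connect() to unix:\([^ ]*\) failed.*/\1/p' |
        sort -u
}

# Assumed usage:
#   denied_paths < /var/log/nginx/error.log
```

Other kinds of log lines (for example open() failures) are not covered by these two patterns and would need their own expression.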

So it says "Permission denied". File system permissions of the socket file and the public directory:

consul@consuldd:~$ ls -al /home/consul/consul/shared/tmp/sockets/puma.sock
srw-rw-rw- 1 consul consul 0 Dez  3 15:51 /home/consul/consul/shared/tmp/sockets/puma.sock
consul@consuldd:~$ ls -al /home/consul/consul/shared/tmp/ | grep sockets
drwxrwxr-x 2 consul consul 4096 Dez  3 15:51 sockets
consul@consuldd:~$ ls -al /home/consul/consul/current/ | grep public
drwxrwxr-x  3 consul consul   4096 Nov 22 19:52 public
consul@consuldd:~$ ls -al /home/consul/consul/current/public
total 164
drwxrwxr-x  3 consul consul  4096 Nov 22 19:52 .
drwxrwxr-x 14 consul consul  4096 Nov 28 16:58 ..
-rw-rw-r--  1 consul consul   727 Okt 15 16:41 403.html
-rw-rw-r--  1 consul consul   660 Okt 15 16:41 404.html
-rw-rw-r--  1 consul consul   710 Okt 15 16:41 422.html
-rw-rw-r--  1 consul consul   684 Okt 15 16:41 500.html
lrwxrwxrwx  1 consul consul    40 Nov 22 19:52 assets -> /home/consul/consul/shared/public/assets
lrwxrwxrwx  1 consul consul    49 Nov 22 19:52 ckeditor_assets -> /home/consul/consul/shared/public/ckeditor_assets
-rw-rw-r--  1 consul consul 12470 Okt 15 16:41 consul_logo.png
-rw-rw-r--  1 consul consul 97115 Okt 15 16:41 errors_bg.jpg
-rw-rw-r--  1 consul consul  1150 Okt 15 16:41 favicon.ico
drwxrwxr-x  3 consul consul  4096 Nov 22 19:52 machine_learning
-rw-rw-r--  1 consul consul  1500 Okt 15 16:41 maintenance.html
-rw-rw-r--  1 consul consul  4502 Okt 15 16:41 social_media_icon.png
-rw-rw-r--  1 consul consul  4179 Okt 15 16:41 social_media_icon_twitter.png
lrwxrwxrwx  1 consul consul    40 Nov 22 19:52 system -> /home/consul/consul/shared/public/system

Please note that the /home/consul/consul/current/public//index.html file from the error log doesn't exist at all.

@javierm Can you please have a look again? Should the index.html file exist at this point, or is this an expected state? What kind of permissions are we lacking - I checked file access, but maybe the user herself needs some further privileges?

Thanks in advance,
Aardjon

Member

javierm commented Dec 9, 2024

Hi, @aardjon 😄.

Thanks for the logs (and sorry I forgot to reply two weeks ago 🙏).

Should the index.html file exist at this point, or is this an expected state?

That's fine, the file should not exist. If it existed, it would overwrite the homepage generated by the application.

I guess it's OK if nginx looks for that file and, if it doesn't find it, forwards the request to Puma (which is what we want). Having said that, I don't get any references to index.html in the nginx errors when I remove permissions from the Puma socket 🤔. Could you paste the contents of your /etc/nginx/sites-enabled/default, just in case?

The permissions in the Puma socket seem to be just fine. Could you run sudo -u www-data ls -l /home/consul/consul/shared/tmp/sockets/puma.sock and see the results? There's still the chance that nginx can't access some of the folders in that path 🤔. That might explain why it says "Permission denied" when accessing public/index.html in your case but not in my case.

By the way, the command above assumes the user www-data is running the nginx process. You can confirm that by running ps aux | grep nginx.
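To check every folder in that path at once, something like the following could help. It is only a sketch: it assumes GNU stat (as on Ubuntu), looks only at the "other" permission bits (so it ignores group membership and ACLs), and the function name dirs_blocking_others is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: walk from the given path up to / and print every
# directory whose "other" bits lack execute (search) permission.
dirs_blocking_others() {
    dir=$(dirname -- "$1")
    while :; do
        mode=$(stat -L -c '%a' -- "$dir")   # octal mode, e.g. 750
        other=${mode#"${mode%?}"}           # last octal digit = "other" bits
        case $other in
            1|3|5|7) ;;                     # execute bit set, traversal allowed
            *) printf '%s %s\n' "$mode" "$dir" ;;
        esac
        [ "$dir" = / ] && break
        dir=$(dirname -- "$dir")
    done
}

# Assumed usage:
#   dirs_blocking_others /home/consul/consul/shared/tmp/sockets/puma.sock
```

Any directory it prints cannot be traversed by a user who is neither the owner nor in the group. The namei -l command from util-linux shows the same per-component permissions without a script.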

The other thing that comes to mind is that I see requests are made to an IP address instead of a domain. I take it you aren't using a domain and an SSL certificate?

Author

aardjon commented Dec 11, 2024

Hi @javierm,

Could you run sudo -u www-data ls -l /home/consul/consul/shared/tmp/sockets/puma.sock and see the results? There's still the chance that nginx can't access some of the folders in that path 🤔

consul@consuldd:~$ sudo -u www-data ls -l /home/consul/consul/shared/tmp/sockets/puma.sock
ls: cannot access '/home/consul/consul/shared/tmp/sockets/puma.sock': Permission denied

Thanks for this hint - it led me to the root cause! www-data indeed wasn't permitted to access /home/consul (permissions were drwxr-x--- - no idea why, since other users' home directories are accessible 🤔). I changed it to drwxr-x--x (chmod a+x /home/consul) and now it can access the socket. Also, curl localhost now returns some HTML content 🥳. The installer was able to continue and finally succeeded 👍! Thank you very much for your kind support 🙇!
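For anyone who wants to see the effect of that mode change without touching a real home directory, here is a minimal sketch on a scratch directory (GNU stat assumed; the variable name demo_home is hypothetical). On a directory, the execute bit only grants the right to pass through it, not to list its contents:

```shell
#!/bin/sh
# Reproduce the mode change on a throwaway directory.
demo_home=$(mktemp -d)
chmod 750 "$demo_home"               # drwxr-x---, the mode found on /home/consul
before=$(stat -c '%a' "$demo_home")
chmod a+x "$demo_home"               # add execute (search) for user, group and other
after=$(stat -c '%a' "$demo_home")
printf '%s -> %s\n' "$before" "$after"   # prints "750 -> 751"
rmdir "$demo_home"
```

With mode 751, www-data can reach /home/consul/consul/... by name but still cannot list what is inside /home/consul.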

For me, the problem is now solved. But what do you think about adding an explicit permission check to the installer (or maybe it can even adjust them automatically) as a solution to this issue?

Thanks again,
Aardjon

Member

javierm commented Dec 12, 2024

Hi, @aardjon 😄.

The installer does adjust those permissions 🤔.

- name: Ensure correct permissions of deploy user home directory
  file:
    path: "{{ home_dir }}"
    owner: "{{ deploy_user }}"
    group: "{{ deploy_group }}"
    mode: 0755
    state: directory

Just to be sure, is it possible that the permissions were changed after running the installer? Or that the consul user was created manually? 🤔

Author

aardjon commented Dec 21, 2024

Hi @javierm,

please excuse the long time without reply! 🙄

Just to be sure, is it possible that the permissions were changed after running the installer? Or that the consul user was created manually? 🤔

I created the consul user manually, yes - I didn't expect this to be done automatically by the installer. Does it?

Also, I don't think I changed the directory permission on purpose after the installer ran, but of course I cannot say for sure 🤔. After all, I'm not the only one with sudo permissions on that machine. So it's definitely possible that the permissions have been changed later, maybe by accident. I'll keep an eye on it if I ever run the installer on another machine again 😄

Member

javierm commented Dec 23, 2024

Hi, @aardjon 😄.

I didn't expect this to be done automatically by the installer. Does it?

Yes, the installer automatically creates a user based on the deploy_user variable (by default, deploy) and adds the permissions. Maybe the documentation about it is a bit confusing 🤔. If so, pull requests are welcome 😉.

What's true is that currently the installer doesn't do certain steps (like adding the permissions) if the user was created manually 🤔. Do you know enough about Ansible to open a pull request to adjust this behavior?

Author

aardjon commented Jan 2, 2025

Hi @javierm,

I wish you a happy new year (in case you are in a region that just switched years 😅 )!

Yes, the installer automatically creates a user based on the deploy_user variable (by default, deploy) and adds the permissions. Maybe the documentation about it is a bit confusing 🤔. If so, pull requests are welcome 😉.

At least it wasn't clear to me. I will check if I can improve the documentation. But there's one thing still unclear to me: In the hosts file, I had to configure the user name and SSH parameters for the remote connection (I entered the data for the desired deploy user there). If the user doesn't exist yet, how does the installer connect to the remote host at all 🤔?

Do you know enough about Ansible to open a pull request to adjust this behavior?

Sorry, I don't know anything about Ansible, actually this was my very first contact with it at all.
