Caddy hangs when adding new config via API (after hundreds of successful uses over years) #6844

Open
9 of 10 tasks
coolaj86 opened this issue Feb 14, 2025 · 2 comments
Labels
help wanted 🆘 Extra attention is needed

Comments

@coolaj86
Contributor

coolaj86 commented Feb 14, 2025

Tested with v2.7.5, v2.8.4, and v2.9.1.

I'm using Caddy as a reverse proxy for ssh and https to hundreds of LXCs and VMs.
My ~/.config/caddy/autosave.json is approaching 200kb.

Most of the configs are very similar in nature - just the domain name and instance IP are changed in various places. In fact, the last 30+ configs have been exactly the same.

Over the past few years an API call would occasionally hang and I'd have to make the request again, but it's been mostly set it and forget it.

Starting tonight - with no changes to the caddy server or API config templates, which have been working for months - the failure rate (hanging) suddenly became so high that I can no longer use the API for a complete set of changes.

I've included my API calls below. Caddy will hang on any of them, but it won't make it through more than 3 before hanging and never completing. I have not been able to get it to complete a full set of changes, even after restarts and upgrades.

Loading the full config from scratch via API has always been hit or miss - so I typically just add each piece individually as shown in the curl examples below.

If I load a barebones template of ~/.config/caddy/autosave.json and then load the most recent backup from the API, that works. That may just have been the luck of the draw, or it may indicate that the number of configs and size of the file is related to the hanging.

Troubleshooting

  • checked the caddy log for any indication of failure
  • the disk is not anywhere near full (36% used)
  • plenty of RAM (8gb, but only ~1gb is used currently)
  • plenty of CPU (5% - 10% on 4 vCPUs)
  • permissions have not changed (-rw-r--r--)
  • ran the API calls from localhost (in case a network issue is the culprit)
  • nowhere near the Let's Encrypt rate limits
  • full stop and start of caddy doesn't fix the issue
  • updated to the latest go, xcaddy, and caddy
  • I'm too afraid to completely wipe ~/.local/share/caddy
    (a restart after that would put us over the Let's Encrypt limits)

Example Config

This omits a fair amount of boilerplate from the initial setup, which has not changed and has been working for a long, long time. It shows only the config that is added for each new LXC or VM.

{
  "apps": {
    "http": {
      "servers": {
        "srv443": {
          "listener_wrappers": [
            {
              "routes": [
                {
                  "handle": [
                    {
                      "@id": "project-foo-1_lxcs_example_com_tls_routing",
                      "handler": "subroute",
                      "routes": [
                        {
                          "handle": [
                            {
                              "connection_policies": [
                                {
                                  "alpn": [
                                    "http/1.1"
                                  ]
                                }
                              ],
                              "handler": "tls"
                            },
                            {
                              "handler": "subroute",
                              "routes": [
                                {
                                  "handle": [
                                    {
                                      "handler": "proxy",
                                      "upstreams": [
                                        {
                                          "@id": "project-foo-1_lxcs_example_com_tls_proxy_ssh",
                                          "dial": [
                                            "10.11.5.132:22"
                                          ]
                                        }
                                      ]
                                    }
                                  ],
                                  "match": [
                                    {
                                      "ssh": {}
                                    }
                                  ]
                                },
                                {
                                  "match": [
                                    {
                                      "http": [
                                        {
                                          "host": [
                                            "project-foo-1.lxcs.example.com"
                                          ]
                                        }
                                      ]
                                    }
                                  ]
                                }
                              ]
                            }
                          ],
                          "match": [
                            {
                              "tls": {
                                "sni": [
                                  "project-foo-1.lxcs.example.com"
                                ]
                              }
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ],
          "routes": [
            {
              "@id": "project-foo-1_lxcs_example_com_http_routing",
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "headers": {
                            "request": {
                              "set": {
                                "Host": [
                                  "project-foo-1.lxcs.example.com"
                                ]
                              }
                            }
                          },
                          "upstreams": [
                            {
                              "@id": "project-foo-1_lxcs_example_com_http_proxy_ip",
                              "dial": "10.11.5.132:3080"
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "project-foo-1.lxcs.example.com"
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    },
    "tls": {
      "automation": {
        "policies": [
          {
            "@id": "project-foo-1_lxcs_example_com_tls_policy",
            "subjects": [
               "project-foo-1.lxcs.example.com"
            ]
          }
        ]
      },
      "certificates": {
        "automate": [
           "project-foo-1.lxcs.example.com"
        ]
      }
    }
  }
}

Example API Calls

fn_add_tls_policy() { (
    my_lxc_id="${1:-}"
    my_lxc_domain="${2:-}"

    curl --fail-with-body -sS --proto '=https' --tlsv1.2 \
        -u "${CADDY_USER}:${CADDY_PASS}" \
        -X POST \
        "${CADDY_HOST}/config/apps/tls/automation/policies/..." \
        -H "Content-Type: application/json" \
        --data-binary '
          [{
            "@id": "'"${my_lxc_id}"'_tls_policy",
            "subjects": [ "'"${my_lxc_domain}"'" ]
          }]
        '
); }

fn_add_tls_automation() { (
    #my_lxc_id="${1:-}"
    my_lxc_domain="${2:-}"

    curl --fail-with-body -sS --proto '=https' --tlsv1.2 \
        -u "${CADDY_USER}:${CADDY_PASS}" \
        -X POST \
        "${CADDY_HOST}/config/apps/tls/certificates/automate/..." \
        -H "Content-Type: application/json" \
        --data-binary '["'"${my_lxc_domain}"'"]'
); }

fn_add_tls_routing() { (
    my_lxc_id="${1:-}"
    my_lxc_domain="${2:-}"
    my_lxc_ip="${3:-}"

    curl --fail-with-body -sS --proto '=https' --tlsv1.2 \
        -u "${CADDY_USER}:${CADDY_PASS}" \
        -X POST \
        "${CADDY_HOST}/config/apps/http/servers/${CADDY_SRV}/listener_wrappers/0/routes/0/handle/..." \
        -H "Content-Type: application/json" \
        --data-binary '
        [{
          "@id": "'"${my_lxc_id}"'_tls_routing",
          "handler": "subroute",
          "routes": [
            {
              "match": [ { "tls": { "sni": [ "'"${my_lxc_domain}"'" ] } } ],
              "handle": [
                {
                  "handler": "tls",
                  "connection_policies": [ { "alpn": [ "http/1.1" ] } ]
                },
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "match": [ { "ssh": {} } ],
                      "handle": [
                        { "handler": "proxy", "upstreams": [ {

                            "@id": "'"${my_lxc_id}"'_tls_proxy_ssh",
                            "dial": [ "'"${my_lxc_ip}:22"'" ] } ] }
                      ]
                    },
                    {
                      "match": [ { "http": [ { "host": [ "'"${my_lxc_domain}"'" ] } ] } ]
                    }
                  ]
                }
              ]
            }
          ]
        }]'
); }

fn_add_http_handler() { (
    my_lxc_id="${1:-}"
    my_lxc_domain="${2:-}"
    my_lxc_ip="${3:-}"
    my_tcp_port="${4:-}"

    curl --fail-with-body -sS --proto '=https' --tlsv1.2 \
        -u "${CADDY_USER}:${CADDY_PASS}" \
        -X POST \
        "${CADDY_HOST}/config/apps/http/servers/${CADDY_SRV}/routes/..." \
        -H "Content-Type: application/json" \
        --data-binary '
        [{
          "@id": "'"${my_lxc_id}"'_http_routing",
          "match": [ { "host": [ "'"${my_lxc_domain}"'" ] } ],
          "terminal": true,
          "handle": [
            {
              "handler": "subroute",
              "routes": [
                {
                  "handle": [
                    {
                      "handler": "reverse_proxy",
                      "headers": {
                        "request": {
                          "set": { "Host": [ "'"${my_lxc_domain}"'" ] }
                        }
                      },
                      "upstreams": [ {
                        "@id": "'"${my_lxc_id}"'_http_proxy_ip",
                        "dial": "'"${my_lxc_ip}:${my_tcp_port}"'" } ]
                    }
                  ]
                }
              ]
            }
          ]
        }]
        '
); }
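
For readers following along, here is a minimal sketch of how the four functions above might be chained for a single instance. It is not part of the original scripts: `fn_domain_to_id` and `fn_add_lxc` are hypothetical names, the call order is a guess, and the ID scheme (dots replaced by underscores) is only inferred from the example config.

```shell
# Assumption: the ID is the domain with dots replaced by underscores,
# as the example config suggests (project-foo-1_lxcs_example_com).
fn_domain_to_id() { (
    my_lxc_domain="${1:-}"
    printf '%s\n' "${my_lxc_domain}" | tr '.' '_'
); }

# Hypothetical driver: apply all four changes for one instance and stop
# at the first failure so a partial set of changes is easy to spot.
# The order of the calls is an assumption.
fn_add_lxc() { (
    my_lxc_domain="${1:-}"
    my_lxc_ip="${2:-}"
    my_tcp_port="${3:-3080}"
    my_lxc_id="$(fn_domain_to_id "${my_lxc_domain}")"

    fn_add_tls_policy "${my_lxc_id}" "${my_lxc_domain}" || exit 1
    fn_add_tls_automation "${my_lxc_id}" "${my_lxc_domain}" || exit 1
    fn_add_tls_routing "${my_lxc_id}" "${my_lxc_domain}" "${my_lxc_ip}" || exit 1
    fn_add_http_handler "${my_lxc_id}" "${my_lxc_domain}" "${my_lxc_ip}" "${my_tcp_port}" || exit 1
); }

# Usage sketch:
#   fn_add_lxc 'project-foo-1.lxcs.example.com' '10.11.5.132' '3080'
```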

XCaddy

This is the script that I use to build caddy. I just did a fresh build with the latest xcaddy, go, and caddy today.

#!/bin/sh
set -e
set -u

export CGO_ENABLED=0
export GOOS=linux
export GOARCH=amd64
#export XCADDY_SETCAP=1
export XCADDY_SUDO=0
#export XCADDY_SKIP_CLEANUP=1

#my_branch="v2.7.5"
#my_branch="v2.8.4"
my_branch="v2.9.1"
my_out="caddy-${my_branch}-bnna"

(
    cd /tmp || exit 1
    xcaddy build "${my_branch}" \
        --output ./"${my_out}" \
        --with github.com/caddy-dns/namedotcom \
        --with github.com/mholt/caddy-l4/layer4 \
        --with github.com/mholt/caddy-l4/modules/l4tls \
        --with github.com/mholt/caddy-l4/modules/l4subroute \
        --with github.com/mholt/caddy-l4/modules/l4http \
        --with github.com/mholt/caddy-l4/modules/l4ssh \
        --with github.com/mholt/caddy-l4/modules/l4proxy

    echo "Built $(pwd)/${my_out}"
)

Example full config reload

This hangs forever.

#!/bin/sh
set -e
set -u

my_file="${1:-}"
if test -z "${my_file}"; then
        echo >&2 ""
        echo >&2 "USAGE"
        echo >&2 "    caddy-load ./caddy.json"
        echo >&2 ""
        exit 1
fi

echo "start loading"
curl -X POST "http://localhost:2019/load" \
        -H "Content-Type: application/json" \
        -d @"${my_file}"
echo "done loading"
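
Since a hung request otherwise blocks the script indefinitely, one possible workaround is to bound each request with curl's `--max-time` (curl then exits with code 28 on timeout) and retry a few times. The sketch below is an illustration only, not a fix for the underlying hang: `fn_retry` and `RETRY_DELAY` are hypothetical names, and the 30-second limit is an arbitrary choice.

```shell
# Hypothetical helper: run a command, retrying until it succeeds or the
# given number of attempts is used up. Requires at least two arguments:
# the attempt count and the command to run.
fn_retry() { (
    my_tries="${1:-3}"
    shift
    my_n=1
    while ! "$@"; do
        if test "${my_n}" -ge "${my_tries}"; then
            echo >&2 "giving up after ${my_n} attempts: $*"
            exit 1
        fi
        my_n=$((my_n + 1))
        # RETRY_DELAY is an assumed knob, in seconds (default 2).
        sleep "${RETRY_DELAY:-2}"
    done
); }

# Usage sketch (the 30s limit is an arbitrary assumption):
#   fn_retry 3 curl --fail-with-body -sS --max-time 30 \
#       -X POST "http://localhost:2019/load" \
#       -H "Content-Type: application/json" \
#       -d @"./caddy.json"
```

Retrying `/load` with the same file should be harmless, but retrying one of the append-style POSTs (paths ending in `/...`) could add a duplicate entry if the first attempt was applied before the response stalled, so it is worth checking the config before re-sending those.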
@coolaj86
Contributor Author

@mholt Are other customers using caddy's API with hundreds of domains or more?
(I would assume so, though perhaps not with layer4 routing, but probably with similar use of adding TLS automation)

Is there a particular set of circumstances that might be causing a lock?

Is there a preferred way to dynamically add sites, such as editing the config through other means and sending a SIGHUP or SIGUSR2?

@mholt
Member

mholt commented Feb 16, 2025

Sorry AJ, been a busy weekend and I'm still catching up on the latest issues.

I have to look into this some more, but, real quick: yes, there are some servers using the API with tens of thousands of domains.

The only way to change the config is via the API (the CLI also uses the API).

If something is hanging, it's quite possible there's faulty locking logic, but it could be a lot of things really. If it's faulty locking, then a full goroutine stack dump (see https://caddyserver.com/docs/profiling) would reveal this by showing a relevant stack trace in the semacquire state (I think). A relevant stack trace would likely be one in the admin-API-related code.
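
As a rough sketch of the stack-dump suggestion above: the admin endpoint serves Go's pprof handlers, so a full goroutine dump can be fetched with curl and then filtered for goroutines that look blocked on a lock. The helper names are hypothetical, the admin address `localhost:2019` is assumed from the reload script earlier in the thread, and the match patterns are a guess (newer Go versions may report `sync.Mutex.Lock` rather than `semacquire` as the wait state).

```shell
# Assumption: the admin endpoint is reachable at localhost:2019.
CADDY_ADMIN="${CADDY_ADMIN:-http://localhost:2019}"

# Hypothetical helper: save a full goroutine stack dump to a file.
fn_dump_goroutines() { (
    my_out="${1:-goroutines.txt}"
    curl --fail -sS --max-time 30 \
        "${CADDY_ADMIN}/debug/pprof/goroutine?debug=2" \
        -o "${my_out}"
); }

# Hypothetical helper: print only the goroutine blocks (separated by
# blank lines in the dump) that mention a lock wait and also match the
# given pattern. The default pattern "admin" is a guess at what
# admin-API-related frames contain.
fn_blocked_goroutines() { (
    my_dump="${1:-goroutines.txt}"
    my_pattern="${2:-admin}"
    awk -v pat="${my_pattern}" '
        BEGIN { RS = ""; ORS = "\n\n" }
        /semacquire|sync\.Mutex|sync\.RWMutex/ && $0 ~ pat { print }
    ' "${my_dump}"
); }

# Usage sketch, run while a request is hanging:
#   fn_dump_goroutines ./goroutines.txt
#   fn_blocked_goroutines ./goroutines.txt admin
```

If the filter prints nothing, the raw dump is still worth attaching to the issue, since the hang may not be lock-related at all.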

In terms of a quick resolution, I'd start looking into things there. I will try to get around to it soon but ECH is keeping me busy at the moment...

@mholt mholt added the help wanted 🆘 Extra attention is needed label Feb 16, 2025