Setting default_sni to a value that won't generate a cert can cause memory exhaustion #6835

Open
p1u3o opened this issue Feb 10, 2025 · 8 comments

p1u3o commented Feb 10, 2025

If default_sni is set to a value that Caddy can't obtain a certificate for, Caddy appears to start many "getCertDuringHandshake" calls whenever a connection arrives without a valid SNI. These never seem to finish, and memory keeps growing until the process is eventually killed by the orchestrator.

This was a misconfiguration on my part: I mistakenly deployed a change that caused the HOSTNAME env variable, usually set to the load balancer hostname, to be set to the Docker container ID.

{
    default_sni {$HOSTNAME}
}

With debug logging enabled, the log is flooded with this message.

{"level":"debug","ts":1739193344.6278892,"logger":"tls.handshake","msg":"no matching certificates and no custom selection logic","identifier":"<lb ip>"}

Image

Version:
Caddy v2.9.1

Modules:
caddy.storage.consul
dns.providers.cloudflare
dns.providers.powerdns
supervisor

Member

mholt commented Feb 10, 2025

Hmm, I'm not really sure what to do about this though. We can't know you made a mistake like that, I don't think...

I'm also not really sure what that graph is. What am I looking at? 85 MB memory usage? That's pretty normal when there's traffic. Is there an actual leak?

Author

p1u3o commented Feb 10, 2025

The graph shows the memory usage of getCertDuringHandshake (please correct me if I'm interpreting it wrong). The number there matched the memory usage of the Caddy process when the pprof dump was taken.

Image

Perhaps Caddy could exit, or at least log a warning, if default_sni does not work?
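
As a rough illustration of the kind of check suggested here, the sketch below rejects values that are unlikely to be usable as default_sni before they reach the config. It is not Caddy's behavior or API: `checkDefaultSNI` and its rules are hypothetical, and the heuristics (empty value, 12-character hex string like Docker's default container hostname, no dot in the name) are assumptions that may be too strict for setups using an internal CA or single-label names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containerIDPattern matches a 12-character lowercase hex string, which is
// what Docker uses as a container's default hostname.
var containerIDPattern = regexp.MustCompile(`^[0-9a-f]{12}$`)

// checkDefaultSNI is a hypothetical helper (not part of Caddy). It returns an
// error when value is unlikely to be a name a certificate can be issued for.
func checkDefaultSNI(value string) error {
	switch {
	case value == "":
		return fmt.Errorf("default_sni is empty")
	case containerIDPattern.MatchString(value):
		return fmt.Errorf("default_sni %q looks like a Docker container ID", value)
	case !strings.Contains(value, "."):
		return fmt.Errorf("default_sni %q is not a fully qualified domain name", value)
	}
	return nil
}

func main() {
	// Example values only; a deploy script would pass the real HOSTNAME.
	for _, v := range []string{"lb.example.com", "3f2a9c1b7d4e"} {
		if err := checkDefaultSNI(v); err != nil {
			fmt.Println("warning:", err)
			continue
		}
		fmt.Printf("default_sni %q looks plausible\n", v)
	}
}
```

A check like this could run in a deploy pipeline before the container starts, so that a bad HOSTNAME fails fast instead of surfacing later as memory growth.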

p1u3o closed this as completed Feb 10, 2025

Member

mholt commented Feb 10, 2025

I see, so that one function is using 9.8 GB instantaneously (i.e. not cumulatively)?

That does seem like a problem... could you grab a profile? https://caddyserver.com/docs/profiling -- heap and goroutine dump would be useful I think.
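
For anyone following along, here is a minimal sketch of collecting those two profiles programmatically. It assumes the admin endpoint is listening on its default address, `localhost:2019`, and that the standard Go pprof paths are exposed under `/debug/pprof/`; the helper names `profileURL` and `saveProfile` and the output file names are made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds the pprof URL for a named profile on an admin address.
func profileURL(adminAddr, name string) string {
	return fmt.Sprintf("http://%s/debug/pprof/%s", adminAddr, name)
}

// saveProfile downloads one profile and writes the response body to path.
func saveProfile(client *http.Client, url, path string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s from %s", resp.Status, url)
	}
	out, err := os.Create(path)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	const adminAddr = "localhost:2019" // assumption: default admin address
	client := &http.Client{Timeout: 30 * time.Second}
	for _, name := range []string{"heap", "goroutine"} {
		path := name + ".pprof"
		if err := saveProfile(client, profileURL(adminAddr, name), path); err != nil {
			fmt.Fprintf(os.Stderr, "could not fetch %s profile: %v\n", name, err)
			continue
		}
		fmt.Println("wrote", path)
	}
}
```

The resulting files can be inspected with `go tool pprof`. Taking both dumps at the same moment, while memory is high, makes it easier to line up heap usage with the goroutines that are stuck.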

(Did you mean to close this?)

Author

p1u3o commented Feb 10, 2025

I have the matching heap and goroutine dump.

Are they safe to post?

Edit: Nope, didn't mean to close, not sure how I managed that.

p1u3o reopened this Feb 10, 2025

Member

mholt commented Feb 10, 2025

Yeah, profiles are technically safe to share.

Author

p1u3o commented Feb 10, 2025

Attached

dumps.zip

p1u3o closed this as completed Feb 10, 2025

Author

p1u3o commented Feb 10, 2025

Reopened. I have a GitHub helper extension that is breaking the issue page, my bad.

p1u3o reopened this Feb 10, 2025

Member

mholt commented Feb 13, 2025

Thanks, I'll take a look!
