-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: DNS-01 ACME validation failures do not mark challenge as invalid #1365
Comments
I've done a quick prototype of a fix for this which I can share if desired. It's not very forgiving, though - it fails if dns01validate() fails once, and given that validation checks are performed synchronously when clients answer challenges, it's not entirely clear how to make this retriable. Ideally validation would happen asynchronously, but that'd be a much larger architectural change :/ Possibly the fix is to also perform validation (if the authz is not already positively valid/invalid) when a client polls the status of an authorization and add a retry counter (or a "give up after" deadline) to either Challenge or Authorization. It's a little weird to have a state-changing thing happen as the result of a POST-as-GET, but given that the spec requires that clients poll for the state transition, we might be able to get away with it. |
Thanks Anna! We'll take a look at this in our open source triage meeting tomorrow morning and get back to you. |
Thanks so much for agreeing to take a look! I should add that this is critical to our use case: if DNS validation fails, we really do need the challenge and authorization to become invalid so that clients can notice and act accordingly. |
Hi @mholt, We were just discussing this in our open source meeting, and your name came up. Our philosophy with the ACME server is to align to the spec as well as we can, and when the spec is inconclusive we try to align with what the most popular ACME clients actually do in the field. Right now, we handle ACME orders using a simple strategy where DNS challenges never become invalid. How does Thanks, |
ACMEz polls until the authorization is finalized:
Why's that? What becomes of the DNS challenge if it's not validated? Maybe I'm missing something obvious from a server perspective (which I have almost no experience with), but IMO a server should never leave things hanging. It sounds like in this situation the client has initiated the DNS-01 challenge, indicating that the proper record should be in place, so when the server looks it up it should either be validated or it is not. RFC 8555:
IMO this is pretty clear, it either fails or succeeds. I don't see why leaving it hanging would be useful. I think ACMEz would keep polling if no status change occurs. |
Entirely agreed. The spec is pretty clear that validation failure is supposed to make a challenge transition from processing -> invalid when validation of that challenge fails (RFC 8555 7.1.6 p31). Which does bring up another curious issue, which is that step-ca doesn't use the 'processing' status for challenges at all... I have an extremely hacky proof of concept implementation of this working now that stores a validation deadline and a flag indicating whether the challenge has been answered as part of the challenge in the data store. The flag should be replaced by making step-ca use 'processing' and 'pending' per spec, but that's going to be a bigger effort |
FWIW this is what our application (which is using the Certbot Python |
I updated my proof of concept to make challenges start in state The approach I'm taking is to
Currently I'm only doing the additional check at GetAuthorization time for DNS-01 challenges, but this should work for the trinity of upstream challenge types (DNS-01, HTTP-01, ALPN-01). DEVICEATTEST-01 will be more work, since it has an actual payload for the I have confirmed that this strategy both covers the case of "we need validation failures to be positive and invalidating" as well as "sometimes our local DNS setup is flaky and the first query fails", both of which are needed for our use case |
As a style note, it's more than a little confusing that GetAuthorization and GetChallenge can change the state of authorizations and challenges! It's convenient in this case, but was definitely not behavior I expected from functions named GetSomething :/ I'm relatively new to Go, though, so this might be a norm I'm just not familiar with. |
Thank you both. |
I'm quite aware that my implementation is a hack, incidentally; obviously actual asynchronous validation would be ideal, but it's not currently implemented for anything else in step-ca and it would be a heavy lift to add it to this. I don't have unit tests written specifically for it yet, but I do have the rest of the test suite passing. I can try to get a PR up for review sometime next week if people would like to see what I have so far and think this approach is worth pursuing at least in the short term |
Hey @aglasgall 👋 . First off, thanks for opening this issue and I apologize for the long silence in getting back to you. I briefly looked into how the ACME spec and Boulder handle this case. Boulder, as far as I can tell, does indeed mark the challenge as invalid on the first validation failure. The ACME spec, however, seems to be a bit more nuanced (at least in my interpretation). I don't believe there are any We're open to changing this - so that we invalidate challenges as soon as they fail a validation attempt - however, 1) the ACME spec in this matter is only a recommendation, and 2) I'm slightly worried about breaking implementations that have come to rely on infinite retry. Would you be open to sharing a PR for the changes you detailed above? To your point about validating challenges out of band - I left a short comment about this upstream in this issue thread. Basically we had an implementation a handful of years ago and it was our opinion at the time that it would overcomplicate CA logic and startup configuration for very little gain. To your point about the function names being confusing. Yes. It's definitely bad style. The names should be updated to reflect the fact that they can modify the state. |
One thing that some tools like Perhaps if validation fails you could optimistically retry using the authoritative nameserver? |
Steps to Reproduce
Your Environment
step-ca
Version - 0.24.1Expected Behavior
The challenge should eventually transition state to INVALID
Actual Behavior
The challenge stays in PENDING state indefinitely
Additional Context
Looking at
acme/challenge.go
, I see that both of the failure paths in dns01Validate() passmarkInvalid=false
tostoreError()
:and
Either these should fail on the first go or there should be some capped (configurable?) number of retries or retry period after which validation should fail and the challenge marked as invalid.
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
I'm willing to work to fix this issue, ideally in a way that doesn't involve indefinite bikeshedding about what the retry policy should be :)
The text was updated successfully, but these errors were encountered: