Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support acme ssl certificates #636

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

heuels
Copy link
Contributor

@heuels heuels commented Jan 3, 2025

A stab at #88 🙌🏻

I've made changes to the routeros_system_certificate provider to support requesting a new ACME certificate.
I'm not very fluent at Go, so any and all comments are very appreciated!

Prerequisites

  • The device must be accessible on port 80 of the domain name referenced in common_name.
  • The device must have www service enabled.

Limitations

  • The certificate returned by the resource is not guaranteed to be the one that was created. This is simply because RouterOS does not tell us anything about the generated certificate, so we can only match one using common_name.
  • It theoretically might take the ACME server more than 60 seconds to validate ownership of the domain and issue a certificate. I'm guessing that in this case the TF run would fail but certificate request would not, and one would need to import the certificate manually into the state.
  • name and common_name must both be present in the resource and must match, because when using ACME, name is generated and always equals to common_name.
  • After manually importing a resource one would need to run apply once to add acme_ssl_certificate {} block to it.

Examples

Example with Let's Encypt:

resource "routeros_system_certificate" "letsencrypt" {
  acme_ssl_certificate {}
  common_name = "example.com"
  name        = "example.com"
}

Example with Let's Encypt Staging:

resource "routeros_system_certificate" "letsencrypt_staging" {
  acme_ssl_certificate {
    directory_url = "https://acme-staging-v02.api.letsencrypt.org/directory"
  }

  common_name = "example.com"
  name        = "example.com"
}

@heuels heuels requested a review from a team as a code owner January 3, 2025 16:10
@heuels heuels force-pushed the resource_system_certificate_acme branch from 7616856 to a456bd0 Compare January 3, 2025 16:35
@vaerh
Copy link
Collaborator

vaerh commented Jan 3, 2025

That's great! I immediately see a few things I want to clarify and try to fix. So I have a suggestion for you: will you have the energy to test and finalize the interaction with ACME next week?
I guess I'll have to do an interim release and refactoring for PR #633 for now. After that, I would take the certificates.

@heuels
Copy link
Contributor Author

heuels commented Jan 3, 2025

@vaerh sure thing, happy to help!

@vaerh
Copy link
Collaborator

vaerh commented Jan 14, 2025

@heuels
I am so sorry! I've been sick and I'm still recovering. As soon as I get my backlog sorted out, I'll continue to do the provider.
I apologize again.

@vaerh
Copy link
Collaborator

vaerh commented Jan 17, 2025

Hi!
I've had another look at the added code and requirements for successfully obtaining certifications from LE and I have an urge to do some thinking out loud.
Maybe it makes sense to move the code for LE to a separate module in order to be able to immediately check the availability of port 80 and implement timeouts?
Maybe we should do something similar to waiting for the container to start. But MT's handling of certificates leaves a lot to be desired, as I've also encountered certificate identification issues.

@heuels
Copy link
Contributor Author

heuels commented Jan 20, 2025

@vaerh to be honest that feels like giving too much responsibility to this module. I think that the user should be the one responsible for opening the port (by inserting a firewall rule), making sure that the www service is running and available, and then closing the port.

Consider the use-case when the company is running their own on-premises ACME server. The module could not possibly know which interfaces should www be available on, and making it available on every interface (incl. public-facing ones) may present a security risk.

@vaerh
Copy link
Collaborator

vaerh commented Jan 22, 2025

@heuels
I didn't seem to be able to express my point properly and I'll try again.
I propose to make a separate resource to check if the www service is available by a simple request to the MT. If it is disabled there is no point in trying to issue a certificate, we need to notify the user with an error. And he will make his own decision how to enable and configure www.
I also suggest to put the logic in a separate resource, as we need to take into account timeouts that may occur when getting a certificate.

@vaerh
Copy link
Collaborator

vaerh commented Jan 29, 2025

Hi @heuels , I need your help.
I have made a draft of the resource, but I am left with some unanswered questions (due to network configuration I was not able to issue a certificate from LE):

  • Can you post the output of curl -ku user https://router/rest/certificate | jq for a certificate from ACME?
  • Which commands should be used to remove the certificate remove / issued-revoke or a combination of both?
  • How long can it take to create a certificate (maximum)?

@heuels
Copy link
Contributor Author

heuels commented Jan 30, 2025

@vaerh I will check these today and report back 🫡

@heuels
Copy link
Contributor Author

heuels commented Jan 30, 2025

@vaerh

  1. A request to the https endpoint without a valid certificate would fail:
$ curl -ku user https://router/rest/certificate | jq
Enter host password for user 'user':
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (35) LibreSSL/3.3.6: error:1404B410:SSL routines:ST_CONNECT:sslv3 alert handshake failure
  1. Here is the response from the http endpoint:
$ curl -ku mgmt http://10.21.99.1/rest/certificate | jq
Enter host password for user 'mgmt':
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5939  100  5939    0     0   100k      0 --:--:-- --:--:-- --:--:--   99k
[
  {
    ".id": "*9",
    "akid": "bbbcc347a5e4bca9c6c3a4720c108da235e1234",
    "common-name": "router.lan",
    "crl": "false",
    "days-valid": "89",
    "digest-algorithm": "sha256",
    "expires-after": "12w5d23h57s",
    "fingerprint": "5429607b9ec43ffe030db58552d5b852f53117c5d85e5dd9f592c5aae8671234",
    "invalid-after": "2025-04-30 16:13:22",
    "invalid-before": "2025-01-30 16:13:23",
    "issuer": "C=US,O=Let's Encrypt,CN=R10",
    "key-size": "2048",
    "key-type": "rsa",
    "key-usage": "digital-signature,key-encipherment,tls-server,tls-client",
    "name": "router.lan",
    "private-key": "true",
    "serial-number": "04a2b4e83a28b703ba33d57dec1bd8071234",
    "skid": "ba0edfc4db7b98a6ed5be9d97b040c9699c21234",
    "subject-alt-name": "DNS:router.lan",
    "trusted": "true"
  }
]
  1. I think remove is the proper command. Since a LE certificate was not issued by the router, issued-revoke fails:
[user@MikroTik] /certificate> issued-revoke router.lan
failure: Not an issued certificate!
  1. I haven't found anything in the official docs for LE, but I was able to find issues with other ACME providers where the process took more than 10 minutes. Maybe a user-supplied timeout would be best? And a 60-second default seems like a reasonable fallback, the process never took more than ~30 seconds for me.

@vaerh
Copy link
Collaborator

vaerh commented Jan 30, 2025

Thanks for the info!

When implementing timeouts for ipsec I encountered the unpleasant thing that the REST session timeout is only 60 seconds. Always.
There is an implementation of timeout on container startup I need to look at, but it's not my code. Need to investigate.

Can you check if the new resource will work?

@vaerh
Copy link
Collaborator

vaerh commented Feb 3, 2025

@heuels
When you have free time, please test the PR changes.

@heuels
Copy link
Contributor Author

heuels commented Feb 6, 2025

@vaerh apologies, bit of a hectic week. I will have time to test this on Monday 🙏🏻

Comment on lines +50 to +57
"challenge_type": {
Type: schema.TypeString,
Optional: true,
ForceNew: true,
Description: "ACME challenge.",
ValidateFunc: validation.StringInSlice([]string{"cloud-dns", "http-01"}, false),
DiffSuppressFunc: AlwaysPresentNotUserProvided,
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think RouterOS supports DNS challenge?

Here is a full list of supported parameters: https://help.mikrotik.com/docs/spaces/ROS/pages/2555969/Certificates#Certificates-Serverproperties

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slightly modified type parameter, I can leave the original name.

изображение

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, interesting!

@heuels
Copy link
Contributor Author

heuels commented Feb 10, 2025

@vaerh I tried a couple of times, and the module execution consistently failed for me, however the certificate every time was successfully created. Here is an example execution log:

$ tofu apply
╷
│ Warning: Provider development overrides are in effect
│ 
│ The following provider development overrides are set in the CLI configuration:
│  - terraform-routeros/routeros in ~/Projects/terraform-provider-routeros
│ 
│ The behavior may therefore not match any released version of the provider and applying changes may cause the
│ state to become incompatible with published releases.
╵

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated
with the following symbols:
  + create

OpenTofu will perform the following actions:

  # module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt will be created
  + resource "routeros_system_certificate_acme" "letsencrypt" {
      + akid             = (known after apply)
      + common_name      = (known after apply)
      + country          = (known after apply)
      + crl              = (known after apply)
      + days_valid       = (known after apply)
      + digest_algorithm = (known after apply)
      + directory_url    = "https://acme-staging-v02.api.letsencrypt.org/directory"
      + dns_name         = "gw1.example.com"
      + expired          = (known after apply)
      + expires_after    = (known after apply)
      + fingerprint      = (known after apply)
      + id               = (known after apply)
      + invalid_after    = (known after apply)
      + invalid_before   = (known after apply)
      + issued           = (known after apply)
      + issuer           = (known after apply)
      + key_size         = (known after apply)
      + key_type         = (known after apply)
      + key_usage        = (known after apply)
      + locality         = (known after apply)
      + name             = (known after apply)
      + organization     = (known after apply)
      + private_key      = (known after apply)
      + serial_number    = (known after apply)
      + skid             = (known after apply)
      + state            = (known after apply)
      + subject_alt_name = (known after apply)
      + trusted          = (known after apply)
      + unit             = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  OpenTofu will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt: Creating...
module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt: Still creating... [10s elapsed]
module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt: Still creating... [20s elapsed]
module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt: Still creating... [30s elapsed]
module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt: Still creating... [40s elapsed]
╷
│ Error: resource no longer exists
│ 
│   with module.example_letsencrypt.routeros_system_certificate_acme.letsencrypt,
│   on ../_modules/example_letsencrypt/main.tf line 12, in resource "routeros_system_certificate_acme" "letsencrypt":
│   12: resource "routeros_system_certificate_acme" "letsencrypt" {
│ 
╵

@vaerh
Copy link
Collaborator

vaerh commented Feb 10, 2025

Yeah, that's a problem. I can't verify the work with LE.

Can you run the execution with environment variables ROS_LOG_COLOR=1 TF_LOG_PROVIDER=debug tofu apply?

@heuels
Copy link
Contributor Author

heuels commented Feb 10, 2025

@vaerh here you go:

https://gist.github.com/heuels/003771c8524e29ae87c90d63fc1c1832

@vaerh
Copy link
Collaborator

vaerh commented Feb 10, 2025

Interesting sausage chain coming in :) I'm taking some time to figure it out.

@vaerh
Copy link
Collaborator

vaerh commented Feb 14, 2025

I experimented on my local ACME and got the same result. In principle, we can try to process it, although we won't be able to use the code that is used to wait for containers to start and stop.
But as a result of experiments I got an extremely unpleasant thing: I got two identical certificates. And the problem is that I can't automatically select one of them. There is no unique parameter by which you can identify the resource.
I'm still thinking about what to do about it.

изображение

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants