[CNI] network mode bridge doesn't allow hairpin #13352

Closed
ygersie opened this issue Jun 13, 2022 · 12 comments

@ygersie
Contributor

ygersie commented Jun 13, 2022

When spinning up a job that uses CNI to set up port forwarding, the container can't reach itself on the host port. This probably isn't a very common use case, but when deploying a container that needs to discover itself and its peers through a Consul endpoint, we get back the host IP and port of every instance, including itself. The connection to the endpoint that references itself then times out. You can reproduce this with the following job:

job "ygersie" {
  datacenters = ["dc1"]
  namespace   = "default"

  group "example" {
    count = 1

    network {
      mode = "bridge"
      port "foo" {
        to = 1337
      }
    }

    task "example" {
      driver = "docker"
      config {
        image = "alpine"
        args  = ["nc", "-lk", "-p", "${NOMAD_PORT_foo}", "-e", "cat"]
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}

Then, from inside the container, a netcat to the host address and port times out:

/ # nc -v ${NOMAD_HOST_IP_foo} ${NOMAD_HOST_PORT_foo}
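
The same check can be scripted. This is only a minimal sketch: it assumes a Python interpreter is available inside the task (the stock alpine image used above does not ship one), and `can_reach` is a hypothetical helper name. It reads the same `NOMAD_HOST_IP_foo` and `NOMAD_HOST_PORT_foo` variables as the netcat command.

```python
import os
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (the symptom described here) and refused connections.
        return False


if __name__ == "__main__":
    # Port label "foo" matches the job above; adjust for other labels.
    host = os.environ.get("NOMAD_HOST_IP_foo")
    port = os.environ.get("NOMAD_HOST_PORT_foo")
    if host and port:
        ok = can_reach(host, int(port))
        print(f"{host}:{port} reachable from inside the allocation: {ok}")
    else:
        print("NOMAD_HOST_IP_foo / NOMAD_HOST_PORT_foo are not set")
```

A `False` result from inside the allocation, combined with a `True` result from another host, would be consistent with the hairpin problem described in this issue.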

I compiled a version of Nomad with hairpinMode enabled in the nomadCNIConfigTemplate, which resolves the issue.

Can this be made either configurable or enabled by default, or is there a particular reason why I wouldn't want this?
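
For readers unfamiliar with the setting: `hairpinMode` is a boolean option of the CNI bridge plugin and is off by default. The sketch below builds a simplified bridge plugin stanza with the flag exposed as a parameter. It is not Nomad's actual template; `bridge_plugin` is a hypothetical helper, and the bridge name and subnet are assumed values for illustration.

```python
import json


def bridge_plugin(hairpin: bool = False) -> dict:
    """Build a simplified CNI bridge plugin stanza; hairpin is off by default."""
    return {
        "type": "bridge",
        "bridge": "nomad",  # assumed bridge name
        "ipMasq": True,
        "isGateway": True,
        # Lets traffic leave and re-enter through the same bridge port, so a
        # container can reach itself via the host IP and mapped port.
        "hairpinMode": hairpin,
        "ipam": {
            "type": "host-local",
            "ranges": [[{"subnet": "172.26.64.0/20"}]],  # assumed subnet
        },
    }


print(json.dumps(bridge_plugin(hairpin=True), indent=2))
```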

@DerekStrickland
Contributor

Hi @ygersie

Thanks for reporting this issue, and for providing a potential solution! I'll take a look at what you've got here and then discuss where it might fit in the roadmap. Also, feel free to submit a PR to add the configuration. You might get to it more quickly than we do, and community PRs are always welcome!

@ygersie
Contributor Author

ygersie commented Jun 14, 2022

Hi @DerekStrickland

Thanks for the update. Yeah, I'd mainly like to confirm that this wouldn't cause any adverse side effects. AFAICT there should not be any implications. If you guys agree, I'm happy to push the change to make it a default setting.

@DerekStrickland
Contributor

I don't know that we could set it as the default. Some quick research seems to indicate that not all CNI implementations support it. I think the PR would need to default to false but allow user configuration to enable hairpin mode.

@ygersie
Contributor Author

ygersie commented Jun 15, 2022

@DerekStrickland it may not be supported by all CNI implementations, but this is specifically the one used to set up:

network {
  mode = "bridge"
}

which is used to set up port forwarding using the CNI plugins and is also required when using Consul Connect. AFAIK it won't affect any other (user-supplied) configurations. But if there might be other issues, then it definitely needs to be configurable. Please let me know what you think.

@jrasell
Member

jrasell commented Jun 15, 2022

Hi @ygersie, hope you're doing well!

I think another thing to keep in mind here is backwards compatibility and behaviour consistency when updating the built-in CNI configuration. I wonder if we could add a new client configuration parameter, similar to client.bridge_network_name and client.bridge_network_subnet, named client.bridge_hairpin_mode, which defaults to false but allows easy setting if desired?

@DerekStrickland DerekStrickland removed their assignment Jun 17, 2022
@A-Helberg

This seems to be required to run most clustering applications.
It seems to be a common pattern that these applications connect to all nodes in their cluster including themselves.

This includes apps like Grafana Loki and Cassandra.
As @jrasell suggests, a config parameter would be quite useful.

The two workarounds are:

  1. Make the changes as @ygersie suggests and compile it yourself.
  2. Create a second CNI bridge. The downside is that you lose Consul Connect, as it only works with mode = "bridge" in the network block.
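
The second workaround can be sketched as follows. This generates a CNI conflist for a separate, user-defined bridge network with hairpin mode enabled. Everything named here is an assumption for illustration: the network name, bridge name, and subnet are hypothetical, the plugin chain is modeled loosely on a typical loopback/bridge/firewall/portmap setup, and `custom_bridge_conflist` is not part of Nomad.

```python
import json


def custom_bridge_conflist(name: str, bridge: str, subnet: str) -> dict:
    """Build a conflist for a user-defined bridge network with hairpin enabled."""
    return {
        "cniVersion": "0.4.0",
        "name": name,
        "plugins": [
            {"type": "loopback"},
            {
                "type": "bridge",
                "bridge": bridge,
                "ipMasq": True,
                "isGateway": True,
                "hairpinMode": True,  # the setting this issue is about
                "ipam": {
                    "type": "host-local",
                    "ranges": [[{"subnet": subnet}]],
                    "routes": [{"dst": "0.0.0.0/0"}],
                },
            },
            {"type": "firewall", "backend": "iptables"},
            {
                "type": "portmap",
                "capabilities": {"portMappings": True},
                "snat": True,
            },
        ],
    }


# Hypothetical values; pick a subnet that does not overlap existing networks.
conf = custom_bridge_conflist("hairpin-bridge", "hairpin0", "172.27.64.0/20")
print(json.dumps(conf, indent=2))
```

The output would be saved as a `.conflist` file in the client's CNI configuration directory, and a job would then select it with a network mode of the form `cni/hairpin-bridge` instead of `bridge`.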

@A-Helberg

Tried my hand at implementing this in #13834

@ygersie
Contributor Author

ygersie commented Jul 19, 2022

Thanks @A-Helberg, this completely dropped off my radar again.

@johnalotoski

Also ran into this issue and saw that hairpinned connections send a SYN and never receive an ACK. Tracing the packets through the iptables rules didn't seem to reveal anything out of the ordinary. Glad there is a fix/option coming for this, thanks!

@tgross
Member

tgross commented Jul 25, 2022

I've left a comment here (#13834 (review)) about whether we should implement this via exposing the CNI config directly (as in #13824), rather than adding another config knob.

@lgfa29
Contributor

lgfa29 commented Feb 3, 2023

Closing this one as completed by #15961.

While there have been discussions about a more flexible configuration approach, after further discussion we feel that adding more customization to the default bridge may result in unexpected outcomes that are hard for us to debug. The bridge network mode should be predictable and easily reproducible by the team, so that we can rely on a common, standard configuration.

Users that require more advanced customization are able to create their own bridge network using CNI. The main downside of this is that Consul Service Mesh requires network_mode = "bridge", but this is a separate feature request that is being tracked in #8953.

Feel free to 👍 and add more comments there.

Thank you everyone for the feedback!

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 14, 2025