Mechanism for editing the nomad0 CNI template #13824
Comments
Hi @the-maldridge! This seems like a reasonable idea, and I've marked it for roadmapping. The CNI configuration we use can be found in networking_bridge_linux.go. One approach we could take here is to allow the administrator to override that template with a config file somewhere on the host. The configuration seems fairly straightforward, but then it's a matter of hunting down anywhere in the client code that has specific assumptions about that template and figuring out how to detect the right behavior from there.
@tgross I like the template idea; it would provide the most flexibility while removing a dependency on a hard-coded string literal, something I always like to do. What do you think about using go:embed to include the default template rather than the string literal, as a means of simplifying the code that loads the various options? I can't remember off the top of my head what version of Go that was introduced in, to know whether Nomad already targets that version.
Yup,
@tgross I think some sort of 'escape hatch' similar to those being used for Envoy may be an option here. If we could pass some additional 'json' to some parts of Nomad's bridge conflist file, like adding additional plugins to the list, etc., that would make it easier to extend Nomad's bridge CNI setup. In my case that would allow using Cilium along with Nomad's own bridge, and being able to mix and match Consul Connect enabled services with others policed by Cilium. Or even have direct L3 reachability between tasks on different Nomad nodes, tunneled by Cilium under Nomad's bridge.
Another option that came to my mind could be using something like https://github.com/qntfy/kazaam in order to allow the user to specify some 'json transformations' to apply to nomad's bridge CNI config at runtime. This would work like:
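As an illustration only (the bridge_transform_rules attribute and the rule syntax below are invented for this sketch and are not kazaam's actual spec format), a host-level transform could, for example, override the bridge subnet in the generated conflist:

# Hypothetical sketch: rules applied by the client to the bridge conflist it
# generates, before any CNI plugin is invoked. Nothing here exists in Nomad.
client {
  bridge_transform_rules = [
    {
      operation = "replace"                          # overwrite an existing value
      path      = "plugins.1.ipam.ranges.0.0.subnet" # the bridge plugin's IPAM subnet
      value     = "\"192.168.15.0/24\""              # raw JSON for the new value
    }
  ]
}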
While this might not be the most straightforward means to 'edit' the CNI template, it is probably the most flexible option, and can open a lot of possibilities for sysadmins to integrate Nomad's bridge with many different networking systems. Dunno what you think @tgross.. if this seems 'acceptable' from HashiCorp's point of view I could try to hack something. Regards
@pruiz my major worry with that specific approach is that it introduces a new DSL into the Nomad job spec. Combine that with HCL2 interpolation and Levant/nomad-pack interpolation and that could get really messy. If we were going to allow job operator configuration of the bridge at all, I'm pretty sure we'd want it to be an HCL block that generates the resulting JSON CNI config (which isn't all that complex of an object, in any CNI config I've seen at least). That also introduces a separation-of-duties concern. Right now the cluster administrator owns the bridge configuration to the limited degree we allow that; expanding that configuration is what's been proposed as the top-level issue here. Extending some of that ownership to the job operator blurs that line. Can you describe in a bit more detail (ideally with examples) what kind of configurations you couldn't do with the original proposal here (along with the
Hi @tgross, I probably didn't explain myself well. I was not proposing to add the new 'bridge_transform_rules' parameter to Nomad's job spec, just adding it to the Nomad client/host config. IMHO, being able to fine-tune the bridge's CNI config from the job spec would be good, but it opens a lot more issues that are hard to solve, as the bridge instance (and the veths attached to it) should be consistent among jobs for things like Consul Connect to work. However, being able to customize the bridge's CNI settings at host level (i.e. from /etc/nomad.d/nomad.hcl) opens (I think) a lot of potential. And keeping it (right now) restricted to cluster admins makes sense (at least to me), as the cluster admin is the one with actual knowledge of the networking & environment the node lives in. As for the new-DSL issue, I understand your point about adding another sub-DSL to the config, but I just don't see how we can apply 'unlimited' modifications to a JSON document using HCL. Adding some 'variables' to interpolate into the JSON emitted by networking_bridge_linux.go and replacing them with new values in /etc/nomad.d/nomad.hcl seems workable, but, as happens with other similar approaches, user N+1 is going to find he needs a new interpolatable variable somewhere within the JSON which is not yet provided. That's why I was looking into something more unrestricted.
In my use case, for example, my idea would be to mix Consul Connect & Cilium on top of nomad's bridge. In order to do so, my nomad's host config (/etc/nomad.d/nomad.hcl) would include something like:
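A hypothetical sketch of what that host configuration could look like, reusing the invented bridge_transform_rules syntax from above (it does not exist in any Nomad release):

# /etc/nomad.d/nomad.hcl -- hypothetical sketch only; attribute and rule
# syntax are invented for illustration and are not part of Nomad.
client {
  bridge_transform_rules = [
    {
      operation = "append"                       # chain an extra CNI plugin
      path      = "plugins"                      # the conflist's plugin array
      value     = "{ \"type\": \"cilium-cni\" }" # run Cilium after bridge/firewall/portmap
    }
  ]
}

The intended net effect is a generated conflist whose plugin chain ends with cilium-cni after Nomad's usual bridge, firewall, and portmap entries.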
With this configuration applied on cluster nodes, I would be able to launch jobs using the native bridge (instead of cni/*) that can make mixed use of Consul Connect and Cilium, enabling:
All at the same time and from within the same Task Group. Regards
[1] Currently jobs using Cilium (by means of a network=cni/*) cannot use Consul Connect (and vice versa).
That's a really complete and much better-phrased explanation and feature matrix than I was typing up, @pruiz; it sounds like we have almost identical use cases here. I also think this is something that realistically only a cluster root operator should change, since this is going to involve potentially installing additional packages at the host level to make it work. As to the HCL/JSON issue, what about writing the transforms in HCL and then converting that to the relevant JSON, as is already done for jobspecs? It adds implementation complexity for sure, but it also keeps the operator experience uniform, which it sounds like is a primary goal here.
Ok, I'm glad we're all on the same page then that this belongs to the cluster administrator. So if I tried to boil down the "transformations" proposal a bit, the primary advantage here over simply pointing to a CNI config file is wanting to avoid handling unique-per-host CNI configuration files, so that you can do things like IP prefixes per host (as opposed to having host configuration management do it). That seems reasonable given we already have Nomad creating the bridge. You'd still need a source for the per-host configuration though. Suppose we had a 90/10 solution here by supporting a cni_bridge_config_template option in the client configuration that points to a template file for the bridge's CNI config?
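A sketch of what that option could look like in the client stanza (cni_bridge_config_template is only the name proposed in this thread, and the path is made up):

# Hypothetical: this attribute does not exist in Nomad; it is the option
# being proposed in this discussion.
client {
  # Template rendered by Nomad in place of its hard-coded bridge conflist.
  cni_bridge_config_template = "/etc/nomad.d/bridge.conflist.tmpl"
}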
Hi @tgross, I think the cni_bridge_config_template option seems like a good middle point, yes, because:
And I think this is something everybody can cope with. As for the actual template file to pass to cni_bridge_config_template, I think that could be a plain text file onto which Nomad performs such variable interpolations, or a consul-template file which Nomad can render (passing the variables to consul-template's engine), as Nomad already uses consul-template for other similar stuff. Dunno what you guys think on this? Last, with regard to interpolation variables, I think Nomad could pass at a minimum the same values it is already using when generating the bridge's JSON:
And we could consider exposing as interpolation also (but not sure):
Regards
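As a sketch, such a template could look like the stock bridge conflist with placeholders for the values Nomad already knows. The placeholder names, the templating syntax, and the trailing cilium-cni entry below are illustrative only, not something Nomad provides:

{
  "cniVersion": "0.4.0",
  "name": "{{ network_name }}",
  "plugins": [
    { "type": "loopback" },
    {
      "type": "bridge",
      "bridge": "{{ bridge_name }}",
      "ipMasq": true,
      "isGateway": true,
      "forceAddress": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[ { "subnet": "{{ bridge_subnet }}" } ]],
        "routes": [ { "dst": "0.0.0.0/0" } ]
      }
    },
    {
      "type": "firewall",
      "backend": "iptables",
      "iptablesAdminChainName": "{{ iptables_admin_chain }}"
    },
    { "type": "portmap", "capabilities": { "portMappings": true }, "snat": true },
    { "type": "cilium-cni" }
  ]
}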
Hi everyone 👋 After further discussion we feel like adding more customization to the default bridge network is not the direction we want to take. Users that require more advanced customization are able to create their own bridge network using CNI. The main downside of this is that Consul Service Mesh currently requires the built-in bridge network mode, which we are tracking separately. Feel free to 👍 and add more comments there. Thank you everyone for the ideas and feedback!
Hmm, that's a frustrating resolution as it means that to use consul connect in conjunction with CNI I'd now need to edit every network block in every service template in every cluster, whether or not those tasks used a CNI network previously. At that point it seems like the better option to me is to abandon consul connect entirely and use a 3rd party CNI to achieve a similar result. I'm following the other ticket, but it really doesn't look like any consideration is given there to the default path that nomad comes with out of the box. Any thoughts on how to continue to have working defaults and still enjoy both CNI and Consul Connect?
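For concreteness, the per-group change being described is switching the network mode in each jobspec, for example (names taken from the examples in this thread):

group "cache" {
  network {
    # was: mode = "bridge"
    mode = "cni/mybridge" # now has to name the operator-managed CNI network
    port "db" {
      to = 6379
    }
  }
}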
@lgfa29 While Consul Connect is a good solution for common use cases, it is clearly lacking when trying to use it to deploy applications requiring more complex network setups (for example applications requiring direct [non-NAT, non-proxied] connections from clients, clusters requiring flexible connections between nodes on dynamically allocated ports, solutions requiring maxing out the network I/O performance of the host, etc.). For such situations the only option available is to use CNI, but even this is somewhat limited on Nomad (i.e. CNI has to be set up per host, networking has to be defined on a job basis, CNI stuff has to be already present and pre-deployed/running on nomad-server before deploying the job, one cannot mix Connect with custom CNIs, etc.).

And, at the same time, there is no solution for having more than "one networking" (i.e. CNI plus bridge) for a single Task, nor is there a clear solution for mixing jobs using Consul Connect and jobs using CNI. This is clearly an issue for Nomad users, as it limits Consul Connect to simple use cases, forcing us to deploy anything not (let's say) Consul-Connect-compatible outside of Nomad, on top of a different solution (for deployment, traffic policing, etc.), and rely on an outbound gateway for providing access from Nomad's jobs to such 'outside' elements.

I understand HashiCorp needs a product that can be supported with some clear use cases and limits. But at the same time we as a community need some extensibility for use cases not needing to be covered by commercial HashiCorp support options. That's why the idea of this being a setting for extending the standard Nomad feature made sense to me. HashiCorp could simply label this as 'community supported-only' or something like that and focus on enhancing Consul Connect, but at the same time let the community work around it until something better arrives.

As stated, I was willing to provide a PR for this new feature, but right now I feel a bit stranded, as I don't really understand why not support a use case which on Nomad's code base only implies being able to extend the CNI config, and which can be declared 'community supported' if that's a problem for HashiCorp's business. I just hope you guys can reconsider this issue. Regards
I, too, support @pruiz's use-case. I had to abandon the Hashistack altogether because of Nomad's opinions on CNI. Consul Connect is a good generic solution, but it leaves much to be desired in the flexibility department. I tried to plumb in Cilium using their (deprecated) Consul integration and after a few months I had to bag it. It doesn't seem impossible, but it's beyond my current capabilities. So, yes: what Pablo is proposing doesn't seem unreasonable and I ask HashiCorp to reconsider.
Hi everyone 👋 Thanks for the feedback. I think I either didn't do a good job explaining myself or completely misunderstood the proposal. I will go over the details and check with the rest of the team again to make sure I have things right. Apologies for the confusion.
Hi everyone 👋 After a more thorough look into this I want to share what I have observed so far and expand on the direction we're planning to take for Nomad's networking story. The main question I'm trying to answer is:
From my investigation so far I have not been able to find examples where a custom CNI configuration would not be able to accomplish the same results as the proposed bridge customization mechanism. My first test attempted to validate the following:
For this I copied Nomad's bridge configuration from the docs and changed the IP range.
mybridge.conflist:
{
"cniVersion": "0.4.0",
"name": "mybridge",
"plugins": [
{
"type": "loopback"
},
{
"type": "bridge",
"bridge": "mybridge",
"ipMasq": true,
"isGateway": true,
"forceAddress": true,
"ipam": {
"type": "host-local",
"ranges": [
[
{
"subnet": "192.168.15.0/24"
}
]
],
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
},
{
"type": "firewall",
"backend": "iptables",
"iptablesAdminChainName": "NOMAD-ADMIN"
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
}
]
}
I then used the following job to test each network.
example.nomad:
job "example" {
datacenters = ["dc1"]
group "cache-cni" {
network {
mode = "cni/mybridge"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "local/script.sh"
}
}
}
group "cache-bridge" {
network {
mode = "bridge"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "/local/script.sh"
}
}
}
}
I was able to access the allocations from the host via the port mapping, as expected from the default bridge configuration.
$ nomad service info redis
Job ID Address Tags Node ID Alloc ID
example 192.168.15.46:6379 [] 7c8fc26d 4068e3b7
example 172.26.64.135:6379 [] 7c8fc26d f94a4782
$ nc -v 192.168.15.46 6379
Connection to 192.168.15.46 6379 port [tcp/redis] succeeded!
ping
+PONG
^C
$ nc -v 172.26.64.135 6379
Connection to 172.26.64.135 6379 port [tcp/redis] succeeded!
ping
+PONG
^C
$ nomad alloc status 40
ID = 4068e3b7-b4f9-b935-db17-784a693aa134
Eval ID = d60c7ff0
Name = example.cache-cni[0]
Node ID = 7c8fc26d
Node Name = lima-default
Job ID = example
Job Version = 0
Client Status = running
Client Description = Tasks are running
Desired Status = run
Desired Description = <none>
Created = 1m28s ago
Modified = 1m14s ago
Deployment ID = 09df8981
Deployment Health = healthy
Allocation Addresses (mode = "cni/mybridge"):
Label Dynamic Address
*db yes 127.0.0.1:20603 -> 6379
Task "ping" (poststart sidecar) is "running"
Task Resources:
CPU Memory Disk Addresses
48/100 MHz 840 KiB/300 MiB 300 MiB
Task Events:
Started At = 2023-02-07T22:59:47Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2023-02-07T22:59:47Z Started Task started by client
2023-02-07T22:59:46Z Task Setup Building Task Directory
2023-02-07T22:59:42Z Received Task received by client
Task "redis" is "running"
Task Resources:
CPU Memory Disk Addresses
17/100 MHz 3.0 MiB/300 MiB 300 MiB
Task Events:
Started At = 2023-02-07T22:59:46Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2023-02-07T22:59:46Z Started Task started by client
2023-02-07T22:59:45Z Task Setup Building Task Directory
2023-02-07T22:59:42Z Received Task received by client
$ nc -v 127.0.0.1 20603
Connection to 127.0.0.1 20603 port [tcp/*] succeeded!
ping
+PONG
^C
$ nomad alloc status f9
ID = f94a4782-d4ad-d0e9-ced7-de90c1cfadf3
Eval ID = d60c7ff0
Name = example.cache-bridge[0]
Node ID = 7c8fc26d
Node Name = lima-default
Job ID = example
Job Version = 0
Client Status = running
Client Description = Tasks are running
Desired Status = run
Desired Description = <none>
Created = 1m50s ago
Modified = 1m35s ago
Deployment ID = 09df8981
Deployment Health = healthy
Allocation Addresses (mode = "bridge"):
Label Dynamic Address
*db yes 127.0.0.1:20702 -> 6379
Task "ping" (poststart sidecar) is "running"
Task Resources:
CPU Memory Disk Addresses
51/100 MHz 696 KiB/300 MiB 300 MiB
Task Events:
Started At = 2023-02-07T22:59:47Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2023-02-07T22:59:47Z Started Task started by client
2023-02-07T22:59:47Z Task Setup Building Task Directory
2023-02-07T22:59:42Z Received Task received by client
Task "redis" is "running"
Task Resources:
CPU Memory Disk Addresses
14/100 MHz 2.5 MiB/300 MiB 300 MiB
Task Events:
Started At = 2023-02-07T22:59:47Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2023-02-07T22:59:47Z Started Task started by client
2023-02-07T22:59:46Z Task Setup Building Task Directory
2023-02-07T22:59:42Z Received Task received by client
$ nc -v 127.0.0.1 20702
Connection to 127.0.0.1 20702 port [tcp/*] succeeded!
ping
+PONG
^C
So it seems to be possible to have a custom bridge network based off Nomad's default that behaves the same way, with the exception of some items that I will address below. Next I wanted to test something different:
For the first test I used the macvlan CNI plugin with the following configuration:
{
"cniVersion": "0.4.0",
"name": "mymacvlan",
"plugins": [
{
"type": "loopback"
},
{
"name": "mynet",
"type": "macvlan",
"master": "eth0",
"ipam": {
"type": "host-local",
"ranges": [
[
{
"subnet": "192.168.10.0/24"
}
]
],
"routes": [
{
"dst": "0.0.0.0/0"
}
]
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
},
"snat": true
}
]
}
job "example" {
datacenters = ["dc1"]
group "cache-bridge" {
network {
mode = "bridge"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "local/script.sh"
}
}
}
group "cache-cni" {
network {
mode = "cni/mymacvlan"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "/local/script.sh"
}
}
}
}
$ nomad service info redis
Job ID Address Tags Node ID Alloc ID
example 192.168.10.2:6379 [] 62e0ad12 af93e40f
example 172.26.64.137:6379 [] 62e0ad12 c1483937
$ nc -v 192.168.10.2 6379
^C
$ nomad alloc logs -task ping af
Pinging 192.168.10.2:6379
PONG
Pinging 172.26.64.137:6379
Pinging 192.168.10.2:6379
PONG
$ nomad alloc logs -task ping c1
Pinging 192.168.10.2:6379
Pinging 172.26.64.137:6379
PONG
Pinging 192.168.10.2:6379
Pinging 172.26.64.137:6379
PONG
I wasn't able to get cross-network and host port mapping communication working, but allocations in the same network were able to communicate. I think this is where my lack of more advanced networking configuration is a problem and I wonder if I'm just missing a route configuration somewhere.
macvlan - same network
job "example" {
datacenters = ["dc1"]
group "cache-cni-1" {
network {
mode = "cni/mymacvlan"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "local/script.sh"
}
}
}
group "cache-cni-2" {
network {
mode = "cni/mymacvlan"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "/local/script.sh"
}
}
}
}
$ nomad service info redis
Job ID Address Tags Node ID Alloc ID
example 192.168.10.3:6379 [] 62e0ad12 a6b08599
example 192.168.10.4:6379 [] 62e0ad12 abd6f643
$ nomad alloc logs -task ping ab
Pinging 192.168.10.3:6379
PONG
Pinging 192.168.10.4:6379
PONG
Pinging 192.168.10.3:6379
PONG
Pinging 192.168.10.4:6379
PONG
Pinging 192.168.10.3:6379
PONG
$ nomad alloc logs -task ping a6
Pinging 192.168.10.3:6379
PONG
Pinging 192.168.10.4:6379
PONG
Pinging 192.168.10.3:6379
PONG
Pinging 192.168.10.4:6379
PONG
Next I tried a Cilium network setup since @pruiz and @brotherdust mentioned it. It is indeed quite challenging to get it working, but I think I was able to get enough running for what I needed. First I tried to run it as an external configuration using the generic Veth Chaining approach, because I think this is what is being suggested here: the ability to chain additional plugins to Nomad's bridge.
Cilium - custom CNI
Once again I started from the bridge configuration in our docs and chained the cilium-cni plugin at the end:
{
"cniVersion": "0.4.0",
"name": "cilium",
"plugins": [
{
"type": "loopback"
},
{
"type": "bridge",
"bridge": "mybridge",
"ipMasq": true,
"isGateway": true,
"forceAddress": true,
"ipam": {
"type": "host-local",
"ranges": [
[
{
"subnet": "192.168.15.0/24"
}
]
],
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
},
{
"type": "firewall",
"backend": "iptables",
"iptablesAdminChainName": "NOMAD-ADMIN"
},
{
"type": "portmap",
"capabilities": {"portMappings": true},
"snat": true
},
{
"type": "cilium-cni"
}
]
}
I also used the Consul KV store backend because that's what I'm most familiar with; I don't think this choice influences the test.
$ consul agent -dev
I then copied the Cilium CNI plugin to the host's CNI plugin directory.
$ docker run --rm -it -v /opt/cni/bin/:/host cilium/cilium:v1.12.6 /bin/bash
root@df6cdba526a8:/home/cilium# cp /opt/cni/bin/cilium-cni /host
root@df6cdba526a8:/home/cilium# exit
Enable some Docker driver configuration to be able to mount host volumes and run the Cilium agent in privileged mode.
client {
cni_config_dir = "..."
}
plugin "docker" {
config {
allow_privileged = true
volumes {
enabled = true
}
}
}
Start Nomad and run the Cilium agent job.
job "cilium" {
datacenters = ["dc1"]
group "agent" {
task "agent" {
driver = "docker"
config {
image = "cilium/cilium:v1.12.6"
command = "cilium-agent"
args = [
"--kvstore=consul",
"--kvstore-opt", "consul.address=127.0.0.1:8500",
"--enable-ipv6=false",
]
privileged = true
network_mode = "host"
volumes = [
"/var/run/docker.sock:/var/run/docker.sock",
"/var/run/cilium:/var/run/cilium",
"/sys/fs/bpf:/sys/fs/bpf",
"/var/run/docker/netns:/var/run/docker/netns:rshared",
"/var/run/netns:/var/run/netns:rshared",
]
}
}
}
}
Make sure things are good.
$ sudo cilium status
KVStore: Ok Consul: 127.0.0.1:8300
Kubernetes: Disabled
Host firewall: Disabled
CNI Chaining: none
Cilium: Ok 1.12.6 (v1.12.6-9cc8d71)
NodeMonitor: Disabled
Cilium health daemon: Ok
IPAM: IPv4: 2/65534 allocated from 10.15.0.0/16,
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 20/20 healthy
Proxy Status: OK, ip 10.15.100.217, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Disabled
Encryption: Disabled
Cluster health: 1/1 reachable (2023-02-07T23:51:41Z)
Run a job that uses the cni/cilium network.
job "example" {
datacenters = ["dc1"]
group "cache-cni" {
network {
mode = "cni/cilium"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "local/script.sh"
}
}
}
group "cache-bridge" {
network {
mode = "bridge"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "/local/script.sh"
}
}
}
}
Remove the reserved:init label from the new endpoint.
$ sudo cilium endpoint list
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
10 Enabled Enabled 5 reserved:init 10.15.115.127 ready
310 Disabled Disabled 4 reserved:health 10.15.203.243 ready
4041 Disabled Disabled 1 reserved:host ready
$ sudo cilium endpoint labels -d reserved:init 10
$ sudo cilium endpoint list
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
10 Disabled Disabled 63900 no labels 10.15.115.127 ready
310 Disabled Disabled 4 reserved:health 10.15.203.243 ready
4041 Disabled Disabled 1 reserved:host ready
Test connection. Results are the same as before.
$ nomad service info redis
Job ID Address Tags Node ID Alloc ID
example 10.15.115.127:6379 [] bac2e14a 97b83d17
example 172.26.64.138:6379 [] bac2e14a cf0cbb83
$ nc -v 10.15.115.127 6379
^C
$ nomad alloc logs -task ping 97
Pinging 10.15.115.127:6379
PONG
Pinging 172.26.64.138:6379
Pinging 10.15.115.127:6379
PONG
Pinging 172.26.64.138:6379
Change the job so both groups are in the cni/cilium network.
job "example" {
datacenters = ["dc1"]
group "cache-cni-1" {
network {
mode = "cni/cilium"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "local/script.sh"
}
}
}
group "cache-cni-2" {
network {
mode = "cni/cilium"
port "db" {
to = 6379
}
}
service {
name = "redis"
port = "db"
provider = "nomad"
address_mode = "alloc"
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
task "ping" {
driver = "docker"
lifecycle {
hook = "poststart"
sidecar = true
}
config {
image = "redis:7"
command = "/bin/bash"
args = ["/local/script.sh"]
}
template {
data = <<EOF
#!/usr/bin/env bash
while true; do
{{range nomadService "redis"}}
echo "Pinging {{.Address}}:{{.Port}}"
redis-cli -h {{.Address}} -p {{.Port}} PING
{{end}}
sleep 3
done
EOF
destination = "/local/script.sh"
}
}
}
}
Remove labels again.
$ sudo cilium endpoint list
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
563 Enabled Enabled 5 reserved:init 10.15.251.107 ready
680 Enabled Enabled 5 reserved:init 10.15.203.84 ready
3561 Disabled Disabled 4 reserved:health 10.15.203.243 ready
4041 Disabled Disabled 1 reserved:host ready
$ sudo cilium endpoint labels -d reserved:init 563
$ sudo cilium endpoint labels -d reserved:init 680
$ sudo cilium endpoint list
ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS
ENFORCEMENT ENFORCEMENT
563 Disabled Disabled 63900 no labels 10.15.251.107 ready
680 Disabled Disabled 63900 no labels 10.15.203.84 ready
3561 Disabled Disabled 4 reserved:health 10.15.203.243 ready
4041 Disabled Disabled 1 reserved:host ready
Check connection.
$ nomad service info redis
Job ID Address Tags Node ID Alloc ID
example 10.15.203.84:6379 [] bac2e14a 3414181a
example 10.15.251.107:6379 [] bac2e14a e4fc7bf2
$ nomad alloc logs -task ping 34
Pinging 10.15.203.84:6379
PONG
Pinging 10.15.251.107:6379
Pinging 10.15.203.84:6379
PONG
Pinging 10.15.251.107:6379
PONG
$ nomad alloc logs -task ping e4
Pinging 10.15.203.84:6379
PONG
Pinging 10.15.251.107:6379
PONG
Pinging 10.15.203.84:6379
PONG
Pinging 10.15.251.107:6379
PONG
Although far from a production deployment, I think this does show that it's possible to set up custom CNI networks without modifying Nomad's default bridge, except for the points I mentioned earlier. I will try to list them all here and open follow-up issues for us to address them.
These are all limitations of our current CNI implementation that we need to address, and we are planning to do so. The last item is more complicated since it requires more partnership and engagement with third-party providers, but we will also be looking into how to improve that. What's left to analyze is the main question:
For this I applied the same Cilium configuration directly to the code that generates the Nomad bridge. If I understood the proposal correctly, chaining CNI plugins to the Nomad bridge would be the main use case for this feature, but please correct me if I'm wrong. But things were not much better, and most of the items above were still an issue.
Cilium - embedded in Nomad
The first thing you notice is what I mentioned in my previous comment.
job "example" {
datacenters = ["dc1"]
group "cache" {
network {
mode = "bridge"
port "db" {
to = 6379
}
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
auth_soft_fail = true
}
resources {
cpu = 500
memory = 256
}
}
}
}
$ nomad job status example
ID = example
Name = example
Submit Date = 2023-02-08T00:41:51Z
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost Unknown
cache 0 1 0 1 0 0 0
Latest Deployment
ID = e0e9f013
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 1 2 0 1 2023-02-08T00:51:51Z
Allocations
ID Node ID Task Group Version Desired Status Created Modified
30ffb0ce 643dc5fa cache 0 run pending 23s ago 23s ago
d7c34e23 643dc5fa cache 0 stop failed 1m30s ago 22s ago
$ nomad alloc status d7
ID = d7c34e23-0c11-e57f-1b28-ff2274264854
Eval ID = eccbefd9
Name = example.cache[0]
Node ID = 643dc5fa
Node Name = lima-default
Job ID = example
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = stop
Desired Description = alloc was rescheduled because it failed
Created = 1m45s ago
Modified = 37s ago
Deployment ID = e0e9f013
Deployment Health = unhealthy
Replacement Alloc ID = 30ffb0ce
Allocation Addresses (mode = "bridge"):
Label Dynamic Address
*db yes 127.0.0.1:30418 -> 6379
Task "redis" is "dead"
Task Resources:
CPU Memory Disk Addresses
500 MHz 256 MiB 300 MiB
Task Events:
Started At = N/A
Finished At = 2023-02-08T00:42:29Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2023-02-08T00:42:30Z Killing Sent interrupt. Waiting 5s before force killing
2023-02-08T00:42:29Z Alloc Unhealthy Unhealthy because of failed task
2023-02-08T00:42:29Z Setup Failure failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: plugin type="cilium-cni" failed (add): unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
2023-02-08T00:41:51Z Received Task received by client
Running the Cilium agent and deleting the endpoint labels as before fixes the problem and the allocation is now at least healthy. But, also like before, we can't access the task from the host or outside the Cilium network. Again, this is probably my fault and could be fixed with proper network configuration. Since we're in
And so, looking at the list of issues above, the proposal here would only incidentally fix the first two items because of the way things are named and currently implemented, and both items are things we need to fix for CNI anyway.
Now, to address some of the comments since the issue was closed. From @the-maldridge.
Having to update jobspecs is indeed an unfortunate consequence, but this is often true for new features in general and, hopefully, it's a one-time process. Modifying Nomad's
Nomad networking features and improvements have been lagging and we're planning to address them. CNI, Consul Connect, IPv6 (which was the original use case you mentioned) are all things we are looking into improving, but unfortunately I don't have any dates to provide at this point to help you make a decision on which tool to use.
You are right, the issue I linked was about enabling Consul Connect on CNI networks. #14101 and #7905 are about IPv6 support in Consul Connect and Nomad's bridge.
Right now the only way I can think of to solve your issue is to run a patched version of Nomad to customize the hardcoded bridge config. But even then I'm not sure it will be enough to fully enable Connect with IPv6. From @pruiz.
Agreed. We (the Nomad team) need to find a way to address this and better integrate with other networking solutions. We don't have any specifics at this point, but community support is always a good start and much appreciated!
💯 we need to improve our CNI integration.
That's correct, but so would be the proposal here if I understood it correctly?
Right, and the plan is to address this in #8953. It may be that removing the validation is enough. Having more people test the custom binary I provided there would be very helpful.
That's also true, but also not covered by this proposal? As far as I know, Kubernetes also suffers from the same issue and there are meta-plugins to multiplex different networks, like Multus. I have this in my list above to be created as a follow-up issue.
Yup, that's covered in #8953. One thing to clarify is what do you mean by "mixing jobs". Do you envision an alloc that uses Consul Connect to be able to reach an alloc on Cilium for example? If that's the case I'm not sure if it would work without a gateway 🤔
I'm sorry, I didn't quite follow this part. Are you talking about, for example, having to deploy the Cilium infrastructure to use something beyond Connect?
This is the void we expect CNI to fill by allowing users to create their own custom networks that fit their specific needs. This specific item is not about commercial support but feature support in general. We try to be careful about backwards compatibility and this would introduce a feature we expect to deprecate. I understand the frustration but, historically, we treat code shipped as code being used. For experimentation a temporary fork may be the best approach.
This is not a business decision, and I apologize if I made it sound like one. This was a technical decision, as we found that arbitrary modifications to the default bridge configuration would be hard to support. We are always happy to receive contributions, and I hope this doesn't discourage you from future contributions (we have lots to do!). But sometimes we need to close feature requests to make sure we are moving towards a direction we feel confident in maintaining.
Always! As I mentioned, the main point that I may be missing is understanding what you would be able to do with this feature that would not be possible with a well functioning CNI integration. Could you provide an example of what you would like to add to Nomad's bridge config? That can help us understand the use case better and yes, we are always willing to reconsider. From @brotherdust.
That's unfortunate but definitely understandable given where we are right now. Anything specific you could share to help us improve? To finish this (already very) long comment I want to make sure that it is clear that closing this issue is just an indication that we find a stronger and better CNI integration to be a better approach for customized networking. What "stronger and better" means depends a lot on your input, so I appreciate all the discussion and feedback so far, please keep them coming 🙂
@lgfa29 , thank you for your thoughtful and detailed response. I'm sure it took some time out of your regular activities and I can appreciate it! I agree with you 100% that Nomad needs better CNI integration and much better IPv6 support.
I need some time to gather my thoughts into something more cogent. I'll get back to you soon.
Wow, kudos for such an in-depth survey of the available options. I'm truly impressed that you got Cilium working and were able to use it even in a demo environment.

I think perhaps the deeper issue I encounter while looking at this is that there is a constant upgrade treadmill to operate an effective cluster. A treadmill that oftentimes involves tracking down users in remote teams, who do not have dedicated operations resources but still expect the things they want to do in the hosted cluster environment to work. The Kubernetes world solved this long ago with mutating ingress controllers to be able to monkey-patch jobspecs on the way in, and while I recognize the good arguments the Nomad team has made in the past against user-hosted ingress controllers, I can't deny that that converts operations teams into the very same mutating controller resources. As to having to update jobspecs to make use of the new features, I remember the 0.12 upgrade cycle far too well, when I spent about a week trying to figure out why none of my network config worked as I understood it to at the time.

I'm really starting to wonder if the answer here is to just not use any of the builtin networking at all, to always stand up a CNI network that I own, and then put everything there. That seems to be the supported mechanism for managing a stable experience for downstream Nomad consumers, would you agree?
Edit: added mention of Fermyon-authored Cilium integration with Nomad.
Ok. Thoughts gathered! First, I want to qualify what I'm describing with the fact that I am, first and foremost, a network engineer. This isn't to say that I have expert opinions in this context, but to indicate that I might have a different set of tools in my bag than a software engineer or developer; therefore, there's a danger that I'm approaching this problem from the wrong perspective and I'm more than willing to hear advice on how to think about this differently. The goals below are numbered for a reason: we'll be using them for reference later on.
1. Hardware Setup
2. Design Goals
2.1 General
2.2 Workload Characteristics
2.2.1 Types
2.2.2 Primary Use-Cases
2.3 Security
2.4 PKI
2.5 Networking
2.6 Storage
3. How It Went Down
I set off finding the pieces that would fit. It eventually came down to k8s and Hashistack. I selected Hashistack because it's basically the opposite of k8s. I'll skip my usual extended diatribe about k8s and just say that k8s is very... opinionated... and is the ideal solution for boiling the ocean, should one so desire.
Pain Points
In a general sense, the most difficult parts of the evaluation come down to one thing: where Hashistack doesn't cover the use-case, a third-party component must be integrated. Or, if it does cover the use-case, the docs are confusing or incomplete.
CNI
To the detriment of all, all the cool kids build service-mesh CNIs for k8s. They use k8s APIs, CRDs and such; things that Nomad (and Consul, indirectly) do not understand; and, frankly, shouldn't. Nomad has CNI support, but it's very basic in the sense that it cannot be programmatically or natively configured via Nomad jobspec. It seems there is some template functionality I wasn't aware of, as indicated by some of the content of this thread, so I'll have to revisit that. I very much agree with @lgfa29 that probably the best outcome is just to integrate Cilium as part of Nomad. That creates its own burden on HashiCorp, so I'm not sure if they're going to be willing to do that. In this instance, I am happy to volunteer some time to maintain the integration once it is completed. Which brings me to a related note: I saw a HashiConf talk by Taylor Thomas from Fermyon. In it he describes a full-featured Cilium integration with Nomad they are planning on open sourcing. It hasn't happened yet due to time constraints, so I reached out to them to see what the timeline is and if they would like some help. Hopefully I or someone more qualified (which is pretty much anyone) can get the ball rolling on that. If anyone wants me to keep them up to date on this item, let me know.
PKI
I realize this seems somewhat off-subject, but it is somewhat related. This article covers some of the issues I experienced, which I'll quote from here:
So, besides experiencing exactly what the author mentioned, I can add: if you want to integrate any of these components with an existing enterprise CA, beware that, for example:
I think what happened is that the developers assumed that we'd want to use the self-signed CA that came with each component and nothing else. So, they weren't expecting a particular kind of error, or didn't see the need to comprehensively document what a certificate should look like. For lab purposes, this is acceptable. When one is trying to set up a production cluster, it's pretty rough. On a final note, I seriously appreciate that this is open source software and that I am more than welcome to provide a PR. I even thought about justifying an enterprise license. But, in this particular case, a PR wouldn't be enough to address the architectural decisions that led to where we are now; and, based on my experience with enterprise support contracts, it would probably never be addressed unless there were some serious money on the table. I get it, I do. My expectations are low; but I thought it was at least worth the time to write all this out so that you would benefit from my experience. Thanks again! Seriously great software!
Hi @lgfa29, First, thanks for the thoughtful response, I'll try to answer some points I think relevant below.. ;)
I think the main deviation from your tested scenarios and the one I have in mind is that I want a single task (within a given allocation) to be able to use both Consul Connect and Cilium's networking.
This is the kind of integration between Consul Connect & Cilium I want to achieve. [...]
[...]
That would be an option for me, provided we can use Connect on a custom CNI network, hopefully delegating the deployment/management of the Envoy proxy to Nomad.
Yeah, I know, Kubernetes is similar here, but my point was that support for more than one network could be another way around this: just provide my tasks with one network connecting to Nomad's bridge, and another one connecting to Cilium. :)
This is what I explained at the top: I think we could make Connect and Cilium work on top of the same bridge, and have both working together side by side.
No bad feelings ;), I understood your point. Just wished we could find an interim solution for the current limitations of Connect. Regards
I've heard some people mentioning an approach like this before (for example, here is Seatgeek speaking at HashiConf 2022), but I'm not sure if there's been any final decision on this by the team.
That's the direction we're going. The built-in networks should be enough for most users and a custom CNI should be used by those that need more customization. The problem right now (in addition to the CNI issues mentioned previously) is that there's a big gap between the two. We need to figure out a way to make CNI adoption more seamless. @brotherdust thanks for the detailed report of your experience!
Yup, that's the part about partnerships I mentioned in my previous comment. But those can take some time to be established. The work that @pruiz has done in Cilium is huge for this!
Could you expand a little on this? What kind of dynamic values would you like to set, and where?
Maybe I misspoke, but I don't expect any vendor specific code in Nomad at this point. The problem I mentioned is that, in theory, the CNI spec is orchestrator agnostic but in practice a lot of plugins have components that rely on Kubernetes APIs and, unfortunately, there is not much we can do about it.
And that's another important avenue as well. These types of integration are usually better maintained by people that actually use them, which is not our case. Everything I know about Cilium at this point was what I learned from community in #12120 🙂
I would suggest opening a separate issue for this (if one doesn't exist yet).
You're right, this will be a big effort that will require multiple PRs, but my plan is to break it down into smaller issues (some of them listed in my previous comment already), so maybe there will be something smaller that you can contribute 🙂 Things like documentation, blog posts, demos etc. are also extremely valuable to contribute.
Yup, I got that. But I want to make sure we're on the same page as to why I closed this issue. So imagine the feature requested here were implemented, which From what I gathered so far, the only things preventing you from doing what you want are shortcomings in our CNI implementation. If that's not the case I would like to hear what
Yes, the sidecar deployment is conditional on the bridge network mode. I would appreciate it if you could test the binary I have linked in #8953 (comment) to see if it works for you.
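For reference, the pattern being referred to is the standard Connect jobspec, where the sidecar injection currently assumes the built-in bridge mode. A minimal sketch, reusing the Redis image and port from the examples in this thread:

job "web" {
  datacenters = ["dc1"]
  group "cache" {
    network {
      mode = "bridge" # sidecar injection is currently tied to the built-in bridge
      port "db" {
        to = 6379
      }
    }
    service {
      name = "redis"
      port = "6379"
      connect {
        sidecar_service {} # Nomad/Consul manage the Envoy sidecar
      }
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
      }
    }
  }
}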
Yup, I have this on my list and I will open a new issue about multiple network interfaces per alloc 👍
Hi all 👋 I just wanted to note that, as mentioned previously, I've created follow-up issues on specific areas that must be improved. You can find them linked above. Feel free to 👍, add more comments there, or create new issues if I missed anything. Thanks!
@lgfa29, thanks much!
Proposal
Right now the configuration for the nomad0 bridge device is hard-coded. Among other things, this makes it impossible to use Consul Connect with Nomad and IPv6.
Use-cases
This would enable IPv6 with the bridge; it would also allow the use of more advanced or configurable CNI topologies.
Attempted Solutions
To the best of my knowledge, there is no current solution to make consul connect and nomad both play nice with IPv6, or other similarly advanced dual-stack network configurations.