-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define Linux Network Devices #1271
base: main
Are you sure you want to change the base?
Conversation
/assign @samuelkarp |
https://github.com/opencontainers/runtime-spec/blob/main/features.md should be updated too |
51e5104
to
3a666eb
Compare
updated and addressed the comments |
AI @aojea (document the cleanup and destroy of the network interfaces) |
From the in-person discussion today:
|
config-linux.md
Outdated
|
||
This schema focuses solely on moving existing network devices identified by name into the container namespace. It does not cover the complexities of network device creation or network configuration, such as IP address assignment, routing, and DNS setup. | ||
|
||
**`netDevices`** (object, OPTIONAL) set of network devices that MUST be available in the container. The runtime is responsible for providing these devices; the underlying mechanism is implementation-defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec said "MUST" but, I think it can't do it in the rootless container because the rootless container doesn't have CAP_NET_ADMIN, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should take care of the rootless container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be an error in the case of a rootless container, if the runtime is not able to satisfy the MUST condition.
+1 but It'd be better to clarify it in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added mor explanations about runtime and network devices lifecycle and runtime checks, PTAL
Pushed a new commit addressing those comments, the changelog is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more comments; to me it seems this won't be ready for 1.2.1 (or 1.3.0) if we want to cut a release soon.
To me it seems that a simple In case Or, it could be done on the host by a higher-level orchestration tools just before container creation. Provided you can configure the interface while it's down, everything can be pre-configured, and the runtime tasks would be limited to:
I'm also worried about feature creep here. The list of properties needed to be set will grow over time, needing more and more from the runtime implementation, making it even more bloated. Say, what about protocols other than IPv4 and IPv6? Various devices that need more specific properties? Static routing entries? The list can go on and on. Having said that, I can agree that a way to move a network device to a container would be nice to have. But I strongly believe that configuring its properties is a bit out of scope for container runtime. |
I'm completely fine with that and that is what I want and need for Kubernetes, the bootstrapping configuration options are very nice to have things but not essential, as you mentioned the scope was creeping during the review, so my question is if all of you we'll be fine just removing that part and focusing only on the "moving interfaces part", @kad you were very vocal about IP configuration, is this acceptable? |
c5e11ef
to
dca9c4c
Compare
@kolyshkin see last commit removing all the network configuration options and leaving the proposal to only the network interface plumbing. I think that this is a good next step for everyone involved in the review since it allows to unblock a lot of scenarios that today we need to workaround with out of band mechanisms and a lot of redtape ... and this is also not blocking a future expansion in case someone wants to pursue the network initialization side , but for Kubernetes this new feature in current proposal is really beneficial so I appreciate if we can still consider for next release |
dca9c4c
to
7c89c98
Compare
7c89c98
to
ebe9192
Compare
we have case where simple assign list of addresses is going to be helpful. Removing dependency to external hook would be very nice to have. Speaking of hooks, I need to remind that hooks usually leads to contamination of host OS (in addition to increasing complexity of deployment such hooks that sole purpose would be to call |
@aojea can you squash the commits? Otherwise LGTM, and this is definitely something we can build upon. |
ebe9192
to
0f9ee33
Compare
@kolyshkin squashed, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for one nit.
The proposed "netdevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace. This approach is similar than the existing "devices" field used for block devices but uses a dictionary keyed by the interface name instead. The proposed scheme is based on the existing representation of network device by the `struct net_device` https://docs.kernel.org/networking/netdevices.html. This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration. Signed-off-by: Antonio Ojea <[email protected]>
0f9ee33
to
0ce07ae
Compare
The problem in my mind is where do you draw the line? At what point does "a few netlink api calls" become "a few too many" and now it's no longer acceptable? I agree with Kir that we should keep the scope really limited here so we don't accidentally stray too far into network management in the low level runtime, which is something we've traditionally left as a higher level concern for a lot of good reasons. |
One more comment -- why is the data structure a map? A map makes sense when we want to quickly find an element based on a key. Is it needed here? Perhaps a simple array is easier. |
interfaces names are unique per namespace, a map guarantees this avoiding user fat finger problems or misconfiguration, otherwise we need to validate that the names in the array are unique |
I don't think the uniqueness requirement justifies using map, and I don't think we should be concerned much about uniqueness. In case a network device will be listed twice, the second entry will fail with "no such device" because the first one will already be moved. Also, we have other similar fields (Devices, Namespaces, Mounts etc.) as arrays. I see that time namespace offsets is implemented via maps (and I think it was a mistake, for similar reasons -- an array would work just fine). Having said that, this is not a deal breaker. Also, I'm sure other maintainers may have different opinions. WDYT @opencontainers/runtime-spec-maintainers ? |
With the IP configuration part now removed, am I correct that the "easy" solution configuring IP networking for additional interfaces is still via a custom runtime hook? In our case the containers are static and cannot be amended with IP configuration tools or the necessary networking capabilities, and I doubt we are able to inject another container into the pod without causing a stir. IP (and yes, a route configuration) would have been a good and standard solution for simple secondary network needs. |
My "border line" is about features and fields provided by kernel netlink api. If something is done with it, I don't see big difference between call "move interface xyz to namespace abc" vs. "set property foo to value bar on interface xyz in namespace abc". Or to what it worth, "put value 123 into cgroup foobar". |
"linux": { | ||
"netDevices": { | ||
"eth0": { | ||
"name": 23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is 23 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an integer to cause a bad config since it is expecting a string
To me, this sounds like a runtime should implement most of what If that's the case, here's a crazy idea. Considering that ip(8) is a de facto standard on Linux, perhaps such configuration can be achieved by supplying a sequence of Something like "linux": {
"netDevices": [
"name": "enp34s0u2u1u2",
"ct_name": "eth0",
"config": [
"ip addr add 10.2.3.4/25 dev $IF",
"ip link set $IF mtu 1350",
"ip route add 10.10.0.0/16 via 10.2.3.99 dev $IF proto static metric 100"
"ip route add default dev $IF"
]
]
} Which will result in runtime running the host's The upsides are:
The biggest downside, I guess, is the scope is not well defined, as |
I don't like the idea of execing commands, specially in golang implementations that are problematic with goroutines and namespaces I still think that preventing to detect failures at runtime is desirable, you can argue you should not add duplicates, but bugs exist and if somebody accidentally add a duplicate interface name it will cause a problem in production we could have avoided just not allowing to define it |
I believe that this is really bad idea, compared to have limited set of config options implemented. To provide more insights on our use case, here is snip of the code that we want to get rid of, currently implemented as OCI runtime hook (btw, including already unreliable call to |
The proposed "netdevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace.
This approach is similar than the existing "devices" field used for block devices but uses a dictionary keyed by the interface name instead.
The proposed scheme is based on the existing representation of network device by the
struct net_device
https://docs.kernel.org/networking/netdevices.html.
This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration.
Fixes: #1239