Emulate netlink Link Attributes #5

Open
vparames86 opened this issue Jul 9, 2019 · 12 comments
Labels
enhancement New feature or request

Comments

@vparames86

Currently we can't set link attributes such as queuing, which could be used to emulate link parameters like speed. Enabling these features would aid in creating a more realistic network lab emulation.

@networkop added the enhancement (New feature or request) label Jul 10, 2019
@Cerebus
Contributor

Cerebus commented Feb 22, 2022

I've been looking into this, and it seems to me the best way is to add shaping to koko, add shaping data to the Topology CRD, add shaping data to the returns from meshnetd, and then pass it to each invocation of koko.MakeVeth() and koko.MakeVxLan(). Lastly, the CNI spec says that ADD commands are used to modify networks, so we would need to update the qdiscs even when both local and peer exist (currently this is a no-op branch in meshnet.go).

The alternative is a separate controller daemonset watching Topology resources and filtering on status.src_ip that adds a netem qdisc to each interface on Topology CREATE/PUT events. That would be quicker to implement but seems to be a waste.
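
A hypothetical sketch of the per-link shaping data described above, as Go types that could be added to the Topology CRD and returned by meshnetd (the struct and field names are illustrative, not an existing meshnet or koko API):

```go
// LinkShaping is a hypothetical set of per-interface shaping knobs that each
// link in the Topology spec could carry and that meshnetd could hand back to
// the plugin before it calls koko.MakeVeth()/koko.MakeVxLan().
type LinkShaping struct {
	RateBps    uint64  `json:"rate_bps,omitempty"`    // egress rate limit, bytes per second
	BurstBytes uint64  `json:"burst_bytes,omitempty"` // burst size for the rate limiter
	DelayUs    uint32  `json:"delay_us,omitempty"`    // added latency, microseconds
	JitterUs   uint32  `json:"jitter_us,omitempty"`   // latency variation, microseconds
	LossPct    float32 `json:"loss_pct,omitempty"`    // random packet loss, percent
	CorruptPct float32 `json:"corrupt_pct,omitempty"` // random corruption, percent
	DupPct     float32 `json:"dup_pct,omitempty"`     // random duplication, percent
	ReorderPct float32 `json:"reorder_pct,omitempty"` // random reordering, percent
}
```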

@networkop
Owner

I think the first option makes sense. In fact, there's a standard CNI plugin now that implements bandwidth shaping, so it would be interesting to see if its code can be re-used as-is: https://github.com/containernetworking/plugins/blob/76307bf0f6929b39986f1c6f5e4b6abdb82b8815/plugins/meta/bandwidth/ifb_creator.go#L61

@Cerebus
Contributor

Cerebus commented Feb 23, 2022

I hadn't thought about a discrete chained plugin for this, but I like it.

The extant bandwidth plugin only applies rate limits to the interface named by the CNI_IFNAME environment parameter, so it won't work as-is, but it should be simple enough to write a new one. So we add per-iface shaping data (rate, burst, delay, loss, corruption, duplication, and reordering) to Topology, record it in meshnetd, and then fetch and apply it by looping over the prevResult output on ADD and CHECK commands.

(ETA) It's not yet clear to me how or when the runtime decides to invoke CHECK, however. If it can't be reliably triggered, then there's a problem. Experimenting with bandwidth and the runtimeConfig annotations, it looks like the limits are never modified after Pod creation.

(ETA again) Seems to happen eventually, but still poring over the CNI spec to see if it's defined or left up to the runtime. I deployed an iperf server and client and annotated an egress limit on the client after container start, and it didn't take effect right away. When I came back to it an hour later the limit was in effect.
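
For reference, a rough sketch of the chained-plugin shape discussed above, assuming the upstream CNI skel/types/version packages; the plugin name, pluginConf, and the commented-out applyShaping helper are hypothetical, and the shaping values would really be fetched from meshnetd / the Topology CRD:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/types"
	current "github.com/containernetworking/cni/pkg/types/100"
	"github.com/containernetworking/cni/pkg/version"
)

// pluginConf is the chained plugin's config; the real shaping data would be
// looked up per interface rather than embedded here.
type pluginConf struct {
	types.NetConf
}

func cmdAdd(args *skel.CmdArgs) error {
	conf := &pluginConf{}
	if err := json.Unmarshal(args.StdinData, conf); err != nil {
		return fmt.Errorf("failed to parse config: %v", err)
	}
	if conf.RawPrevResult == nil {
		return fmt.Errorf("must be chained after a plugin that returns a prevResult")
	}
	if err := version.ParsePrevResult(&conf.NetConf); err != nil {
		return err
	}
	prevResult, err := current.NewResultFromResult(conf.PrevResult)
	if err != nil {
		return err
	}
	// Loop over the interfaces created by earlier plugins in the chain and
	// shape the container-side ones (Sandbox is set for in-container ifaces).
	for _, iface := range prevResult.Interfaces {
		if iface.Sandbox == "" {
			continue
		}
		// applyShaping(iface.Sandbox, iface.Name, ...) // hypothetical helper
	}
	return types.PrintResult(prevResult, conf.CNIVersion)
}

// CHECK/DEL left as no-ops in this sketch.
func cmdCheck(args *skel.CmdArgs) error { return nil }
func cmdDel(args *skel.CmdArgs) error   { return nil }

func main() {
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "meshnet shaping sketch")
}
```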

@networkop
Owner

yep, it looks like some of their functions, e.g. func CreateEgressQdisc, are generic and not tied to a specific interface, so we should be able to pass any interface name to it (I think).

@Cerebus
Contributor

Cerebus commented Feb 23, 2022

Unfortunately that code only handles a TBF, so loss/corruption/delay/dupe/reorder will need new code.
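
For the netem side, a minimal sketch assuming the vishvananda/netlink library that koko already builds on; applyNetem and its parameters are hypothetical, not an existing koko or bandwidth-plugin helper:

```go
package main

import "github.com/vishvananda/netlink"

// applyNetem attaches (or replaces) a netem qdisc on the named interface with
// delay/jitter/loss/duplication/reordering/corruption parameters.
func applyNetem(ifName string, delayUs, jitterUs uint32, lossPct, dupPct, reorderPct, corruptPct float32) error {
	link, err := netlink.LinkByName(ifName)
	if err != nil {
		return err
	}
	qdisc := netlink.NewNetem(
		netlink.QdiscAttrs{
			LinkIndex: link.Attrs().Index,
			Handle:    netlink.MakeHandle(1, 0),
			Parent:    netlink.HANDLE_ROOT,
		},
		netlink.NetemQdiscAttrs{
			Latency:     delayUs,    // microseconds
			Jitter:      jitterUs,   // microseconds
			Loss:        lossPct,    // percent
			Duplicate:   dupPct,     // percent
			ReorderProb: reorderPct, // percent
			CorruptProb: corruptPct, // percent
		},
	)
	// Replace rather than add so that re-applying updated attributes is idempotent.
	return netlink.QdiscReplace(qdisc)
}
```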

@networkop
Owner

yep, but it's a good place to start (or a good template to copy?)

@Cerebus
Contributor

Cerebus commented Feb 23, 2022

OK, so dockershim's implementation of the bandwidth capabilities convention is just plain broken; it sets the rate properly but sets burst to MaxInt32. So depending on all kinds of variables, any given iperf run can spike to max throughput no matter what limit is set.

So the upshot: AFAICT kubelet only fires CNI plugins once. The CHECK command seems to be invoked ... nowhere that I can find. Even then, the spec says this needs to do nothing or return an error, and I haven't figured out what happens if kubelet gets an error.

So that's not looking good for a chained plugin, b/c the behavior I'm after needs to be completely dynamic; I want to change qdiscs at runtime, not just at boot time.

@networkop
Owner

in this case, what you mentioned above as option #2 (a daemonset list/watching Topology resources and changing qdiscs) seems like the only option. I'm not 100% sure how you'd be able to get a handle for a veth link from the root NS once it's been moved to a container.

what's your use case for doing this at runtime? the topology is still built at create time.
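
On the "handle from the root NS" question, a minimal sketch assuming the container netns path is known (meshnet already records it in status.net_ns), using the CNI ns helper and vishvananda/netlink; withContainerLink is a hypothetical helper:

```go
package main

import (
	"github.com/containernetworking/plugins/pkg/ns"
	"github.com/vishvananda/netlink"
)

// withContainerLink runs fn against a link that now lives inside the
// container's network namespace, from a process running in the root NS.
func withContainerLink(netNSPath, ifName string, fn func(netlink.Link) error) error {
	netNS, err := ns.GetNS(netNSPath)
	if err != nil {
		return err
	}
	defer netNS.Close()

	// Everything inside Do() executes with the container's netns active, so
	// LinkByName resolves the veth even though it was moved out of the root NS.
	return netNS.Do(func(_ ns.NetNS) error {
		link, err := netlink.LinkByName(ifName)
		if err != nil {
			return err
		}
		return fn(link)
	})
}
```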

@Cerebus
Contributor

Cerebus commented Feb 23, 2022

I'm building a general network emulation platform on k8s to support engineering studies at higher fidelity than discrete event sims (like OPNET) and with lower resource requirements than VM-based emulators (like GNS3), with the ability to do real software-in-the-loop (and eventually hardware-in-the-loop). That means I need interactivity (which comes for free on k8s) and scripted events. I'm also looking at hooking up with Chaos Toolkit as another driver.

(ETA) I'm already doing tc in a sidecar using the downward API, but it's dependent on inotify, which has resource limits, and I'm looking for a more efficient solution.

@networkop
Owner

networkop commented Feb 24, 2022

right, gotcha. yeah, sidecars could be a viable option. networkservicemesh.io works this way. In fact, this would even make it possible to change the topology at runtime. but then you won't need meshnet-cni at all.

@Cerebus
Contributor

Cerebus commented Mar 14, 2022

Back to looking at this feature req. now that I have a multinode cluster working again.

I'm going to take a stab at adding this behavior to meshnetd; it should be straightforward to register a watch on Topologies, filter on status.src_ip, and then apply a simple canned tc-netem qdisc to the named local_intf in status.net_ns.

I've done similar filters in controllers I've written in Python, so it should work; because meshnetd runs as a daemonset, there's no chance that instances will fight with one another. That will make the traffic shaping dynamic while not interfering with how the plugin sets things up.

A first version will just set a rate/delay/loss/corrupt/dupe/reorder discipline on the device root. It would be interesting to be able to do arbitrary disciplines, but maybe that's better left to the workload. The caveat is that any workload that mucks with the qdisc will have to attach to this qdisc as a parent, unless I can figure out some way to be non-interfering.
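
A rough sketch of that watch loop using the dynamic client; the Topology GVR and the status field names below are assumptions based on this thread, and the actual qdisc application (entering net_ns and installing netem, as in the earlier sketches) is omitted:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// Assumed GVR for the meshnet Topology CRD.
var topologyGVR = schema.GroupVersionResource{
	Group:    "networkop.co.uk",
	Version:  "v1beta1",
	Resource: "topologies",
}

// watchTopologies reacts to Topology changes that belong to this node.
func watchTopologies(ctx context.Context, client dynamic.Interface, nodeIP string) error {
	w, err := client.Resource(topologyGVR).Namespace(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		topo, ok := ev.Object.(*unstructured.Unstructured)
		if !ok {
			continue
		}
		srcIP, _, _ := unstructured.NestedString(topo.Object, "status", "src_ip")
		if srcIP != nodeIP {
			continue // filter: only links terminating on this node
		}
		netNS, _, _ := unstructured.NestedString(topo.Object, "status", "net_ns")
		// Enter netNS and apply the canned netem qdisc to the local interface
		// (see the earlier sketches); omitted here.
		_ = netNS
	}
	return nil
}
```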

@chrisy

chrisy commented Dec 1, 2023

@Cerebus I know this is an old thread, but it's still open. :) Were you able to make any progress on this feature?
