Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

networking: refactor building nomad bridge config #23772

Merged
merged 1 commit into from
Aug 14, 2024

Conversation

martisah
Copy link
Contributor

@martisah martisah commented Aug 8, 2024

This PR refactors building the Nomad Bridge network configuration JSON, in order to allow for more manipulation of variables within the network configuration.

Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very promising! some food for thought for ya.

client/allocrunner/bridge_plugin_config.go Outdated Show resolved Hide resolved
client/allocrunner/networking_bridge_linux.go Outdated Show resolved Hide resolved
client/allocrunner/networking_bridge_linux.go Outdated Show resolved Hide resolved
client/allocrunner/networking_bridge_linux.go Outdated Show resolved Hide resolved
Comment on lines 213 to 214
if withConsulCNI {
CNIconfig.Plugins = append(CNIconfig.Plugins, ConsulCNIBlock{
Copy link
Member

@gulducat gulducat Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels so much better to me than Sprint-ing the json string

client/allocrunner/networking_bridge_linux.go Show resolved Hide resolved
@@ -0,0 +1,44 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to ensure the refactor is no-op, these files were generated by the old code (plus whitespace), and passed prior to the refactor.

@gulducat gulducat force-pushed the refactor-bridge-template branch from 420bd98 to c94cb74 Compare August 12, 2024 18:29
@gulducat gulducat marked this pull request as ready for review August 12, 2024 19:18
@gulducat gulducat changed the title bridge: refactor building Nomad bridge network config networking: refactor building nomad bridge config Aug 12, 2024
@gulducat gulducat requested review from tgross and jrasell August 12, 2024 19:19
@gulducat gulducat added theme/networking backport/ent/1.6.x+ent Changes are backported to 1.6.x+ent backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/1.8.x backport to 1.8.x release line labels Aug 12, 2024
@gulducat
Copy link
Member

Bit of added context for this refactor: Adding IPv6 support to the default Nomad bridge (#14101) will require a little more modification of the CNI conflist than I want to continue fmt.Sprint()ing in a flat string. This refactor will make it much easier to add the IPv6 stuff conditionally if the network device supports it (the bridge plugin errors if you ask it for ipv6 subnet(s) and the host can't do it).

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to ship. My comments are for discussion and should be considered non-blocking, but the package change feels like we're trying to carve out a CNI package when maybe what we want is a "networking" package?

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package cni
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird having a separate package called cni here without it including much of any of the CNI configuration. There's no code isolation happening because everything is exported (has to be because of serialization.) All CNI setup including all the configuration tests are still in the client/allocrunner package. It seems hard to disentangle the "bridge configurator" code from CNI as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the initial motivation was to namespace the structs -- I found myself naming them like FirewallCNIPlugin just because they were in the allocrunner package, and I thought if I made a small new package, then they could be named more simply and still make sense. I also just like small, focused packages in general.

but were I to spend more time on this, I probably would move more of the CNI stuff into here, but I figure this is an incremental improvement to be perhaps built on further, later.

regarding the "networking" package comment, a decent chunk of our networking config, especially on linux, is CNI, but I might still like to distinguish the things that are specifically satisfying CNI spec from other more-general networking concerns. i.e. if I really went ham refactoring this area of allocrunner, I might make a "networking" package, then still put a (relatively smaller) "cni" package within there.

but as you might imagine, I am trying not to boil the ocean! 😅

all that said, I could move the new golden-file tests into here, since they are a direct output of this package. I left them where they were because that was the hook into this refactor, but I did create them here in the first place...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also admit to being a little concerned about backports, since I've been intending to go back to 1.6.x for future maintainability, but e.g. consul-cni is 1.8.x only. backports won't be tooooo painful anyway, or I could forego them entirely, but at any rate, it was on my mind when deciding what all to move into this new package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but as you might imagine, I am trying not to boil the ocean!

Totally reasonable 👍

Comment on lines +12 to +31
func TestConflist_Json(t *testing.T) {
conf := &Conflist{
CniVersion: "0.0.1",
Name: "test-config",
Plugins: []any{
Generic{Type: "test-plugin"},
},
}
bts, err := conf.Json()
must.NoError(t, err)
must.Eq(t, `{
"cniVersion": "0.0.1",
"name": "test-config",
"plugins": [
{
"type": "test-plugin"
}
]
}`, string(bts))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this just test the stdlib?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it tests stdlib json, but the main target is the appropriate struct tags to match CNI spec. I was aiming for a minimal test, but this is probably better covered by the more holistic real-world bridge conflists.

client/allocrunner/cni/plugins.go Outdated Show resolved Hide resolved
Comment on lines 18 to 22
[
{
"subnet": ""
}
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test was called empty because it wasn't correctly populated as it will be in the caller, which is how you get this bogus range configuration. It might not be a bad idea to have an actually-default test that populates the minimum set of items we'd expect to see populated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gosh you're right, good catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had half a mind to add validation to protect against this, but just stuck with what was here instead to keep it a true no-op refactor. think I should reach a little further and check for invalid empty values like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep new invalidation logic out of this refactor PR.

@apollo13
Copy link
Contributor

If the networking config gets touched this massively, would it also be possible to allow users to override it -- I'd really love to use a custom cni configuration for every job without manually adding it to the job

@gulducat
Copy link
Member

Hi @apollo13! It would certainly be possible to do, and has been suggested before in #14480, but I agree with Luiz's closing comment there that we should be cautious with what the magic "bridge" network string represents. If it could mean anything, then our ability to support our own "bridge" network is undercut drastically, and it's important that we be able to rely on it when designing other features.

I understand that it means a lot of mode = "cni/my-cool-network" in job files, but at least when that is explicit, we (engineers in github issues, and our enterprise support team) know that we're troubleshooting a custom network setup.

@tgross
Copy link
Member

tgross commented Aug 13, 2024

I'd really love to use a custom cni configuration for every job without manually adding it to the job

Can you dig into what you might be looking for a bit more @apollo13? network.mode = "host" is the default network, so setting network.mode = "bridge" doesn't seem meaningfully different than setting network.mode = "cni/example". The clients need to have the CNI config file of course, but you'd have to configure a "default bridge" similarly too, I'd think?

@apollo13
Copy link
Contributor

Mhm, you are right @tgross. I forgot about host -- I always set bridge so in my mind this was the default. I guess part of the motiviation is wanting to run connect-enabled jobs with my custom bridge. Though that should be possible if we were to remove the current limitations and checks around mode="bridge" and also allow cni/ there. Not sure though how that would work with the modifications needed for the transparent proxy though.

On a sidenote, namespaces have enabled_task_drivers/disabled_task_drivers. Would you accept a PR adding enabled_network_modes/disable_network_modes?

@tgross
Copy link
Member

tgross commented Aug 14, 2024

On a sidenote, namespaces have enabled_task_drivers/disabled_task_drivers. Would you accept a PR adding enabled_network_modes/disable_network_modes?

I like where you're going with that. Full disclosure: we're sort of getting a lot of internal pressure to add make any kind of feature like Enterprise, but given that we've already got the nearly-identical task driver restrictions in CE I don't see why we couldn't accept that PR. Especially if it came from the community. 😁

@apollo13
Copy link
Contributor

Full disclosure: we're sort of getting a lot of internal pressure to add make any kind of feature like Enterprise

It is so sad that a company like HashiCorp cannot figure out a good balance where both sides benefit. But I guess the writing was on the wall with the IPO. Feels like HashiCorp (well not you Tim and your colleagues but rather upper management) really just wants to make nomad uninteresting. What I don't get though: If you make the community offering bad enough that noone uses it, who will then advocate for enterprise at their work etc? But what do I know, all I know is that once I turn to k8s because I can't get my stuff done with nomad anymore (using our own fork anyways) then all my (granted small ones and not so many currently) contributions to nomad will stop as well -- is that a winning situation for anyone? :/ (sorry for the rant, I am not trying to shit on anyone, quite the opposite, I care deeply about nomad)

I don't see why we couldn't accept that PR. Especially if it came from the community. 😁

Thank you, will see what I can put up and then mention you on the PR. I just wanted to ask first after the bad turnout of #10355.

@gulducat gulducat merged commit 73ce56b into main Aug 14, 2024
19 checks passed
@gulducat gulducat deleted the refactor-bridge-template branch August 14, 2024 17:43
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/ent/1.6.x+ent Changes are backported to 1.6.x+ent backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/1.8.x backport to 1.8.x release line theme/networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants