
dynamic host volumes: ensure unique name per node #24748

Merged
gulducat merged 5 commits into main from dhv-serialize-ops-by-name, Jan 6, 2025

Conversation

gulducat (Member)

Volume name must be unique per node.

For existing static host vols, the scheduler prevents creation of duplicate names ("no node meets constraints"), and it similarly prevents duplicate dynamic vols after fingerprint (node re-registration), but we also need to prevent concurrent duplicate creation that may happen within a fingerprint interval.

Here a lock is set for name+id during Create (and restore), then released on Delete, to prevent such very-fast or concurrent creates.
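A minimal sketch of that lifecycle (method and type shapes here are illustrative, not necessarily the exact Nomad code):

func (m *HostVolumeManager) Create(vol *structs.HostVolume) error {
	// reserve the volume name for this ID; fails if a different ID holds it
	if err := m.locker.lock(vol.Name, vol.ID); err != nil {
		return err
	}
	// ... run the plugin's Create with the spec ...
	return nil
}

func (m *HostVolumeManager) Delete(vol *structs.HostVolume) error {
	// ... run the plugin's Delete ...
	m.locker.release(vol.Name) // free the name for reuse
	return nil
}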

If a volume spec is submitted again with a valid id set, it can still update the volume, and the plugin's Create will be run with the new spec. I'm guessing folks may want to do this either A) because the plugin has been updated and they want to regenerate their vols in place, or B) to increase the vol size (in CSI parlance, "expand"). I'm not sure of any other reasons, but I wanted to leave this door open.

A last funny caveat: if a static host volume is added to the agent config after a dynamic one with the same name has been created, and the agent is restarted to pick it up, the dynamic one takes precedence. A warning is logged to inform the user when this happens.
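For a sense of how that precedence check might look, here is an assumed sketch built around the UpdateVolumeMap signature visible in the diff later in this thread; treating a non-empty ID as the marker of a dynamic volume is my assumption, not confirmed by this PR:

func UpdateVolumeMap(log hclog.Logger, volumes map[string]*structs.ClientHostVolumeConfig,
	name string, vol *structs.ClientHostVolumeConfig) (changed bool) {
	current, exists := volumes[name]
	// assumption: dynamic volumes carry an ID; static config entries do not
	if exists && current.ID != "" && vol != nil && vol.ID == "" {
		log.Warn("static host volume ignored; a dynamic volume has the same name", "name", name)
		return false
	}
	if !exists || current != vol {
		volumes[name] = vol
		return true
	}
	return false
}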

tgross previously approved these changes Jan 6, 2025
@tgross (Member) left a comment:

LGTM!

func (l *nameLocker) lock(name, id string) error {
	currentID, exists := l.locks.LoadOrStore(name, id)
	if exists && id != currentID.(string) {
		return fmt.Errorf("%w: name=%q id=%q", ErrVolumeNameExists, name, id)
	}
	return nil
}
tgross (Member) commented on this code:
The caller has the name and ID and already logs, so we should probably return only ErrVolumeNameExists here and then have the caller log the error rather than formatting this by hand and having the formatting mismatch for JSON loggers.
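The suggested shape would be something like this (a sketch of the suggestion, with a hypothetical caller):

func (l *nameLocker) lock(name, id string) error {
	currentID, exists := l.locks.LoadOrStore(name, id)
	if exists && id != currentID.(string) {
		// return the bare sentinel; the caller already has name and id
		return ErrVolumeNameExists
	}
	return nil
}

// hypothetical caller: structured fields keep JSON logs consistent
if err := locker.lock(vol.Name, vol.ID); err != nil {
	logger.Error("failed to reserve volume name", "name", vol.Name, "volume_id", vol.ID, "error", err)
	return err
}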

Base automatically changed from dhv-misc-daniel-tidying to main January 6, 2025 19:20
@gulducat dismissed tgross's stale review January 6, 2025 19:20

The base branch was changed.

Commit messages from the force-push:

users may modify an existing volume (probably to change the size? why else?) by setting an `id` in the volume spec and re-issuing create

when a static host volume is configured with the same name as a dynamic one, the static vol is discarded, dynamic takes precedence, and we log to warn the user
@gulducat force-pushed the dhv-serialize-ops-by-name branch from e7a7f9f to d12a57b, January 6, 2025 19:22
@gulducat added this to the 1.10.0 milestone Jan 6, 2025
@gulducat requested a review from tgross January 6, 2025 20:28
@gulducat marked this pull request as ready for review January 6, 2025 20:28
@gulducat requested review from a team as code owners January 6, 2025 20:28
@tgross (Member) left a comment:

LGTM! I feel like the sync.Map is almost never what you want to use but in this case it was the perfect choice. :shipit:
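For context, the release side pairs with the LoadOrStore shown earlier; a minimal sketch, assuming locks is a sync.Map:

// release frees the name once the volume is deleted. sync.Map fits here
// because LoadOrStore is atomic (no separate mutex for the check-then-set)
// and keys are mostly disjoint: each volume name is touched by few goroutines.
func (l *nameLocker) release(name string) {
	l.locks.Delete(name)
}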

@@ -50,7 +51,8 @@ SEND_BATCH:
 	// host volume updates
 	var hostVolChanged bool
 	c.batchNodeUpdates.batchHostVolumeUpdates(func(name string, vol *structs.ClientHostVolumeConfig) {
-		hostVolChanged = hvm.UpdateVolumeMap(newConfig.Node.HostVolumes, name, vol)
+		hostVolChanged = hvm.UpdateVolumeMap(c.logger.Named("node_updater").With("method", "batchFirstFingerprint"),
+			newConfig.Node.HostVolumes, name, vol)
tgross (Member) commented:
Non-blocking: I get where you're going with having the method field here, but I do wonder if the terminology is weird for a cluster administrator looking through their logs. They probably don't care about the Go method name so much as what component is emitting the log line.

Also, as far as I can tell there's only ever one possible log line we can emit from the function we're injecting this logger into, and that can only happen on first fingerprint? That would make adding the method field less useful, I think?
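For illustration, the two naming approaches under discussion (both lines hypothetical):

// component-oriented, what an operator is more likely to grep for:
log := c.logger.Named("host_volume_manager")

// method-oriented, as in this PR:
log := c.logger.Named("node_updater").With("method", "batchFirstFingerprint")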

gulducat (Member, author) replied:

theoretically it should only happen in that one situation, yes, but UpdateVolumeMap is called from 3 spots, and I had such a heck of a time tracing the various call sites to construct this whole thing that I figured I'd just go extra specific with these. I could add a Trace in UpdateVolumeMap to shine light on some of the callbacky weirdness in this area of the codebase...

otherwise, while it should only happen one way presently, I tend to do a lot of guarding against potential future regressions -- it's not likely that someone will modify this at all (I'm sure it's perfect ;), much less cause breakage, but I'm a little paranoid about the future.

I thought of log_include_location, which is great, but doesn't help in this case because the log isn't emitted right here where it would matter to know about it. I could un-DRY this like the other similar ones, but overall I do like the consistency that UpdateVolumeMap provides.

🤷
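The Trace idea mentioned above might look something like this inside UpdateVolumeMap (hypothetical, not part of this PR):

// shine some light on the callback-y call path at trace level
log.Trace("updating host volume map", "name", name)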

tgross (Member) replied:

Aye, no worries. Let's go ahead and ship this.

@gulducat merged commit a9ee66a into main Jan 6, 2025
37 checks passed
@gulducat deleted the dhv-serialize-ops-by-name branch January 6, 2025 21:37