Skip to content

Add k8s name related formats #384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jpbetz
Copy link
Contributor

@jpbetz jpbetz commented Apr 12, 2023

This adds the most heavily used name formats in k8s:

  • k8s.io/short-name (represents the DNS 1123 label format)
  • k8s.io/long-name (represents the DNS 1123 subdomain format)

Note: These formats implemented in k8s in ways that do not exactly match the RFCs (kubernetes/kubernetes#71140,
kubernetes/kubernetes#79351). But, intend to use these formats to communicate the existing Kubernetes validation rules via OpenAPI, so we want to keep them as-is.

Benefits:

  • Easier for CRDs to use the same name format already used in k8s.
  • Useful for publication of OpenAPI from Declarative Validation

xref: https://json-schema.org/draft/2020-12/json-schema-validation#name-custom-format-attributes, https://spec.openapis.org/registry/format/

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 12, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 12, 2023
@liggitt
Copy link
Member

liggitt commented Apr 12, 2023

are non-standard format strings supposed to be namespaced or not?

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

@jpbetz
Copy link
Contributor Author

jpbetz commented Apr 12, 2023

are non-standard format strings supposed to be namespaced or not?

I have not found any mention of namespacing in the specs. xref https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-01#name-custom-format-attributes

I am open to naming suggestions for these formats though. I had briefly considered something like "kubernetes-name" but couldn't think of anything that aided rather than harmed clarity on what the format actually is.

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

Yes, exactly.

Today, supported formats is used strip out unsupported
formats when used in older version of the kube-apiserver. We could extend this to have
two lists: supported and storage. All new formats will be added to storage for
at least one minor release of Kubernetes before the format is added to supported so
that any usages of the new formats written to CRDs downgrade safely.

@jpbetz
Copy link
Contributor Author

jpbetz commented Apr 26, 2023

/assign @apelisse

Would you be willing to review?

Copy link
Member

@apelisse apelisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good thanks

@apelisse
Copy link
Member

Thanks, looks better to me :-)

@@ -124,6 +124,48 @@ func IsEmail(str string) bool {
return e == nil && addr.Address != ""
}

const dns1123LabelFmt string = "[a-z0-9]([-a-z0-9]*[a-z0-9])?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're copying/pasting from apimachinery validation, are we inheriting these issues:

kubernetes/kubernetes#71140

  • This requires things to be lowercase, which is more strict than the RFC

kubernetes/kubernetes#79351 (comment)

  • This tolerates individual labels exceeding 63 chars, which is not as strict as the RFC

Should we fix those issues before adopting those formats here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We intend to publish OpenAPI the in-tree types using these formats. So I think we're stuck inheriting the it all.

That said, we shouldn't using the RFC identifiers for formats that are not exactly compliant with the RFC. I'll propose some alternative names.

Comment on lines 251 to 258
dns1123subdomain := DNS1123Subdomain("")
Default.Add("dns1123subdomain", &dns1123subdomain, IsDNS1123Subdomain)

dns1123label := DNS1123Label("")
Default.Add("dns1123label", &dns1123label, IsDNS1123Label)

dns1035label := DNS1035Label("")
Default.Add("dns1035label", &dns1035label, IsDNS1035Label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these get added to the default set of recognized formats here, how do we roll them out in a controlled way in kubernetes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was held until CRD Ratcheting was promoted to beta in 1.30, we'll leverage that capability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this code so we can manage the newly added formats separate from the pre-existing ones.

@liggitt
Copy link
Member

liggitt commented Apr 27, 2023

what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added?

Yes, exactly.

Today, supported formats is used strip out unsupported formats when used in older version of the kube-apiserver. We could extend this to have two lists: supported and storage. All new formats will be added to storage for at least one minor release of Kubernetes before the format is added to supported so that any usages of the new formats written to CRDs downgrade safely.

I'm not sure I understand how that makes existing persisted CR data safe. However long we take to roll out support for the format, at some point, would we start treating existing persisted data that didn't match the specified format as invalid?

@jpbetz
Copy link
Contributor Author

jpbetz commented May 4, 2023

I'm not sure I understand how that makes existing persisted CR data safe. However long we take to roll out support for the format, at some point, would we start treating existing persisted data that didn't match the specified format as invalid?

We could solve this together with the more general problem of CRD validation ratcheting. I.e. the case where a CRD author adds a format to a CRD, and there already persisted CR data which then immediately becomes invalid. The case we are discussing is similar except that instead of the CRD author adding a format, the CRD author already had a non-validating format in the CRD and then Kubernetes changed the format to be validating.

Ratcheting rule would be: if any CR data fails validation but is unchanged from the persisted value, we ignore the validation failure and allow the update.

EDIT: This is also what we need for rollback. Note that because any format is allowed on any CRD, this would allow for rollback to arbitrary old versions.

@liggitt
Copy link
Member

liggitt commented May 9, 2023

We could solve this together with the more general problem of CRD validation ratcheting.

I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this? This would be the first example I can think of where we made a change that would reinterpret existing persisted CRDs in ways that would invalidate existing data

@jpbetz
Copy link
Contributor Author

jpbetz commented May 9, 2023

We could solve this together with the more general problem of CRD validation ratcheting.

I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this? This would be the first example I can think of where we made a change that would reinterpret existing persisted CRDs in ways that would invalidate existing data

SGTM. I'll poke around and see who might be able to implement.

@jpbetz
Copy link
Contributor Author

jpbetz commented Jul 21, 2023

/close
When we circle back on this we'll either reopen or create a new PR.

@k8s-ci-robot
Copy link
Contributor

@jpbetz: Closed this PR.

In response to this:

/close
When we circle back on this we'll either reopen or create a new PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alexzielenski
Copy link
Member

We could solve this together with the more general problem of CRD validation ratcheting.
I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this?

Just updating progress on this front was made with CRD Ratcheting Alpha added in 1.28: https://kep.k8s.io/4008

Will need to wait for it to mature before using it for name formats, but in the interim we can use inline OpenAPI schemas to describe the regex/cel expressions used for most formats.

@jpbetz
Copy link
Contributor Author

jpbetz commented Jun 28, 2024

/reopen

CRD Ratcheting was promoted to beta in 1.30, freeing us up to resume progress on this.

@k8s-ci-robot k8s-ci-robot reopened this Jun 28, 2024
@k8s-ci-robot
Copy link
Contributor

@jpbetz: Reopened this PR.

In response to this:

/reopen

CRD Ratcheting was promoted to beta in 1.30, freeing us up to resume progress on this.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@liggitt
Copy link
Member

liggitt commented Mar 20, 2025

Now that we have CRD ratcheting, it would be good to give guidance around when expanded openapi formats are the preferred way to model a validation, and when a CEL library is.

xref kubernetes/kubernetes#124490 (comment)

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 21, 2025

Now that we have CRD ratcheting, it would be good to give guidance around when expanded openapi formats are the preferred way to model a validation, and when a CEL library is.

xref kubernetes/kubernetes#124490 (comment)

We are missing well documented guidance. Let's get that going. I like the idea of CEL "supplementing" OpenAPI formats only were clearly needed.

Related- We've fallen behind on supporting formats. kubernetes/kubernetes#130639 describes where we haveCEL support for formats that SHOULD have an OpenAPI format.

To help catch up, we've added emulation version support to formats: kubernetes/kubernetes#130783. Combined with ratcheting, this should give us all the machinery we need to add the formats we agree should be added.

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 21, 2025

cc @thockin for the Declarative Validation aspect. We anticipate needing OpenAPI formats for the built-in formats that we validate with hand written code so that we can publish the declarative validation rules through OpenAPI.

@thockin
Copy link
Member

thockin commented Mar 21, 2025

Is there an extension-mechanism for format names? Like x-something or example.com/something?

The ultimate goal, I think is unification. CRDs can be written saying "this field is validated by format=x-k8s-dns-label" and we will do that server-side; Builtin types can express the same thing in a different way; apiserver publishes this in a way that an "enlightened" client can handle if they wanted to to do shift-left validation (e.g. as a git pre-commit).

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 21, 2025

Is there an extension-mechanism for format names? Like x-something or example.com/something?

Unfortunately no. The RFC drafts that describe "format" don't provide an extension mechanism. https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2 proposes a core set of formats but the only wording for extensions is:

Implementations MAY add custom format attributes.  Save for agreement
   between parties, schema authors SHALL NOT expect a peer
   implementation to support this keyword and/or custom format
   attributes.

@thockin
Copy link
Member

thockin commented Mar 24, 2025 via email

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 25, 2025

Should we make one up?

Yes, that's probably best.

I poked around the set of formats listed in the draft RFC is being kept to a very small list and doesn't look to be accepting almost any new formats (semver was proposed and rejected).

The naming convention of formats is not specified, but the core set use lower-case with heavy uses of acronyms (uuid, uri, ...)

I'd prefer something not terribly long like x-k8s- or just k8s-. But I can see an argument for x-kubernetes- because we use that for JSON Schema customizations already. The context is different here. We're defining string values not field names, but still..

@jpbetz jpbetz reopened this Mar 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jpbetz
Once this PR has been reviewed and has the lgtm label, please assign sttts for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thockin
Copy link
Member

thockin commented Mar 25, 2025

Another option might be label-style? "k8s.io/dns-label"? As long as we pick something unique, we can always adapt later. The biggest concern I have is violating some documented or undocumented assumptions, like "must be alphanumeric".

I defer to your wisdom here

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 25, 2025
@jpbetz jpbetz force-pushed the k8s-formats branch 2 times, most recently from e2ab081 to 4e46223 Compare March 25, 2025 15:00
@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 25, 2025

This PR has been refreshed and is ready for review.

I've gone with "k8s.io/" as the prefix after consulting a few LLMs about widely used conventions . The main feedback was that standard bodies are steering away from "x-" prefixes, and that the charset in this prefix is within norms.

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple thoughts on names. Over time we have used 3 different DNS formats. But what we use is not really DNS compatible, it's DNS-ish.

  1. Should we continue with the "DNS" nomenclature? This looks like a significant move towards publishing this terminology so this may be our last chance to name the formats better. Or it might be too late already.

  2. Should we nominate one of the 3 as the "normal" format (e.g. k8s.io/dns-label) and let the others be wierd?

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 25, 2025

A couple thoughts on names. Over time we have used 3 different DNS formats. But what we use is not really DNS compatible, it's DNS-ish.

  1. Should we continue with the "DNS" nomenclature? This looks like a significant move towards publishing this terminology so this may be our last chance to name the formats better. Or it might be too late already.
  2. Should we nominate one of the 3 as the "normal" format (e.g. k8s.io/dns-label) and let the others be wierd?

Since DNS 1123 subdomain is the type we use most often for resource name validation, how about a format name that emphasizes that? "k8s.io/basic-name" or something?

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 25, 2025

How about:

  • k8s.io/qualified-identifier (dns 1123 subdomain)
  • k8s.io/identifier (dns 1123 label)
  • k8s.io/strict-identifier (dns 1035 label)

@thockin
Copy link
Member

thockin commented Mar 28, 2025

I am terrible at naming things, but a few quick thoughts:

  1. DNS 1123 Subdomain and Label are most common, 1035 less so, and it looks like 952 has been weeded out. So 1123 is "obviously" the default.

  2. We already use the word "qualified" in "qualified name". I hate it there, too.

  3. We should avoid "simple" in names.

  4. AFAICT Service is the last and only resource which demands 1035. Maybe we can ratchet it to 1123? Can we do the same for other users of 1035?

Then the name could just be "dns-label" or "dns-style-label" or something? Would love alternatives to "qualified name" too :)

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 29, 2025

  1. We already use the word "qualified" in "qualified name". I hate it there, too.

Maybe we just stick with subdomain then? We could consider "dns-full-name" or something. But if we're avoiding "qualified", is "full name" any better?

  1. We should avoid "simple" in names.

+1

  1. AFAICT Service is the last and only resource which demands 1035. Maybe we can ratchet it to 1123? Can we do the same for other users of 1035?

Dropping 1035 from the format list SGTM.

Then the name could just be "dns-label" or "dns-style-label" or something? Would love alternatives to "qualified name" too :)

Curious what value to the user using the term "DNS" adds? That aside, how about:

  • k8s.io/dns-subdomain (dns 1123 subdomain)
  • k8s.io/dns-subdomain-prefix
  • k8s.io/dns-label (dns 1123 label)
  • k8s.io/dns-label-prefix

?

@jpbetz
Copy link
Contributor Author

jpbetz commented Mar 29, 2025

My current concerns:

  • Not sure what value the term "DNS" really adds to the users of these formats (they don't conform to DNS specs.. even though they ALMOST conform)
  • The user of the term "label" is pulling in a term from DNS feels like it could confuse anyone that is thinking in k8s terms and immediately starts thinking of k8s labels when they see it.

@liggitt
Copy link
Member

liggitt commented Mar 31, 2025

I agree that invoking DNS is a bit confusing, especially if we invoke a particular detailed RFC which we then don't quite implement correctly (waves vaguely at all the issues complaining our 1123 validation... isn't)

I also agree that "label" is likely to confuse people in the context of Kubernetes. Perhaps "segment" would be clearer?


Taking a step back, I'm a little fuzzy on which of these goals we're trying to accomplish:

  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) for use by CRDs
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can publish openapi for in-tree types that can be translated into CRDs that will use identical validation
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can switch in-tree types to use generated validation
  • create formats that are based on in-tree validations but fix bugs / inconsistencies in them
  • evolve / change (either tightening or relaxing) validation of in-tree types

Example bugs:

Example inconsistencies / variants:

  • allowing or disallowing underscores
  • allowing or disallowing mixed-case
  • allowing or disallowing leading digits in segments
  • allowing all-numeric segments

@thockin
Copy link
Member

thockin commented Apr 1, 2025

Jordan, I think this was aiming at:

  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can publish openapi for in-tree types that can be translated into CRDs that will use identical validation
  • create formats that exactly match all currently used in-tree validations (with their corresponding bugs / inconsistencies / variants) so we can switch in-tree types to use generated validation

If we can find a better name to use in this that doesn't reference DNS and doesn't abuse the word "label" (which I never really recognized before but now cannot unsee), then that's great.

If we don't need to handle Service, perhaps we can use 1.34 to alpha-gate service name ratcheting to the 1123 -ish spec. 1.34 would tolerate services with 1123-style names, but not allow new ones. 1.35 would allow new usage. Unless we can lean on compat-version to save the alpha-release? @danwinship @aojea

Then we only have to name one thing instead of two.

"kubernetes short name" vs "... long name" ?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 4, 2025
@jpbetz
Copy link
Contributor Author

jpbetz commented Apr 8, 2025

I've updated this to use k8s.io/short-name and k8s.io/long-name

@danwinship
Copy link

If we don't need to handle Service

I think either way, we should probably ignore Service for purposes of this PR; either we manage to fix Service and it becomes just a k8s.io/short-name, or we don't, and we can add k8s.io/service-name later.

@thockin
Copy link
Member

thockin commented Apr 16, 2025

I asked OAI/learn.openapis.org#128 to see if there is a real norm here. I like the k8s.io/something extension mechanism, but from reading a lot of the issues around this, I wonder if just k8s-something is more "regular"?

k8s-short-name + k8s-long-name ?

k8s-shortname + k8s-longname ?

k8s-name + k8s-long-name ?

k8s-shortname + k8s-name

@jpbetz
Copy link
Contributor Author

jpbetz commented Apr 16, 2025

I wonder if just k8s-something is more "regular"?

We've gone full circle. This is what I'd proposed originally.

@thockin
Copy link
Member

thockin commented Apr 20, 2025

We have two open questions in this thread:

1) Overall notation

Almost full circle - after looking at all the openapi and JSON schema got repos and specs I figured:

  1. Nothing in the spec prevents us from using k8s.io/something
  2. Nobody else seems to use any notation other than lower-case-with-dashes for format names
  3. It seems conventional to use a "sufficiently unique" prefix for locally defined format names

As such, I think the lowest risk and most non-contentious path is to declare a prefix (eg. k8s-) and use it on all of our custom format names. We should use k8s-something-name style, even though it is not the same syntax as label names.

I think this should extend into declarative validation, and all our custom formats should be "k8s-something"

Agree or discuss more?

2) Rename "DNS label" and "DNS subdomain"

I can't come up with anything more semantically meaningful that "short" and "long". Maybe just more descriptive. We want to eschew the words "DNS" and "label" in the name. "object name" isn't right because we use them other places. Even "name" is questionable. The short form is similar to a hostname, but we should avoid that, too since it is not exact.

When it comes to object names we use subdomain more than label (sadly) but for other fields we seem to use label more (totally haphazard estimate by line-counting calls to each func).

So I am fine with "k8s-short-name" and "k8s-long-name" or maybe with dropping either "long" or "short" (making that one more default-ish).

Or I am fine with "k8s-name" (or "k8s-short-name") and "k8s-dotted-name" or something similarly descriptive.

You are closest to the consequences of this decision, Joe, so I defer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants