-
Notifications
You must be signed in to change notification settings - Fork 217
Add k8s name related formats #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
are non-standard format strings supposed to be namespaced or not? what are the implications for CRDs created using these format strings prior to this, allowing in data that ignores those format requirements, then auto-tightening validation when upgrading to a control plane with this support added? |
I have not found any mention of namespacing in the specs. xref https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-01#name-custom-format-attributes I am open to naming suggestions for these formats though. I had briefly considered something like "kubernetes-name" but couldn't think of anything that aided rather than harmed clarity on what the format actually is.
Yes, exactly. Today, supported formats is used strip out unsupported |
/assign @apelisse Would you be willing to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good thanks
Thanks, looks better to me :-) |
pkg/validation/strfmt/default.go
Outdated
@@ -124,6 +124,48 @@ func IsEmail(str string) bool { | |||
return e == nil && addr.Address != "" | |||
} | |||
|
|||
const dns1123LabelFmt string = "[a-z0-9]([-a-z0-9]*[a-z0-9])?" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're copying/pasting from apimachinery validation, are we inheriting these issues:
- This requires things to be lowercase, which is more strict than the RFC
kubernetes/kubernetes#79351 (comment)
- This tolerates individual labels exceeding 63 chars, which is not as strict as the RFC
Should we fix those issues before adopting those formats here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We intend to publish OpenAPI the in-tree types using these formats. So I think we're stuck inheriting the it all.
That said, we shouldn't using the RFC identifiers for formats that are not exactly compliant with the RFC. I'll propose some alternative names.
pkg/validation/strfmt/default.go
Outdated
dns1123subdomain := DNS1123Subdomain("") | ||
Default.Add("dns1123subdomain", &dns1123subdomain, IsDNS1123Subdomain) | ||
|
||
dns1123label := DNS1123Label("") | ||
Default.Add("dns1123label", &dns1123label, IsDNS1123Label) | ||
|
||
dns1035label := DNS1035Label("") | ||
Default.Add("dns1035label", &dns1035label, IsDNS1035Label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if these get added to the default set of recognized formats here, how do we roll them out in a controlled way in kubernetes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was held until CRD Ratcheting was promoted to beta in 1.30, we'll leverage that capability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update this code so we can manage the newly added formats separate from the pre-existing ones.
I'm not sure I understand how that makes existing persisted CR data safe. However long we take to roll out support for the format, at some point, would we start treating existing persisted data that didn't match the specified format as invalid? |
We could solve this together with the more general problem of CRD validation ratcheting. I.e. the case where a CRD author adds a format to a CRD, and there already persisted CR data which then immediately becomes invalid. The case we are discussing is similar except that instead of the CRD author adding a format, the CRD author already had a non-validating format in the CRD and then Kubernetes changed the format to be validating. Ratcheting rule would be: if any CR data fails validation but is unchanged from the persisted value, we ignore the validation failure and allow the update. EDIT: This is also what we need for rollback. Note that because any format is allowed on any CRD, this would allow for rollback to arbitrary old versions. |
I agree a solution to that would resolve the issue with this. Shouldn't we get that solution in place before adding this? This would be the first example I can think of where we made a change that would reinterpret existing persisted CRDs in ways that would invalidate existing data |
SGTM. I'll poke around and see who might be able to implement. |
/close |
@jpbetz: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Just updating progress on this front was made with CRD Ratcheting Alpha added in 1.28: https://kep.k8s.io/4008 Will need to wait for it to mature before using it for name formats, but in the interim we can use inline OpenAPI schemas to describe the regex/cel expressions used for most formats. |
/reopen CRD Ratcheting was promoted to beta in 1.30, freeing us up to resume progress on this. |
@jpbetz: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Now that we have CRD ratcheting, it would be good to give guidance around when expanded openapi formats are the preferred way to model a validation, and when a CEL library is. |
We are missing well documented guidance. Let's get that going. I like the idea of CEL "supplementing" OpenAPI formats only were clearly needed. Related- We've fallen behind on supporting formats. kubernetes/kubernetes#130639 describes where we haveCEL support for formats that SHOULD have an OpenAPI format. To help catch up, we've added emulation version support to formats: kubernetes/kubernetes#130783. Combined with ratcheting, this should give us all the machinery we need to add the formats we agree should be added. |
cc @thockin for the Declarative Validation aspect. We anticipate needing OpenAPI formats for the built-in formats that we validate with hand written code so that we can publish the declarative validation rules through OpenAPI. |
Is there an extension-mechanism for format names? Like The ultimate goal, I think is unification. CRDs can be written saying "this field is validated by format=x-k8s-dns-label" and we will do that server-side; Builtin types can express the same thing in a different way; apiserver publishes this in a way that an "enlightened" client can handle if they wanted to to do shift-left validation (e.g. as a git pre-commit). |
Unfortunately no. The RFC drafts that describe "format" don't provide an extension mechanism. https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2 proposes a core set of formats but the only wording for extensions is:
|
Should we make one up?
…On Fri, Mar 21, 2025 at 4:54 PM Joe Betz ***@***.***> wrote:
Is there an extension-mechanism for format names? Like x-something or
example.com/something?
Unfortunately no. The RFC drafts that describe "format" don't provide an
extension mechanism.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2
proposes a core set of formats but the only wording for extensions is:
Implementations MAY add custom format attributes. Save for agreement
between parties, schema authors SHALL NOT expect a peer
implementation to support this keyword and/or custom format
attributes.
—
Reply to this email directly, view it on GitHub
<#384 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKWAVAGQL2FXR3MJJI2GAL2VSREDAVCNFSM6AAAAABZOKHFWWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONBUGY4TKNRVHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
[image: jpbetz]*jpbetz* left a comment (kubernetes/kube-openapi#384)
<#384 (comment)>
Is there an extension-mechanism for format names? Like x-something or
example.com/something?
Unfortunately no. The RFC drafts that describe "format" don't provide an
extension mechanism.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-7.2
proposes a core set of formats but the only wording for extensions is:
Implementations MAY add custom format attributes. Save for agreement
between parties, schema authors SHALL NOT expect a peer
implementation to support this keyword and/or custom format
attributes.
—
Reply to this email directly, view it on GitHub
<#384 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKWAVAGQL2FXR3MJJI2GAL2VSREDAVCNFSM6AAAAABZOKHFWWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONBUGY4TKNRVHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Yes, that's probably best. I poked around the set of formats listed in the draft RFC is being kept to a very small list and doesn't look to be accepting almost any new formats (semver was proposed and rejected). The naming convention of formats is not specified, but the core set use I'd prefer something not terribly long like |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jpbetz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Another option might be label-style? "k8s.io/dns-label"? As long as we pick something unique, we can always adapt later. The biggest concern I have is violating some documented or undocumented assumptions, like "must be alphanumeric". I defer to your wisdom here |
e2ab081
to
4e46223
Compare
This PR has been refreshed and is ready for review. I've gone with "k8s.io/" as the prefix after consulting a few LLMs about widely used conventions . The main feedback was that standard bodies are steering away from "x-" prefixes, and that the charset in this prefix is within norms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple thoughts on names. Over time we have used 3 different DNS formats. But what we use is not really DNS compatible, it's DNS-ish.
-
Should we continue with the "DNS" nomenclature? This looks like a significant move towards publishing this terminology so this may be our last chance to name the formats better. Or it might be too late already.
-
Should we nominate one of the 3 as the "normal" format (e.g.
k8s.io/dns-label
) and let the others be wierd?
Since DNS 1123 subdomain is the type we use most often for resource name validation, how about a format name that emphasizes that? "k8s.io/basic-name" or something? |
How about:
|
I am terrible at naming things, but a few quick thoughts:
Then the name could just be "dns-label" or "dns-style-label" or something? Would love alternatives to "qualified name" too :) |
Maybe we just stick with subdomain then? We could consider "dns-full-name" or something. But if we're avoiding "qualified", is "full name" any better?
+1
Dropping 1035 from the format list SGTM.
Curious what value to the user using the term "DNS" adds? That aside, how about:
? |
My current concerns:
|
I agree that invoking DNS is a bit confusing, especially if we invoke a particular detailed RFC which we then don't quite implement correctly (waves vaguely at all the issues complaining our 1123 validation... isn't) I also agree that "label" is likely to confuse people in the context of Kubernetes. Perhaps "segment" would be clearer? Taking a step back, I'm a little fuzzy on which of these goals we're trying to accomplish:
Example bugs:
Example inconsistencies / variants:
|
Jordan, I think this was aiming at:
If we can find a better name to use in this that doesn't reference DNS and doesn't abuse the word "label" (which I never really recognized before but now cannot unsee), then that's great. If we don't need to handle Service, perhaps we can use 1.34 to alpha-gate service name ratcheting to the 1123 -ish spec. 1.34 would tolerate services with 1123-style names, but not allow new ones. 1.35 would allow new usage. Unless we can lean on compat-version to save the alpha-release? @danwinship @aojea Then we only have to name one thing instead of two. "kubernetes short name" vs "... long name" ? |
I've updated this to use |
I think either way, we should probably ignore Service for purposes of this PR; either we manage to fix Service and it becomes just a |
I asked OAI/learn.openapis.org#128 to see if there is a real norm here. I like the k8s-short-name + k8s-long-name ? k8s-shortname + k8s-longname ? k8s-name + k8s-long-name ? k8s-shortname + k8s-name |
We've gone full circle. This is what I'd proposed originally. |
We have two open questions in this thread: 1) Overall notationAlmost full circle - after looking at all the openapi and JSON schema got repos and specs I figured:
As such, I think the lowest risk and most non-contentious path is to declare a prefix (eg. I think this should extend into declarative validation, and all our custom formats should be "k8s-something" Agree or discuss more? 2) Rename "DNS label" and "DNS subdomain"I can't come up with anything more semantically meaningful that "short" and "long". Maybe just more descriptive. We want to eschew the words "DNS" and "label" in the name. "object name" isn't right because we use them other places. Even "name" is questionable. The short form is similar to a hostname, but we should avoid that, too since it is not exact. When it comes to object names we use subdomain more than label (sadly) but for other fields we seem to use label more (totally haphazard estimate by line-counting calls to each func). So I am fine with "k8s-short-name" and "k8s-long-name" or maybe with dropping either "long" or "short" (making that one more default-ish). Or I am fine with "k8s-name" (or "k8s-short-name") and "k8s-dotted-name" or something similarly descriptive. You are closest to the consequences of this decision, Joe, so I defer. |
This adds the most heavily used name formats in k8s:
Note: These formats implemented in k8s in ways that do not exactly match the RFCs (kubernetes/kubernetes#71140,
kubernetes/kubernetes#79351). But, intend to use these formats to communicate the existing Kubernetes validation rules via OpenAPI, so we want to keep them as-is.
Benefits:
xref: https://json-schema.org/draft/2020-12/json-schema-validation#name-custom-format-attributes, https://spec.openapis.org/registry/format/