Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNM: Retool how validators register and are called #89

Open
wants to merge 33 commits into
base: validation-gen
Choose a base branch
from

Conversation

thockin
Copy link
Collaborator

@thockin thockin commented Jan 4, 2025

DNM (yet): this builds on the function-style tag args PR.

Looking at how to not fail silently when tags are used wrong. I started adding more lint rules for things like "you can't use +optional on a type". It struck me that this is something that should be handled more first-class, rather than ex post facto. We will inevitably forget to add the lint rules for some tags.

We have the tag docs struct which includes a "contexts" list, which indicates when a tag is usable. I tried automating that check as a lint rule and it's not ideal as-is. It also highlighted that most of our tag docs are just WRONG (cargo culted) and that the context definitions we have are not really great.

Examples:

  • optional/required should only be on types
  • eachVal/eachKey claim they can be used on list values (they can't today)
  • listMapKey claims it can be used on list values (I don't think that makes sense?)
  • subfield claims it can be used on list values (I don't think it can)
  • unionDiscriminator/unionMember claim they can be used on list values (I don't think that makes sense?)

Lastly, each ExtractValidations() call doesn't have enough information to DTRT even if we fixed the above.

This PR started with expanding the contexts to distinguish struct-fields from list/map keys/values. It evolved into a different way of approaching tags.

Rather than each plugin's ExtractValidations() being responsible for checking that it is used in an appropriate context (and getting new args to indicate the extra information), I tried to do something a little more "scaffolding" centric. It changes the init sequence to register validators which include the tag string and contexts (basically the docs struct, but more fine-grained), but also having generic code look up the tags, ensure correct context, and then call a method to produce the specific FunctionGen objects.

Basically, centralize the tag-processing. Now generation will fail with errors like:

failed to generate validations: k8s.io/code-generator/cmd/validation-gen/output_tests/optional.T1: tag "k8s:optional" cannot be specified on type definitions

or

failed to generate validations: field k8s.io/code-generator/cmd/validation-gen/output_tests/options.T1.S2: tag "k8s:ifOptionDisabled": takes exactly 1 argument

...but the optional plugin didn't have to do the work. Some plugins still need to do work, like if they only apply to strings, but that seems OK to me.

This PR includes all my commits, even when later commits undo or change part of it, so readers can see the thought process. In particular, this centralized tag handling is insufficient for things like union which needs to run on the TYPE, but the tags are on FIELDs.

Now there are "tag validators" and "type validators" (which run AFTER child fields). Union has tag handlers that are used to validate context (e.g. "+union only applies to struct fields") and accumulate info, and a type handler which uses the accumulated info.

It may be possible to convert things like eachVal to "regular validators.

I wanted to move registry to a different package, but to avoid a circular dep, it has to become 3 packages, and it didn't seem worthwhile.

@thockin thockin changed the title Retool how validators register and are called DNM: Retool how validators register and are called Jan 4, 2025
@thockin thockin force-pushed the validation-gen_tag-descriptors branch from 5165747 to ca520e4 Compare January 6, 2025 01:02
@thockin
Copy link
Collaborator Author

thockin commented Jan 6, 2025

For giggles I added some WIP commits to make +eachVal a "regular" tag. To do that, I had to do listMapKey and listType, too.

This probably could have been simpler, and list-map could be treated "specially" (with info passed to validators), but I wanted to see if I could do it functional-style.

It sorta works. There are some issues around pointerness, still, and it's not as understandable as I'd like, but I think you can get the idea. Basically there's a family of list-oriented validators which work together to collect info about keys and then extract old values when we can.

The big remaining problem is that tags are discovered in sort-order, which means that +eachVal is run before +listMapKey. I see two ways to solve it:

  1. internally -- collect a list of funcs to call from the new FieldValidator, which runs after all tags.

  2. let tags declare a sort order for discover (they already do for execution). E.g. "schematic" vs "content" (words TBD).

If we like the approach, we can see how subfield would work, too.

@thockin thockin force-pushed the validation-gen_tag-descriptors branch from ca520e4 to 4e424a7 Compare January 6, 2025 17:19
@thockin
Copy link
Collaborator Author

thockin commented Jan 6, 2025

For more giggles, I made +k8s:subfield(ls)=+k8s:eachVal2=+k8s:format=ip-sloppy work - that's running on each value of a subfield of list type.

I can't yet reverse that, until I convert subfield to a regular tag.

thockin added 22 commits January 6, 2025 16:18
This defines a parser for tags which takes a single Go-style identifier
argument (but leaves room for better parsing).
Instead of registering a factory-function and expecting the resulting
type to extract tags itself, we now register a "tag descriptor" which
has enough information to centrally check some contextual requirements
(e.g. where can this tag be used) and then call the plugins to process
the extracted value.  This avoid all the plugins having to do validation
for context, though they may still have to do some handling, e.g. if
they care about lists vs scalars.

This commit leaves the tree in a working state.  It only does the
`optional` tag, and subsequent commits will do more.  At the end, the
legacy code will be removed.
This required handling embedded tags, since those are used all over the
place.  This commit also simplifies TagContext.
Also, since tag descriptors cover one tag, the new Docs() method only
needs to return one value.
Also split tag-descriptor code to a new file
TypeValidators will be called on every type and should be comparatively
rare.  This is needed to make the union validation work.  Specifically:

1) TagValidators are triggered to accumulate details about which fields
   are in which unions, as well as about discriminators. Emit no
   validations.
2) TypeValidator emits the final validation.

It's a bit more spread out than before but it allows the centralization
of tag handling.

More renames will follow, to simplify the UX.
This was exposed by a bug where I ended up clobbering rather than
appending.  No reason I can see not to do it this way always.
thockin added 11 commits January 6, 2025 16:26
I thought splitting them was cleaner, but it's not.
It makes more sense to sort here, rather than later (when emitting)
because:

Consider a type or field with the following comments:

```
// +k8s:validateFalse="111"
// +k8s:validateFalse="222"
// +k8s:ifOptionEnabled(Foo)=+k8s:validateFalse="333"
```

Tag extraction will retain the relative order between 111 and 222, but
333 is extracted as tag "k8s:ifOptionEnabled".  We iterate that map
(in a random order).  When it reaches the emit stage, the
"ifOptionEnabled" part is gone, and we have 3 FunctionGen objects, all
with tag "k8s:validateFalse".  They are in a non-deterministic order
because of that map iteration.  If we sort them now, we don't have
enough information to do something smart, unless we look at the args,
which are sort of opaque to us.

Sorting it earlier means we can sort "k8s:ifOptionEnabled" against
"k8s:validateFalse".  All of the records within each of those is
relatively ordered, so the result here would be to put "ifOptionEnabled"
before "validateFalse" (lexicographical is better than random).
...to enable eachKey and listMap tags, hopefully
This requires doing listMap at the same time.

It seems to build for a few test cases, but lots of debris to fix,
still.
@thockin thockin force-pushed the validation-gen_tag-descriptors branch from 4e424a7 to 2d63642 Compare January 7, 2025 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants