-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROPOSAL] Pydantic Kitfile
class
#40
base: main
Are you sure you want to change the base?
Conversation
@Burhan-Q Thanks for the PR. I had a similar thought not to long ago on moving towards Pydantic. I don't have any strong reservations here. I will have a look at this more in depth sometime in the coming days. My only callout is for the future, it may make sense to draft a quick issue over a PR for changes like these. I would hate for you or others in the community to invest significant time in a feature that not be merged. |
@jfuss I agree, an issue would have probably been the right way to go before starting work. I would love to see this work merged, but even if it weren't, I find (personal) value in doing it regardless, which is why I just started with a PR. |
I think this would be a useful addition, even for general "I want to parse a Kitfile in my python code" cases (i.e. outside of doing something with KitOps). I'm mostly on the Go side though, so I'll leave it for Nida and Jacob to review for appoval. One potential concern is maintaining sync between the classes here and the original definitions in KitOps, but this is becoming less of a concern as we've mostly stabilized the Kitfile format (the intention is that manifestVersion is used for this, but I'll admit to playing a little fast-and-loose with it so far). |
@amisevsk Is there anything else that gives you pause from the Kitfile POV? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the Models, path
s are the only required fields. Everything else needs to be marked as Optional.
kitops/modelkit/pydantic_kit.py
Outdated
raise FileNotFoundError(f"Path '{self.path}' not found.") | ||
if Path(self.path).is_absolute(): | ||
try: | ||
self.path = Path(self.path).relative_to(Path.cwd()).as_posix() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should be doing this relative_to
the cwd
. I could be running a script from a directory that is not the context for the commands. Could be missing something though.
My off the cuff thought here is, we should leave that validation to the kitops
cli we shell out too for pack and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a script is called from another directory, Path.cwd()
will return the directory where the shell is calling the script, not where the script is located. This validation attempt to update to a relative path if given an absolute path instead.
If all absolute paths are provided, then relative paths to the current working directory will be used, if it's a mixture of absolute and relative paths, with the relative paths being relative to another directory, then that would be an issue.
I presumed that a user would be most likely to provide either all relative or all absolute paths, not a mixture. That said, I would guess that at a higher-level there could be a check to verify all paths provided are relative to the same source path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation attempt to update to a relative path if given an absolute path instead.
We cannot do this as a library. The paths need to be relative to the context of the command not where a python script is running. From the docs on kit pack
:
"Unless a different location is specified, this command looks for the kitfile at the root of the provided context directory. Any relative paths defined within the kitfile are interpreted as being relative to this context directory."
We cannot be assume and mutate paths within a kitfile. We can validate they are files/dir, they exist and go as far as validating that can have a common context but we cannot assume what the context is. We shouldn't assume this is just as a script either. These are all assumptions that limit the use of a library.
My recommendation is to just is to just treat paths as strs
and let the kit cli validation further, since the context is passed into the commands. I know I pushed you to move toward FilePath
from str
, so sorry about the churn. But I believe this validation hinders the library more than it helps right now.
kitops/modelkit/kitfile.py
Outdated
docs (Optional[list[DocsEntry | dict]], optional): Included documentation for the model. Defaults to None. | ||
model (Optional[ModelSection | dict], optional): Details of the models included. Defaults to None. | ||
""" | ||
super().__init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we calling __init__
from a build
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't call super().__init__()
from the subclass init
method. It can only be called after the file is loaded. If constructing the Kitfile
object manually, the parent class init
method needs to be called. The name build
was arbitrary and can be changed to anything that fits better.
The aim was to maintain the inputs as-is for the Kitfile
class, but also provide the flexibility to construct the class after init
using this method instead of property assignments. If using property assignments is critical to retain, then there might be a lot more effort to make that work without losing the data model assurance Pydantic provides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't call super().init() from the subclass init method.
This doesn't make sense to me. You can and typically is good practice too. Do you have docs on this you can link to? Could be missing something. Maybe some constraint from pydantic?
Why this generally doesn't make sense to me
To do this, you have to instantiate the class through Kitfile(**kwargs)
. So by the time you are calling build
, you are calling an instance of the class but build
calls super().__init__()
again. Just feels really weird and I don't see the benefit in doing this over
kit = Kitfile(**kwargs)
...
kit.package = some_new_or_mutated_package
...
Can you give me a concrete example of usage? That might help me understand what you are trying to target for a UX point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing code for init
:
pykitops/kitops/modelkit/kitfile.py
Lines 37 to 45 in f311302
class Kitfile: | |
""" | |
Kitfile class to manage KitOps ModelKits and Kitfiles. | |
Attributes: | |
path (str): Path to the Kitfile. | |
""" | |
def __init__(self, path: str | None = None) -> None: |
does not accept **kwargs
and instead relies on using @property.setter
methods to populate the attributes. I did not change the design of the init
method, only mirrored what existed. Adding **kwargs
(or even better, the accepted key word arguments) would certainly remove some of the need for this, but since the init
method currently only accepts the path
key word, when using a Pydantic model with any required fields, means that calling super().__init__()
should be deferred.
The current design doesn't follow any established patterns, or at least none I'm aware of, it was put together this way to ensure that no errors were thrown when trying to initialize the KitFile
class and to mirror the existing design as much as possible. To be honest, it was a choice to make it work sufficiently well so that I felt comfortable opening the PR (since it's the best way to get feedback). I'll admit, it's an odd design but from my standpoint it was better to avoid changing as many interfaces as possible so that most of the changes were "invisible" to anyone else.
That said, it can likely be changed. Given the updates around what fields are optional, it's likely that there is room to change the behavior here. When I have some time soon, I'll reevaluate this design and review the required fields, as I have some ideas on how to improve the design overall.
raise ValueError( | ||
f"Kitfile must be a dictionary with allowed keys: {', '.join(self._kitfile_allowed_keys)}" | ||
) from e | ||
if any(ALLOWED_KEYS.difference(data.keys())): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause issues if we add a new field right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is only a reflection of the existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me ponder this. I am not sure we care if we are making the classes now.
from pydantic import BaseModel, Field, model_validator | ||
|
||
|
||
class BasePathModel(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all the models I think we need to add extras=allowed
. So we have some forward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a maintainer of the repo, so it's not my call to make, but given the steps taken for validation and existing logic to ensure only permitted keys were in use, it seems to me it would be preferable to add anything new explicitly. If that's not done, then documentation and tests might be lacking, but it also might not be clear to users what is or isn't valid.
I have no issue adding it, but from the outside it doesn't seem like it aligns with the design.
@jfuss thanks for the feedback. I've incorporated most of the changes and marked those comments as resolved. I've replied to others to get your feedback. I did have one additional question that will help me understand better what to do next here regarding:
When I looked thru the docs https://kitops.org/docs/kitfile/format/ it wasn't clear which fields were required or not. So I want to make it clear for myself, it's not necessary for The |
@amisevsk Can keep me honest but the only required fields are any You can also look at the Go code for the kitfile: https://github.com/jozu-ai/kitops/blob/main/pkg/artifact/kitfile.go. Anything with |
Just asked GPT real quick, looks like the required attributes are GPT responseThe required attributes for the objects defined in KitFile
Package
Docs
Model
ModelPart
Code
DataSet
LayerInfo
In summary, the only required attributes are:
All other attributes are optional. |
I'll hold off until I hear from a trusted source (more than GPT). Might not get to it for another day or two anyhow. The feedback is appreciated |
You may have found a bug :) The following is from kitfile overview.
|
Sorry for the delay here! I'm realizing the Go file is not a great source for this information Required fields are:
The Kitfile Go definition has mostly everything as omitempty, which is maybe a mistake; I think additional validation captures this requirement but is harder to notice in the file (this is largely me working around Go serialization weirdness). Additionally, by "required", we mean "present and not the empty string", since Go interprets empty strings as "not present". Worth also noting, the Everything else is optional, and technically the file manifestVersion: v1.0.0 is a valid Kitfile, though it won't do anything for you :) Another way to think about it: within the Kitfile, all we're doing is applying semantic categories to local files/directories on disk. To do this, we require a path (else how would we refer to a local file/directory), and don't technically require anything else (though |
kitops/modelkit/pydantic_kit.py
Outdated
class BasePathModel(BaseModel): | ||
"""Base class for validating paths.""" | ||
|
||
path: FilePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some paths can be directories. Should this encapsulate that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added DirectoryPath
kitops/modelkit/kitfile.py
Outdated
""" | ||
Serialize the Kitfile to YAML format. | ||
|
||
Args: | ||
suppress_empty_values (bool, optional): Whether to suppress | ||
no_empty_values (bool, optional): Whether to suppress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break anyone doing passing suppress_empty_values
to to_yaml()
. Since this is just renaming, I would suggest to revert and go back to suppress_empty_values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
kitops/modelkit/kitfile.py
Outdated
"To print Kitfile use to_yaml() and your favorite way to log/print; date=2025-02-21", | ||
DeprecationWarning, | ||
stacklevel=2, | ||
) | ||
print("\n\nKitfile Contents...") | ||
print("===================\n") | ||
output = self.to_yaml() | ||
output: str = self.to_yaml() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If mypy has a hard time with it, let's update it. Otherwise, let's let mypy interpret the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone!
Updated
from kitops.modelkit.kitfile import Kitfile
kf = Kitfile()
kf
>>> Kitfile(
manifestVersion='1.0.0',
package=Package(
name='',
version='',
description='',
authors=[]
),
code=[],
datasets=[],
docs=[],
model=None
) # NOTE: wrapped for readability
pkg = Package(name="Burhan", version="0.6.9")
ce = CodeEntry(path="kitops/modelkit/pydantic_kit.py")
>>> Kitfile(
manifestVersion='1.0.0',
package=Package(
name='Burhan',
version='0.6.9',
description='',
authors=[]
),
code=[
CodeEntry(
path='kitops/modelkit/pydantic_kit.py',
description='',
license=''
)
],
datasets=[],
docs=[],
model=None
) # NOTE: wrapped for readability |
Summary
Instead of using custom validators, proposed changes incorporates Pydantic BaseModel and validation. This PR is opened as a proposal, as it's understood there could be reasons beyond my awareness as to why not to make such a change. The aim of these changes is to reduce the amount of code needed for validation, while maintain robust validation and rule checking by implementing the Pydantic library.
What changes
Kitfile
which can be initialized in code directly or by usingdict
build()
method for Kitfile definition viadict
or Pydantic ModelsAdvantages
Kitfile
with validation included.BaseModel
class.Kitfile
, minimum author count, relative paths).Kitfile
components, possible to use for independent validation.manifestVersion
into strings.Kitfile.model_json_schema()
Data Model Schema
Updated
Kitfile
UsageFrom file
From dictionary
From Objects
Additional Notes
manifestVersion
follows a specific format (if needed).Literal
strings orEnum
forlicense
to ensure valid licenses are used.semver
frompydantic-extra-types
.