Pydantic validator #1121
Conversation
@cswartzvi would you mind creating an example? Otherwise, we just need to make sure the decorator turns up in our documentation.
Yeah, this is really cool what you did. Nice work! Really appreciate it!
None of those strike me as blockers or even problems. Vaex is just testing; if it works then I'm not worried. Happy to support `pydantic>=2.0` only. So yeah! Docs will be great, let us know if you need help getting them set up!
Also, I don't think the tests are getting run. Can dig in tomorrow (this is a bit confusing), but it looks like:
@elijahbenizzy sorry, I got overloaded with work the past few days. I am going to add the documentation either today or tomorrow - I will take a crack at it and let you know if I run into issues. Did you ever figure out why my tests were not running? If not, I can take a look into that as well.
Thank you! I have not yet -- if you don't mind that would be much appreciated, otherwise happy to carve out some time to figure out what's happening (lots of bash scripts...).
Force-pushed from 242ff16 to f888ef0
I was looking into why my tests didn't run when I pushed the initial set of commits: https://app.circleci.com/pipelines/github/DAGWorks-Inc/hamilton/4035/workflows/c7dbcc07-b6df-4fd1-9a38-a76e81b9cb53

I don't have a lot of experience with CircleCI, so please forgive me if I am missing something basic, but I noticed that the change-detection step runs `git diff --name-only HEAD^ HEAD | grep '^.ci\|^.circleci\|^graph_adapter_tests\|^hamilton\|^plugin_tests\|^tests\|^requirements\|setup' > /dev/null`. This command compares the current commit (`HEAD`) against only its immediate parent (`HEAD^`), so when a push contains more than one commit, files changed in the earlier commits are never seen by the filter. If my hunch is right, I see three solutions.

Thoughts? Still working on those docs BTW
I think the second one is probably the cleanest -- better to run more tests than needed than to undertest... I can make the change, or if you're in the mood to, go for it! These are tests we'll catch when merging to main (it should be smart enough to diff against main), but it's nice to catch them earlier. As an edge case, we can probably get the comparison branch instead.
@elijahbenizzy I would be happy to make the change. I will start with option 2 and then see how hard it is to incorporate that edge case.
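If it helps to see the failure mode concretely, here is a small self-contained sketch (a throwaway repo in a temp directory - the file names are invented, this is not the Hamilton repo) demonstrating that `git diff --name-only HEAD^ HEAD` only covers the final commit of a multi-commit push:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its stdout."""
    return subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    ).stdout

repo = tempfile.mkdtemp()
git("init", "-q", cwd=repo)
git("config", "user.email", "ci@example.com", cwd=repo)
git("config", "user.name", "CI", cwd=repo)

# Commit 1: touches tests/ -- the kind of change the grep filter looks for.
os.makedirs(os.path.join(repo, "tests"))
open(os.path.join(repo, "tests", "test_a.py"), "w").close()
git("add", ".", cwd=repo)
git("commit", "-q", "-m", "add tests", cwd=repo)

# Commit 2: touches only docs/.
os.makedirs(os.path.join(repo, "docs"))
open(os.path.join(repo, "docs", "readme.md"), "w").close()
git("add", ".", cwd=repo)
git("commit", "-q", "-m", "add docs", cwd=repo)

# Both commits arrive in one push, but HEAD^..HEAD only covers the last one,
# so tests/test_a.py never shows up in the diff and the test job is skipped.
changed = git("diff", "--name-only", "HEAD^", "HEAD", cwd=repo).split()
print(changed)  # ['docs/readme.md']
```

Diffing against the comparison branch instead (e.g. via `git merge-base main HEAD`) would surface `tests/test_a.py` in this scenario.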
Basically there, just a few notes about docs (which I know is a WIP). To get shipped:
- Add some more docstrings + add to the right docs reference (ping if you want help, but start with this slack comment)
- Rebase against the check_for_changes PR after we merge it to `main`, and ensure that it runs as intended on this PR
Note that vaexio/vaex#2384 has been resolved
Force-pushed from fa895af to 7d11a4c
@elijahbenizzy I rebased and added that missing docstring - looks like changes were detected correctly in CI. How deep do you think I should go with the documentation? I see three potential places it could be added:
Does that sound reasonable?
Sorry for the back and forth, this is ready! Only thing, re: docs: I think what you did is good enough out of those three. If you want to add an example it would obviously be appreciated but don't want to slow you down. The place that I'd add it is here: https://hamilton.dagworks.io/en/latest/reference/decorators/. This way we have a nice reference with the docstring. To do so you can:
If this is too much for you/you don't have time, let me know and I can add it in easily after merging. You've already done a ton! Otherwise I want to get this out in the next release (Tuesday if we can!). Really appreciate it 🫡
No problem, I enjoy helping! I would like to add at least one more doc - I have something queued up, just one quick question. You mentioned adding…

Edit: I pushed an example of what I am talking about. If that's not what you're looking for just let me know 😄
Yes, I think that's pretty reasonable, perhaps a better place to put it! Only thing is to add the autoclass or whatever there (and just make it clear which maps to which). All nits though, this is pretty much good to go as far as I'm concerned! |
Looks good, let's do the last docs stuff then merge! Approving so all I have to do is click merge next.
I added the autoclass to the bottom of the page and tightened up a few of the examples. Let me know if it needs anything else!
Looks great, thank you! Merging!
This PR adds pydantic integration via a new data validator and plugin. Resolves #473.
Changes
I added a file called `data_quality/pydantic_validators.py` with a new default validator, `PydanticModelValidator`. This validator is dynamically added to the list of available default validators if pydantic is available - very similar, some would say identical 😄, to how the pandera validators are added. The pydantic validator is passed a `model` parameter that is then used to validate the output of the decorated function.

I also added a plugin file, `plugins/h_pydantic.py`, with a variant of the `check_output` decorator that uses the return type annotation to establish the pydantic model to validate.

My implementation uses the pydantic `TypeAdapter` - mainly because `pydantic.validate_call` does not have an option to only check the return value. `TypeAdapter` allows you to specify a `strict` mode where type coercion is turned off; since modifying the outputs is not currently allowed (that is correct, right?), I have enabled `strict` mode for both the validator and the plugin.

Although I think that it is idiomatic to pydantic, I should point out that `strict` mode does not stop you from doing this (i.e. this passes validation):

One last thing to mention is that `h_pydantic.check_output` currently checks that the return type annotation is a subclass of `pydantic.BaseModel`. In theory, you could use pydantic to check all kinds of things (builtins, `Annotated` types, ...), however I was having trouble getting it to play nicely with validator resolution, so I scrapped it.

How I tested this

I added a file to the testing suite, `test_pydantic_data_quality.py`, that tests the validator and the `check_output` plugin decorator for both basic and complex cases.

Notes

There are a couple of points I wanted to bring up:

- `TypeAdapter` is a pydantic 2.0 feature, which is in conflict with the `[vaex]` extra dependency 😞. I didn't notice this until I was updating `pyproject.toml`; let me know if this is a deal breaker and I will come up with an alternative implementation.
- I named the parameter `model` (vice `schema`) and the decorator `pydantic.check_output` (vice `pydantic.check_output_schema`) - I can change them back if desired, I just thought they fit better with the terminology of pydantic and the ergonomics of the pandera plugin, respectively.

Checklist
Thanks for the opportunity to dig into this!
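To make the mechanics described above concrete, here is a minimal, hedged sketch of the two pieces - strict-mode `TypeAdapter` validation, and a decorator that pulls the model from the return annotation. This is illustrative only, not the PR's actual code: `Output`, `produce`, and this toy `check_output` are invented names, and the real plugin lives in `plugins/h_pydantic.py`.

```python
from typing import get_type_hints

from pydantic import BaseModel, TypeAdapter, ValidationError

class Output(BaseModel):  # hypothetical model, for illustration only
    value: int

def check_output(fn):
    """Toy annotation-driven validator (a sketch, not h_pydantic's real code)."""
    model = get_type_hints(fn).get("return")
    if not (isinstance(model, type) and issubclass(model, BaseModel)):
        raise ValueError("return annotation must be a pydantic BaseModel subclass")
    adapter = TypeAdapter(model)

    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # strict=True turns off pydantic's type coercion, mirroring the PR.
        adapter.validate_python(result, strict=True)
        return result

    return wrapper

@check_output
def produce() -> Output:
    return Output(value=1)

out = produce()  # an exact Output instance passes strict validation

# Strict mode's effect, shown directly: lax validation coerces "1" -> 1,
# while strict validation refuses the coercion.
assert TypeAdapter(Output).validate_python({"value": "1"}).value == 1
try:
    TypeAdapter(Output).validate_python({"value": "1"}, strict=True)
except ValidationError:
    print("strict mode rejected the string-to-int coercion")
```

Note that `validate_python(..., strict=True)` still accepts a dict at the top level; strictness applies to the field types, which is why the string `"1"` is rejected for the `int` field.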