-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding configuration options to uniques functionality #224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite some lines! Greatly appreciated.
-
As written in comments, perhaps you can leave the postprocessing parts of this PR out to keep its size more manageable and to have a better discussion on what to do there. I'm not strongly against dealing with the postprocessing here.
-
The names of the utility functions feel a bit wonky to me. But right now I dont'know how to name them better.
-
In the utility functions, you use
filter
,reduce
withlambda
functions callinglist
at the end. Is there any chance to change some of them to list comprehensions instead?
src/datajudge/constraints/uniques.py
Outdated
It can be configured in more detail by supplying a custom ``filter_func`` function. | ||
Some exemplary implementations are available in this module as ``datajudge.utils.util_filternull_default_deprecated``, | ||
``datajudge.utils.util_filternull_never``, ``datajudge.utils.util_filternull_element_or_tuple_all``, ``datajudge.utils.util_filternull_element_or_tuple_any``. | ||
For new deployments, using one of the above filters or a custom one is recommended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word deployment in this line seems odd to me. I'm not sure that this sentence is actually needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
src/datajudge/constraints/uniques.py
Outdated
For new deployments, using one of the above filters or a custom one is recommended. | ||
Passing None as the argument is equivalent to ``datajudge.utils.util_filternull_default_deprecated``, but triggers a warning. | ||
The deprecated default may change in future versions. | ||
To silence the warning, set ``filter_func`` explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could silence it by configuring the warming module to do so.
I think the fact that's a warning indicating eventual future deprecation should be enough.
src/datajudge/utils.py
Outdated
@@ -42,3 +44,106 @@ def format_difference( | |||
f"{s1[:diff_idx]}{_fmt_diff_part(s1, diff_idx)}", | |||
f"{s2[:diff_idx]}{_fmt_diff_part(s2, diff_idx)}", | |||
) | |||
|
|||
|
|||
def util_output_postprocessing_sorter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was goign to suggest a move like this. Now, since these new functions are already in a utils
namespace, doesn't make much sense to prepend such set of characters for each function.
src/datajudge/utils.py
Outdated
return [elem[1:] for elem in lst], [-elem[0] for elem in lst] | ||
|
||
|
||
def util_filternull_default_deprecated(values: List[T]) -> List[T]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default behavior until now, which you're using when the None
is used. But eventually still makes sense to use it in the future right? If so, I don't think that the function itself is deprecated and having that word in the function name is misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main issue with the current default behavior is that if the user adds a second column to the unique constraint, the NULL-filtering no longer works, even if both the original and second column are NULL.
I've renamed it to filternull_element
( | ||
negation, | ||
["col_int"], | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we instruct our linter/formatter not to do one element per line in this list? Is really unnecessarily bloating the line count. Same for the other parameterizations.
Alternatively, perhaps you can do something like list(range(30))
instead.
Thanks for the thorough review @ivergara :) - I've integrated your suggestions or replied to your comments. Factoring out the postprocessing imo is not a great idea, since this would break the |
…y scripts for postgres integration testing from a fresh db every time
I've now moved the additional output configuration options to the base |
Co-authored-by: Ignacio Vergara Kausel <[email protected]>
024b0f5
to
308ff99
Compare
src/datajudge/constraints/base.py
Outdated
[Collection, Optional[Collection]], Collection | ||
] = None, | ||
output_remainder_slicer=slice(5), | ||
output_processors: List[OutputProcessor] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed you could define this as OutputProcessor | List[OutputProcessor]
and in the body you can do something like
if not isinstance(output_processor, List]:
output_processor = [output_processor]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your PR @SimonLangerQC - much appreciated!
I gave it a look; don't be surprised by the number of comments - most of them are merely docs-related polishing. :)
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Co-authored-by: Kevin Klein <[email protected]>
Thanks for your comments @kklein - I've added your suggestions :) |
) -> Collection: ... | ||
|
||
|
||
def output_processor_sort( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be useful to have unit tests for output_processor_sort
, output_processor_limit
and sort_tuple_none_aware
in datajudge/tests/unit/test_unit.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! :)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #224 +/- ##
==========================================
- Coverage 93.03% 92.44% -0.59%
==========================================
Files 18 18
Lines 1894 1973 +79
==========================================
+ Hits 1762 1824 +62
- Misses 132 149 +17 ☔ View full report in Codecov by Sentry. |
Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> update doc string on null columns everywhere and fix typo Update docs Co-authored-by: Kevin Klein <[email protected]> Update docs Co-authored-by: Kevin Klein <[email protected]> Update docs Co-authored-by: Kevin Klein <[email protected]> docs updates update docs filternull docs clarification replace assert by raise ValueError shorten name to apply_output_formatting add unit tests for new utils functions set default to limit 100 elements ensure all relevant tests run for impala and ensure they pass disable extralong test for bigquery due to slow speed capitalization test handle parallel if table already created
adding configuration options to uniques functionality (#224) Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> Docs update Co-authored-by: Kevin Klein <[email protected]> update doc string on null columns everywhere and fix typo Update docs Co-authored-by: Kevin Klein <[email protected]> Update docs Co-authored-by: Kevin Klein <[email protected]> Update docs Co-authored-by: Kevin Klein <[email protected]> docs updates update docs filternull docs clarification replace assert by raise ValueError shorten name to apply_output_formatting add unit tests for new utils functions set default to limit 100 elements ensure all relevant tests run for impala and ensure they pass disable extralong test for bigquery due to slow speed capitalization test handle parallel if table already created
No description provided.