Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: dask_histogram.factory broken after dask 2024.12 #156

Merged
merged 7 commits into from
Dec 17, 2024
Merged

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Dec 17, 2024

Fixes #155

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

just adding a test for now

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 17, 2024

I don't think it's associated with Delayed this time. I think the issue is in dask_histogram.core._partitioned_histogram. The None's getting passed to dask-awkward's partitionwise need to be Alias or TaskRef

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

I'll check it out in the morning. Not feeling like dealing with a partially documented and changing interface at the moment.

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

But thank you for the pointers!

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 17, 2024

No worries :) this gets the test to pass:

diff --git a/src/dask_histogram/core.py b/src/dask_histogram/core.py
index b80943d..b63823a 100644
--- a/src/dask_histogram/core.py
+++ b/src/dask_histogram/core.py
@@ -1026,6 +1026,7 @@ def _partitioned_histogram(
     # Single awkward array object.
     if len(data) == 1 and data_is_dak:
         from dask_awkward.lib.core import partitionwise_layer as dak_pwl
+        from dask_awkward.layers import _dask_uses_tasks
 
         x = data[0]
         if weights is not None and sample is not None:
@@ -1035,7 +1036,9 @@ def _partitioned_histogram(
         elif weights is None and sample is not None:
             g = dak_pwl(_blocked_dak, name, x, None, sample, histref=histref)
         else:
-            g = dak_pwl(_blocked_dak, name, x, None, None, histref=histref)
+            import functools
+            ff = functools.partial(_blocked_dak, weights=None, sample=None, histref=histref)
+            g = dak_pwl(ff, name, x)
 
     # Single object, not a dataframe
     elif len(data) == 1 and not data_is_df:

Looks possible to go around the new interface by just raising the _blocked_dak instances to higher order partials in each branch of that if with Nones, such that the Nones become part of the function object.

There are some other parts of the test-suite failing. I'll take a swipe at it as well.

@douglasdavis
Copy link
Collaborator

douglasdavis commented Dec 17, 2024

These tests are failing locally for me:

================================================================================= short test summary info =================================================================================
FAILED tests/test_boost.py::test_histogramdd_series - KeyError: '.0'
FAILED tests/test_boost.py::test_histogramdd_arrays_and_series - ValueError: spans must have compatible lengths
FAILED tests/test_boost.py::test_histogramdd_dataframe - ValueError: spans must have compatible lengths
FAILED tests/test_core.py::test_df_input[True] - ValueError: spans must have compatible lengths
FAILED tests/test_core.py::test_df_input[None] - KeyError: '.0'
======================================================================== 5 failed, 81 passed, 2 warnings in 2.90s =========================================================================

All have to do with dask.dataframe. Probably those are related to the Task spec amd need TaskRef/Aliasing. @lgray if you're OK with it I'd say this is fine to go in to fix the original issue. dask.dataframe things can be dealt with in another PR.

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

Thanks! My only nit here is that dak.partitionwise_layer should be taking care of these cases correctly, no? So I think the right way to fix it would be to use TaskRef correctly over in dask-awkward?

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

or is this sort-of-currying the function fine? it seems like a requirement on dependent code.

@lgray
Copy link
Collaborator Author

lgray commented Dec 17, 2024

will merge for now - we'll see if this nitpick crops up later.

@lgray lgray merged commit bf85767 into main Dec 17, 2024
19 checks passed
@douglasdavis
Copy link
Collaborator

Yeah I think the dak partitionwise layer should be able to handle "broadcasting" the None for all partitions of the actual data-carrying collection. I think in the past Dask was doing this correctly itself, but clearly they've changed things in blockwise.

For now we're just creating a higher order function to make sure a None gets used at each node as part of the callable instead of as an argument... the more I think about it the cleaner it seems.

@douglasdavis douglasdavis deleted the fix-dh-factory branch December 29, 2024 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression in use with dask-awkward
2 participants