[FEA] Support dask query planning #73

Closed
1 of 4 tasks
ayushdg opened this issue May 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@ayushdg
Collaborator

ayushdg commented May 21, 2024

Is your feature request related to a problem? Please describe.
Currently, many features and tests do not work when Dask query planning is enabled (the default Dask behavior).

This issue tracks the gaps that must be closed for query planning to work with Curator:

  • FuzzyDuplicates
    • Melt (now supported in 24.06)
    • Custom blockwise/hlg logic in shuffle/merge steps
  • TestShuffle
    • Specifically around dataset.df[self.rand_col] = dataset.df.map_partitions(self._add_rand_col)
      TypeError: Expected Pandas-like Index, Series, DataFrame, or scalar, got numpy.ndarray
@ayushdg ayushdg added the enhancement New feature or request label May 21, 2024
@rjzamora
Contributor

rjzamora commented Jun 3, 2024

@ayushdg - I'd like to start working on this. Do you want to support both query-planning "on" and "off" moving forward? It may be hard to do this without a fair amount of compatibility code.

Also, note that melt should now be supported with cudf-2406.
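
For context on what supporting both modes could involve, here is a rough sketch of running a test suite once per mode. It relies on Dask reading the `dataframe.query-planning` config key from the `DASK_DATAFRAME__QUERY_PLANNING` environment variable; the helper names `query_planning_env` and `run_tests` and the `tests` path are hypothetical and not part of Curator. The setting has to be in place before `dask.dataframe` is first imported, which is why each mode gets its own subprocess:

```python
import os
import subprocess
import sys


def query_planning_env(enabled: bool) -> dict:
    """Return a copy of the current environment with the Dask
    query-planning setting forced on or off (hypothetical helper)."""
    env = dict(os.environ)
    env["DASK_DATAFRAME__QUERY_PLANNING"] = "True" if enabled else "False"
    return env


def run_tests(enabled: bool, pytest_args=("tests",)) -> int:
    """Run pytest in a fresh interpreter so the setting is read
    before dask.dataframe is imported. Returns the exit code."""
    cmd = [sys.executable, "-m", "pytest", *pytest_args]
    return subprocess.run(cmd, env=query_planning_env(enabled)).returncode
```

Calling `run_tests(True)` and then `run_tests(False)` would exercise both code paths, assuming the installed Dask version still allows query planning to be turned off.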

@sarahyurick
Collaborator

Closed by #139.


3 participants