[FEA] Support dask query planning #73

ayushdg · 2024-05-21T22:14:17Z

Is your feature request related to a problem? Please describe.
Currently, many features and tests fail when dask query planning is enabled (the default dask behavior).

This issue tracks the gaps that need to be closed for query planning to work with Curator.

FuzzyDuplicates
- Melt (now supported in 24.06)
- Custom blockwise/hlg logic in shuffle/merge steps
TestShuffle
- Specifically, the assignment dataset.df[self.rand_col] = dataset.df.map_partitions(self._add_rand_col) fails with:
  TypeError: Expected Pandas-like Index, Series, DataFrame, or scalar, got numpy.ndarray
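
The TypeError above suggests that the function passed to map_partitions returns a bare numpy.ndarray, which query planning does not accept as a partition result. A minimal sketch of one possible workaround follows. It assumes the partition function can return a Series aligned to the partition's index and that an explicit `meta` is acceptable. The names `add_rand_col`, `assign_rand_col`, and `_rand` are hypothetical stand-ins and not Curator's actual code, and pandas is used here for brevity where Curator would typically work with cuDF partitions.

```python
import numpy as np
import pandas as pd


def add_rand_col(partition: pd.DataFrame, seed=None) -> pd.Series:
    """Return one random integer per row as a Series aligned to the partition.

    Hypothetical stand-in for `_add_rand_col`; the column name `_rand` and
    the value range are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    values = rng.integers(0, 2**31 - 1, size=len(partition), dtype=np.int64)
    # Wrapping the array in a Series that reuses the partition's index hands
    # Dask a pandas-like object instead of a bare numpy.ndarray.
    return pd.Series(values, index=partition.index, name="_rand")


def assign_rand_col(df: pd.DataFrame, npartitions: int = 2) -> pd.DataFrame:
    """Attach the random column through Dask (requires dask to be installed)."""
    # Imported lazily so that add_rand_col itself only depends on pandas.
    import dask.dataframe as dd

    ddf = dd.from_pandas(df, npartitions=npartitions)
    # An explicit `meta` declares the output as an int64 Series named "_rand".
    ddf["_rand"] = ddf.map_partitions(add_rand_col, meta=("_rand", "int64"))
    return ddf.compute()
```

Because the returned Series carries the partition's own index, the column assignment can align it row by row with the original frame. Whether this matches the fix that eventually landed is not stated in this thread.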


rjzamora · 2024-06-03T18:57:00Z

@ayushdg - I'd like to start working on this. Do you want to support both query-planning "on" and "off" moving forward? It may be a bit hard to do this without a bunch of compatibility code.

Also, note that melt should now be supported with cudf-2406.

sarahyurick · 2024-10-18T20:33:32Z

Closed by #139.

ayushdg added the "enhancement" (New feature or request) label on May 21, 2024

sarahyurick closed this as completed Oct 18, 2024
