Commit
Add blocksize to DocumentDataset.read_* that uses dask_cudf.read_* (#285)

* fc
  Signed-off-by: Praateek <[email protected]>
* review comments
  Signed-off-by: Praateek <[email protected]>
* make blocksize work with parquet
  Signed-off-by: Praateek <[email protected]>
* filetype
  Signed-off-by: Praateek <[email protected]>
* fix merge
  Signed-off-by: Praateek <[email protected]>
* add test cases
  Signed-off-by: Praateek <[email protected]>
* add test file
  Signed-off-by: Praateek <[email protected]>
* failing test for select_columns
  Signed-off-by: Praateek <[email protected]>
* rename func name
  Signed-off-by: Praateek <[email protected]>
* add test case for different columns
  Signed-off-by: Praateek <[email protected]>
* improve test for different_cols
  Signed-off-by: Praateek <[email protected]>
* ..
  Signed-off-by: Praateek <[email protected]>
* review comments + add warnings for inconsistent schemas
  Signed-off-by: Praateek <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* Update nemo_curator/utils/distributed_utils.py
  Co-authored-by: Sarah Yurick <[email protected]>
  Signed-off-by: Praateek Mahajan <[email protected]>
* fix tests
  Signed-off-by: Praateek <[email protected]>

---------

Signed-off-by: Praateek <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]>
Co-authored-by: Sarah Yurick <[email protected]>
1 parent c54826a, commit e820b8b
Showing 7 changed files with 814 additions and 56 deletions.