Add Dataset.pipe method, based on pandas.DataFrame.pipe. #734

copybara-service · 2025-02-14T17:31:09Z

Add Dataset.pipe method, based on pandas.DataFrame.pipe.

pipe is convenient because it extends method-chaining syntax to transformations that are not built-in methods on Dataset.

For example, consider shuffling a dataset in windows. It would be convenient if we could write something like:

ds = (
    dataset.MapDataset.range(400)
    .window_shuffle(window_size=10, seed=42)
    .batch(16)
    .repeat()
)

Unfortunately this doesn't work, because there is no window_shuffle() method. Instead you would need to write something like:

ds = (
    shuffle.WindowShuffleMapDataset(
        dataset.MapDataset.range(400),
        window_size=10,
        seed=42,
    )
    .batch(16)
    .repeat()
)

Readability suffers here, because the shuffle transformation comes out of order.

Instead, pipe lets us keep transformations in the order in which they are applied:

ds = (
    dataset.MapDataset.range(400)
    .pipe(
        shuffle.WindowShuffleMapDataset,
        window_size=10,
        seed=42,
    )
    .batch(16)
    .repeat()
)
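
For intuition, here is a minimal sketch of how a pipe method in the style of pandas.DataFrame.pipe can be defined. It is not the implementation from this change: ToyDataset and window_shuffle are hypothetical stand-ins written for illustration, and the sketch only assumes that pipe calls the given function with the dataset as its first argument and forwards the remaining arguments.

```python
import random
from typing import Any, Callable, Sequence


class ToyDataset:
    """Hypothetical stand-in for a dataset class (not the real Dataset)."""

    def __init__(self, items: Sequence[Any]):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def pipe(self, func: Callable[..., Any], /, *args, **kwargs):
        # Call func with this dataset as the first positional argument
        # and forward everything else unchanged.
        return func(self, *args, **kwargs)

    def batch(self, batch_size: int) -> "ToyDataset":
        # Group consecutive elements into lists of at most batch_size.
        items = self._items
        return ToyDataset(
            [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
        )


def window_shuffle(ds: ToyDataset, *, window_size: int, seed: int) -> ToyDataset:
    """Hypothetical transformation: shuffle within consecutive windows."""
    rng = random.Random(seed)
    items = list(ds)
    out = []
    for start in range(0, len(items), window_size):
        window = items[start : start + window_size]
        rng.shuffle(window)
        out.extend(window)
    return ToyDataset(out)


# The external transformation slots into the chain in application order.
ds = (
    ToyDataset(range(400))
    .pipe(window_shuffle, window_size=10, seed=42)
    .batch(16)
)
```

Because pipe only forwards its arguments, any callable that takes a dataset first (a function or a class constructor such as shuffle.WindowShuffleMapDataset above) can be chained this way without adding a method to the dataset class.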

PiperOrigin-RevId: 729289880

copybara-service bot force-pushed the test_726948545 branch 3 times, most recently from 1851d76 to 0027e19 on February 21, 2025 00:04

copybara-service bot force-pushed the test_726948545 branch from 0027e19 to 42039a1 on February 21, 2025 00:16

copybara-service bot merged commit 42039a1 into main Feb 21, 2025
1 of 2 checks passed

copybara-service bot deleted the test_726948545 branch February 21, 2025 00:17
