Add NaturalIdPartitioner to FDS #2404

Merged: 13 commits into main from fds-add-cid-partitioner on Nov 10, 2023

Conversation

adam-narozniak commented Sep 21, 2023

Issue

Many datasets contain a field such as a client id, writer id, or speaker id, and can be divided naturally by the values of that field. Flower Datasets does not currently support partitioning a dataset this way.

Proposal

Add a new partitioner, NaturalIdPartitioner, that divides a dataset according to this natural (already present) grouping. Each partition corresponds to one unique value in the column specified by the user.

  • Add NaturalIdPartitioner.
  • Extend __all__ in __init__.py.
  • Add tests.
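
The grouping idea can be sketched in plain Python, independent of Flower Datasets. This is only an illustration of the concept, not the code in this PR: the helper name `partition_by_natural_id`, the sample rows, and the choice to number partitions in order of first appearance are all assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def partition_by_natural_id(
    rows: List[Row], partition_by: str
) -> Tuple[Dict[int, Any], Dict[int, List[Row]]]:
    """Group rows by the unique values found in the `partition_by` column.

    Hypothetical helper: returns a mapping from integer partition index to
    the natural id, and a mapping from that index to the rows of the group.
    """
    # Unique natural ids, kept in order of first appearance (an assumption;
    # a real implementation may order them differently).
    unique_ids = list(dict.fromkeys(row[partition_by] for row in rows))
    node_id_to_natural_id = dict(enumerate(unique_ids))
    natural_id_to_node_id = {nat: idx for idx, nat in node_id_to_natural_id.items()}

    # Assign every row to the partition that matches its natural id.
    partitions: Dict[int, List[Row]] = {idx: [] for idx in node_id_to_natural_id}
    for row in rows:
        partitions[natural_id_to_node_id[row[partition_by]]].append(row)
    return node_id_to_natural_id, partitions


# Made-up sample data for illustration only.
rows = [
    {"speaker_id": "alice", "audio": "a0.wav"},
    {"speaker_id": "bob", "audio": "b0.wav"},
    {"speaker_id": "alice", "audio": "a1.wav"},
]
node_id_to_natural_id, partitions = partition_by_natural_id(rows, "speaker_id")
```

Here the integer index is just a position in the list of unique ids, while the mapping gives a way back to the value stored in the dataset.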

Example usage

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import NaturalIdPartitioner

natural_id_partitioner = NaturalIdPartitioner(partition_by="speaker_id")
fds = FederatedDataset(dataset="speech_commands", subset="v0.02", partitioners={"train": natural_id_partitioner})
# The idx=0 below is not a speaker_id from the dataset; it is just an integer partition index.
fds.load_partition(idx=0, split="train")
# The original id from the dataset can be retrieved
fds.node_id_to_natural_id[0]

adam-narozniak marked this pull request as ready for review September 25, 2023 13:06
adam-narozniak marked this pull request as draft September 25, 2023 13:08
adam-narozniak marked this pull request as ready for review September 25, 2023 13:12
adam-narozniak changed the title from "Add cid partitioner" to "Add CidPartitioner to FDS" Sep 25, 2023
adam-narozniak changed the title from "Add CidPartitioner to FDS" to "Add IdPartitioner to FDS" Nov 7, 2023
adam-narozniak changed the title from "Add IdPartitioner to FDS" to "Add NaturalIdPartitioner to FDS" Nov 7, 2023
danieljanes enabled auto-merge (squash) November 10, 2023 10:25
danieljanes merged commit a76364d into main Nov 10, 2023
29 checks passed
danieljanes deleted the fds-add-cid-partitioner branch November 10, 2023 10:29
2 participants