Stratified/equal sampling in FewShotDataLoader #15

Merged
merged 7 commits into from
Dec 5, 2023


azoz01
Owner

@azoz01 commented Dec 1, 2023

No description provided.

@azoz01 requested a review from DawidPludowski December 1, 2023 21:45

github-actions bot commented Dec 1, 2023

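Coverage

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| `liltab/__init__.py` | 0 | 0 | 100% | |
| `liltab/data/dataloaders.py` | 109 | 3 | 97% | 47, 57, 62 |
| `liltab/data/datasets.py` | 87 | 2 | 98% | 79, 187 |
| `liltab/data/preprocessing.py` | 6 | 0 | 100% | |
| `liltab/model/heterogenous_attributes_network.py` | 98 | 0 | 100% | |
| `liltab/model/utils.py` | 28 | 0 | 100% | |
| `liltab/train/logger.py` | 49 | 11 | 78% | 18, 40, 43, 46, 55, 78–80, 83–85 |
| `liltab/train/trainer.py` | 67 | 30 | 55% | 57, 67, 74, 81–84, 87–93, 96–104, 163–165, 168–175, 180–187, 192–199 |
| `liltab/train/utils.py` | 49 | 4 | 92% | 47, 62, 75, 85 |
| TOTAL | 493 | 50 | 90% | |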

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 47 | 0 💤 | 0 ❌ | 0 🔥 | 9.061s ⏱️ |

@@ -66,8 +67,10 @@ def __init__(

```python
self.y = self.df[self.target_columns]
if self.encode_categorical_target:
    self.y = pd.get_dummies(self.y.astype("category"))
self.y = torch.from_numpy(self.y.to_numpy()).type(torch.float32)
self.one_hot_encoder = OneHotEncoder(sparse=False).set_output(transform="pandas")
```
Collaborator


RandomFeaturesPandasDataset needs the same logic applied as well (target_columns handling, self.raw_y, etc.).
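A minimal sketch of one way to keep the two datasets in sync, assuming a hypothetical shared helper `encode_target` and hypothetical `*Sketch` classes (none of these names are liltab's actual API). The target logic lives in one place and the random-features dataset inherits it instead of duplicating it; `raw_y` is kept next to `y` as in the diff above.

```python
import numpy as np
import pandas as pd


def encode_target(
    df: pd.DataFrame, target_columns: list, encode_categorical_target: bool
) -> tuple:
    """Hypothetical shared helper returning (y, raw_y).

    y is a float32 array (one-hot when encode_categorical_target is True);
    raw_y keeps the original labels so a sampler can group rows by class.
    """
    raw_y = df[target_columns].to_numpy()
    if encode_categorical_target:
        # One indicator column per (target column, category) pair, as in the diff
        dummies = pd.get_dummies(df[target_columns].astype("category"))
        y = dummies.to_numpy(dtype=np.float32)
    else:
        # Numeric (e.g. regression) targets are only cast to float32
        y = raw_y.astype(np.float32)
    return y, raw_y


class PandasDatasetSketch:
    """Hypothetical stand-in covering only PandasDataset's target handling."""

    def __init__(self, df, target_columns, encode_categorical_target=False):
        self.df = df
        self.target_columns = target_columns
        self.y, self.raw_y = encode_target(df, target_columns, encode_categorical_target)


class RandomFeaturesPandasDatasetSketch(PandasDatasetSketch):
    """Reuses the parent's target logic instead of copying it."""


# Usage: a categorical target with two classes
df = pd.DataFrame({"x": [0.1, 0.2, 0.3], "label": ["a", "b", "a"]})
ds = RandomFeaturesPandasDatasetSketch(df, ["label"], encode_categorical_target=True)
```

Inheritance is only one option here; a module-level function called from both `__init__` methods would keep the behaviour aligned just as well.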


```python
self.curr_episode = 0

self.n_rows = len(self.dataset)

if sample_classes_equally:
    self.y = dataset.raw_y
```
Collaborator


This step could be done in the datasets module with PyTorch's one-hot function (https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html), which would also reduce memory allocation, since datasets currently keep both self.y and self.raw_y.
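A minimal sketch of that suggestion, assuming a hypothetical `LabelStore` class (not part of liltab): only one integer label tensor is stored, and `torch.nn.functional.one_hot` builds float one-hot rows on demand for the rows sampled in an episode, instead of keeping both a full one-hot tensor and a separate raw label array.

```python
import torch
import torch.nn.functional as F


class LabelStore:
    """Hypothetical: keeps a single int64 label tensor instead of both y and raw_y."""

    def __init__(self, labels):
        # Map arbitrary (sortable) labels to contiguous class indices 0..C-1
        self.classes = sorted(set(labels))
        index = {label: i for i, label in enumerate(self.classes)}
        self.raw_y = torch.tensor([index[label] for label in labels], dtype=torch.long)

    def one_hot(self, rows: torch.Tensor) -> torch.Tensor:
        # F.one_hot expects an int64 tensor; num_classes is fixed so every
        # episode has the same width even if some classes are not sampled
        encoded = F.one_hot(self.raw_y[rows], num_classes=len(self.classes))
        return encoded.to(torch.float32)


# Usage: encode only the two rows drawn for an episode
store = LabelStore(["a", "b", "a", "c"])
batch = store.one_hot(torch.tensor([0, 1]))
```

The integer `raw_y` can also be used directly to group rows by class for equal sampling, so no second copy of the labels is needed.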

Collaborator

@DawidPludowski left a comment


Main requested changes:

  • port the new PandasDataset functionality to RandomFeaturesPandasDataset
  • reduce the code in FewShotDataLoader's __init__ by moving it into a new function
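A minimal sketch of moving the equal-sampling logic out of `FewShotDataLoader.__init__` into a standalone function, assuming a hypothetical `sample_equal_per_class` helper that receives `dataset.raw_y` as a 1-D array of class labels. It also assumes every class has at least as many rows as it is asked to provide (otherwise `rng.choice` raises `ValueError`). The support and query sizes are split as evenly as possible across classes; this is equal-per-class sampling, not proportional stratification.

```python
import numpy as np


def sample_equal_per_class(raw_y, support_size, query_size, rng):
    """Hypothetical helper: returns disjoint (support, query) row indices
    with the same number of rows per class (remainders go to the first classes)."""
    raw_y = np.asarray(raw_y)
    classes = np.unique(raw_y)
    n_classes = len(classes)

    # Even per-class counts for each set
    support_counts = np.full(n_classes, support_size // n_classes)
    support_counts[: support_size % n_classes] += 1
    query_counts = np.full(n_classes, query_size // n_classes)
    query_counts[: query_size % n_classes] += 1

    support_idx, query_idx = [], []
    for cls, n_s, n_q in zip(classes, support_counts, query_counts):
        rows = np.flatnonzero(raw_y == cls)
        # One draw without replacement, then split, so support and query never overlap
        chosen = rng.choice(rows, size=n_s + n_q, replace=False)
        support_idx.append(chosen[:n_s])
        query_idx.append(chosen[n_s:])
    return np.concatenate(support_idx), np.concatenate(query_idx)


# Usage: three balanced classes, 6 support rows and 9 query rows per episode
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
support, query = sample_equal_per_class(labels, support_size=6, query_size=9, rng=rng)
```

With a helper like this, `__init__` would only need to store `dataset.raw_y` and the `sample_classes_equally` flag, and each episode would call the helper when the flag is set.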

@azoz01 requested a review from DawidPludowski December 5, 2023 18:58
Collaborator

@DawidPludowski left a comment


Ok

@azoz01 merged commit d70e2e6 into develop Dec 5, 2023
1 check passed