Clean up internal column logic in _run_classifier_helper function #457

Merged
merged 3 commits into from
Jan 3, 2025


sarahyurick
Collaborator

Closes #425.

Signed-off-by: Sarah Yurick <[email protected]>
@sarahyurick sarahyurick added the gpuci Run GPU CI/CD on PR label Dec 23, 2024
Signed-off-by: Sarah Yurick <[email protected]>
nemo_curator/classifiers/base.py (resolved)
Comment on lines +124 to +125
if prob_col:
df[prob_col] = 0
Collaborator Author

I wanted CrossFit to internally create the prob_col, but I was having trouble getting this to work. Specifically, I was able to get the entire classification pipeline to work, but then at the end prob_col would be dropped somewhere and not returned with the result.
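
The guard quoted above can be read as a placeholder pattern: when a probability column is requested, it is created up front with a dummy value so that it exists before the model runs, and it is overwritten afterwards. A minimal sketch of that idea follows, assuming a plain dict of column lists as a stand-in for the real DataFrame. The helper names `add_placeholder_column` and `overwrite_column` are hypothetical and are not part of NeMo Curator or CrossFit, and the sketch does not reproduce or explain the CrossFit behavior described in this comment.

```python
from typing import Dict, List, Optional

Columns = Dict[str, list]  # stand-in for a DataFrame: column name -> values


def add_placeholder_column(df: Columns, prob_col: Optional[str]) -> Columns:
    """Return a copy of df with prob_col pre-created as zeros, if requested."""
    out = {name: list(values) for name, values in df.items()}
    if prob_col:  # same guard as in the diff: skip when no name is given
        n_rows = len(next(iter(out.values()), []))
        out[prob_col] = [0] * n_rows
    return out


def overwrite_column(df: Columns, col: str, values: List[float]) -> Columns:
    """Hypothetical later step: replace the placeholder with real values."""
    if col not in df:
        raise KeyError(f"{col!r} was never created")
    if len(values) != len(df[col]):
        raise ValueError("length mismatch")
    out = dict(df)
    out[col] = list(values)
    return out


docs = {"text": ["first document", "second document"]}
with_placeholder = add_placeholder_column(docs, "prob")
scored = overwrite_column(with_placeholder, "prob", [0.9, 0.1])
```

Passing `None` (or an empty string) as `prob_col` leaves the columns unchanged, which matches the `if prob_col:` check in the diff.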

Collaborator

Thanks for making the PR. I like the changes, and thanks for cleaning it up.

Would you mind creating an issue on crossfit (with an MRE)? We can merge this PR and then fix it in crossfit.

Collaborator Author

I have opened an issue here: rapidsai/crossfit#108. Please LMK if there is anything else I can clarify there. Thanks!

Collaborator

@VibhuJawa VibhuJawa left a comment

LGTM. Thanks for making these changes

nemo_curator/classifiers/base.py (resolved)
@sarahyurick sarahyurick marked this pull request as ready for review January 3, 2025 19:44
@sarahyurick sarahyurick added gpuci Run GPU CI/CD on PR and removed gpuci Run GPU CI/CD on PR labels Jan 3, 2025
@sarahyurick sarahyurick merged commit 694970a into NVIDIA:main Jan 3, 2025
5 checks passed
Labels
gpuci Run GPU CI/CD on PR
Development

Successfully merging this pull request may close these issues.

Clean up _run_classifier_helper function
2 participants