Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix all the things by fixing multiprocessing #733

Merged
merged 3 commits into from
Apr 18, 2022
Merged

Conversation

stephenbach
Copy link
Member

Mutliprocessing in the helicopter view is the source of a lot of inconsistent bugs across OSs and Python versions. This PR fixes promptsource when using the spawn mode (default in python 3.8+ and required to run smoothly on latest OS X). It also changes it to the use spawn always so that hopefully there is more consistent behavior.

Before merging, we should test on a range of OSs and Python versions.

The main changes are actually pretty small, but the diff is huge because most of app.py had to be wrapped in a function. (This is so that this code is not executed in each subprocess, which was our problem when spawning.) The main changes besides wrapping the code is

  • pass all_infos to get_infos as an argument
  • Force start mode to spawn

@stephenbach
Copy link
Member Author

Fixes #572
Fixes #588
Fixes #416
Fixes #584

@tianjianjiang
Copy link
Contributor

A shameless self-promoter here: #411. It is almost like this PR without the UI changes. I've tested with python 3.8 and 3.9 on both macOS 10.15 Catalina and WSL.

@VictorSanh
Copy link
Member

@stephenbach LGTM!

I have tested on MacOs Catalina with Python 3.7, 3.8 and 3.9, and everything looks good.
I also tested on Unix with python 3.7 for sanity check, same.

NB: I had to change L184 to split_sizes = {k: v.num_examples for k, v in subset_infos.splits.items() if hasattr(v, "num_examples")} because downloaded infos seemed to have change for the zaid/coqa_expanded and zaid/quac_expanded. We can also remove L186-L188's comments since they don't apply anymore.

Maybe keep the if infos" condition as sanity

Copy link
Contributor

@tianjianjiang tianjianjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested using "datasets==1.18.4" without #735.
(With #735, "datasets==2.0.0" works, see https://github.com/tianjianjiang/bigscience-promptsource/runs/5802249762?check_suite_focus=true#step:4:26)

It works with "python>=3.7,<3.10" for macOS BigSur 11.6.5 on Intel, too.
I will also test Monterey and re-test Catalina soon, but they probably won't cause any issue.

For "python>=3.10,<4", see #733 (comment).

setup.py Outdated Show resolved Hide resolved
Co-authored-by: 姜 天戩 Mike Tian-Jian Jiang <[email protected]>
@stephenbach stephenbach merged commit f3bc55e into main Apr 18, 2022
@stephenbach stephenbach deleted the fix_multiprocessing branch April 18, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants