[T170073014] Rewrite distributed examples for Tensor Parallel, Sequence Parallel, 2D (FSDP + TP) #1201
From L47-L62, I don't think we need these lines at all; `init_device_mesh` covers all of the setup done there, so users don't need this sophisticated setup.
Will reduce most of these, but the `_rank` variable and the `rank_print` function are there to make the training info output cleaner for the user, rather than having every print fire on every GPU and overwhelm the log, so I do want to keep those.
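A minimal sketch of such a helper, under the assumption (the exact signature is not shown in the thread) that `rank_print` takes the caller's rank and only emits output from rank 0:

```python
def rank_print(msg: str, rank: int) -> None:
    """Print `msg` only from rank 0, so training info appears once
    instead of being duplicated by every GPU's process."""
    if rank == 0:
        print(f"[rank {rank}] {msg}", flush=True)
```

In a real script the rank would come from the device mesh (e.g. `device_mesh.get_rank()`) rather than being passed in by hand.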
After `init_device_mesh`, you get a device mesh object and can call `device_mesh.get_rank()` to obtain the rank information. I would recommend not using the `init_process_group` call anymore.
Oh, I did not realize we can now skip `init_process_group`. That will definitely help clean things up here. Will update to that.