
[Autoshard] Auto-parallelism solver #96

Open · wants to merge 36 commits into main
Conversation

@chhzh123 chhzh123 commented May 22, 2023

Description

This PR introduces an auto-parallelism solver that finds the optimal sharding scheme for a given model. It models the parallelism scheme of each tensor as a combination of "R" and "S" specs, where "R" denotes "replicated" and "S" denotes "sharded". We can explicitly calculate the computation and resharding cost (#95) of each operator, and sum all the costs to form an optimization problem.
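As a rough illustration of that cost model, here is a toy sketch in plain Python. The spec strings ("RR", "SR", "RS") follow the PR's notation, but the cost formulas, device count, and function names (`resharding_cost`, `total_cost`) are illustrative assumptions, not the actual solver.py API, and the units do not match the log below.

```python
# Toy sketch of the R/S cost model -- formulas and names are assumptions,
# not the actual solver.py implementation.
from math import prod

def resharding_cost(shape, src, dst, num_devices=2):
    """Elements communicated to convert a tensor from spec `src` to `dst`.

    Toy rules: matching specs are free; replicated -> sharded is a local
    slice (free); sharded -> replicated is an all-gather; switching the
    sharded dimension moves the whole tensor (all-to-all-like).
    """
    if src == dst:
        return 0
    n = prod(shape)
    if src == "RR":                       # slice locally, no communication
        return 0
    if dst == "RR":                       # all-gather the missing shards
        return n * (num_devices - 1) // num_devices
    return n                              # e.g. SR -> RS

def total_cost(edges):
    """Sum resharding costs over (shape, src_spec, dst_spec) edges."""
    return sum(resharding_cost(*e) for e in edges)

# Edges loosely modeled on the MLP trace below (units differ from the log).
edges = [
    ((8, 1024, 1024), "RR", "SR"),   # x into fc1: free slice
    ((8, 1024, 4096), "SR", "SR"),   # gelu output into fc2: already aligned
    ((8, 1024, 1024), "SR", "RR"),   # fc2 output back to replicated
]
print(total_cost(edges))             # → 4194304
```

Summing such per-edge costs over the whole graph yields a single objective that the solver minimizes over all spec assignments.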

The problem is then encoded as a program synthesis problem and solved with the z3 solver using counter-example guided synthesis. The detailed process can be found in solver.py.
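The counter-example guided loop can be sketched as follows. This is a pure-Python stand-in, not the z3-backed implementation: with z3 the inner search is a solver query under the constraint `cost < best`, whereas here it is exhaustive enumeration. The function name and toy cost function are hypothetical.

```python
# Pure-Python sketch of the counter-example guided minimization loop
# (the real solver uses z3 queries instead of exhaustive enumeration).
from itertools import product

def cegis_minimize(cost_fn, domains):
    """Iteratively tighten an upper bound on cost, mirroring the
    Iter 0..N progression in the log: each round asks for *any*
    assignment strictly cheaper than the best found so far, and
    terminates when no such counter-example exists.
    """
    best, best_cost = None, float("inf")
    while True:
        for assignment in product(*domains):
            c = cost_fn(assignment)
            if c < best_cost:
                best, best_cost = assignment, c
                break            # cheaper scheme found: tighten the bound
        else:
            return best, best_cost   # "Cannot find better solutions"

# Toy cost over two spec choices; (2, 2) is the cheapest assignment.
cost = lambda a: {(2, 2): 1}.get(a, 10 + a[0] + a[1])
print(cegis_minimize(cost, [range(3), range(3)]))   # → ((2, 2), 1)
```

Each outer iteration corresponds to one "Iter k" block in the log below, and the final `else` branch corresponds to the "Cannot find better solutions" message.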

A sample output for a two-layer MLP is shown below. It shows how the solver finds the optimal scheme step by step, and the cost of each scheme is dumped so users can better reason about the tradeoffs.

$ python3 tests/test_autoshard.py 
[2023-05-22 06:19:31,530][INFO][solver.py:339:dump_fx_node] 
 name      op             target                                    shape            dtype
--------  -------------  ----------------------------------------  ---------------  -------------
x         placeholder    x                                         [8, 1024, 1024]  torch.float32
fc1       call_module    <class 'torch.nn.modules.linear.Linear'>  [8, 1024, 4096]  torch.float32
|-weight                                                           [4096, 1024]     torch.float32
|-bias                                                             [4096]           torch.float32
gelu      call_function  <built-in function gelu>                  [8, 1024, 4096]  torch.float32
fc2       call_module    <class 'torch.nn.modules.linear.Linear'>  [8, 1024, 1024]  torch.float32
|-weight                                                           [1024, 4096]     torch.float32
|-bias                                                             [1024]           torch.float32
output    output         output                                    [8, 1024, 1024]  torch.float32 

[2023-05-22 06:19:31,553][INFO][solver.py:583:solve] =================== Iter 0 ===================
[2023-05-22 06:19:31,556][INFO][solver.py:594:solve] [fc1_1 = 0, fc1_0 = 2, fc2_0 = 1, fc2_1 = 2]
[2023-05-22 06:19:31,563][INFO][solver.py:517:calculate_new_cost] 
 Name    InSpec    OutSpec    Cost
------  --------  ---------  -------
fc1     SRxRR     SR         0
|-x     RR        SR         0
fc2     RSxSR     RR         1048576
|-gelu  SR        RS         458752
output  RR        RR         0
Total                        1507328 

[2023-05-22 06:19:31,563][INFO][solver.py:583:solve] =================== Iter 1 ===================
[2023-05-22 06:19:31,564][INFO][solver.py:594:solve] [fc1_1 = 1, fc1_0 = 0, fc2_0 = 1, fc2_1 = 2]
[2023-05-22 06:19:31,571][INFO][solver.py:517:calculate_new_cost] 
 Name    InSpec    OutSpec    Cost
------  --------  ---------  -------
fc1     RRxRS     RS         0
|-x     RR        RR         0
fc2     RSxSR     RR         1048576
|-gelu  RS        RS         0
output  RR        RR         0
Total                        1048576 

[2023-05-22 06:19:31,571][INFO][solver.py:583:solve] =================== Iter 2 ===================
[2023-05-22 06:19:31,573][INFO][solver.py:594:solve] [fc1_1 = 1, fc1_0 = 0, fc2_0 = 0, fc2_1 = 1]
[2023-05-22 06:19:31,579][INFO][solver.py:517:calculate_new_cost] 
 Name    InSpec    OutSpec    Cost
------  --------  ---------  ------
fc1     RRxRS     RS         0
|-x     RR        RR         0
fc2     RRxRS     RS         0
|-gelu  RS        RR         524288
output  RS        RR         131072
Total                        655360 

[2023-05-22 06:19:31,580][INFO][solver.py:583:solve] =================== Iter 3 ===================
[2023-05-22 06:19:31,581][INFO][solver.py:594:solve] [fc1_1 = 1, fc1_0 = 0, fc2_0 = 2, fc2_1 = 0]
[2023-05-22 06:19:31,588][INFO][solver.py:517:calculate_new_cost] 
 Name    InSpec    OutSpec    Cost
------  --------  ---------  ------
fc1     RRxRS     RS         0
|-x     RR        RR         0
fc2     SRxRR     SR         0
|-gelu  RS        SR         458752
output  SR        RR         131072
Total                        589824 

[2023-05-22 06:19:31,588][INFO][solver.py:583:solve] =================== Iter 4 ===================
[2023-05-22 06:19:31,589][INFO][solver.py:594:solve] [fc1_1 = 0, fc1_0 = 2, fc2_0 = 2, fc2_1 = 0]
[2023-05-22 06:19:31,596][INFO][solver.py:517:calculate_new_cost] 
 Name    InSpec    OutSpec    Cost
------  --------  ---------  ------
fc1     SRxRR     SR         0
|-x     RR        SR         0
fc2     SRxRR     SR         0
|-gelu  SR        SR         0
output  SR        RR         131072
Total                        131072 

[2023-05-22 06:19:31,596][INFO][solver.py:583:solve] =================== Iter 5 ===================
[2023-05-22 06:19:31,597][INFO][solver.py:590:solve] Cannot find better solutions

Best solution:
sch["fc1"].sync(mode="fwd_pre", sync_op_or_fn="RR->SR")
sch["fc2"].sync(mode="fwd_post", sync_op_or_fn="SR->RR")

Checklist

  • Support attention module
  • Add HF model tests

The autosharder is still at an early stage and requires more rigorous testing, but I would like to first gather suggestions on the interface and code organization.

cc @comaniac @zarzen @whbldhwj

@chhzh123
Contributor Author

To enable testing the HF models, #94 needs to be merged first.
