(1) Understanding the simulator code:
In the VitCoD.py file, the PE_height parameter is defined, but it doesn't seem to be used anywhere.
Additionally, line 173 contains _tile_q in range(math.ceil(Q.shape[1] / (args.PE_width/head))), which uses PE_width, but it seems like PE_height should be used there instead. Could you clarify this?
Also, could you please confirm whether PE_width represents the number of MAC lines and PE_height represents the number of MACs within each line?
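To make the question concrete, the tile count produced by the quoted expression can be checked in isolation. This is only a sketch and is not taken from VitCoD.py: the function name num_q_tiles and the values for the sequence length, PE_width, and head are hypothetical, chosen for illustration.

```python
import math

def num_q_tiles(q_len, pe_width, head):
    """Tile count given by the expression quoted from line 173."""
    # pe_width / head is float division, so the divisor may be fractional
    return math.ceil(q_len / (pe_width / head))

# Hypothetical values: 197 tokens, PE_width = 64, 12 heads
print(num_q_tiles(197, 64, 12))  # 37
```

With these assumed numbers the divisor 64/12 is about 5.33, so the loop runs 37 times. Whether that divisor should come from PE_width or PE_height is the point I would like clarified.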
(2) Parallelism pattern:
In the paper, the parallelism for the dense Q*K computation seems to be described as follows:
for (head)            // parallel across MAC_lines
    for (L)
        for (K)       // in the dense SpMM, K can be replaced by global_tokens
            for (dk)  // parallel across the 8 MACs
However, in the code, the parallelism appears to be:
for (L)               // parallel across MAC_lines
    for (K)
        for (head)
            for (dk)  // parallel across the 8 MACs
Could you please clarify what the actual parallelism method is?
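The two orderings above can be compared with an idealized cycle count. This is a rough sketch under my own assumptions, not the simulator's actual model: each MAC does one multiply-accumulate per cycle, memory stalls are ignored, and the function names, the 64 MAC lines, and the ViT-style sizes are all hypothetical.

```python
import math

def cycles_head_parallel(num_heads, L, K, dk, mac_lines, macs_per_line=8):
    """Heads spread across MAC lines; L and K run sequentially."""
    head_passes = math.ceil(num_heads / mac_lines)
    dk_passes = math.ceil(dk / macs_per_line)  # dk spread over the MACs of one line
    return head_passes * L * K * dk_passes

def cycles_token_parallel(num_heads, L, K, dk, mac_lines, macs_per_line=8):
    """Query tokens (L) spread across MAC lines; K and heads run sequentially."""
    token_passes = math.ceil(L / mac_lines)
    dk_passes = math.ceil(dk / macs_per_line)
    return token_passes * K * num_heads * dk_passes

# Hypothetical sizes: 12 heads, 197 tokens, dk = 64, 64 MAC lines
print(cycles_head_parallel(12, 197, 197, 64, mac_lines=64))   # 310472
print(cycles_token_parallel(12, 197, 197, 64, mac_lines=64))  # 75648
```

Under these assumptions the head-parallel order keeps only 12 of the 64 lines busy, so the two schedules give quite different latencies. That is why it matters which one the reported results are based on.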
(3) Number of MAC lines for encoding/decoding:
The paper does not seem to mention the specific number of MAC lines dedicated to encoding/decoding. Could you provide a typical value for this?
Thank you very much.