"RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal" for GPUs with compute capability 8.6 and higher #26

Open
rachelselinar opened this issue Apr 2, 2024 · 0 comments

rachelselinar commented Apr 2, 2024

Running DREAMPlaceFPGA on GPUs with compute capability 8.6 or higher fails with a CUDA runtime error during LUT/FF legalization. Output from the FPGA-example1 run:

Preclusters: 829 (819 + 10) Initialization completed in 0.074 seconds 

Traceback (most recent call last): 
  File "dreamplacefpga/Placer.py", line 120, in <module> 
    placeFPGA(params) 
  File "dreamplacefpga/Placer.py", line 44, in placeFPGA 
    metrics = placer(params, placedb) 
  File "/DREAMPlaceFPGA/dreamplacefpga/NonLinearPlace.py", line 793, in __call__ 
    self.op_collections.lut_ff_legalization_op.runDLIter(self.pos[0], model.precondWL[:placedb.num_physical_nodes], sortedNodeMap, sortedNodeIdx, sortedNetMap, sortedNetIdx, sortedPinMap, activeStatus, illegalStatus, dlIter) 
  File "/DREAMPlaceFPGA/dreamplacefpga/ops/lut_ff_legalization/lut_ff_legalization.py", line 323, in runDLIter 
    lut_ff_legalization_cuda.runDLIter(pos, self.pin_offset_x, self.pin_offset_y, self.net_bbox, self.net_pinIdArrayX,  
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal 

Cause:
Similar to issue.
The error is caused by the use of Thrust libraries in lut_ff_legalization_cuda_kernel.cu. It was observed on GPU machines with compute capability 8.6 and 8.9.
The CUDA runtime error does not occur on GPU machines with compute capability 8.0 or lower.
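
To check whether a machine falls in the affected range before starting a run, something like the following could be used. This is a minimal sketch, not part of DREAMPlaceFPGA: the helper names `is_affected` and `current_device_capability` are hypothetical, and the 8.6 threshold is an assumption taken from the observations above (8.6 and 8.9 fail, 8.0 or lower works). Capabilities above 8.9 were not tested in this report and are only assumed to be affected.

```python
from typing import Optional, Tuple

# Assumption from the report above: failures were seen on 8.6 and 8.9,
# and not on 8.0 or lower. Anything above 8.9 is untested here.
FIRST_AFFECTED: Tuple[int, int] = (8, 6)


def is_affected(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) is at or above the first affected capability."""
    return tuple(capability) >= FIRST_AFFECTED


def current_device_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()
```

Calling `is_affected(current_device_capability())` when the second helper returns a tuple gives a quick indication of whether the workaround below is likely to be needed.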

Current workaround:
In lut_ff_legalization_cuda_kernel.cu, use the single kernel for runDLIter instead of the split-kernel approach with rearranging:
- Comment out the Thrust libraries
- Comment out the split kernels for DL
- Uncomment the single DL kernel
With a single kernel for direct legalization (DL), the sites are no longer rearranged in descending order of the number of new site candidates to be explored, which incurs a minimal runtime increase.
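
For readers unfamiliar with the rearrangement that the workaround gives up, the ordering itself can be illustrated on the CPU. This is only a sketch of the idea in plain Python, not the Thrust/CUDA code in lut_ff_legalization_cuda_kernel.cu; the function name and the per-site counts are hypothetical.

```python
from typing import List, Sequence


def order_sites_by_candidates(num_new_candidates: Sequence[int]) -> List[int]:
    """Return site indices sorted by descending number of new candidates.

    The sort is stable, so sites with equal counts keep their original order.
    """
    return sorted(
        range(len(num_new_candidates)),
        key=lambda site: -num_new_candidates[site],
    )


# Hypothetical counts of new site candidates for five sites.
counts = [2, 7, 0, 7, 3]
print(order_sites_by_candidates(counts))  # [1, 3, 4, 0, 2]
```

With the single-kernel workaround, sites would simply be visited in their original index order (0, 1, 2, ...) instead of this sorted order.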

Opening this issue to track and provide a fix.

rachelselinar self-assigned this Apr 2, 2024