Windows - RuntimeError: No available kernel. Aborting execution. #25

Open

SoftologyPro opened this issue Oct 28, 2024 · 10 comments

@SoftologyPro

Trying to get this working under Windows.

I clone the repository, create a new venv, and try to install requirements.txt. xformers fails with

Collecting xformers==0.0.28.post1
  Downloading xformers-0.0.28.post1.tar.gz (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 6.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Jason\AppData\Local\Temp\pip-install-hg2meh3o\xformers_89fe3807baaa4f888830dbd3996a3b04\setup.py", line 24, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

If I try to install torch first, before the requirements, it still fails.
So I remove xformers from requirements.txt and let the rest of the requirements finish.
Once they are done I install xformers and torch using...

pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu121
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Then when I run single_inference I get

  0%|                                                                                                                                                                                                                                                | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Tests\Allegro\Allegro\single_inference.py", line 99, in <module>
    single_inference(args)
  File "D:\Tests\Allegro\Allegro\single_inference.py", line 65, in single_inference
    out_video = allegro_pipeline(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\pipelines\pipeline_allegro.py", line 773, in __call__
    noise_pred = self.transformer(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\transformer_3d_allegro.py", line 331, in forward
    hidden_states = block(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 1093, in forward
    attn_output = self.attn1(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 553, in forward
    return self.processor(
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 824, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

What version of xformers and torch do I need to get this to work under Windows?

@MinervaArgus

I successfully installed the correct versions of torch (the CUDA 12.4 build, through the PyTorch index) and xformers 0.0.28.post1, and I still get this error.

@Grownz

Grownz commented Oct 28, 2024

Check issue #17

@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Changing line 824 in Allegro/allegro/models/transformers/block.py
from
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
to
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
gets past the "No available kernel" error.
But it has an estimated 2 hours 40 minutes to finish on a 4090. In the end it took over 3 hours to render the 5-second video at default settings.
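
For reference, a minimal sketch of the patched call site, assuming the standard query/key/value tensor names from the surrounding attention processor (everything except the context-manager line is paraphrased, not copied from the repo):

import torch
import torch.nn.functional as F

# Original: restricts SDPA to the flash-attention backend, which has no
# usable kernel in this environment, hence "No available kernel":
#   with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
#       hidden_states = F.scaled_dot_product_attention(...)

# Patched: allow the math and memory-efficient backends instead.
# torch.backends.cuda.sdp_kernel is deprecated in favor of
# torch.nn.attention.sdpa_kernel, but it still works on torch 2.4.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
    hidden_states = F.scaled_dot_product_attention(
        query, key, value  # tensor names assumed from the surrounding method
    )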

@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Changing line 13 in single_inference.py
from
dtype=torch.bfloat16
to
dtype=torch.float16
(as also shown in #17)
gives an estimated 19 hours(!!), so do not try that change.

@SoftologyPro
Author

Are there any other possible ways we can get this down to a reasonable time on a 24GB consumer GPU?

@randaller

randaller commented Oct 29, 2024

with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):

This helped, but it takes about 4 hours to finish on a 3090 :) with --enable_cpu_offload


@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Adding the --enable_cpu_offload argument to single_inference.py gets the estimated time down to 1 hour 40 minutes on a 24GB 4090.
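
For anyone following along: the flag is simply appended to the usual invocation (the other arguments are elided here; see the repo README or python single_inference.py --help for the full list):

python single_inference.py ... --enable_cpu_offload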

@nightsnack
Collaborator

@SoftologyPro That seems to make sense. I tested on an H100 with enable-cpu-offload; a single 100-step video takes 1h10min. That's why I wrote that the inference time will increase significantly.
Btw, do you have more than one 4090? I'm going to release the multi-card inference code. Context parallelism seems to help a lot with 4090s.
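
For readers unfamiliar with the term: context parallelism shards the sequence dimension across GPUs. Below is a toy sketch of the simplest all-gather-KV variant, assuming torch.distributed is already initialized with one process per GPU; this is not the Allegro implementation (unreleased as of this thread), just an illustration of the idea:

import torch
import torch.distributed as dist
import torch.nn.functional as F

def context_parallel_attention(q_local, k_local, v_local):
    # Each rank holds a sequence shard of shape
    # [batch, heads, seq_len // world_size, head_dim].
    world_size = dist.get_world_size()
    k_shards = [torch.empty_like(k_local) for _ in range(world_size)]
    v_shards = [torch.empty_like(v_local) for _ in range(world_size)]
    dist.all_gather(k_shards, k_local)  # every rank receives the full K
    dist.all_gather(v_shards, v_local)  # every rank receives the full V
    k_full = torch.cat(k_shards, dim=2)
    v_full = torch.cat(v_shards, dim=2)
    # Each rank attends its local query shard over the full sequence,
    # so the output stays sharded along the sequence dimension.
    return F.scaled_dot_product_attention(q_local, k_full, v_full)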

@SoftologyPro
Author

No, I only have a single 4090. This interest came from a request for me to support Allegro in Visions of Chaos. But if it takes 2 hours on the best consumer GPU it is too slow for local Windows. If some speed breakthrough is made I will be happy to include it.

@nightsnack
Collaborator

nightsnack commented Oct 30, 2024

@SoftologyPro Currently I have no idea. Distillation could reduce the number of inference steps, e.g. from 100 steps down to 4, but it harms the quality severely.
