Windows - RuntimeError: No available kernel. Aborting execution. #25

Open

SoftologyPro opened this issue Oct 28, 2024 · 10 comments

@SoftologyPro

Trying to get this working under Windows.

I clone the repository, create a new venv, and try to install requirements.txt. xformers fails with

Collecting xformers==0.0.28.post1
  Downloading xformers-0.0.28.post1.tar.gz (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 6.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\Jason\AppData\Local\Temp\pip-install-hg2meh3o\xformers_89fe3807baaa4f888830dbd3996a3b04\setup.py", line 24, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

If I try to install torch first, before the requirements, it still fails.
So I remove xformers from requirements.txt and let the rest of the requirements finish.
Once they are done I install xformers and torch using...

pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu121
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Then when I run single_inference I get

  0%|                                                                                                                                                                                                                                                | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\Tests\Allegro\Allegro\single_inference.py", line 99, in <module>
    single_inference(args)
  File "D:\Tests\Allegro\Allegro\single_inference.py", line 65, in single_inference
    out_video = allegro_pipeline(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\pipelines\pipeline_allegro.py", line 773, in __call__
    noise_pred = self.transformer(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\transformer_3d_allegro.py", line 331, in forward
    hidden_states = block(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 1093, in forward
    attn_output = self.attn1(
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 553, in forward
    return self.processor(
  File "D:\Tests\Allegro\Allegro\allegro\models\transformers\block.py", line 824, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

What version of xformers and torch do I need to get this to work under Windows?

@MinervaArgus

I successfully installed the correct versions of torch (the CUDA 12.4 build, through the PyTorch index) and xformers 0.0.28.post1, and I still get this error.

@Grownz

Grownz commented Oct 28, 2024

Check issue #17

@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Changing line 824 in Allegro/allegro/models/transformers/block.py
from
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
to
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
gets past the "No available kernel" error.
But it has an estimated 2 hours 40 minutes to finish on a 4090. In the end it took over 3 hours to render the 5-second video at default settings.
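
For reference, a minimal sketch of the patched call site, assuming the standard query/key/value tensor names from the surrounding attention processor (everything except the context-manager line is paraphrased, not copied from the repo):

import torch
import torch.nn.functional as F

# Original: restricts SDPA to the flash-attention backend, which has no
# usable kernel in this environment, hence "No available kernel":
#   with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
#       hidden_states = F.scaled_dot_product_attention(...)

# Patched: allow the math and memory-efficient backends instead.
# torch.backends.cuda.sdp_kernel is deprecated in favor of
# torch.nn.attention.sdpa_kernel, but it still works on torch 2.4.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
    hidden_states = F.scaled_dot_product_attention(
        query, key, value  # tensor names assumed from the surrounding method
    )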

@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Changing line 13 in single_inference.py
from
dtype=torch.bfloat16
to
dtype=torch.float16
(as also shown in #17)
gives an estimated 19 hours(!!), so do not try that change.

@SoftologyPro
Author

Are there any other possible ways we can get this down to a reasonable time on a 24GB consumer GPU?

@randaller

randaller commented Oct 29, 2024

with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):

This helped, but it takes about 4 hours to finish on a 3090 :) with --enable_cpu_offload


@SoftologyPro
Author

SoftologyPro commented Oct 29, 2024

Adding the --enable_cpu_offload argument to single_inference.py gets the estimated time down to 1 hour 40 minutes on a 24GB 4090.
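
For anyone following along: the flag is simply appended to the usual invocation (the other arguments are elided here; see the repo README or python single_inference.py --help for the full list):

python single_inference.py ... --enable_cpu_offload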

@nightsnack
Collaborator

@SoftologyPro That seems to make sense. I tested on an H100 with enable-cpu-offload; a single 100-step video takes 1h10min. That's why I wrote that the inference time will increase significantly.
Btw, do you have more than one 4090? I'm going to release the multi-card inference code. Context parallelism seems to help a lot with 4090s.
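
For readers unfamiliar with the term: context parallelism shards the sequence dimension across GPUs. Below is a toy sketch of the simplest all-gather-KV variant, assuming torch.distributed is already initialized with one process per GPU; this is not the Allegro implementation (unreleased as of this thread), just an illustration of the idea:

import torch
import torch.distributed as dist
import torch.nn.functional as F

def context_parallel_attention(q_local, k_local, v_local):
    # Each rank holds a sequence shard of shape
    # [batch, heads, seq_len // world_size, head_dim].
    world_size = dist.get_world_size()
    k_shards = [torch.empty_like(k_local) for _ in range(world_size)]
    v_shards = [torch.empty_like(v_local) for _ in range(world_size)]
    dist.all_gather(k_shards, k_local)  # every rank receives the full K
    dist.all_gather(v_shards, v_local)  # every rank receives the full V
    k_full = torch.cat(k_shards, dim=2)
    v_full = torch.cat(v_shards, dim=2)
    # Each rank attends its local query shard over the full sequence,
    # so the output stays sharded along the sequence dimension.
    return F.scaled_dot_product_attention(q_local, k_full, v_full)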

@SoftologyPro
Author

No, I only have a single 4090. This interest came from a request for me to support Allegro in Visions of Chaos. But if it takes 2 hours on the best consumer GPU it is too slow for local Windows. If some speed breakthrough is made I will be happy to include it.

@nightsnack
Collaborator

nightsnack commented Oct 30, 2024

@SoftologyPro Currently I have no idea. Distillation could reduce the number of inference steps, e.g. from 100 steps down to 4, but it harms the quality severely.
