Errors after new version release #195

Open
pianogospel opened this issue Oct 11, 2024 · 15 comments

@pianogospel

I need help. I tried to get help on Discord but nobody answered. I have been using ai-toolkit since August and it worked flawlessly. I tried to install the newer version on Windows (not an update, a fresh installation from the beginning) and I followed all the steps, but it simply doesn't work and shows multiple error messages. Can anyone tell me what the prerequisites are for the latest version of ai-toolkit? I have Python 3.10.11, NVIDIA CUDA toolkit cuda_11.8.r11.8, Visual Studio 2022, and cl.exe in my environment variables, but it doesn't work. Thanks for any help.

@Anothergazz

I would like to add that I have now done a fresh install, twice, and I believe I have the exact same issue as pianogospel. Here is what the command window shows:

(venv) C:\AI\ai-toolkit>python run.py config/JL_lora_flux_24gb.yaml
Running 1 job
C:\AI\ai-toolkit\venv\lib\site-packages\albumentations\__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.18 (you have 1.4.15). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
check_for_updates()
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\mediapipe_face\mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
warnings.warn(
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
C:\AI\ai-toolkit\venv\lib\site-packages\controlnet_aux\segment_anything\modeling\tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
return register_model(fn_wrapper)
{
    "type": "sd_trainer",
    "training_folder": "output",
    "device": "cuda:0",
    "network": {
        "type": "lora",
        "linear": 16,
        "linear_alpha": 16
    },
    "save": {
        "dtype": "float16",
        "save_every": 250,
        "max_step_saves_to_keep": 4,
        "push_to_hub": false
    },
    "datasets": [
        {
            "folder_path": "C:\AI\Lora\jennytestFlux",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "shuffle_tokens": false,
            "cache_latents_to_disk": true,
            "resolution": [
                512,
                768,
                1024
            ]
        }
    ],
    "train": {
        "batch_size": 1,
        "steps": 2000,
        "gradient_accumulation_steps": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "adamw8bit",
        "lr": 0.0001,
        "ema_config": {
            "use_ema": true,
            "ema_decay": 0.99
        },
        "dtype": "bf16"
    },
    "model": {
        "name_or_path": "black-forest-labs/FLUX.1-dev",
        "is_flux": true,
        "quantize": true,
        "low_vram": true
    },
    "sample": {
        "sampler": "flowmatch",
        "sample_every": 250,
        "width": 1024,
        "height": 1024,
        "prompts": [
            "JL, a woman holding a coffee cup, in a beanie, sitting at a cafe",
            "a woman holding a coffee cup, in a beanie, sitting at a cafe"
        ],
        "neg": "",
        "seed": 42,
        "walk_seed": true,
        "guidance_scale": 4,
        "sample_steps": 20
    }
}
Using EMA
C:\AI\ai-toolkit\extensions_built_in\sd_trainer\SDTrainer.py:61: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
self.scaler = torch.cuda.amp.GradScaler()

#############################################

Running job: JL_flux_lora_v1

#############################################

Running 1 process
Loading Flux model
Loading transformer
Quantizing transformer
C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py:380: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
INFO: Could not find files for the given pattern(s).
Error running job: Command '['where', 'cl']' returned non-zero exit status 1.

========================================
Result:

  • 0 completed jobs
  • 1 failure
    ========================================
    Traceback (most recent call last):
    File "C:\AI\ai-toolkit\run.py", line 90, in <module>
    main()
    File "C:\AI\ai-toolkit\run.py", line 86, in main
    raise e
    File "C:\AI\ai-toolkit\run.py", line 78, in main
    job.run()
    File "C:\AI\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
    File "C:\AI\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1241, in run
    self.sd.load_model()
    File "C:\AI\ai-toolkit\toolkit\stable_diffusion_model.py", line 613, in load_model
    transformer.to(self.device_torch)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 273, in __torch_function__
    return func(*args, **kwargs)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 299, in __torch_dispatch__
    return WeightQBytesTensor.create(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 140, in create
    return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 80, in __init__
    data_packed = MarlinF8PackedTensor.pack(data) # pack fp8 data to in32, and apply marlier re-ordering.
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
    data_int32 = torch.ops.quanto.pack_fp8_marlin(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\_ops.py", line 1061, in __call__
    return self._op(*args, **(kwargs or {}))
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\library\extensions\cuda\__init__.py", line 164, in gptq_marlin_repack
    return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
    File "C:\AI\ai-toolkit\venv\lib\site-packages\optimum\quanto\library\extensions\extension.py", line 42, in lib
    self._lib = load(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1312, in load
    return _jit_compile(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1722, in _jit_compile
    _write_ninja_file_and_build_library(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1821, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2246, in _write_ninja_file_to_build_library
    _write_ninja_file(
    File "C:\AI\ai-toolkit\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2382, in _write_ninja_file
    cl_paths = subprocess.check_output(['where',
    File "C:\Users\MrUSB\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
    File "C:\Users\MrUSB\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

@spezialspezial

@Anothergazz Do you sport a 3090 by any chance? Just a guess from spotting quanto and fp8 in your error log.

Ampere does not support float8, and qfloat8 is hardcoded in toolkit/stable_diffusion_model.py. For example:
https://github.com/ostris/ai-toolkit/blob/ce759ebd8c653a5ac61c15c1bdacb210aa37df9e/toolkit/stable_diffusion_model.py#L611C39-L611C64
Changing all quantization to qint8 worked for me. Maybe this could be made configurable at some point.
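
In case it helps anyone who wants to try this before it is configurable, here is a minimal sketch of the swap using optimum-quanto's public API (illustrative only; the toolkit's actual code around the linked line may be structured differently, and "transformer" stands in for the loaded Flux transformer module):

from optimum.quanto import freeze, qint8, quantize

# qfloat8 depends on hardware float8 support that pre-Ada cards such as the
# 3090 lack; qint8 runs on Ampere.
quantize(transformer, weights=qint8)  # the toolkit currently uses qfloat8 here
freeze(transformer)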

@pianogospel
Author

My error is this:
Screenshot 2024-10-10 090721

@Anothergazz

pianogospel

Thanks for the suggestion, but I splashed out on a 4090 (in a moment of weakness) on Windows 11.

@Astilen

Astilen commented Oct 13, 2024

On Windows: I too can confirm that all builds published in the last few days crash after "Quantizing transformer".
I have a 4090 and 32 GB RAM.
I have tested various versions of the NVIDIA driver, CUDA, Python, Visual Studio, torch, and environment settings, all of which seem to work perfectly fine.

Request:
I would be grateful if anyone could mention the latest build that worked for them on Windows.

I like the fact that this is very experimental. You guys are doing great, take your time, and thanks!

my error:

Running 1 process
Loading Flux model
Loading transformer
Quantizing transformer
Error running job: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
: "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
^

6 errors detected in the compilation of "C:/ai-toolkit/venv/Lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.

========================================
Result:

  • 0 completed jobs
  • 1 failure
    ========================================
    Traceback (most recent call last):
    File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 2105, in _run_ninja_build
    subprocess.run(
    File "C:\Python311\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\ai-toolkit\run.py", line 90, in <module>
main()
File "C:\ai-toolkit\run.py", line 86, in main
raise e
File "C:\ai-toolkit\run.py", line 78, in main
job.run()
File "C:\ai-toolkit\jobs\ExtensionJob.py", line 22, in run
process.run()
File "C:\ai-toolkit\jobs\process\BaseSDTrainProcess.py", line 1241, in run
self.sd.load_model()
File "C:\ai-toolkit\toolkit\stable_diffusion_model.py", line 613, in load_model
transformer.to(self.device_torch)
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
param_applied = fn(param)
^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
return t.to(
^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 273, in __torch_function__
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 299, in __torch_dispatch__
return WeightQBytesTensor.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\qbytes.py", line 140, in create
return MarlinF8QBytesTensor(qtype, axis, size, stride, data, scale, requires_grad)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\qbits.py", line 80, in __init__
data_packed = MarlinF8PackedTensor.pack(data) # pack fp8 data to in32, and apply marlier re-ordering.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\tensor\weights\marlin\fp8\packed.py", line 183, in pack
data_int32 = torch.ops.quanto.pack_fp8_marlin(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\torch\_ops.py", line 1061, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\__init__.py", line 164, in gptq_marlin_repack
return ext.lib.gptq_marlin_repack(b_q_weight, perm, size_k, size_n, num_bits)
^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\extension.py", line 42, in lib
self._lib = load(
^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1312, in load
return _jit_compile(
^^^^^^^^^^^^^
File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1722, in _jit_compile
_write_ninja_file_and_build_library(
File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 1834, in _write_ninja_file_and_build_library
_run_ninja_build(
File "C:\ai-toolkit\venv\Lib\site-packages\torch\utils\cpp_extension.py", line 2121, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quanto_cuda': [1/2] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
FAILED: gemm_cuda.cuda.o
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc --generate-dependencies-with-compile --dependency-output gemm_cuda.cuda.o.d -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4068 -Xcompiler /wd4067 -Xcompiler /wd4624 -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -IC:\ai-toolkit\venv\Lib\site-packages\torch\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\TH -IC:\ai-toolkit\venv\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -IC:\Python311\Include -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=sm_89 -std=c++17 --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=890 -c C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu -o gemm_cuda.cuda.o
C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(95): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(98): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(104): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(107): error: expected a ")"
: "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[0]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[1]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[2]), "=r"(((unsigned *)(shared_warp + (ax0_0 * 8)))[3])
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(126): error: identifier "asm" is undefined
asm volatile(
^

C:\ai-toolkit\venv\Lib\site-packages\optimum\quanto\library\extensions\cuda\awq\v2\gemm_cuda.cu(129): error: expected a ")"
: "=f"(((float *)C_warp)[0]), "=f"(((float *)C_warp)[1]), "=f"(((float *)C_warp)[2]), "=f"(((float *)C_warp)[3])
^

6 errors detected in the compilation of "C:/ai-toolkit/venv/Lib/site-packages/optimum/quanto/library/extensions/cuda/awq/v2/gemm_cuda.cu".
gemm_cuda.cu
ninja: build stopped: subcommand failed.

(venv) C:\ai-toolkit>python
Python 3.11.3 (tags/v3.11.3:f3909b8, Apr 4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> import numpy
>>> print(torch.__version__)
2.4.1+cu121
>>> print(numpy.__version__)
1.26.3

@nebooz

nebooz commented Oct 13, 2024

On Windows: I too can confirm that all builds published in the last few days crash after "Quantizing transformer". I have a 4090 and 32 GB RAM. I have tested various versions of the NVIDIA driver, CUDA, Python, Visual Studio, torch, and environment settings, all of which seem to work perfectly fine.

I think I was running into the same issue, also with a 4090 and 32 GB of RAM. I was looking for something to correct since I'd been messing with CUDA toolkit and torch versions for a few hours. Ran into this comment: #169 (comment) --- I downgraded to optimum-quanto 0.2.4, the errors are gone, and the batch is running now. 10% in, but hoping it will be fine.

@sanctimon

Same issue here on a 3090. Previous version of ai-toolkit still working fine.

@Astilen

Astilen commented Oct 14, 2024

I think I was running into the same issue, also with a 4090 and 32 GB of RAM. I was looking for something to correct since I'd been messing with CUDA toolkit and torch versions for a few hours. Ran into this comment: #169 (comment) --- I downgraded to optimum-quanto 0.2.4, the errors are gone, and the batch is running now. 10% in, but hoping it will be fine.

SUCCESS!
Your suggestion worked. I trained on 70 images for over 4 hours and I am very happy with the outcome.
I'll list some details of my build for those still seeking answers.

The latest build as of October 14th worked for me (RTX 4090 + 32 GB RAM) using:
CUDA 12.6.2
NVIDIA 565.90
cuDNN 9.5.0

You may need to adjust some directories under System Properties > Advanced > Environment Variables if the above installations have not done it automatically.

You also need Visual Studio 2022 (not just the build tools).
Ensure that "MSVC v143 - VS 2022 C++ x64/x86 build tools" (or v142 for VS 2019) is selected during installation.
Under "Individual components", ensure you have "C++ CMake tools for Windows" and "Windows 10 SDK (10.0.19041)" checked.

This is the list of all installed packages and their versions within the environment. No more than a couple of them may have been unnecessarily added or altered during troubleshooting. If you don't know what to do with this list, give it to an AI and ask it to compare with yours:

absl-py==2.1.0
accelerate==1.0.0
aiofiles==23.2.1
albucore==0.0.16
albumentations==1.4.15
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.6.0
attrs==24.2.0
bitsandbytes==0.44.1
certifi==2024.8.30
charset-normalizer==3.4.0
clean-fid==0.1.35
click==8.1.7
clip-anytorch==2.6.0
colorama==0.4.6
controlnet-aux==0.0.7
dctorch==0.1.2
diffusers @ git+https://github.com/huggingface/diffusers.git@38a3e4df926c59bc122191c0fc8066755e98b6d2
docker-pycreds==0.4.0
einops==0.8.0
eval_type_backport==0.2.0
fastapi==0.115.0
ffmpy==0.4.0
filelock==3.13.1
flatten-json==0.1.14
fsspec==2024.2.0
ftfy==6.2.3
gitdb==4.0.11
GitPython==3.1.43
gradio==5.0.1
gradio_client==1.4.0
grpcio==1.66.2
h11==0.14.0
hf_transfer==0.1.8
httpcore==1.0.6
httpx==0.27.2
huggingface-hub==0.25.2
idna==3.10
imageio==2.35.1
importlib_metadata==8.5.0
invisible-watermark==0.2.0
Jinja2==3.1.3
jsonmerge==1.9.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
k-diffusion==0.1.1.post1
kornia==0.7.3
kornia_rs==0.1.5
lazy_loader==0.4
lpips==0.1.4
lycoris-lora==1.8.3
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
omegaconf==2.3.0
open_clip_torch==2.26.1
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
optimum-quanto==0.2.4
orjson==3.10.7
oyaml==1.0
packaging==24.1
pandas==2.2.3
peft==0.13.1
pillow==10.2.0
platformdirs==4.3.6
prodigyopt==1.0
protobuf==5.28.2
psutil==6.0.0
pydantic==2.9.2
pydantic_core==2.23.4
pydub==0.25.1
Pygments==2.18.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.12
python-slugify==8.0.4
pytorch-fid==0.3.0
pytz==2024.2
PyWavelets==1.7.0
PyYAML==6.0.2
referencing==0.35.1
regex==2024.9.11
requests==2.32.3
rich==13.9.2
rpds-py==0.20.0
ruff==0.6.9
safetensors==0.4.5
scikit-image==0.24.0
scipy==1.14.1
semantic-version==2.10.0
sentencepiece==0.2.0
sentry-sdk==2.16.0
setproctitle==1.3.3
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
starlette==0.38.6
sympy==1.12
tensorboard==2.18.0
tensorboard-data-server==0.7.2
text-unidecode==1.3
tifffile==2024.9.20
timm==1.0.9
tokenizers==0.20.1
toml==0.10.2
tomlkit==0.12.0
torch==2.4.1+cu121
torchdiffeq==0.2.4
torchsde==0.2.6
torchvision==0.19.1+cu121
tqdm==4.66.5
trampoline==0.1.2
transformers==4.45.2
typer==0.12.5
typing_extensions==4.9.0
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.31.1
wandb==0.18.3
wcwidth==0.2.13
websockets==12.0
Werkzeug==3.0.4
zipp==3.20.2

IMPORTANT: As the amazing person above has advised us, optimum-quanto 0.2.5 must be downgraded to optimum-quanto==0.2.4.

I don't know whether this next command enabled my build to work or not, but here it is. After entering your environment, and before you run your final command to begin training, you can enter this command to circumvent a related warning (your GPU may require a different number):
set TORCH_CUDA_ARCH_LIST=8.9
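
If you are unsure which number your GPU needs, torch can report the compute capability directly; running this inside the venv prints (8, 9) on a 4090, which maps to 8.9:

python -c "import torch; print(torch.cuda.get_device_capability(0))"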

This was all done in Windows CMD in administrator mode to allow caching of all downloads; this is useful for reattempting the installation without downloading the same things again.

Finally, at some point the build failed because of a "long path" type of error. Simply install this project close to your drive's root.

Thanks everyone!

@pianogospel
Author

One more question: how do I downgrade optimum-quanto from 0.2.5 to 0.2.4?

@inflamously

One more question: how do I downgrade optimum-quanto from 0.2.5 to 0.2.4?

Either pip install optimum-quanto==0.2.4 or change the line in requirements.txt. Pip will tell you if you have dependencies that mismatch; you'll need to adjust them.
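
For anyone following along, a minimal sequence, assuming the venv is already activated (pip check afterwards surfaces any dependency conflicts the downgrade introduces):

pip install optimum-quanto==0.2.4
pip check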

@pianogospel
Author

One more question: how do I downgrade optimum-quanto from 0.2.5 to 0.2.4?

Either pip install optimum-quanto==0.2.4 or change the line in requirements.txt. Pip will tell you if you have dependencies that mismatch; you'll need to adjust them.

Thanks inflamously, works flawlessly

@ironico

ironico commented Oct 15, 2024

Hi,
After the update, it stopped working.
I deleted everything, cloned it again, and installed everything with optimum-quanto==0.2.4, but now I have a different error, as shown in the screenshot.
Does anyone have any suggestions to resolve this?

Screenshot 2024-10-15 154944

@dene-

dene- commented Oct 15, 2024

Downgrade the timm module: pip install timm==1.0.8 and it'll work again. @ironico

Should all dependencies in requirements.txt be frozen? That would avoid future problems like this.
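
For what it's worth, one way to capture a known-good environment as exact pins (the lock-file name here is just an example) is:

pip freeze > requirements-lock.txt
pip install -r requirements-lock.txt

Whether the project's main requirements.txt should ship exact pins is of course the maintainer's call.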

@ironico

ironico commented Oct 15, 2024

Hi @dene-, thank you!
It works now.

@hben35096

I use pip install "timm<=0.9.5" (quoted so the shell doesn't treat the version specifier as a redirect).
