[BUG] Issue with Zero Optimization for Llama-2-7b Fine-Tuning on Intel GPUs #6713
Comments
@delock, can you please help? Thanks!
@molang66 Hi, I reran the cmd that you pasted in this issue and no such error appeared, so I suspect a version mismatch or an outdated component. I verified the cmd with the following versions: Ubuntu 22.04.2 LTS. Can you provide more details about your development environment? Or you can try using my verified versions :)
Hi @tjruwase, @Liangliang-Ma will follow up with this issue. Thanks!
Thanks so much for the help. I have updated my CCL version, and now I am encountering this issue:
I was running on the Stampede3 cluster, and my environment is as follows:
GPU driver:
Do you have the latest version of deepspeed? I have seen a similar issue with an outdated deepspeed.
@Liangliang-Ma My deepspeed version is 0.15.3. I think this is the latest version.
Could it be my GPU driver version? I don't know what the latest version of the driver is.
@Liangliang-Ma Thank you for your response. I'd like to know which command checks the GPU driver version; I didn't see any indication of the rolling stable version.
Is this normal? Compilation worked fine with version 24.2.1.
@molang66 You can check with dpkg -l | grep -P "intel|level-zero|libigc|libigd|libigf|opencl" to see the installed components. If you install the same GPU driver version as mine, you will see output like this: ii intel-fw-gpu 2024.24.5-337-22.04. And we suggest you keep using oneAPI 2024.2.1 with ipex 2.3.110, because these versions currently match each other.
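Alongside the dpkg check for the driver stack, a quick way to report the Python-side versions being compared in this thread is via `importlib.metadata`. This is a minimal sketch; the package names are assumptions based on the usual pip distribution names for DeepSpeed, PyTorch, IPEX, and the oneCCL bindings:

```python
# Print installed versions of the relevant Python packages, if present.
# Package names below are assumed pip distribution names, not confirmed by this thread.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("deepspeed", "torch", "intel-extension-for-pytorch", "oneccl-bind-pt"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Pasting this output into the issue makes it easier to spot a mismatch like the outdated CCL version found earlier.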
Describe the bug
I’m experiencing an issue when fine-tuning the Llama-2-7b model from Hugging Face with Zero optimization enabled. I am running on 8 Intel Max 1550 GPUs using the code from the examples provided in Intel Extension for DeepSpeed.
The model loads and runs successfully without Zero optimization, but when I enable Zero optimization (particularly with stage 3), I encounter the following errors:
[rank0]: RuntimeError: could not create an engine
2024:11:05-02:39:09:(678567) |CCL_INFO| finalizing level-zero
2024:11:05-02:39:09:(678567) |CCL_INFO| finalized level-zero
0%| | 0/50 [00:00<?, ?it/s]
2024:11:05-02:39:09:(678572) |CCL_INFO| finalizing level-zero
2024:11:05-02:39:09:(678566) |CCL_INFO| finalizing level-zero
...
[2024-11-05 02:39:10,447] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 678572
**System info**
Model: Llama-2-7b from Hugging Face
GPUs: 8x Intel Max 1550 GPUs
Software:
• Intel Extension for PyTorch
• DeepSpeed with Zero Optimization (Stage 3)
• oneCCL for communication backend
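For reference, a minimal ZeRO stage 3 configuration in the spirit of the `ds_config_zero3.json` passed to the launcher below might look like this. This is a sketch, not the exact file from the transformers repo; the specific field values are assumptions:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

The `"auto"` values let the HuggingFace Trainer fill in batch-size settings from its own command-line arguments.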
Launcher context
```shell
cd transformers
deepspeed --num_gpus=8 examples/pytorch/language-modeling/run_clm.py \
  --deepspeed tests/deepspeed/ds_config_zero3.json \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --dataloader_num_workers 0 \
  --per_device_train_batch_size 1 \
  --warmup_steps 10 \
  --max_steps 50 \
  --bf16 \
  --do_train \
  --output_dir /tmp/test-clm \
  --overwrite_output_dir
```