how to train or ft? #23

Open
dongjicheng opened this issue Jun 18, 2024 · 4 comments

Comments

@dongjicheng

No description provided.

@dongjicheng
Author

0%| | 0/1250 [00:00<?, ?it/s]
loc("/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/mmfreelm-0.1-py3.10.egg/mmfreelm/ops/hgrn/recurrent_fuse.py":105:22): error: 'arith.addf' op requires the same encoding for all operands and results
Traceback (most recent call last):
File "/mnt/jicheng/uniem-main/mmfree/match_entity_number_mmfree.py", line 325, in <module>
loss.backward()
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
return user_fn(self, *args)
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/mmfreelm-0.1-py3.10.egg/mmfreelm/utils.py", line 9, in wrapper
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/mmfreelm-0.1-py3.10.egg/mmfreelm/ops/hgrn/recurrent_fuse.py", line 167, in backward
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in run
timings = {config: self._bench(*args, config=config, **kwargs)
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 100, in <dictcomp>
timings = {config: self._bench(*args, config=config, **kwargs)
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 83, in _bench
return do_bench(kernel_call, warmup=self.warmup, rep=self.rep, quantiles=(0.5, 0.2, 0.8))
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/testing.py", line 104, in do_bench
fn()
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 81, in kernel_call
self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **current)
File "<string>", line 63, in fused_recurrent_hgrn_bwd_kernel
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/compiler/compiler.py", line 476, in compile
next_module = compile_kernel(module)
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/compiler/compiler.py", line 383, in <lambda>
lambda src: optimize_ttgir(ttir_to_ttgir(src, num_warps), num_stages, arch))
File "/mnt/anaconda3/envs/tf2/lib/python3.10/site-packages/triton/compiler/compiler.py", line 91, in optimize_ttgir
pm.run(mod)
RuntimeError: PassManager::run failed
0%|

@ridgerchu
Owner

Hi, it seems that the Triton compilation step failed. Are you running this on a CUDA device?
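One way to answer that question is to print the relevant environment details and paste them into the thread. The snippet below is only a minimal sketch, not something from this repository: `describe_environment` is a hypothetical helper name, and it assumes nothing beyond the standard `torch.cuda` calls (`is_available`, `get_device_name`, `get_device_capability`). It reports the installed `torch` and `triton` versions and whether a CUDA device is visible; it does not diagnose the `PassManager::run failed` error itself.

```python
import importlib
import importlib.util


def describe_environment():
    """Collect version and device info that is useful when reporting a Triton failure.

    Hypothetical helper for illustration; not part of mmfreelm.
    """
    info = {
        "torch": None,
        "triton": None,
        "cuda_available": False,
        "device_name": None,
        "compute_capability": None,
    }

    # Record the version of each package, or None if it is not installed.
    for name in ("torch", "triton"):
        if importlib.util.find_spec(name) is None:
            continue
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # a broken install can fail at import time
            info[name] = f"import failed: {exc}"

    # Query the GPU only if torch imported cleanly and sees a CUDA device.
    if info["torch"] is not None and not str(info["torch"]).startswith("import failed"):
        import torch

        if torch.cuda.is_available():
            info["cuda_available"] = True
            info["device_name"] = torch.cuda.get_device_name(0)
            # (major, minor) tuple, e.g. the GPU architecture generation
            info["compute_capability"] = torch.cuda.get_device_capability(0)

    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Including this output alongside the traceback makes it easier to tell whether the failure is tied to a particular GPU or to a particular Triton version.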

@hsb1995

hsb1995 commented Jun 27, 2024

@dongjicheng @ridgerchu Which Python file should be run first, and what should its parameters be set to? Could you please tell me?
When I look at the code, all I see is a package that is installed as a module, plus a "setup" file and a "generate" file, and neither of those two files works for me. Since you are asking about fine-tuning and pre-training, I thought I would ask you.

@hsb1995

hsb1995 commented Jun 27, 2024

@ridgerchu This project is highly relevant to my research topic, so I would like to learn as much about it as I can and to reproduce it.

3 participants