
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32 #34

Open
jeisonmp opened this issue Jul 3, 2024 · 5 comments

jeisonmp commented Jul 3, 2024

When I run GPT-2 models, everything works fine. But when I run any of the models ridger/MMfreeLM-370M, MMfreeLM-1.3B, or MMfreeLM-2.7B, this error occurs. Why? Can anyone help me?

Error: LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
[1] 93105 IOT instruction python3 generate_text.py

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import mmfreelm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Name of the pretrained model
#name = 'ridger/MMfreeLM-370M'
name = 'ridger/MMfreeLM-1.3B'
#name = 'ridger/MMfreeLM-2.7B'
#name = 'openai-community/gpt2'

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda().half()

# input_prompt = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, "
# input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.cuda()
# outputs = model.generate(input_ids, max_length=32,  do_sample=True, top_p=0.4, temperature=0.6)
# print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

def generate_response(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs.input_ids.cuda()
    attention_mask = inputs.attention_mask.cuda()
    outputs = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=32,
        do_sample=True,
        top_p=0.4,
        temperature=0.6,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

while True:
    prompt = input("You: ")
    if prompt.lower() in ['exit', 'quit']:
        break
    response = generate_response(prompt)
    print(f"Modelo: {response}")


jeisonmp commented Jul 5, 2024

Do you know if this runs under WSL 2 on Windows 10?

@nevercast

What NVIDIA GPU are you using?

@jeisonmp

Hi @nevercast! It's an NVIDIA GeForce GTX 1050.

@nevercast

Hi!

My immediate assumption is that the GTX 1050 does not have a compute capability new enough to support this kernel function - I can validate this later for you, haven't had coffee yet.

If that is the case though, you might want to consider trying to run this project on a free Google Colab notebook with a T4 GPU attached.
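
For reference, here is a minimal sketch (assuming PyTorch is installed, which this project already requires) that prints the local GPU's compute capability so you can confirm this yourself:

import torch

# Query the first visible CUDA device, if any, and report its compute capability.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
else:
    print("No CUDA device visible")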


nevercast commented Jul 18, 2024

The GTX 1050 is compute capability 6.1. Triton (which I believe this project uses) has dropped support for anything less than 7.0.

You may be able to get a build to work, but you'd be on your own and going against the grain, so to speak.
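
As a rough sketch (the 7.0 threshold is the Triton requirement mentioned above; this check is not part of the project itself), a guard like this before loading the model fails fast with a readable message instead of the LLVM selection error:

import torch

# Minimum compute capability supported by recent Triton releases (assumption based on the comment above).
REQUIRED = (7, 0)

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < REQUIRED:
    raise RuntimeError(
        f"GPU compute capability {major}.{minor} is below {REQUIRED[0]}.{REQUIRED[1]}; "
        "the Triton kernels used here will not compile on this device."
    )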
