fix multinomial sampling #1228

Merged 2 commits into InternLM:main on Mar 3, 2024
Conversation

grimoire (Collaborator) commented Mar 2, 2024

No description provided.
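
For context only: "multinomial sampling" here means drawing the next token at random from the softmax distribution over the logits, as opposed to greedy argmax. A minimal standalone sketch of that pattern follows; the function name and shapes are hypothetical, and this is neither the lmdeploy implementation nor the patch in this PR.

import torch


def sample_multinomial(logits: torch.Tensor,
                       temperature: float = 1.0,
                       top_k: int = 0) -> torch.Tensor:
    """Draw one token id per batch row from the softmax distribution."""
    scores = logits / max(temperature, 1e-6)
    if top_k > 0:
        # keep only the top-k logits; everything else gets probability 0
        topk_vals, _ = scores.topk(top_k, dim=-1)
        scores = scores.masked_fill(scores < topk_vals[..., -1:], float('-inf'))
    probs = torch.softmax(scores, dim=-1)
    return torch.multinomial(probs, num_samples=1)


token_ids = sample_multinomial(torch.randn(2, 32000), temperature=0.6, top_k=40)
print(token_ids.shape)  # torch.Size([2, 1])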

lvhan028 (Collaborator) commented Mar 2, 2024

Pipeline test results:

falcon-7b, tp=2 failed (CUDA out of memory):

2024-03-02 15:45:58,440 - lmdeploy - ERROR - rank[0] failed with error: CUDA out of memory. Tried to allocate 1.73 GiB. GPU 0 has a total capacty of 79.21 GiB of which 1.47 GiB is free. Process 140034 has 1016.00 MiB memory in use. Process 140106 has 76.75 GiB memory in use. Of the allocated memory 74.76 GiB is allocated by PyTorch, and 424.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
internlm-chat-7b, tp=2, repetition_penalty=1.002 failed
internlm-chat-20b, tp=2, repetition_penalty=1.002 failed
baichuan2/Baichuan2-7B-Chat, tp=2, repetition_penalty=1.002 failed
baichuan2/Baichuan2-13B-Chat, tp=2, repetition_penalty=1.002 failed
chatglm2-6b, tp=2, repetition_penalty=1.002 failed
chatglm3-6b, tp=2, repetition_penalty=1.002 failed
gemma, tp=2, repetition_penalty=1.002 failed

All of the repetition_penalty failures hit the same error:

File "/workspace/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 49, in _process_repetition_penalty
    scores.scatter_(1, input_ids, score)
RuntimeError: scatter(): Expected self.dtype to be equal to src.dtype
2024-03-02 15:56:07,688 - lmdeploy - ERROR - Engine main loop stopped.
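
For reference, scatter_ requires the source tensor to have the same dtype as the destination, so this error shows up whenever the penalized scores get promoted (e.g. to fp32) while the logits stay in fp16. Below is a minimal standalone reproduction and the usual fix of casting back before the scatter; the tensor names are hypothetical and this is not necessarily the change made in this PR.

import torch

logits = torch.randn(2, 8, dtype=torch.float16)        # e.g. fp16 model output
input_ids = torch.tensor([[1, 3], [2, 5]])

penalty = torch.full((2, 1), 1.002)                    # fp32 penalty tensor
score = torch.gather(logits, 1, input_ids)             # fp16
score = torch.where(score < 0, score * penalty, score / penalty)  # promoted to fp32

# logits.scatter_(1, input_ids, score)                 # RuntimeError: Expected self.dtype to be equal to src.dtype
logits.scatter_(1, input_ids, score.to(logits.dtype))  # cast src back to the logits dtype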

Sharing the pipeline test script:

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig
import multiprocessing

models = [
    ('llama2', '/workspace/models-140/llama2/huggingface/llama-2-7b-chat/'),
    ('llama2', '/workspace/models-140/llama2/huggingface/llama-2-13b-chat/'),
    ('internlm2-chat-7b', '/workspace/models-140/InternLM/internlm2-chat-7b'),
    ('internlm2-chat-20b', '/workspace/models-140/InternLM/internlm2-chat-20b'),
    ('internlm-chat-7b', '/workspace/models-140/InternLM/internlm-chat-7b'),
    ('internlm-chat-20b', '/workspace/models-140/InternLM/internlm-chat-20b'),
    # ('qwen-7b', '/workspace/models-140/Qwen/Qwen-7B-Chat/'), # not supported yet
    # ('qwen-14b', '/workspace/models-140/Qwen/Qwen-14B-Chat/'), # not supported yet
    ('qwen1.5', '/workspace/models-140/Qwen/Qwen1.5-7B-Chat/'),
    # ('baichuan', '/workspace/models-140/baichuan/Baichuan-13B-Chat/'), # installed transformers version is too new
    ('baichuan2', '/workspace/models-140/baichuan2/Baichuan2-7B-Chat/'),
    ('baichuan2', '/workspace/models-140/baichuan2/Baichuan2-13B-Chat/'),
    ('codellama', '/workspace/models-140/codellama/CodeLlama-7b-Instruct-hf/'),
    ('chatglm2', '/workspace/models-140/chatglm2-6b/'),
    ('chatglm3', '/workspace/models-140/chatglm3-6b/'),
    ('falcon', '/workspace/models-142/models/falcon-7b-instruct/'),
    ('yi', '/workspace/models-140/Yi/Yi-34B-Chat/'),
    ('mistral', '/workspace/models-140/mistralai/models--mistralai--Mistral-7B-Instruct-v0.1/snapshots/9ab9e76e2b09f9f29ea2d56aa5bd139e4445c59e'),
    ('deepseek', '/workspace/models-140/deepseek/deepseek-coder-1.3b-instruct'),
    ('mixtral', '/workspace/models-140/mistralai/Mixtral-8x7B-Instruct-v0.1/'),
    ('gemma', '/workspace/models-140/Gemma/gemma-7b-it')
]


def test_pipeline(model_path, prompts, **kwargs):
    print(f'-- start to test model: {model_path}')
    try:
        if kwargs:
            print(f'kwargs: {kwargs}')
            # route each kwarg to the engine config or the generation config
            backend_config = PytorchEngineConfig()
            gen_config = GenerationConfig()
            for k, v in kwargs.items():
                if hasattr(backend_config, k):
                    setattr(backend_config, k, v)
                if hasattr(gen_config, k):
                    setattr(gen_config, k, v)
            print(backend_config)
        else:
            print('empty kwargs')
            backend_config = PytorchEngineConfig()
            gen_config = None
        pipe = pipeline(model_path, backend_config=backend_config, log_level='INFO')
        response = pipe(prompts, gen_config=gen_config)
        print(response)
        print('-- test succeeded')
    except Exception as e:
        print(f'-- test model failed with {e}')
        raise RuntimeError('build pipe failed') from e


if __name__ == '__main__':
    # pytorch engine default parameters
    for model_name, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        if model_name == 'mixtral':
            # at least 2 GPUs are required
            continue
        proc = multiprocessing.Process(target=test_pipeline, args=args)
        proc.start()
        proc.join()

    # pytorch engine tp
    for _, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        proc = multiprocessing.Process(target=test_pipeline, args=args, kwargs=dict(tp=2))
        proc.start()
        proc.join()

    # generate config
    for _, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        proc = multiprocessing.Process(
            target=test_pipeline,
            args=args,
            kwargs=dict(tp=2,
                        top_k=40,
                        top_p=0.8,
                        temperature=0.6,
                        repetition_penalty=1.002))
        proc.start()
        proc.join()

grimoire (Collaborator, Author) commented Mar 3, 2024

The falcon tp error will be fixed in another PR.

lvhan028 merged commit 79ac87b into InternLM:main on Mar 3, 2024
5 checks passed