langchain_huggingface: Fix multiple GPU usage bug in from_model_id function (#23628)

- [ ] **Description:**
  - Pass `device_map` into `model_kwargs`.
  - Remove the unused `device_map` variable from the `hf_pipeline` function call.
- [ ] **Issue:** #13128

When using the `from_model_id` function to load a Hugging Face model for text generation across multiple GPUs, the model defaults to loading on the CPU despite multiple GPUs being available, even when called in the expected format:
``` python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device_map="auto",
)
```
Currently, to enable multiple GPUs, we have to pass the arguments in this format instead:
``` python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device=None,
    model_kwargs={
        "device_map": "auto",
    }
)
```
This issue arises from improper handling of the `device` and `device_map` parameters.

- [ ] **Explanation:**
1. In `from_model_id`, the model is created using `model_kwargs` and passed as the `model` argument of the `pipeline` function. So at that point, to load the model across multiple GPUs, `"device_map"` needs to be set to `"auto"` within `model_kwargs`; otherwise, the model defaults to loading on the CPU (see the sketch below).
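A minimal sketch of what this means at the transformers level (the model id is a placeholder; assumes `accelerate` is installed):
```python
from transformers import AutoModelForCausalLM

# Without device_map, from_pretrained places the weights on the CPU.
cpu_model = AutoModelForCausalLM.from_pretrained("model-id")

# With device_map="auto" in the load kwargs, accelerate shards the
# weights across all visible GPUs at load time. This is the call that
# from_model_id's model_kwargs ultimately feed into.
gpu_model = AutoModelForCausalLM.from_pretrained(
    "model-id",
    device_map="auto",
)
```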
2. The `device_map` variable in `from_model_id` is not utilized correctly. In the `pipeline` function's source code in transformers:
- The `device_map` variable is stored in the `model_kwargs` dictionary (lines 867-878 of transformers/src/transformers/pipelines/__init__.py).
```python
    if device_map is not None:
        # ...
        model_kwargs["device_map"] = device_map
```
- The model is constructed with `model_kwargs` containing the `device_map` value ONLY IF `model` is a string (lines 893-903 of transformers/src/transformers/pipelines/__init__.py).
```python
    if isinstance(model, str) or framework is None:
        model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
        framework, model = infer_framework_load_model( ... , **model_kwargs, )
```
- Consequently, since a model object is already passed to the `pipeline` function, the `device_map` variable from `from_model_id` is never used (illustrated in the sketch below).
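Per the analysis above, a hedged sketch of that control flow (placeholder model id):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained("model-id")  # loaded on CPU
tokenizer = AutoTokenizer.from_pretrained("model-id")

# pipeline() tucks device_map into its internal model_kwargs, but since
# `model` here is an object rather than a string, the from_pretrained
# branch quoted above is never taken and device_map has no effect.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
)
```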

3. The `device_map` variable in `from_model_id` not only goes unused but also causes errors. Without explicitly setting `device=None`, attempting to load the model on multiple GPUs may result in the following error:
```
Device has 2 GPUs available. Provide device={deviceId} to `from_model_id` to use available GPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.
Traceback (most recent call last):
  File "foo.py", line 15, in <module>
    llm = HuggingFacePipeline.from_model_id(
  File "foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py", line 217, in from_model_id
    pipeline = hf_pipeline(
  File "foo\lib\site-packages\transformers\pipelines\__init__.py", line 1108, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "foo\lib\site-packages\transformers\pipelines\text_generation.py", line 96, in __init__
    super().__init__(*args, **kwargs)
  File "foo\lib\site-packages\transformers\pipelines\base.py", line 835, in __init__
    raise ValueError(
ValueError: The model has been loaded with `accelerate` and therefore cannot be moved to a specific device. Please discard the `device` argument when creating your pipeline object.
```
This error occurs because the default values of `device` and `device_map` in `from_model_id` are -1 and None, respectively. With `device_map` supplied only through `model_kwargs` (the workaround above), the check (`device_map is not None and device < 0`) is never satisfied, so `device` stays -1, and the `pipeline` function later raises an error when trying to move a GPU-loaded model back to the CPU (see the condensed sketch after the link below).
https://github.com/langchain-ai/langchain/blob/19eb82e68bbba12f8c7cde01aba1ee92dfae2ec2/libs/community/langchain_community/llms/huggingface_pipeline.py#L204-L213
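A condensed, hedged reconstruction of the pre-fix flow (defaults and the guard are taken from the linked lines):
```python
# Pre-fix defaults in from_model_id:
device = -1        # CPU by default
device_map = None  # only set when passed as a direct argument

# The guard from the linked source only resets device when device_map is
# passed directly, so with device_map buried in model_kwargs it is skipped:
if device_map is not None and device < 0:
    device = None  # never reached in the failing scenario

# device stays -1, and transformers.pipeline() later refuses to move the
# accelerate-loaded model to the CPU, raising the ValueError above.
```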




If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: William FH <[email protected]>
Co-authored-by: Erick Friis <[email protected]>
Co-authored-by: vbarda <[email protected]>
4 people authored Oct 23, 2024
1 parent 031d0e4 commit 6d2a76a
Showing 1 changed file with 16 additions and 3 deletions.
```diff
@@ -74,7 +74,7 @@ def from_model_id(
         model_id: str,
         task: str,
         backend: str = "default",
-        device: Optional[int] = -1,
+        device: Optional[int] = None,
         device_map: Optional[str] = None,
         model_kwargs: Optional[dict] = None,
         pipeline_kwargs: Optional[dict] = None,
@@ -96,7 +96,21 @@ def from_model_id(
                 "Please install it with `pip install transformers`."
             )
 
-        _model_kwargs = model_kwargs or {}
+        _model_kwargs = model_kwargs.copy() if model_kwargs else {}
+        if device_map is not None:
+            if device is not None:
+                raise ValueError(
+                    "Both `device` and `device_map` are specified. "
+                    "`device` will override `device_map`. "
+                    "You will most likely encounter unexpected behavior."
+                    "Please remove `device` and keep "
+                    "`device_map`."
+                )
+
+            if "device_map" in _model_kwargs:
+                raise ValueError("`device_map` is already specified in `model_kwargs`.")
+
+            _model_kwargs["device_map"] = device_map
         tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
 
         try:
@@ -218,7 +232,6 @@ def from_model_id(
             model=model,
             tokenizer=tokenizer,
             device=device,
-            device_map=device_map,
             batch_size=batch_size,
             model_kwargs=_model_kwargs,
             **_pipeline_kwargs,
```
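With this change applied, the originally expected call format works as intended; a quick usage sketch (placeholder model id, assumes multiple GPUs and `accelerate` installed):
```python
from langchain_huggingface import HuggingFacePipeline

# device now defaults to None and device_map is forwarded into
# model_kwargs, so the model is sharded across all visible GPUs.
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device_map="auto",
)
print(llm.invoke("Hello"))
```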
