HuggingFace Endpoint Inference Model Deployer #86

Merged

Changes from all commits (32 commits)
466f638
Add skeleton for huggingface inference endpoint custom deployment
dudeperf3ct Jan 16, 2024
b7aa875
Improve typing and reuse variable for timeout
dudeperf3ct Jan 16, 2024
2c8bb2f
Update deployment step to use custom model deployer
dudeperf3ct Jan 16, 2024
70a6bca
Fix incorrect return type for step
dudeperf3ct Jan 16, 2024
1c3028b
Address PR review comments
dudeperf3ct Jan 16, 2024
8b949cb
Test run deployment pipeline
dudeperf3ct Jan 17, 2024
50feb19
Use latest deployment step and test the custom deployer
dudeperf3ct Jan 19, 2024
7fafffd
Refactor and extend model deployer implementation
dudeperf3ct Jan 21, 2024
4b01ed5
Merge 'main' branch into 'feature/zencoder-model-deployer' branch
dudeperf3ct Jan 21, 2024
8da2698
Update readme with new commands
dudeperf3ct Jan 21, 2024
53fd3ef
Edited config + changed step
htahir1 Jan 22, 2024
b1e206e
Test running the deployment pipeline
dudeperf3ct Jan 22, 2024
39a4a4e
Add logic for 'find_model_server' abstract function
dudeperf3ct Jan 23, 2024
455439a
Basic logic
htahir1 Jan 23, 2024
03d937a
Error handle and update find_model_server function
dudeperf3ct Jan 23, 2024
ba482cd
Update save_artifact function to use is_deployment_artifact
dudeperf3ct Jan 23, 2024
447377f
Update logic in find_model_server
dudeperf3ct Jan 23, 2024
37ef007
Modify HuggingFaceBaseConfig to contain optional fields
dudeperf3ct Jan 23, 2024
e1f9c34
Fetch metadata artifact and test
dudeperf3ct Jan 23, 2024
0abaa15
Update docstrings and handle circular condition when deprovision
dudeperf3ct Jan 24, 2024
0bef8b0
Address PR review comments
dudeperf3ct Jan 24, 2024
f517951
Update docstrings
dudeperf3ct Jan 24, 2024
df505af
Update comments
dudeperf3ct Jan 24, 2024
a552068
remove generate_random_letters function
dudeperf3ct Jan 24, 2024
1a93d39
Set default to False in deploy_model
dudeperf3ct Jan 24, 2024
c86b3cb
Fix bug in find_model_server
dudeperf3ct Jan 24, 2024
bd3bd90
Fix endpoint name when replacing the service
dudeperf3ct Jan 25, 2024
d8febd3
Add logger error message
dudeperf3ct Jan 25, 2024
6b9c448
Modify HuggingfaceModelDeployerConfig class
dudeperf3ct Jan 25, 2024
f305f4a
Fix bug in get_model_server_info function
dudeperf3ct Jan 25, 2024
fd12e00
Update syntax for fetching artifact
dudeperf3ct Jan 25, 2024
ec4f16a
Merge branch 'main' into feature/zencoder-huggingface-model-deployer
htahir1 Jan 26, 2024
47 changes: 43 additions & 4 deletions llm-finetuning/README.md
@@ -78,13 +78,51 @@ python run.py --training-pipeline --config finetune_gcp.yaml

# Deployment
python run.py --deployment-pipeline --config <NAME_OF_CONFIG_IN_CONFIGS_FOLDER>
python run.py --deployment-pipeline --config finetune_gcp.yaml
python run.py --deployment-pipeline --config deployment_a100.yaml
```

The `feature_engineering` and `deployment` pipelines can be run with the `default` stack, but the [stack](https://docs.zenml.io/user-guide/production-guide/understand-stacks) used for the training pipeline will depend on the config.

The `deployment` pipeline relies on the `training_pipeline` having been run beforehand.

## :cloud: Deployment

We have created a custom ZenML model deployer for deploying models to Hugging Face Inference Endpoints. The code for the custom deployer lives in the [huggingface](./huggingface/) folder.

To run the deployment pipeline, we create a custom ZenML stack. Since we are using a custom model deployer, we first register its flavor and the model deployer component, and then update the stack to use it for the deployment pipeline.

```bash
zenml init
zenml stack register zencoder_hf_stack -o default -a default
zenml stack set zencoder_hf_stack
export HUGGINGFACE_USERNAME=<here>
export HUGGINGFACE_TOKEN=<here>
export NAMESPACE=<here>
zenml secret create huggingface_creds --username=$HUGGINGFACE_USERNAME --token=$HUGGINGFACE_TOKEN
zenml model-deployer flavor register huggingface.hf_model_deployer_flavor.HuggingFaceModelDeployerFlavor
```

Afterward, you should see the new flavor in the list of available flavors:

```bash
zenml model-deployer flavor list
```

Register the model deployer component and add it to the current stack:

```bash
zenml model-deployer register hfendpoint --flavor=hfendpoint --token=$HUGGINGFACE_TOKEN --namespace=$NAMESPACE
zenml stack update zencoder_hf_stack -d hfendpoint
```

Run the deployment pipeline using the CLI:

```shell
# Deployment
python run.py --deployment-pipeline --config <NAME_OF_CONFIG_IN_CONFIGS_FOLDER>
python run.py --deployment-pipeline --config deployment_a100.yaml
```
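
Once the deployment pipeline has finished, the provisioned Inference Endpoint can be queried directly. The snippet below is a minimal sketch (not part of this repository) using the `huggingface_hub` client; the endpoint URL is a placeholder, and the token is assumed to be the same one exported as `HUGGINGFACE_TOKEN` above:

```python
# Minimal sketch of querying the provisioned Inference Endpoint.
# The endpoint URL below is a placeholder -- substitute the URL reported
# for your own endpoint; the token is assumed to be $HUGGINGFACE_TOKEN.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder
    token=os.environ["HUGGINGFACE_TOKEN"],
)

# The deployed model serves `text-generation` (see the deployment configs),
# so `text_generation` is the matching call.
print(client.text_generation("def deployment_pipeline(", max_new_tokens=64))
```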

## 🥇 Recent developments

A working prototype has been trained and deployed as of Jan 19, 2024. The model was fine-tuned on minimal data using QLoRA and PEFT, and was trained on a single A100 GPU in the cloud:
@@ -114,16 +152,17 @@ This project recently did a [call of volunteers](https://www.linkedin.com/feed/u
- [x] Create a functioning training pipeline.
- [ ] Curate a set of 5-10 repositories that use the latest ZenML syntax, and use the data generation pipeline to push a dataset to HuggingFace.
- [ ] Create a Dockerfile for the training pipeline with all requirements installed, including ZenML, torch, CUDA, etc. Currently I am having trouble creating this in this [config file](configs/finetune_local.yaml). It probably makes sense to create a Docker image with the right CUDA version and requirements, including ZenML. See here: https://sdkdocs.zenml.io/0.54.0/integration_code_docs/integrations-aws/#zenml.integrations.aws.flavors.sagemaker_step_operator_flavor.SagemakerStepOperatorSettings

- [ ] Test the trained model on various metrics
- [ ] Create a custom [model deployer](https://docs.zenml.io/stacks-and-components/component-guide/model-deployers) that deploys a Hugging Face model from the Hub to a Hugging Face Inference Endpoint. This would involve creating a [custom model deployer](https://docs.zenml.io/stacks-and-components/component-guide/model-deployers/custom) and editing the [deployment pipeline](pipelines/deployment.py) accordingly (a rough sketch of such a step follows this list).
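
As a rough, hypothetical sketch, such a step might look like the following. The step name matches the `deploy_model_to_hf_hub` step referenced in the deployment configs, but the deployer call, the timeout value, and the use of a plain dict for the endpoint settings are illustrative assumptions rather than the implementation added in this PR:

```python
# Hypothetical sketch of a deployment step -- not the PR's actual code.
# It assumes the custom Hugging Face model deployer is registered in the
# active stack (the `hfendpoint` component set up in the README above).
from typing import Dict

from zenml import step
from zenml.client import Client


@step
def deploy_model_to_hf_hub(hf_endpoint_cfg: Dict) -> None:
    """Deploy the fine-tuned model to a Hugging Face Inference Endpoint."""
    # Grab the model deployer from the active ZenML stack.
    model_deployer = Client().active_stack.model_deployer

    # Hand the endpoint settings (framework, instance type, env vars, ...)
    # to the deployer, which provisions or replaces the endpoint and returns
    # a service object that tracks the deployment.
    service = model_deployer.deploy_model(
        config=hf_endpoint_cfg,  # the real step would pass a typed config object
        timeout=1200,            # assumed provisioning timeout in seconds
    )
    print(f"Deployment service started: {service}")
```

In this PR, the endpoint settings are modelled by the Pydantic `HuggingFaceBaseConfig` shown further down in the diff.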

## :bulb: More Applications

While the work here is solely based on the task of finetuning the model for the ZenML library, the pipeline can be changed with minimal effort to point to any set of repositories on GitHub. Theoretically, one could extend this work to point to proprietary codebases to learn from them for any use-case.

For example, see how [VMWare fine-tuned StarCoder to learn their style](https://octo.vmware.com/fine-tuning-starcoder-to-learn-vmwares-coding-style/).

Also, make sure to join our <a href="https://zenml.io/slack" target="_blank">
<img width="15" src="https://cdn3.iconfinder.com/data/icons/logos-and-brands-adobe/512/306_Slack-512.png" alt="Slack"/>
<b>Slack Community</b>
</a> to become part of the ZenML family!
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_a10.yaml
```diff
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: xxlarge
-      instance_type: g5.12xlarge
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: xxlarge
+        instance_type: g5.12xlarge
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
```
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_a100.yaml
```diff
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: xlarge
-      instance_type: p4de
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: xlarge
+        instance_type: p4de
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
```
37 changes: 19 additions & 18 deletions llm-finetuning/configs/deployment_t4.yaml
```diff
@@ -10,21 +10,22 @@ model:
 steps:
   deploy_model_to_hf_hub:
     parameters:
-      framework: pytorch
-      task: text-generation
-      accelerator: gpu
-      vendor: aws
-      region: us-east-1
-      max_replica: 1
-      instance_size: large
-      instance_type: g4dn.12xlarge
-      namespace: zenml
-      custom_image:
-        health_route: /health
-        env:
-          MAX_BATCH_PREFILL_TOKENS: "2048"
-          MAX_INPUT_LENGTH: "1024"
-          MAX_TOTAL_TOKENS: "1512"
-          QUANTIZE: bitsandbytes
-          MODEL_ID: /repository
-        url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
+      hf_endpoint_cfg:
+        framework: pytorch
+        task: text-generation
+        accelerator: gpu
+        vendor: aws
+        region: us-east-1
+        max_replica: 1
+        instance_size: large
+        instance_type: g4dn.12xlarge
+        namespace: zenml
+        custom_image:
+          health_route: /health
+          env:
+            MAX_BATCH_PREFILL_TOKENS: "2048"
+            MAX_INPUT_LENGTH: "1024"
+            MAX_TOTAL_TOKENS: "1512"
+            QUANTIZE: bitsandbytes
+            MODEL_ID: /repository
+          url: registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-564f2a3
```
Empty file.
25 changes: 25 additions & 0 deletions llm-finetuning/huggingface/hf_deployment_base_config.py
@@ -0,0 +1,25 @@
```python
from pydantic import BaseModel
from typing import Optional, Dict
from zenml.utils.secret_utils import SecretField


class HuggingFaceBaseConfig(BaseModel):
    """Huggingface Inference Endpoint configuration."""

    endpoint_name: Optional[str] = ""
    repository: Optional[str] = None
    framework: Optional[str] = None
    accelerator: Optional[str] = None
    instance_size: Optional[str] = None
    instance_type: Optional[str] = None
    region: Optional[str] = None
    vendor: Optional[str] = None
    token: Optional[str] = None
    account_id: Optional[str] = None
    min_replica: Optional[int] = 0
    max_replica: Optional[int] = 1
    revision: Optional[str] = None
    task: Optional[str] = None
    custom_image: Optional[Dict] = None
    namespace: Optional[str] = None
    endpoint_type: str = "public"
```
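
The fields above closely mirror the parameters of `huggingface_hub.create_inference_endpoint`. As a hedged illustration of how a deployer might forward them (this helper and its import path are assumptions, not code from this PR):

```python
# Illustrative only: how a deployer might translate HuggingFaceBaseConfig
# into a call to `huggingface_hub.create_inference_endpoint`. This helper is
# not part of the PR; the real deployer implementation may differ.
from huggingface_hub import InferenceEndpoint, create_inference_endpoint

# Import path is an assumption based on the file location in this PR.
from huggingface.hf_deployment_base_config import HuggingFaceBaseConfig


def provision_endpoint(config: HuggingFaceBaseConfig) -> InferenceEndpoint:
    """Create an Inference Endpoint from the base config."""
    return create_inference_endpoint(
        config.endpoint_name,        # must be set to a non-empty endpoint name
        repository=config.repository,
        framework=config.framework,
        accelerator=config.accelerator,
        instance_size=config.instance_size,
        instance_type=config.instance_type,
        region=config.region,
        vendor=config.vendor,
        account_id=config.account_id,
        min_replica=config.min_replica,
        max_replica=config.max_replica,
        revision=config.revision,
        task=config.task,
        custom_image=config.custom_image,
        type=config.endpoint_type,   # "public", "protected" or "private"
        namespace=config.namespace,
        token=config.token,
    )
```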