Clone the Employee Productivity Application repository.
Set up a Hugging Face account and request access to the Meta Llama models.
We will be using `meta-llama/Llama-3.1-8B-Instruct` specifically, but once you've gained access to the gated repo, you will have access to all the models in the meta-llama repository. You will need to fill in your information, agree to the terms of use and licensing agreement, and wait for your request to be approved before you can access the repository. It may take a while, but you can check the status of your request in Profile > Settings > Gated Repositories.
Create an access token and save it for later; you will need it to make requests to the Llama model.
- Select your profile image in the top right corner and, at the bottom, select `Access Token`
- Select `+Create Access Token`
- For the purpose of this project, all you need is a "Read" type access token
- Give it any name and select `Create Token`
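Once created, the token can also be used to authenticate to Hugging Face from code. A minimal sketch using the `huggingface_hub` library (the token value below is a placeholder):

```python
from huggingface_hub import login

# Authenticate to Hugging Face so the gated meta-llama models can be
# downloaded. Replace the placeholder with the "Read" token created above.
login(token="hf_your_token_here")
```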
Before you can serve a model in Red Hat OpenShift AI (RHOAI), you will need to install RHOAI and enable NVIDIA GPU support by following these links:
This project uses MinIO to store the model:
- Install the `oc` client to use MinIO for model storage
Open Red Hat OpenShift AI by selecting it from the OpenShift Application Launcher. This will open Red Hat OpenShift AI in a new tab. Create a Data Science project in the Red Hat OpenShift AI window.
To set up MinIO for storing the model, execute the following commands in a terminal/console:
```bash
# Log in to OpenShift (if not already logged in)
oc login --token=<OCP_TOKEN>

# Install MinIO
MINIO_USER=<USERNAME> \
MINIO_PASSWORD="<PASSWORD>" \
envsubst < minio-setup/minio-setup.yml | \
oc apply -f - -n <DATA_SCIENCE_PROJECT_CREATED_IN_PREVIOUS_STEP>
```
- Set `<USERNAME>` and `<PASSWORD>` to valid values in the above command before executing it
Once MinIO is set up, you can access it within your project. The YAML applied above creates these two routes:
- `minio-ui` - for accessing the MinIO UI
- `minio-api` - for API access to MinIO

Take note of the `minio-api` route location, as it will be needed in the next section.
To use RHOAI for this project, you need to create a workbench first. In the newly created data science project, create a new workbench by clicking the `Create workbench` button in the `Workbenches` tab.
When creating the workbench, add the following environment variables:
- `AWS_ACCESS_KEY_ID` - MinIO user name
- `AWS_SECRET_ACCESS_KEY` - MinIO password
- `AWS_S3_ENDPOINT` - `minio-api` route location
- `AWS_S3_BUCKET` - this bucket should either already exist or will be created by one of the Jupyter notebooks to upload the model
- `AWS_DEFAULT_REGION` - set it to `us-east-1`
The environment variables can be added one by one, or all together by uploading a secret YAML file:
```bash
# Save your ENV values as base64 and put them in the secret yaml file
# located at workbench_env.yaml
echo -n 'YOUR_AWS_ACCESS_KEY_ID' | base64
echo -n 'YOUR_AWS_SECRET_ACCESS_KEY' | base64
echo -n 'YOUR_AWS_DEFAULT_REGION' | base64
echo -n 'YOUR_AWS_S3_ENDPOINT' | base64
echo -n 'YOUR_AWS_S3_BUCKET' | base64
```
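For reference, the upload notebook can reach MinIO through the S3 API using these variables. A minimal sketch, assuming `boto3` is used (the actual notebook may differ):

```python
import os

import boto3

# Build an S3 client pointed at the minio-api route using the workbench
# environment variables defined above.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
)

# List buckets to verify connectivity and credentials.
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```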
Use the following values for the other fields:
- Notebook image:
  - Image selection: PyTorch
  - Version selection: 2024.1
- Deployment size:
  - Container size: Medium
  - Accelerator: NVIDIA GPU
  - Number of accelerators: 1
- Cluster storage: 50GB
Create the workbench with the above settings.
Create a new data connection that can be used by the init container (`storage-initializer`) to fetch the model uploaded in the next step when deploying the model. To create a data connection, use the following steps:
- Click on the `Add data connection` button in the `Data connections` tab in your newly created project
- Use the following values for this data connection:
  - Name: `minio`
  - Access key: value specified for the `AWS_ACCESS_KEY_ID` field in the Create Workbench section
  - Secret key: value specified for the `AWS_SECRET_ACCESS_KEY` field in the Create Workbench section
  - Endpoint: value specified for the `AWS_S3_ENDPOINT` field in the Create Workbench section
  - Region: value specified for the `AWS_DEFAULT_REGION` field in the Create Workbench section
  - Bucket: value specified for the `AWS_S3_BUCKET` field in the Create Workbench section
- Create the data connection by clicking on the `Add data connection` button
To run the Llama 3.1 model in RHOAI, you will need to duplicate the `vLLM ServingRuntime for KServe` serving runtime and edit it. Follow these steps for duplicating and editing the above-mentioned serving runtime:
- Expand the `Settings` sidebar menu in RHOAI
- Click on `Serving runtimes` in the expanded sidebar menu
- Click the three dots at the end of the `vLLM ServingRuntime for KServe` serving runtime and select `Duplicate`
- In the duplicated runtime:
  - Change the following to make them unique for your use case:
    - `metadata.annotations.openshift.io/display-name`
    - `metadata.name`
  - Add the following argument to the `spec.containers.args` property: `--max_model_len=4096`
    - If you do not set this argument, you will run into the following error when starting the model:
      ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (28560). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
- Click on the `Create` button to create this new serving runtime
You can read more about model serving here.
Once the initial notebook has run successfully and the data connection is created, you can deploy the model by following these steps:
- In the RHOAI tab, select the `Models` tab (for your newly created project) and click on the `Deploy model` button
- Fill in the following fields as described below:
  - Model name: `<PROVIDE_a_name_for_the_model>`
  - Serving runtime: the name you gave to the duplicated serving runtime
  - Model framework: vLLM
  - Model server size: Small
  - Accelerator: NVIDIA GPU
  - Model route: if you want to access this model endpoint from outside the cluster, make sure to check the `Make deployed models available through an external route` checkbox. By default the model endpoint is only available as an internal service.
  - Model location: select the `Existing data connection` option
    - Name: name of the data connection created in the previous step
    - Path: `models`
- Click on `Deploy` to deploy this model
Copy the inference endpoint once the model is deployed successfully (it will take a few minutes to deploy the model).
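Since the duplicated runtime is vLLM-based, the deployed model exposes an OpenAI-compatible API under `/v1`. A minimal sketch for sanity-checking the endpoint; the endpoint URL and model name below are placeholders for your own values:

```python
import requests

# Placeholders - substitute the inference endpoint you copied and the
# model name you chose when deploying.
INFERENCE_ENDPOINT = "https://your-inference-endpoint"
MODEL_NAME = "llama3"

# vLLM serves an OpenAI-compatible chat completions API under /v1.
response = requests.post(
    f"{INFERENCE_ENDPOINT}/v1/chat/completions",
    json={
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 50,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```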
- In your Data Science Project, select the `Workbenches` tab and select `Open` to launch the Jupyter Notebook.
- Select the upload button and upload the `rhoai` project folder to Jupyter Notebook.
- Once uploaded in Jupyter, use the navigation side panel to select the file `rhoai-upload_model_to_minio.ipynb`.
- Run all the cells in `rhoai-upload_model_to_minio.ipynb` until it has finished uploading the model to MinIO. Note: you will have to log in using the Hugging Face access token you saved in a previous step.
- After the model has finished uploading, navigate to `rhoai/rhoai-Llama-3.1-8B-Instruct.ipynb` and replace the values for `inference_endpoint` and `base_url` with your own URL.
- Run each cell. You can adjust the prompt questions or create new ones as you see fit (a sketch of the helper involved follows below):

```python
invoke_model(chat_chain, "YOUR_QUESTION")
```
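For orientation, `invoke_model` is defined in the notebook itself; a hypothetical sketch of what such a helper typically looks like with LangChain (the names and structure here are illustrative, not the notebook's exact code):

```python
def invoke_model(chain, question: str) -> None:
    # chain is assumed to be a LangChain runnable, e.g. prompt | chat_model.
    # .invoke() runs the chain once with the given input and returns the result.
    result = chain.invoke({"input": question})
    print(result)
```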
Disclaimer: This application is currently a chat-only version of the AWS Employee Productivity Application.
- Follow along with the README and install all the required prerequisites.
- In `app.py`, located at `/appui/backend/src/websocket/chat/app.py`, change the ChatOpenAI model `base_url` to the inference endpoint you saved in the step above and add `/v1` to the end of the URL:

```python
# Configure the ChatOpenAI model
chat_model = ChatOpenAI(
    model="llama3",
    temperature=0.1,
    base_url="https://example.com/v1",
    api_key="YOUR_API_KEY",
)
```
- Deploy your application. For this exact project we will deploy it locally. Run the following command in your terminal from the project's `appui` folder:

```bash
./deploy.sh --region=your-aws-region --email=your-email
```

- Follow the link provided. Log in with your credentials and create a new password if prompted. Navigate to the Chat tab and chat away.