Employee Productivity GenAI Assistant Demo

Clone the repository: Employee Productivity Application

Setting up Hugging Face & MinIO

Prerequisites

Set up Hugging Face & Access to Meta Llama models

Set up a Hugging Face account and request access to the Meta Llama models. We will be using meta-llama/Llama-3.1-8B-Instruct specifically, but once you've gained access to the gated repo you will have access to all the models in the meta-llama repository. You will need to provide your information, agree to the terms, usage, and licensing agreement, and wait for your request to be approved before you can access the repository. It may take a while, but you can check the status of your request under Profile > Settings > Gated Repositories.

Create an access token and save it for later; you will need it to make requests to the Llama model (a short usage sketch follows the steps below).

  • Select your profile image in the top right corner and, near the bottom, select Access Tokens
  • Select +Create Access Token
  • For the purpose of this project, all you need is a "Read" type access token
  • Give it any name and select Create Token
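
If you want to confirm the token works before continuing, the snippet below is a minimal sketch (not part of this repository) that logs in with the huggingface_hub library and downloads the gated model; the local_dir value is only an example:

# Hypothetical check that the "Read" token grants access to the gated repo.
from huggingface_hub import login, snapshot_download

login(token="hf_xxx")  # paste the access token created above

# Download the model files locally (the upload notebook does something along
# these lines before pushing the files to MinIO).
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    local_dir="Llama-3.1-8B-Instruct",  # example path, adjust as needed
)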

Set up RHOAI & NVIDIA GPU

Before you can serve a model in Red Hat OpenShift AI, you will need to install RHOAI and enable NVIDIA GPU support by following the respective installation documentation.

This project uses MinIO to store the model:

  • Install the oc client to use MinIO for model storage

Quickstart

Open Red Hat OpenShift AI by selecting it from the OpenShift Application Launcher; it will open in a new tab.

Create a Data Science project in the Red Hat OpenShift AI window.

Setup MinIO

To set up MinIO for storing the model, execute the following commands in a terminal/console:

# Login to OpenShift (if not already logged in)
oc login --token=<OCP_TOKEN>

# Install MinIO
MINIO_USER=<USERNAME> \
   MINIO_PASSWORD="<PASSWORD>" \
   envsubst < minio-setup/minio-setup.yml | \
   oc apply -f - -n <DATA_SCIENCE_PROJECT_CREATED_IN_PREVIOUS_STEP>
  • Set <USERNAME> and <PASSWORD> to valid values in the above command before executing it

Once MinIO is set up, you can access it within your project. The yaml applied above creates these two routes:

  • minio-ui - for accessing the MinIO UI
  • minio-api - for API access to MinIO
    • Take note of the minio-api route location, as it will be needed in the next section.

Create workbench

To use RHOAI for this project, you need to create a workbench first. In the newly created data science project, create a new Workbench by clicking the Create workbench button in the Workbenches tab.

When creating the workbench, add the following environment variables:

  • AWS_ACCESS_KEY_ID

    • MinIO user name
  • AWS_SECRET_ACCESS_KEY

    • MinIO password
  • AWS_S3_ENDPOINT

    • minio-api route location
  • AWS_S3_BUCKET

    • This bucket should either already exist or will be created by one of the Jupyter notebooks when uploading the model
  • AWS_DEFAULT_REGION

    • Set it to us-east-1

    The environment variables can be added one by one, or all together by uploading a secret yaml file (a sketch showing how the notebooks consume these variables follows the commands below):

    # Base64-encode each value and place the results in the secret yaml file located at workbench_env.yaml
    echo -n 'YOUR_AWS_ACCESS_KEY_ID' | base64
    echo -n 'YOUR_AWS_SECRET_ACCESS_KEY' | base64
    echo -n 'YOUR_AWS_DEFAULT_REGION' | base64
    echo -n 'YOUR_AWS_S3_ENDPOINT' | base64
    echo -n 'YOUR_AWS_S3_BUCKET' | base64
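
For reference, the sketch below shows roughly how a notebook running in the workbench could consume these environment variables with boto3 to create the bucket and upload model files; the file name and "models/" prefix are illustrative, and the real upload notebook may differ:

# Illustrative only: build an S3 client from the workbench environment variables.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
)

# Create the bucket if it does not exist yet.
bucket = os.environ["AWS_S3_BUCKET"]
if bucket not in [b["Name"] for b in s3.list_buckets().get("Buckets", [])]:
    s3.create_bucket(Bucket=bucket)

# Upload a file under the "models/" prefix, matching the Path used later when
# deploying the model.
s3.upload_file("Llama-3.1-8B-Instruct/config.json", bucket, "models/config.json")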
    

Use the following values for other fields:

  • Notebook image:
    • Image selection: PyTorch
    • Version selection: 2024.1
  • Deployment size:
    • Container size: Medium
    • Accelerator: NVIDIA GPU
    • Number of accelerators: 1
  • Cluster storage: 50GB

Create the workbench with the above settings.

Create Data connection

Create a new data connection that can be used by the init-container (storage-initializer) to fetch the uploaded model when it is deployed in a later step.

To create a Data connection, use the following steps:

  • Click on Add data connection button in the Data connections tab in your newly created project
  • Use the following values for this data connection:
    • Name: minio
    • Access key: value specified for AWS_ACCESS_KEY_ID field in Create Workbench section
    • Secret key: value specified for AWS_SECRET_ACCESS_KEY field in Create Workbench section
    • Endpoint: value specified for AWS_S3_ENDPOINT field in Create Workbench section
    • Region: value specified for AWS_DEFAULT_REGION field in Create Workbench section
    • Bucket: value specified for AWS_S3_BUCKET field in Create Workbench section
  • Create the data connection by clicking on Add data connection button

Add Serving runtime

To run the Llama 3.1 model in RHOAI, you will need to duplicate the vLLM ServingRuntime for KServe serving runtime and edit the copy.

Follow these steps to duplicate and edit the above-mentioned serving runtime:

  • Expand Settings sidebar menu in RHOAI
  • Click on Serving runtimes in the expanded sidebar menu
  • Click the three dots at the end of vLLM ServingRuntime for KServe Serving runtime and select Duplicate
  • In the duplicated runtime:
    • Change the following to make them unique for your use-case:
      • metadata.annotations.openshift.io/display-name
      • metadata.name
      • Add the following argument to the spec.containers.args property:
      • --max_model_len=4096
        • If you do not set this argument, you will run into the following error when starting the model:
          ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (28560). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
          
  • Click on the Create button to create this new serving runtime

You can read more about model serving in the Red Hat OpenShift AI documentation.

Deploy model

Once the model upload notebook (covered in the Jupyter Notebooks section below) has run successfully and the data connection is created, you can deploy the model by following these steps:

  • In the RHOAI tab, select Models tab (for your newly created project) and click on Deploy model button
  • Fill in the following fields as described below:
    • Model name: <PROVIDE_a_name_for_the_model>
    • Serving runtime: the name you gave to the duplicated serving runtime
    • Model framework: vLLM
    • Model server size: Small
    • Accelerator: NVIDIA GPU
    • Model route:
      • If you want to access this model endpoint from outside the cluster, make sure to check the Make deployed models available through an external route checkbox. By default the model endpoint is only available as an internal service.
    • Model location: Select Existing data connection option
      • Name: Name of data connection created in previous step
      • Path: models
  • Click on Deploy to deploy this model

Copy the inference endpoint once the model is deployed successfully (it will take a few minutes to deploy the model).
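
As a quick smoke test outside the notebooks, the deployed vLLM runtime exposes an OpenAI-compatible API, so you can query the copied inference endpoint with the openai Python client; the endpoint and model name below are placeholders, and the api_key only matters if you secured the endpoint:

# Minimal smoke test of the deployed endpoint (placeholder values).
from openai import OpenAI

client = OpenAI(
    base_url="https://<inference-endpoint>/v1",  # copied inference endpoint + /v1
    api_key="EMPTY",  # replace if you configured authentication
)

response = client.chat.completions.create(
    model="<model-name-given-at-deploy-time>",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)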

Deploy and Test the application in Jupyter Notebooks

  1. In your Data Science project, select the Workbenches tab and click Open to launch the Jupyter notebook.

  2. Select the upload button and upload the rhoai project folder to Jupyter Notebook.

  3. Once uploaded in Jupyter, use the navigation side panel to select the file rhoai-upload_model_to_minio.ipynb

  4. Run all the cells in rhoai-upload_model_to_minio.ipynb until it has finished uploading the model to MinIO.

    Note: You will have to enter the Hugging Face access token you saved earlier and log in with it.

  5. After the model has finished uploading to MinIO, navigate to rhoai/rhoai-Llama-3.1-8B-Instruct.ipynb and replace the values for inference_endpoint and base_url with your own URL.

  6. Run each cell. You can adjust the prompt questions or create new ones as you see fit (a hypothetical sketch of the invoke_model helper follows this list).

    invoke_model(chat_chain, "YOUR_QUESTION")
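
For orientation, here is a hypothetical sketch of how a helper like invoke_model could be wired up with LangChain against the deployed endpoint; the actual notebook code may differ, and the endpoint and model name are placeholders:

# Hypothetical sketch only; see the notebook for the real implementation.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(
    model="llama3",                              # placeholder model name
    base_url="https://<inference-endpoint>/v1",  # your inference endpoint + /v1
    api_key="EMPTY",
    temperature=0.1,
)

# Simple prompt | model chain.
chat_chain = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful assistant."), ("user", "{question}")]
) | llm

def invoke_model(chain, question):
    # Send a single question through the chain and return the text answer.
    return chain.invoke({"question": question}).content

print(invoke_model(chat_chain, "What can you help me with?"))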

Setting up AWS Employee Productivity Application

Disclaimer: This application is currently a chat-only version of the AWS Employee Productivity Application.

  1. Follow along with the README and install all the required prerequisites.

  2. In app.py, located at /appui/backend/src/websocket/chat/app.py:

    Change the ChatOpenAI model's base_url to the inference endpoint you saved in the step above, adding "/v1" to the end of the URL:

     # Configure the ChatOpenAI model
        chat_model = ChatOpenAI(
            model="llama3",
            temperature=0.1,  # pass a numeric value rather than a string
            base_url="https://example.com/v1",  # your inference endpoint + "/v1"
            api_key="YOUR_API_KEY",
        )
    
  3. Deploy your application. For this project we will deploy it locally. Run the following command in your terminal from the project's appui folder:

    ./deploy.sh --region=your-aws-region --email=your-email
    
  4. Follow the link provided. Log in with your credentials and create a new password if prompted. Navigate to the Chat tab and chat away.
