Commit
Host it on an infracloud helm repo.
Sameer Kulkarni committed Sep 16, 2024
1 parent 9110825 commit 2528dec
Showing 13 changed files with 609 additions and 0 deletions.
23 changes: 23 additions & 0 deletions charts/opea-tgi/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
10 changes: 10 additions & 0 deletions charts/opea-tgi/Chart.yaml
@@ -0,0 +1,10 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: v2
name: opea-tgi
description: The Helm chart for HuggingFace Text Generation Inference Server
type: application
version: 1.0.0
# The HF TGI version
appVersion: "2.1.0"
51 changes: 51 additions & 0 deletions charts/opea-tgi/README.md
@@ -0,0 +1,51 @@
# tgi

Helm chart for deploying Hugging Face Text Generation Inference service.

## Installing the Chart

To install the chart, run the following:

```console
cd GenAIInfra/helm-charts/common
export MODELDIR=/mnt/opea-models
export MODELNAME="bigscience/bloom-560m"
export HFTOKEN="insert-your-huggingface-token-here"
helm install tgi tgi --set global.modelUseHostPath=${MODELDIR} --set LLM_MODEL_ID=${MODELNAME} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN}
# To deploy on Gaudi enabled kubernetes cluster
# helm install tgi tgi --set global.modelUseHostPath=${MODELDIR} --set LLM_MODEL_ID=${MODELNAME} --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --values gaudi-values.yaml
```

By default, the tgi service will download the "bigscience/bloom-560m" model, which is about 1.1GB.

If you have already cached the model locally, you can pass it to the container as in this example:

MODELDIR=/mnt/opea-models

MODELNAME="/data/models--bigscience--bloom-560m"

## Verify

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.

Then run the command `kubectl port-forward svc/tgi 2080:80` to expose the tgi service for access.

Open another terminal and run the following command to verify that the service is working:

```console
curl http://localhost:2080/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```
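
If the service is up, TGI responds with a JSON object whose `generated_text` field holds the completion. An illustrative response; the exact text will vary between calls because `do_sample` is true:

```console
{"generated_text":" Deep Learning is a branch of machine learning based on neural networks with many layers."}
```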

## Values

| Key                             | Type   | Default                                           | Description |
| ------------------------------- | ------ | ------------------------------------------------- | ----------- |
| LLM_MODEL_ID                    | string | `"bigscience/bloom-560m"`                         | Model id from https://huggingface.co/, or a pre-downloaded model directory |
| global.HUGGINGFACEHUB_API_TOKEN | string | `insert-your-huggingface-token-here`              | Hugging Face API token |
| global.modelUseHostPath         | string | `"/mnt/opea-models"`                              | Cached models directory; tgi will not download the model if it is already cached here. The host path "modelUseHostPath" is mounted into the container as the /data directory. Setting this to null/empty forces the model to be downloaded. |
| image.repository                | string | `"ghcr.io/huggingface/text-generation-inference"` | |
| image.tag                       | string | `"1.4"`                                           | |
| horizontalPodAutoscaler.enabled | bool   | false                                             | Enable HPA autoscaling for the service deployment based on metrics it provides. See the HPA section in ../../README.md before enabling! |
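
Instead of passing each override with a separate `--set` flag, the same settings can be collected into a custom values file. A minimal sketch (the file name `my-values.yaml` and the token value are placeholders):

```console
# Write the overrides from the table above into a values file
cat > my-values.yaml <<'EOF'
LLM_MODEL_ID: "bigscience/bloom-560m"
global:
  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
  modelUseHostPath: "/mnt/opea-models"
EOF
# Install with the file instead of individual --set flags
helm install tgi tgi --values my-values.yaml
```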
20 changes: 20 additions & 0 deletions charts/opea-tgi/gaudi-values.yaml
@@ -0,0 +1,20 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Default values for tgi.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

accelDevice: "gaudi"

image:
repository: ghcr.io/huggingface/tgi-gaudi
tag: "2.0.1"

MAX_INPUT_LENGTH: "1024"
MAX_TOTAL_TOKENS: "2048"
CUDA_GRAPHS: ""

resources:
limits:
habana.ai/gaudi: 1
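
These Gaudi overrides are meant to be layered on top of the chart defaults with `--values`, as in the commented Gaudi install line in the README. A sketch, assuming the chart is installed from a checkout of this repository:

```console
helm install tgi charts/opea-tgi \
  --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} \
  --values charts/opea-tgi/gaudi-values.yaml
```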
18 changes: 18 additions & 0 deletions charts/opea-tgi/nv-values.yaml
@@ -0,0 +1,18 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Default values for tgi.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

accelDevice: "nvidia"

image:
repository: ghcr.io/huggingface/text-generation-inference
tag: "2.2.0"

resources:
limits:
nvidia.com/gpu: 1

CUDA_GRAPHS: ""
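
The `nvidia.com/gpu: 1` limit is only schedulable on nodes where the NVIDIA device plugin advertises GPUs. One way to check what each node reports (a sketch; the column names are arbitrary):

```console
# Dots in the resource name must be escaped in custom-columns
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```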
69 changes: 69 additions & 0 deletions charts/opea-tgi/templates/_helpers.tpl
@@ -0,0 +1,69 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "tgi.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "tgi.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "tgi.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Convert chart name to a string suitable as metric prefix
*/}}
{{- define "tgi.metricPrefix" -}}
{{- include "tgi.fullname" . | replace "-" "_" | regexFind "[a-zA-Z_:][a-zA-Z0-9_:]*" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "tgi.labels" -}}
helm.sh/chart: {{ include "tgi.chart" . }}
{{ include "tgi.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "tgi.selectorLabels" -}}
app.kubernetes.io/name: {{ include "tgi.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "tgi.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "tgi.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
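
To see what these helpers resolve to for a concrete release, the chart can be rendered locally without touching a cluster. A sketch, assuming a checkout of this repository; with the chart name `opea-tgi`, a release named `demo` does not contain the chart name, so `tgi.fullname` becomes `demo-opea-tgi`:

```console
helm template demo charts/opea-tgi | grep 'name: demo-opea-tgi'
```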
31 changes: 31 additions & 0 deletions charts/opea-tgi/templates/configmap.yaml
@@ -0,0 +1,31 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "tgi.fullname" . }}-config
labels:
{{- include "tgi.labels" . | nindent 4 }}
data:
MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}
PORT: {{ .Values.port | quote }}
HF_TOKEN: {{ .Values.global.HUGGINGFACEHUB_API_TOKEN | quote}}
{{- if .Values.global.HF_ENDPOINT }}
HF_ENDPOINT: {{ .Values.global.HF_ENDPOINT | quote}}
{{- end }}
http_proxy: {{ .Values.global.http_proxy | quote }}
https_proxy: {{ .Values.global.https_proxy | quote }}
no_proxy: {{ .Values.global.no_proxy | quote }}
HABANA_LOGS: "/tmp/habana_logs"
NUMBA_CACHE_DIR: "/tmp"
HF_HOME: "/tmp/.cache/huggingface"
{{- if .Values.MAX_INPUT_LENGTH }}
MAX_INPUT_LENGTH: {{ .Values.MAX_INPUT_LENGTH | quote }}
{{- end }}
{{- if .Values.MAX_TOTAL_TOKENS }}
MAX_TOTAL_TOKENS: {{ .Values.MAX_TOTAL_TOKENS | quote }}
{{- end }}
{{- if .Values.CUDA_GRAPHS }}
CUDA_GRAPHS: {{ .Values.CUDA_GRAPHS | quote }}
{{- end }}
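
Since the Deployment pulls all of these settings in via `envFrom`, the effective configuration can be inspected after install by selecting the ConfigMap through its common labels instead of guessing the generated name. A sketch, assuming the default chart name `opea-tgi` is not overridden:

```console
# Print the model id the running pods will see
kubectl get configmap -l app.kubernetes.io/name=opea-tgi \
  -o jsonpath='{.items[0].data.MODEL_ID}'
```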
124 changes: 124 additions & 0 deletions charts/opea-tgi/templates/deployment.yaml
@@ -0,0 +1,124 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "tgi.fullname" . }}
labels:
{{- include "tgi.labels" . | nindent 4 }}
spec:
{{- if ne (int .Values.replicaCount) 1 }}
# remove if replica count should not be reset on pod update with HPA
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "tgi.selectorLabels" . | nindent 6 }}
template:
metadata:
{{- with .Values.podAnnotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "tgi.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
envFrom:
- configMapRef:
name: {{ include "tgi.fullname" . }}-config
{{- if .Values.global.extraEnvConfig }}
- configMapRef:
name: {{ .Values.global.extraEnvConfig }}
optional: true
{{- end }}
securityContext:
{{- if .Values.global.modelUseHostPath }}
{}
{{- else }}
{{- toYaml .Values.securityContext | nindent 12 }}
{{- end }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
{{- if .Values.extraCmdArgs }}
args:
{{- range .Values.extraCmdArgs }}
- {{ . | quote }}
{{- end }}
{{- end }}
volumeMounts:
- mountPath: /data
name: model-volume
- mountPath: /dev/shm
name: shm
- mountPath: /tmp
name: tmp
ports:
- name: http
containerPort: {{ .Values.port }}
protocol: TCP
{{- if .Values.livenessProbe }}
livenessProbe:
{{- toYaml .Values.livenessProbe | nindent 12 }}
{{- end }}
{{- if .Values.readinessProbe }}
readinessProbe:
{{- toYaml .Values.readinessProbe | nindent 12 }}
{{- end }}
{{- if .Values.startupProbe }}
startupProbe:
{{- toYaml .Values.startupProbe | nindent 12 }}
{{- end }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumes:
- name: model-volume
{{- if .Values.global.modelUsePVC }}
persistentVolumeClaim:
claimName: {{ .Values.global.modelUsePVC }}
{{- else if .Values.global.modelUseHostPath }}
hostPath:
path: {{ .Values.global.modelUseHostPath }}
type: Directory
{{- else }}
emptyDir: {}
{{- end }}
- name: shm
emptyDir:
medium: Memory
sizeLimit: {{ .Values.shmSize }}
- name: tmp
emptyDir: {}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- if not .Values.accelDevice }}
# extra time to finish processing buffered requests on CPU before pod is forcibly terminated
terminationGracePeriodSeconds: 120
{{- end }}
{{- if .Values.evenly_distributed }}
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
{{- include "tgi.selectorLabels" . | nindent 14 }}
{{- end }}
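
The `model-volume` selection above (PVC, hostPath, or emptyDir) can be sanity-checked without a cluster by rendering the chart with the relevant value set. A sketch, assuming a checkout of this repository:

```console
helm template demo charts/opea-tgi \
  --set global.modelUseHostPath=/mnt/opea-models | grep -A3 'name: model-volume'
```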
