System Info

TGI versions: 2.4.0 and 2.1.1 (both exhibit the failure).
Startup fails with the following launcher errors:

```
ERROR text_generation_launcher: Error when initializing model
ERROR text_generation_launcher: Method Prefill encountered an error
```
Both of the following LoRA adapters exhibit the same issue across both versions:

- vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
- xtuner/Llama-2-7b-qlora-moss-003-sft
Attached is the error log archive.
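For triage, the same launcher flags can be exercised outside GKE by running the TGI container directly. This is a minimal sketch, assuming a local GPU, a Hugging Face token in `$HF_TOKEN`, and a writable `./data` directory; beyond the flags themselves, none of it is taken from the report:

```bash
# Minimal repro sketch using the launcher flags from the manifest below.
# $HF_TOKEN and the ./data volume are assumptions, not from the report.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.4.0 \
  --model-id meta-llama/Llama-2-7b-hf \
  --num-shard 1 \
  --max-concurrent-requests 128 \
  --lora-adapters vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
```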
Configuration Details

Deployment Setup

This YAML configuration was deployed on a single A100-40GB node (a2-highgpu-2g) within GKE:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi
  labels:
    app: tgi
spec:
  selector:
    matchLabels:
      app: tgi
  template:
    metadata:
      labels:
        app: tgi
        ai.gke.io/inference-server: text-generation-inference
        examples.ai.gke.io/source: ai-on-gke-benchmarks
    spec:
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
        - name: data
          hostPath:
            path: /mnt/stateful_partition/kube-ephemeral-ssd/data
      containers:
        - name: text-generation-inference
          ports:
            - containerPort: 80
          image: "ghcr.io/huggingface/text-generation-inference:2.4.0" # or "ghcr.io/huggingface/text-generation-inference:2.1.1"
          args: ["--model-id", "meta-llama/Llama-2-7b-hf",
                 "--num-shard", "1",
                 "--max-concurrent-requests", "128",
                 "--lora-adapters", "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm"]
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              cpu: "1"
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /data
              name: data
          startupProbe:
            failureThreshold: 3600
            periodSeconds: 10
            timeoutSeconds: 60
            exec:
              command:
                - /usr/bin/curl
                - http://localhost:80/generate
                - -X
                - POST
                - -d
                - '{"inputs":"test", "parameters":{"max_new_tokens":1, "adapter_id" : "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm"}}'
                - -H
                - 'Content-Type: application/json'
                - --fail
---
apiVersion: v1
kind: Service
metadata:
  name: tgi
  labels:
    app: tgi
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  selector:
    app: tgi
```
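Once the Service is up, the startup probe's request can be replayed by hand to trigger the failure. In this sketch, `EXTERNAL_IP` is a placeholder for the LoadBalancer address of the `tgi` Service, not a value from the report:

```bash
# Replay of the startup probe's request against the external Service address.
# EXTERNAL_IP is a placeholder; the body matches the probe in the manifest.
curl http://EXTERNAL_IP/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  --fail \
  -d '{"inputs":"test", "parameters":{"max_new_tokens":1, "adapter_id":"vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm"}}'
```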
Expected Behavior

Both deployment and startup should succeed without errors, as the same LoRA adapters are known to work with vLLM and LoRAX (see the cross-check sketch below).
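For comparison, here is a minimal sketch of serving the same adapter under vLLM, where it is reported to work. This is not from the original report: the module name `tweetsumm` is an arbitrary label, and the sketch assumes the adapter is first downloaded locally so that `--lora-modules` can be given a local path:

```bash
# Hypothetical vLLM cross-check, not from the original report.
# Download the adapter snapshot; huggingface-cli prints its local path.
ADAPTER_PATH=$(huggingface-cli download vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm)

# Serve the same base model with LoRA enabled; "tweetsumm" is an
# arbitrary adapter name chosen for this sketch.
vllm serve meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules tweetsumm=$ADAPTER_PATH
```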