[BUG]vllm volume-expand ops is always Running and sts is pending #5441

Closed
ahjing99 opened this issue Oct 13, 2023 · 1 comment
Labels: bug, kind/bug (Something isn't working), severity/major (Great chance user will encounter the same problem)

@ahjing99 (Collaborator)

➜ ~ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-beta.3
kbcli: 0.7.0-beta.3

  1. Create a cluster and scale it out (hscale)

      `helm repo add kubeblocks-kbcli  https://jihulab.com/api/v4/projects/150246/packages/helm/stable`

"kubeblocks-kbcli" already exists with the same configuration, skipping

      `helm repo update kubeblocks-kbcli `

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "kubeblocks-kbcli" chart repository
Update Complete. ⎈Happy Helming!⎈

      `helm upgrade --install vllm kubeblocks-kbcli/vllm --version 0.7.0-beta.3 `

Release "vllm" does not exist. Installing it now.
NAME: vllm
LAST DEPLOYED: Fri Oct 13 16:20:23 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None


      `kbcli cluster create vllm-xtkrze --termination-policy=DoNotTerminate --monitoring-interval=0 --enable-all-logs=false --cluster-definition=vllm --cluster-version=vllm-vicuna-7b --set cpu=500m,memory=1Gi,replicas=1,storage=20Gi --namespace default`

Cluster vllm-xtkrze created


      `kbcli cluster hscale vllm-xtkrze --auto-approve --components vllm --replicas 2 --namespace default `

OpsRequest vllm-xtkrze-horizontalscaling-kmcq9 created successfully, you can view the progress:
	kbcli cluster describe-ops vllm-xtkrze-horizontalscaling-kmcq9 -n default


      `kbcli cluster list-instances vllm-xtkrze --namespace default `

NAME                 NAMESPACE   CLUSTER       COMPONENT   STATUS    ROLE     ACCESSMODE   AZ              CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                                                CREATED-TIME
vllm-xtkrze-vllm-0   default     vllm-xtkrze   vllm        Running   <none>   <none>       us-central1-c   500m / 500m          1Gi / 1Gi               data:20Gi   gke-yjtest-default-pool-c51609d3-ss98/10.128.0.46   Oct 13,2023 16:21 UTC+0800
vllm-xtkrze-vllm-1   default     vllm-xtkrze   vllm        Running   <none>   <none>       us-central1-c   500m / 500m          1Gi / 1Gi               data:20Gi   gke-yjtest-default-pool-c51609d3-jp0w/10.128.0.45   Oct 13,2023 16:28 UTC+0800

  2. Run volume-expand; the OpsRequest stays Running and the sts is pending

      `kbcli cluster volume-expand vllm-xtkrze --auto-approve --components vllm --volume-claim-templates data --storage 23Gi --namespace default`

OpsRequest vllm-xtkrze-volumeexpansion-5vhc5 created successfully, you can view the progress:
	kbcli cluster describe-ops vllm-xtkrze-volumeexpansion-5vhc5 -n default

➜  ~ kbcli cluster describe vllm-xtkrze
Name: vllm-xtkrze	 Created Time: Oct 13,2023 16:21 UTC+0800
NAMESPACE   CLUSTER-DEFINITION   VERSION          STATUS     TERMINATION-POLICY
default     vllm                 vllm-vicuna-7b   Updating   DoNotTerminate

Endpoints:
COMPONENT   MODE        INTERNAL                                          EXTERNAL
vllm        ReadWrite   vllm-xtkrze-vllm.default.svc.cluster.local:8000   <none>

Topology:
COMPONENT   INSTANCE             ROLE     STATUS    AZ              NODE                                                CREATED-TIME
vllm        vllm-xtkrze-vllm-0   <none>   Running   us-central1-c   gke-yjtest-default-pool-c51609d3-ss98/10.128.0.46   Oct 13,2023 16:21 UTC+0800
vllm        vllm-xtkrze-vllm-1   <none>   Running   us-central1-c   gke-yjtest-default-pool-c51609d3-jp0w/10.128.0.45   Oct 13,2023 16:28 UTC+0800

Resources Allocation:
COMPONENT   DEDICATED   CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE-SIZE   STORAGE-CLASS
vllm        false       500m / 500m          1Gi / 1Gi               data:23Gi      kb-default-sc

Images:
COMPONENT   TYPE   IMAGE
vllm        vllm   docker.io/apecloud/vllm:latest-amd64

Show cluster events: kbcli cluster list-events -n default vllm-xtkrze
➜  ~ k get pv | grep llm
pvc-aeee536a-6378-487e-a1b6-1597c9714197   23Gi       RWO            Delete           Bound      default/data-vllm-xtkrze-vllm-1                kb-default-sc                              21m
pvc-f0772c79-46d1-4f70-bca6-d389673045be   23Gi       RWO            Delete           Bound      default/data-vllm-xtkrze-vllm-0                kb-default-sc                              28m
➜  ~ k get pvc | grep llm
data-vllm-xtkrze-vllm-0                Bound    pvc-f0772c79-46d1-4f70-bca6-d389673045be   20Gi       RWO            kb-default-sc                     28m
data-vllm-xtkrze-vllm-1                Bound    pvc-aeee536a-6378-487e-a1b6-1597c9714197   20Gi       RWO            kb-default-sc                     21m
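
In the two listings above, both PVs already report 23Gi while the matching PVCs still report 20Gi. The following is a minimal sketch for spotting that kind of mismatch from pasted `kubectl get pv` / `kubectl get pvc` rows. The function names are hypothetical, and the column positions are an assumption based on the output shown in this report (header line stripped, no extra columns).

```python
def pv_capacity_by_claim(pv_rows):
    """Map claim name -> PV capacity from `kubectl get pv` rows (header stripped)."""
    caps = {}
    for row in pv_rows:
        cols = row.split()
        # Assumed columns: NAME CAPACITY ACCESS-MODES RECLAIM-POLICY STATUS CLAIM ...
        capacity, claim = cols[1], cols[5]
        caps[claim.split("/", 1)[1]] = capacity  # drop the "namespace/" prefix
    return caps


def pvc_capacity_by_name(pvc_rows):
    """Map PVC name -> reported capacity from `kubectl get pvc` rows (header stripped)."""
    caps = {}
    for row in pvc_rows:
        cols = row.split()
        # Assumed columns: NAME STATUS VOLUME CAPACITY ...
        caps[cols[0]] = cols[3]
    return caps


def capacity_mismatches(pv_rows, pvc_rows):
    """Return sorted (pvc_name, pvc_capacity, pv_capacity) tuples where the two differ."""
    pv = pv_capacity_by_claim(pv_rows)
    pvc = pvc_capacity_by_name(pvc_rows)
    return sorted(
        (name, pvc[name], pv[name])
        for name in pvc
        if name in pv and pvc[name] != pv[name]
    )


# Rows copied from the output in this report.
PV_ROWS = [
    "pvc-aeee536a-6378-487e-a1b6-1597c9714197 23Gi RWO Delete Bound default/data-vllm-xtkrze-vllm-1 kb-default-sc 21m",
    "pvc-f0772c79-46d1-4f70-bca6-d389673045be 23Gi RWO Delete Bound default/data-vllm-xtkrze-vllm-0 kb-default-sc 28m",
]
PVC_ROWS = [
    "data-vllm-xtkrze-vllm-0 Bound pvc-f0772c79-46d1-4f70-bca6-d389673045be 20Gi RWO kb-default-sc 28m",
    "data-vllm-xtkrze-vllm-1 Bound pvc-aeee536a-6378-487e-a1b6-1597c9714197 20Gi RWO kb-default-sc 21m",
]

for name, pvc_cap, pv_cap in capacity_mismatches(PV_ROWS, PVC_ROWS):
    print(f"{name}: PVC reports {pvc_cap}, PV reports {pv_cap}")
```

A mismatch like this only shows that the PVC status has not caught up with the PV; it does not by itself say why the OpsRequest stays Running.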

➜  ~ k get sts vllm-xtkrze-vllm -o yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    config.kubeblocks.io/tpl-vllm-scripts: vllm-xtkrze-vllm-vllm-scripts
    kubeblocks.io/generation: "1"
  creationTimestamp: "2023-10-13T08:21:20Z"
  finalizers:
  - rsm.workloads.kubeblocks.io/finalizer
  generation: 2
  labels:
    rsm.workloads.kubeblocks.io/controller-generation: "2"
    workloads.kubeblocks.io/instance: vllm-xtkrze-vllm
    workloads.kubeblocks.io/managed-by: ReplicatedStateMachine
  name: vllm-xtkrze-vllm
  namespace: default
  ownerReferences:
  - apiVersion: workloads.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicatedStateMachine
    name: vllm-xtkrze-vllm
    uid: 86747c7d-dee1-498a-bbd2-b464a4b27a86
  resourceVersion: "3863330"
  uid: 197d7a99-2a97-443c-bc57-4693f5d3920b
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: vllm-xtkrze
      app.kubernetes.io/managed-by: kubeblocks
      app.kubernetes.io/name: vllm
      apps.kubeblocks.io/component-name: vllm
  serviceName: vllm-xtkrze-vllm-headless
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: vllm
        app.kubernetes.io/instance: vllm-xtkrze
        app.kubernetes.io/managed-by: kubeblocks
        app.kubernetes.io/name: vllm
        app.kubernetes.io/version: vllm-vicuna-7b
        apps.kubeblocks.io/component-name: vllm
        apps.kubeblocks.io/workload-type: Stateful
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: kb-data
                operator: In
                values:
                - "true"
            weight: 100
        podAntiAffinity: {}
      containers:
      - command:
        - /scripts/start.sh
        env:
        - name: KB_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: KB_POD_UID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: KB_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: KB_SA_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.serviceAccountName
        - name: KB_NODENAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: KB_HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: KB_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: KB_POD_IPS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIPs
        - name: KB_HOSTIP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: KB_PODIP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: KB_PODIPS
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIPs
        - name: KB_CLUSTER_NAME
          value: vllm-xtkrze
        - name: KB_COMP_NAME
          value: vllm
        - name: KB_CLUSTER_COMP_NAME
          value: vllm-xtkrze-vllm
        - name: KB_CLUSTER_UID_POSTFIX_8
          value: 4dbc28c4
        - name: KB_POD_FQDN
          value: $(KB_POD_NAME).$(KB_CLUSTER_COMP_NAME)-headless.$(KB_NAMESPACE).svc
        - name: MODEL_NAME
          value: lmsys/vicuna-7b-v1.5
        - name: EXTRA_ARGS
          value: --trust-remote-code
        envFrom:
        - configMapRef:
            name: vllm-xtkrze-vllm-env
        - configMapRef:
            name: vllm-xtkrze-vllm-rsm-env
            optional: false
        image: docker.io/apecloud/vllm:latest-amd64
        imagePullPolicy: IfNotPresent
        name: vllm
        ports:
        - containerPort: 8000
          name: model
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 1Gi
        securityContext:
          allowPrivilegeEscalation: true
          privileged: true
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /llm/config/
          name: config
        - mountPath: /scripts
          name: scripts
        - mountPath: /llm/storage
          name: model-store
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kb-vllm-xtkrze
      serviceAccountName: kb-vllm-xtkrze
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: kb-data
        operator: Equal
        value: "true"
      volumes:
      - emptyDir: {}
        name: model-store
      - configMap:
          defaultMode: 493
          name: vllm-xtkrze-vllm-vllm-scripts
        name: scripts
      - emptyDir: {}
        name: config
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        apps.kubeblocks.io/vct-name: data
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: kb-default-sc
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 2
  collisionCount: 0
  currentReplicas: 2
  currentRevision: vllm-xtkrze-vllm-5765f8c86b
  observedGeneration: 2
  readyReplicas: 2
  replicas: 2
  updateRevision: vllm-xtkrze-vllm-5765f8c86b
  updatedReplicas: 2

@ahjing99 added the kind/bug and severity/major labels on Oct 13, 2023
@ahjing99 added this to the Release 0.7.0 milestone on Oct 13, 2023

@lynnleelhl (Contributor)

Closing, as vllm doesn't need a persistent volume.
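
One way to read this comment against the StatefulSet posted above: the `data` volumeClaimTemplate exists, but the `vllm` container only mounts `config`, `scripts` and `model-store`, and `model-store` is an `emptyDir`. The sketch below checks a StatefulSet manifest for claim templates that no container mounts. The helper name is hypothetical, the dict is a trimmed copy of the manifest from this report, and the interpretation is an assumption rather than something the maintainers state.

```python
def unmounted_claim_templates(sts):
    """Return names of volumeClaimTemplates that no container or initContainer mounts."""
    pod_spec = sts["spec"]["template"]["spec"]
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    mounted = {
        mount["name"]
        for container in containers
        for mount in container.get("volumeMounts", [])
    }
    return [
        tpl["metadata"]["name"]
        for tpl in sts["spec"].get("volumeClaimTemplates", [])
        if tpl["metadata"]["name"] not in mounted
    ]


# Trimmed from the `k get sts vllm-xtkrze-vllm -o yaml` output in this report.
sts = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "vllm",
                        "volumeMounts": [
                            {"mountPath": "/llm/config/", "name": "config"},
                            {"mountPath": "/scripts", "name": "scripts"},
                            {"mountPath": "/llm/storage", "name": "model-store"},
                        ],
                    }
                ]
            }
        },
        "volumeClaimTemplates": [{"metadata": {"name": "data"}}],
    }
}

print(unmounted_claim_templates(sts))
```

For the manifest in this issue the helper reports `data`, the same claim template that the volume-expand command targeted.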
