
[Bug]: 10% perf drop on mixtral8x22b due to commit b62fba85ac03326e9f466d8d37e91ae1b14a6511 #305

hlin99 opened this issue Sep 20, 2024 · 4 comments
hlin99 commented Sep 20, 2024

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

```python
seq_group_metadata_list.extend(
    self.create_dummy_seq_group_metadata(0, 0, is_prompt)
    for _ in range(batch_size_padding))
```

This piece of code introduces dummy metadata creation in a loop, and we observe a 10% perf drop. Is this code change intentional?
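
For context, a minimal sketch contrasting the per-slot creation above with a variant that creates the dummy once and reuses it; the reuse variant is an assumed optimization, not something the codebase is confirmed to support:

```python
# Pattern from the commit in question: a new dummy seq-group-metadata
# object is constructed for every padding slot.
seq_group_metadata_list.extend(
    self.create_dummy_seq_group_metadata(0, 0, is_prompt)
    for _ in range(batch_size_padding))

# Hypothetical cheaper variant: construct the dummy once and repeat the
# same reference. Safe only if downstream code never mutates the padding
# entries (an assumption; the issue does not confirm this).
dummy = self.create_dummy_seq_group_metadata(0, 0, is_prompt)
seq_group_metadata_list.extend([dummy] * batch_size_padding)
```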

iboiko-habana commented
Please re-check perf with #301

hlin99 commented Sep 23, 2024

Unfortunately, performance has not improved, and the data looks identical before and after applying the patch. It seems that the dummy creation and list extension are not the root cause of the performance drop. Instead, the issue appears to stem from the changes to the dummy metadata itself, which alter the subsequent call path.
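
One way to sanity-check this is to time the dummy-creation path in isolation. Below is a rough, hypothetical micro-benchmark; the `runner` object and the argument signature are assumed from the snippet earlier in the thread, and this is not vLLM's own profiling tooling:

```python
import time

def time_dummy_creation(runner, batch_size_padding, is_prompt, iters=1000):
    """Rough micro-benchmark of dummy seq-group-metadata creation.

    `runner` is assumed to expose create_dummy_seq_group_metadata with
    the argument pattern used in the snippet above; this is a sketch,
    not a definitive measurement harness.
    """
    start = time.perf_counter()
    for _ in range(iters):
        # Recreate the padding list the same way the commit does.
        dummies = [
            runner.create_dummy_seq_group_metadata(0, 0, is_prompt)
            for _ in range(batch_size_padding)
        ]
        assert len(dummies) == batch_size_padding
    elapsed = time.perf_counter() - start
    per_iter_us = elapsed / iters * 1e6
    print(f"{iters} iterations: {elapsed:.3f}s total, "
          f"{per_iter_us:.1f} us per padded batch of {batch_size_padding}")
```

If the per-batch cost here is negligible compared with the observed per-step slowdown, that supports the claim that the regression comes from the downstream call path rather than from object creation.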

iboiko-habana commented
Please share traces or steps for reproduction

hlin99 commented Sep 23, 2024

1. Below is my docker configuration with the vLLM environment setup (see the script at the end of this comment).
2. Then, in the docker environment, go to vllm/benchmarks.
3. Run the benchmark command:

```
python benchmark_throughput.py --backend vllm --dataset ./ShareGPT_V3_unfiltered_cleaned_split.json --tensor-parallel-size 8 --model mistralai/Mixtral-8x22B-Instruct-v0.1 --device hpu --dtype bfloat16 --gpu-memory-utilization 0.7 --max-num-batched-tokens 262144
```

Before the change, the output throughput is about 2500 tokens/s; after the change it drops to about 2200 tokens/s (roughly a 12% regression).


```bash
#!/bin/bash

export DOCKER_IMAGE=${DOCKER_IMAGE:-vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest}
export CONTAINER_NAME=${CONTAINER_NAME:-vllm-server-mixtral-8x22b}
export DATA_DIR=${DATA_DIR:-/data0}
export SSH_PORT=${SSH_PORT:-3022}
export HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES:-all}
export HF_TOKEN=${HF_TOKEN}

print_help() {
    echo "Usage: $0 [options]"
    echo "This script creates and sets up the docker container for $CONTAINER_NAME."
    echo "Enter the container bash shell if no option is specified."
    echo
    echo "Options:"
    echo "  -h, --help  Show this help message and exit."
    echo "   1          Create and set up the base container and exit."
    echo "   2          Set up the container based on setup.sh and exit."
    echo "   0          Stop the container."
    echo "  -1          Stop and remove the container."
}

if [[ "$1" == "-h" || "$1" == "--help" ]]; then
print_help
exit 0
fi

if [ ! "${HABANA_VISIBLE_DEVICES}" == "all" ]; then
index_module_data=$(hl-smi --query-aip=index,module_id --format=csv)
echo "$index_module_data"
declare -A index_module_map
while IFS=", " read -r index module_id; do
index_module_map[$index]=$module_id
done <<< "$(echo "$index_module_data" | tail -n +2)"
indices=(${HABANA_VISIBLE_DEVICES//,/ })
module_ids=()
for index in "${indices[@]}"; do
module_ids+=(${index_module_map[$index]})
done
visible_modules=$( IFS=,; echo "${module_ids[*]}")
echo HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}
echo HABANA_VISIBLE_MODULES=${visible_modules}
else
visible_modules="0,1,2,3,4,5,6,7"
fi

container_existing=$(docker ps -a --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')
container_running=$(docker ps --filter "name=^/${CONTAINER_NAME}$" --format '{{.Names}}')

if [[ "$1" == "1" ]] || [[ -z "$container_existing" ]]; then
    if [ ! -z "$container_existing" ]; then
        echo "Error: Container ${CONTAINER_NAME} exists. Remove the existing container first."
        exit 1
    fi
    docker run --runtime=habana --name ${CONTAINER_NAME} -td \
        -e HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES} \
        -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
        --cap-add=sys_nice --net=host --ipc=host \
        --env http_proxy=${http_proxy} \
        --env https_proxy=${https_proxy} \
        --env no_proxy=${no_proxy} \
        --env HF_HOME=${DATA_DIR}/huggingface \
        --env DATA_DIR=${DATA_DIR} \
        --env WORKSPACE_ROOT=/workspace \
        --env HABANA_VISIBLE_MODULES=${visible_modules} \
        --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
        --env PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
        --env PT_HPUGRAPH_DISABLE_TENSOR_CACHE=1 \
        --env VLLM_GRAPH_RESERVED_MEM=0.6 \
        --env VLLM_GRAPH_PROMPT_RATIO=0 \
        --env VLLM_DECODE_BLOCK_BUCKET_MAX=2048 \
        --env VLLM_PROMPT_BS_BUCKET_STEP=128 \
        --env VLLM_PROMPT_BS_BUCKET_MAX=256 \
        --volume $(pwd):/workspace \
        --volume ${DATA_DIR}:${DATA_DIR} \
        ${DOCKER_IMAGE} bash
fi
```
