[Feat] SGLang SRT commands in one go, async input for openai server #212

Merged (4 commits) on Aug 27, 2024

Collaborator

@kcz358 kcz358 commented Aug 27, 2024


This PR adds support for evaluating LLaVA with the SGLang SRT model in a single command. A separate command to set up the backend server is no longer needed. An example command:

# After this update, no extra command is needed to set up the backend server;
# the server is initialized during model init.

# Launch the lmms-eval srt_api model
CKPT_PATH=$1
TASK=$2
MODALITY=$3
TP_SIZE=$4
echo "$TASK"
# Replace commas in the task list with underscores for the log suffix
TASK_SUFFIX="${TASK//,/_}"
echo "$TASK_SUFFIX"

python3 -m lmms_eval \
    --model srt_api \
    --model_args "modality=$MODALITY,model_version=$CKPT_PATH,tp=$TP_SIZE,host=127.0.0.1,port=30000,timeout=600" \
    --tasks "$TASK" \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix "$TASK_SUFFIX" \
    --output_path ./logs/
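
The description does not show how the model waits for the backend once it has been started during init. As a rough illustration only, one common pattern is to poll the `host`/`port` pair until it accepts a TCP connection or the `timeout` expires. The helper name `wait_for_server` and its `interval` parameter are hypothetical and are not taken from this PR:

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float, interval: float = 0.5) -> bool:
    """Hypothetical helper: poll until (host, port) accepts a TCP connection.

    Returns True once a connection succeeds, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connect timed out); wait and retry.
            time.sleep(interval)
    return False
```

With the arguments from the example command this would be called as `wait_for_server("127.0.0.1", 30000, timeout=600)`. A TCP check only shows that the port is open, so a real implementation might additionally query a health endpoint before sending requests.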

The PR also submits requests to the SGLang server asynchronously so that they can be processed in batches. On MME, throughput reaches around 3 it/s with num_processes=48, compared with around 1.5 s/it (roughly 0.67 it/s) when using a single batch.
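
The idea of keeping many requests in flight so the server can batch them can be sketched with `asyncio`. This is a minimal sketch, not the code from this PR: `fake_generate` is a hypothetical stand-in for a single HTTP request to the backend, and `max_in_flight` is an assumed concurrency cap whose default merely mirrors the num_processes=48 figure quoted above.

```python
import asyncio


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for one request to the backend server.
    await asyncio.sleep(0.01)
    return f"response to {prompt}"


async def submit_all(prompts, max_in_flight: int = 48):
    # Cap how many requests are outstanding at the same time.
    sem = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with sem:
            return await fake_generate(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(p) for p in prompts))


def run(prompts, max_in_flight: int = 48):
    # Synchronous entry point for callers that are not async themselves.
    return asyncio.run(submit_all(prompts, max_in_flight))
```

Calling `run(["q0", "q1", "q2"])` returns the three responses in input order. Because the requests overlap instead of running back to back, the server sees several of them at once and can group them, which is the effect the throughput numbers above describe.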

@Luodian Luodian merged commit 0d7ffcc into main Aug 27, 2024
2 checks passed