[New Feature] Add the serving support #279
base: main
Conversation
gpu-memory-utilization: 0.9
max-model-len: 32768
max-num-seqs: 256
port: 4567
Is `port` also a parameter that vllm needs?
Yes. The parameters under the `vllm` field are kept consistent with the native `vllm serve` command line.
max-model-len: 32768
max-num-seqs: 256
port: 4567
action-args:
What is the difference between `action-args` and the fields above? Which fields go here?
For example: `vllm serve /models/Qwen2.5-7B-Instruct --tensor-parallel-size=1 --gpu-memory-utilization=0.9 --max-model-len=32768 --max-num-seqs=256 --port=4567 --trust-remote-code --enable-chunked-prefill`.
The two flags `--trust-remote-code` and `--enable-chunked-prefill` are preset behaviors that take no value on the command line, so they belong in `action-args`.
This PR will also add documentation describing the deployment parameter options to lower the onboarding cost for developers.
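For reference, a minimal sketch of how that command line could map onto the config, assuming the field names shown in this diff (the `model` key and the exact nesting of `action-args` are illustrative, not necessarily the final schema):

```yaml
vllm:
  model: /models/Qwen2.5-7B-Instruct   # assumed key for the model path
  tensor-parallel-size: 1
  gpu-memory-utilization: 0.9
  max-model-len: 32768
  max-num-seqs: 256
  port: 4567
  action-args:                         # value-less flags passed through verbatim
    - trust-remote-code
    - enable-chunked-prefill
```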
    else:
        run_local_command(f"bash {host_run_script_file}", dryrun)

    def run(self, with_test=False, dryrun=False):
Does `with_test` mean whether to run in the background?
Yes.
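For context, a rough sketch of the kind of dispatch such a flag might control (the function and helper names here are illustrative assumptions, not the code in this PR):

```python
import subprocess

def launch(script: str, with_test: bool = False, dryrun: bool = False):
    cmd = f"bash {script}"
    if dryrun:
        print(cmd)                      # only print what would run
        return None
    if with_test:
        # detach and return immediately so a test can run against the service
        return subprocess.Popen(cmd, shell=True)
    # block in the foreground until the command finishes
    return subprocess.run(cmd, shell=True, check=True)
```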
"localhost", | ||
available_addr, | ||
available_port, | ||
1, |
Does local mode only support a single node with a single GPU?
This PR only supports a single GPU; multi-GPU support is in progress.
pip install modelscope
modelscope download --model Qwen/Qwen2.5-7B-Instruct --local_dir /models/
The model preparation flow should get a small subheading, same as the sections above.
ok
flagscale/serve/README.md
python run.py --config-path examples/qwen/ --config-name config action=run
A follow-up doc could explain how to swap in a different model and data, i.e. how users can deploy their own models with this tool.
OK, this PR will add that.
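As a rough sketch of what that doc might cover: swapping in your own model would presumably mean downloading it locally and pointing the config at it (the path and keys below are illustrative assumptions, not the final documented schema):

```yaml
# examples/my_model/config.yaml  (hypothetical path)
vllm:
  model: /models/MyOwnModel        # local path to your downloaded checkpoint
  tensor-parallel-size: 1
  port: 4567
```

The service would then be launched with the same `python run.py --config-path ... action=run` entry point shown above.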
flagscale/serve/run_simple_vllm.py
logger.info("Standard Output:") | ||
logger.info(stdout) | ||
# logger.info(stdout.decode()) | ||
logger.info("Standard Error:") | ||
logger.info(stderr) | ||
# logger.info(stderr.decode()) | ||
|
This could be simplified a bit.
ok
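For instance, a minimal way to tighten this up, assuming `stdout` and `stderr` are already decoded strings at this point:

```python
for name, stream in (("Standard Output", stdout), ("Standard Error", stderr)):
    logger.info("%s:\n%s", name, stream)
```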
New Feature
FlagScale now supports model serving.
Description
This pull request introduces support for deploying large models with FlagScale, leveraging the Ray framework for efficient orchestration and scalability. Currently, this implementation supports the Qwen model, enabling users to easily deploy and manage large-scale machine learning services.
Future key features include:
More details in Serve.
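Once the service is running, it should be reachable like a standard vLLM OpenAI-compatible endpoint; a quick sanity check might look like this (the host, port, and model path follow the example config above and are assumptions about the deployment):

```python
import requests

resp = requests.post(
    "http://localhost:4567/v1/chat/completions",
    json={
        "model": "/models/Qwen2.5-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```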