[New Feature] Add the serving support #279

Open
wants to merge 24 commits into base: main

Conversation

cyber-pioneer
Collaborator

@cyber-pioneer cyber-pioneer commented Nov 26, 2024

New Feature

FlagScale supports serving.

Description

This pull request introduces support for deploying large models with FlagScale, leveraging the Ray framework for efficient orchestration and scalability. Currently, this implementation supports the Qwen model, enabling users to easily deploy and manage large-scale machine learning services.

Planned key features include:

  • Easy distributed serving built on seamless integration with Ray.
  • Optimized resource management for the serving execution of multiple tasks.
  • Simplified deployment process for LLM and multimodal models.

More details in Serve.

@cyber-pioneer cyber-pioneer requested a review from a team as a code owner November 26, 2024 06:53
@cyber-pioneer cyber-pioneer changed the title [No Merge][Serve] New Feature: Flagscale support serve! [No Merge][Serve] New Feature: Flagscale supports serve! Nov 26, 2024
gpu-memory-utilization: 0.9
max-model-len: 32768
max-num-seqs: 256
port: 4567

Is port also a parameter required by vLLM?

Collaborator Author

Yes. The parameters inside the vllm field are kept consistent with the native `vllm serve` command line.
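For reference, here is a minimal sketch of how the configured port would be used once the service is up. It assumes the serve task exposes vLLM's OpenAI-compatible HTTP API on that port, which is not spelled out in this thread; the model path follows the example used elsewhere in this PR.

```python
import requests

# Sketch only: query the OpenAI-compatible endpoint that vllm serve exposes,
# using the port configured above (4567).
resp = requests.post(
    "http://localhost:4567/v1/completions",
    json={
        "model": "/models/Qwen2.5-7B-Instruct",
        "prompt": "Hello, ",
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.json())
```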

max-model-len: 32768
max-num-seqs: 256
port: 4567
action-args:

What is the difference between action-args and the fields above? Which fields go under it?

Collaborator Author

@cyber-pioneer cyber-pioneer Nov 28, 2024


For example: vllm serve /models/Qwen2.5-7B-Instruct --tensor-parallel-size=1 --gpu-memory-utilization=0.9 --max-model-len=32768 --max-num-seqs=256 --port=4567 --trust-remote-code --enable-chunked-prefill

--trust-remote-code and --enable-chunked-prefill are preset behaviors; the command line does not allow passing a value for them, so they belong under action args.

This PR will also add documentation describing the deployment parameter options, to lower the onboarding cost for developers.
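A minimal sketch of that distinction (illustrative only, not the PR's implementation): value-style fields become `--key=value` flags, while entries under action-args are appended as bare flags with no value.

```python
# Sketch: assemble the example command above from the two kinds of fields.
value_args = {
    "tensor-parallel-size": 1,
    "gpu-memory-utilization": 0.9,
    "max-model-len": 32768,
    "max-num-seqs": 256,
    "port": 4567,
}
action_args = ["trust-remote-code", "enable-chunked-prefill"]  # store-true style, values not allowed

cmd = ["vllm", "serve", "/models/Qwen2.5-7B-Instruct"]
cmd += [f"--{key}={value}" for key, value in value_args.items()]  # flags that take a value
cmd += [f"--{name}" for name in action_args]                      # bare action flags
print(" ".join(cmd))
```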

else:
run_local_command(f"bash {host_run_script_file}", dryrun)

def run(self, with_test=False, dryrun=False):

Does with_test mean whether to run it in the background?

Collaborator Author

Yes.
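A minimal sketch of that behavior, based only on the reply above and not on the PR's actual code: with_test simply decides whether the generated host script is launched in the background. The helper signature below is hypothetical.

```python
import subprocess

def run_local_command(cmd: str, dryrun: bool = False, background: bool = False):
    # Illustrative helper: optionally detach the process instead of waiting on it.
    if dryrun:
        print(f"[dryrun] {cmd}")
        return None
    if background:
        return subprocess.Popen(cmd, shell=True)  # returns immediately, keeps running
    return subprocess.run(cmd, shell=True, check=True)

def run(host_run_script_file: str, with_test: bool = False, dryrun: bool = False):
    # Per the reply above, with_test here maps to background execution.
    return run_local_command(f"bash {host_run_script_file}", dryrun, background=with_test)
```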

"localhost",
available_addr,
available_port,
1,

Does local mode only support a single node with a single GPU?

Collaborator Author

This PR only supports a single GPU; multi-GPU support is in progress.

Comment on lines +19 to +20
pip install modelscope
modelscope download --model Qwen/Qwen2.5-7B-Instruct --local_dir /models/

Add a small subheading for the model preparation steps, same as the section above.

Collaborator Author

ok

python run.py --config-path examples/qwen/ --config-name config action=run
```



A follow-up document could explain how to replace the model and data, and how users can deploy their own models with this tool.

Collaborator Author

OK, this PR will add that.

Comment on lines 34 to 40
logger.info("Standard Output:")
logger.info(stdout)
# logger.info(stdout.decode())
logger.info("Standard Error:")
logger.info(stderr)
# logger.info(stderr.decode())


This could be simplified a bit.

Collaborator Author

ok
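One possible simplification, sketched here as a suggestion rather than the final code: drop the commented-out decode() lines (assuming stdout/stderr are already captured as text) and fold each pair of calls into one.

```python
logger.info("Standard Output:\n%s", stdout)
logger.info("Standard Error:\n%s", stderr)
```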

@cyber-pioneer cyber-pioneer changed the title [No Merge][Serve] New Feature: Flagscale supports serve! [Serve] New Feature: Flagscale supports serve! Dec 2, 2024
@aoyulong aoyulong changed the title [Serve] New Feature: Flagscale supports serve! [New Feature] Add the serving support Dec 2, 2024

2 participants