Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API server] hornor cgroup cpu limit when set worker numbers #4848

Merged
merged 3 commits into from
Mar 1, 2025

Conversation

aylei
Copy link
Collaborator

@aylei aylei commented Feb 28, 2025

close #4847

Tested (run the relevant ones):

On my laptop:

$ sky api stop && sky api start && grep "Starting SkyPilot API serve" ~/.sky/api_server/server.log
...
INFO:__main__:Starting SkyPilot API server, workers=1

$ sky api stop && sky api start --deploy && grep "Starting SkyPilot API serve" ~/.sky/api_server/server.log
SkyPilot API server stopped.
...
INFO:__main__:Starting SkyPilot API server, workers=12

On K8S (cpu limit to 4 cores):

INFO:__main__:Starting SkyPilot API server, workers=4
  • Code formatting: bash format.sh
  • Manual test, see above
  • Unit test coverage
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aylei! LGTM with some minor nits.


workers = []
try:
workers = executor.start(cmd_args.deploy)
logger.info('Starting SkyPilot API server')
logger.info(f'Starting SkyPilot API server, workers={num_workers}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit too detailed for a user to know. Can we hide the num workers in logger.debug or some other logs, not directly printing to the terminal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, users of remote API server and sky CLI do not see this log in normal use cases unless they check it explicitly in ~/.sky/api_server/server.log or by kubectl logs (usually for diagnosis). In the meantime, infra operators of a centralized API server do care about the detailed setup. This info would help like the number of Long and Short workers we printed previously for them. Wdyt?

Copy link
Collaborator

@Michaelvll Michaelvll Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that makes sense, as long as it is not shown to the user in console, it is good to me. In that case, let's make the workers here more explicit being uvicorn workers: {num_workers}

@@ -0,0 +1,108 @@
"""Unit tests for the SkyPilot API server."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is great. Let's move this unit_tests to the specific subfolder, i.e. unit_tests/sky/server/test_server.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep directories flat?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be confusing with a flat structure, as we have multiple server.py in different directories, e.g. jobs.server.server.py and serve.server.server.py. It seems a best practice to mirror the structure of tests: https://pytest-with-eric.com/pytest-best-practices/pytest-organize-tests/#Best-Practice-2-Test-Structure-Should-Mirror-Application-Code
and some other sources.

Signed-off-by: Aylei <[email protected]>
@aylei aylei merged commit 11e6238 into master Mar 1, 2025
18 checks passed
@aylei aylei deleted the uvicorn-hornor-cgroup branch March 1, 2025 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uvicorn worker numbers does not hornor cgroup resource limit
3 participants