
Use absolute paths instead of relying on HF cache #506

Merged
rishic3 merged 2 commits into NVIDIA:main from model-loading-fixes on Mar 3, 2025

Conversation

@rishic3 (Collaborator) commented Mar 1, 2025

Relying on the Hugging Face cache to load LLMs is finicky on DBFS and when an access token is needed. Switch to downloading the model explicitly and using the absolute path, which is better practice anyway.
Added a barrier to pytriton server startup to ensure all servers are shut down if one fails.
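
A minimal sketch of the download-then-load approach (the model id, local directory, and token handling below are illustrative, not the values used in this PR):

```python
import os

from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model once to an explicit local directory and keep its absolute path,
# instead of depending on wherever the HF cache resolves on DBFS.
model_path = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",          # illustrative model id
    local_dir="/local_disk0/models/llama-2-7b",  # illustrative local directory
    token=os.environ.get("HF_TOKEN"),            # access token, if the repo is gated
)

# Load from the absolute path; no cache lookup or token is needed at load time.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```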

rishic3 requested a review from eordentlich on March 1, 2025 00:02
@eordentlich (Collaborator) left a comment


> Added a barrier to pytriton server startup to ensure all servers are shut down if one fails.

What is the benefit of this since they are independent?

@rishic3 (Collaborator, Author) commented Mar 1, 2025

> Added a barrier to pytriton server startup to ensure all servers are shut down if one fails.
>
> What is the benefit of this since they are independent?

If only one server fails and thus raises an error, the barrier stage will fail and the other servers will be left dangling without the manager having stored their PIDs. The subsequent tasks rely on all executors having a live server.
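
For reference, a minimal sketch of the barrier pattern in Spark's barrier execution mode (`start_triton_server` and `num_executors` are hypothetical stand-ins, not the names used in this PR):

```python
from pyspark import BarrierTaskContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
num_executors = 4  # hypothetical; one server per executor

def start_servers(_):
    context = BarrierTaskContext.get()
    # Hypothetical startup call standing in for launching the pytriton server
    # on this executor; if it raises, the whole barrier stage fails.
    server = start_triton_server()
    # Every task must reach this point before any task proceeds, so a single
    # failure tears down all the Python workers instead of leaving the other
    # servers dangling with their PIDs unrecorded.
    context.barrier()
    yield (context.partitionId(), server.pid)

pids = (
    spark.sparkContext.parallelize(range(num_executors), num_executors)
    .barrier()
    .mapPartitions(start_servers)
    .collect()
)
```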

@eordentlich (Collaborator) commented

Ok, good point. Does this actually solve the problem, then, or are the dangling servers still running after the barrier fails? Only the Python workers are killed; does this take out the Triton servers too?

@rishic3 (Collaborator, Author) commented Mar 1, 2025

> Ok, good point. Does this actually solve the problem, then, or are the dangling servers still running after the barrier fails? Only the Python workers are killed; does this take out the Triton servers too?

Yes, killing the parent Python worker, which is running the pytriton event loop, will kill all associated Triton server processes.
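
A rough sketch of that parent/child relationship, assuming the server processes are spawned as children of the Python worker (the `psutil` cleanup below is illustrative, not the mechanism pytriton itself uses):

```python
import psutil

def terminate_child_servers():
    # The Python worker running the pytriton event loop is the parent; any
    # tritonserver processes it spawned show up as its children, so terminating
    # them (or the parent itself) leaves no server processes dangling.
    worker = psutil.Process()
    children = worker.children(recursive=True)
    for child in children:
        child.terminate()
    psutil.wait_procs(children, timeout=10)
```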

@eordentlich (Collaborator) left a comment


👍

rishic3 merged commit 8c934b9 into NVIDIA:main on Mar 3, 2025
3 of 4 checks passed
rishic3 deleted the model-loading-fixes branch on March 3, 2025 15:49