Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoscaling inference endpoints #412

Merged
merged 15 commits into from
Dec 5, 2024
Merged

Conversation

clefourrier
Copy link
Member

@clefourrier clefourrier commented Nov 29, 2024

This PR:

  • makes a lot of parameters completely optional to allow creating an inference endpoint from only the model name if needed
  • tries to auto-suggest relevant endpoint size
  • if not possible, tries to spin up the endpoint with the lowest hardware needed, then tries rescaling automatically for an hour until an appropriate hardware size is reached (or until 1h has elapsed)
  • catches a bunch of possible exceptions (endpoint already present, connection issues, compute not available, etc)

@HuggingFaceDocBuilderDev
Copy link
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@clefourrier
Copy link
Member Author

cc @albertvillanova this could interest you btw :)

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping.

Just a preliminary comment while trying to understand the logic:

  • The PR does not allow the user to create an endpoint with an arbitrary name. The user can only:
    • Either create an endpoint without being able to choose the endpoint name
    • Or reuse and existing (arbitrarily named) endpoint

@clefourrier
Copy link
Member Author

Yep! That's the idea, since I wanted to streamline process as much as possible - either the user already has an endpoint, or they want one created and don't care about the name

@clefourrier
Copy link
Member Author

Would you want do be able to select a name in all cases?

@albertvillanova
Copy link
Member

albertvillanova commented Dec 3, 2024

I do not have a strong opinion. I just wanted to be sure to understand all possible use cases and if we should allow all of them:

  • Use an existing endpoint
    • endpoint_name is required, model not passed, instance not passed, endpoint not deleted
  • Create a new endpoint
    • model is required
    • Custom endpoint_name? Either pass it or not
    • Delete endpoint at the end? Yes or not
    • Custom instance? Yes or not (autoscale)

The simplest cases:

  • User only passes endpoint_name: use an existing endpoint
  • User only passes model: create endpoint (we decide its name), autoscale, and delete

@clefourrier clefourrier changed the title First draft of autoscale Autoscaling inference endpoints Dec 3, 2024
@clefourrier clefourrier requested a review from NathanHB December 3, 2024 12:31
@clefourrier
Copy link
Member Author

At the moment we only cover the simple cases with this PR: user passes endpoint name or model name, we see if it's an already existing endpoint or if we need to spin it up, if we need to spin it up we delete it afterwards

@albertvillanova
Copy link
Member

My general suggestion would be:

  • make it as easy as possible for simple use cases, convenient for most users
  • also make it highly customizable, for advanced users

But some aspects if considered useful could be addressed in other PRs.

The only inconvenient I see is:

  • this PR changes current YAML format: endpoint_name and model are replaced by model_or_endpoint_name
    • this is a breaking change and has an impact in UX
  • future advanced use cases may again require differentiating between endpoint_name and model

@clefourrier
Copy link
Member Author

Very fair point, I'll revert this specific change

@clefourrier clefourrier requested a review from NathanHB December 4, 2024 11:47
Copy link
Member

@NathanHB NathanHB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good ! Only small nit on the config file

@clefourrier clefourrier merged commit b68d5bc into main Dec 5, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants