More Inference Endpoints features and fixes #68
Commits on Jul 3, 2024
feat(generator): better handle exceptions on multiprocessing
This raises an error to signal that there was a problem. Previously, the root thread got stuck waiting for an agent that was already dead. With this change it should exit instead.
Commit 1f414bb
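To illustrate the failure mode described in "better handle exceptions on multiprocessing", here is a minimal sketch, not the code from this PR. It assumes a parent that waits on a queue for a child "agent" process. The names `_agent` and `wait_for_agent`, the polling interval, and the use of the `fork` start method are all hypothetical choices made for the example. The point is that the parent polls with a timeout and checks whether the child is still alive, so a dead agent turns into an error rather than an endless wait.

```python
import multiprocessing as mp
import queue
import sys


def _agent(results, fail):
    # Hypothetical agent: it either reports a result or dies without answering.
    if fail:
        sys.exit(3)
    results.put("ok")


def wait_for_agent(fail, poll_interval=0.1):
    # "fork" is an assumption (POSIX only) so the sketch runs without a
    # __main__ guard; real code may use another start method.
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_agent, args=(results, fail))
    proc.start()
    try:
        while True:
            try:
                # Never block forever: wake up regularly to check on the agent.
                return results.get(timeout=poll_interval)
            except queue.Empty:
                if proc.is_alive():
                    continue
                # The agent is gone. Pick up a result that may have raced
                # with its exit, otherwise report the failure.
                try:
                    return results.get(timeout=poll_interval)
                except queue.Empty:
                    raise RuntimeError(
                        f"agent died with exit code {proc.exitcode}"
                    ) from None
    finally:
        proc.join()
```

Calling `wait_for_agent(fail=True)` raises a `RuntimeError` instead of hanging, while `wait_for_agent(fail=False)` returns the agent's result.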
Commit 0bf300e
chore(docker): entrypoint json output is set by default
It can be disabled by setting JSON_OUTPUT_DISABLE. It is now also possible to experiment with more batch sizes.
Commit 5d61783
Commits on Jul 4, 2024
Commit d8e6ef6
feat(generator): store position_id in current slot
This will further simplify the implementation of prefill bucketing.
Commit 333378a
Commit 355f976
fix(TGI): fix input truncation
Truncation was sub-optimal, and it was done on the wrong side.
Commit bfd9b51
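The "fix input truncation" commit does not say which side is the correct one. As an illustration only, the sketch below assumes that for causal text generation the most recent tokens should be kept, so tokens are dropped from the left. The function name `truncate_left` is hypothetical and is not taken from this PR.

```python
def truncate_left(input_ids, max_length):
    """Keep at most max_length tokens, dropping the oldest (leftmost) ones."""
    if max_length <= 0:
        # Guard: input_ids[-0:] would return the whole list, not an empty one.
        return []
    return input_ids[-max_length:]
```

With this convention the end of the prompt, which the model continues from, always survives truncation.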
Commit e73e2fd
feat(tgi): warmup runs prefill/decode on all supported combinations
This prevents XLA compilation at inference time. Note that I had to disable dynamo compilation, though; otherwise the model did not generate correct results. This leads to slower generation, but at least generation seems stable now.
Commit b271955
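The idea behind warming up "on all supported combinations" can be sketched as follows. This is a rough illustration under stated assumptions, not the PR's implementation: it assumes shapes are bucketed to powers of two up to a maximum (at least 1), that prefill is compiled per (batch size, sequence length) pair, and that decode is compiled per batch size. The names `power_of_two_buckets`, `warmup`, `run_prefill`, and `run_decode` are hypothetical.

```python
from itertools import product


def power_of_two_buckets(maximum):
    """Return 1, 2, 4, ... below maximum, followed by maximum itself."""
    buckets = []
    size = 1
    while size < maximum:
        buckets.append(size)
        size *= 2
    buckets.append(maximum)
    return buckets


def warmup(run_prefill, run_decode, max_batch_size, max_seq_len):
    """Run every assumed shape once so compilation happens before serving."""
    batch_sizes = power_of_two_buckets(max_batch_size)
    seq_lens = power_of_two_buckets(max_seq_len)
    # Prefill: one run per (batch size, sequence length) combination.
    for batch_size, seq_len in product(batch_sizes, seq_lens):
        run_prefill(batch_size, seq_len)
    # Decode: assumed here to depend on the batch size only.
    for batch_size in batch_sizes:
        run_decode(batch_size)
```

The trade-off is a longer startup in exchange for not paying compilation cost on the first request that hits a new shape.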
ci(tgi): create images when pushing on current branch
This will allow for testing IE before release.
Commit 3145343
Commit 0204586
Commits on Jul 5, 2024
Commit 5d852b1