More Inference Endpoints features and fixes #68

Merged: 12 commits, merged on Jul 6, 2024

Commits on Jul 3, 2024

  1. feat(generator): better handle exceptions on multiprocessing

    This will raise an error, signaling that there was a problem. Previously,
    the root thread got stuck waiting for an agent that had died; now it
    should exit.
    tengomucho committed Jul 3, 2024
    1f414bb
  2. 0bf300e
  3. chore(docker): entrypoint json output is set by default

    It can be disabled by setting JSON_OUTPUT_DISABLE. It is now also
    possible to experiment with more batch sizes.
    tengomucho committed Jul 3, 2024
    5d61783
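
The multiprocessing fix in the first Jul 3 commit (1f414bb) comes down to never blocking indefinitely on a worker that has died. A minimal sketch, assuming the root process reads agent results from a queue; `wait_for_agent` and `AgentProcess` are hypothetical names, not the project's actual API:

```python
import queue
from typing import Any, Optional, Protocol


class AgentProcess(Protocol):
    """The subset of multiprocessing.Process used here (hypothetical interface)."""

    exitcode: Optional[int]

    def is_alive(self) -> bool: ...


def wait_for_agent(agent: AgentProcess, results: Any, poll_interval: float = 0.5) -> Any:
    """Return the agent's next result, or raise if the agent has died.

    `results` is any queue whose get(timeout=...) raises queue.Empty on timeout,
    e.g. multiprocessing.Queue or queue.Queue.
    """
    while True:
        try:
            return results.get(timeout=poll_interval)
        except queue.Empty:
            if agent.is_alive():
                continue  # Still working: keep waiting.
            # The agent may have queued a result just before exiting: drain once.
            try:
                return results.get_nowait()
            except queue.Empty:
                raise RuntimeError(
                    f"agent process died (exit code {agent.exitcode}) without returning a result"
                ) from None
```

In a real setup `agent` would be the `multiprocessing.Process` running the model and `results` its `multiprocessing.Queue`; the poll interval bounds how long the root process can block before it notices a crash.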
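
For the entrypoint change (5d61783), the toggle logic can be sketched as follows. This assumes the entrypoint forwards TGI's `--json-output` launcher flag and treats any non-empty `JSON_OUTPUT_DISABLE` as "off"; the real entrypoint may be a shell script, and `launcher_args` is a hypothetical helper used only to illustrate the default-on behavior:

```python
import os
from typing import List, Mapping, Optional


def launcher_args(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the launcher command line (hypothetical helper)."""
    env = os.environ if env is None else env
    args = ["text-generation-launcher"]
    # JSON logs are on by default; any non-empty JSON_OUTPUT_DISABLE turns them off.
    if not env.get("JSON_OUTPUT_DISABLE"):
        args.append("--json-output")
    return args
```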

Commits on Jul 4, 2024

  1. d8e6ef6
  2. feat(generator): store position_id in current slot

    This will further simplify the implementation of prefill bucketing.
    tengomucho committed Jul 4, 2024
    333378a
  3. 355f976
  4. fix(TGI): fix input truncation

    Truncation was sub-optimal, and it was done on the wrong side.
    tengomucho committed Jul 4, 2024
    bfd9b51
  5. e73e2fd
  6. feat(tgi): warmup runs prefill/decode on all supported combinations

    This should prevent XLA compilation at inference time. Note that I had
    to disable Dynamo compilation, because otherwise the model did not
    generate correct results. This makes generation slower, but at least
    it now seems stable.
    tengomucho committed Jul 4, 2024
    b271955
  7. ci(tgi): create images when pushing on current branch

    This makes it possible to test Inference Endpoints (IE) before a release.
    tengomucho committed Jul 4, 2024
    3145343
  8. 0204586
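
The position_id commit (333378a) ties into prefill bucketing. Once prompts are padded to a bucket length, a request's next position can no longer be read off the padded tensor's shape, so keeping it in the slot is simpler. A minimal sketch under that assumption; `Slot`, `assign` and `append` are hypothetical names, not the project's classes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slot:
    """Hypothetical per-request slot, reduced to the fields needed here."""

    token_ids: List[int] = field(default_factory=list)
    position_id: int = 0  # position of the next token to be generated

    def assign(self, prompt_ids: List[int]) -> None:
        # After prefill, the next position equals the real prompt length,
        # independent of how much padding the prefill bucket added.
        self.token_ids = list(prompt_ids)
        self.position_id = len(prompt_ids)

    def append(self, token_id: int) -> int:
        """Record a generated token and return the position it was placed at."""
        position = self.position_id
        self.token_ids.append(token_id)
        self.position_id += 1
        return position
```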
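
The truncation fix (bfd9b51) only says truncation happened on the wrong side. Assuming the intended behavior is to keep the end of the prompt, so the most recent context survives, and drop tokens from the beginning, a sketch looks like this; `truncate_input` is a hypothetical helper:

```python
from typing import List, Sequence


def truncate_input(input_ids: Sequence[int], max_input_length: int) -> List[int]:
    """Truncate a prompt to at most max_input_length tokens, dropping the oldest ones."""
    if max_input_length <= 0:
        raise ValueError("max_input_length must be positive")
    # Keep the tail of the prompt; slicing from the front (input_ids[:n])
    # would instead discard the tokens closest to where generation starts.
    return list(input_ids[-max_input_length:])
```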
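
The warmup commit (b271955) avoids XLA compilation at inference time by exercising every supported shape ahead of time. A sketch of that idea, assuming inputs are padded to a fixed set of sequence-length buckets; `prefill`, `decode`, `pick_bucket` and the bucket values are hypothetical stand-ins for the real model calls:

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple


def pick_bucket(length: int, buckets: Sequence[int]) -> int:
    """Return the smallest supported sequence length that fits the prompt."""
    for bucket in sorted(buckets):
        if length <= bucket:
            return bucket
    raise ValueError(f"prompt of {length} tokens exceeds the largest bucket")


def warmup(
    prefill: Callable[[int, int], None],
    decode: Callable[[int, int], None],
    batch_sizes: Iterable[int],
    buckets: Iterable[int],
) -> List[Tuple[int, int]]:
    """Run one prefill and one decode step for every (batch size, bucket) pair."""
    combos = list(itertools.product(batch_sizes, buckets))
    for batch_size, seq_len in combos:
        prefill(batch_size, seq_len)  # compiles the prefill graph for this shape
        decode(batch_size, seq_len)   # and the decode step that follows it
    return combos
```

Because XLA compiles a graph per input shape, padding each request with `pick_bucket` at serving time means only shapes already seen during warmup reach the compiler.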

Commits on Jul 5, 2024

  1. 5d852b1