Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Local Server for OpenAI-Compatible APIs (Beta) #4

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

future-xy
Copy link
Collaborator

Pull Request: Local Server Beta for OpenAI-Compatible APIs

This PR introduces a beta version of a local server that provides OpenAI-compatible APIs, specifically v1/chat/completions and v1/completions. This initial version supports serving a single model and recognizes only two required fields in requests: messages/prompt and model. Please note that other fields may not have an effect at this stage. For detailed information, refer to the README.md and the ./tests/ directory. It's important to mention that, in this beta version, we utilize vanilla HuggingFace Transformers models instead of the more advanced MoE-Infinity architecture.

Known Limitations and Todos

  • Enable MoE-Infinity support.
  • Support streaming mode.
  • Include finish reason.
  • Implement support for batching requests for improved efficiency.
  • Gather and implement necessary features based on user feedback to enhance functionality and user experience.

Your feedback and contributions are welcome to help evolve this project into a more robust solution. Please refer to the README.md for guidelines on contributing and testing.

@future-xy
Copy link
Collaborator Author

I got an error when trying to use MoE for google/switch-base-16:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/mnt/data/fy/Desktop/MoE-Infinity/moe_infinity/entrypoints/openai/api_server.py", line 226, in completion
    _ = model.generate(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/entrypoints/big_modeling.py", line 161, in generate
    return self.model.generate(input_ids, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 1413, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/generation/utils.py", line 518, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 1043, in forward
    layer_outputs = layer_module(
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 767, in forward
    hidden_states = self.layer[-1](hidden_states, output_router_logits)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 345, in forward
    forwarded_states = self.mlp(forwarded_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/models/switch_transformers.py", line 75, in forward
    router_mask, router_probs, router_logits = self.router(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 208, in forward
    router_probs, router_logits = self._compute_router_probabilities(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/transformers/models/switch_transformers/modeling_switch_transformers.py", line 177, in _compute_router_probabilities
    router_logits = self.classifier(hidden_states)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1540, in _call_impl
    args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fy/.conda/envs/moe_inf/lib/python3.9/site-packages/moe_infinity/runtime/model_offload.py", line 882, in _pre_forward_module_hook
    self.offload_set.remove(param.data.data_ptr())
KeyError: 889163264

@lausannel
Copy link
Collaborator

@future-xy Fixed in the latest version available on TestPyPI, please feel free to give it another try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants