Banana Phone API is a relay API that acts as a bridge between clients and large language model inference servers. It is specifically designed to complement LM Studio Server, but should be compatible with any OpenAI-compatible endpoint. It features API key-based access control, automatic message formatting to match the active model (some setup required), and real-time streaming of responses from language models. It is lightweight and kind-hearted.
- API Key Verification: Secure the API by granting access only to requests with valid API keys.
- Automatic Message Styling (Autostyle): Customizes model interactions with message prefixes, suffixes, and stop sequences.
- Response Streaming: Supports streaming language model responses for interactive sessions.
- Model Configuration Flexibility: Easily add and manage language model configurations via the models.json file.
To get Banana Phone API up and ringing, follow the steps below.
- Python 3.x
- pip for installing Python packages
- Clone the repository and step into the new directory:
git clone https://github.com/your-username/banana-phone.git
cd banana-phone
- Rename .env.example to .env and configure it (see [[Configuration]]):
cp .env.example .env
nano .env # or use any other text editor
- Launch ring.sh to build the virtual environment, install the necessary dependencies within that environment, and kick off the API server:
chmod +x ring.sh # only required the first time you run it
./ring.sh --port 3456
Adjust the API settings by editing the variables in the .env file:
- LOCAL_PORT: The port Banana Phone API will listen on.
- DESTINATION_API: The destination API URL, such as the LM Studio Server.
- ENDPOINT_COMPLETIONS: The endpoint on the destination API for getting completions.
- ENDPOINT_MODELS: The endpoint on the destination API for retrieving available models.
- API_KEYS: Comma-separated list of API keys authorized to access Banana Phone API.
- WAN_ENABLED: Toggle (false or true) to control remote host access.
- AUTOSTYLE: Enable (true) or disable (false) autostyle for formatting messages.
- SYSTEM_MSG: Default message used when a query lacks a system message.
- SYSTEM_OVERRIDE: Set to true to replace existing system messages with SYSTEM_MSG.
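As a rough illustration of how these variables might be interpreted once loaded, here is a minimal Python sketch. The helper names (parse_bool, parse_api_keys), the example values, and the idea that the completions URL is formed by joining DESTINATION_API and ENDPOINT_COMPLETIONS are all assumptions made for illustration, not Banana Phone's actual code.

```python
def parse_bool(value: str) -> bool:
    """Interpret a .env toggle such as WAN_ENABLED or AUTOSTYLE."""
    return value.strip().lower() == "true"


def parse_api_keys(value: str) -> list:
    """Split API_KEYS on commas; an empty string yields no keys."""
    return [key for key in value.split(",") if key]


# Hypothetical values, as they might look after the .env file is loaded.
example_env = {
    "LOCAL_PORT": "3456",
    "DESTINATION_API": "http://localhost:1234",      # assumed LM Studio address
    "ENDPOINT_COMPLETIONS": "/v1/chat/completions",  # assumed OpenAI-style path
    "ENDPOINT_MODELS": "/v1/models",                 # assumed OpenAI-style path
    "API_KEYS": "just,like,so",
    "WAN_ENABLED": "false",
    "AUTOSTYLE": "true",
}

settings = {
    "port": int(example_env["LOCAL_PORT"]),
    # Assumption: the base URL and endpoint path are simply concatenated.
    "completions_url": example_env["DESTINATION_API"]
    + example_env["ENDPOINT_COMPLETIONS"],
    "api_keys": parse_api_keys(example_env["API_KEYS"]),
    "wan_enabled": parse_bool(example_env["WAN_ENABLED"]),
    "autostyle": parse_bool(example_env["AUTOSTYLE"]),
}
print(settings)
```

Treating anything other than the literal string true as false keeps a typo in the .env file from accidentally enabling remote access.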
Command-line arguments for ring.sh to override config settings:
- --port <port_number>: Define the port number.
- --api-url <url>: Update the destination API URL.
- --sys <system_message>: Set the system message.
- --forcesys: Ensure system message is applied.
- --tmux: Utilize tmux for session control.
- --wan: Enable access from anywhere, not just localhost.
- --nostyle: Turn off message autostyling.
- --reload: Auto-reload the server upon file changes.
- Enabling --wan poses security risks, especially if:
  - No API key is set (i.e., API_KEYS is empty).
  - Your network's firewall leaves the specified port open.
- For safer remote access, consider establishing a Cloudflare Tunnel or setting up a reverse proxy rather than using the WAN option directly.
- When you specify API keys in the .env file, make sure you separate them by commas without spaces: just,like,so.
- Specifying any API keys bars queries that don't include one of them in their header. Set API_KEYS to "" for effective glasnost.
- Use the same format for API keys as OpenAI uses when querying, i.e., Authorization: Bearer {{key}}.
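To make the header format concrete, here is a minimal Python sketch that builds (but does not send) an authenticated request using only the standard library. The function name build_chat_request, the base URL http://localhost:3456, the /v1/chat/completions path, and the key "just" are assumptions for illustration; substitute whatever your own setup uses.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, user_message: str):
    """Build a POST request carrying an OpenAI-style bearer token."""
    payload = {"messages": [{"role": "user", "content": user_message}]}
    return urllib.request.Request(
        # Assumed OpenAI-style path; adjust to match your configuration.
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical local address and key.
request = build_chat_request("http://localhost:3456", "just", "Hello?")
print(request.full_url)
print(request.get_header("Authorization"))
```

With the server running, passing the request to urllib.request.urlopen(request) would send it. If API_KEYS is set and the key is missing or wrong, expect the request to be rejected.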
models.json includes premade configurations for Alpaca, ChatML, Llama2, Mistral, Orca, Phind, Vicuna, and Zephyr prompt formats, each populated with a handful of currently popular models for automatic matching. To add more models under an existing configuration, take the model name, or a sufficiently unique portion of it, taking care to match the case, and add it to the models array within that configuration's dictionary. For example, if you wanted to add Mistral 7b, you would add it like so:
{
  ... other configurations ...
  "Mistral": {
    "models": [
      "mistral instruct",
      "mistral 7b" # <--- simply add it here!
    ],
    "prefix": "\n[INST] ",
    "stops": [
      "[/INST]",
      "[INST]",
      "</s>"
    ],
    "suffix": "[/INST]\n",
    "sysPrefix": "",
    "sysSuffix": "\n<s>"
  },
  ... other configurations ...
}
Similarly, you can add entirely new configurations by replicating the structure of existing ones and filling in the relevant prefixes, suffixes, and stops, which are all readily found on HuggingFace.
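To show what the prefix and suffix fields are for, here is a minimal Python sketch that wraps chat messages using the Mistral values from the example above. How Banana Phone actually combines these fields is not shown here, so the style_messages function and its ordering (system text wrapped in sysPrefix/sysSuffix, user text wrapped in prefix/suffix) are assumptions.

```python
# Values copied from the Mistral configuration shown above.
MISTRAL = {
    "prefix": "\n[INST] ",
    "suffix": "[/INST]\n",
    "sysPrefix": "",
    "sysSuffix": "\n<s>",
    "stops": ["[/INST]", "[INST]", "</s>"],
}


def style_messages(messages: list, config: dict) -> str:
    """Hypothetical autostyle step: wrap each message by its role."""
    parts = []
    for message in messages:
        if message["role"] == "system":
            parts.append(
                config["sysPrefix"] + message["content"] + config["sysSuffix"]
            )
        elif message["role"] == "user":
            parts.append(config["prefix"] + message["content"] + config["suffix"])
        else:
            # Assistant turns are passed through unwrapped in this sketch.
            parts.append(message["content"])
    return "".join(parts)


prompt = style_messages(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    MISTRAL,
)
print(repr(prompt))
```

The stops list would then be sent along with the prompt so that generation halts when the model starts to emit the next turn marker.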
- Ensure the model IDs in the models array match those returned by the API's model endpoint, or at least a sufficiently unique portion of them.
- You may encounter unexpected behavior if you use overly broad model shortnames such that the model in use matches more than one configuration.
- JSON formatting is notoriously persnickety. A missing comma, curly bracket, or even inadvertently using curly instead of straight quotation marks will likely break the whole script.
- Consider a tool like OK JSON if you find yourself editing this or other JSONs frequently.