
[octoai] add chat models
pkelaita committed Jul 25, 2024
1 parent d530a70 commit df27c53
Showing 7 changed files with 194 additions and 41 deletions.
46 changes: 25 additions & 21 deletions README.md
@@ -1,12 +1,12 @@
# L2M2: A Simple Python LLM Manager 💬👍

[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1721868974)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1721868974)](https://badge.fury.io/py/l2m2)
[![Tests](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml/badge.svg?timestamp=1721884488)](https://github.com/pkelaita/l2m2/actions/workflows/tests.yml) [![codecov](https://codecov.io/github/pkelaita/l2m2/graph/badge.svg?token=UWIB0L9PR8)](https://codecov.io/github/pkelaita/l2m2) [![PyPI version](https://badge.fury.io/py/l2m2.svg?timestamp=1721884488)](https://badge.fury.io/py/l2m2)

**L2M2** ("LLM Manager" → "LLMM" → "L2M2") is a tiny, very simple LLM manager for Python that exposes many models through a unified API. This is useful for evaluations, demos, and production applications that need to be easily model-agnostic.

### Features

- <!--start-count-->17<!--end-count--> supported models (see below) – regularly updated and with more on the way
- <!--start-count-->21<!--end-count--> supported models (see below) – regularly updated and with more on the way
- Session chat memory – even across multiple models
- JSON mode
- Prompt loading tools
@@ -23,25 +23,29 @@ L2M2 currently supports the following models:

<!--start-model-table-->

| Model Name | Provider(s) | Model Version(s) |
| ------------------- | -------------------------------------------------------------------- | ------------------------------------------------------------------- |
| `gpt-4o` | [OpenAI](https://openai.com/product) | `gpt-4o-2024-05-13` |
| `gpt-4o-mini` | [OpenAI](https://openai.com/product) | `gpt-4o-mini-2024-07-18` |
| `gpt-4-turbo` | [OpenAI](https://openai.com/product) | `gpt-4-turbo-2024-04-09` |
| `gpt-3.5-turbo` | [OpenAI](https://openai.com/product) | `gpt-3.5-turbo-0125` |
| `gemini-1.5-pro` | [Google](https://ai.google.dev/) | `gemini-1.5-pro` |
| `gemini-1.0-pro` | [Google](https://ai.google.dev/) | `gemini-1.0-pro` |
| `claude-3.5-sonnet` | [Anthropic](https://www.anthropic.com/api) | `claude-3-5-sonnet-20240620` |
| `claude-3-opus` | [Anthropic](https://www.anthropic.com/api) | `claude-3-opus-20240229` |
| `claude-3-sonnet` | [Anthropic](https://www.anthropic.com/api) | `claude-3-sonnet-20240229` |
| `claude-3-haiku` | [Anthropic](https://www.anthropic.com/api) | `claude-3-haiku-20240307` |
| `command-r` | [Cohere](https://docs.cohere.com/) | `command-r` |
| `command-r-plus` | [Cohere](https://docs.cohere.com/) | `command-r-plus` |
| `mixtral-8x7b` | [Groq](https://wow.groq.com/) | `mixtral-8x7b-32768` |
| `gemma-7b` | [Groq](https://wow.groq.com/) | `gemma-7b-it` |
| `llama3-8b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-8b-8192`, `meta/meta-llama-3-8b-instruct` |
| `llama3-70b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-70b-8192`, `meta/meta-llama-3-70b-instruct` |
| `llama3.1-405b` | [Replicate](https://replicate.com/), [OctoAI](https://octoai.cloud/) | `meta/meta-llama-3.1-405b-instruct`, `meta-llama-3.1-405b-instruct` |
| Model Name | Provider(s) | Model Version(s) |
| ------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| `gpt-4o` | [OpenAI](https://openai.com/product) | `gpt-4o-2024-05-13` |
| `gpt-4o-mini` | [OpenAI](https://openai.com/product) | `gpt-4o-mini-2024-07-18` |
| `gpt-4-turbo` | [OpenAI](https://openai.com/product) | `gpt-4-turbo-2024-04-09` |
| `gpt-3.5-turbo` | [OpenAI](https://openai.com/product) | `gpt-3.5-turbo-0125` |
| `gemini-1.5-pro` | [Google](https://ai.google.dev/) | `gemini-1.5-pro` |
| `gemini-1.0-pro` | [Google](https://ai.google.dev/) | `gemini-1.0-pro` |
| `claude-3.5-sonnet` | [Anthropic](https://www.anthropic.com/api) | `claude-3-5-sonnet-20240620` |
| `claude-3-opus` | [Anthropic](https://www.anthropic.com/api) | `claude-3-opus-20240229` |
| `claude-3-sonnet` | [Anthropic](https://www.anthropic.com/api) | `claude-3-sonnet-20240229` |
| `claude-3-haiku` | [Anthropic](https://www.anthropic.com/api) | `claude-3-haiku-20240307` |
| `command-r` | [Cohere](https://docs.cohere.com/) | `command-r` |
| `command-r-plus` | [Cohere](https://docs.cohere.com/) | `command-r-plus` |
| `mistral-7b` | [OctoAI](https://octoai.cloud/) | `mistral-7b-instruct` |
| `mixtral-8x7b` | [Groq](https://wow.groq.com/), [OctoAI](https://octoai.cloud/) | `mixtral-8x7b-32768`, `mixtral-8x7b-instruct` |
| `mixtral-8x22b` | [OctoAI](https://octoai.cloud/) | `mixtral-8x22b-instruct` |
| `gemma-7b` | [Groq](https://wow.groq.com/) | `gemma-7b-it` |
| `llama3-8b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/) | `llama3-8b-8192`, `meta/meta-llama-3-8b-instruct` |
| `llama3-70b` | [Groq](https://wow.groq.com/), [Replicate](https://replicate.com/), [OctoAI](https://octoai.cloud/) | `llama3-70b-8192`, `meta/meta-llama-3-70b-instruct`, `meta-llama-3-70b-instruct` |
| `llama3.1-8b` | [OctoAI](https://octoai.cloud/) | `meta-llama-3.1-8b-instruct` |
| `llama3.1-70b` | [OctoAI](https://octoai.cloud/) | `meta-llama-3.1-70b-instruct` |
| `llama3.1-405b` | [Replicate](https://replicate.com/), [OctoAI](https://octoai.cloud/) | `meta/meta-llama-3.1-405b-instruct`, `meta-llama-3.1-405b-instruct` |

<!--end-model-table-->
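To summarize the table's OctoAI changes, the sketch below collects the models that gain OctoAI support in this commit as a plain mapping from L2M2 model name to OctoAI model ID. The names are taken directly from the table above; the dict itself is illustrative and is not l2m2's internal representation.

```python
# Models with OctoAI support after this commit, mapped to the provider's
# model ID (names copied from the README table; structure is illustrative).
new_octoai_models = {
    "mistral-7b": "mistral-7b-instruct",
    "mixtral-8x7b": "mixtral-8x7b-instruct",
    "mixtral-8x22b": "mixtral-8x22b-instruct",
    "llama3-70b": "meta-llama-3-70b-instruct",
    "llama3.1-8b": "meta-llama-3.1-8b-instruct",
    "llama3.1-70b": "meta-llama-3.1-70b-instruct",
    "llama3.1-405b": "meta-llama-3.1-405b-instruct",
}
print(len(new_octoai_models))  # → 7
```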

2 changes: 1 addition & 1 deletion l2m2/_internal/http.py
@@ -39,10 +39,10 @@ async def _handle_replicate_201(
async def llm_post(
client: httpx.AsyncClient,
provider: str,
model_id: str,
api_key: str,
data: Dict[str, Any],
timeout: Optional[int],
model_id: Optional[str] = None,
) -> Any:
endpoint = PROVIDER_INFO[provider]["endpoint"]
if API_KEY in endpoint:
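The hunk above promotes `model_id` from a trailing optional keyword argument to a required positional parameter of `llm_post`, placed before `api_key`. A plausible motivation (this is a hedged sketch, not l2m2's actual code): some provider endpoints embed the model ID in the URL template, so the caller must always supply it. `PROVIDER_INFO`, the placeholder names, and the endpoint URLs below are illustrative stand-ins.

```python
# Illustrative stand-in for l2m2's provider table: some endpoints embed
# the model ID (and API key) directly in the URL, so model_id must always
# be provided. The URLs here are fake placeholders.
PROVIDER_INFO = {
    "google": {
        "endpoint": "https://example.invalid/models/{model_id}:generate?key={api_key}"
    },
    "openai": {"endpoint": "https://example.invalid/v1/chat/completions"},
}

def build_endpoint(provider: str, model_id: str, api_key: str) -> str:
    # Fill any placeholders present in the endpoint template; extra
    # keyword arguments are ignored for templates without placeholders.
    endpoint = PROVIDER_INFO[provider]["endpoint"]
    return endpoint.format(model_id=model_id, api_key=api_key)

print(build_endpoint("google", "gemini-1.5-pro", "sk-test"))
```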
20 changes: 16 additions & 4 deletions l2m2/client/base_llm_client.py
@@ -22,6 +22,7 @@
get_extra_message,
run_json_strats_out,
)
from l2m2.exceptions import LLMOperationError
from l2m2._internal.http import llm_post


@@ -501,6 +502,7 @@ async def _call_openai(
result = await llm_post(
client=self.httpx_client,
provider="openai",
model_id=model_id,
api_key=self.api_keys["openai"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
@@ -532,6 +534,7 @@ async def _call_anthropic(
result = await llm_post(
client=self.httpx_client,
provider="anthropic",
model_id=model_id,
api_key=self.api_keys["anthropic"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
@@ -564,6 +567,7 @@ async def _call_cohere(
result = await llm_post(
client=self.httpx_client,
provider="cohere",
model_id=model_id,
api_key=self.api_keys["cohere"],
data={"model": model_id, "message": prompt, **params},
timeout=timeout,
@@ -595,6 +599,7 @@ async def _call_groq(
result = await llm_post(
client=self.httpx_client,
provider="groq",
model_id=model_id,
api_key=self.api_keys["groq"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
@@ -633,10 +638,10 @@ async def _call_google(
result = await llm_post(
client=self.httpx_client,
provider="google",
model_id=model_id,
api_key=self.api_keys["google"],
data=data,
timeout=timeout,
model_id=model_id,
)
result = result["candidates"][0]

@@ -657,12 +662,12 @@ async def _call_replicate(
json_mode_strategy: JsonModeStrategy,
) -> str:
if isinstance(self.memory, ChatMemory):
raise ValueError(
raise LLMOperationError(
"Chat memory is not supported with Replicate."
+ " Try using Groq, or using ExternalMemory instead."
)
if json_mode_strategy.strategy_name == StrategyName.PREPEND:
raise ValueError(
raise LLMOperationError(
"JsonModeStrategy.prepend() is not supported with Replicate."
+ " Try using Groq, or using JsonModeStrategy.strip() instead."
)
@@ -673,10 +678,10 @@
result = await llm_post(
client=self.httpx_client,
provider="replicate",
model_id=model_id,
api_key=self.api_keys["replicate"],
data={"input": {"prompt": prompt, **params}},
timeout=timeout,
model_id=model_id,
)
return "".join(result["output"])

@@ -690,6 +695,12 @@ async def _call_octoai(
json_mode: bool,
json_mode_strategy: JsonModeStrategy,
) -> str:
if isinstance(self.memory, ChatMemory) and model_id == "mixtral-8x22b-instruct":
raise LLMOperationError(
"Chat memory is not supported with mixtral-8x22b via OctoAI. Try using"
+ " ExternalMemory instead, or ChatMemory with a different model/provider."
)

messages = []
if system_prompt is not None:
messages.append({"role": "system", "content": system_prompt})
@@ -705,6 +716,7 @@
result = await llm_post(
client=self.httpx_client,
provider="octoai",
model_id=model_id,
api_key=self.api_keys["octoai"],
data={"model": model_id, "messages": messages, **params},
timeout=timeout,
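The new `_call_octoai` guard above fails fast with a descriptive `LLMOperationError` when an unsupported feature/model combination is requested, instead of letting the provider call fail opaquely. The sketch below demonstrates the same guard-clause pattern in isolation: the exception class mirrors `l2m2.exceptions`, while `ChatMemory` and `check_octoai_support` are minimal stand-ins, not l2m2's actual implementations.

```python
# Guard-clause sketch mirroring the new check in _call_octoai: reject an
# unsupported feature/model pairing up front with a descriptive error.
class LLMOperationError(Exception):
    """Raised when a model does not support a particular feature or mode."""

class ChatMemory:
    """Minimal stand-in for l2m2's session chat memory."""

def check_octoai_support(memory: object, model_id: str) -> None:
    # mixtral-8x22b via OctoAI does not support chat memory in this commit.
    if isinstance(memory, ChatMemory) and model_id == "mixtral-8x22b-instruct":
        raise LLMOperationError(
            "Chat memory is not supported with mixtral-8x22b via OctoAI."
        )

try:
    check_octoai_support(ChatMemory(), "mixtral-8x22b-instruct")
except LLMOperationError as e:
    print(f"rejected: {e}")
```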
6 changes: 6 additions & 0 deletions l2m2/exceptions.py
@@ -8,3 +8,9 @@ class LLMRateLimitError(Exception):
"""Raised when a request to an LLM provider API is rate limited."""

pass


class LLMOperationError(Exception):
"""Raised when a model does not support a particular feature or mode."""

pass
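Separating `LLMOperationError` from `LLMRateLimitError` lets callers treat the two failures differently: rate limits are transient and retryable, while an unsupported feature/model combination is not. The sketch below illustrates that consumer-side distinction; the exception classes mirror `l2m2.exceptions`, and `call_model` is a hypothetical stand-in.

```python
# Sketch: retry on rate limits, but surface operation errors immediately,
# since retrying an unsupported feature/model combination cannot succeed.
class LLMRateLimitError(Exception):
    """Raised when a request to an LLM provider API is rate limited."""

class LLMOperationError(Exception):
    """Raised when a model does not support a particular feature or mode."""

def call_model(attempt: int) -> str:
    # Hypothetical stand-in: rate-limited twice, then succeeds.
    if attempt < 2:
        raise LLMRateLimitError("429 Too Many Requests")
    return "ok"

result = None
for attempt in range(3):
    try:
        result = call_model(attempt)
        break
    except LLMRateLimitError:
        continue  # transient: retry
    except LLMOperationError:
        raise  # unsupported feature: retrying won't help
print(result)  # → ok
```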
96 changes: 94 additions & 2 deletions l2m2/model_info.py
@@ -310,6 +310,22 @@ class ModelEntry(TypedDict):
"extras": {},
},
},
"mistral-7b": {
"octoai": {
"model_id": "mistral-7b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"mixtral-8x7b": {
"groq": {
"model_id": "mixtral-8x7b-32768",
@@ -325,6 +341,36 @@ class ModelEntry(TypedDict):
},
"extras": {},
},
"octoai": {
"model_id": "mixtral-8x7b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"mixtral-8x22b": {
"octoai": {
"model_id": "mixtral-8x22b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"gemma-7b": {
"groq": {
@@ -348,7 +394,7 @@ class ModelEntry(TypedDict):
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
@@ -379,7 +425,7 @@ class ModelEntry(TypedDict):
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
@@ -403,6 +449,52 @@ class ModelEntry(TypedDict):
},
"extras": {},
},
"octoai": {
"model_id": "meta-llama-3-70b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"llama3.1-8b": {
"octoai": {
"model_id": "meta-llama-3.1-8b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"llama3.1-70b": {
"octoai": {
"model_id": "meta-llama-3.1-70b-instruct",
"params": {
"temperature": {
"default": PROVIDER_DEFAULT,
"max": 2.0,
},
"max_tokens": {
"default": PROVIDER_DEFAULT,
"max": INF,
},
},
"extras": {},
},
},
"llama3.1-405b": {
"replicate": {
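Each new `model_info.py` entry declares per-provider parameter bounds (`temperature` up to 2.0, unbounded `max_tokens`), which can drive request validation before a call is made. The sketch below shows one way such an entry could be consumed; the entry copies the `llama3.1-8b`/`octoai` shape from the diff, while `PROVIDER_DEFAULT`, `INF`, and `validate_param` are illustrative stand-ins rather than l2m2's actual validation code.

```python
# Sentinel stand-ins for the constants referenced in model_info.py.
PROVIDER_DEFAULT = object()
INF = float("inf")

# Entry shape copied from the llama3.1-8b/octoai addition in this commit.
ENTRY = {
    "octoai": {
        "model_id": "meta-llama-3.1-8b-instruct",
        "params": {
            "temperature": {"default": PROVIDER_DEFAULT, "max": 2.0},
            "max_tokens": {"default": PROVIDER_DEFAULT, "max": INF},
        },
        "extras": {},
    },
}

def validate_param(provider: str, name: str, value: float) -> float:
    # Reject values above the per-provider declared maximum.
    spec = ENTRY[provider]["params"][name]
    if value > spec["max"]:
        raise ValueError(f"{name}={value} exceeds max {spec['max']}")
    return value

print(validate_param("octoai", "temperature", 1.5))  # → 1.5
```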