From 40253309616580c172dc258d557a2ffcf576d31e Mon Sep 17 00:00:00 2001
From: Paul-Cornell
Date: Wed, 30 Oct 2024 14:27:25 -0700
Subject: [PATCH] Added embedding dimensions by model (#307)

---
 platform/embedding.mdx                        | 13 ++++++++---
 platform/workflows.mdx                        | 22 +++++++++++++++----
 .../embedding-configuration.mdx               | 12 +++++-----
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/platform/embedding.mdx b/platform/embedding.mdx
index 9485f2ff..93f91582 100644
--- a/platform/embedding.mdx
+++ b/platform/embedding.mdx
@@ -59,9 +59,16 @@ on Hugging Face:
 
 ## Generate embeddings
 
-To generate embeddings, choose one of the following embedding providers in the **Providers** section of an **Embedder** node in a workflow:
+To generate embeddings, choose one of the following embedding providers and models in the **Providers** section of an **Embedder** node in a workflow:
 
 You can change a workflow's predefined provider only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.
 
-- **OpenAI**: Use [OpenAI](https://openai.com) to generate embeddings.
-- **Vertex AI**: Use [Vertex AI](https://cloud.google.com/vertex-ai) to generate embeddings.
\ No newline at end of file
+- **OpenAI**: Use [OpenAI](https://openai.com) to generate embeddings. Also, choose the model to use:
+
+  - **text-embedding-3-small**, with 1536 dimensions.
+  - **text-embedding-3-large**, with 3072 dimensions.
+  - **Ada 002 (Text)**, with 1536 dimensions.
+
+  [Learn more](https://platform.openai.com/docs/guides/embeddings).
+
+- **Vertex AI**: Use [Vertex AI](https://cloud.google.com/vertex-ai) to generate embeddings by using the [textembedding-gecko@001](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) model, with 768 dimensions.
\ No newline at end of file
diff --git a/platform/workflows.mdx b/platform/workflows.mdx
index 4dad9e67..e52feebf 100644
--- a/platform/workflows.mdx
+++ b/platform/workflows.mdx
@@ -171,8 +171,15 @@ There are two ways to create a custom workflow:
 16. In the **Embed** area, for **Provider**, choose one of the following:
 
     - **None**: Do not generate embeddings.
-    - **OpenAI**: Use OpenAI to generate embeddings.
-    - **Vertex AI**: Use Vertex AI to generate embeddings.
+    - **OpenAI**: Use OpenAI to generate embeddings. Also, choose the model to use:
+
+      - **text-embedding-3-small**, with 1536 dimensions.
+      - **text-embedding-3-large**, with 3072 dimensions.
+      - **Ada 002 (Text)**, with 1536 dimensions.
+
+      [Learn more](https://platform.openai.com/docs/guides/embeddings).
+
+    - **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
 
     Learn more:
 
@@ -299,8 +306,15 @@ There are two ways to create a custom workflow:
 
     For **Providers**, select one of the following:
 
-    - **OpenAI**: Use OpenAI to generate embeddings.
-    - **Vertex AI**: Use Vertex AI to generate embeddings.
+    - **OpenAI**: Use OpenAI to generate embeddings. Also, choose the model to use:
+
+      - **text-embedding-3-small**, with 1536 dimensions.
+      - **text-embedding-3-large**, with 3072 dimensions.
+      - **Ada 002 (Text)**, with 1536 dimensions.
+
+      [Learn more](https://platform.openai.com/docs/guides/embeddings).
+
+    - **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
 
     Learn more:
 
diff --git a/snippets/ingest-configuration-shared/embedding-configuration.mdx b/snippets/ingest-configuration-shared/embedding-configuration.mdx
index d16a8ab8..228b2a75 100644
--- a/snippets/ingest-configuration-shared/embedding-configuration.mdx
+++ b/snippets/ingest-configuration-shared/embedding-configuration.mdx
@@ -31,16 +31,16 @@ A common embedding configuration is a critical component that allows for dynamic
 
 * `aws-bedrock`: None
 
-* `huggingface`: `sentence-transformers/all-MiniLM-L6-v2`
+* `huggingface`: `sentence-transformers/all-MiniLM-L6-v2`, with 384 dimensions
 
-* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
+* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`, with 1024 dimensions
 
-* `octoai`: `thenlper/gte-large`
+* `octoai`: `thenlper/gte-large`, with 1024 dimensions
 
-* `openai`: `text-embedding-ada-002`
+* `openai`: `text-embedding-ada-002`, with 1536 dimensions
 
-* `togetherai`: `togethercomputer/m2-bert-80M-8k-retrieval`
+* `togetherai`: `togethercomputer/m2-bert-80M-8k-retrieval`, with 768 dimensions
 
-* `vertexai`: `textembedding-gecko@001`
+* `vertexai`: `textembedding-gecko@001`, with 768 dimensions
 
 * `voyageai`: None
\ No newline at end of file
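
The dimensions this patch documents matter mostly when a destination vector index is created with a fixed size. As a rough illustration only, the sketch below copies the model-to-dimension pairs from the lists above into a lookup table and checks a vector's length against it. The names `EMBEDDING_DIMENSIONS`, `expected_dimensions`, and `check_vector` are hypothetical and not part of any Unstructured API; the `text-embedding-ada-002` key is assumed to correspond to the **Ada 002 (Text)** option.

```python
# Model-to-dimension pairs copied from the documentation lists in this patch.
# Assumption: "text-embedding-ada-002" is the model behind "Ada 002 (Text)".
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
    "textembedding-gecko@001": 768,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "mixedbread-ai/mxbai-embed-large-v1": 1024,
    "thenlper/gte-large": 1024,
    "togethercomputer/m2-bert-80M-8k-retrieval": 768,
}


def expected_dimensions(model: str) -> int:
    """Hypothetical helper: return the documented vector size for a model."""
    try:
        return EMBEDDING_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"no dimensions listed for model {model!r}") from None


def check_vector(model: str, vector: list) -> None:
    """Hypothetical helper: raise ValueError if the vector length is wrong."""
    expected = expected_dimensions(model)
    if len(vector) != expected:
        raise ValueError(
            f"{model} should produce {expected} dimensions, got {len(vector)}"
        )


if __name__ == "__main__":
    # A correctly sized placeholder vector passes silently.
    check_vector("thenlper/gte-large", [0.0] * 1024)
    print(expected_dimensions("text-embedding-3-large"))
```

A check like this could run before writing to an index whose size was fixed at creation time, so a provider or model switch fails early instead of at upload.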