- #29161

Closed
our contribution
norbit8 committed Jan 12, 2025
commit 865fd8b275d2d64446bff317d9b4fc1c933cd8fa
250 changes: 250 additions & 0 deletions docs/docs/integrations/retrievers/nimbleway.ipynb
@@ -0,0 +1,250 @@
{
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Nimble\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"# NimblewayRetriever\n",
"\n",
"This will help you get started with the Nimble [retriever](/docs/concepts/#retrievers). For detailed documentation of all NimblewayRetriever features and configurations, head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_nimble.retrievers.Nimble.NimbleRetriever.html).\n",
"\n",
"\n",
"## Setup\n",
"\n",
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-12T17:53:37.779960Z",
"start_time": "2025-01-12T17:53:37.775887Z"
}
},
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
],
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-community` package."
]
},
{
"cell_type": "code",
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-12T17:53:42.215483Z",
"start_time": "2025-01-12T17:53:37.907588Z"
}
},
"source": "%pip install -qU langchain-community",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"execution_count": 2
},
{
"metadata": {},
"cell_type": "markdown",
"source": "We also need to set our Nimble API key.",
"id": "e0b6f0a0eb215a80"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-12T17:55:32.914431Z",
"start_time": "2025-01-12T17:53:42.296223Z"
}
},
"cell_type": "code",
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"NIMBLE_API_KEY\"] = getpass.getpass()"
],
"id": "4c6dc24c441ec1f0",
"outputs": [],
"execution_count": 3
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our retriever. `num_results` sets how many search results are returned (the default is 3):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70cc8e65-2a02-408a-bbc6-8ef649057d82",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import NimblewayRetriever\n",
"\n",
"retriever = NimblewayRetriever(api_key=os.environ[\"NIMBLE_API_KEY\"], num_results=3)"
]
},
{
"cell_type": "markdown",
"id": "5c5f2839-4020-424e-9fc9-07777eede442",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
"metadata": {},
"outputs": [],
"source": [
"query = \"Nimbleway\"\n",
"\n",
"retriever.invoke(query)"
]
},
{
"cell_type": "markdown",
"id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"Like other retrievers, NimblewayRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n",
"\n",
"We will need an LLM or chat model:\n",
"\n",
"```{=mdx}\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",
"<ChatModelTabs customVarName=\"llm\" />\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25b647a3-f8f2-4541-a289-7a241e43f9df",
"metadata": {},
"outputs": [],
"source": [
"# | output: false\n",
"# | echo: false\n",
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
"    \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"\n",
"def format_docs(docs):\n",
"    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
"    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
"    | prompt\n",
"    | llm\n",
"    | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8",
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"...\")"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all NimblewayRetriever features and configurations, head to the [API reference](https://api.python.langchain.com/en/latest/retrievers/langchain_nimble.retrievers.Nimble.NimbleRetriever.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
6 changes: 5 additions & 1 deletion libs/community/langchain_community/retrievers/__init__.py
@@ -22,130 +22,132 @@
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from langchain_community.retrievers.arcee import (
        ArceeRetriever,
    )
    from langchain_community.retrievers.arxiv import (
        ArxivRetriever,
    )
    from langchain_community.retrievers.asknews import (
        AskNewsRetriever,
    )
    from langchain_community.retrievers.azure_ai_search import (
        AzureAISearchRetriever,
        AzureCognitiveSearchRetriever,
    )
    from langchain_community.retrievers.bedrock import (
        AmazonKnowledgeBasesRetriever,
    )
    from langchain_community.retrievers.bm25 import (
        BM25Retriever,
    )
    from langchain_community.retrievers.breebs import (
        BreebsRetriever,
    )
    from langchain_community.retrievers.chaindesk import (
        ChaindeskRetriever,
    )
    from langchain_community.retrievers.chatgpt_plugin_retriever import (
        ChatGPTPluginRetriever,
    )
    from langchain_community.retrievers.cohere_rag_retriever import (
        CohereRagRetriever,
    )
    from langchain_community.retrievers.docarray import (
        DocArrayRetriever,
    )
    from langchain_community.retrievers.dria_index import (
        DriaRetriever,
    )
    from langchain_community.retrievers.elastic_search_bm25 import (
        ElasticSearchBM25Retriever,
    )
    from langchain_community.retrievers.embedchain import (
        EmbedchainRetriever,
    )
    from langchain_community.retrievers.google_cloud_documentai_warehouse import (
        GoogleDocumentAIWarehouseRetriever,
    )
    from langchain_community.retrievers.google_vertex_ai_search import (
        GoogleCloudEnterpriseSearchRetriever,
        GoogleVertexAIMultiTurnSearchRetriever,
        GoogleVertexAISearchRetriever,
    )
    from langchain_community.retrievers.kay import (
        KayAiRetriever,
    )
    from langchain_community.retrievers.kendra import (
        AmazonKendraRetriever,
    )
    from langchain_community.retrievers.knn import (
        KNNRetriever,
    )
    from langchain_community.retrievers.llama_index import (
        LlamaIndexGraphRetriever,
        LlamaIndexRetriever,
    )
    from langchain_community.retrievers.metal import (
        MetalRetriever,
    )
    from langchain_community.retrievers.milvus import (
        MilvusRetriever,
    )
    from langchain_community.retrievers.nanopq import NanoPQRetriever
    from langchain_community.retrievers.needle import NeedleRetriever
    from langchain_community.retrievers.nimbleway import (
        NimblewayRetriever,
    )
    from langchain_community.retrievers.outline import (
        OutlineRetriever,
    )
    from langchain_community.retrievers.pinecone_hybrid_search import (
        PineconeHybridSearchRetriever,
    )
    from langchain_community.retrievers.pubmed import (
        PubMedRetriever,
    )
    from langchain_community.retrievers.qdrant_sparse_vector_retriever import (
        QdrantSparseVectorRetriever,
    )
    from langchain_community.retrievers.rememberizer import (
        RememberizerRetriever,
    )
    from langchain_community.retrievers.remote_retriever import (
        RemoteLangChainRetriever,
    )
    from langchain_community.retrievers.svm import (
        SVMRetriever,
    )
    from langchain_community.retrievers.tavily_search_api import (
        TavilySearchAPIRetriever,
    )
    from langchain_community.retrievers.tfidf import (
        TFIDFRetriever,
    )
    from langchain_community.retrievers.thirdai_neuraldb import NeuralDBRetriever
    from langchain_community.retrievers.vespa_retriever import (
        VespaRetriever,
    )
    from langchain_community.retrievers.weaviate_hybrid_search import (
        WeaviateHybridSearchRetriever,
    )
    from langchain_community.retrievers.web_research import WebResearchRetriever
    from langchain_community.retrievers.wikipedia import (
        WikipediaRetriever,
    )
    from langchain_community.retrievers.you import (
        YouRetriever,
    )
    from langchain_community.retrievers.zep import (
        ZepRetriever,
    )
    from langchain_community.retrievers.zep_cloud import (
        ZepCloudRetriever,
    )
    from langchain_community.retrievers.zilliz import (
        ZillizRetriever,
    )

_module_lookup = {
    "AmazonKendraRetriever": "langchain_community.retrievers.kendra",
    "AmazonKnowledgeBasesRetriever": "langchain_community.retrievers.bedrock",
@@ -193,6 +195,7 @@
    "ZepCloudRetriever": "langchain_community.retrievers.zep_cloud",
    "ZillizRetriever": "langchain_community.retrievers.zilliz",
    "NeuralDBRetriever": "langchain_community.retrievers.thirdai_neuraldb",
    "NimblewayRetriever": "langchain_community.retrievers.nimbleway",
}


@@ -250,4 +253,5 @@
    "ZepRetriever",
    "ZepCloudRetriever",
    "ZillizRetriever",
    "NimblewayRetriever",
]
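
The `_module_lookup` table above maps each exported name to the module that defines it, which suggests the names are resolved lazily at attribute-access time. The following is only a minimal sketch of that pattern, not the package's actual code: `lazy_getattr` is a hypothetical helper, and standard-library modules stand in for the retriever modules so the example runs on its own.

```python
import importlib
from typing import Any, Dict

# Hypothetical lookup table: exported name -> module that defines it.
# Standard-library modules stand in for the retriever modules here.
_module_lookup: Dict[str, str] = {
    "OrderedDict": "collections",
    "sqrt": "math",
}


def lazy_getattr(name: str) -> Any:
    """Import the owning module only when `name` is actually requested."""
    if name in _module_lookup:
        module = importlib.import_module(_module_lookup[name])
        return getattr(module, name)
    raise AttributeError(f"module has no attribute {name!r}")


# The owning module is imported on demand, then the attribute is returned.
print(lazy_getattr("sqrt")(9.0))
```

Under this assumption, a new retriever needs an entry in the lookup table and in `__all__`, while its import under `TYPE_CHECKING` only serves static type checkers.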
81 changes: 81 additions & 0 deletions libs/community/langchain_community/retrievers/nimbleway.py
@@ -0,0 +1,81 @@
import os
from enum import Enum
from typing import List

import requests
from langchain_core.callbacks.manager import CallbackManagerForRetrieverRun
from langchain_core.documents.base import Document
from langchain_core.retrievers import BaseRetriever


class SearchEngine(str, Enum):
    """
    Enum representing the search engines supported by Nimble
    """

    GOOGLE = "google_search"
    GOOGLE_SGE = "google_sge"
    BING = "bing_search"
    YANDEX = "yandex_search"


class ParsingType(str, Enum):
    """
    Enum representing the parsing types supported by Nimble
    """

    PLAIN_TEXT = "plain_text"
    MARKDOWN = "markdown"
    SIMPLIFIED_HTML = "simplified_html"


class NimblewayRetriever(BaseRetriever):
    """Nimbleway Search API retriever.

    Allows you to retrieve search results from Google, Bing, and Yandex.
    Visit https://www.nimbleway.com/ and sign up to receive an API key and to
    see more info.

    Args:
        api_key: The API key for Nimbleway.
        search_engine: The search engine to use. Default is Google.
    """

    api_key: str
    num_results: int = 3
    search_engine: SearchEngine = SearchEngine.GOOGLE
    parse: bool = False
    render: bool = True
    locale: str = "en"
    country: str = "US"
    parsing_type: ParsingType = ParsingType.PLAIN_TEXT

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        request_body = {
            "query": query,
            "num_results": self.num_results,
            "search_engine": self.search_engine.value,
            "parse": self.parse,
            "render": self.render,
            "locale": self.locale,
            "country": self.country,
            "parsing_type": self.parsing_type,
        }

        # Prefer the explicitly configured key; fall back to the environment.
        api_key = self.api_key or os.getenv("NIMBLE_API_KEY")
        response = requests.post(
            "https://searchit-server.crawlit.live/search",
            json=request_body,
            headers={
                "Authorization": f"Basic {api_key}",
                "Content-Type": "application/json",
            },
        )
        response.raise_for_status()
        raw_json_content = response.json()
        # Map each search result to a Document, defaulting any missing fields.
        docs = [
            Document(
                page_content=doc.get("page_content", ""),
                metadata={
                    "title": doc.get("metadata", {}).get("title", ""),
                    "snippet": doc.get("metadata", {}).get("snippet", ""),
                    "url": doc.get("metadata", {}).get("url", ""),
                    "position": doc.get("metadata", {}).get("position", -1),
                    "entity_type": doc.get("metadata", {}).get("entity_type", ""),
                },
            )
            for doc in raw_json_content.get("body", [])
        ]
        return docs
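
The mapping from the JSON response to documents can be exercised without a network call. The sketch below is illustrative only: `Doc` is a hypothetical stand-in for `Document`, `to_docs` is a hypothetical helper, and the sample payload shape is an assumption inferred solely from the keys the retriever reads (`body`, `page_content`, `metadata`), not from the Nimble API documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core's Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_docs(raw: Dict[str, Any]) -> List[Doc]:
    """Map an assumed response payload to documents, defaulting missing keys."""
    docs = []
    for item in raw.get("body", []):
        meta = item.get("metadata", {})
        docs.append(
            Doc(
                page_content=item.get("page_content", ""),
                metadata={
                    "title": meta.get("title", ""),
                    "snippet": meta.get("snippet", ""),
                    "url": meta.get("url", ""),
                    "position": meta.get("position", -1),
                    "entity_type": meta.get("entity_type", ""),
                },
            )
        )
    return docs


# Assumed payload: one complete-ish result and one empty result.
sample = {
    "body": [
        {
            "page_content": "hello",
            "metadata": {"title": "T", "url": "https://example.com"},
        },
        {},
    ]
}
docs = to_docs(sample)
```

Reading `metadata` once per item avoids repeating the nested lookup for every field, and an empty result still yields a document with default values rather than raising a `KeyError`.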