community[minor]: added Browserbase loader (#20478)
1 parent 9e69496, commit 6ccecf2
Showing 5 changed files with 203 additions and 0 deletions.
122 changes: 122 additions & 0 deletions
docs/docs/integrations/document_loaders/browserbase.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Browserbase\n", | ||
"\n", | ||
"[Browserbase](https://browserbase.com) is a serverless platform for running headless browsers. It offers advanced debugging, session recordings, stealth mode, integrated proxies, and captcha solving.\n", | ||
"\n", | ||
"## Installation\n", | ||
"\n", | ||
"- Get an API key from [browserbase.com](https://browserbase.com) and set it in environment variables (`BROWSERBASE_API_KEY`).\n", | ||
"- Install the [Browserbase SDK](http://github.com/browserbase/python-sdk):" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install browserbase" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Loading documents" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"You can load webpages into LangChain using `BrowserbaseLoader`. Optionally, set the `text_content` parameter to `True` to convert the pages to a text-only representation." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_community.document_loaders import BrowserbaseLoader" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"loader = BrowserbaseLoader(\n", | ||
"    urls=[\n", | ||
"        \"https://example.com\",\n", | ||
"    ],\n", | ||
"    # Set to True for a text-only representation of each page\n", | ||
"    text_content=False,\n", | ||
")\n", | ||
"\n", | ||
"docs = loader.load()\n", | ||
"print(docs[0].page_content[:61])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Loading images\n", | ||
"\n", | ||
"You can also load screenshots of webpages (as bytes) for multi-modal models.\n", | ||
"\n", | ||
"Full example using GPT-4V:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from browserbase import Browserbase\n", | ||
"from browserbase.helpers.gpt4 import GPT4VImage, GPT4VImageDetail\n", | ||
"from langchain_core.messages import HumanMessage\n", | ||
"from langchain_openai import ChatOpenAI\n", | ||
"\n", | ||
"chat = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=256)\n", | ||
"browser = Browserbase()\n", | ||
"\n", | ||
"screenshot = browser.screenshot(\"https://browserbase.com\")\n", | ||
"\n", | ||
"result = chat.invoke(\n", | ||
" [\n", | ||
" HumanMessage(\n", | ||
" content=[\n", | ||
" {\"type\": \"text\", \"text\": \"What color is the logo?\"},\n", | ||
" GPT4VImage(screenshot, GPT4VImageDetail.auto),\n", | ||
" ]\n", | ||
" )\n", | ||
" ]\n", | ||
")\n", | ||
"\n", | ||
"print(result.content)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"name": "python", | ||
"version": "3.9.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
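The notebook passes the screenshot bytes to `GPT4VImage`, whose implementation is not part of this diff. As a rough sketch only, assuming the helper produces something like the OpenAI-style `image_url` content part and that the screenshot is a PNG (both are assumptions, not confirmed by the commit), the conversion could look like the following. The function name `image_content_part` is hypothetical.

```python
import base64


def image_content_part(png_bytes: bytes, detail: str = "auto") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part.

    Assumption: the bytes are PNG data. The dict shape mirrors the
    image_url message part used with vision-capable chat models.
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{encoded}",
            "detail": detail,
        },
    }


# Placeholder bytes stand in for a real screenshot.
part = image_content_part(b"\x89PNG\r\n\x1a\n")
print(part["type"], part["image_url"]["detail"])
```

A dict like this can sit next to the `{"type": "text", ...}` entry in a `HumanMessage` content list, which is where the notebook places `GPT4VImage(...)`.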
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Browserbase | ||
|
||
>[Browserbase](https://browserbase.com) is a serverless platform for running headless browsers. It offers advanced debugging, session recordings, stealth mode, integrated proxies, and captcha solving. | ||
## Installation and Setup | ||
|
||
- Get an API key from [browserbase.com](https://browserbase.com) and set it in environment variables (`BROWSERBASE_API_KEY`). | ||
- Install the [Browserbase SDK](http://github.com/browserbase/python-sdk): | ||
|
||
```bash | ||
pip install browserbase | ||
``` | ||
|
||
## Document loader | ||
|
||
See a [usage example](/docs/integrations/document_loaders/browserbase). | ||
|
||
```python | ||
from langchain_community.document_loaders import BrowserbaseLoader | ||
``` | ||
|
||
## Multi-Modal | ||
|
||
See a [usage example](/docs/integrations/document_loaders/browserbase). | ||
|
||
```python | ||
from browserbase.helpers.gpt4 import GPT4VImage, GPT4VImageDetail | ||
``` |
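The setup notes above ask for `BROWSERBASE_API_KEY` to be set in the environment. A small pre-flight check can surface a missing key before any loader is constructed. This is a minimal sketch; `require_browserbase_key` is a hypothetical helper and not part of the SDK or of this commit.

```python
import os
from typing import Mapping, Optional


def require_browserbase_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Browserbase API key or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("BROWSERBASE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BROWSERBASE_API_KEY is not set; get a key from https://browserbase.com"
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(require_browserbase_key({"BROWSERBASE_API_KEY": "example-key"}))
```

The returned value could then be passed as `api_key=` to `BrowserbaseLoader`, which accepts that keyword according to the loader source in this commit.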
47 changes: 47 additions & 0 deletions
libs/community/langchain_community/document_loaders/browserbase.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
from typing import Iterator, List, Optional, Tuple, Union | ||
|
||
from langchain_core.documents import Document | ||
|
||
from langchain_community.document_loaders.base import BaseLoader | ||
|
||
|
||
class BrowserbaseLoader(BaseLoader): | ||
    """Load pre-rendered web pages using a headless browser hosted on Browserbase. | ||
    Depends on `browserbase` package. | ||
    Get your API key from https://browserbase.com | ||
    """ | ||
|
||
    def __init__( | ||
        self, | ||
        urls: Union[List[str], Tuple[str, ...]], | ||
        *, | ||
        api_key: Optional[str] = None, | ||
        text_content: bool = False, | ||
    ): | ||
        self.urls = urls | ||
        self.text_content = text_content | ||
|
||
        try: | ||
            from browserbase import Browserbase | ||
        except ImportError: | ||
            raise ImportError( | ||
                "You must run " | ||
                "`pip install --upgrade " | ||
                "browserbase` " | ||
                "to use the Browserbase loader." | ||
            ) | ||
|
||
        self.browserbase = Browserbase(api_key=api_key) | ||
|
||
    def lazy_load(self) -> Iterator[Document]: | ||
        """Load pages from URLs""" | ||
        pages = self.browserbase.load_urls(self.urls, self.text_content) | ||
|
||
        for i, page in enumerate(pages): | ||
            yield Document( | ||
                page_content=page, | ||
                metadata={ | ||
                    "url": self.urls[i], | ||
                }, | ||
            ) |
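To see what `lazy_load` yields without a Browserbase account, the pairing of pages with their URLs can be exercised against a stub. This is a sketch under stated assumptions: `FakeBrowserbase` and `Doc` are hypothetical stand-ins for the SDK client and for `langchain_core.documents.Document`, and the stub assumes `load_urls` returns pages in the same order as the input URLs, which the loader above also relies on. The sketch pairs with `zip` rather than the index lookup used in the commit.

```python
from dataclasses import dataclass, field
from typing import Iterator, Sequence


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeBrowserbase:
    """Hypothetical stub; the real client fetches pages over the network."""

    def load_urls(
        self, urls: Sequence[str], text_content: bool = False
    ) -> Iterator[str]:
        for url in urls:
            # Return plain text or fake HTML depending on the flag.
            yield url if text_content else f"<html>{url}</html>"


def lazy_load(
    client: FakeBrowserbase, urls: Sequence[str], text_content: bool = False
) -> Iterator[Doc]:
    """Yield one Doc per page, recording the source URL in metadata."""
    pages = client.load_urls(urls, text_content)
    for url, page in zip(urls, pages):
        yield Doc(page_content=page, metadata={"url": url})


docs = list(
    lazy_load(FakeBrowserbase(), ["https://example.com", "https://example.org"])
)
print([d.metadata["url"] for d in docs])
```

Because the function is a generator, no page is requested from the client until the caller starts iterating.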