-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
langchain[minor]: Firecrawl Document Loader #5180
langchain[minor]: Firecrawl Document Loader #5180
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@@ -0,0 +1,13 @@ | |||
import { FireCrawlLoader } from "langchain/document_loaders/web/firecrawl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there! 👋 I've reviewed the code and noticed that the added lines explicitly access an environment variable using process.env
. I've flagged this for your review to ensure it aligns with our best practices for handling environment variables. Let me know if you need further assistance with this.
@@ -1314,6 +1315,7 @@ | |||
"@gomomento/sdk-web": "^1.51.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there! I noticed that this PR adds a new hard dependency with the addition of "@mendable/firecrawl-js". I've flagged this for your review as it's important to ensure the impact of this change on the project's dependencies. Keep up the great work!
@@ -0,0 +1,34 @@ | |||
/* eslint-disable no-process-env */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there! I noticed that the recent changes in the FireCrawlLoader tests are accessing environment variables directly. I've flagged this for your review to ensure proper handling of sensitive information. Let me know if you need any further assistance with this.
@@ -0,0 +1,96 @@ | |||
import FirecrawlApp from "@mendable/firecrawl-js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there! I noticed that the recent code changes explicitly access an environment variable using getEnvironmentVariable
. I've flagged this for your review to ensure it aligns with the intended functionality. Keep up the great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple nits, besides that it looks pretty good!
|
||
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx"; | ||
|
||
<IntegrationInstallTooltip></IntegrationInstallTooltip> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an IntegrationInstallTooltip
for non LC deps. You only this for langchain package installs
|
||
## Setup | ||
|
||
You'll need to sign up and retrieve your [FireCrawl API key](https://firecrawl.dev). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to sign up and retrieve your [FireCrawl API key](https://firecrawl.dev). | |
You'll need to sign up and retrieve your [FireCrawl API key](https://firecrawl.dev). |
I would add some mention of the 100 free credits. I could easily see a user reading up until the API key requirement part and getting discouraged before they're even able to try it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%
apiKey?: string; | ||
mode?: "crawl" | "scrape"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add JSDoc here documenting what the default values are for both of these
* @returns An array of Documents representing the retrieved data. | ||
* @throws An error if the data could not be loaded. | ||
*/ | ||
public async load(): Promise<Document[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer DocumentInterface
public async load(): Promise<Document[]> { | |
public async load(): Promise<DocumentInterface[]> { |
Co-authored-by: Brace Sproul <[email protected]>
Co-authored-by: Brace Sproul <[email protected]>
@bracesproul Thank you. Just fixed all the suggestions. Let me know if you spot any other ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for adding this!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the amount of changed lines in the yarn.lock
is concerning. I'm going to push up a PR in a second resolving
Resolved
…nsc/firecrawl-loader
Thank you! Note that we'll move this to |
* Nick: init * Update firecrawl.ts * Nick: * Nick: * Update package.json * Nick: fixes docs * Update yarn.lock * Update examples/src/document_loaders/firecrawl.ts Co-authored-by: Brace Sproul <[email protected]> * Update langchain/src/document_loaders/web/firecrawl.ts Co-authored-by: Brace Sproul <[email protected]> * Nick: fixes * Update yarn.lock * Fix yarn.lock * lint & format * Update firecrawl.ts * Add entrypoint --------- Co-authored-by: Brace Sproul <[email protected]> Co-authored-by: Jacob Lee <[email protected]>
Added the FireCrawl document loader. Firecrawl crawls and convert any website into LLM-ready data. It crawls all accessible subpages and give you clean markdown for each.
Similar to langchain-ai/langchain#20364 but JS/TS :)
Thank you!