feat(community): Extend DocxLoader to load .doc files #7421

Open
wants to merge 5 commits into
base: main

@Fibii Fibii commented Dec 23, 2024

Extends the existing DocxLoader to also load .doc files.
Uses word-extractor as a peer dependency for loading .doc files.
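
For readers skimming the description, here is a rough sketch of how one loader might route .doc and .docx files to different parsers. It is not the PR's actual code: `WordParsers`, `detectWordFileType`, and `extractWordText` are hypothetical names, and the parsers are injected as plain functions so the sketch does not assume any particular library API.

```typescript
// Hypothetical sketch: these names are illustrative and not taken from the PR.
type WordFileType = "docx" | "doc";

// A parser receives the raw file bytes and resolves to plain text.
type TextParser = (data: Uint8Array) => Promise<string>;

interface WordParsers {
  docx: TextParser; // assumption: could be backed by mammoth
  doc: TextParser; // assumption: could be backed by word-extractor
}

// Decide which parser to use from the file extension (case-insensitive).
function detectWordFileType(filePath: string): WordFileType {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".docx")) return "docx";
  if (lower.endsWith(".doc")) return "doc";
  throw new Error(`Unsupported file extension: ${filePath}`);
}

// Route the bytes to the parser that matches the detected type.
async function extractWordText(
  filePath: string,
  data: Uint8Array,
  parsers: WordParsers
): Promise<string> {
  const type = detectWordFileType(filePath);
  return parsers[type](data);
}
```

Injecting the parsers keeps the routing logic testable without either library installed. A real loader might detect the type from a constructor option or from the file contents instead of the extension.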

@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label Dec 23, 2024

vercel bot commented Dec 23, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langchainjs-docs | ✅ Ready (Inspect) | Visit Preview | | Dec 26, 2024 3:30am |

1 Skipped Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langchainjs-api-refs | ⬜️ Ignored (Inspect) | | | Dec 26, 2024 3:30am |

@dosubot dosubot bot added the auto:improvement (Medium size change to existing code to handle new use-cases) label Dec 23, 2024
Collaborator

@jacoblee93 jacoblee93 left a comment

Thank you! Some small nits and questions.

Can you please also run `yarn format` from the root? The .mdx files get formatted as well.

} catch (e) {
  console.error(e);
  throw new Error(
    "Failed to load word-extractor. Please install it with eg. `npm install word-extractor`."
Collaborator

There's no way to just use mammoth, is there?

Author

No, mammoth can only handle .docx files.
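
The try/catch under review in this thread follows a common pattern for optional peer dependencies: attempt the import lazily, and turn a failure into an actionable install hint. A minimal sketch of that pattern follows. `loadOptionalDependency` is a hypothetical helper, not a function from the PR or from LangChain, and the importer is passed in so the sketch runs without the package installed.

```typescript
// Hypothetical helper: illustrates the lazy-import pattern only.
// `importer` is expected to perform the dynamic import of the package.
async function loadOptionalDependency<T>(
  packageName: string,
  importer: () => Promise<T>
): Promise<T> {
  try {
    return await importer();
  } catch (e) {
    // Log the underlying failure so the root cause is not lost.
    console.error(e);
    throw new Error(
      `Failed to load ${packageName}. Please install it with e.g. \`npm install ${packageName}\`.`
    );
  }
}
```

Assuming the package is declared as a peer dependency, a caller could pass `() => import("word-extractor")` as the importer. Users who only load .docx files then never need to install it.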

@jacoblee93 jacoblee93 added the question (Further information is requested) and close (PRs that need one or two touch-ups to be ready) labels Dec 24, 2024
2 participants