Error Loading Word Documents - File is not a zip file #8212

Answered by dosubot bot
hodgesz asked this question in Q&A

Answer generated by a 🤖

Answer

From your description, you're hitting an error when loading Word documents with the UnstructuredWordDocumentLoader in LangChain. The error message suggests the problem is not in loading the Word document itself but in the NLTK library: it finds one of its data packages and then fails to open it as a zip file. This could be due to a corrupted NLTK data package.

To address this issue, I suggest manually downloading the NLTK packages. You can do this by running the following code:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

These packages ('punkt' and 'averaged_perceptron_tagger'…
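
If a corrupted archive is the cause, it may help to locate it before re-downloading. The following is a minimal sketch, not part of the original answer: `find_corrupt_zips` is a hypothetical helper name, and it assumes the broken package is a `.zip` file sitting in one of the directories listed in `nltk.data.path`. It only checks whether each file is recognizable as a zip archive, so it will not catch every kind of corruption.

```python
import os
import zipfile


def find_corrupt_zips(root):
    """Return sorted paths of .zip files under root that are not valid zip archives.

    Hypothetical helper for illustration; a missing root simply yields an empty list.
    """
    corrupt = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".zip"):
                path = os.path.join(dirpath, name)
                # is_zipfile returns False for files that lack a valid zip structure
                if not zipfile.is_zipfile(path):
                    corrupt.append(path)
    return sorted(corrupt)


if __name__ == "__main__":
    import nltk

    # Assumption: the damaged package lives in one of NLTK's data directories.
    for data_dir in nltk.data.path:
        for path in find_corrupt_zips(data_dir):
            print("corrupt archive:", path)
            os.remove(path)  # delete it so the next download starts clean

    # Re-download the two packages named in the answer.
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")
```

Running the script prints any archive it considers broken, deletes it, and then fetches the packages again. If you would rather inspect the files first, comment out the `os.remove` line and review the printed paths.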

Answer selected by hodgesz