feat(document-search): add document processing with unstructured #41

GitHub Actions / JUnit Test Report failed Sep 19, 2024 in 0s

59 tests run, 54 passed, 4 skipped, 1 failed.

Annotations

Check failure on line 15 in packages/ragbits-document-search/tests/unit/test_document_search.py

github-actions / JUnit Test Report

test_document_search

ValueError: UNSTRUCTURED_API_KEY environment variable is not set
Raw output
async def test_document_search():
        embeddings_mock = AsyncMock()
        embeddings_mock.embed_text.return_value = [[0.1, 0.1]]
    
        document_search = DocumentSearch(embedder=embeddings_mock, vector_store=InMemoryVectorStore())
    
>       await document_search.ingest_document(
            DocumentMeta.create_text_document_from_literal("Name of Peppa's brother is George")
        )

packages/ragbits-document-search/tests/unit/test_document_search.py:15: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
packages/ragbits-document-search/src/ragbits/document_search/_main.py:74: in ingest_document
    elements = await document_processor.process(document)
packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py:83: in process
    return await provider.process(document_meta)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ragbits.document_search.ingestion.providers.unstructured.UnstructuredProvider object at 0x7f2e5b2161d0>
document_meta = DocumentMeta(document_type=<DocumentType.TXT: 'txt'>, source=LocalFileSource(source_type='local_file', path=PosixPath('/tmp/tmpheq9t2rl')))

    async def process(self, document_meta: DocumentMeta) -> list[Element]:
        """Process the document using the Unstructured API.
    
        Args:
            document_meta: The document to process.
    
        Returns:
            The list of elements extracted from the document.
    
        Raises:
            ValueError: If the UNSTRUCTURED_API_KEY or UNSTRUCTURED_API_URL environment variables are not set.
            DocumentTypeNotSupportedError: If the document type is not supported.
    
        """
        self.validate_document_type(document_meta.document_type)
        if (api_key := os.getenv(UNSTRUCTURED_API_KEY_ENV)) is None:
>           raise ValueError(f"{UNSTRUCTURED_API_KEY_ENV} environment variable is not set")
E           ValueError: UNSTRUCTURED_API_KEY environment variable is not set

packages/ragbits-document-search/src/ragbits/document_search/ingestion/providers/unstructured.py:75: ValueError
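
The traceback shows the test failing because `UNSTRUCTURED_API_KEY` is not set in the CI environment. One possible way to keep a unit test independent of the runner's environment is to set a placeholder value for the duration of the test. The sketch below uses only the standard library; `read_api_key` and `read_api_key_with_dummy_env` are hypothetical stand-ins for the check in `unstructured.py`, not the actual ragbits code, and `"dummy-key"` is an arbitrary placeholder.

```python
import os
from unittest.mock import patch

# Variable name taken from the error message in the report above.
UNSTRUCTURED_API_KEY_ENV = "UNSTRUCTURED_API_KEY"


def read_api_key() -> str:
    """Hypothetical stand-in for the environment check shown in the traceback."""
    if (api_key := os.getenv(UNSTRUCTURED_API_KEY_ENV)) is None:
        raise ValueError(f"{UNSTRUCTURED_API_KEY_ENV} environment variable is not set")
    return api_key


def read_api_key_with_dummy_env() -> str:
    """Run the check with a placeholder key; os.environ is restored on exit."""
    with patch.dict(os.environ, {UNSTRUCTURED_API_KEY_ENV: "dummy-key"}):
        return read_api_key()
```

A placeholder key would only get the test past this particular check. If the provider then calls the Unstructured API, the test would presumably also need the provider or its HTTP client mocked, or be skipped when no real key is available.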