feat(document-search): determine document type automatically #80

Closed
mhordynski opened this issue Oct 10, 2024 · 0 comments · Fixed by #99

mhordynski commented Oct 10, 2024

Feature description

DocumentSearch should be able to determine the document type automatically from the provided file.

This could be implemented by allowing an additional type, Source, for the document argument of the async def ingest_document(self, document: Union[DocumentMeta, Document], document_processor: Optional[BaseProvider] = None) -> None: method in DocumentSearch, which would then fetch the source and determine the document type.
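A minimal sketch of the detection step, assuming the type can be guessed from the file extension once the source has been fetched to a local path. Every name here (the `DocumentType` members, `_EXTENSION_TO_TYPE`, `guess_document_type`) is hypothetical and chosen for illustration; it is not the ragbits API and not necessarily how the linked pull request solves this.

```python
from enum import Enum
from pathlib import Path


class DocumentType(str, Enum):
    """Hypothetical subset of document types, for illustration only."""

    TXT = "txt"
    MD = "md"
    PDF = "pdf"
    UNKNOWN = "unknown"


# Hypothetical mapping from lowercase file extension to document type.
_EXTENSION_TO_TYPE = {
    ".txt": DocumentType.TXT,
    ".md": DocumentType.MD,
    ".pdf": DocumentType.PDF,
}


def guess_document_type(path: Path) -> DocumentType:
    """Guess the document type from the file extension.

    Falls back to DocumentType.UNKNOWN when the extension is missing
    or not present in the mapping.
    """
    return _EXTENSION_TO_TYPE.get(path.suffix.lower(), DocumentType.UNKNOWN)
```

With a helper along these lines, a caller could pass only a file location and let the ingestion step fill in the type, for example `guess_document_type(Path("report.PDF"))`. Extension-based detection is only one option; inspecting the file contents would be more robust for files that are misnamed or have no extension.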

Motivation

Right now, the main entry point to the document ingestion pipeline requires a DocumentMeta instance, which has a required document_type field. This forces users either to know the document type beforehand or to detect it outside of ragbits.

Additional context

No response

@mhordynski mhordynski added the feature New feature or request label Oct 10, 2024
@mhordynski mhordynski moved this to Backlog in ragbits Oct 10, 2024
@mhordynski mhordynski moved this from Backlog to Ready in ragbits Oct 10, 2024
@mhordynski mhordynski added this to the Ragbits 0.3 milestone Oct 10, 2024
@micpst micpst added the document search Changes to the document search package label Oct 11, 2024
@micpst micpst linked a pull request Oct 16, 2024 that will close this issue
@micpst micpst closed this as completed Oct 16, 2024
@github-project-automation github-project-automation bot moved this from In review to Done in ragbits Oct 16, 2024
@mhordynski mhordynski modified the milestones: Ragbits 0.3, Ragbits 0.2 Oct 23, 2024