feat: add SuperComponent #174

Closed
wants to merge 35 commits into from
Changes from 31 commits
Commits
fbf2008
WIP add PipelineWrapper
mathislucka Jan 16, 2025
6463fc7
initial PipelineWrapper implementation
mathislucka Jan 16, 2025
3942c3b
add example supercomponent
mathislucka Jan 16, 2025
73ab49f
simplify
mathislucka Jan 16, 2025
de3d694
fix
mathislucka Jan 16, 2025
05b3635
WIP extend MultiFileConverter
mathislucka Jan 17, 2025
5f66820
extend converter to handle pre-processing
mathislucka Jan 17, 2025
ef6709c
fix splitter name
mathislucka Jan 17, 2025
b1456ff
add to_pipeline_wrapper_dict()
tstadel Jan 22, 2025
f24cf71
refactor: create SuperComponent abstractions
mathislucka Jan 24, 2025
7a726ec
refactor: make AutoFileConverter more explicit and adapt to SuperComp…
mathislucka Jan 24, 2025
28225d4
chore: remove pipeline wrapper
mathislucka Jan 24, 2025
f269119
fix: add missing utils and test_utils for SuperComponent
mathislucka Jan 24, 2025
17b7d57
format and lint
mathislucka Jan 24, 2025
cdc728b
bug: pipeline.run does not wait for lazy variadic inputs
mathislucka Jan 24, 2025
72528c5
chore: update example notebook
mathislucka Jan 24, 2025
30b9e20
Merge branch 'main' into feat/supercomponents
mathislucka Jan 24, 2025
bfe4393
fix: test
mathislucka Jan 24, 2025
b02d5f4
fix: license header
mathislucka Jan 24, 2025
74a6234
fix: license header
mathislucka Jan 24, 2025
a523354
fix: all missing license headers
mathislucka Jan 24, 2025
5026bf2
feat: implement `DocumentIndexer` super component
abrahamy Jan 27, 2025
36946a2
chore: add license headers
abrahamy Jan 27, 2025
9753107
test: add unit tests for `DocumentIndexer`
abrahamy Jan 27, 2025
965970d
Merge branch 'main' into feat/supercomponents
mathislucka Jan 28, 2025
c528b0d
chore: typing and lint
mathislucka Jan 28, 2025
0017a4d
chore: StrEnum introduce in Python 3.11
mathislucka Jan 30, 2025
fc6a3b1
fix: device can't be mps in CI runners without Apple Silicon
mathislucka Jan 30, 2025
1bb6fc7
fix: force cpu for consistencies between different CI runners
mathislucka Jan 30, 2025
1220304
fix: force cpu for consistency between different CI runners
mathislucka Jan 30, 2025
585a573
update indexer to take model names instead of component
abrahamy Jan 30, 2025
554a489
more tests
abrahamy Jan 30, 2025
b5dd209
fix
abrahamy Jan 30, 2025
ad2b049
remove unused imports
abrahamy Jan 30, 2025
ee16ac7
Merge branch 'main' into feat/supercomponents
abrahamy Jan 31, 2025
Binary file added examples/example_files/react_paper.pdf
Binary file not shown.
65 changes: 65 additions & 0 deletions examples/example_files/sample.md
@@ -0,0 +1,65 @@
---
type: intro
date: 1.1.2023
---
```bash
pip install farm-haystack
```
## What to build with Haystack

- **Ask questions in natural language** and find granular answers in your own documents.
- Perform **semantic search** and retrieve documents according to meaning not keywords
- Use **off-the-shelf models** or **fine-tune** them to your own domain.
- Use **user feedback** to evaluate, benchmark and continuously improve your live models.
- Leverage existing **knowledge bases** and better handle the long tail of queries that **chatbots** receive.
- **Automate processes** by automatically applying a list of questions to new documents and using the extracted answers.

![Logo](https://raw.githubusercontent.com/deepset-ai/haystack/main/docs/img/logo.png)


## Core Features

- **Latest models**: Utilize all latest transformer based models (e.g. BERT, RoBERTa, MiniLM) for extractive QA, generative QA and document retrieval.
- **Modular**: Multiple choices to fit your tech stack and use case. Pick your favorite database, file converter or modeling framework.
- **Open**: 100% compatible with HuggingFace's model hub. Tight interfaces to other frameworks (e.g. Transformers, FARM, sentence-transformers)
- **Scalable**: Scale to millions of docs via retrievers, production-ready backends like Elasticsearch / FAISS and a fastAPI REST API
- **End-to-End**: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling ...
- **Developer friendly**: Easy to debug, extend and modify.
- **Customizable**: Fine-tune models to your own domain or implement your custom DocumentStore.
- **Continuous Learning**: Collect new training data via user feedback in production & improve your models continuously

| | |
|-|-|
| :ledger: [Docs](https://haystack.deepset.ai/overview/intro) | Usage, Guides, API documentation ...|
| :beginner: [Quick Demo](https://github.com/deepset-ai/haystack/#quick-demo) | Quickly see what Haystack offers |
| :floppy_disk: [Installation](https://github.com/deepset-ai/haystack/#installation) | How to install Haystack |
| :art: [Key Components](https://github.com/deepset-ai/haystack/#key-components) | Overview of core concepts |
| :mortar_board: [Tutorials](https://github.com/deepset-ai/haystack/#tutorials) | Jupyter/Colab Notebooks & Scripts |
| :eyes: [How to use Haystack](https://github.com/deepset-ai/haystack/#how-to-use-haystack) | Basic explanation of concepts, options and usage |
| :heart: [Contributing](https://github.com/deepset-ai/haystack/#heart-contributing) | We welcome all contributions! |
| :bar_chart: [Benchmarks](https://haystack.deepset.ai/benchmarks/v0.9.0) | Speed & Accuracy of Retriever, Readers and DocumentStores |
| :telescope: [Roadmap](https://haystack.deepset.ai/overview/roadmap) | Public roadmap of Haystack |
| :pray: [Slack](https://haystack.deepset.ai/community/join) | Join our community on Slack |
| :bird: [Twitter](https://twitter.com/deepset_ai) | Follow us on Twitter for news and updates |
| :newspaper: [Blog](https://medium.com/deepset-ai) | Read our articles on Medium |


## Quick Demo

The quickest way to see what Haystack offers is to start a [Docker Compose](https://docs.docker.com/compose/) demo application:

**1. Update/install Docker and Docker Compose, then launch Docker**

```
# apt-get update && apt-get install docker && apt-get install docker-compose
# service docker start
```

**2. Clone Haystack repository**

```
# git clone https://github.com/deepset-ai/haystack.git
```

### 2nd level headline for testing purposes
#### 3rd level headline for testing purposes
4 changes: 4 additions & 0 deletions examples/example_files/sample_1.csv
@@ -0,0 +1,4 @@
Name,Age
John Doe,27
Jane Smith,37
Mike Johnson,47
Binary file added examples/example_files/sample_docx.docx
Binary file not shown.
Binary file added examples/example_files/sample_pptx.pptx
Binary file not shown.