Pinecone Document Store - minimal implementation #81
Merged
Changes from 31 commits
515e338 Add PineconeDocumentStore (vrunm)
0c2204c Merge branch 'main' into add_pinecone (anakin87)
76ef43b adapt to Document refactoring (anakin87)
b449167 start improving existing tests (anakin87)
7b26b2e try to setup a testing workflow (anakin87)
7d0cdd1 fix some format errors (anakin87)
bbf5d6c Merge remote-tracking branch 'origin/main' into add_pinecone (anakin87)
31c1e43 adapt to new strucure (anakin87)
292c426 Merge branch 'main' into add_pinecone (anakin87)
9c39509 adapt pyproject; rm about (anakin87)
abb080d Merge branch 'add_pinecone' of https://github.com/deepset-ai/haystack… (anakin87)
fe2168b fix workflow (anakin87)
2d9d215 add hatch-vcs (anakin87)
7ee262c simplification - first draft (anakin87)
f5c5028 simplified tests (anakin87)
89ca25c make workflow read the api key (anakin87)
542ec80 rm score when filtering docs (anakin87)
abf985a increase wait time (anakin87)
f17b6ec improve api key reading; more tests (anakin87)
c63eac2 improvements from PR review (anakin87)
f7d048d test simplification (anakin87)
c5e9174 test simplification 2 (anakin87)
42da9ab fix (anakin87)
a12a31c std ds tests want valueerror (anakin87)
72570ed put tests together (anakin87)
2e690e4 format (anakin87)
9437c02 add fallback for namespace in _embedding_retrieval (anakin87)
fdfd3e7 try to parallelize tests (anakin87)
8e6f0e6 better try (anakin87)
c759d10 labeler (anakin87)
017cd75 format fix (anakin87)
f42c540 Apply suggestions from code review (anakin87)
d918414 Revert "Apply suggestions from code review" (anakin87)
4d90b8c improve document conversion (anakin87)
3f07182 rm deepcopy (anakin87)
7668a4a missing return (anakin87)
c4f5079 fix fmt (anakin87)
54646e8 copy metadata (anakin87)
9aa1ae8 fmt (anakin87)
2ff5adf mv comment (anakin87)
ffe3c73 improve tests (anakin87)
091b82a readmes (anakin87)

@@ -0,0 +1,51 @@
# This workflow comes from https://github.com/ofek/hatch-mypyc
# https://github.com/ofek/hatch-mypyc/blob/5a198c0ba8660494d02716cfc9d79ce4adfb1442/.github/workflows/test.yml
name: Test / pinecone

on:
  schedule:
    - cron: "0 0 * * *"
  pull_request:
    paths:
      - "integrations/pinecone/**"
      - ".github/workflows/pinecone.yml"

concurrency:
  group: pinecone-${{ github.head_ref }}
  cancel-in-progress: true

env:
  PYTHONUNBUFFERED: "1"
  FORCE_COLOR: "1"
  PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}

jobs:
  run:
    name: Python ${{ matrix.python-version }} on ${{ startsWith(matrix.os, 'macos-') && 'macOS' || startsWith(matrix.os, 'windows-') && 'Windows' || 'Linux' }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # Pinecone tests are time-consuming, so the matrix is limited to Python 3.9 and 3.10
        os: [ubuntu-latest]
        python-version: ["3.9", "3.10"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Hatch
        run: pip install --upgrade hatch

      - name: Lint
        working-directory: integrations/pinecone
        if: matrix.python-version == '3.9'
        run: hatch run lint:all

      - name: Run tests
        working-directory: integrations/pinecone
        run: hatch run cov
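
The workflow exposes the `PINECONE_API_KEY` secret to the test run as an environment variable. As a minimal sketch of how test code might consume it, the hypothetical helper below treats a missing or empty variable as an error. The name `read_api_key` and its signature are assumptions for illustration and are not taken from this PR. The empty-string case matters because GitHub Actions does not pass repository secrets to workflows triggered by pull requests from forks, so the variable can be present but empty.

```python
import os
from typing import Mapping, Optional


def read_api_key(env: Optional[Mapping[str, str]] = None, var: str = "PINECONE_API_KEY") -> str:
    """Return the API key stored in `var`, or raise ValueError if it is missing or empty.

    Hypothetical helper for illustration; not the implementation used in this PR.
    """
    # Fall back to the process environment when no mapping is passed in.
    source = os.environ if env is None else env
    api_key = source.get(var, "")
    if not api_key:
        msg = f"{var} is not set; export it locally or configure it as a repository secret."
        raise ValueError(msg)
    return api_key
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in unit tests without touching the real environment.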

@@ -0,0 +1,18 @@
[![test](https://github.com/deepset-ai/document-store/actions/workflows/test.yml/badge.svg)](https://github.com/deepset-ai/document-store/actions/workflows/test.yml)

# Pinecone Document Store

This GitHub repository is a template that can be used to create custom document stores to extend
the new [Haystack](https://github.com/deepset-ai/haystack/) API available under the `preview`
package starting from version 1.15.

While the new API is still under active development, the new "Store" architecture is quite stable
and we are encouraging early adopters to contribute their custom document stores.

## Installation

## Examples

## License

`pinecone-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.

@@ -0,0 +1,186 @@
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[project]
name = "pinecone_haystack"
dynamic = ["version"]
description = ''
readme = "README.md"
requires-python = ">=3.8"
license = "Apache-2.0"
keywords = []
authors = [
  { name = "deepset GmbH", email = "[email protected]" },
]
classifiers = [
  "Development Status :: 4 - Beta",
  "Programming Language :: Python",
  "Programming Language :: Python :: 3.8",
  "Programming Language :: Python :: 3.9",
  "Programming Language :: Python :: 3.10",
  "Programming Language :: Python :: 3.11",
  "Programming Language :: Python :: Implementation :: CPython",
  "Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
  "haystack-ai",
  "pinecone-client",
]

[project.urls]
Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone#readme"
Issues = "https://github.com/deepset-ai/haystack-core-integrations/issues"
Source = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone"

[tool.hatch.version]
source = "vcs"
tag-pattern = 'integrations\/pinecone-v(?P<version>.*)'

[tool.hatch.version.raw-options]
root = "../.."
git_describe_command = 'git describe --tags --match="integrations/pinecone-v[0-9]*"'

[tool.hatch.envs.default]
dependencies = [
  "coverage[toml]>=6.5",
  "pytest",
  "pytest-xdist",
]
[tool.hatch.envs.default.scripts]
# Pinecone tests are slow (require HTTP requests), so we run them in parallel
# with pytest-xdist (https://pytest-xdist.readthedocs.io/en/stable/distribution.html)
test = "pytest -n auto --maxprocesses=3 {args:tests}"
test-cov = "coverage run -m pytest -n auto --maxprocesses=3 {args:tests}"
cov-report = [
  "- coverage combine",
  "coverage report",
]
cov = [
  "test-cov",
  "cov-report",
]

[[tool.hatch.envs.all.matrix]]
python = ["3.8", "3.9", "3.10", "3.11"]

[tool.hatch.envs.lint]
detached = true
dependencies = [
  "black>=23.1.0",
  "mypy>=1.0.0",
  "ruff>=0.0.243",
  "numpy",
]
[tool.hatch.envs.lint.scripts]
typing = "mypy --install-types --non-interactive {args:src/pinecone_haystack tests}"
style = [
  "ruff {args:.}",
  "black --check --diff {args:.}",
]
fmt = [
  "black {args:.}",
  "ruff --fix {args:.}",
  "style",
]
all = [
  "style",
  "typing",
]

[tool.hatch.metadata]
allow-direct-references = true

[tool.black]
target-version = ["py37"]
line-length = 120
skip-string-normalization = true

[tool.ruff]
target-version = "py37"
line-length = 120
select = [
  "A",
  "ARG",
  "B",
  "C",
  "DTZ",
  "E",
  "EM",
  "F",
  "FBT",
  "I",
  "ICN",
  "ISC",
  "N",
  "PLC",
  "PLE",
  "PLR",
  "PLW",
  "Q",
  "RUF",
  "S",
  "T",
  "TID",
  "UP",
  "W",
  "YTT",
]
ignore = [
  # Allow non-abstract empty methods in abstract base classes
  "B027",
  # Allow boolean positional values in function calls, like `dict.get(... True)`
  "FBT003",
  # Ignore checks for possible passwords
  "S105", "S106", "S107",
  # Ignore complexity
  "C901", "PLR0911", "PLR0912", "PLR0913", "PLR0915",
]
unfixable = [
  # Don't touch unused imports
  "F401",
]

[tool.ruff.isort]
known-first-party = ["pinecone_haystack"]

[tool.ruff.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.ruff.per-file-ignores]
# Tests can use magic values, assertions, and relative imports
"tests/**/*" = ["PLR2004", "S101", "TID252"]

[tool.coverage.run]
source_pkgs = ["pinecone_haystack", "tests"]
branch = true
parallel = true
omit = [
  "example"
]

[tool.coverage.paths]
pinecone_haystack = ["src/pinecone_haystack", "*/pinecone_haystack/src/pinecone_haystack"]
tests = ["tests", "*/pinecone_haystack/tests"]

[tool.coverage.report]
exclude_lines = [
  "no cov",
  "if __name__ == .__main__.:",
  "if TYPE_CHECKING:",
]

[tool.pytest.ini_options]
minversion = "6.0"
markers = [
  "unit: unit tests",
  "integration: integration tests"
]

[[tool.mypy.overrides]]
module = [
  "pinecone.*",
  "haystack.*",
  "pytest.*"
]
ignore_missing_imports = true
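
The `tag-pattern` value under `[tool.hatch.version]` is a regular expression with a named `version` group, so only tags of the form `integrations/pinecone-v…` yield a version string. The sketch below exercises that regex with Python's `re` module to show what it captures. The helper name `version_from_tag` and the sample tags are made up for illustration, and the version tooling may apply the pattern differently internally; this only demonstrates the expression itself.

```python
import re
from typing import Optional

# Same expression as the `tag-pattern` value in the pyproject.toml above.
TAG_PATTERN = re.compile(r"integrations\/pinecone-v(?P<version>.*)")


def version_from_tag(tag: str) -> Optional[str]:
    """Return the captured version, or None if the tag does not match."""
    match = TAG_PATTERN.match(tag)
    return match.group("version") if match else None


print(version_from_tag("integrations/pinecone-v0.1.0"))  # 0.1.0
print(version_from_tag("integrations/chroma-v0.8.1"))  # None
```

Prefixing tags with the integration path lets several packages in one repository keep independent version histories, which is also why `git_describe_command` restricts matching to `integrations/pinecone-v[0-9]*`.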

@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
from pinecone_haystack.document_store import PineconeDocumentStore

__all__ = ["PineconeDocumentStore"]

pytest-xdist works for parallelization.
I'm limiting the processes to 3; otherwise, Pinecone fails with "too many requests".
Including the filter tests, the test suite takes 5-6 minutes.
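
The cap described in this comment corresponds to the `-n auto --maxprocesses=3` flags in the `test` script. As a rough illustration of the effect, the number of workers is the detected CPU count bounded above by the limit. This is not pytest-xdist's actual code, and the function name `effective_workers` is hypothetical.

```python
import os
from typing import Optional


def effective_workers(max_processes: int, detected_cpus: Optional[int] = None) -> int:
    """Illustrate `-n auto --maxprocesses=N`: roughly one worker per CPU, never more than N."""
    if detected_cpus is None:
        detected_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(detected_cpus, max_processes))


print(effective_workers(3, detected_cpus=8))  # 3
print(effective_workers(3, detected_cpus=2))  # 2
```

On a runner with more than three CPUs, the limit rather than the hardware decides how many test processes hit the Pinecone API at once.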