Pinecone Document Store - minimal implementation #81

Merged: 42 commits, merged on Dec 22, 2023
Changes from 31 commits
Commits (42):
- 515e338 Add PineconeDocumentStore (vrunm, Oct 29, 2023)
- 0c2204c Merge branch 'main' into add_pinecone (anakin87, Nov 15, 2023)
- 76ef43b adapt to Document refactoring (anakin87, Nov 15, 2023)
- b449167 start improving existing tests (anakin87, Nov 15, 2023)
- 7b26b2e try to setup a testing workflow (anakin87, Nov 15, 2023)
- 7d0cdd1 fix some format errors (anakin87, Nov 16, 2023)
- bbf5d6c Merge remote-tracking branch 'origin/main' into add_pinecone (anakin87, Dec 6, 2023)
- 31c1e43 adapt to new strucure (anakin87, Dec 6, 2023)
- 292c426 Merge branch 'main' into add_pinecone (anakin87, Dec 18, 2023)
- 9c39509 adapt pyproject; rm about (anakin87, Dec 18, 2023)
- abb080d Merge branch 'add_pinecone' of https://github.com/deepset-ai/haystack… (anakin87, Dec 18, 2023)
- fe2168b fix workflow (anakin87, Dec 18, 2023)
- 2d9d215 add hatch-vcs (anakin87, Dec 18, 2023)
- 7ee262c simplification - first draft (anakin87, Dec 19, 2023)
- f5c5028 simplified tests (anakin87, Dec 19, 2023)
- 89ca25c make workflow read the api key (anakin87, Dec 19, 2023)
- 542ec80 rm score when filtering docs (anakin87, Dec 19, 2023)
- abf985a increase wait time (anakin87, Dec 19, 2023)
- f17b6ec improve api key reading; more tests (anakin87, Dec 20, 2023)
- c63eac2 improvements from PR review (anakin87, Dec 21, 2023)
- f7d048d test simplification (anakin87, Dec 21, 2023)
- c5e9174 test simplification 2 (anakin87, Dec 21, 2023)
- 42da9ab fix (anakin87, Dec 21, 2023)
- a12a31c std ds tests want valueerror (anakin87, Dec 21, 2023)
- 72570ed put tests together (anakin87, Dec 22, 2023)
- 2e690e4 format (anakin87, Dec 22, 2023)
- 9437c02 add fallback for namespace in _embedding_retrieval (anakin87, Dec 22, 2023)
- fdfd3e7 try to parallelize tests (anakin87, Dec 22, 2023)
- 8e6f0e6 better try (anakin87, Dec 22, 2023)
- c759d10 labeler (anakin87, Dec 22, 2023)
- 017cd75 format fix (anakin87, Dec 22, 2023)
- f42c540 Apply suggestions from code review (anakin87, Dec 22, 2023)
- d918414 Revert "Apply suggestions from code review" (anakin87, Dec 22, 2023)
- 4d90b8c improve document conversion (anakin87, Dec 22, 2023)
- 3f07182 rm deepcopy (anakin87, Dec 22, 2023)
- 7668a4a missing return (anakin87, Dec 22, 2023)
- c4f5079 fix fmt (anakin87, Dec 22, 2023)
- 54646e8 copy metadata (anakin87, Dec 22, 2023)
- 9aa1ae8 fmt (anakin87, Dec 22, 2023)
- 2ff5adf mv comment (anakin87, Dec 22, 2023)
- ffe3c73 improve tests (anakin87, Dec 22, 2023)
- 091b82a readmes (anakin87, Dec 22, 2023)
.github/labeler.yml: 5 additions, 0 deletions
@@ -44,6 +44,11 @@ integration:qdrant:
      - any-glob-to-any-file: "integrations/qdrant/**/*"
      - any-glob-to-any-file: ".github/workflows/qdrant.yml"

integration:pinecone:
  - changed-files:
      - any-glob-to-any-file: "integrations/pinecone/**/*"
      - any-glob-to-any-file: ".github/workflows/pinecone.yml"

integration:unstructured-fileconverter:
  - changed-files:
      - any-glob-to-any-file: "integrations/unstructured/fileconverter/**/*"
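The `integration:pinecone` label relies on `any-glob-to-any-file` rules, which the actions/labeler v5 documentation describes as satisfied when at least one glob matches at least one changed file. Below is a minimal Python sketch of a single such rule, assuming minimatch-style globs where `**/` spans zero or more directories and `*` stays inside one path segment. `glob_to_regex` and `any_glob_to_any_file` are hypothetical helpers, not part of the labeler action, and the translation only covers `**`, `*` and `?`.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a small subset of minimatch glob syntax (**, *, ?) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any run of characters inside one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one character inside a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def any_glob_to_any_file(globs, changed_files) -> bool:
    """True if at least one glob fully matches at least one changed file path."""
    regexes = [glob_to_regex(g) for g in globs]
    return any(rx.fullmatch(path) for rx in regexes for path in changed_files)


PINECONE_RULE = ["integrations/pinecone/**/*"]

print(any_glob_to_any_file(PINECONE_RULE, ["integrations/pinecone/src/pinecone_haystack/__init__.py"]))  # True
print(any_glob_to_any_file(PINECONE_RULE, ["integrations/qdrant/README.md"]))  # False
```

Because `**/` may match zero directories, files directly under `integrations/pinecone/` (such as `README.md`) are covered as well as nested source files.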
.github/workflows/pinecone.yml (new file): 51 additions, 0 deletions
@@ -0,0 +1,51 @@
# This workflow comes from https://github.com/ofek/hatch-mypyc
# https://github.com/ofek/hatch-mypyc/blob/5a198c0ba8660494d02716cfc9d79ce4adfb1442/.github/workflows/test.yml
name: Test / pinecone

on:
  schedule:
    - cron: "0 0 * * *"
  pull_request:
    paths:
      - "integrations/pinecone/**"
      - ".github/workflows/pinecone.yml"

concurrency:
  group: pinecone-${{ github.head_ref }}
  cancel-in-progress: true

env:
  PYTHONUNBUFFERED: "1"
  FORCE_COLOR: "1"
  PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}

jobs:
  run:
    name: Python ${{ matrix.python-version }} on ${{ startsWith(matrix.os, 'macos-') && 'macOS' || startsWith(matrix.os, 'windows-') && 'Windows' || 'Linux' }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # Pinecone tests are time-consuming, so the matrix is limited to Python 3.9 and 3.10
        os: [ubuntu-latest]
        python-version: ["3.9", "3.10"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Hatch
        run: pip install --upgrade hatch

      - name: Lint
        working-directory: integrations/pinecone
        if: matrix.python-version == '3.9'
        run: hatch run lint:all

      - name: Run tests
        working-directory: integrations/pinecone
        run: hatch run cov
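The job `name` uses the `cond && a || b` idiom of GitHub Actions expressions as an inline conditional, and the `Lint` step only runs for the Python 3.9 entry. Here is a minimal Python sketch of how this two-entry matrix expands into jobs, assuming the usual cross-product expansion. `expand_matrix` and `os_label` are hypothetical helpers. The OS string is lowercased because GitHub documents `startsWith` as case-insensitive.

```python
from itertools import product

# Matrix copied from the workflow above
MATRIX = {"os": ["ubuntu-latest"], "python-version": ["3.9", "3.10"]}


def os_label(os_name: str) -> str:
    """Python rendering of:
    startsWith(matrix.os, 'macos-') && 'macOS' || startsWith(matrix.os, 'windows-') && 'Windows' || 'Linux'
    """
    name = os_name.lower()  # GitHub's startsWith ignores case
    return (name.startswith("macos-") and "macOS") or (name.startswith("windows-") and "Windows") or "Linux"


def expand_matrix(matrix: dict) -> list:
    """Cross product of all matrix values, one dict per job."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[k] for k in keys))]


for combo in expand_matrix(MATRIX):
    job_name = f"Python {combo['python-version']} on {os_label(combo['os'])}"
    runs_lint = combo["python-version"] == "3.9"  # mirrors: if: matrix.python-version == '3.9'
    print(job_name, "| lint:", runs_lint)
```

Python's `and`/`or` return operand values rather than booleans, just like `&&`/`||` in Actions expressions, so the translation keeps the same short-circuit behaviour. With `fail-fast: false`, a failure in one of the two jobs does not cancel the other.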
integrations/pinecone/README.md (new file): 18 additions, 0 deletions
@@ -0,0 +1,18 @@
[![test](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/pinecone.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/pinecone.yml)

# Pinecone Document Store

This GitHub repository is a template that can be used to create custom document stores to extend
the new [Haystack](https://github.com/deepset-ai/haystack/) API available under the `preview`
package starting from version 1.15.

While the new API is still under active development, the new "Store" architecture is quite stable
and we are encouraging early adopters to contribute their custom document stores.

## Installation

## Examples

## License

`pinecone-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license.
integrations/pinecone/pyproject.toml (new file): 186 additions, 0 deletions
@@ -0,0 +1,186 @@
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[project]
name = "pinecone_haystack"
dynamic = ["version"]
description = ''
readme = "README.md"
requires-python = ">=3.8"
license = "Apache-2.0"
keywords = []
authors = [
{ name = "deepset GmbH", email = "[email protected]" },
]
classifiers = [
"Development Status :: 4 - Beta",
"Programming Language :: Python",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
"haystack-ai",
"pinecone-client",
]

[project.urls]
Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone#readme"
Issues = "https://github.com/deepset-ai/haystack-core-integrations/issues"
Source = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/pinecone"

[tool.hatch.version]
source = "vcs"
tag-pattern = 'integrations\/pinecone-v(?P<version>.*)'

[tool.hatch.version.raw-options]
root = "../.."
git_describe_command = 'git describe --tags --match="integrations/pinecone-v[0-9]*"'
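According to the hatch-vcs documentation, `tag-pattern` is a regular expression whose `version` group extracts the version from a VCS tag. `root = "../.."` points setuptools-scm at the repository root two levels up, and `git_describe_command` only considers tags matching the glob `integrations/pinecone-v[0-9]*`. The sketch below applies the same pattern with Python's `re` to show what it extracts. The tag names are made-up examples, `version_from_tag` is a hypothetical helper, and setuptools-scm additionally normalizes versions and handles untagged commits.

```python
import re

# tag-pattern from [tool.hatch.version]; the TOML literal string keeps the backslash,
# and "\/" is simply an escaped "/" in Python's regex syntax
TAG_PATTERN = re.compile(r"integrations\/pinecone-v(?P<version>.*)")


def version_from_tag(tag: str):
    """Return the version part of a pinecone release tag, or None for other tags."""
    match = TAG_PATTERN.match(tag)
    return match.group("version") if match else None


print(version_from_tag("integrations/pinecone-v0.1.0"))  # 0.1.0
print(version_from_tag("integrations/qdrant-v1.0.0"))  # None
```

Scoping both the describe command and the pattern to the `integrations/pinecone-v` prefix is what lets several integrations share one monorepo while each is versioned from its own tags.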

[tool.hatch.envs.default]
dependencies = [
"coverage[toml]>=6.5",
"pytest",
"pytest-xdist",
]
[tool.hatch.envs.default.scripts]
# Pinecone tests are slow (require HTTP requests), so we run them in parallel
# with pytest-xdist (https://pytest-xdist.readthedocs.io/en/stable/distribution.html)
test = "pytest -n auto --maxprocesses=3 {args:tests}"
Comment on lines +51 to +53 (Member Author):
pytest-xdist works for parallelization.
I'm limiting the processes to 3; otherwise, Pinecone fails with "too many requests".
Including the filter tests, the tests take 5-6 minutes.

test-cov = "coverage run -m pytest -n auto --maxprocesses=3 {args:tests}"
cov-report = [
"- coverage combine",
"coverage report",
]
cov = [
"test-cov",
"cov-report",
]
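`hatch run cov` is a composite script: it runs `test-cov` and then `cov-report`. The leading `- ` on `coverage combine` tells Hatch to ignore that command's exit code, and `{args:tests}` falls back to `tests` when no arguments are passed. With `-n auto --maxprocesses=3`, pytest-xdist starts as many workers as it detects CPUs but never more than three, which is the cap the review comment above relies on to avoid Pinecone's "too many requests" errors. A minimal sketch of both behaviours follows. `expand` and `effective_workers` are hypothetical helpers that model Hatch and pytest-xdist rather than call them, and the default-argument substitution is simplified to a string replace.

```python
import os

# Scripts copied from [tool.hatch.envs.default.scripts]
SCRIPTS = {
    "test": "pytest -n auto --maxprocesses=3 {args:tests}",
    "test-cov": "coverage run -m pytest -n auto --maxprocesses=3 {args:tests}",
    "cov-report": ["- coverage combine", "coverage report"],
    "cov": ["test-cov", "cov-report"],
}


def expand(script: str, args: str = "") -> list:
    """Flatten a (possibly composite) script into (command, ignore_exit_code) pairs."""
    entry = SCRIPTS[script]
    commands = [entry] if isinstance(entry, str) else entry
    flat = []
    for command in commands:
        ignore_exit_code = command.startswith("- ")
        if ignore_exit_code:
            command = command[2:]
        if command in SCRIPTS:  # a bare script name refers to another script
            flat.extend(expand(command, args))
        else:
            flat.append((command.replace("{args:tests}", args or "tests"), ignore_exit_code))
    return flat


def effective_workers(detected_cpus: int, maxprocesses: int = 3) -> int:
    """-n auto picks the detected CPU count; --maxprocesses caps it."""
    return min(detected_cpus, maxprocesses)


for command, ignored in expand("cov"):
    print(("(exit code ignored) " if ignored else "") + command)
print("workers on this machine:", effective_workers(os.cpu_count() or 1))
```

`os.cpu_count()` is only a stand-in here for whatever CPU count pytest-xdist detects on the runner.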

[[tool.hatch.envs.all.matrix]]
python = ["3.8", "3.9", "3.10", "3.11"]

[tool.hatch.envs.lint]
detached = true
dependencies = [
"black>=23.1.0",
"mypy>=1.0.0",
"ruff>=0.0.243",
"numpy",
]
[tool.hatch.envs.lint.scripts]
typing = "mypy --install-types --non-interactive {args:src/pinecone_haystack tests}"
style = [
"ruff {args:.}",
"black --check --diff {args:.}",
]
fmt = [
"black {args:.}",
"ruff --fix {args:.}",
"style",
]
all = [
"style",
"typing",
]

[tool.hatch.metadata]
allow-direct-references = true

[tool.black]
target-version = ["py37"]
line-length = 120
skip-string-normalization = true

[tool.ruff]
target-version = "py37"
line-length = 120
select = [
"A",
"ARG",
"B",
"C",
"DTZ",
"E",
"EM",
"F",
"FBT",
"I",
"ICN",
"ISC",
"N",
"PLC",
"PLE",
"PLR",
"PLW",
"Q",
"RUF",
"S",
"T",
"TID",
"UP",
"W",
"YTT",
]
ignore = [
# Allow non-abstract empty methods in abstract base classes
"B027",
# Allow boolean positional values in function calls, like `dict.get(... True)`
"FBT003",
# Ignore checks for possible passwords
"S105", "S106", "S107",
# Ignore complexity
"C901", "PLR0911", "PLR0912", "PLR0913", "PLR0915",
]
unfixable = [
# Don't touch unused imports
"F401",
]

[tool.ruff.isort]
known-first-party = ["pinecone_haystack"]

[tool.ruff.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.ruff.per-file-ignores]
# Tests can use magic values, assertions, and relative imports
"tests/**/*" = ["PLR2004", "S101", "TID252"]

[tool.coverage.run]
source_pkgs = ["pinecone_haystack", "tests"]
branch = true
parallel = true
omit = [
"example"
]

[tool.coverage.paths]
pinecone_haystack = ["src/pinecone_haystack", "*/pinecone_haystack/src/pinecone_haystack"]
tests = ["tests", "*/pinecone_haystack/tests"]

[tool.coverage.report]
exclude_lines = [
"no cov",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
]

[tool.pytest.ini_options]
minversion = "6.0"
markers = [
"unit: unit tests",
"integration: integration tests"
]

[[tool.mypy.overrides]]
module = [
"pinecone.*",
"haystack.*",
"pytest.*"
]
ignore_missing_imports = true
integrations/pinecone/src/pinecone_haystack/__init__.py (new file): 6 additions, 0 deletions
@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: 2023-present deepset GmbH <[email protected]>
#
# SPDX-License-Identifier: Apache-2.0
from pinecone_haystack.document_store import PineconeDocumentStore

__all__ = ["PineconeDocumentStore"]
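`__init__.py` re-exports `PineconeDocumentStore` so users can write `from pinecone_haystack import PineconeDocumentStore`, and `__all__` defines what `from pinecone_haystack import *` exposes. The following self-contained sketch shows the `__all__` effect using a throwaway in-memory module, hypothetically named `demo_pinecone_haystack`, in place of the real package.

```python
import sys
import types

# Hypothetical stand-in for the package created in this diff
demo = types.ModuleType("demo_pinecone_haystack")


class PineconeDocumentStore:
    """Placeholder class; the real one lives in pinecone_haystack.document_store."""


def _internal_helper():
    return "not exported"


demo.PineconeDocumentStore = PineconeDocumentStore
demo._internal_helper = _internal_helper
demo.unlisted_name = 42  # public-looking, but absent from __all__
demo.__all__ = ["PineconeDocumentStore"]
sys.modules[demo.__name__] = demo  # make the module importable by name

namespace = {}
exec(f"from {demo.__name__} import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['PineconeDocumentStore']
```

Without `__all__`, the star import would also pull in `unlisted_name`; listing names explicitly keeps the package's public surface limited to the document store.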