Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

langchain: Add PebbloRetrievalQA chain with Identity & Semantic Enforcement support #20641

Merged
merged 5 commits into from
May 15, 2024

Conversation

Raj725
Copy link
Contributor

@Raj725 Raj725 commented Apr 19, 2024

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 19, 2024
Copy link

vercel bot commented Apr 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview May 15, 2024 1:08pm

@dosubot dosubot bot added 🔌: weaviate Primarily related to Weaviate vector store integration 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Apr 19, 2024
@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch from 5c8f120 to 190d2db Compare April 19, 2024 05:00
@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch from 085e5b5 to 4118a2c Compare April 22, 2024 16:10
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Apr 22, 2024
@Raj725
Copy link
Contributor Author

Raj725 commented Apr 22, 2024

cc: @eyurtsev

@eyurtsev eyurtsev self-assigned this Apr 23, 2024
@eyurtsev
Copy link
Collaborator

Hi @Raj725, could you move this into community?

The name of the retriever isn't tied conceptually to what it does, but to Pebblo. Is it provider specific or is it meant to be generic?

@Raj725
Copy link
Contributor Author

Raj725 commented Apr 24, 2024

Hi @Raj725, could you move this into community?

The name of the retriever isn't tied conceptually to what it does, but to Pebblo. Is it provider specific or is it meant to be generic?

@eyurtsev
I tried to move it to the community library, but now it's not allowing subclassing Chain. It's failing in linting.
Importing from the langchain library is not allowed

snippet from libs/community/scripts/lint_imports.sh :

# Initialize a variable to keep track of errors
errors=0

# make sure not importing from langchain or langchain_experimental
git --no-pager grep '^from langchain\.' . && errors=$((errors+1))
git --no-pager grep '^from langchain_experimental\.' . && errors=$((errors+1))

@Raj725
Copy link
Contributor Author

Raj725 commented Apr 24, 2024

Hi @Raj725, could you move this into community?
The name of the retriever isn't tied conceptually to what it does, but to Pebblo. Is it provider specific or is it meant to be generic?

@eyurtsev I tried to move it to the community library, but now it's not allowing subclassing Chain. It's failing in linting. Importing from the langchain library is not allowed

snippet from libs/community/scripts/lint_imports.sh :

# Initialize a variable to keep track of errors
errors=0

# make sure not importing from langchain or langchain_experimental
git --no-pager grep '^from langchain\.' . && errors=$((errors+1))
git --no-pager grep '^from langchain_experimental\.' . && errors=$((errors+1))

@eyurtsev We want to extend Chain, but "community" doesn't allow imports from "langchain". Can you please suggest an alternative approach?

@eyurtsev
Copy link
Collaborator

eyurtsev commented Apr 24, 2024

@Raj725 we're in the process of inverting the dependency between langchain and langchain-community (~2 weeks timeline)


Why do you need to use Chain rather than using LCEL for composition?

Chain is a legacy object

@eyurtsev
Copy link
Collaborator

@Raj725 any chance you could provide a bit of context for what this code is supposed to do ? Why should it be merged it into langchain/langchain-community?

@Raj725
Copy link
Contributor Author

Raj725 commented Apr 25, 2024

@Raj725 any chance you could provide a bit of context for what this code is supposed to do ? Why should it be merged it into langchain/langchain-community?

Context:
Sure. We are introducing PebbloRetrievalQA chain as a retrieval counterpart to Pebblo Safe DocumentLoader that @hwchase17 helped to merge into langchain-community sometime back.

For more context, Pebblo is an opensource project for Data Visibility & Governance for LangChain apps(LinkedIn Post by Langchain).

Both Pebblo Safe DocumentLoader and PebbloRetrievalQA work in tandem to add metadata tags on the loader with corresponding filters/enforcement on the retrieval side. Given that the Pebblo SafeLoader is already in langchain-community, our common customer base will be better served with the PebbloRetrievalQA chain, also in the same pkg.

References:

  1. https://medium.com/@sridhar_ramaswamy/identity-enabled-rag-using-pebblo-4d608ed28c31
  2. https://medium.com/@sridhar_ramaswamy/introducing-pebblo-data-visibility-governance-for-gen-ai-apps-086ca8a62d10

@Raj725
Copy link
Contributor Author

Raj725 commented Apr 25, 2024

@Raj725 we're in the process of inverting the dependency between langchain and langchain-community (~2 weeks timeline)

@eyurtsev Thanks for sharing the timeline. We can wait for dependency inversion.

Why do you need to use Chain rather than using LCEL for composition?

Chain is a legacy object

We would like to offer a governance and security solution for our existing customers who are using legacy chain-based solutions. We are also exploring LCEL.

@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch 4 times, most recently from b2ba9b3 to 54fff06 Compare May 6, 2024 06:16
@Raj725
Copy link
Contributor Author

Raj725 commented May 6, 2024

@Raj725 we're in the process of inverting the dependency between langchain and langchain-community (~2 weeks timeline)

@eyurtsev It looks like dependency inversion is complete; the lint error is gone now. However, it's still throwing the following error in the import check.

Can you please guide me on the right way to import "langchain" modules into "langchain-community"?

tests/unit_tests/chains/test_pebblo_retrieval.py:16: in <module>
    from langchain_community.chains import PebbloRetrievalQA
langchain_community/chains/__init__.py:7: in <module>
    from langchain_community.chains.pebblo_retrieval.base import PebbloRetrievalQA
langchain_community/chains/pebblo_retrieval/base.py:9: in <module>
    from langchain.chains.base import Chain
E   ModuleNotFoundError: No module named 'langchain'

@eyurtsev
Copy link
Collaborator

eyurtsev commented May 6, 2024

The dependency inversion will only happen for the 0.2 release of langchain -- still a few weeks away

@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch 2 times, most recently from 324c3ce to 5974b99 Compare May 9, 2024 06:37
@Raj725
Copy link
Contributor Author

Raj725 commented May 10, 2024

@eyurtsev Could you please review this PR? All checks are passing now.

@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch 3 times, most recently from 3809e12 to b48f1ee Compare May 10, 2024 18:10
@Raj725 Raj725 force-pushed the pebblo_retrieval_chain_v1 branch from b48f1ee to 2860b9b Compare May 14, 2024 16:48
@@ -69,6 +69,7 @@
"RetrievalQAWithSourcesChain": "langchain.chains.qa_with_sources.retrieval",
"VectorDBQAWithSourcesChain": "langchain.chains.qa_with_sources.vector_db",
"create_retrieval_chain": "langchain.chains.retrieval",
"PebbloRetrievalQA": "langchain.chains.pebblo_retrieval.base",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"PebbloRetrievalQA": "langchain.chains.pebblo_retrieval.base",

libs/community/langchain_community/chains/__init__.py Outdated Show resolved Hide resolved
if TYPE_CHECKING:
from langchain_community.chains.pebblo_retrieval.base import PebbloRetrievalQA

__all__ = ["PebbloRetrievalQA"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should use a hard-coded list of string literals otherwise some dev tools won't be able to pick it up

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label May 15, 2024
@eyurtsev eyurtsev enabled auto-merge (squash) May 15, 2024 13:07
@eyurtsev eyurtsev merged commit 54e0032 into langchain-ai:master May 15, 2024
42 checks passed
hinthornw pushed a commit that referenced this pull request Jun 20, 2024
…c Enforcement support (#20641)

- **Description:** PebbloRetrievalQA chain introduces identity
enforcement using vector-db metadata filtering
- **Dependencies:** None
- **Issue:** None
- **Documentation:** Adding documentation for PebbloRetrievalQA chain in
a separate PR(#20746)
- **Unit tests:** New unit-tests added

---------

Co-authored-by: Eugene Yurtsev <[email protected]>
@Raj725 Raj725 deleted the pebblo_retrieval_chain_v1 branch June 26, 2024 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XL This PR changes 500-999 lines, ignoring generated files. 🔌: weaviate Primarily related to Weaviate vector store integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants