-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/content safety runnable #13
base: master
Are you sure you want to change the base?
Conversation
|
||
moderated_chain = moderate | prompt | model | ||
|
||
moderated_chain.invoke({"input": "Hey, How are you?"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful to show the full example here with the code you would have to write reacting to harmful content
error: bool = False | ||
"""Whether or not to error if bad content was found.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would default to True. If a user was going through the trouble of adding this to their chain, it feels to me it's because they want to react to harmful content, i.e. do something if an exception is raised
result = response.categories_analysis | ||
output = self._detect_harmful_content(text, result) | ||
|
||
return {self.input_key: output, self.output_key: output} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does returning the input_key: with the output do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it so the input of the next step of the chain is the filtered content. It makes it so that if the user doesn't want an error to be thrown, the model will not receive harmful content but instead the message ''The input has breached Azure's content safety policy'. Without this the input at the next step would still be the original harmful content.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"response.content" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to show how a user would write code to handle an exception from input violating the content policy
…ain-ai#28269) - **Description:** Corrected the parameter name in the HuggingFaceEmbeddings documentation under integrations/text_embedding/ from model to model_name to align with the actual code usage in the langchain_huggingface package. - **Issue:** Fixes langchain-ai#28231 - **Dependencies:** None
…ssages` (langchain-ai#28267) We have a test [test_structured_few_shot_examples](https://github.com/langchain-ai/langchain/blob/ad4333ca032033097c663dfe818c5c892c368bd6/libs/standard-tests/langchain_tests/integration_tests/chat_models.py#L546) in standard integration tests that implements a version of tool-calling few shot examples that works with ~all tested providers. The formulation supported by ~all providers is: `human message, tool call, tool message, AI reponse`. Here we update `langchain_core.utils.function_calling.tool_example_to_messages` to support this formulation. The `tool_example_to_messages` util is undocumented outside of our API reference. IMO, if we are testing that this function works across all providers, it can be helpful to feature it in our guides. The structured few-shot examples we document at the moment require users to implement this function and can be simplified.
langchain-ai#28296) **Description:** Currently, the docstring for `LanceDB.__init__()` provides the default value for `mode`, but not the list of valid values. This PR adds that list to the docstring. **Issue:** N/A **Dependencies:** N/A **Twitter handle:** `@metadaddy` [Leaving as a reminder: If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.]
…chain-ai#28304) Link migration guide first.
- fix import statement for qdrant - issue: langchain-ai#28012 langchain-ai#28012
pydantic 2.10 compat for langchain-core
pydantic compat 2.10 for langchain
fix small GOOGLE_API_KEY markdown formatting typo
Adds deprecation notices for Neo4j components moving to the `langchain_neo4j` partner package. - Adds deprecation warnings to all Neo4j-related classes and functions that have been migrated to the new `langchain_neo4j` partner package - Updates documentation to reference the new `langchain_neo4j` package instead of `langchain_community`
Library name was updated after langchain-ai#27879 branched off master.
conjuction -> conjunction
…n-ai#28321) **Description**: Adding Citation in response payload of ChatPerplexity **Issue**: langchain-ai#28108
) - **Description:** Lines of code must be indented beneath `.. code-block::` for proper formatting. https://devguide.python.org/documentation/markup/#showing-code-examples - **Issue:** The example code block on the `create_sql_agent` document page is not properly rendered. https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.base.create_sql_agent.html#langchain_community.agent_toolkits.sql.base.create_sql_agent <img width="933" alt="image" src="https://github.com/user-attachments/assets/d764bcad-e412-408b-ab0b-9a78a11188ee">
…de assertion (langchain-ai#28712) Minor update for error message that should never be triggered
…gchain-ai#28505) ~Note that this PR is now Draft, so I didn't add change to `aindex` function and didn't add test codes for my change. After we have an agreement on the direction, I will add commits.~ `batch_size` is very difficult to decide because setting a large number like >10000 will impact VectorDB and RecordManager, while setting a small number will delete records unnecessarily, leading to redundant work, as the `IMPORTANT` section says. On the other hand, we can't use `full` because the loader returns just a subset of the dataset in our use case. I guess many people are in the same situation as us. So, as one of the possible solutions for it, I would like to introduce a new argument, `scoped_full_cleanup`. This argument will be valid only when `claneup` is Full. If True, Full cleanup deletes all documents that haven't been updated AND that are associated with source ids that were seen during indexing. Default is False. This change keeps backward compatibility. --------- Co-authored-by: Eugene Yurtsev <[email protected]> Co-authored-by: Eugene Yurtsev <[email protected]>
…rstore (langchain-ai#28103) The delete methods in the VectorStore and DocumentIndex interfaces return a status indicating the result. Therefore, we can assume that their implementations don't throw exceptions but instead return a result indicating whether the delete operations have failed. The current implementation doesn't check the returned value, so I modified it to throw an exception when the operation fails. --------- Co-authored-by: Eugene Yurtsev <[email protected]>
) JSONparse, in _validate_metadata_func(), checks the consistency of the _metadata_func() function. To do this, it invokes it and makes sure it receives a dictionary in response. However, during the call, it does not respect future calls, as shown on line 100. This generates errors if, for example, the function is like this: ```python def generate_metadata(json_node:Dict[str,Any],kwargs:Dict[str,Any]) -> Dict[str,Any]: return { "source": url, "row": kwargs['seq_num'], "question":json_node.get("question"), } loader = JSONLoader( file_path=file_path, content_key="answer", jq_schema='.[]', metadata_func=generate_metadata, text_content=False) ``` To avoid this, the verification must comply with the specifications. This patch does just that. --------- Co-authored-by: Eugene Yurtsev <[email protected]>
…i#25375) community: add hybrid search in opensearch # Langchain OpenSearch Hybrid Search Implementation ## Implementation of Hybrid Search: I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities. In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process. **Note:** For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search Thanks Mate! ### Experiments I conducted few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search. Experiment - 1 Hybrid Search Keyword_weight: 1, vector_weight: 0 I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios. Experiment - 2 Hybrid Search keyword_weight = 0.0, vector_weight = 1.0 For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure. Experiment - 3 Hybrid Search - balanced keyword_weight = 0.5, vector_weight = 0.5 For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms. Kindly verify the notebook for the experiments conducted! **Notebook:** https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb ### Instructions to follow for Performing Hybrid Search: **Step-1: Instantiating OpenSearchVectorSearch Class:** ```python opensearch_vectorstore = OpenSearchVectorSearch( index_name=os.getenv("INDEX_NAME"), embedding_function=embedding_model, opensearch_url=os.getenv("OPENSEARCH_URL"), http_auth=(os.getenv("OPENSEARCH_USERNAME"),os.getenv("OPENSEARCH_PASSWORD")), use_ssl=False, verify_certs=False, ssl_assert_hostname=False, ssl_show_warn=False ) ``` **Parameters:** 1. **index_name:** The name of the OpenSearch index to use. 2. **embedding_function:** The function or model used to generate embeddings for the documents. It's assumed that embedding_model is defined elsewhere in the code. 3. **opensearch_url:** The URL of the OpenSearch instance. 4. **http_auth:** A tuple containing the username and password for authentication. 5. **use_ssl:** Set to False, indicating that the connection to OpenSearch is not using SSL/TLS encryption. 6. **verify_certs:** Set to False, which means the SSL certificates are not being verified. This is often used in development environments but is not recommended for production. 7. **ssl_assert_hostname:** Set to False, disabling hostname verification in SSL certificates. 8. **ssl_show_warn:** Set to False, suppressing SSL-related warnings. **Step-2: Configure Search Pipeline:** To initiate hybrid search functionality, you need to configures a search pipeline first. **Implementation Details:** This method configures a search pipeline in OpenSearch that: 1. Normalizes the scores from both keyword and vector searches using the min-max technique. 2. Applies the specified weights to the normalized scores. 3. Calculates the final score using an arithmetic mean of the weighted, normalized scores. **Parameters:** * **pipeline_name (str):** A unique identifier for the search pipeline. It's recommended to use a descriptive name that indicates the weights used for keyword and vector searches. * **keyword_weight (float):** The weight assigned to the keyword search component. This should be a float value between 0 and 1. In this example, 0.3 gives 30% importance to traditional text matching. * **vector_weight (float):** The weight assigned to the vector search component. This should be a float value between 0 and 1. In this example, 0.7 gives 70% importance to semantic similarity. ```python opensearch_vectorstore.configure_search_pipelines( pipeline_name="search_pipeline_keyword_0.3_vector_0.7", keyword_weight=0.3, vector_weight=0.7, ) ``` **Step-3: Performing Hybrid Search:** After creating the search pipeline, you can perform a hybrid search using the `similarity_search()` method (or) any methods that are supported by `langchain`. This method combines both `keyword-based and semantic similarity` searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques. **parameters:** * **query:** The search query string. * **k:** The number of top results to return (in this case, 3). * **search_type:** Set to `hybrid_search` to use both keyword and vector search capabilities. * **search_pipeline:** The name of the previously created search pipeline. ```python query = "what are the country named in our database?" top_k = 3 pipeline_name = "search_pipeline_keyword_0.3_vector_0.7" matched_docs = opensearch_vectorstore.similarity_search_with_score( query=query, k=top_k, search_type="hybrid_search", search_pipeline = pipeline_name ) matched_docs ``` twitter handle: @iamkarthik98 --------- Co-authored-by: Karthik Kolluri <[email protected]> Co-authored-by: Eugene Yurtsev <[email protected]>
…gchain-ai#28374) **Description:** This PR introduces a `model` alias for the embedding classes that contain the attribute `model_name`, to ensure consistency across the codebase, as suggested by a moderator in a previous PR. The change aligns the usage of attribute names across the project (see for example [here](https://github.com/langchain-ai/langchain/blob/65deeddd5dfec5d51f33ebc961f09c2e47a8f064/libs/partners/groq/langchain_groq/chat_models.py#L304)). **Issue:** This PR addresses the suggestion from the review of issue langchain-ai#28269. **Dependencies:** None --------- Co-authored-by: Eugene Yurtsev <[email protected]> Co-authored-by: Erick Friis <[email protected]>
…rkdownifyTransformer` (langchain-ai#27866) # Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes langchain-ai#27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <[email protected]>
Thank you for contributing to LangChain! **PR title**: "community: fix PDF Filter Type Error" - **Description:** fix PDF Filter Type Error" - **Issue:** the issue langchain-ai#27153 it fixes, - **Dependencies:** no - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <[email protected]>
Added Langchain complete tutorial playlist from total technology zonne channel .In this playlist every video is focusing one specific use case and hands on demo.All tutorials are equally good for every levels . Thank you for contributing to LangChain! - [ ] **PR title**: "package: description" - Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] **PR message**: ***Delete this entire checklist*** and replace with - **Description:** a description of the change - **Issue:** the issue # it fixes, if applicable - **Dependencies:** any dependencies required for this change - **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <[email protected]> Co-authored-by: Erick Friis <[email protected]>
Co-authored-by: Jesse Schumacher <[email protected]> Co-authored-by: Jesse S <[email protected]> Co-authored-by: dylan <[email protected]>
Issue: Here is an ambiguity about W&B integrations. There are two existing provider pages. Fix: Added the "root" W&B provider page. Added there the references to the documentation in the W&B site. Cleaned up formats in existing pages. Added one more integration reference. --------- Co-authored-by: Erick Friis <[email protected]> Co-authored-by: Eugene Yurtsev <[email protected]>
- **Description:** Adds a helper that renders documents with the GraphVectorStore metadata fields to Graphviz for visualization. This is helpful for understanding and debugging. --------- Co-authored-by: Erick Friis <[email protected]>
Thank you for contributing to LangChain! - [x] **PR title**: langchain: add URL parameter to ChatDeepInfra class - [x] **PR message**: add URL parameter to ChatDeepInfra class - **Description:** This PR introduces a url parameter to the ChatDeepInfra class in LangChain, allowing users to specify a custom URL. Previously, the URL for the DeepInfra API was hardcoded to "https://stage.api.deepinfra.com/v1/openai/chat/completions", which caused issues when the staging endpoint was not functional. The _url method was updated to return the value from the url parameter, enabling greater flexibility and addressing the problem. out! --------- Co-authored-by: Erick Friis <[email protected]>
Bump unstructured to pick up resolution of Unstructured-IO/unstructured#3795
…n-ai#28728) make sure id field of Documents in `FAISS` docstore have the same id as values in `index_to_docstore_id`, implement `get_by_ids` method
…ard compatability (langchain-ai#28439) - **Description:** `Model_Kwargs` was not being passed correctly to `sentence_transformers.SentenceTransformer` which has been corrected while maintaing backward compatability - **Issue:** langchain-ai#28436 --------- Co-authored-by: MoosaTae <[email protected]> Co-authored-by: Sadit Wongprayon <[email protected]> Co-authored-by: Erick Friis <[email protected]>
…earch Support (langchain-ai#28716) Thank you for contributing to LangChain! - Added [full text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search) and [hybrid search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search) support for Azure CosmosDB NoSql Vector Store - Added a new enum called CosmosDBQueryType which supports the following values: - VECTOR = "vector" - FULL_TEXT_SEARCH = "full_text_search" - FULL_TEXT_RANK = "full_text_rank" - HYBRID = "hybrid" - User now needs to provide this query_type to the similarity_search method for the vectorStore to make the correct query api call. - Added a couple of work arounds as for the FULL_TEXT_RANK and HYBRID query functions we don't support parameterized queries right now. I have added TODO's in place, and will remove these work arounds by end of January. - Added necessary test cases and updated the - [x] **Add tests and docs**: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Erick Friis <[email protected]>
…in-ai#26398) Fixes langchain-ai#26171: - added some clarification text for the keyword argument `breakpoint_threshold_amount` - added min_chunk_size: together with `breakpoint_threshold_amount`, too small/big chunk sizes can be avoided Note: the langchain-experimental was moved to a separate repo, so only the doc change stays here.
…-ai#25876) **Description**: Fixed formatting start and end time **Issue**: The old formatting resulted everytime in an timezone error **Dependencies**: / **Twitter handle**: / --------- Co-authored-by: Yannick Opitz <[email protected]> Co-authored-by: Erick Friis <[email protected]>
Here I've updated the class to use the new pydantic decorators in order to comply with the latest version of LangChain. This also contains 3 unit tests for the class and a short Jupyter Doc explaining use for the class