diff --git a/docs/docs/integrations/chat/ibm_watsonx.ipynb b/docs/docs/integrations/chat/ibm_watsonx.ipynb index 5b22b3e7260ab..d43d3834958fb 100644 --- a/docs/docs/integrations/chat/ibm_watsonx.ipynb +++ b/docs/docs/integrations/chat/ibm_watsonx.ipynb @@ -454,7 +454,7 @@ "\n", "Please note that `ChatWatsonx.bind_tools` is on beta state, so right now we only support `mistralai/mixtral-8x7b-instruct-v01` model.\n", "\n", - "You should also redefine `max_new_tokens` parameter to get the entire model response. By default `max_new_tokens` is set ot 20." + "You should also redefine `max_new_tokens` parameter to get the entire model response. By default `max_new_tokens` is set to 20." ] }, { @@ -577,7 +577,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.1.undefined" } }, "nbformat": 4, diff --git a/docs/docs/integrations/document_loaders/arxiv.ipynb b/docs/docs/integrations/document_loaders/arxiv.ipynb index 09c4d989eeda0..25abec4c0d471 100644 --- a/docs/docs/integrations/document_loaders/arxiv.ipynb +++ b/docs/docs/integrations/document_loaders/arxiv.ipynb @@ -2,121 +2,173 @@ "cells": [ { "cell_type": "markdown", - "id": "bda1f3f5", + "id": "0dee6344", "metadata": {}, "source": [ - "# Arxiv\n", + "# ArxivLoader\n", "\n", - ">[arXiv](https://arxiv.org/) is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.\n", - "\n", - "This notebook shows how to load scientific articles from `Arxiv.org` into a document format that we can use downstream." + "[arXiv](https://arxiv.org/) is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics." ] }, { "cell_type": "markdown", - "id": "1b7a1eef-7bf7-4e7d-8bfc-c4e27c9488cb", + "id": "834c9e84", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To access the Arxiv document loader you'll need to install the `arxiv`, `PyMuPDF` and `langchain-community` integration packages. PyMuPDF transforms PDF files downloaded from the arxiv.org site into text format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b002f37", "metadata": {}, + "outputs": [], "source": [ - "## Installation" + "%pip install -qU langchain-community arxiv pymupdf" ] }, { "cell_type": "markdown", - "id": "2abd5578-aa3d-46b9-99af-8b262f0b3df8", + "id": "9fb0ae15", "metadata": {}, "source": [ - "First, you need to install `arxiv` python package." 
+ "## Instantiation\n", + "\n", + "Now we can instantiate our model object and load documents:" ] }, { "cell_type": "code", - "execution_count": null, - "id": "b674aaea-ed3a-4541-8414-260a8f67f623", - "metadata": { - "tags": [] - }, + "execution_count": 1, + "id": "6acfc0f0", + "metadata": {}, "outputs": [], "source": [ - "%pip install --upgrade --quiet arxiv" + "from langchain_community.document_loaders import ArxivLoader\n", + "\n", + "# Supports all arguments of `ArxivAPIWrapper`\n", + "loader = ArxivLoader(\n", + " query=\"reasoning\",\n", + " load_max_docs=2,\n", + " # doc_content_chars_max=1000,\n", + " # load_all_available_meta=False,\n", + " # ...\n", + ")" ] }, { "cell_type": "markdown", - "id": "094b5f13-7e54-4354-9d83-26d6926ecaa0", - "metadata": { - "tags": [] - }, + "id": "d3ee2773", + "metadata": {}, "source": [ - "Second, you need to install `PyMuPDF` python package which transforms PDF files downloaded from the `arxiv.org` site into the text format." + "## Load\n", + "\n", + "Use ``.load()`` to synchronously load into memory all Documents, with one\n", + "Document per one arxiv paper.\n", + "\n", + "Let's run through a basic example of how to use the `ArxivLoader` searching for papers of reasoning:" ] }, { "cell_type": "code", - "execution_count": null, - "id": "7cd91121-2e96-43ba-af50-319853695f86", - "metadata": { - "tags": [] - }, - "outputs": [], + "execution_count": 2, + "id": "e95d66ba", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Document(page_content='Hypothesis Testing Prompting Improves Deductive Reasoning in\\nLarge Language Models\\nYitian Li1,2, Jidong Tian1,2, Hao He1,2, Yaohui Jin1,2\\n1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University\\n2State Key Lab of Advanced Optical Communication System and Network\\n{yitian_li, frank92, hehao, jinyh}@sjtu.edu.cn\\nAbstract\\nCombining different forms of prompts with pre-trained large language models has yielded remarkable results on\\nreasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these\\nmethods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop\\nHypothesis Testing Prompting, which adds conclusion assumptions, backward reasoning, and fact verification during\\nintermediate reasoning steps. Hypothesis Testing prompting involves multiple assumptions and reverses validation of\\nconclusions leading to its unique correct answer. Experiments on two challenging deductive reasoning datasets\\nProofWriter and RuleTaker show that hypothesis testing prompting not only significantly improves the effect, but also\\ngenerates a more reasonable and standardized reasoning process.\\nKeywords: Deductive Reasoning, Large Language Models, Prompt\\n1.\\nIntroduction\\nThe release of large language models (LLMs) has\\nrevolutionized the NLP landscape recently (Thop-\\npilan et al., 2022; Kaplan et al., 2020; Chowdh-\\nery et al., 2022). Scaling up the size of language\\nmodels and conducting diversified prompt meth-\\nods become mainstream (Liu et al., 2023c; Wei\\net al., 2022a; Yang et al., 2023). Given In-context\\nlearning or Chain-of-Thought prompts have already\\nachieved high performance on challenging tasks\\nsuch as commonsense, arithmetic, and symbolic\\nreasoning (Imani et al., 2023; Lee et al., 2021;\\nKojima et al., 2022). 
Logical reasoning is one of\\nthe most important and long-standing problems in\\nNLP (Hirschberg and Manning, 2015; Russell and\\nNorvig, 2010), and integrating this ability into nat-\\nural language understanding systems has always\\nbeen a goal pursued (Du et al., 2022).\\nNevertheless, scaling has been demonstrated\\nto offer limited advantages in resolving complex\\nlogical reasoning issues (Kazemi et al., 2022). For\\nexample, Saparov and He (2022) show that Chain-\\nof-Thought prompting struggles with proof planning\\nfor more complex logical reasoning problems. Addi-\\ntionally, the performance suffers greatly while han-\\ndling recently released and out-of-distribution logi-\\ncal reasoning datasets (Liu et al., 2023a). Despite\\nmany works have explored variants of Chain-of-\\nThought prompts to facilitate LLMs inference (Zelik-\\nman et al., 2022; Zheng et al., 2023), we discover\\nthat the present logical reasoning task prompts\\nplace an excessive amount of emphasis on the\\nreasoning process while ignoring the origin, pur-\\npose, and effectiveness of reasoning (Creswell\\net al., 2022; Xi et al., 2023). As examples shown in\\nQ1: Bob is green. True/false? \\nInput Facts: Alan is blue. Alan is \\nrough. Alan is young. Bob is big. \\nBob is round. Charlie is big. Charlie \\nis blue. Charlie is green. Dave is \\ngreen. Dave is rough.\\nInput Rules: Big people are rough. \\nIf someone is young and round then \\nthey are kind. If someone is round \\nand big then they are blue. All\\nrough people are green.\\nBob is big. \\nBig people are rough. \\nAll rough people are green.\\nAnswer: T\\nQ2: Dave is blue. True/false?\\nLess inspiring\\nFigure 1: Questions in RuleTaker involve logical\\nreasoning with facts and rules.\\nFigure 1, the difficulty in judging logical problems\\narises not only from the process of reasoning but\\nalso from the choice of facts and rules to use as a\\nstarting point. Even if we were provided the thought\\nprocess for some of the issues, it would not be very\\nbeneficial for others, based on how we previously\\ncreated the prompts.\\nIn this paper, we propose Hypothesis Testing\\nPrompting, a new and more considerate prompt\\ntemplate design idea. Hypothesis testing is a for-\\nmal procedure for investigating our ideas about\\nthe world using statistics and is often used by sci-\\nentists to test specific predictions (Bevans, 2022).\\nWe draw inspiration from its process to introduce a\\nprocess of conclusion assumptions, backward rea-\\nsoning, and fact verification. Experiments on Rule-\\nTaker (Clark et al., 2020) and ProofWriter (Tafjord\\net al., 2021) show the effectiveness of our novel\\nprompting paradigm as a strategy for promoting\\ndeductive reasoning in large language models. Fur-\\nther analyses show that Hypothesis Test prompting\\ngenerates more desirable intermediate processes\\narXiv:2405.06707v1 [cs.CL] 9 May 2024\\nand significantly improves the \"Unknown\" label.\\n2.\\nRelated Work\\n2.1.\\nFew-Shot Prompting\\nBrown et al. (2020) propose in-context learning as\\nan alternative few-shot prompting way to stimulate\\nability. Besides, chain-of-Thought (CoT) (Wei et al.,\\n2022b) is one of the most well-known works, which\\ndecomposes the problem into intermediate steps\\nand further improves the ability of large language\\nmodels. 
Subsequently, several follow-up works\\nwere carried out, including Zero-shot-CoT (simply\\nadding \"Let’s think step by step\" before each an-\\nswer) (Kojima et al., 2022), Self-consistency (Wang\\net al., 2022), complexity-based (Fu et al., 2022),\\nand other prompting work (Liu et al., 2023b; Jung\\net al., 2022; Zhou et al., 2022; Saparov and He,\\n2022). While these methods enhance the perfor-\\nmance of inference by paying attention to indica-\\ntions of the reasoning process, they often overlook\\nsome aspects such as identifying the root cause of\\nthe problem, establishing efficient reasoning strate-\\ngies, and determining the direction of logical rea-\\nsoning.\\n2.2.\\nDeductive Reasoning\\nDeductive reasoning is defined as the applica-\\ntion of general concepts to particular circum-\\nstances (Johnson-Laird, 2010). Making logical as-\\nsumptions is the foundation of deductive reasoning,\\nwhich then bases a conclusion on those assump-\\ntions. The deduction task is then applied to a sit-\\nuation from the actual world after starting with a\\nrule. In light of the principles \"All men are mortal.\"\\nand \"Socrates is a man.\" for example, we can draw\\nthe conclusion that \"Socrates is mortal.\" (Johnson-\\nLaird, 1999).\\n3.\\nHypothesis Testing Prompting\\nHypothesis testing is a formal procedure for investi-\\ngating our ideas about the world using statistics and\\nused by scientists to test specific predictions that\\narise from theories (Bevans, 2022; La et al., 2012).\\nThere are 5 main steps in hypothesis testing:\\n1. State your research hypothesis;\\n2. Collect data in a way designed to test the hy-\\npothesis;\\n3. Perform an appropriate statistical test;\\n4. Decide whether to reject or fail to reject your\\nnull hypothesis;\\n5. Present the findings in your results and discus-\\nsion section;\\nWhen completing a challenging reasoning activ-\\nity, such as a multi-step deductive reasoning prob-\\nlem, one is not conducting random reasoning to\\nobtain all possible intermediate results. We shall\\nchoose the relevant conditions for inference ver-\\nification after initially making assumptions about\\nthe judgment problem, such as \" First assume the\\nconclusion is True and start from ... Then assume\\nthe conclusion is False and start from ... because\\nthe rules state that ... So the conclusion ...\". The\\npurpose of this study is to give language models\\nthe capacity to build a process that is similar to\\nwhat we defined as Hypothesis Testing Prompt-\\ning. We will show that large language models can\\ngenerate more appropriate thought and more ac-\\ncurate results if demonstrations of hypothesis test\\nprompting are provided in the exemplars for few-\\nshot prompting. Figure 2 shows an example of a\\nmodel producing a hypothesis testing thought to\\nsolve a deductive reasoning problem.\\n4.\\nExperiment\\n4.1.\\nExperimental Setup\\nWe explore Hypothesis Test Prompting for Chat-\\nGPT (GPT-3.5-Turbo in the OpenAI API) on multiple\\nlogical reasoning benchmarks.\\nBenchmarks. Considering FOL reasoning in\\nquestion answering systems, there are two world\\nassumptions (Reiter, 1981) that result in different\\nobjectives. One is the closed world assumption\\n(CWA), which is the presumption that what is not\\ncurrently known to be entailment is contradiction.\\nThe other is the open world assumption (OWA),\\nwhose objective should distinguish false proposi-\\ntions from uncertain ones. 
Due to differences in\\nworld assumptions, our analysis and solutions are\\nalso different.\\nWe consider the following two deductive reason-\\ning problem benchmarks: (1) the RuleTaker (Clark\\net al., 2020) benchmark using CWA assumption;\\n(2) the ProofWriter (Tafjord et al., 2021) benchmark\\nusing OWA assumption. Both datasets are divided\\ninto five parts, each part requiring 0, ≤1, ≤2, ≤\\n3, and ≤5 hops of reasoning, respectively. We\\nconducted comparison tests on the test set of the\\ntwo datasets for 5 distinct hops.\\nStandard prompting. As one of the baselines,\\nwe take into account the common few-shot prompt-\\ning, made popular by Brown et al. (2020), in which\\na language model is provided with in-context ex-\\namples of input-output pairings before producing a\\nprediction for a test-time example. Examples are\\npresented in the form of questions and answers.\\nAs seen in Figure 2(above), the model directly an-\\nswers the question.\\nChain-of-Thought prompting. We also com-\\nChain-of-Thought Prompting\\nModel Input\\nStandard Prompting\\nModel Input\\nQ: Judge the following conclusion \\'Harry is cold.\\' is correct, wrong, or unknown, based on the following texts: Text 1\\nA: The conclusion \\'Harry is cold.\\' is True.\\nQ: Judge the following conclusion ‘Bob is not young.’ is correct, wrong, or unknown, based on the following texts: Text 2\\nModel Output\\nText 1: Charlie is rough. Charlie is white. Erin is blue. Erin is rough. \\nErin is young. Gary is blue. Gary is green. Gary is quiet. Gary is \\nwhite. Harry is blue. Harry is rough. All rough, white things are \\ngreen. All young, blue things are quiet. If something is young then it \\nis blue. White, blue things are quiet. If something is green then it is \\nyoung. Quiet, green things are cold. All green, cold things are rough.\\nQuestion: Harry is cold. \\nAnswer: True\\nText 2: Bob is rough. Fiona is not furry. Fiona is white. Young, nice \\nthings are big. Blue things are big. If something is young and white \\nthen it is nice. All nice things are young. All white things are young. \\nIf something is blue and not big then it is not rough.\\nQuestion: Bob is not young. \\nAnswer: Unknown\\nQ: Judge the following conclusion \\'Harry is cold.\\' is correct, wrong, or unknown, based on the following texts: Text 1\\nA: Start from \\'Harry is quiet\\' and \\'Quiet things are nice\\'. Then can know \\'Harry is nice\\'. And \\'All nice things are cold\\' is given. Finally \\nit can be deduced \\'Harry is cold\\'. So the conclusion \\'Harry is cold.\\' is True.\\nQ: Judge the following conclusion \\'Harry is not quiet.\\' is correct, wrong, or unknown, based on the following texts: Text 2\\nModel Output\\nHypothesis Testing Prompting\\nModel Input\\nQ: Judge the following conclusion \\'Harry is cold.\\' is correct, wrong, or unknown, based on the following texts: Text 1\\nA: First assume the conclusion is True and start from Harry is cold. Because the rules state that all nice things are cold, correct\\nconclusion needs to be satisfied that Harry is nice. And the rules states that quiet things are nice. It needs to be satisfied Harry is quiet, \\nwhich is also stated in the facts. So the conclusion \\'Harry is cold.\\' is True.\\nQ: Judge the following conclusion ‘Bob is not young.’ is correct, wrong, or unknown, based on the following texts: Text 2\\nQ: The conclusion \\'Bob is not young.\\' is True.\\nQ: Start from \\'Bob is rough\\' and \\'If something is blue and not big then it is not rough\\'. 
Then can know \\'Bob is not blue\\'. And \\'All white \\nthings are young\\' is given. Finally it can be deduced \\'Bob is not young\\'. So the conclusion \\'Bob is not young.\\' is True.\\nModel Output\\nQ: First assume the conclusion is True and start from Bob is not young. Because the rules state that all white things are young, correct \\nconclusion needs to be satisfied that Bob is not white. And there are no facts or rules about \\'not white\\'. Then assume the conclusion is \\nFalse and start from Bob is young. Because the rules state that all nice things are young, correct conclusion needs to be satisfied that \\nBob is nice. And there are no facts or rules about \\'nice\\'. So the conclusion \\'Bob is not young.\\' is Unknown.\\n× N\\n× N\\n× N\\nFigure 2: Comparison of three prompting methods: (a) Standard (b) Chain-of-Thought (c) Hypothesis\\nTesting. Particularly, we highlight the Hypothesis testing reasoning processes. The comparative experi-\\nmental results show that: Hypothesis testing prompting enables large language models to tackle complex\\nlogical reasoning.\\npare with Chain-of-thought prompting which has\\nachieved encouraging results on complex reason-\\ning tasks (Wei et al., 2022b).\\nAs seen in Fig-\\nure 2(middle), the model not only provides the final\\nanswer but also comes with the consideration of\\nintermediate steps.\\nHypothesis Testing Prompting. Our proposed\\napproach is to augment each exemplars in few-shot\\nprompting with the thought of hypothesis testing for\\nan associated answer, as illustrated in Figure 2(be-\\nlow). We show one chain of thought exemplars\\n(Example: Judge the following conclusion ’’ is true, false, or unknown, based on the\\nfollowing facts and rules: ... ...).\\n4.2.\\nExperimental Results\\nThe results for Hypothesis Testing Prompting and\\nthe baselines on the RuleTaker datasets are pro-\\nvided in Figure 3(a), and ProofWriter results are\\nshown in Figure 3(b). From the results, we ob-\\nserve that our method significantly outperforms the\\nother two baselines, especially on ProofWriter. Fig-\\nure 3(a) demonstrates that while CoT performs well\\nin the low hop, Hypothesis Testing prompting per-\\nforms better as the hops count increases on Rule-\\nTaker. While on ProofWriter, our approach has a\\nthorough lead (improved accuracy by over 4% on\\nall hops). Comparing two datasets, the latter dis-\\ntinguishes between \"False\" and \"Unknown\", which\\ndemand a greater level of logic. 
The results on two\\n0.97\\n0.93\\n0.83\\n0.84\\n0.81\\n0.99\\n0.92\\n0.77\\n0.8\\n0.78\\n0.78\\n0.72\\n0.63\\n0.63\\n0.65\\n0\\n0.1\\n0.2\\n0.3\\n0.4\\n0.5\\n0.6\\n0.7\\n0.8\\n0.9\\n1\\ndepth-0\\ndepth-1\\ndepth-2\\ndepth-3\\ndepth-5\\nAccuracy\\nStandard Prompting\\nChain-of-Thought Prompting\\nHypothesis Testing Prompting\\n(a)\\n0.6\\n0.41\\n0.39\\n0.34\\n0.33\\n0.77\\n0.54\\n0.58\\n0.56\\n0.5\\n0.82\\n0.63\\n0.62\\n0.61\\n0.57\\n0\\n0.1\\n0.2\\n0.3\\n0.4\\n0.5\\n0.6\\n0.7\\n0.8\\n0.9\\n1\\ndepth-0\\ndepth-1\\ndepth-2\\ndepth-3\\ndepth-5\\nAccuracy\\nStandard Prompting\\nChain-of-Thought Prompting\\nHypothesis Testing Prompting\\n(b)\\nFigure 3: Prediction accuracy results on the (a) RuleTaker and (b) ProofWriter datasets.\\n0.74\\n0.26\\n0\\n0.2\\n0.4\\n0.6\\n0.8\\n1\\nHypothesis Testing\\nChain-of-Thought\\n(a) The proof accuracy of Chain-of-Thought and Hy-\\npothesis Testing prompting.\\n0.65\\n0.3\\n0\\n0.2\\n0.4\\n0.6\\n0.8\\n1\\nHypothesis Testing\\nChain-of-Thought\\n(b) Comparison of accuracy between Chain-of-\\nThought and Hypothesis Testing prompting on \"Un-\\nknown\" label.\\nFigure 4: Further results on ProofWriter.\\ndatasets that were analyzed show a weakness in\\nall methods for handling \"Unknown\" labels. This\\nbeacuse the OWA hypothesis necessitates the ex-\\nclusion of both positive and negative findings to\\nvalidate the \"Unknown\" label. The advantages of\\nour strategy are illustrated by the comparison of the\\nmodel output outputs in Figure 2. The content \"First\\nassume the conclusion is True ... Then assume\\nthe conclusion is False ... So ... is Unknown.\" gen-\\nerated by the model through learning Hypothesis\\nTesting prompting is more in line with our thinking.\\nBesides, we’ll conduct further research and show\\nit later.\\n4.3.\\nFurther Analysis\\nWe carry out the following thorough analysis to\\nbetter comprehend the thought process:\\nProof Accuracy. Five students are required to\\nmanually evaluate the outcomes of the intermediate\\nreasoning after we randomly picked 100 examples\\nfrom depth-5 of the ProofWriter. Proof accuracy rep-\\nresents the proportion where the inference process\\nhas been proven to be reasonable in the correct\\npart of data label prediction. We compare the re-\\nsults of Chian-of-Thought and Hypothesis Testing\\nprompting and report in Figure 4(a). While Hy-\\npothesis Testing prompting mostly produced the\\ncorrect intermediate reasoning process when the\\npredicted label was correct, CoT only generated\\nthe correct chain for 26% of the examples. This\\nresult is in line with other research showing that\\nLMs rely on spurious correlations when solving log-\\nical problems from beginning to end. Additionally,\\nour approach can successfully increase reason-\\ning’s rationality. In processing the \"Unknown\" label,\\nHypothesis Testing prompting performs noticeably\\nbetter than Chain-of-Thought.\\n\"Unknown\" accuracy.\\nIn the ProofWriter\\ndataset, we separately counted the accuracy of\\nthe \"Unknown\" label shown in Figure 4(b). The\\nresults point to a flaw in the Chain-of-Thought strat-\\negy’s handling of \"Unknown\" labels(only 0.3 accu-\\nracyies). Contrarily, Hypothesis Testing prompting\\nsignificantly increases the reliability of judging this\\nlabel (up to 0.65). 
This further illustrates the value\\nof holding various assumptions, as well as the re-\\nverse confirmation of conclusions.\\n5.\\nConclusion\\nWe have investigated Hypothesis Testing prompt-\\ning as a straightforward and widely applicable tech-\\nnique for improving deductive reasoning in large\\nlanguage models. Multiple assumptions are made\\nduring hypothesis testing, and conclusions are\\nreverse-validated to arrive at the one and only ac-\\ncurate answer. Through experiments on two logical\\nreasoning datasets, we find that Hypothesis Test-\\ning prompting allows large language models to con-\\nstruct reasoning more reasonably and accurately.\\nWe anticipate that additional research on language-\\nbased reasoning approaches will be stimulated by\\nour novel prompting design strategy.\\n6.\\nReferences\\nAlfred V. Aho and Jeffrey D. Ullman. 1972. The\\nTheory of Parsing, Translation and Compiling,\\nvolume 1. Prentice-Hall, Englewood Cliffs, NJ.\\nAmerican Psychological Association. 1983. Publi-\\ncations Manual. American Psychological Associ-\\nation, Washington, DC.\\nRie Kubota Ando and Tong Zhang. 2005. A frame-\\nwork for learning predictive structures from multi-\\nple tasks and unlabeled data. Journal of Machine\\nLearning Research, 6:1817–1853.\\nGalen Andrew and Jianfeng Gao. 2007. Scalable\\ntraining of L1-regularized log-linear models. In\\nProceedings of the 24th International Conference\\non Machine Learning, pages 33–40.\\nRebecca Bevans. 2022.\\nHypothesis Testing |\\nA Step-by-Step Guide with Easy Examples.\\nScribbr.\\nTom B. Brown, Benjamin Mann, Nick Ryder,\\nMelanie Subbiah, Jared Kaplan, Prafulla Dhari-\\nwal, Arvind Neelakantan, Pranav Shyam, Girish\\nSastry, Amanda Askell, Sandhini Agarwal, Ariel\\nHerbert-Voss, Gretchen Krueger, Tom Henighan,\\nRewon Child, Aditya Ramesh, Daniel M. Ziegler,\\nJeffrey Wu, Clemens Winter, Christopher Hesse,\\nMark Chen, Eric Sigler, Mateusz Litwin, Scott\\nGray, Benjamin Chess, Jack Clark, Christopher\\nBerner, Sam McCandlish, Alec Radford, Ilya\\nSutskever, and Dario Amodei. 2020. Language\\nmodels are few-shot learners. In NeurIPS.\\nBSI. 1973a.\\nNatural Fibre Twines, 3rd edition.\\nBritish Standards Institution, London. BS 2570.\\nBSI. 1973b. Natural fibre twines. BS 2570, British\\nStandards Institution, London. 3rd. edn.\\nA. Castor and L. E. Pollux. 1992. The use of user\\nmodelling to guide inference and learning. Ap-\\nplied Intelligence, 2(1):37–53.\\nAshok K. Chandra, Dexter C. Kozen, and Larry J.\\nStockmeyer. 1981. Alternation. Journal of the As-\\nsociation for Computing Machinery, 28(1):114–\\n133.\\nJ.L. Chercheur. 1994. Case-Based Reasoning, 2nd\\nedition. Morgan Kaufman Publishers, San Mateo,\\nCA.\\nYejin Choi. 2022. The curious case of common-\\nsense intelligence. Daedalus.\\nN. Chomsky. 1973. Conditions on transformations.\\nIn A festschrift for Morris Halle, New York. 
Holt,\\nRinehart & Winston.\\nAakanksha Chowdhery, Sharan Narang, Jacob\\nDevlin, Maarten Bosma, Gaurav Mishra, Adam\\nRoberts, Paul Barham, Hyung Won Chung,\\nCharles Sutton, Sebastian Gehrmann, Parker\\nSchuh, Kensen Shi, Sasha Tsvyashchenko,\\nJoshua Maynez, Abhishek Rao, Parker Barnes,\\nYi Tay,\\nNoam Shazeer,\\nVinodkumar Prab-\\nhakaran, Emily Reif, Nan Du, Ben Hutchinson,\\nReiner Pope, James Bradbury, Jacob Austin,\\nMichael Isard, Guy Gur-Ari, Pengcheng Yin,\\nToju Duke, Anselm Levskaya, Sanjay Ghemawat,\\nSunipa Dev, Henryk Michalewski, Xavier Gar-\\ncia, Vedant Misra, Kevin Robinson, Liam Fe-\\ndus, Denny Zhou, Daphne Ippolito, David Luan,\\nHyeontaek Lim, Barret Zoph, Alexander Spiri-\\ndonov, Ryan Sepassi, David Dohan, Shivani\\nAgrawal, Mark Omernick, Andrew M. Dai, Thanu-\\nmalayan Sankaranarayana Pillai, Marie Pellat,\\nAitor Lewkowycz, Erica Moreira, Rewon Child,\\nOleksandr Polozov, Katherine Lee, Zongwei\\nZhou, Xuezhi Wang, Brennan Saeta, Mark Diaz,\\nOrhan Firat, Michele Catasta, Jason Wei, Kathy\\nMeier-Hellstern, Douglas Eck, Jeff Dean, Slav\\nPetrov, and Noah Fiedel. 2022. Palm: Scaling\\nlanguage modeling with pathways. CoRR.\\nPeter Clark, Oyvind Tafjord, and Kyle Richardson.\\n2020. Transformers as soft reasoners over lan-\\nguage. In IJCAI.\\nJames W. Cooley and John W. Tukey. 1965. An\\nalgorithm for the machine calculation of complex\\nFourier series.\\nMathematics of Computation,\\n19(90):297–301.\\nAntonia Creswell, Murray Shanahan, and Irina Hig-\\ngins. 2022. Selection-inference: Exploiting large\\nlanguage models for interpretable logical reason-\\ning. CoRR.\\nYilun Du, Shuang Li, Joshua B. Tenenbaum, and\\nIgor Mordatch. 2022. Learning iterative reason-\\ning through energy minimization. In ICML.\\nUmberto Eco. 1990. The Limits of Interpretation.\\nIndian University Press.\\nYao Fu, Hao Peng, Ashish Sabharwal, Peter Clark,\\nand Tushar Khot. 2022.\\nComplexity-based\\nprompting for multi-step reasoning. CoRR.\\nDan Gusfield. 1997. Algorithms on Strings, Trees\\nand Sequences. Cambridge University Press,\\nCambridge, UK.\\nJulia Hirschberg and Christopher D. Manning. 2015.\\nAdvances in natural language processing. Sci-\\nence.\\nPaul Gerhard Hoel. 1971a. Elementary Statistics,\\n3rd edition. Wiley series in probability and math-\\nematical statistics. Wiley, New York, Chichester.\\nISBN 0 471 40300.\\nPaul Gerhard Hoel. 1971b. Elementary Statistics,\\n3rd edition, Wiley series in probability and mathe-\\nmatical statistics, pages 19–33. Wiley, New York,\\nChichester. ISBN 0 471 40300.\\nShima Imani, Liang Du, and Harsh Shrivastava.\\n2023. Mathprompter: Mathematical reasoning\\nusing large language models. CoRR.\\nOtto Jespersen. 1922. Language: Its Nature, De-\\nvelopment, and Origin. Allen and Unwin.\\nPhil Johnson-Laird. 2010. Deductive reasoning.\\nWiley Interdisciplinary Reviews: Cognitive Sci-\\nence.\\nPhilip N Johnson-Laird. 1999. Deductive reasoning.\\nAnnual review of psychology.\\nJaehun Jung, Lianhui Qin, Sean Welleck, Faeze\\nBrahman, Chandra Bhagavatula, Ronan Le Bras,\\nand Yejin Choi. 2022. Maieutic prompting: Logi-\\ncally consistent reasoning with recursive expla-\\nnations. In EMNLP.\\nJared Kaplan, Sam McCandlish, Tom Henighan,\\nTom B. Brown, Benjamin Chess, Rewon Child,\\nScott Gray, Alec Radford, Jeffrey Wu, and Dario\\nAmodei. 2020. Scaling laws for neural language\\nmodels. CoRR.\\nSeyed Mehran Kazemi, Najoung Kim, Deepti Bha-\\ntia, Xin Xu, and Deepak Ramachandran. 
2022.\\nLAMBADA: backward chaining for automated\\nreasoning in natural language. CoRR.\\nTakeshi Kojima, Shixiang Shane Gu, Machel Reid,\\nYutaka Matsuo, and Yusuke Iwasawa. 2022.\\nLarge language models are zero-shot reason-\\ners. In NeurIPS.\\nRosa Patricio S. La, Brooks J. Paul, Deych Elena,\\nEdward L. Boone, David J. Edwards, Wang Qin,\\nSodergren Erica, Weinstock George, William D.\\nShannon, and Ethan P. White. 2012. Hypothe-\\nsis testing and power calculations for taxonomic-\\nbased human microbiome data. Plos One.\\nChia-Hsuan Lee, Hao Cheng, and Mari Osten-\\ndorf. 2021. Dialogue state tracking with a lan-\\nguage model using schema-driven prompting. In\\nEMNLP. Association for Computational Linguis-\\ntics.\\nHanmeng Liu, Ruoxi Ning, Zhiyang Teng, Jian Liu,\\nQiji Zhou, and Yue Zhang. 2023a. Evaluating the\\nlogical reasoning ability of chatgpt and GPT-4.\\nCoRR.\\nHanmeng Liu, Zhiyang Teng, Leyang Cui, Chaoli\\nZhang, Qiji Zhou, and Yue Zhang. 2023b. Logi-\\ncot: Logical chain-of-thought instruction-tuning\\ndata collection with GPT-4. CoRR.\\nPengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao\\nJiang, Hiroaki Hayashi, and Graham Neubig.\\n2023c. Pre-train, prompt, and predict: A sys-\\ntematic survey of prompting methods in natural\\nlanguage processing. ACM Comput. Surv.\\nMohammad Sadegh Rasooli and Joel R. Tetreault.\\n2015. Yara parser: A fast and accurate depen-\\ndency parser. Computing Research Repository,\\narXiv:1503.06733. Version 2.\\nRaymond Reiter. 1981. On closed world data bases.\\nIn Readings in Artificial Intelligence.\\nStuart J. Russell and Peter Norvig. 2010. Artificial\\nIntelligence - A Modern Approach, Third Interna-\\ntional Edition. Pearson Education.\\nAbulhair Saparov and He He. 2022. Language mod-\\nels are greedy reasoners: A systematic formal\\nanalysis of chain-of-thought. CoRR.\\nCharles Joseph Singer, E. J. Holmyard, and A. R.\\nHall, editors. 1954–58. A history of technology.\\nOxford University Press, London. 5 vol.\\nJannik Strötgen and Michael Gertz. 2012. Temporal\\ntagging on different domains: Challenges, strate-\\ngies, and gold standards. In Proceedings of the\\nEight International Conference on Language Re-\\nsources and Evaluation (LREC’12), pages 3746–\\n3753, Istanbul, Turkey. European Language Re-\\nsource Association (ELRA).\\nS. Superman, B. Batman, C. Catwoman, and S. Spi-\\nderman. 2000. Superheroes experiences with\\nbooks, 20th edition. The Phantom Editors Asso-\\nciates, Gotham City.\\nOyvind Tafjord, Bhavana Dalvi, and Peter Clark.\\n2021.\\nProofwriter:\\nGenerating implications,\\nproofs, and abductive statements over natural\\nlanguage. 
In Findings of ACL.\\nRomal Thoppilan, Daniel De Freitas, Jamie Hall,\\nNoam Shazeer, Apoorv Kulshreshtha, Heng-\\nTze Cheng, Alicia Jin, Taylor Bos, Leslie\\nBaker, Yu Du, YaGuang Li, Hongrae Lee,\\nHuaixiu Steven Zheng, Amin Ghafouri, Marcelo\\nMenegali, Yanping Huang, Maxim Krikun, Dmitry\\nLepikhin, James Qin, Dehao Chen, Yuanzhong\\nXu, Zhifeng Chen, Adam Roberts, Maarten\\nBosma, Yanqi Zhou, Chung-Ching Chang, Igor\\nKrivokon, Will Rusch, Marc Pickett, Kathleen S.\\nMeier-Hellstern, Meredith Ringel Morris, Tulsee\\nDoshi, Renelito Delos Santos, Toju Duke, Johnny\\nSoraker, Ben Zevenbergen, Vinodkumar Prab-\\nhakaran, Mark Diaz, Ben Hutchinson, Kristen Ol-\\nson, Alejandra Molina, Erin Hoffman-John, Josh\\nLee, Lora Aroyo, Ravi Rajakumar, Alena Butryna,\\nMatthew Lamm, Viktoriya Kuzmina, Joe Fenton,\\nAaron Cohen, Rachel Bernstein, Ray Kurzweil,\\nBlaise Aguera-Arcas, Claire Cui, Marian Croak,\\nEd H. Chi, and Quoc Le. 2022. Lamda: Lan-\\nguage models for dialog applications. CoRR.\\nXuezhi Wang, Jason Wei, Dale Schuurmans,\\nQuoc V. Le, Ed H. Chi, and Denny Zhou. 2022.\\nSelf-consistency improves chain of thought rea-\\nsoning in language models. CoRR.\\nJason Wei, Yi Tay, Rishi Bommasani, Colin Raf-\\nfel, Barret Zoph, Sebastian Borgeaud, Dani Yo-\\ngatama, Maarten Bosma, Denny Zhou, Donald\\nMetzler, Ed H. Chi, Tatsunori Hashimoto, Oriol\\nVinyals, Percy Liang, Jeff Dean, and William Fe-\\ndus. 2022a. Emergent abilities of large language\\nmodels. Trans. Mach. Learn. Res.\\nJason Wei, Xuezhi Wang, Dale Schuurmans,\\nMaarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi,\\nQuoc V. Le, and Denny Zhou. 2022b. Chain-\\nof-thought prompting elicits reasoning in large\\nlanguage models. In NeurIPS.\\nZhiheng Xi, Senjie Jin, Yuhao Zhou, Rui Zheng,\\nSongyang Gao, Tao Gui, Qi Zhang, and Xuanjing\\nHuang. 2023. Self-polish: Enhance reasoning in\\nlarge language models via problem refinement.\\nJingfeng Yang, Hongye Jin, Ruixiang Tang, Xiao-\\ntian Han, Qizhang Feng, Haoming Jiang, Bing\\nYin, and Xia Hu. 2023. Harnessing the power of\\nllms in practice: A survey on chatgpt and beyond.\\nCoRR.\\nEric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D.\\nGoodman. 2022. Star: Bootstrapping reasoning\\nwith reasoning. In NeurIPS.\\nChuanyang Zheng, Zhengying Liu, Enze Xie, Zhen-\\nguo Li, and Yu Li. 2023. Progressive-hint prompt-\\ning improves reasoning in large language mod-\\nels.\\nDenny Zhou, Nathanael Schärli, Le Hou, Jason\\nWei, Nathan Scales, Xuezhi Wang, Dale Schuur-\\nmans, Olivier Bousquet, Quoc Le, and Ed H. Chi.\\n2022. Least-to-most prompting enables complex\\nreasoning in large language models. CoRR.\\n', metadata={'Published': '2024-05-09', 'Title': 'Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models', 'Authors': 'Yitian Li, Jidong Tian, Hao He, Yaohui Jin', 'Summary': 'Combining different forms of prompts with pre-trained large language models\\nhas yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought\\nprompting). However, along with testing on more complex reasoning, these\\nmethods also expose problems such as invalid reasoning and fictional reasoning\\npaths. In this paper, we develop \\\\textit{Hypothesis Testing Prompting}, which\\nadds conclusion assumptions, backward reasoning, and fact verification during\\nintermediate reasoning steps. \\\\textit{Hypothesis Testing prompting} involves\\nmultiple assumptions and reverses validation of conclusions leading to its\\nunique correct answer. 
Experiments on two challenging deductive reasoning\\ndatasets ProofWriter and RuleTaker show that hypothesis testing prompting not\\nonly significantly improves the effect, but also generates a more reasonable\\nand standardized reasoning process.'})" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "%pip install --upgrade --quiet pymupdf" + "docs = loader.load()\n", + "docs[0]" ] }, { - "cell_type": "markdown", - "id": "95f05e1c-195e-4e2b-ae8e-8d6637f15be6", + "cell_type": "code", + "execution_count": 3, + "id": "45f27c2e", "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'Published': '2024-05-09', 'Title': 'Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models', 'Authors': 'Yitian Li, Jidong Tian, Hao He, Yaohui Jin', 'Summary': 'Combining different forms of prompts with pre-trained large language models\\nhas yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought\\nprompting). However, along with testing on more complex reasoning, these\\nmethods also expose problems such as invalid reasoning and fictional reasoning\\npaths. In this paper, we develop \\\\textit{Hypothesis Testing Prompting}, which\\nadds conclusion assumptions, backward reasoning, and fact verification during\\nintermediate reasoning steps. \\\\textit{Hypothesis Testing prompting} involves\\nmultiple assumptions and reverses validation of conclusions leading to its\\nunique correct answer. Experiments on two challenging deductive reasoning\\ndatasets ProofWriter and RuleTaker show that hypothesis testing prompting not\\nonly significantly improves the effect, but also generates a more reasonable\\nand standardized reasoning process.'}\n" + ] + } + ], "source": [ - "## Examples" + "print(docs[0].metadata)" ] }, { "cell_type": "markdown", - "id": "e29b954c-1407-4797-ae21-6ba8937156be", + "id": "6d90e292", "metadata": {}, "source": [ - "`ArxivLoader` has these arguments:\n", - "- `query`: free text which used to find documents in the Arxiv\n", - "- optional `load_max_docs`: default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments.\n", - "- optional `load_all_available_meta`: default=False. By default only the most important fields downloaded: `Published` (date when document was published/last updated), `Title`, `Authors`, `Summary`. If True, other fields also downloaded." + "## Lazy Load\n", + "\n", + "If we're loading a large number of Documents and our downstream operations can be done over subsets of all loaded Documents, we can lazily load our Documents one at a time to minimize our memory footprint:" ] }, { "cell_type": "code", - "execution_count": 3, - "id": "9bfd5e46", + "execution_count": 4, + "id": "f00655a1", "metadata": {}, "outputs": [], "source": [ - "from langchain_community.document_loaders import ArxivLoader" + "docs = []\n", + "\n", + "for doc in loader.lazy_load():\n", + " docs.append(doc)\n", + "\n", + " if len(docs) >= 10:\n", + " # do some paged operation, e.g.\n", + " # index.upsert(doc)\n", + "\n", + " docs = []" ] }, { - "cell_type": "code", - "execution_count": null, - "id": "700e4ef2", + "cell_type": "markdown", + "id": "2240306f", + "metadata": {}, + "source": [ + "In this example we never have more than 10 Documents loaded into memory at a time." 
+ ] + }, + { + "cell_type": "markdown", + "id": "99e3e155", "metadata": {}, - "outputs": [], "source": [ - "docs = ArxivLoader(query=\"1605.08386\", load_max_docs=2).load()\n", - "len(docs)" + "## Use paper summaries as documents\n", + "\n", + "You can use summaries of Arxiv papers as documents rather than raw papers:" ] }, { "cell_type": "code", "execution_count": 5, - "id": "8977bac0-0042-4f23-9754-247dbd32439b", - "metadata": { - "tags": [] - }, + "id": "cef009e3", + "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "{'Published': '2016-05-26',\n", - " 'Title': 'Heat-bath random walks with Markov bases',\n", - " 'Authors': 'Caprice Stanley, Tobias Windisch',\n", - " 'Summary': 'Graphs on lattice points are studied whose edges come from a finite set of\\nallowed moves of arbitrary length. We show that the diameter of these graphs on\\nfibers of a fixed integer matrix can be bounded from above by a constant. We\\nthen study the mixing behaviour of heat-bath random walks on these graphs. We\\nalso state explicit conditions on the set of moves so that the heat-bath random\\nwalk, a generalization of the Glauber dynamics, is an expander in fixed\\ndimension.'}" + "Document(page_content='Combining different forms of prompts with pre-trained large language models\\nhas yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought\\nprompting). However, along with testing on more complex reasoning, these\\nmethods also expose problems such as invalid reasoning and fictional reasoning\\npaths. In this paper, we develop \\\\textit{Hypothesis Testing Prompting}, which\\nadds conclusion assumptions, backward reasoning, and fact verification during\\nintermediate reasoning steps. \\\\textit{Hypothesis Testing prompting} involves\\nmultiple assumptions and reverses validation of conclusions leading to its\\nunique correct answer. Experiments on two challenging deductive reasoning\\ndatasets ProofWriter and RuleTaker show that hypothesis testing prompting not\\nonly significantly improves the effect, but also generates a more reasonable\\nand standardized reasoning process.', metadata={'Entry ID': 'http://arxiv.org/abs/2405.06707v1', 'Published': datetime.date(2024, 5, 9), 'Title': 'Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models', 'Authors': 'Yitian Li, Jidong Tian, Hao He, Yaohui Jin'})" ] }, "execution_count": 5, @@ -125,30 +177,18 @@ } ], "source": [ - "docs[0].metadata # meta-information of the Document" + "docs = loader.get_summaries_as_docs()\n", + "docs[0]" ] }, { - "cell_type": "code", - "execution_count": 6, - "id": "46969806-45a9-4c4d-a61b-cfb9658fc9de", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "'arXiv:1605.08386v1 [math.CO] 26 May 2016\\nHEAT-BATH RANDOM WALKS WITH MARKOV BASES\\nCAPRICE STANLEY AND TOBIAS WINDISCH\\nAbstract. Graphs on lattice points are studied whose edges come from a finite set of\\nallowed moves of arbitrary length. We show that the diameter of these graphs on fibers of a\\nfixed integer matrix can be bounded from above by a constant. 
We then study the mixing\\nbehaviour of heat-b'" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], + "cell_type": "markdown", + "id": "505d29a5", + "metadata": {}, "source": [ - "docs[0].page_content[:400] # all pages of the Document content" + "## API reference\n", + "\n", + "For detailed documentation of all ArxivLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.arxiv.ArxivLoader.html#langchain_community.document_loaders.arxiv.ArxivLoader" ] } ], @@ -168,7 +208,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.6" + "version": "3.11.4" } }, "nbformat": 4, diff --git a/docs/docs/integrations/toolkits/jira.ipynb b/docs/docs/integrations/toolkits/jira.ipynb index 507edf89b762e..88fb23d38f378 100644 --- a/docs/docs/integrations/toolkits/jira.ipynb +++ b/docs/docs/integrations/toolkits/jira.ipynb @@ -14,7 +14,8 @@ "To use this tool, you must first set as environment variables:\n", " JIRA_API_TOKEN\n", " JIRA_USERNAME\n", - " JIRA_INSTANCE_URL" + " JIRA_INSTANCE_URL\n", + " JIRA_CLOUD" ] }, { @@ -88,7 +89,8 @@ "os.environ[\"JIRA_API_TOKEN\"] = \"abc\"\n", "os.environ[\"JIRA_USERNAME\"] = \"123\"\n", "os.environ[\"JIRA_INSTANCE_URL\"] = \"https://jira.atlassian.com\"\n", - "os.environ[\"OPENAI_API_KEY\"] = \"xyz\"" + "os.environ[\"OPENAI_API_KEY\"] = \"xyz\"\n", + "os.environ[\"JIRA_CLOUD\"] = \"True\"" ] }, { diff --git a/docs/docs/integrations/tools/google_drive.ipynb b/docs/docs/integrations/tools/google_drive.ipynb index d3d216a733b95..bc28a0597ac1f 100644 --- a/docs/docs/integrations/tools/google_drive.ipynb +++ b/docs/docs/integrations/tools/google_drive.ipynb @@ -99,7 +99,7 @@ }, "outputs": [], "source": [ - "from langchain_googldrive.tools.google_drive.tool import GoogleDriveSearchTool\n", + "from langchain_googledrive.tools.google_drive.tool import GoogleDriveSearchTool\n", "from langchain_googledrive.utilities.google_drive import GoogleDriveAPIWrapper\n", "\n", "# By default, search only in the filename.\n", diff --git a/docs/docs/tutorials/llm_chain.ipynb b/docs/docs/tutorials/llm_chain.ipynb index 89b3988d4d5f6..1eab4444a6a91 100644 --- a/docs/docs/tutorials/llm_chain.ipynb +++ b/docs/docs/tutorials/llm_chain.ipynb @@ -325,7 +325,7 @@ "id": "fedf6f13", "metadata": {}, "source": [ - "Next, we can create the PromptTemplate. This will be a combination of the `system_template` as well as a simpler template for where the put the text" + "Next, we can create the PromptTemplate. This will be a combination of the `system_template` as well as a simpler template for where to put the text to be translated" ] }, { diff --git a/docs/docs/tutorials/summarization.ipynb b/docs/docs/tutorials/summarization.ipynb index 20f4087ddb403..410bd17f5ab2d 100644 --- a/docs/docs/tutorials/summarization.ipynb +++ b/docs/docs/tutorials/summarization.ipynb @@ -640,7 +640,7 @@ "metadata": {}, "source": [ "## Splitting and summarizing in a single chain\n", - "For convenience, we can wrap both the text splitting of our long document and summarizing in a single `AnalyzeDocumentsChain`." 
+ "For convenience, we can wrap both the text splitting of our long document and summarizing in a single [chain](/docs/how_to/sequence):" ] }, { @@ -650,12 +650,11 @@ "metadata": {}, "outputs": [], "source": [ - "from langchain.chains import AnalyzeDocumentChain\n", + "def split_text(text: str):\n", + " return text_splitter.create_documents([text])\n", "\n", - "summarize_document_chain = AnalyzeDocumentChain(\n", - " combine_docs_chain=chain, text_splitter=text_splitter\n", - ")\n", - "summarize_document_chain.invoke(docs[0].page_content)" + "\n", + "summarize_document_chain = split_text | chain" ] }, { diff --git a/libs/community/langchain_community/document_loaders/arxiv.py b/libs/community/langchain_community/document_loaders/arxiv.py index 2f26d155201bb..a4171c4694d33 100644 --- a/libs/community/langchain_community/document_loaders/arxiv.py +++ b/libs/community/langchain_community/document_loaders/arxiv.py @@ -8,23 +8,146 @@ class ArxivLoader(BaseLoader): """Load a query result from `Arxiv`. - The loader converts the original PDF format into the text. - Args: - Supports all arguments of `ArxivAPIWrapper`. - """ + Setup: + Install ``arxiv`` and ``PyMuPDF`` packages. + ``PyMuPDF`` transforms PDF files downloaded from the arxiv.org site + into the text format. + + .. code-block:: bash + + pip install -U arxiv pymupdf + + + Instantiate: + .. code-block:: python + + from langchain_community.document_loaders import ArxivLoader + + loader = ArxivLoader( + query="reasoning", + # load_max_docs=2, + # load_all_available_meta=False + ) + + Load: + .. code-block:: python + + docs = loader.load() + print(docs[0].page_content[:100]) + print(docs[0].metadata) + + .. code-block:: python + Understanding the Reasoning Ability of Language Models + From the Perspective of Reasoning Paths Aggre + { + 'Published': '2024-02-29', + 'Title': 'Understanding the Reasoning Ability of Language Models From the + Perspective of Reasoning Paths Aggregation', + 'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, + Wenhu Chen, William Yang Wang', + 'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning + without explicit fine-tuning...' + } + + + Lazy load: + .. code-block:: python + + docs = [] + docs_lazy = loader.lazy_load() + + # async variant: + # docs_lazy = await loader.alazy_load() + + for doc in docs_lazy: + docs.append(doc) + print(docs[0].page_content[:100]) + print(docs[0].metadata) + + .. code-block:: python + + Understanding the Reasoning Ability of Language Models + From the Perspective of Reasoning Paths Aggre + { + 'Published': '2024-02-29', + 'Title': 'Understanding the Reasoning Ability of Language Models From the + Perspective of Reasoning Paths Aggregation', + 'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, + Wenhu Chen, William Yang Wang', + 'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning + without explicit fine-tuning...' + } + + Async load: + .. code-block:: python + + docs = await loader.aload() + print(docs[0].page_content[:100]) + print(docs[0].metadata) + + .. 
code-block:: python + + Understanding the Reasoning Ability of Language Models + From the Perspective of Reasoning Paths Aggre + { + 'Published': '2024-02-29', + 'Title': 'Understanding the Reasoning Ability of Language Models From the + Perspective of Reasoning Paths Aggregation', + 'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, + Wenhu Chen, William Yang Wang', + 'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning + without explicit fine-tuning...' + } + + Use summaries of articles as docs: + .. code-block:: python + + from langchain_community.document_loaders import ArxivLoader + + loader = ArxivLoader( + query="reasoning" + ) + + docs = loader.get_summaries_as_docs() + print(docs[0].page_content[:100]) + print(docs[0].metadata) + + .. code-block:: python + + Pre-trained language models (LMs) are able to perform complex reasoning + without explicit fine-tuning + { + 'Entry ID': 'http://arxiv.org/abs/2402.03268v2', + 'Published': datetime.date(2024, 2, 29), + 'Title': 'Understanding the Reasoning Ability of Language Models From the + Perspective of Reasoning Paths Aggregation', + 'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, + Wenhu Chen, William Yang Wang' + } + """ # noqa: E501 def __init__( self, query: str, doc_content_chars_max: Optional[int] = None, **kwargs: Any ): + """Initialize with search query to find documents in the Arxiv. + Supports all arguments of `ArxivAPIWrapper`. + + Args: + query: free text which is used to find documents in the Arxiv + doc_content_chars_max: cut limit for the length of a document's content + """ # noqa: E501 + self.query = query self.client = ArxivAPIWrapper( doc_content_chars_max=doc_content_chars_max, **kwargs ) def lazy_load(self) -> Iterator[Document]: + """Lazy load Arxiv documents""" yield from self.client.lazy_load(self.query) def get_summaries_as_docs(self) -> List[Document]: + """Uses paper summaries as documents rather than source Arxiv papers""" return self.client.get_summaries_as_docs(self.query) diff --git a/libs/community/langchain_community/graph_vectorstores/__init__.py b/libs/community/langchain_community/graph_vectorstores/__init__.py new file mode 100644 index 0000000000000..f5281743f7110 --- /dev/null +++ b/libs/community/langchain_community/graph_vectorstores/__init__.py @@ -0,0 +1,3 @@ +from langchain_community.graph_vectorstores.cassandra import CassandraGraphVectorStore + +__all__ = ["CassandraGraphVectorStore"] diff --git a/libs/community/langchain_community/graph_vectorstores/cassandra.py b/libs/community/langchain_community/graph_vectorstores/cassandra.py new file mode 100644 index 0000000000000..19f9453c42b38 --- /dev/null +++ b/libs/community/langchain_community/graph_vectorstores/cassandra.py @@ -0,0 +1,172 @@ +from __future__ import annotations + +from typing import ( + TYPE_CHECKING, + Any, + Iterable, + List, + Optional, + Type, +) + +from langchain_core.documents import Document +from langchain_core.embeddings import Embeddings +from langchain_core.graph_vectorstores.base import ( + GraphVectorStore, + Node, + nodes_to_documents, +) + +from langchain_community.utilities.cassandra import SetupMode + +if TYPE_CHECKING: + from cassandra.cluster import Session + + +class CassandraGraphVectorStore(GraphVectorStore): + def __init__( + self, + embedding: Embeddings, + *, + node_table: str = "graph_nodes", + targets_table: str = "graph_targets", + session: Optional[Session] = None, + keyspace: Optional[str] = None, + setup_mode: SetupMode = 
SetupMode.SYNC, + ): + """ + Create the hybrid graph store. + Parameters configure the ways that edges should be added between + documents. Many take `Union[bool, Set[str]]`, with `False` disabling + inference, `True` enabling it globally between all documents, and a set + of metadata fields defining a scope in which to enable it. Specifically, + passing a set of metadata fields such as `source` only links documents + with the same `source` metadata value. + Args: + embedding: The embeddings to use for the document content. + setup_mode: Mode used to create the Cassandra table (SYNC, + ASYNC or OFF). + """ + try: + from ragstack_knowledge_store import EmbeddingModel, graph_store + except (ImportError, ModuleNotFoundError): + raise ImportError( + "Could not import ragstack-knowledge-store python package. " + "Please install it with `pip install ragstack-knowledge-store`." + ) + + self._embedding = embedding + _setup_mode = getattr(graph_store.SetupMode, setup_mode.name) + + class _EmbeddingModelAdapter(EmbeddingModel): + def __init__(self, embeddings: Embeddings): + self.embeddings = embeddings + + def embed_texts(self, texts: List[str]) -> List[List[float]]: + return self.embeddings.embed_documents(texts) + + def embed_query(self, text: str) -> List[float]: + return self.embeddings.embed_query(text) + + async def aembed_texts(self, texts: List[str]) -> List[List[float]]: + return await self.embeddings.aembed_documents(texts) + + async def aembed_query(self, text: str) -> List[float]: + return await self.embeddings.aembed_query(text) + + self.store = graph_store.GraphStore( + embedding=_EmbeddingModelAdapter(embedding), + node_table=node_table, + targets_table=targets_table, + session=session, + keyspace=keyspace, + setup_mode=_setup_mode, + ) + + @property + def embeddings(self) -> Optional[Embeddings]: + return self._embedding + + def add_nodes( + self, + nodes: Iterable[Node], + **kwargs: Any, + ) -> Iterable[str]: + return self.store.add_nodes(nodes) + + @classmethod + def from_texts( + cls: Type["CassandraGraphVectorStore"], + texts: Iterable[str], + embedding: Embeddings, + metadatas: Optional[List[dict]] = None, + ids: Optional[Iterable[str]] = None, + **kwargs: Any, + ) -> "CassandraGraphVectorStore": + """Return CassandraGraphVectorStore initialized from texts and embeddings.""" + store = cls(embedding, **kwargs) + store.add_texts(texts, metadatas, ids=ids) + return store + + @classmethod + def from_documents( + cls: Type["CassandraGraphVectorStore"], + documents: Iterable[Document], + embedding: Embeddings, + ids: Optional[Iterable[str]] = None, + **kwargs: Any, + ) -> "CassandraGraphVectorStore": + """Return CassandraGraphVectorStore initialized from documents and + embeddings.""" + store = cls(embedding, **kwargs) + store.add_documents(documents, ids=ids) + return store + + def similarity_search( + self, query: str, k: int = 4, **kwargs: Any + ) -> List[Document]: + embedding_vector = self._embedding.embed_query(query) + return self.similarity_search_by_vector( + embedding_vector, + k=k, + ) + + def similarity_search_by_vector( + self, embedding: List[float], k: int = 4, **kwargs: Any + ) -> List[Document]: + nodes = self.store.similarity_search(embedding, k=k) + return list(nodes_to_documents(nodes)) + + def traversal_search( + self, + query: str, + *, + k: int = 4, + depth: int = 1, + **kwargs: Any, + ) -> Iterable[Document]: + nodes = self.store.traversal_search(query, k=k, depth=depth) + return nodes_to_documents(nodes) + + def mmr_traversal_search( + self, + query: str, + 
*, + k: int = 4, + depth: int = 2, + fetch_k: int = 100, + adjacent_k: int = 10, + lambda_mult: float = 0.5, + score_threshold: float = float("-inf"), + **kwargs: Any, + ) -> Iterable[Document]: + nodes = self.store.mmr_traversal_search( + query, + k=k, + depth=depth, + fetch_k=fetch_k, + adjacent_k=adjacent_k, + lambda_mult=lambda_mult, + score_threshold=score_threshold, + ) + return nodes_to_documents(nodes) diff --git a/libs/community/langchain_community/tools/jira/tool.py b/libs/community/langchain_community/tools/jira/tool.py index 47ca6146db502..a80044b2a10cd 100644 --- a/libs/community/langchain_community/tools/jira/tool.py +++ b/libs/community/langchain_community/tools/jira/tool.py @@ -7,6 +7,7 @@ JIRA_API_TOKEN JIRA_USERNAME JIRA_INSTANCE_URL + JIRA_CLOUD Below is a sample script that uses the Jira tool: diff --git a/libs/community/langchain_community/utilities/jira.py b/libs/community/langchain_community/utilities/jira.py index c03d3badfea7f..dfc274dbd4612 100644 --- a/libs/community/langchain_community/utilities/jira.py +++ b/libs/community/langchain_community/utilities/jira.py @@ -15,6 +15,7 @@ class JiraAPIWrapper(BaseModel): jira_username: Optional[str] = None jira_api_token: Optional[str] = None jira_instance_url: Optional[str] = None + jira_cloud: Optional[bool] = None class Config: """Configuration for this pydantic object.""" @@ -39,6 +40,10 @@ def validate_environment(cls, values: Dict) -> Dict: ) values["jira_instance_url"] = jira_instance_url + jira_cloud_str = get_from_dict_or_env(values, "jira_cloud", "JIRA_CLOUD") + jira_cloud = jira_cloud_str.lower() == "true" + values["jira_cloud"] = jira_cloud + try: from atlassian import Confluence, Jira except ImportError: @@ -51,21 +56,21 @@ def validate_environment(cls, values: Dict) -> Dict: jira = Jira( url=jira_instance_url, token=jira_api_token, - cloud=True, + cloud=jira_cloud, ) else: jira = Jira( url=jira_instance_url, username=jira_username, password=jira_api_token, - cloud=True, + cloud=jira_cloud, ) confluence = Confluence( url=jira_instance_url, username=jira_username, password=jira_api_token, - cloud=True, + cloud=jira_cloud, ) values["jira"] = jira diff --git a/libs/community/langchain_community/vectorstores/inmemory.py b/libs/community/langchain_community/vectorstores/inmemory.py index ce3f2ddeb5126..61a8aa13d2455 100644 --- a/libs/community/langchain_community/vectorstores/inmemory.py +++ b/libs/community/langchain_community/vectorstores/inmemory.py @@ -6,6 +6,7 @@ import numpy as np from langchain_core.documents import Document from langchain_core.embeddings import Embeddings +from langchain_core.indexing import UpsertResponse from langchain_core.load import dumpd, load from langchain_core.vectorstores import VectorStore @@ -37,27 +38,41 @@ def delete(self, ids: Optional[Sequence[str]] = None, **kwargs: Any) -> None: async def adelete(self, ids: Optional[Sequence[str]] = None, **kwargs: Any) -> None: self.delete(ids) - def add_texts( - self, - texts: Iterable[str], - metadatas: Optional[List[dict]] = None, - ids: Optional[Sequence[str]] = None, - **kwargs: Any, - ) -> List[str]: - """Add texts to the store.""" - vectors = self.embedding.embed_documents(list(texts)) - ids_ = [] - - for i, text in enumerate(texts): - doc_id = ids[i] if ids else str(uuid.uuid4()) - ids_.append(doc_id) + def upsert(self, items: Sequence[Document], /, **kwargs: Any) -> UpsertResponse: + vectors = self.embedding.embed_documents([item.page_content for item in items]) + ids = [] + for item, vector in zip(items, vectors): + doc_id = 
item.id if item.id else str(uuid.uuid4()) + ids.append(doc_id) self.store[doc_id] = { "id": doc_id, - "vector": vectors[i], - "text": text, - "metadata": metadatas[i] if metadatas else {}, + "vector": vector, + "text": item.page_content, + "metadata": item.metadata, } - return ids_ + return { + "succeeded": ids, + "failed": [], + } + + def get_by_ids(self, ids: Sequence[str], /) -> List[Document]: + """Get documents by their ids.""" + documents = [] + + for doc_id in ids: + doc = self.store.get(doc_id) + if doc: + documents.append( + Document( + id=doc["id"], + page_content=doc["text"], + metadata=doc["metadata"], + ) + ) + return documents + + async def aget_by_ids(self, ids: Sequence[str], /) -> List[Document]: + return self.get_by_ids(ids) async def aadd_texts( self, @@ -80,7 +95,9 @@ def _similarity_search_with_score_by_vector( similarity = float(cosine_similarity([embedding], [vector]).item(0)) result.append( ( - Document(page_content=doc["text"], metadata=doc["metadata"]), + Document( + id=doc["id"], page_content=doc["text"], metadata=doc["metadata"] + ), similarity, vector, ) diff --git a/libs/community/langchain_community/vectorstores/milvus.py b/libs/community/langchain_community/vectorstores/milvus.py index 576a48b32136f..01d5df92c828c 100644 --- a/libs/community/langchain_community/vectorstores/milvus.py +++ b/libs/community/langchain_community/vectorstores/milvus.py @@ -1053,7 +1053,7 @@ def get_pks(self, expr: str, **kwargs: Any) -> List[int] | None: pks = [item.get(self._primary_field) for item in query_result] return pks - def upsert( + def upsert( # type: ignore[override] self, ids: Optional[List[str]] = None, documents: List[Document] | None = None, diff --git a/libs/community/langchain_community/vectorstores/redis/base.py b/libs/community/langchain_community/vectorstores/redis/base.py index 8a885aff39720..6490e54c22f35 100644 --- a/libs/community/langchain_community/vectorstores/redis/base.py +++ b/libs/community/langchain_community/vectorstores/redis/base.py @@ -582,8 +582,8 @@ def write_schema(self, path: Union[str, os.PathLike]) -> None: with open(path, "w+") as f: yaml.dump(self.schema, f) - @staticmethod def delete( + self, ids: Optional[List[str]] = None, **kwargs: Any, ) -> bool: @@ -602,30 +602,12 @@ def delete( ValueError: If the redis python package is not installed. ValueError: If the ids (keys in redis) are not provided """ - redis_url = get_from_dict_or_env(kwargs, "redis_url", "REDIS_URL") - - if ids is None: - raise ValueError("'ids' (keys)() were not provided.") - - try: - import redis # noqa: F401 - except ImportError: - raise ImportError( - "Could not import redis python package. " - "Please install it with `pip install redis`." - ) - try: - # We need to first remove redis_url from kwargs, - # otherwise passing it to Redis will result in an error. 
- if "redis_url" in kwargs: - kwargs.pop("redis_url") - client = get_client(redis_url=redis_url, **kwargs) - except ValueError as e: - raise ValueError(f"Your redis connected error: {e}") + client = self.client # Check if index exists try: - client.delete(*ids) - logger.info("Entries deleted") + if ids: + client.delete(*ids) + logger.info("Entries deleted") return True except: # noqa: E722 # ids does not exist diff --git a/libs/community/tests/integration_tests/.env.example b/libs/community/tests/integration_tests/.env.example index 44a4490e1e65f..cf7d891b143b4 100644 --- a/libs/community/tests/integration_tests/.env.example +++ b/libs/community/tests/integration_tests/.env.example @@ -34,6 +34,7 @@ PINECONE_ENVIRONMENT=us-west4-gcp # JIRA_API_TOKEN=your_jira_api_token_here # JIRA_USERNAME=your_jira_username_here # JIRA_INSTANCE_URL=your_jira_instance_url_here +# JIRA_CLOUD=True # clickup diff --git a/libs/community/tests/integration_tests/graph_vectorstores/__init__.py b/libs/community/tests/integration_tests/graph_vectorstores/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/libs/community/tests/integration_tests/graph_vectorstores/test_cassandra.py b/libs/community/tests/integration_tests/graph_vectorstores/test_cassandra.py new file mode 100644 index 0000000000000..bfa3946fa4a36 --- /dev/null +++ b/libs/community/tests/integration_tests/graph_vectorstores/test_cassandra.py @@ -0,0 +1,272 @@ +import math +import os +from typing import Iterable, List, Optional, Type + +from langchain_core.documents import Document +from langchain_core.embeddings import Embeddings +from langchain_core.graph_vectorstores.links import METADATA_LINKS_KEY, Link + +from langchain_community.graph_vectorstores import CassandraGraphVectorStore + +CASSANDRA_DEFAULT_KEYSPACE = "graph_test_keyspace" + + +def _get_graph_store( + embedding_class: Type[Embeddings], documents: Iterable[Document] = () +) -> CassandraGraphVectorStore: + import cassio + from cassandra.cluster import Cluster + from cassio.config import check_resolve_session, resolve_keyspace + + node_table = "graph_test_node_table" + edge_table = "graph_test_edge_table" + + if any( + env_var in os.environ + for env_var in [ + "CASSANDRA_CONTACT_POINTS", + "ASTRA_DB_APPLICATION_TOKEN", + "ASTRA_DB_INIT_STRING", + ] + ): + cassio.init(auto=True) + session = check_resolve_session() + else: + cluster = Cluster() + session = cluster.connect() + keyspace = resolve_keyspace() or CASSANDRA_DEFAULT_KEYSPACE + cassio.init(session=session, keyspace=keyspace) + # ensure keyspace exists + session.execute( + ( + f"CREATE KEYSPACE IF NOT EXISTS {keyspace} " + f"WITH replication = {{'class': 'SimpleStrategy', 'replication_factor': 1}}" + ) + ) + session.execute(f"DROP TABLE IF EXISTS {keyspace}.{node_table}") + session.execute(f"DROP TABLE IF EXISTS {keyspace}.{edge_table}") + store = CassandraGraphVectorStore.from_documents( + documents, + embedding=embedding_class(), + session=session, + keyspace=keyspace, + node_table=node_table, + targets_table=edge_table, + ) + return store + + +class FakeEmbeddings(Embeddings): + """Fake embeddings functionality for testing.""" + + def embed_documents(self, texts: List[str]) -> List[List[float]]: + """Return simple embeddings. 
+ Embeddings encode each text as its index.""" + return [[float(1.0)] * 9 + [float(i)] for i in range(len(texts))] + + async def aembed_documents(self, texts: List[str]) -> List[List[float]]: + return self.embed_documents(texts) + + def embed_query(self, text: str) -> List[float]: + """Return constant query embeddings. + Embeddings are identical to embed_documents(texts)[0]. + Distance to each text will be that text's index, + as it was passed to embed_documents.""" + return [float(1.0)] * 9 + [float(0.0)] + + async def aembed_query(self, text: str) -> List[float]: + return self.embed_query(text) + + +class AngularTwoDimensionalEmbeddings(Embeddings): + """ + From angles (as strings in units of pi) to unit embedding vectors on a circle. + """ + + def embed_documents(self, texts: List[str]) -> List[List[float]]: + """ + Make a list of texts into a list of embedding vectors. + """ + return [self.embed_query(text) for text in texts] + + def embed_query(self, text: str) -> List[float]: + """ + Convert input text to a 'vector' (list of floats). + If the text is a number, use it as the angle for the + unit vector in units of pi. + Any other input text becomes the singular result [0, 0] ! + """ + try: + angle = float(text) + return [math.cos(angle * math.pi), math.sin(angle * math.pi)] + except ValueError: + # Assume: just test string, no attention is paid to values. + return [0.0, 0.0] + + +def _result_ids(docs: Iterable[Document]) -> List[Optional[str]]: + return [doc.id for doc in docs] + + +def test_mmr_traversal() -> None: + """ + Test end to end construction and MMR search. + The embedding function used here ensures `texts` become + the following vectors on a circle (numbered v0 through v3): + + ______ v2 + / \ + / | v1 + v3 | . | query + | / v0 + |______/ (N.B. very crude drawing) + + With fetch_k==2 and k==2, when query is at (1, ), + one expects that v2 and v0 are returned (in some order) + because v1 is "too close" to v0 (and v0 is closer than v1)). + + Both v2 and v3 are reachable via edges from v0, so once it is + selected, those are both considered. + """ + store = _get_graph_store(AngularTwoDimensionalEmbeddings) + + v0 = Document( + id="v0", + page_content="-0.124", + metadata={ + METADATA_LINKS_KEY: [ + Link.outgoing(kind="explicit", tag="link"), + ], + }, + ) + v1 = Document( + id="v1", + page_content="+0.127", + ) + v2 = Document( + id="v2", + page_content="+0.25", + metadata={ + METADATA_LINKS_KEY: [ + Link.incoming(kind="explicit", tag="link"), + ], + }, + ) + v3 = Document( + id="v3", + page_content="+1.0", + metadata={ + METADATA_LINKS_KEY: [ + Link.incoming(kind="explicit", tag="link"), + ], + }, + ) + store.add_documents([v0, v1, v2, v3]) + + results = store.mmr_traversal_search("0.0", k=2, fetch_k=2) + assert _result_ids(results) == ["v0", "v2"] + + # With max depth 0, no edges are traversed, so this doesn't reach v2 or v3. + # So it ends up picking "v1" even though it's similar to "v0". + results = store.mmr_traversal_search("0.0", k=2, fetch_k=2, depth=0) + assert _result_ids(results) == ["v0", "v1"] + + # With max depth 0 but higher `fetch_k`, we encounter v2 + results = store.mmr_traversal_search("0.0", k=2, fetch_k=3, depth=0) + assert _result_ids(results) == ["v0", "v2"] + + # v0 score is .46, v2 score is 0.16 so it won't be chosen. + results = store.mmr_traversal_search("0.0", k=2, score_threshold=0.2) + assert _result_ids(results) == ["v0"] + + # with k=4 we should get all of the documents. 
+ results = store.mmr_traversal_search("0.0", k=4) + assert _result_ids(results) == ["v0", "v2", "v1", "v3"] + + +def test_write_retrieve_keywords() -> None: + from langchain_openai import OpenAIEmbeddings + + greetings = Document( + id="greetings", + page_content="Typical Greetings", + metadata={ + METADATA_LINKS_KEY: [ + Link.incoming(kind="parent", tag="parent"), + ], + }, + ) + doc1 = Document( + id="doc1", + page_content="Hello World", + metadata={ + METADATA_LINKS_KEY: [ + Link.outgoing(kind="parent", tag="parent"), + Link.bidir(kind="kw", tag="greeting"), + Link.bidir(kind="kw", tag="world"), + ], + }, + ) + doc2 = Document( + id="doc2", + page_content="Hello Earth", + metadata={ + METADATA_LINKS_KEY: [ + Link.outgoing(kind="parent", tag="parent"), + Link.bidir(kind="kw", tag="greeting"), + Link.bidir(kind="kw", tag="earth"), + ], + }, + ) + store = _get_graph_store(OpenAIEmbeddings, [greetings, doc1, doc2]) + + # Doc2 is more similar, but World and Earth are similar enough that doc1 also + # shows up. + results: Iterable[Document] = store.similarity_search("Earth", k=2) + assert _result_ids(results) == ["doc2", "doc1"] + + results = store.similarity_search("Earth", k=1) + assert _result_ids(results) == ["doc2"] + + results = store.traversal_search("Earth", k=2, depth=0) + assert _result_ids(results) == ["doc2", "doc1"] + + results = store.traversal_search("Earth", k=2, depth=1) + assert _result_ids(results) == ["doc2", "doc1", "greetings"] + + # K=1 only pulls in doc2 (Hello Earth) + results = store.traversal_search("Earth", k=1, depth=0) + assert _result_ids(results) == ["doc2"] + + # K=1 only pulls in doc2 (Hello Earth). Depth=1 traverses to parent and via + # keyword edge. + results = store.traversal_search("Earth", k=1, depth=1) + assert set(_result_ids(results)) == {"doc2", "doc1", "greetings"} + + +def test_metadata() -> None: + store = _get_graph_store(FakeEmbeddings) + store.add_documents( + [ + Document( + id="a", + page_content="A", + metadata={ + METADATA_LINKS_KEY: [ + Link.incoming(kind="hyperlink", tag="http://a"), + Link.bidir(kind="other", tag="foo"), + ], + "other": "some other field", + }, + ) + ] + ) + results = store.similarity_search("A") + assert len(results) == 1 + assert results[0].id == "a" + metadata = results[0].metadata + assert metadata["other"] == "some other field" + assert set(metadata[METADATA_LINKS_KEY]) == { + Link.incoming(kind="hyperlink", tag="http://a"), + Link.bidir(kind="other", tag="foo"), + } diff --git a/libs/community/tests/unit_tests/vectorstores/test_inmemory.py b/libs/community/tests/unit_tests/vectorstores/test_inmemory.py index 10abe633f0080..7381335103a0a 100644 --- a/libs/community/tests/unit_tests/vectorstores/test_inmemory.py +++ b/libs/community/tests/unit_tests/vectorstores/test_inmemory.py @@ -1,4 +1,5 @@ from pathlib import Path +from typing import Any import pytest from langchain_core.documents import Document @@ -13,6 +14,11 @@ ) +class AnyStr(str): + def __eq__(self, other: Any) -> bool: + return isinstance(other, str) + + class TestInMemoryReadWriteTestSuite(ReadWriteTestSuite): @pytest.fixture def vectorstore(self) -> InMemoryVectorStore: @@ -31,10 +37,13 @@ async def test_inmemory() -> None: ["foo", "bar", "baz"], ConsistentFakeEmbeddings() ) output = await store.asimilarity_search("foo", k=1) - assert output == [Document(page_content="foo")] + assert output == [Document(page_content="foo", id=AnyStr())] output = await store.asimilarity_search("bar", k=2) - assert output == [Document(page_content="bar"), 
Document(page_content="baz")] + assert output == [ + Document(page_content="bar", id=AnyStr()), + Document(page_content="baz", id=AnyStr()), + ] output2 = await store.asimilarity_search_with_score("bar", k=2) assert output2[0][1] > output2[1][1] @@ -61,8 +70,8 @@ async def test_inmemory_mmr() -> None: "foo", k=10, lambda_mult=0.1 ) assert len(output) == len(texts) - assert output[0] == Document(page_content="foo") - assert output[1] == Document(page_content="foy") + assert output[0] == Document(page_content="foo", id=AnyStr()) + assert output[1] == Document(page_content="foy", id=AnyStr()) async def test_inmemory_dump_load(tmp_path: Path) -> None: @@ -90,4 +99,4 @@ async def test_inmemory_filter() -> None: output = await store.asimilarity_search( "baz", filter=lambda doc: doc.metadata["id"] == 1 ) - assert output == [Document(page_content="foo", metadata={"id": 1})] + assert output == [Document(page_content="foo", metadata={"id": 1}, id=AnyStr())] diff --git a/libs/core/langchain_core/graph_vectorstores/__init__.py b/libs/core/langchain_core/graph_vectorstores/__init__.py new file mode 100644 index 0000000000000..973f0ef954580 --- /dev/null +++ b/libs/core/langchain_core/graph_vectorstores/__init__.py @@ -0,0 +1,15 @@ +from langchain_core.graph_vectorstores.base import ( + GraphVectorStore, + GraphVectorStoreRetriever, + Node, +) +from langchain_core.graph_vectorstores.links import ( + Link, +) + +__all__ = [ + "GraphVectorStore", + "GraphVectorStoreRetriever", + "Node", + "Link", +] diff --git a/libs/core/langchain_core/graph_vectorstores/base.py b/libs/core/langchain_core/graph_vectorstores/base.py new file mode 100644 index 0000000000000..37d1b43857995 --- /dev/null +++ b/libs/core/langchain_core/graph_vectorstores/base.py @@ -0,0 +1,692 @@ +from __future__ import annotations + +from abc import abstractmethod +from typing import ( + Any, + AsyncIterable, + ClassVar, + Collection, + Iterable, + Iterator, + List, + Optional, +) + +from langchain_core.callbacks import ( + AsyncCallbackManagerForRetrieverRun, + CallbackManagerForRetrieverRun, +) +from langchain_core.documents import Document +from langchain_core.graph_vectorstores.links import METADATA_LINKS_KEY, Link +from langchain_core.load import Serializable +from langchain_core.pydantic_v1 import Field +from langchain_core.runnables import run_in_executor +from langchain_core.vectorstores import VectorStore, VectorStoreRetriever + + +def _has_next(iterator: Iterator) -> bool: + """Checks if the iterator has more elements. + Warning: consumes an element from the iterator""" + sentinel = object() + return next(iterator, sentinel) is not sentinel + + +class Node(Serializable): + """Node in the GraphVectorStore. + + Edges exist from nodes with an outgoing link to nodes with a matching incoming link. + + For instance two nodes `a` and `b` connected over a hyperlink `https://some-url` + would look like: + + .. code-block:: python + [ + Node( + id="a", + text="some text a", + links= [ + Link(kind="hyperlink", tag="https://some-url", direction="incoming") + ], + ), + Node( + id="b", + text="some text b", + links= [ + Link(kind="hyperlink", tag="https://some-url", direction="outgoing") + ], + ) + ] + """ + + id: Optional[str] = None + """Unique ID for the node. 
Will be generated by the GraphVectorStore if not set.""" + text: str + """Text contained by the node.""" + metadata: dict = Field(default_factory=dict) + """Metadata for the node.""" + links: List[Link] = Field(default_factory=list) + """Links associated with the node.""" + + +def _texts_to_nodes( + texts: Iterable[str], + metadatas: Optional[Iterable[dict]], + ids: Optional[Iterable[str]], +) -> Iterator[Node]: + metadatas_it = iter(metadatas) if metadatas else None + ids_it = iter(ids) if ids else None + for text in texts: + try: + _metadata = next(metadatas_it).copy() if metadatas_it else {} + except StopIteration: + raise ValueError("texts iterable longer than metadatas") + try: + _id = next(ids_it) if ids_it else None + except StopIteration: + raise ValueError("texts iterable longer than ids") + + links = _metadata.pop(METADATA_LINKS_KEY, []) + if not isinstance(links, list): + links = list(links) + yield Node( + id=_id, + metadata=_metadata, + text=text, + links=links, + ) + if ids_it and _has_next(ids_it): + raise ValueError("ids iterable longer than texts") + if metadatas_it and _has_next(metadatas_it): + raise ValueError("metadatas iterable longer than texts") + + +def _documents_to_nodes(documents: Iterable[Document]) -> Iterator[Node]: + for doc in documents: + metadata = doc.metadata.copy() + links = metadata.pop(METADATA_LINKS_KEY, []) + if not isinstance(links, list): + links = list(links) + yield Node( + id=doc.id, + metadata=metadata, + text=doc.page_content, + links=links, + ) + + +def nodes_to_documents(nodes: Iterable[Node]) -> Iterator[Document]: + for node in nodes: + metadata = node.metadata.copy() + metadata[METADATA_LINKS_KEY] = [ + # Convert the core `Link` (from the node) back to the local `Link`. + Link(kind=link.kind, direction=link.direction, tag=link.tag) + for link in node.links + ] + + yield Document( + id=node.id, + page_content=node.text, + metadata=metadata, + ) + + +class GraphVectorStore(VectorStore): + """A hybrid vector-and-graph graph store. + + Document chunks support vector-similarity search as well as edges linking + chunks based on structural and semantic properties. + """ + + @abstractmethod + def add_nodes( + self, + nodes: Iterable[Node], + **kwargs: Any, + ) -> Iterable[str]: + """Add nodes to the graph store. + + Args: + nodes: the nodes to add. + """ + + async def aadd_nodes( + self, + nodes: Iterable[Node], + **kwargs: Any, + ) -> AsyncIterable[str]: + """Add nodes to the graph store. + + Args: + nodes: the nodes to add. + """ + iterator = iter(await run_in_executor(None, self.add_nodes, nodes, **kwargs)) + done = object() + while True: + doc = await run_in_executor(None, next, iterator, done) + if doc is done: + break + yield doc # type: ignore[misc] + + def add_texts( + self, + texts: Iterable[str], + metadatas: Optional[Iterable[dict]] = None, + *, + ids: Optional[Iterable[str]] = None, + **kwargs: Any, + ) -> List[str]: + """Run more texts through the embeddings and add to the vectorstore. + + The Links present in the metadata field `links` will be extracted to create + the `Node` links. + + Eg if nodes `a` and `b` are connected over a hyperlink `https://some-url`, the + function call would look like: + + .. 
code-block:: python + + store.add_texts( + ids=["a", "b"], + texts=["some text a", "some text b"], + metadatas=[ + { + "links": [ + Link.incoming(kind="hyperlink", tag="https://some-url") + ] + }, + { + "links": [ + Link.outgoing(kind="hyperlink", tag="https://some-url") + ] + }, + ], + ) + + Args: + texts: Iterable of strings to add to the vectorstore. + metadatas: Optional list of metadatas associated with the texts. + The metadata key `links` shall be an iterable of + :py:class:`~langchain_core.graph_vectorstores.links.Link`. + **kwargs: vectorstore specific parameters. + + Returns: + List of ids from adding the texts into the vectorstore. + """ + nodes = _texts_to_nodes(texts, metadatas, ids) + return list(self.add_nodes(nodes, **kwargs)) + + async def aadd_texts( + self, + texts: Iterable[str], + metadatas: Optional[Iterable[dict]] = None, + *, + ids: Optional[Iterable[str]] = None, + **kwargs: Any, + ) -> List[str]: + """Run more texts through the embeddings and add to the vectorstore. + + The Links present in the metadata field `links` will be extracted to create + the `Node` links. + + Eg if nodes `a` and `b` are connected over a hyperlink `https://some-url`, the + function call would look like: + + .. code-block:: python + + await store.aadd_texts( + ids=["a", "b"], + texts=["some text a", "some text b"], + metadatas=[ + { + "links": [ + Link.incoming(kind="hyperlink", tag="https://some-url") + ] + }, + { + "links": [ + Link.outgoing(kind="hyperlink", tag="https://some-url") + ] + }, + ], + ) + + Args: + texts: Iterable of strings to add to the vectorstore. + metadatas: Optional list of metadatas associated with the texts. + The metadata key `links` shall be an iterable of + :py:class:`~langchain_core.graph_vectorstores.links.Link`. + **kwargs: vectorstore specific parameters. + + Returns: + List of ids from adding the texts into the vectorstore. + """ + nodes = _texts_to_nodes(texts, metadatas, ids) + return [_id async for _id in self.aadd_nodes(nodes, **kwargs)] + + def add_documents( + self, + documents: Iterable[Document], + **kwargs: Any, + ) -> List[str]: + """Run more documents through the embeddings and add to the vectorstore. + + The Links present in the document metadata field `links` will be extracted to + create the `Node` links. + + Eg if nodes `a` and `b` are connected over a hyperlink `https://some-url`, the + function call would look like: + + .. code-block:: python + + store.add_documents( + [ + Document( + id="a", + page_content="some text a", + metadata={ + "links": [ + Link.incoming(kind="hyperlink", tag="http://some-url") + ] + } + ), + Document( + id="b", + page_content="some text b", + metadata={ + "links": [ + Link.outgoing(kind="hyperlink", tag="http://some-url") + ] + } + ), + ] + + ) + + Args: + documents: Documents to add to the vectorstore. + The document's metadata key `links` shall be an iterable of + :py:class:`~langchain_core.graph_vectorstores.links.Link`. + + Returns: + List of IDs of the added texts. + """ + nodes = _documents_to_nodes(documents) + return list(self.add_nodes(nodes, **kwargs)) + + async def aadd_documents( + self, + documents: Iterable[Document], + **kwargs: Any, + ) -> List[str]: + """Run more documents through the embeddings and add to the vectorstore. + + The Links present in the document metadata field `links` will be extracted to + create the `Node` links. + + Eg if nodes `a` and `b` are connected over a hyperlink `https://some-url`, the + function call would look like: + + .. 
code-block:: python + + store.add_documents( + [ + Document( + id="a", + page_content="some text a", + metadata={ + "links": [ + Link.incoming(kind="hyperlink", tag="http://some-url") + ] + } + ), + Document( + id="b", + page_content="some text b", + metadata={ + "links": [ + Link.outgoing(kind="hyperlink", tag="http://some-url") + ] + } + ), + ] + + ) + + Args: + documents: Documents to add to the vectorstore. + The document's metadata key `links` shall be an iterable of + :py:class:`~langchain_core.graph_vectorstores.links.Link`. + + Returns: + List of IDs of the added texts. + """ + nodes = _documents_to_nodes(documents) + return [_id async for _id in self.aadd_nodes(nodes, **kwargs)] + + @abstractmethod + def traversal_search( + self, + query: str, + *, + k: int = 4, + depth: int = 1, + **kwargs: Any, + ) -> Iterable[Document]: + """Retrieve documents from traversing this graph store. + + First, `k` nodes are retrieved using a search for each `query` string. + Then, additional nodes are discovered up to the given `depth` from those + starting nodes. + + Args: + query: The query string. + k: The number of Documents to return from the initial search. + Defaults to 4. Applies to each of the query strings. + depth: The maximum depth of edges to traverse. Defaults to 1. + Returns: + Retrieved documents. + """ + + async def atraversal_search( + self, + query: str, + *, + k: int = 4, + depth: int = 1, + **kwargs: Any, + ) -> AsyncIterable[Document]: + """Retrieve documents from traversing this graph store. + + First, `k` nodes are retrieved using a search for each `query` string. + Then, additional nodes are discovered up to the given `depth` from those + starting nodes. + + Args: + query: The query string. + k: The number of Documents to return from the initial search. + Defaults to 4. Applies to each of the query strings. + depth: The maximum depth of edges to traverse. Defaults to 1. + Returns: + Retrieved documents. + """ + iterator = iter( + await run_in_executor( + None, self.traversal_search, query, k=k, depth=depth, **kwargs + ) + ) + done = object() + while True: + doc = await run_in_executor(None, next, iterator, done) + if doc is done: + break + yield doc # type: ignore[misc] + + @abstractmethod + def mmr_traversal_search( + self, + query: str, + *, + k: int = 4, + depth: int = 2, + fetch_k: int = 100, + adjacent_k: int = 10, + lambda_mult: float = 0.5, + score_threshold: float = float("-inf"), + **kwargs: Any, + ) -> Iterable[Document]: + """Retrieve documents from this graph store using MMR-traversal. + + This strategy first retrieves the top `fetch_k` results by similarity to + the question. It then selects the top `k` results based on + maximum-marginal relevance using the given `lambda_mult`. + + At each step, it considers the (remaining) documents from `fetch_k` as + well as any documents connected by edges to a selected document + retrieved based on similarity (a "root"). + + Args: + query: The query string to search for. + k: Number of Documents to return. Defaults to 4. + fetch_k: Number of Documents to fetch via similarity. + Defaults to 100. + adjacent_k: Number of adjacent Documents to fetch. + Defaults to 10. + depth: Maximum depth of a node (number of edges) from a node + retrieved via similarity. Defaults to 2. + lambda_mult: Number between 0 and 1 that determines the degree + of diversity among the results with 0 corresponding to maximum + diversity and 1 to minimum diversity. Defaults to 0.5. 
+ score_threshold: Only documents with a score greater than or equal + this threshold will be chosen. Defaults to negative infinity. + """ + + async def ammr_traversal_search( + self, + query: str, + *, + k: int = 4, + depth: int = 2, + fetch_k: int = 100, + adjacent_k: int = 10, + lambda_mult: float = 0.5, + score_threshold: float = float("-inf"), + **kwargs: Any, + ) -> AsyncIterable[Document]: + """Retrieve documents from this graph store using MMR-traversal. + + This strategy first retrieves the top `fetch_k` results by similarity to + the question. It then selects the top `k` results based on + maximum-marginal relevance using the given `lambda_mult`. + + At each step, it considers the (remaining) documents from `fetch_k` as + well as any documents connected by edges to a selected document + retrieved based on similarity (a "root"). + + Args: + query: The query string to search for. + k: Number of Documents to return. Defaults to 4. + fetch_k: Number of Documents to fetch via similarity. + Defaults to 100. + adjacent_k: Number of adjacent Documents to fetch. + Defaults to 10. + depth: Maximum depth of a node (number of edges) from a node + retrieved via similarity. Defaults to 2. + lambda_mult: Number between 0 and 1 that determines the degree + of diversity among the results with 0 corresponding to maximum + diversity and 1 to minimum diversity. Defaults to 0.5. + score_threshold: Only documents with a score greater than or equal + this threshold will be chosen. Defaults to negative infinity. + """ + iterator = iter( + await run_in_executor( + None, + self.mmr_traversal_search, + query, + k=k, + fetch_k=fetch_k, + adjacent_k=adjacent_k, + depth=depth, + lambda_mult=lambda_mult, + score_threshold=score_threshold, + **kwargs, + ) + ) + done = object() + while True: + doc = await run_in_executor(None, next, iterator, done) + if doc is done: + break + yield doc # type: ignore[misc] + + def similarity_search( + self, query: str, k: int = 4, **kwargs: Any + ) -> List[Document]: + return list(self.traversal_search(query, k=k, depth=0)) + + def max_marginal_relevance_search( + self, + query: str, + k: int = 4, + fetch_k: int = 20, + lambda_mult: float = 0.5, + **kwargs: Any, + ) -> List[Document]: + return list( + self.mmr_traversal_search( + query, k=k, fetch_k=fetch_k, lambda_mult=lambda_mult, depth=0 + ) + ) + + async def asimilarity_search( + self, query: str, k: int = 4, **kwargs: Any + ) -> List[Document]: + return [doc async for doc in self.atraversal_search(query, k=k, depth=0)] + + def search(self, query: str, search_type: str, **kwargs: Any) -> List[Document]: + if search_type == "similarity": + return self.similarity_search(query, **kwargs) + elif search_type == "similarity_score_threshold": + docs_and_similarities = self.similarity_search_with_relevance_scores( + query, **kwargs + ) + return [doc for doc, _ in docs_and_similarities] + elif search_type == "mmr": + return self.max_marginal_relevance_search(query, **kwargs) + elif search_type == "traversal": + return list(self.traversal_search(query, **kwargs)) + elif search_type == "mmr_traversal": + return list(self.mmr_traversal_search(query, **kwargs)) + else: + raise ValueError( + f"search_type of {search_type} not allowed. Expected " + "search_type to be 'similarity', 'similarity_score_threshold', " + "'mmr' or 'traversal'." 
+ ) + + async def asearch( + self, query: str, search_type: str, **kwargs: Any + ) -> List[Document]: + if search_type == "similarity": + return await self.asimilarity_search(query, **kwargs) + elif search_type == "similarity_score_threshold": + docs_and_similarities = await self.asimilarity_search_with_relevance_scores( + query, **kwargs + ) + return [doc for doc, _ in docs_and_similarities] + elif search_type == "mmr": + return await self.amax_marginal_relevance_search(query, **kwargs) + elif search_type == "traversal": + return [doc async for doc in self.atraversal_search(query, **kwargs)] + else: + raise ValueError( + f"search_type of {search_type} not allowed. Expected " + "search_type to be 'similarity', 'similarity_score_threshold', " + "'mmr' or 'traversal'." + ) + + def as_retriever(self, **kwargs: Any) -> "GraphVectorStoreRetriever": + """Return GraphVectorStoreRetriever initialized from this GraphVectorStore. + + Args: + search_type (Optional[str]): Defines the type of search that + the Retriever should perform. + Can be "traversal" (default), "similarity", "mmr", or + "similarity_score_threshold". + search_kwargs (Optional[Dict]): Keyword arguments to pass to the + search function. Can include things like: + k: Amount of documents to return (Default: 4) + depth: The maximum depth of edges to traverse (Default: 1) + score_threshold: Minimum relevance threshold + for similarity_score_threshold + fetch_k: Amount of documents to pass to MMR algorithm (Default: 20) + lambda_mult: Diversity of results returned by MMR; + 1 for minimum diversity and 0 for maximum. (Default: 0.5) + Returns: + Retriever for this GraphVectorStore. + + Examples: + + .. code-block:: python + + # Retrieve documents traversing edges + docsearch.as_retriever( + search_type="traversal", + search_kwargs={'k': 6, 'depth': 3} + ) + + # Retrieve more documents with higher diversity + # Useful if your dataset has many similar documents + docsearch.as_retriever( + search_type="mmr", + search_kwargs={'k': 6, 'lambda_mult': 0.25} + ) + + # Fetch more documents for the MMR algorithm to consider + # But only return the top 5 + docsearch.as_retriever( + search_type="mmr", + search_kwargs={'k': 5, 'fetch_k': 50} + ) + + # Only retrieve documents that have a relevance score + # Above a certain threshold + docsearch.as_retriever( + search_type="similarity_score_threshold", + search_kwargs={'score_threshold': 0.8} + ) + + # Only get the single most similar document from the dataset + docsearch.as_retriever(search_kwargs={'k': 1}) + + """ + return GraphVectorStoreRetriever(vectorstore=self, **kwargs) + + +class GraphVectorStoreRetriever(VectorStoreRetriever): + """Retriever class for GraphVectorStore.""" + + vectorstore: GraphVectorStore + """GraphVectorStore to use for retrieval.""" + search_type: str = "traversal" + """Type of search to perform. 
Defaults to "traversal".""" + allowed_search_types: ClassVar[Collection[str]] = ( + "similarity", + "similarity_score_threshold", + "mmr", + "traversal", + "mmr_traversal", + ) + + def _get_relevant_documents( + self, query: str, *, run_manager: CallbackManagerForRetrieverRun + ) -> List[Document]: + if self.search_type == "traversal": + return list(self.vectorstore.traversal_search(query, **self.search_kwargs)) + elif self.search_type == "mmr_traversal": + return list( + self.vectorstore.mmr_traversal_search(query, **self.search_kwargs) + ) + else: + return super()._get_relevant_documents(query, run_manager=run_manager) + + async def _aget_relevant_documents( + self, query: str, *, run_manager: AsyncCallbackManagerForRetrieverRun + ) -> List[Document]: + if self.search_type == "traversal": + return [ + doc + async for doc in self.vectorstore.atraversal_search( + query, **self.search_kwargs + ) + ] + elif self.search_type == "mmr_traversal": + return [ + doc + async for doc in self.vectorstore.ammr_traversal_search( + query, **self.search_kwargs + ) + ] + else: + return await super()._aget_relevant_documents( + query, run_manager=run_manager + ) diff --git a/libs/core/langchain_core/graph_vectorstores/links.py b/libs/core/langchain_core/graph_vectorstores/links.py new file mode 100644 index 0000000000000..9da58a39276ce --- /dev/null +++ b/libs/core/langchain_core/graph_vectorstores/links.py @@ -0,0 +1,68 @@ +from dataclasses import dataclass +from typing import Iterable, List, Literal, Union + +from langchain_core.documents import Document + + +@dataclass(frozen=True) +class Link: + """A link to/from a tag of a given tag. + + Edges exist from nodes with an outgoing link to nodes with a matching incoming link. + """ + + kind: str + """The kind of link. Allows different extractors to use the same tag name without + creating collisions between extractors. For example “keyword” vs “url”.""" + direction: Literal["in", "out", "bidir"] + """The direction of the link.""" + tag: str + """The tag of the link.""" + + @staticmethod + def incoming(kind: str, tag: str) -> "Link": + """Create an incoming link.""" + return Link(kind=kind, direction="in", tag=tag) + + @staticmethod + def outgoing(kind: str, tag: str) -> "Link": + """Create an outgoing link.""" + return Link(kind=kind, direction="out", tag=tag) + + @staticmethod + def bidir(kind: str, tag: str) -> "Link": + """Create a bidirectional link.""" + return Link(kind=kind, direction="bidir", tag=tag) + + +METADATA_LINKS_KEY = "links" + + +def get_links(doc: Document) -> List[Link]: + """Get the links from a document. + Args: + doc: The document to get the link tags from. + Returns: + The set of link tags from the document. + """ + + links = doc.metadata.setdefault(METADATA_LINKS_KEY, []) + if not isinstance(links, list): + # Convert to a list and remember that. + links = list(links) + doc.metadata[METADATA_LINKS_KEY] = links + return links + + +def add_links(doc: Document, *links: Union[Link, Iterable[Link]]) -> None: + """Add links to the given metadata. + Args: + doc: The document to add the links to. + *links: The links to add to the document. 
+ """ + links_in_metadata = get_links(doc) + for link in links: + if isinstance(link, Iterable): + links_in_metadata.extend(link) + else: + links_in_metadata.append(link) diff --git a/libs/core/langchain_core/indexing/__init__.py b/libs/core/langchain_core/indexing/__init__.py index 3643b13041037..305ae7b459da5 100644 --- a/libs/core/langchain_core/indexing/__init__.py +++ b/libs/core/langchain_core/indexing/__init__.py @@ -6,7 +6,11 @@ """ from langchain_core.indexing.api import IndexingResult, aindex, index -from langchain_core.indexing.base import InMemoryRecordManager, RecordManager +from langchain_core.indexing.base import ( + InMemoryRecordManager, + RecordManager, + UpsertResponse, +) __all__ = [ "aindex", @@ -14,4 +18,5 @@ "IndexingResult", "InMemoryRecordManager", "RecordManager", + "UpsertResponse", ] diff --git a/libs/core/langchain_core/indexing/base.py b/libs/core/langchain_core/indexing/base.py index 15912358ba0a7..e5037b664a38c 100644 --- a/libs/core/langchain_core/indexing/base.py +++ b/libs/core/langchain_core/indexing/base.py @@ -421,3 +421,16 @@ async def adelete_keys(self, keys: Sequence[str]) -> None: keys: A list of keys to delete. """ self.delete_keys(keys) + + +class UpsertResponse(TypedDict): + """A generic response for upsert operations. + + The upsert response will be used by abstractions that implement an upsert + operation for content that can be upserted by ID. + """ + + succeeded: List[str] + """The IDs that were successfully indexed.""" + failed: List[str] + """The IDs that failed to index.""" diff --git a/libs/core/langchain_core/load/dump.py b/libs/core/langchain_core/load/dump.py index 941be0ae2530f..df21f13b9789c 100644 --- a/libs/core/langchain_core/load/dump.py +++ b/libs/core/langchain_core/load/dump.py @@ -6,7 +6,14 @@ def default(obj: Any) -> Any: """Return a default value for a Serializable object or - a SerializedNotImplemented object.""" + a SerializedNotImplemented object. + + Args: + obj: The object to serialize to json if it is a Serializable object. + + Returns: + A json serializable object or a SerializedNotImplemented object. + """ if isinstance(obj, Serializable): return obj.to_json() else: @@ -17,13 +24,17 @@ def dumps(obj: Any, *, pretty: bool = False, **kwargs: Any) -> str: """Return a json string representation of an object. Args: - obj: The object to dump + obj: The object to dump. pretty: Whether to pretty print the json. If true, the json will be - indented with 2 spaces (if no indent is provided as part of kwargs) + indented with 2 spaces (if no indent is provided as part of kwargs). + Default is False. **kwargs: Additional arguments to pass to json.dumps Returns: - A json string representation of the object + A json string representation of the object. + + Raises: + ValueError: If `default` is passed as a kwarg. """ if "default" in kwargs: raise ValueError("`default` should not be passed to dumps") diff --git a/libs/core/langchain_core/load/load.py b/libs/core/langchain_core/load/load.py index ec5c67f2a08b9..f7db2ff6f022f 100644 --- a/libs/core/langchain_core/load/load.py +++ b/libs/core/langchain_core/load/load.py @@ -36,6 +36,17 @@ def __init__( valid_namespaces: Optional[List[str]] = None, secrets_from_env: bool = True, ) -> None: + """Initialize the reviver. + + Args: + secrets_map: A map of secrets to load. If a secret is not found in + the map, it will be loaded from the environment if `secrets_from_env` + is True. Defaults to None. + valid_namespaces: A list of additional namespaces (modules) + to allow to be deserialized. 
Defaults to None. + secrets_from_env: Whether to load secrets from the environment. + Defaults to True. + """ self.secrets_from_env = secrets_from_env self.secrets_map = secrets_map or dict() # By default only support langchain, but user can pass in additional namespaces @@ -130,9 +141,13 @@ def loads( Args: text: The string to load. - secrets_map: A map of secrets to load. + secrets_map: A map of secrets to load. If a secret is not found in + the map, it will be loaded from the environment if `secrets_from_env` + is True. Defaults to None. valid_namespaces: A list of additional namespaces (modules) - to allow to be deserialized. + to allow to be deserialized. Defaults to None. + secrets_from_env: Whether to load secrets from the environment. + Defaults to True. Returns: Revived LangChain objects. @@ -155,9 +170,13 @@ def load( Args: obj: The object to load. - secrets_map: A map of secrets to load. + secrets_map: A map of secrets to load. If a secret is not found in + the map, it will be loaded from the environment if `secrets_from_env` + is True. Defaults to None. valid_namespaces: A list of additional namespaces (modules) - to allow to be deserialized. + to allow to be deserialized. Defaults to None. + secrets_from_env: Whether to load secrets from the environment. + Defaults to True. Returns: Revived LangChain objects. diff --git a/libs/core/langchain_core/load/serializable.py b/libs/core/langchain_core/load/serializable.py index badc7ed0f063e..0035a604ca499 100644 --- a/libs/core/langchain_core/load/serializable.py +++ b/libs/core/langchain_core/load/serializable.py @@ -16,7 +16,14 @@ class BaseSerialized(TypedDict): - """Base class for serialized objects.""" + """Base class for serialized objects. + + Parameters: + lc: The version of the serialization format. + id: The unique identifier of the object. + name: The name of the object. Optional. + graph: The graph of the object. Optional. + """ lc: int id: List[str] @@ -25,20 +32,34 @@ class BaseSerialized(TypedDict): class SerializedConstructor(BaseSerialized): - """Serialized constructor.""" + """Serialized constructor. + + Parameters: + type: The type of the object. Must be "constructor". + kwargs: The constructor arguments. + """ type: Literal["constructor"] kwargs: Dict[str, Any] class SerializedSecret(BaseSerialized): - """Serialized secret.""" + """Serialized secret. + + Parameters: + type: The type of the object. Must be "secret". + """ type: Literal["secret"] class SerializedNotImplemented(BaseSerialized): - """Serialized not implemented.""" + """Serialized not implemented. + + Parameters: + type: The type of the object. Must be "not_implemented". + repr: The representation of the object. Optional. + """ type: Literal["not_implemented"] repr: Optional[str] @@ -50,10 +71,13 @@ def try_neq_default(value: Any, key: str, model: BaseModel) -> bool: Args: value: The value. key: The key. - model: The model. + model: The pydantic model. Returns: Whether the value is different from the default. + + Raises: + Exception: If the key is not in the model. """ try: return model.__fields__[key].get_default() != value @@ -69,23 +93,31 @@ class Serializable(BaseModel, ABC): It relies on the following methods and properties: - `is_lc_serializable`: Is this class serializable? - By design even if a class inherits from Serializable, it is not serializable by + By design, even if a class inherits from Serializable, it is not serializable by default. This is to prevent accidental serialization of objects that should not be serialized. 
- `get_lc_namespace`: Get the namespace of the langchain object. - During de-serialization this namespace is used to identify + During deserialization, this namespace is used to identify the correct class to instantiate. Please see the `Reviver` class in `langchain_core.load.load` for more details. - During de-serialization an additional mapping is handle + During deserialization an additional mapping is handle classes that have moved or been renamed across package versions. - `lc_secrets`: A map of constructor argument names to secret ids. - `lc_attributes`: List of additional attribute names that should be included - as part of the serialized representation.. + as part of the serialized representation. """ @classmethod def is_lc_serializable(cls) -> bool: - """Is this class serializable?""" + """Is this class serializable? + + By design, even if a class inherits from Serializable, it is not serializable by + default. This is to prevent accidental serialization of objects that should not + be serialized. + + Returns: + Whether the class is serializable. Default is False. + """ return False @classmethod @@ -111,6 +143,7 @@ def lc_attributes(self) -> Dict: """List of attribute names that should be included in the serialized kwargs. These attributes must be accepted by the constructor. + Default is an empty dictionary. """ return {} @@ -120,6 +153,8 @@ def lc_id(cls) -> List[str]: The unique identifier is a list of strings that describes the path to the object. + For example, for the class `langchain.llms.openai.OpenAI`, the id is + ["langchain", "llms", "openai", "OpenAI"]. """ return [*cls.get_lc_namespace(), cls.__name__] @@ -134,6 +169,11 @@ def __repr_args__(self) -> Any: ] def to_json(self) -> Union[SerializedConstructor, SerializedNotImplemented]: + """Serialize the object to JSON. + + Returns: + A json serializable object or a SerializedNotImplemented object. + """ if not self.is_lc_serializable(): return self.to_json_not_implemented() @@ -209,7 +249,10 @@ def _is_field_useful(inst: Serializable, key: str, value: Any) -> bool: value: The value. Returns: - Whether the field is useful. + Whether the field is useful. If the field is required, it is useful. + If the field is not required, it is useful if the value is not None. + If the field is not required and the value is None, it is useful if the + default value is different from the value. """ field = inst.__fields__.get(key) if not field: @@ -242,7 +285,7 @@ def to_json_not_implemented(obj: object) -> SerializedNotImplemented: """Serialize a "not implemented" object. Args: - obj: object to serialize + obj: object to serialize. Returns: SerializedNotImplemented diff --git a/libs/core/langchain_core/outputs/chat_generation.py b/libs/core/langchain_core/outputs/chat_generation.py index f62d8cca4af82..a2d0052012f9a 100644 --- a/libs/core/langchain_core/outputs/chat_generation.py +++ b/libs/core/langchain_core/outputs/chat_generation.py @@ -32,7 +32,17 @@ class ChatGeneration(Generation): @root_validator(pre=False, skip_on_failure=True) def set_text(cls, values: Dict[str, Any]) -> Dict[str, Any]: - """Set the text attribute to be the contents of the message.""" + """Set the text attribute to be the contents of the message. + + Args: + values: The values of the object. + + Returns: + The values of the object with the text attribute set. + + Raises: + ValueError: If the message is not a string or a list. 
+ """ try: text = "" if isinstance(values["message"].content, str): @@ -64,13 +74,11 @@ def get_lc_namespace(cls) -> List[str]: class ChatGenerationChunk(ChatGeneration): """ChatGeneration chunk, which can be concatenated with other - ChatGeneration chunks. - - Attributes: - message: The message chunk output by the chat model. + ChatGeneration chunks. """ message: BaseMessageChunk + """The message chunk output by the chat model.""" # Override type to be ChatGeneration, ignore mypy error as this is intentional type: Literal["ChatGenerationChunk"] = "ChatGenerationChunk" # type: ignore[assignment] """Type is used exclusively for serialization purposes.""" diff --git a/libs/core/langchain_core/outputs/generation.py b/libs/core/langchain_core/outputs/generation.py index bddf8929809ff..7dbb6896d875d 100644 --- a/libs/core/langchain_core/outputs/generation.py +++ b/libs/core/langchain_core/outputs/generation.py @@ -31,7 +31,8 @@ class Generation(Serializable): May include things like the reason for finishing or token log probabilities. """ type: Literal["Generation"] = "Generation" - """Type is used exclusively for serialization purposes.""" + """Type is used exclusively for serialization purposes. + Set to "Generation" for this class.""" @classmethod def is_lc_serializable(cls) -> bool: diff --git a/libs/core/langchain_core/outputs/llm_result.py b/libs/core/langchain_core/outputs/llm_result.py index aae6fa501d2e2..ce4b3cd1132b4 100644 --- a/libs/core/langchain_core/outputs/llm_result.py +++ b/libs/core/langchain_core/outputs/llm_result.py @@ -48,9 +48,9 @@ def flatten(self) -> List[LLMResult]: """Flatten generations into a single list. Unpack List[List[Generation]] -> List[LLMResult] where each returned LLMResult - contains only a single Generation. If token usage information is available, - it is kept only for the LLMResult corresponding to the top-choice - Generation, to avoid over-counting of token usage downstream. + contains only a single Generation. If token usage information is available, + it is kept only for the LLMResult corresponding to the top-choice + Generation, to avoid over-counting of token usage downstream. Returns: List of LLMResults where each returned LLMResult contains a single diff --git a/libs/core/langchain_core/outputs/run_info.py b/libs/core/langchain_core/outputs/run_info.py index f9f84032ef40f..a54e3b49cfc1c 100644 --- a/libs/core/langchain_core/outputs/run_info.py +++ b/libs/core/langchain_core/outputs/run_info.py @@ -8,7 +8,7 @@ class RunInfo(BaseModel): """Class that contains metadata for a single execution of a Chain or model. - Here for backwards compatibility with older versions of langchain_core. + Defined for backwards compatibility with older versions of langchain_core. This model will likely be deprecated in the future. 
diff --git a/libs/core/langchain_core/prompts/base.py b/libs/core/langchain_core/prompts/base.py index 147b61e6a191c..11ccffe17adca 100644 --- a/libs/core/langchain_core/prompts/base.py +++ b/libs/core/langchain_core/prompts/base.py @@ -43,7 +43,10 @@ class BasePromptTemplate( """Base class for all prompt templates, returning a prompt.""" input_variables: List[str] - """A list of the names of the variables the prompt template expects.""" + """A list of the names of the variables whose values are required as inputs to the + prompt.""" + optional_variables: List[str] = Field(default=[]) + """A list of the names of the variables that are optional in the prompt.""" input_types: Dict[str, Any] = Field(default_factory=dict) """A dictionary of the types of the variables the prompt template expects. If not provided, all variables are assumed to be strings.""" @@ -84,12 +87,14 @@ def validate_variable_names(cls, values: Dict) -> Dict: @classmethod def get_lc_namespace(cls) -> List[str]: - """Get the namespace of the langchain object.""" + """Get the namespace of the langchain object. + Returns ["langchain", "schema", "prompt_template"].""" return ["langchain", "schema", "prompt_template"] @classmethod def is_lc_serializable(cls) -> bool: - """Return whether this class is serializable.""" + """Return whether this class is serializable. + Returns True.""" return True class Config: @@ -99,15 +104,29 @@ class Config: @property def OutputType(self) -> Any: + """Return the output type of the prompt.""" return Union[StringPromptValue, ChatPromptValueConcrete] def get_input_schema( self, config: Optional[RunnableConfig] = None ) -> Type[BaseModel]: + """Get the input schema for the prompt. + + Args: + config: RunnableConfig, configuration for the prompt. + + Returns: + Type[BaseModel]: The input schema for the prompt. + """ # This is correct, but pydantic typings/mypy don't think so. - return create_model( # type: ignore[call-overload] - "PromptInput", - **{k: (self.input_types.get(k, str), None) for k in self.input_variables}, + required_input_variables = { + k: (self.input_types.get(k, str), ...) for k in self.input_variables + } + optional_input_variables = { + k: (self.input_types.get(k, str), None) for k in self.optional_variables + } + return create_model( + "PromptInput", **{**required_input_variables, **optional_input_variables} ) def _validate_input(self, inner_input: Dict) -> Dict: @@ -143,6 +162,15 @@ async def _aformat_prompt_with_error_handling( def invoke( self, input: Dict, config: Optional[RunnableConfig] = None ) -> PromptValue: + """Invoke the prompt. + + Args: + input: Dict, input to the prompt. + config: RunnableConfig, configuration for the prompt. + + Returns: + PromptValue: The output of the prompt. + """ config = ensure_config(config) if self.metadata: config["metadata"] = {**config["metadata"], **self.metadata} @@ -158,6 +186,15 @@ def invoke( async def ainvoke( self, input: Dict, config: Optional[RunnableConfig] = None, **kwargs: Any ) -> PromptValue: + """Async invoke the prompt. + + Args: + input: Dict, input to the prompt. + config: RunnableConfig, configuration for the prompt. + + Returns: + PromptValue: The output of the prompt. + """ config = ensure_config(config) if self.metadata: config["metadata"].update(self.metadata) @@ -172,14 +209,35 @@ async def ainvoke( @abstractmethod def format_prompt(self, **kwargs: Any) -> PromptValue: - """Create Prompt Value.""" + """Create Prompt Value. + + Args: + kwargs: Any arguments to be passed to the prompt template. 
+ + Returns: + PromptValue: The output of the prompt. + """ async def aformat_prompt(self, **kwargs: Any) -> PromptValue: - """Create Prompt Value.""" + """Async create Prompt Value. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + PromptValue: The output of the prompt. + """ return self.format_prompt(**kwargs) def partial(self, **kwargs: Union[str, Callable[[], str]]) -> BasePromptTemplate: - """Return a partial of the prompt template.""" + """Return a partial of the prompt template. + + Args: + kwargs: Union[str, Callable[[], str], partial variables to set. + + Returns: + BasePromptTemplate: A partial of the prompt template. + """ prompt_dict = self.__dict__.copy() prompt_dict["input_variables"] = list( set(self.input_variables).difference(kwargs) @@ -212,7 +270,7 @@ def format(self, **kwargs: Any) -> FormatOutputType: """ async def aformat(self, **kwargs: Any) -> FormatOutputType: - """Format the prompt with the inputs. + """Async format the prompt with the inputs. Args: kwargs: Any arguments to be passed to the prompt template. @@ -234,7 +292,17 @@ def _prompt_type(self) -> str: raise NotImplementedError def dict(self, **kwargs: Any) -> Dict: - """Return dictionary representation of prompt.""" + """Return dictionary representation of prompt. + + Args: + kwargs: Any additional arguments to pass to the dictionary. + + Returns: + Dict: Dictionary representation of the prompt. + + Raises: + NotImplementedError: If the prompt type is not implemented. + """ prompt_dict = super().dict(**kwargs) try: prompt_dict["_type"] = self._prompt_type @@ -248,6 +316,11 @@ def save(self, file_path: Union[Path, str]) -> None: Args: file_path: Path to directory to save prompt to. + Raises: + ValueError: If the prompt has partial variables. + ValueError: If the file path is not json or yaml. + NotImplementedError: If the prompt type is not implemented. + Example: .. code-block:: python @@ -300,7 +373,7 @@ def format_document(doc: Document, prompt: BasePromptTemplate[str]) -> str: First, this pulls information from the document from two sources: - 1. `page_content`: + 1. page_content: This takes the information from the `document.page_content` and assigns it to a variable named `page_content`. 2. metadata: @@ -333,11 +406,11 @@ def format_document(doc: Document, prompt: BasePromptTemplate[str]) -> str: async def aformat_document(doc: Document, prompt: BasePromptTemplate[str]) -> str: - """Format a document into a string based on a prompt template. + """Async format a document into a string based on a prompt template. First, this pulls information from the document from two sources: - 1. `page_content`: + 1. page_content: This takes the information from the `document.page_content` and assigns it to a variable named `page_content`. 2. metadata: diff --git a/libs/core/langchain_core/prompts/chat.py b/libs/core/langchain_core/prompts/chat.py index 9d27aedce8d88..82b1478d68740 100644 --- a/libs/core/langchain_core/prompts/chat.py +++ b/libs/core/langchain_core/prompts/chat.py @@ -48,7 +48,8 @@ class BaseMessagePromptTemplate(Serializable, ABC): @classmethod def is_lc_serializable(cls) -> bool: - """Return whether or not the class is serializable.""" + """Return whether or not the class is serializable. + Returns: True""" return True @classmethod @@ -68,7 +69,8 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: """ async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: - """Format messages from kwargs. Should return a list of BaseMessages. 
+ """Async format messages from kwargs. + Should return a list of BaseMessages. Args: **kwargs: Keyword arguments to use for formatting. @@ -88,10 +90,18 @@ def input_variables(self) -> List[str]: """ def pretty_repr(self, html: bool = False) -> str: - """Human-readable representation.""" + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. + """ raise NotImplementedError def pretty_print(self) -> None: + """Print a human-readable representation.""" print(self.pretty_repr(html=is_interactive_env())) # noqa: T201 def __add__(self, other: Any) -> ChatPromptTemplate: @@ -208,6 +218,9 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: Returns: List of BaseMessage. + + Raises: + ValueError: If variable is not a list of messages. """ value = ( kwargs.get(self.variable_name, []) @@ -234,6 +247,14 @@ def input_variables(self) -> List[str]: return [self.variable_name] if not self.optional else [] def pretty_repr(self, html: bool = False) -> str: + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. + """ var = "{" + self.variable_name + "}" if html: title = get_msg_title_repr("Messages Placeholder", bold=True) @@ -274,12 +295,13 @@ def from_template( Args: template: a template. - template_format: format of the template. + template_format: format of the template. Defaults to "f-string". partial_variables: A dictionary of variables that can be used to partially fill in the template. For example, if the template is `"{variable1} {variable2}"`, and `partial_variables` is `{"variable1": "foo"}`, then the final prompt will be `"foo {variable2}"`. + Defaults to None. **kwargs: keyword arguments to pass to the constructor. Returns: @@ -324,7 +346,7 @@ def format(self, **kwargs: Any) -> BaseMessage: """ async def aformat(self, **kwargs: Any) -> BaseMessage: - """Format the prompt template. + """Async format the prompt template. Args: **kwargs: Keyword arguments to use for formatting. @@ -346,6 +368,14 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: return [self.format(**kwargs)] async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: + """Async format messages from kwargs. + + Args: + **kwargs: Keyword arguments to use for formatting. + + Returns: + List of BaseMessages. + """ return [await self.aformat(**kwargs)] @property @@ -359,6 +389,14 @@ def input_variables(self) -> List[str]: return self.prompt.input_variables def pretty_repr(self, html: bool = False) -> str: + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. + """ # TODO: Handle partials title = self.__class__.__name__.replace("MessagePromptTemplate", " Message") title = get_msg_title_repr(title, bold=html) @@ -391,6 +429,14 @@ def format(self, **kwargs: Any) -> BaseMessage: ) async def aformat(self, **kwargs: Any) -> BaseMessage: + """Async format the prompt template. + + Args: + **kwargs: Keyword arguments to use for formatting. + + Returns: + Formatted message. + """ text = await self.prompt.aformat(**kwargs) return ChatMessage( content=text, role=self.role, additional_kwargs=self.additional_kwargs @@ -440,12 +486,16 @@ def from_template( Args: template: a template. - template_format: format of the template. + template_format: format of the template. Defaults to "f-string". 
partial_variables: A dictionary of variables that can be used too partially. + Defaults to None. **kwargs: keyword arguments to pass to the constructor. Returns: A new instance of this class. + + Raises: + ValueError: If the template is not a string or list of strings. """ if isinstance(template, str): prompt: Union[StringPromptTemplate, List] = PromptTemplate.from_template( @@ -542,6 +592,14 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: return [self.format(**kwargs)] async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: + """Async format messages from kwargs. + + Args: + **kwargs: Keyword arguments to use for formatting. + + Returns: + List of BaseMessages. + """ return [await self.aformat(**kwargs)] @property @@ -585,7 +643,7 @@ def format(self, **kwargs: Any) -> BaseMessage: ) async def aformat(self, **kwargs: Any) -> BaseMessage: - """Format the prompt template. + """Async format the prompt template. Args: **kwargs: Keyword arguments to use for formatting. @@ -613,6 +671,14 @@ async def aformat(self, **kwargs: Any) -> BaseMessage: ) def pretty_repr(self, html: bool = False) -> str: + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. + """ # TODO: Handle partials title = self.__class__.__name__.replace("MessagePromptTemplate", " Message") title = get_msg_title_repr(title, bold=html) @@ -671,25 +737,25 @@ def format(self, **kwargs: Any) -> str: in all the template messages in this chat template. Returns: - formatted string + formatted string. """ return self.format_prompt(**kwargs).to_string() async def aformat(self, **kwargs: Any) -> str: - """Format the chat template into a string. + """Async format the chat template into a string. Args: **kwargs: keyword arguments to use for filling in template variables in all the template messages in this chat template. Returns: - formatted string + formatted string. """ return (await self.aformat_prompt(**kwargs)).to_string() def format_prompt(self, **kwargs: Any) -> PromptValue: - """ - Format prompt. Should return a PromptValue. + """Format prompt. Should return a PromptValue. + Args: **kwargs: Keyword arguments to use for formatting. @@ -700,6 +766,14 @@ def format_prompt(self, **kwargs: Any) -> PromptValue: return ChatPromptValue(messages=messages) async def aformat_prompt(self, **kwargs: Any) -> PromptValue: + """Async format prompt. Should return a PromptValue. + + Args: + **kwargs: Keyword arguments to use for formatting. + + Returns: + PromptValue. + """ messages = await self.aformat_messages(**kwargs) return ChatPromptValue(messages=messages) @@ -708,14 +782,22 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: """Format kwargs into a list of messages.""" async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: - """Format kwargs into a list of messages.""" + """Async format kwargs into a list of messages.""" return self.format_messages(**kwargs) def pretty_repr(self, html: bool = False) -> str: - """Human-readable representation.""" + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. 
+ """ raise NotImplementedError def pretty_print(self) -> None: + """Print a human-readable representation.""" print(self.pretty_repr(html=is_interactive_env())) # noqa: T201 @@ -834,8 +916,6 @@ class ChatPromptTemplate(BaseChatPromptTemplate): """ # noqa: E501 - input_variables: List[str] - """List of input variables in template messages. Used for validation.""" messages: List[MessageLike] """List of messages consisting of either message prompt templates or messages.""" validate_template: bool = False @@ -883,18 +963,32 @@ def validate_input_variables(cls, values: dict) -> dict: Returns: Validated values. + + Raises: + ValueError: If input variables do not match. """ messages = values["messages"] input_vars = set() + optional_variables = set() input_types: Dict[str, Any] = values.get("input_types", {}) for message in messages: if isinstance(message, (BaseMessagePromptTemplate, BaseChatPromptTemplate)): input_vars.update(message.input_variables) if isinstance(message, MessagesPlaceholder): + if "partial_variables" not in values: + values["partial_variables"] = {} + if ( + message.optional + and message.variable_name not in values["partial_variables"] + ): + values["partial_variables"][message.variable_name] = [] + optional_variables.add(message.variable_name) if message.variable_name not in input_types: input_types[message.variable_name] = List[AnyMessage] if "partial_variables" in values: input_vars = input_vars - set(values["partial_variables"]) + if optional_variables: + input_vars = input_vars - optional_variables if "input_variables" in values and values.get("validate_template"): if input_vars != set(values["input_variables"]): raise ValueError( @@ -904,6 +998,8 @@ def validate_input_variables(cls, values: dict) -> dict: ) else: values["input_variables"] = sorted(input_vars) + if optional_variables: + values["optional_variables"] = sorted(optional_variables) values["input_types"] = input_types return values @@ -936,7 +1032,7 @@ def from_role_strings( string_messages: list of (role, template) tuples. Returns: - a chat prompt template + a chat prompt template. """ return cls( # type: ignore[call-arg] messages=[ @@ -956,7 +1052,7 @@ def from_strings( string_messages: list of (role class, template) tuples. Returns: - a chat prompt template + a chat prompt template. """ return cls.from_messages(string_messages) @@ -995,10 +1091,11 @@ def from_messages( (1) BaseMessagePromptTemplate, (2) BaseMessage, (3) 2-tuple of (message type, template); e.g., ("human", "{user_input}"), (4) 2-tuple of (message class, template), (4) a string which is - shorthand for ("human", template); e.g., "{user_input}" + shorthand for ("human", template); e.g., "{user_input}". + template_format: format of the template. Defaults to "f-string". Returns: - a chat prompt template + a chat prompt template. 
""" _messages = [ _convert_to_message(message, template_format) for message in messages @@ -1006,10 +1103,12 @@ def from_messages( # Automatically infer input variables from messages input_vars: Set[str] = set() + optional_variables: Set[str] = set() partial_vars: Dict[str, Any] = {} for _message in _messages: if isinstance(_message, MessagesPlaceholder) and _message.optional: partial_vars[_message.variable_name] = [] + optional_variables.add(_message.variable_name) elif isinstance( _message, (BaseChatPromptTemplate, BaseMessagePromptTemplate) ): @@ -1017,6 +1116,7 @@ def from_messages( return cls( input_variables=sorted(input_vars), + optional_variables=sorted(optional_variables), messages=_messages, partial_variables=partial_vars, ) @@ -1029,7 +1129,7 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: in all the template messages in this chat template. Returns: - list of formatted messages + list of formatted messages. """ kwargs = self._merge_partial_and_user_variables(**kwargs) result = [] @@ -1046,14 +1146,17 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: return result async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: - """Format the chat template into a list of finalized messages. + """Async format the chat template into a list of finalized messages. Args: **kwargs: keyword arguments to use for filling in template variables in all the template messages in this chat template. Returns: - list of formatted messages + list of formatted messages. + + Raises: + ValueError: If unexpected input. """ kwargs = self._merge_partial_and_user_variables(**kwargs) result = [] @@ -1106,7 +1209,7 @@ def partial(self, **kwargs: Any) -> ChatPromptTemplate: return type(self)(**prompt_dict) def append(self, message: MessageLikeRepresentation) -> None: - """Append message to the end of the chat template. + """Append a message to the end of the chat template. Args: message: representation of a message to append. @@ -1114,7 +1217,11 @@ def append(self, message: MessageLikeRepresentation) -> None: self.messages.append(_convert_to_message(message)) def extend(self, messages: Sequence[MessageLikeRepresentation]) -> None: - """Extend the chat template with a sequence of messages.""" + """Extend the chat template with a sequence of messages. + + Args: + messages: sequence of message representations to append. + """ self.messages.extend([_convert_to_message(message) for message in messages]) @overload @@ -1140,7 +1247,7 @@ def __len__(self) -> int: @property def _prompt_type(self) -> str: - """Name of prompt type.""" + """Name of prompt type. Used for serialization.""" return "chat" def save(self, file_path: Union[Path, str]) -> None: @@ -1152,6 +1259,14 @@ def save(self, file_path: Union[Path, str]) -> None: raise NotImplementedError() def pretty_repr(self, html: bool = False) -> str: + """Human-readable representation. + + Args: + html: Whether to format as HTML. Defaults to False. + + Returns: + Human-readable representation. + """ # TODO: handle partials return "\n\n".join(msg.pretty_repr(html=html) for msg in self.messages) @@ -1166,9 +1281,13 @@ def _create_template_from_message_type( Args: message_type: str the type of the message template (e.g., "human", "ai", etc.) template: str the template string. + template_format: format of the template. Defaults to "f-string". Returns: a message prompt template of the appropriate type. + + Raises: + ValueError: If unexpected message type. 
""" if message_type in ("human", "user"): message: BaseMessagePromptTemplate = HumanMessagePromptTemplate.from_template( @@ -1235,10 +1354,15 @@ def _convert_to_message( - string: shorthand for ("human", template); e.g., "{user_input}" Args: - message: a representation of a message in one of the supported formats + message: a representation of a message in one of the supported formats. + template_format: format of the template. Defaults to "f-string". Returns: - an instance of a message or a message template + an instance of a message or a message template. + + Raises: + ValueError: If unexpected message type. + ValueError: If 2-tuple does not have 2 elements. """ if isinstance(message, (BaseMessagePromptTemplate, BaseChatPromptTemplate)): _message: Union[ diff --git a/libs/core/langchain_core/prompts/few_shot.py b/libs/core/langchain_core/prompts/few_shot.py index 800ee30654408..efd3fd1417f82 100644 --- a/libs/core/langchain_core/prompts/few_shot.py +++ b/libs/core/langchain_core/prompts/few_shot.py @@ -18,7 +18,7 @@ check_valid_template, get_template_variables, ) -from langchain_core.pydantic_v1 import BaseModel, Extra, Field, root_validator +from langchain_core.pydantic_v1 import BaseModel, Extra, root_validator class _FewShotPromptTemplateMixin(BaseModel): @@ -40,7 +40,18 @@ class Config: @root_validator(pre=True) def check_examples_and_selector(cls, values: Dict) -> Dict: - """Check that one and only one of examples/example_selector are provided.""" + """Check that one and only one of examples/example_selector are provided. + + Args: + values: The values to check. + + Returns: + The values if they are valid. + + Raises: + ValueError: If neither or both examples and example_selector are provided. + ValueError: If both examples and example_selector are provided. + """ examples = values.get("examples", None) example_selector = values.get("example_selector", None) if examples and example_selector: @@ -63,6 +74,9 @@ def _get_examples(self, **kwargs: Any) -> List[dict]: Returns: List of examples. + + Raises: + ValueError: If neither examples nor example_selector are provided. """ if self.examples is not None: return self.examples @@ -74,13 +88,16 @@ def _get_examples(self, **kwargs: Any) -> List[dict]: ) async def _aget_examples(self, **kwargs: Any) -> List[dict]: - """Get the examples to use for formatting the prompt. + """Async get the examples to use for formatting the prompt. Args: **kwargs: Keyword arguments to be passed to the example selector. Returns: List of examples. + + Raises: + ValueError: If neither examples nor example_selector are provided. """ if self.examples is not None: return self.examples @@ -103,9 +120,6 @@ def is_lc_serializable(cls) -> bool: validate_template: bool = False """Whether or not to try validating the template.""" - input_variables: List[str] - """A list of the names of the variables the prompt template expects.""" - example_prompt: PromptTemplate """PromptTemplate used to format an individual example.""" @@ -147,6 +161,16 @@ class Config: arbitrary_types_allowed = True def format(self, **kwargs: Any) -> str: + """Format the prompt with inputs generating a string. + + Use this method to generate a string representation of a prompt. + + Args: + **kwargs: keyword arguments to use for formatting. + + Returns: + A string representation of the prompt. + """ kwargs = self._merge_partial_and_user_variables(**kwargs) # Get the examples to use. 
examples = self._get_examples(**kwargs) @@ -165,6 +189,16 @@ def format(self, **kwargs: Any) -> str: return DEFAULT_FORMATTER_MAPPING[self.template_format](template, **kwargs) async def aformat(self, **kwargs: Any) -> str: + """Async format the prompt with inputs generating a string. + + Use this method to generate a string representation of a prompt. + + Args: + **kwargs: keyword arguments to use for formatting. + + Returns: + A string representation of the prompt. + """ kwargs = self._merge_partial_and_user_variables(**kwargs) # Get the examples to use. examples = await self._aget_examples(**kwargs) @@ -188,6 +222,14 @@ def _prompt_type(self) -> str: return "few_shot" def save(self, file_path: Union[Path, str]) -> None: + """Save the prompt template to a file. + + Args: + file_path: The path to save the prompt template to. + + Raises: + ValueError: If example_selector is provided. + """ if self.example_selector: raise ValueError("Saving an example selector is not currently supported") return super().save(file_path) @@ -314,9 +356,6 @@ def is_lc_serializable(cls) -> bool: """Return whether or not the class is serializable.""" return False - input_variables: List[str] = Field(default_factory=list) - """A list of the names of the variables the prompt template will use - to pass to the example_selector, if provided.""" example_prompt: Union[BaseMessagePromptTemplate, BaseChatPromptTemplate] """The class to format each example.""" @@ -349,7 +388,7 @@ def format_messages(self, **kwargs: Any) -> List[BaseMessage]: return messages async def aformat_messages(self, **kwargs: Any) -> List[BaseMessage]: - """Format kwargs into a list of messages. + """Async format kwargs into a list of messages. Args: **kwargs: keyword arguments to use for filling in templates in messages. @@ -376,7 +415,7 @@ def format(self, **kwargs: Any) -> str: Use this method to generate a string representation of a prompt consisting of chat messages. - Useful for feeding into a string based completion language model or debugging. + Useful for feeding into a string-based completion language model or debugging. Args: **kwargs: keyword arguments to use for formatting. @@ -388,8 +427,29 @@ def format(self, **kwargs: Any) -> str: return get_buffer_string(messages) async def aformat(self, **kwargs: Any) -> str: + """Async format the prompt with inputs generating a string. + + Use this method to generate a string representation of a prompt consisting + of chat messages. + + Useful for feeding into a string-based completion language model or debugging. + + Args: + **kwargs: keyword arguments to use for formatting. + + Returns: + A string representation of the prompt + """ messages = await self.aformat_messages(**kwargs) return get_buffer_string(messages) def pretty_repr(self, html: bool = False) -> str: + """Return a pretty representation of the prompt template. + + Args: + html: Whether or not to return an HTML formatted string. + + Returns: + A pretty representation of the prompt template. 
+ """ raise NotImplementedError() diff --git a/libs/core/langchain_core/prompts/few_shot_with_templates.py b/libs/core/langchain_core/prompts/few_shot_with_templates.py index ef2feb0b9ad4d..8e5db1a81fa39 100644 --- a/libs/core/langchain_core/prompts/few_shot_with_templates.py +++ b/libs/core/langchain_core/prompts/few_shot_with_templates.py @@ -28,9 +28,6 @@ class FewShotPromptWithTemplates(StringPromptTemplate): suffix: StringPromptTemplate """A PromptTemplate to put after the examples.""" - input_variables: List[str] - """A list of the names of the variables the prompt template expects.""" - example_separator: str = "\n\n" """String separator used to join the prefix, the examples, and suffix.""" @@ -159,6 +156,14 @@ def format(self, **kwargs: Any) -> str: return DEFAULT_FORMATTER_MAPPING[self.template_format](template, **kwargs) async def aformat(self, **kwargs: Any) -> str: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ kwargs = self._merge_partial_and_user_variables(**kwargs) # Get the examples to use. examples = await self._aget_examples(**kwargs) @@ -200,6 +205,14 @@ def _prompt_type(self) -> str: return "few_shot_with_templates" def save(self, file_path: Union[Path, str]) -> None: + """Save the prompt to a file. + + Args: + file_path: The path to save the prompt to. + + Raises: + ValueError: If example_selector is provided. + """ if self.example_selector: raise ValueError("Saving an example selector is not currently supported") return super().save(file_path) diff --git a/libs/core/langchain_core/prompts/image.py b/libs/core/langchain_core/prompts/image.py index 09d63db65db8d..cc0bebfc5b4d1 100644 --- a/libs/core/langchain_core/prompts/image.py +++ b/libs/core/langchain_core/prompts/image.py @@ -37,9 +37,25 @@ def get_lc_namespace(cls) -> List[str]: return ["langchain", "prompts", "image"] def format_prompt(self, **kwargs: Any) -> PromptValue: + """Format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return ImagePromptValue(image_url=self.format(**kwargs)) async def aformat_prompt(self, **kwargs: Any) -> PromptValue: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return ImagePromptValue(image_url=await self.aformat(**kwargs)) def format( @@ -54,6 +70,10 @@ def format( Returns: A formatted string. + Raises: + ValueError: If the url or path is not provided. + ValueError: If the path or url is not a string. + Example: .. code-block:: python @@ -84,7 +104,27 @@ def format( return output async def aformat(self, **kwargs: Any) -> ImageURL: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + + Raises: + ValueError: If the url or path is not provided. + ValueError: If the path or url is not a string. + """ return await run_in_executor(None, self.format, **kwargs) def pretty_repr(self, html: bool = False) -> str: + """Return a pretty representation of the prompt. + + Args: + html: Whether to return an html formatted string. + + Returns: + A pretty representation of the prompt. 
+ """ raise NotImplementedError() diff --git a/libs/core/langchain_core/prompts/loading.py b/libs/core/langchain_core/prompts/loading.py index 0b554a90dca66..3e3f73374ae2a 100644 --- a/libs/core/langchain_core/prompts/loading.py +++ b/libs/core/langchain_core/prompts/loading.py @@ -18,7 +18,17 @@ def load_prompt_from_config(config: dict) -> BasePromptTemplate: - """Load prompt from Config Dict.""" + """Load prompt from Config Dict. + + Args: + config: Dict containing the prompt configuration. + + Returns: + A PromptTemplate object. + + Raises: + ValueError: If the prompt type is not supported. + """ if "_type" not in config: logger.warning("No `_type` key found, defaulting to `prompt`.") config_type = config.pop("_type", "prompt") @@ -128,7 +138,18 @@ def _load_prompt(config: dict) -> PromptTemplate: def load_prompt( path: Union[str, Path], encoding: Optional[str] = None ) -> BasePromptTemplate: - """Unified method for loading a prompt from LangChainHub or local fs.""" + """Unified method for loading a prompt from LangChainHub or local fs. + + Args: + path: Path to the prompt file. + encoding: Encoding of the file. Defaults to None. + + Returns: + A PromptTemplate object. + + Raises: + RuntimeError: If the path is a Lang Chain Hub path. + """ if isinstance(path, str) and path.startswith("lc://"): raise RuntimeError( "Loading from the deprecated github-based Hub is no longer supported. " diff --git a/libs/core/langchain_core/prompts/pipeline.py b/libs/core/langchain_core/prompts/pipeline.py index f89c341d2f319..49d3c9664343c 100644 --- a/libs/core/langchain_core/prompts/pipeline.py +++ b/libs/core/langchain_core/prompts/pipeline.py @@ -14,13 +14,14 @@ class PipelinePromptTemplate(BasePromptTemplate): """Prompt template for composing multiple prompt templates together. This can be useful when you want to reuse parts of prompts. + A PipelinePrompt consists of two main parts: - final_prompt: This is the final prompt that is returned - pipeline_prompts: This is a list of tuples, consisting - of a string (`name`) and a Prompt Template. - Each PromptTemplate will be formatted and then passed - to future prompt templates as a variable with - the same name as `name` + of a string (`name`) and a Prompt Template. + Each PromptTemplate will be formatted and then passed + to future prompt templates as a variable with + the same name as `name` """ final_prompt: BasePromptTemplate @@ -45,6 +46,14 @@ def get_input_variables(cls, values: Dict) -> Dict: return values def format_prompt(self, **kwargs: Any) -> PromptValue: + """Format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ for k, prompt in self.pipeline_prompts: _inputs = _get_inputs(kwargs, prompt.input_variables) if isinstance(prompt, BaseChatPromptTemplate): @@ -55,6 +64,14 @@ def format_prompt(self, **kwargs: Any) -> PromptValue: return self.final_prompt.format_prompt(**_inputs) async def aformat_prompt(self, **kwargs: Any) -> PromptValue: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ for k, prompt in self.pipeline_prompts: _inputs = _get_inputs(kwargs, prompt.input_variables) if isinstance(prompt, BaseChatPromptTemplate): @@ -65,9 +82,25 @@ async def aformat_prompt(self, **kwargs: Any) -> PromptValue: return await self.final_prompt.aformat_prompt(**_inputs) def format(self, **kwargs: Any) -> str: + """Format the prompt with the inputs. 
+ + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return self.format_prompt(**kwargs).to_string() async def aformat(self, **kwargs: Any) -> str: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return (await self.aformat_prompt(**kwargs)).to_string() @property diff --git a/libs/core/langchain_core/prompts/prompt.py b/libs/core/langchain_core/prompts/prompt.py index f4b541585cc34..ef4084e83c1c5 100644 --- a/libs/core/langchain_core/prompts/prompt.py +++ b/libs/core/langchain_core/prompts/prompt.py @@ -25,7 +25,8 @@ class PromptTemplate(StringPromptTemplate): The template can be formatted using either f-strings (default) or jinja2 syntax. - *Security warning*: Prefer using `template_format="f-string"` instead of + *Security warning*: + Prefer using `template_format="f-string"` instead of `template_format="jinja2"`, or make sure to NEVER accept jinja2 templates from untrusted sources as they may lead to arbitrary Python code execution. @@ -62,9 +63,6 @@ def get_lc_namespace(cls) -> List[str]: """Get the namespace of the langchain object.""" return ["langchain", "prompts", "prompt"] - input_variables: List[str] - """A list of the names of the variables the prompt template expects.""" - template: str """The prompt template.""" @@ -113,6 +111,14 @@ def pre_init_validation(cls, values: Dict) -> Dict: return values def get_input_schema(self, config: RunnableConfig | None = None) -> type[BaseModel]: + """Get the input schema for the prompt. + + Args: + config: The runnable configuration. + + Returns: + The input schema for the prompt. + """ if self.template_format != "mustache": return super().get_input_schema(config) @@ -161,6 +167,14 @@ def _prompt_type(self) -> str: return "prompt" def format(self, **kwargs: Any) -> str: + """Format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ kwargs = self._merge_partial_and_user_variables(**kwargs) return DEFAULT_FORMATTER_MAPPING[self.template_format](self.template, **kwargs) @@ -207,7 +221,7 @@ def from_file( Args: template_file: The path to the file containing the prompt template. input_variables: [DEPRECATED] A list of variable names the final prompt - template will expect. + template will expect. Defaults to None. input_variables is ignored as from_file now delegates to from_template(). @@ -233,7 +247,8 @@ def from_template( ) -> PromptTemplate: """Load a prompt template from a template. - *Security warning*: Prefer using `template_format="f-string"` instead of + *Security warning*: + Prefer using `template_format="f-string"` instead of `template_format="jinja2"`, or make sure to NEVER accept jinja2 templates from untrusted sources as they may lead to arbitrary Python code execution. @@ -242,18 +257,20 @@ def from_template( be treated as a best-effort approach rather than a guarantee of security, as it is an opt-out rather than opt-in approach. - Despite the sand-boxing, we recommend to never use jinja2 templates + Despite the sand-boxing, we recommend never using jinja2 templates from untrusted sources. Args: template: The template to load. template_format: The format of the template. Use `jinja2` for jinja2, and `f-string` or None for f-strings. + Defaults to `f-string`. partial_variables: A dictionary of variables that can be used to partially fill in the template. 
For example, if the template is `"{variable1} {variable2}"`, and `partial_variables` is `{"variable1": "foo"}`, then the final prompt will be - `"foo {variable2}"`. + `"foo {variable2}"`. Defaults to None. + kwargs: Any other arguments to pass to the prompt template. Returns: The prompt template loaded from the template. diff --git a/libs/core/langchain_core/prompts/string.py b/libs/core/langchain_core/prompts/string.py index 5f5104a20ba40..5afddd3acec49 100644 --- a/libs/core/langchain_core/prompts/string.py +++ b/libs/core/langchain_core/prompts/string.py @@ -19,13 +19,24 @@ def jinja2_formatter(template: str, **kwargs: Any) -> str: """Format a template using jinja2. - *Security warning*: As of LangChain 0.0.329, this method uses Jinja2's + *Security warning*: + As of LangChain 0.0.329, this method uses Jinja2's SandboxedEnvironment by default. However, this sand-boxing should be treated as a best-effort approach rather than a guarantee of security. Do not accept jinja2 templates from untrusted sources as they may lead to arbitrary Python code execution. https://jinja.palletsprojects.com/en/3.1.x/sandbox/ + + Args: + template: The template string. + **kwargs: The variables to format the template with. + + Returns: + The formatted string. + + Raises: + ImportError: If jinja2 is not installed. """ try: from jinja2.sandbox import SandboxedEnvironment @@ -88,14 +99,29 @@ def _get_jinja2_variables_from_template(template: str) -> Set[str]: def mustache_formatter(template: str, **kwargs: Any) -> str: - """Format a template using mustache.""" + """Format a template using mustache. + + Args: + template: The template string. + **kwargs: The variables to format the template with. + + Returns: + The formatted string. + """ return mustache.render(template, kwargs) def mustache_template_vars( template: str, ) -> Set[str]: - """Get the variables from a mustache template.""" + """Get the variables from a mustache template. + + Args: + template: The template string. + + Returns: + The variables from the template. + """ vars: Set[str] = set() section_depth = 0 for type, key in mustache.tokenize(template): @@ -118,7 +144,14 @@ def mustache_template_vars( def mustache_schema( template: str, ) -> Type[BaseModel]: - """Get the variables from a mustache template.""" + """Get the variables from a mustache template. + + Args: + template: The template string. + + Returns: + The variables from the template as a Pydantic model. + """ fields = {} prefix: Tuple[str, ...] = () section_stack: List[Tuple[str, ...]] = [] @@ -178,6 +211,7 @@ def check_valid_template( Raises: ValueError: If the template format is not supported. + ValueError: If the prompt schema is invalid. """ try: validator_func = DEFAULT_VALIDATOR_MAPPING[template_format] @@ -232,12 +266,36 @@ def get_lc_namespace(cls) -> List[str]: return ["langchain", "prompts", "base"] def format_prompt(self, **kwargs: Any) -> PromptValue: + """Format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return StringPromptValue(text=self.format(**kwargs)) async def aformat_prompt(self, **kwargs: Any) -> PromptValue: + """Async format the prompt with the inputs. + + Args: + kwargs: Any arguments to be passed to the prompt template. + + Returns: + A formatted string. + """ return StringPromptValue(text=await self.aformat(**kwargs)) def pretty_repr(self, html: bool = False) -> str: + """Get a pretty representation of the prompt. 
+ + Args: + html: Whether to return an HTML-formatted string. + + Returns: + A pretty representation of the prompt. + """ # TODO: handle partials dummy_vars = { input_var: "{" + f"{input_var}" + "}" for input_var in self.input_variables @@ -249,4 +307,5 @@ def pretty_repr(self, html: bool = False) -> str: return self.format(**dummy_vars) def pretty_print(self) -> None: + """Print a pretty representation of the prompt.""" print(self.pretty_repr(html=is_interactive_env())) # noqa: T201 diff --git a/libs/core/langchain_core/prompts/structured.py b/libs/core/langchain_core/prompts/structured.py index 82388bccc9141..9dbecc02857cf 100644 --- a/libs/core/langchain_core/prompts/structured.py +++ b/libs/core/langchain_core/prompts/structured.py @@ -138,6 +138,19 @@ def pipe( *others: Union[Runnable[Any, Other], Callable[[Any], Other]], name: Optional[str] = None, ) -> RunnableSerializable[Dict, Other]: + """Pipe the structured prompt to a language model. + + Args: + others: The language model to pipe the structured prompt to. + name: The name of the pipeline. Defaults to None. + + Returns: + A RunnableSequence object. + + Raises: + NotImplementedError: If the first element of `others` + is not a language model. + """ if ( others and isinstance(others[0], BaseLanguageModel) diff --git a/libs/core/langchain_core/runnables/graph.py b/libs/core/langchain_core/runnables/graph.py index 6cbe0bdc261ea..31f3cfcb1b6c5 100644 --- a/libs/core/langchain_core/runnables/graph.py +++ b/libs/core/langchain_core/runnables/graph.py @@ -246,7 +246,7 @@ def add_node( return node def remove_node(self, node: Node) -> None: - """Remove a node from the graphm and all edges connected to it.""" + """Remove a node from the graph and all edges connected to it.""" self.nodes.pop(node.id) self.edges = [ edge diff --git a/libs/core/langchain_core/utils/__init__.py b/libs/core/langchain_core/utils/__init__.py index 92f919bac399b..80fbf680f3ed9 100644 --- a/libs/core/langchain_core/utils/__init__.py +++ b/libs/core/langchain_core/utils/__init__.py @@ -5,6 +5,7 @@ """ from langchain_core.utils import image +from langchain_core.utils.aiter import abatch_iterate from langchain_core.utils.env import get_from_dict_or_env, get_from_env from langchain_core.utils.formatting import StrictFormatter, formatter from langchain_core.utils.input import ( @@ -13,6 +14,7 @@ get_colored_text, print_text, ) +from langchain_core.utils.iter import batch_iterate from langchain_core.utils.loading import try_load_from_hub from langchain_core.utils.strings import comma_list, stringify_dict, stringify_value from langchain_core.utils.utils import ( @@ -48,4 +50,6 @@ "stringify_dict", "comma_list", "stringify_value", + "batch_iterate", + "abatch_iterate", ] diff --git a/libs/core/langchain_core/utils/aiter.py b/libs/core/langchain_core/utils/aiter.py index a75de1bae4cf8..eb55079e601ba 100644 --- a/libs/core/langchain_core/utils/aiter.py +++ b/libs/core/langchain_core/utils/aiter.py @@ -11,6 +11,7 @@ Any, AsyncContextManager, AsyncGenerator, + AsyncIterable, AsyncIterator, Awaitable, Callable, @@ -245,3 +246,28 @@ async def __aexit__( ) -> None: if hasattr(self.thing, "aclose"): await self.thing.aclose() + + +async def abatch_iterate( + size: int, iterable: AsyncIterable[T] +) -> AsyncIterator[List[T]]: + """Utility batching function for async iterables. + + Args: + size: The size of the batch. + iterable: The async iterable to batch. 
+
+    Returns:
+        An async iterator over the batches.
+    """
+    batch: List[T] = []
+    async for element in iterable:
+        if len(batch) < size:
+            batch.append(element)
+
+        if len(batch) >= size:
+            yield batch
+            batch = []
+
+    if batch:
+        yield batch
diff --git a/libs/core/langchain_core/vectorstores.py b/libs/core/langchain_core/vectorstores.py
index 428e2981ecd93..273b98de333eb 100644
--- a/libs/core/langchain_core/vectorstores.py
+++ b/libs/core/langchain_core/vectorstores.py
@@ -25,26 +25,34 @@
 import math
 import warnings
 from abc import ABC, abstractmethod
+from itertools import cycle
 from typing import (
     TYPE_CHECKING,
     Any,
+    AsyncIterable,
+    AsyncIterator,
     Callable,
     ClassVar,
     Collection,
     Dict,
     Iterable,
+    Iterator,
     List,
     Optional,
     Sequence,
     Tuple,
     Type,
     TypeVar,
+    Union,
 )

+from langchain_core._api import beta
 from langchain_core.embeddings import Embeddings
 from langchain_core.pydantic_v1 import Field, root_validator
 from langchain_core.retrievers import BaseRetriever
 from langchain_core.runnables.config import run_in_executor
+from langchain_core.utils.aiter import abatch_iterate
+from langchain_core.utils.iter import batch_iterate

 if TYPE_CHECKING:
     from langchain_core.callbacks.manager import (
@@ -52,6 +60,7 @@
         CallbackManagerForRetrieverRun,
     )
     from langchain_core.documents import Document
+    from langchain_core.indexing.base import UpsertResponse

 logger = logging.getLogger(__name__)

@@ -61,11 +70,14 @@
 class VectorStore(ABC):
     """Interface for vector store."""

-    @abstractmethod
     def add_texts(
         self,
         texts: Iterable[str],
         metadatas: Optional[List[dict]] = None,
+        # One of the kwargs should be `ids` which is a list of ids
+        # associated with the texts.
+        # This is not yet enforced in the type signature for backwards compatibility
+        # with existing implementations.
         **kwargs: Any,
     ) -> List[str]:
         """Run more texts through the embeddings and add to the vectorstore.
@@ -74,16 +86,205 @@
             texts: Iterable of strings to add to the vectorstore.
             metadatas: Optional list of metadatas associated with the texts.
             **kwargs: vectorstore specific parameters.
+                One of the kwargs should be `ids` which is a list of ids
+                associated with the texts.

         Returns:
             List of ids from adding the texts into the vectorstore.
         """
+        if type(self).upsert != VectorStore.upsert:
+            # Import document in local scope to avoid circular imports
+            from langchain_core.documents import Document
+
+            # This condition is triggered if the subclass has provided
+            # an implementation of the upsert method.
+            # The existing add_texts arguments are routed through upsert.
+            texts_: Sequence[str] = (
+                texts if isinstance(texts, (list, tuple)) else list(texts)
+            )
+            if metadatas and len(metadatas) != len(texts_):
+                raise ValueError(
+                    "The number of metadatas must match the number of texts. "
+                    f"Got {len(metadatas)} metadatas and {len(texts_)} texts."
+                )
+
+            if "ids" in kwargs:
+                ids = kwargs.pop("ids")
+                if ids and len(ids) != len(texts_):
+                    raise ValueError(
+                        "The number of ids must match the number of texts. "
+                        f"Got {len(ids)} ids and {len(texts_)} texts."
+ ) + else: + ids = None + + metadatas_ = iter(metadatas) if metadatas else cycle([{}]) + ids_: Iterable[Union[str, None]] = ids if ids is not None else cycle([None]) + docs = [ + Document(page_content=text, metadata=metadata_, id=id_) + for text, metadata_, id_ in zip(texts, metadatas_, ids_) + ] + upsert_response = self.upsert(docs, **kwargs) + return upsert_response["succeeded"] + raise NotImplementedError( + f"`add_texts` has not been implemented for {self.__class__.__name__} " + ) + + # Developer guidelines: + # Do not override streaming_upsert! + @beta(message="Added in 0.2.11. The API is subject to change.") + def streaming_upsert( + self, items: Iterable[Document], /, batch_size: int, **kwargs: Any + ) -> Iterator[UpsertResponse]: + """Upsert documents in a streaming fashion. + + Args: + items: Iterable of Documents to add to the vectorstore. + batch_size: The size of each batch to upsert. + **kwargs: Additional keyword arguments. + kwargs should only include parameters that are common to all + documents. (e.g., timeout for indexing, retry policy, etc.) + kwargs should not include ids to avoid ambiguous semantics. + Instead the ID should be provided as part of the Document object. + + .. versionadded:: 0.2.11 + """ + # The default implementation of this method breaks the input into + # batches of size `batch_size` and calls the `upsert` method on each batch. + # Subclasses can override this method to provide a more efficient + # implementation. + for item_batch in batch_iterate(batch_size, items): + yield self.upsert(item_batch, **kwargs) + + # Please note that we've added a new method `upsert` instead of re-using the + # existing `add_documents` method. + # This was done to resolve potential ambiguities around the behavior of **kwargs + # in existing add_documents / add_texts methods which could include per document + # information (e.g., the `ids` parameter). + # Over time the `add_documents` could be denoted as legacy and deprecated + # in favor of the `upsert` method. + @beta(message="Added in 0.2.11. The API is subject to change.") + def upsert(self, items: Sequence[Document], /, **kwargs: Any) -> UpsertResponse: + """Add or update documents in the vectorstore. + + The upsert functionality should utilize the ID field of the Document object + if it is provided. If the ID is not provided, the upsert method is free + to generate an ID for the document. + + When an ID is specified and the document already exists in the vectorstore, + the upsert method should update the document with the new data. If the document + does not exist, the upsert method should add the document to the vectorstore. + + Args: + items: Sequence of Documents to add to the vectorstore. + **kwargs: Additional keyword arguments. + + Returns: + UpsertResponse: A response object that contains the list of IDs that were + successfully added or updated in the vectorstore and the list of IDs that + failed to be added or updated. + + .. versionadded:: 0.2.11 + """ + # Developer guidelines: + # + # Vectorstores implementations are free to extend `upsert` implementation + # to take in additional data per document. + # + # This data **SHOULD NOT** be part of the **kwargs** parameter, instead + # sub-classes can use a Union type on `documents` to include additional + # supported formats for the input data stream. + # + # For example, + # + # .. 
code-block:: python
+        #     from typing import TypedDict
+        #
+        #     class DocumentWithVector(TypedDict):
+        #         document: Document
+        #         vector: List[float]
+        #
+        #     def upsert(
+        #         self,
+        #         documents: Union[Iterable[Document], Iterable[DocumentWithVector]],
+        #         /,
+        #         **kwargs
+        #     ) -> UpsertResponse:
+        #         \"\"\"Add or update documents in the vectorstore.\"\"\"
+        #         # Implementation should check if documents is an
+        #         # iterable of DocumentWithVector or Document
+        #         pass
+        #
+        # Implementations that override upsert should include a new docstring
+        # that explains the semantics of upsert and includes in-code
+        # examples of how to insert using the alternate data formats.

+        # The implementation does not delegate to the `add_texts` or `add_documents`
+        # methods by default, since those implementations may themselves call `upsert`.
+        raise NotImplementedError(
+            f"upsert has not been implemented for {self.__class__.__name__}"
+        )
+
+    @beta(message="Added in 0.2.11. The API is subject to change.")
+    async def astreaming_upsert(
+        self,
+        items: AsyncIterable[Document],
+        /,
+        batch_size: int,
+        **kwargs: Any,
+    ) -> AsyncIterator[UpsertResponse]:
+        """Upsert documents in a streaming fashion. Async version of streaming_upsert.
+
+        Args:
+            items: Iterable of Documents to add to the vectorstore.
+            batch_size: The size of each batch to upsert.
+            **kwargs: Additional keyword arguments.
+                kwargs should only include parameters that are common to all
+                documents. (e.g., timeout for indexing, retry policy, etc.)
+                kwargs should not include ids to avoid ambiguous semantics.
+                Instead the ID should be provided as part of the Document object.
+
+        .. versionadded:: 0.2.11
+        """
+        async for batch in abatch_iterate(batch_size, items):
+            yield await self.aupsert(batch, **kwargs)
+
+    @beta(message="Added in 0.2.11. The API is subject to change.")
+    async def aupsert(
+        self, items: Sequence[Document], /, **kwargs: Any
+    ) -> UpsertResponse:
+        """Add or update documents in the vectorstore. Async version of upsert.
+
+        The upsert functionality should utilize the ID field of the Document object
+        if it is provided. If the ID is not provided, the upsert method is free
+        to generate an ID for the document.
+
+        When an ID is specified and the document already exists in the vectorstore,
+        the upsert method should update the document with the new data. If the document
+        does not exist, the upsert method should add the document to the vectorstore.
+
+        Args:
+            items: Sequence of Documents to add to the vectorstore.
+            **kwargs: Additional keyword arguments.
+
+        Returns:
+            UpsertResponse: A response object that contains the list of IDs that were
+            successfully added or updated in the vectorstore and the list of IDs that
+            failed to be added or updated.
+
+        .. versionadded:: 0.2.11
+        """
+        # Developer guidelines: See guidelines for the `upsert` method.
+        # The implementation does not delegate to the `add_texts` or `add_documents`
+        # methods by default, since those implementations may themselves call `aupsert`.
+        return await run_in_executor(None, self.upsert, items, **kwargs)

     @property
     def embeddings(self) -> Optional[Embeddings]:
         """Access the query embedding object if available."""
         logger.debug(
-            f"{Embeddings.__name__} is not implemented for {self.__class__.__name__}"
+            f"The embeddings property has not been "
+            f"implemented for {self.__class__.__name__}"
         )
         return None
@@ -187,17 +388,81 @@ async def aadd_texts(
         Returns:
             List of ids from adding the texts into the vectorstore.
""" + if type(self).aupsert != VectorStore.aupsert: + # Import document in local scope to avoid circular imports + from langchain_core.documents import Document + + # This condition is triggered if the subclass has provided + # an implementation of the upsert method. + # The existing add_texts + texts_: Sequence[str] = ( + texts if isinstance(texts, (list, tuple)) else list(texts) + ) + if metadatas and len(metadatas) != len(texts_): + raise ValueError( + "The number of metadatas must match the number of texts." + "Got {len(metadatas)} metadatas and {len(texts_)} texts." + ) + + if "ids" in kwargs: + ids = kwargs.pop("ids") + if ids and len(ids) != len(texts_): + raise ValueError( + "The number of ids must match the number of texts." + "Got {len(ids)} ids and {len(texts_)} texts." + ) + else: + ids = None + + metadatas_ = iter(metadatas) if metadatas else cycle([{}]) + ids_: Iterable[Union[str, None]] = ids if ids is not None else cycle([None]) + docs = [ + Document(page_content=text, metadata=metadata_, id=id_) + for text, metadata_, id_ in zip(texts, metadatas_, ids_) + ] + upsert_response = await self.aupsert(docs, **kwargs) + return upsert_response["succeeded"] return await run_in_executor(None, self.add_texts, texts, metadatas, **kwargs) def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]: - """Run more documents through the embeddings and add to the vectorstore. + """Add or update documents in the vectorstore. Args: documents: Documents to add to the vectorstore. + kwargs: Additional keyword arguments. + if kwargs contains ids and documents contain ids, + the ids in the kwargs will receive precedence. Returns: List of IDs of the added texts. """ + if type(self).upsert != VectorStore.upsert: + from langchain_core.documents import Document + + if "ids" in kwargs: + ids = kwargs.pop("ids") + if ids and len(ids) != len(documents): + raise ValueError( + "The number of ids must match the number of documents. " + "Got {len(ids)} ids and {len(documents)} documents." + ) + + documents_ = [] + + for id_, document in zip(ids, documents): + doc_with_id = Document( + page_content=document.page_content, + metadata=document.metadata, + id=id_, + ) + documents_.append(doc_with_id) + else: + documents_ = documents + + # If upsert has been implemented, we can use it to add documents + return self.upsert(documents_, **kwargs)["succeeded"] + + # Code path that delegates to add_text for backwards compatibility # TODO: Handle the case where the user doesn't provide ids on the Collection texts = [doc.page_content for doc in documents] metadatas = [doc.metadata for doc in documents] @@ -214,6 +479,38 @@ async def aadd_documents( Returns: List of IDs of the added texts. """ + # If either upsert or aupsert has been implemented, we delegate to them! + if ( + type(self).aupsert != VectorStore.aupsert + or type(self).upsert != VectorStore.upsert + ): + # If aupsert has been implemented, we can use it to add documents + from langchain_core.documents import Document + + if "ids" in kwargs: + ids = kwargs.pop("ids") + if ids and len(ids) != len(documents): + raise ValueError( + "The number of ids must match the number of documents." + "Got {len(ids)} ids and {len(documents)} documents." 
+ ) + + documents_ = [] + + for id_, document in zip(ids, documents): + doc_with_id = Document( + page_content=document.page_content, + metadata=document.metadata, + id=id_, + ) + documents_.append(doc_with_id) + else: + documents_ = documents + + # If upsert has been implemented, we can use it to add documents + upsert_response = await self.aupsert(documents_, **kwargs) + return upsert_response["succeeded"] + texts = [doc.page_content for doc in documents] metadatas = [doc.metadata for doc in documents] return await self.aadd_texts(texts, metadatas, **kwargs) diff --git a/libs/core/tests/unit_tests/indexing/test_public_api.py b/libs/core/tests/unit_tests/indexing/test_public_api.py index 89c52cf681013..0259017a95492 100644 --- a/libs/core/tests/unit_tests/indexing/test_public_api.py +++ b/libs/core/tests/unit_tests/indexing/test_public_api.py @@ -10,4 +10,5 @@ def test_all() -> None: "IndexingResult", "InMemoryRecordManager", "RecordManager", + "UpsertResponse", ] diff --git a/libs/core/tests/unit_tests/prompts/test_chat.py b/libs/core/tests/unit_tests/prompts/test_chat.py index 86f1cc1954b0f..617e3deb06ccb 100644 --- a/libs/core/tests/unit_tests/prompts/test_chat.py +++ b/libs/core/tests/unit_tests/prompts/test_chat.py @@ -28,6 +28,7 @@ SystemMessagePromptTemplate, _convert_to_message, ) +from langchain_core.pydantic_v1 import ValidationError @pytest.fixture @@ -786,3 +787,611 @@ async def test_messages_prompt_accepts_list() -> None: with pytest.raises(TypeError): await prompt.ainvoke([("user", "Hi there")]) # type: ignore + + +def test_chat_input_schema() -> None: + prompt_all_required = ChatPromptTemplate.from_messages( + messages=[MessagesPlaceholder("history", optional=False), ("user", "${input}")] + ) + prompt_all_required.input_variables == {"input"} + prompt_all_required.optional_variables == {"history"} + with pytest.raises(ValidationError): + prompt_all_required.input_schema(input="") + assert prompt_all_required.input_schema.schema() == { + "title": "PromptInput", + "type": "object", + "properties": { + "history": { + "title": "History", + "type": "array", + "items": { + "anyOf": [ + {"$ref": "#/definitions/AIMessage"}, + {"$ref": "#/definitions/HumanMessage"}, + {"$ref": "#/definitions/ChatMessage"}, + {"$ref": "#/definitions/SystemMessage"}, + {"$ref": "#/definitions/FunctionMessage"}, + {"$ref": "#/definitions/ToolMessage"}, + ] + }, + }, + "input": {"title": "Input", "type": "string"}, + }, + "required": ["history", "input"], + "definitions": { + "ToolCall": { + "title": "ToolCall", + "type": "object", + "properties": { + "name": {"title": "Name", "type": "string"}, + "args": {"title": "Args", "type": "object"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["name", "args", "id"], + }, + "InvalidToolCall": { + "title": "InvalidToolCall", + "type": "object", + "properties": { + "name": {"title": "Name", "type": "string"}, + "args": {"title": "Args", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "error": {"title": "Error", "type": "string"}, + }, + "required": ["name", "args", "id", "error"], + }, + "UsageMetadata": { + "title": "UsageMetadata", + "type": "object", + "properties": { + "input_tokens": {"title": "Input Tokens", "type": "integer"}, + "output_tokens": {"title": "Output Tokens", "type": "integer"}, + "total_tokens": {"title": "Total Tokens", "type": "integer"}, + }, + "required": ["input_tokens", "output_tokens", "total_tokens"], + }, + "AIMessage": { + "title": "AIMessage", + "description": "Message from an AI.\n\nAIMessage is 
returned from a chat model as a response to a prompt.\n\nThis message represents the output of the model and consists of both\nthe raw output as returned by the model together standardized fields\n(e.g., tool calls, usage metadata) added by the LangChain framework.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "ai", + "enum": ["ai"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "example": { + "title": "Example", + "default": False, + "type": "boolean", + }, + "tool_calls": { + "title": "Tool Calls", + "default": [], + "type": "array", + "items": {"$ref": "#/definitions/ToolCall"}, + }, + "invalid_tool_calls": { + "title": "Invalid Tool Calls", + "default": [], + "type": "array", + "items": {"$ref": "#/definitions/InvalidToolCall"}, + }, + "usage_metadata": {"$ref": "#/definitions/UsageMetadata"}, + }, + "required": ["content"], + }, + "HumanMessage": { + "title": "HumanMessage", + "description": 'Message from a human.\n\nHumanMessages are messages that are passed in from a human to the model.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Instantiate a chat model and invoke it with the messages\n model = ...\n print(model.invoke(messages))', # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "human", + "enum": ["human"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "example": { + "title": "Example", + "default": False, + "type": "boolean", + }, + }, + "required": ["content"], + }, + "ChatMessage": { + "title": "ChatMessage", + "description": "Message that can be assigned an arbitrary speaker (i.e. 
role).", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "chat", + "enum": ["chat"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "role": {"title": "Role", "type": "string"}, + }, + "required": ["content", "role"], + }, + "SystemMessage": { + "title": "SystemMessage", + "description": 'Message for priming AI behavior.\n\nThe system message is usually passed in as the first of a sequence\nof input messages.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Define a chat model and invoke it with the messages\n print(model.invoke(messages))', # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "system", + "enum": ["system"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["content"], + }, + "FunctionMessage": { + "title": "FunctionMessage", + "description": "Message for passing the result of executing a tool back to a model.\n\nFunctionMessage are an older version of the ToolMessage schema, and\ndo not contain the tool_call_id field.\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "function", + "enum": ["function"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["content", "name"], + }, + "ToolMessage": { + "title": "ToolMessage", + "description": "Message for passing the result of executing a tool back to a model.\n\nToolMessages contain the result of a tool invocation. Typically, the result\nis encoded inside the `content` field.\n\nExample: A TooMessage representing a result of 42 from a tool call with id\n\n .. code-block:: python\n\n from langchain_core.messages import ToolMessage\n\n ToolMessage(content='42', tool_call_id='call_Jja7J89XsjrOLA5r!MEOW!SL')\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. 
This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "tool", + "enum": ["tool"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "tool_call_id": {"title": "Tool Call Id", "type": "string"}, + }, + "required": ["content", "tool_call_id"], + }, + }, + } + + prompt_optional = ChatPromptTemplate.from_messages( + messages=[MessagesPlaceholder("history", optional=True), ("user", "${input}")] + ) + prompt_optional.input_variables == {"history", "input"} + prompt_optional.input_schema(input="") # won't raise error + prompt_optional.input_schema.schema() == { + "title": "PromptInput", + "type": "object", + "properties": { + "input": {"title": "Input", "type": "string"}, + "history": { + "title": "History", + "type": "array", + "items": { + "anyOf": [ + {"$ref": "#/definitions/AIMessage"}, + {"$ref": "#/definitions/HumanMessage"}, + {"$ref": "#/definitions/ChatMessage"}, + {"$ref": "#/definitions/SystemMessage"}, + {"$ref": "#/definitions/FunctionMessage"}, + {"$ref": "#/definitions/ToolMessage"}, + ] + }, + }, + }, + "required": ["input"], + "definitions": { + "ToolCall": { + "title": "ToolCall", + "type": "object", + "properties": { + "name": {"title": "Name", "type": "string"}, + "args": {"title": "Args", "type": "object"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["name", "args", "id"], + }, + "InvalidToolCall": { + "title": "InvalidToolCall", + "type": "object", + "properties": { + "name": {"title": "Name", "type": "string"}, + "args": {"title": "Args", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "error": {"title": "Error", "type": "string"}, + }, + "required": ["name", "args", "id", "error"], + }, + "UsageMetadata": { + "title": "UsageMetadata", + "type": "object", + "properties": { + "input_tokens": {"title": "Input Tokens", "type": "integer"}, + "output_tokens": {"title": "Output Tokens", "type": "integer"}, + "total_tokens": {"title": "Total Tokens", "type": "integer"}, + }, + "required": ["input_tokens", "output_tokens", "total_tokens"], + }, + "AIMessage": { + "title": "AIMessage", + "description": "Message from an AI.\n\nAIMessage is returned from a chat model as a response to a prompt.\n\nThis message represents the output of the model and consists of both\nthe raw output as returned by the model together standardized fields\n(e.g., tool calls, usage metadata) added by the LangChain framework.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "ai", + "enum": ["ai"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "example": { + "title": "Example", 
+ "default": False, + "type": "boolean", + }, + "tool_calls": { + "title": "Tool Calls", + "default": [], + "type": "array", + "items": {"$ref": "#/definitions/ToolCall"}, + }, + "invalid_tool_calls": { + "title": "Invalid Tool Calls", + "default": [], + "type": "array", + "items": {"$ref": "#/definitions/InvalidToolCall"}, + }, + "usage_metadata": {"$ref": "#/definitions/UsageMetadata"}, + }, + "required": ["content"], + }, + "HumanMessage": { + "title": "HumanMessage", + "description": 'Message from a human.\n\nHumanMessages are messages that are passed in from a human to the model.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Instantiate a chat model and invoke it with the messages\n model = ...\n print(model.invoke(messages))', # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "human", + "enum": ["human"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "example": { + "title": "Example", + "default": False, + "type": "boolean", + }, + }, + "required": ["content"], + }, + "ChatMessage": { + "title": "ChatMessage", + "description": "Message that can be assigned an arbitrary speaker (i.e. role).", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "chat", + "enum": ["chat"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "role": {"title": "Role", "type": "string"}, + }, + "required": ["content", "role"], + }, + "SystemMessage": { + "title": "SystemMessage", + "description": 'Message for priming AI behavior.\n\nThe system message is usually passed in as the first of a sequence\nof input messages.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! 
Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Define a chat model and invoke it with the messages\n print(model.invoke(messages))', # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "system", + "enum": ["system"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["content"], + }, + "FunctionMessage": { + "title": "FunctionMessage", + "description": "Message for passing the result of executing a tool back to a model.\n\nFunctionMessage are an older version of the ToolMessage schema, and\ndo not contain the tool_call_id field.\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] # noqa: E501 + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "function", + "enum": ["function"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + }, + "required": ["content", "name"], + }, + "ToolMessage": { + "title": "ToolMessage", + "description": "Message for passing the result of executing a tool back to a model.\n\nToolMessages contain the result of a tool invocation. Typically, the result\nis encoded inside the `content` field.\n\nExample: A TooMessage representing a result of 42 from a tool call with id\n\n .. code-block:: python\n\n from langchain_core.messages import ToolMessage\n\n ToolMessage(content='42', tool_call_id='call_Jja7J89XsjrOLA5r!MEOW!SL')\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. 
This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 + "type": "object", + "properties": { + "content": { + "title": "Content", + "anyOf": [ + {"type": "string"}, + { + "type": "array", + "items": { + "anyOf": [{"type": "string"}, {"type": "object"}] + }, + }, + ], + }, + "additional_kwargs": { + "title": "Additional Kwargs", + "type": "object", + }, + "response_metadata": { + "title": "Response Metadata", + "type": "object", + }, + "type": { + "title": "Type", + "default": "tool", + "enum": ["tool"], + "type": "string", + }, + "name": {"title": "Name", "type": "string"}, + "id": {"title": "Id", "type": "string"}, + "tool_call_id": {"title": "Tool Call Id", "type": "string"}, + }, + "required": ["content", "tool_call_id"], + }, + }, + } diff --git a/libs/core/tests/unit_tests/runnables/test_graph.py b/libs/core/tests/unit_tests/runnables/test_graph.py index bd1d033f7febe..2cecae55040b7 100644 --- a/libs/core/tests/unit_tests/runnables/test_graph.py +++ b/libs/core/tests/unit_tests/runnables/test_graph.py @@ -94,6 +94,7 @@ def test_graph_sequence(snapshot: SnapshotAssertion) -> None: "title": "PromptInput", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], }, }, { @@ -177,6 +178,7 @@ def conditional_str_parser(input: str) -> Runnable: "title": "PromptInput", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], }, }, { diff --git a/libs/core/tests/unit_tests/runnables/test_runnable.py b/libs/core/tests/unit_tests/runnables/test_runnable.py index 5861b0fc1ab0c..ab556ee1252c9 100644 --- a/libs/core/tests/unit_tests/runnables/test_runnable.py +++ b/libs/core/tests/unit_tests/runnables/test_runnable.py @@ -366,6 +366,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, } }, + "required": ["history"], "definitions": { "ToolCall": { "title": "ToolCall", @@ -400,7 +401,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "AIMessage": { "title": "AIMessage", - "description": AnyStr(), + "description": "Message from an AI.\n\nAIMessage is returned from a chat model as a response to a prompt.\n\nThis message represents the output of the model and consists of both\nthe raw output as returned by the model together standardized fields\n(e.g., tool calls, usage metadata) added by the LangChain framework.", # noqa: E501 "type": "object", "properties": { "content": { @@ -454,7 +455,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "HumanMessage": { "title": "HumanMessage", - "description": AnyStr(), + "description": 'Message from a human.\n\nHumanMessages are messages that are passed in from a human to the model.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Instantiate a chat model and invoke it with the messages\n model = ...\n print(model.invoke(messages))', # noqa: E501 "type": "object", "properties": { "content": { @@ -495,7 +496,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "ChatMessage": { "title": "ChatMessage", - "description": AnyStr(), + "description": "Message that can be assigned an arbitrary speaker (i.e. 
role).", # noqa: E501 "type": "object", "properties": { "content": { @@ -532,7 +533,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "SystemMessage": { "title": "SystemMessage", - "description": AnyStr(), + "description": 'Message for priming AI behavior.\n\nThe system message is usually passed in as the first of a sequence\nof input messages.\n\nExample:\n\n .. code-block:: python\n\n from langchain_core.messages import HumanMessage, SystemMessage\n\n messages = [\n SystemMessage(\n content="You are a helpful assistant! Your name is Bob."\n ),\n HumanMessage(\n content="What is your name?"\n )\n ]\n\n # Define a chat model and invoke it with the messages\n print(model.invoke(messages))', # noqa: E501 "type": "object", "properties": { "content": { @@ -568,7 +569,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "FunctionMessage": { "title": "FunctionMessage", - "description": AnyStr(), + "description": "Message for passing the result of executing a tool back to a model.\n\nFunctionMessage are an older version of the ToolMessage schema, and\ndo not contain the tool_call_id field.\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 "type": "object", "properties": { "content": { @@ -604,7 +605,7 @@ async def typed_async_lambda_impl(x: str) -> int: }, "ToolMessage": { "title": "ToolMessage", - "description": AnyStr(), + "description": "Message for passing the result of executing a tool back to a model.\n\nToolMessages contain the result of a tool invocation. Typically, the result\nis encoded inside the `content` field.\n\nExample: A TooMessage representing a result of 42 from a tool call with id\n\n .. code-block:: python\n\n from langchain_core.messages import ToolMessage\n\n ToolMessage(content='42', tool_call_id='call_Jja7J89XsjrOLA5r!MEOW!SL')\n\nThe tool_call_id field is used to associate the tool call request with the\ntool call response. 
This is useful in situations where a chat model is able\nto request multiple tool calls in parallel.", # noqa: E501 "type": "object", "properties": { "content": { @@ -649,6 +650,7 @@ async def typed_async_lambda_impl(x: str) -> int: "title": "PromptInput", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], } assert prompt.output_schema.schema() == snapshot @@ -658,6 +660,7 @@ async def typed_async_lambda_impl(x: str) -> int: "definitions": { "PromptInput": { "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], "title": "PromptInput", "type": "object", } @@ -683,6 +686,7 @@ async def typed_async_lambda_impl(x: str) -> int: "title": "PromptInput", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], } assert seq.output_schema.schema() == { "type": "array", @@ -723,6 +727,7 @@ async def typed_async_lambda_impl(x: str) -> int: "title": "PromptInput", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}}, + "required": ["name"], } assert seq_w_map.output_schema.schema() == { "title": "RunnableParallelOutput", @@ -1056,6 +1061,7 @@ def test_configurable_fields() -> None: "lang": {"title": "Lang", "type": "string"}, "name": {"title": "Name", "type": "string"}, }, + "required": ["lang", "name"], } chain_configurable = prompt_configurable | fake_llm_configurable | StrOutputParser() @@ -1111,6 +1117,7 @@ def test_configurable_fields() -> None: "lang": {"title": "Lang", "type": "string"}, "name": {"title": "Name", "type": "string"}, }, + "required": ["lang", "name"], } chain_with_map_configurable: Runnable = prompt_configurable | { @@ -3794,6 +3801,7 @@ def test_deep_stream_assign() -> None: "title": "PromptInput", "type": "object", "properties": {"question": {"title": "Question", "type": "string"}}, + "required": ["question"], } assert chain_with_assign.output_schema.schema() == { "title": "RunnableSequenceOutput", @@ -3844,6 +3852,7 @@ def test_deep_stream_assign() -> None: "title": "PromptInput", "type": "object", "properties": {"question": {"title": "Question", "type": "string"}}, + "required": ["question"], } assert chain_with_assign_shadow.output_schema.schema() == { "title": "RunnableSequenceOutput", @@ -3918,6 +3927,7 @@ async def test_deep_astream_assign() -> None: "title": "PromptInput", "type": "object", "properties": {"question": {"title": "Question", "type": "string"}}, + "required": ["question"], } assert chain_with_assign.output_schema.schema() == { "title": "RunnableSequenceOutput", @@ -3968,6 +3978,7 @@ async def test_deep_astream_assign() -> None: "title": "PromptInput", "type": "object", "properties": {"question": {"title": "Question", "type": "string"}}, + "required": ["question"], } assert chain_with_assign_shadow.output_schema.schema() == { "title": "RunnableSequenceOutput", @@ -4969,6 +4980,7 @@ async def test_tool_from_runnable() -> None: "properties": {"question": {"title": "Question", "type": "string"}}, "title": "PromptInput", "type": "object", + "required": ["question"], } diff --git a/libs/core/tests/unit_tests/test_graph_vectorstores.py b/libs/core/tests/unit_tests/test_graph_vectorstores.py new file mode 100644 index 0000000000000..2e3c8c5bdaf5b --- /dev/null +++ b/libs/core/tests/unit_tests/test_graph_vectorstores.py @@ -0,0 +1,59 @@ +import pytest + +from langchain_core.documents import Document +from langchain_core.graph_vectorstores.base import ( + Node, + _documents_to_nodes, + _texts_to_nodes, +) +from 
langchain_core.graph_vectorstores.links import Link + + +def test_texts_to_nodes() -> None: + assert list(_texts_to_nodes(["a", "b"], [{"a": "b"}, {"c": "d"}], ["a", "b"])) == [ + Node(id="a", metadata={"a": "b"}, text="a"), + Node(id="b", metadata={"c": "d"}, text="b"), + ] + assert list(_texts_to_nodes(["a", "b"], None, ["a", "b"])) == [ + Node(id="a", metadata={}, text="a"), + Node(id="b", metadata={}, text="b"), + ] + assert list(_texts_to_nodes(["a", "b"], [{"a": "b"}, {"c": "d"}], None)) == [ + Node(metadata={"a": "b"}, text="a"), + Node(metadata={"c": "d"}, text="b"), + ] + assert list( + _texts_to_nodes( + ["a"], + [{"links": {Link.incoming(kind="hyperlink", tag="http://b")}}], + None, + ) + ) == [Node(links=[Link.incoming(kind="hyperlink", tag="http://b")], text="a")] + with pytest.raises(ValueError): + list(_texts_to_nodes(["a", "b"], None, ["a"])) + with pytest.raises(ValueError): + list(_texts_to_nodes(["a", "b"], [{"a": "b"}], None)) + with pytest.raises(ValueError): + list(_texts_to_nodes(["a"], [{"a": "b"}, {"c": "d"}], None)) + with pytest.raises(ValueError): + list(_texts_to_nodes(["a"], None, ["a", "b"])) + + +def test_documents_to_nodes() -> None: + documents = [ + Document( + id="a", + page_content="some text a", + metadata={"links": [Link.incoming(kind="hyperlink", tag="http://b")]}, + ), + Document(id="b", page_content="some text b", metadata={"c": "d"}), + ] + assert list(_documents_to_nodes(documents)) == [ + Node( + id="a", + metadata={}, + links=[Link.incoming(kind="hyperlink", tag="http://b")], + text="some text a", + ), + Node(id="b", metadata={"c": "d"}, text="some text b"), + ] diff --git a/libs/core/tests/unit_tests/utils/test_aiter.py b/libs/core/tests/unit_tests/utils/test_aiter.py new file mode 100644 index 0000000000000..3b035a89277ab --- /dev/null +++ b/libs/core/tests/unit_tests/utils/test_aiter.py @@ -0,0 +1,31 @@ +from typing import AsyncIterator, List + +import pytest + +from langchain_core.utils.aiter import abatch_iterate + + +@pytest.mark.parametrize( + "input_size, input_iterable, expected_output", + [ + (2, [1, 2, 3, 4, 5], [[1, 2], [3, 4], [5]]), + (3, [10, 20, 30, 40, 50], [[10, 20, 30], [40, 50]]), + (1, [100, 200, 300], [[100], [200], [300]]), + (4, [], []), + ], +) +async def test_abatch_iterate( + input_size: int, input_iterable: List[str], expected_output: List[str] +) -> None: + """Test batching function.""" + + async def _to_async_iterable(iterable: List[str]) -> AsyncIterator[str]: + for item in iterable: + yield item + + iterator_ = abatch_iterate(input_size, _to_async_iterable(input_iterable)) + + assert isinstance(iterator_, AsyncIterator) + + output = [el async for el in iterator_] + assert output == expected_output diff --git a/libs/core/tests/unit_tests/utils/test_imports.py b/libs/core/tests/unit_tests/utils/test_imports.py index 64528cfd521b2..8a1d4236688ae 100644 --- a/libs/core/tests/unit_tests/utils/test_imports.py +++ b/libs/core/tests/unit_tests/utils/test_imports.py @@ -6,6 +6,8 @@ "convert_to_secret_str", "formatter", "get_bolded_text", + "abatch_iterate", + "batch_iterate", "get_color_mapping", "get_colored_text", "get_pydantic_field_names", diff --git a/libs/core/tests/unit_tests/vectorstores/__init__.py b/libs/core/tests/unit_tests/vectorstores/__init__.py new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/libs/core/tests/unit_tests/vectorstores/test_vectorstore.py b/libs/core/tests/unit_tests/vectorstores/test_vectorstore.py new file mode 100644 index 0000000000000..dc4955e70a2ae --- /dev/null +++ 
b/libs/core/tests/unit_tests/vectorstores/test_vectorstore.py @@ -0,0 +1,194 @@ +from __future__ import annotations + +import uuid +from typing import Any, Dict, List, Optional, Sequence, Union + +from typing_extensions import TypedDict + +from langchain_core.documents import Document +from langchain_core.embeddings import Embeddings +from langchain_core.indexing.base import UpsertResponse +from langchain_core.vectorstores import VectorStore + + +def test_custom_upsert_type() -> None: + """Test that we can override the signature of the upsert method + of the VectorStore class without creating typing issues by violating + the Liskov Substitution Principle. + """ + + class ByVector(TypedDict): + document: Document + vector: List[float] + + class CustomVectorStore(VectorStore): + def upsert( + # This unit test verifies that the signature of the upsert method + # specifically the items parameter can be overridden without + # violating the Liskov Substitution Principle (and getting + # typing errors). + self, + items: Union[Sequence[Document], Sequence[ByVector]], + /, + **kwargs: Any, + ) -> UpsertResponse: + raise NotImplementedError() + + +class CustomSyncVectorStore(VectorStore): + """A vectorstore that only implements the synchronous methods.""" + + def __init__(self) -> None: + self.store: Dict[str, Document] = {} + + def upsert( + self, + items: Sequence[Document], + /, + **kwargs: Any, + ) -> UpsertResponse: + ids = [] + for item in items: + if item.id is None: + new_item = item.copy() + id_: str = str(uuid.uuid4()) + new_item.id = id_ + else: + id_ = item.id + new_item = item + + self.store[id_] = new_item + ids.append(id_) + + return { + "succeeded": ids, + "failed": [], + } + + def get_by_ids(self, ids: Sequence[str], /) -> List[Document]: + return [self.store[id] for id in ids if id in self.store] + + def from_texts( # type: ignore + cls, + texts: List[str], + embedding: Embeddings, + metadatas: Optional[List[dict]] = None, + **kwargs: Any, + ) -> CustomSyncVectorStore: + vectorstore = CustomSyncVectorStore() + vectorstore.add_texts(texts, metadatas=metadatas, **kwargs) + return vectorstore + + def similarity_search( + self, query: str, k: int = 4, **kwargs: Any + ) -> List[Document]: + raise NotImplementedError() + + +def test_implement_upsert() -> None: + """Test that we can implement the upsert method of the CustomVectorStore + class without violating the Liskov Substitution Principle. 
+ """ + + store = CustomSyncVectorStore() + + # Check upsert with id + assert store.upsert([Document(id="1", page_content="hello")]) == { + "succeeded": ["1"], + "failed": [], + } + + assert store.get_by_ids(["1"]) == [Document(id="1", page_content="hello")] + + # Check upsert without id + response = store.upsert([Document(page_content="world")]) + assert len(response["succeeded"]) == 1 + id_ = response["succeeded"][0] + assert id_ is not None + assert store.get_by_ids([id_]) == [Document(id=id_, page_content="world")] + + # Check that default implementation of add_texts works + assert store.add_texts(["hello", "world"], ids=["3", "4"]) == ["3", "4"] + assert store.get_by_ids(["3", "4"]) == [ + Document(id="3", page_content="hello"), + Document(id="4", page_content="world"), + ] + + # Add texts without ids + ids_ = store.add_texts(["foo", "bar"]) + assert len(ids_) == 2 + assert store.get_by_ids(ids_) == [ + Document(id=ids_[0], page_content="foo"), + Document(id=ids_[1], page_content="bar"), + ] + + # Add texts with metadatas + ids_2 = store.add_texts(["foo", "bar"], metadatas=[{"foo": "bar"}] * 2) + assert len(ids_2) == 2 + assert store.get_by_ids(ids_2) == [ + Document(id=ids_2[0], page_content="foo", metadata={"foo": "bar"}), + Document(id=ids_2[1], page_content="bar", metadata={"foo": "bar"}), + ] + + # Check that add_documents works + assert store.add_documents([Document(id="5", page_content="baz")]) == ["5"] + + # Test add documents with id specified in both document and ids + original_document = Document(id="7", page_content="baz") + assert store.add_documents([original_document], ids=["6"]) == ["6"] + assert original_document.id == "7" # original document should not be modified + assert store.get_by_ids(["6"]) == [Document(id="6", page_content="baz")] + + +async def test_aupsert_delegation_to_upsert() -> None: + """Test delegation to the synchronous upsert method in async execution + if async methods are not implemented. 
+ """ + store = CustomSyncVectorStore() + + # Check upsert with id + assert await store.aupsert([Document(id="1", page_content="hello")]) == { + "succeeded": ["1"], + "failed": [], + } + + assert await store.aget_by_ids(["1"]) == [Document(id="1", page_content="hello")] + + # Check upsert without id + response = await store.aupsert([Document(page_content="world")]) + assert len(response["succeeded"]) == 1 + id_ = response["succeeded"][0] + assert id_ is not None + assert await store.aget_by_ids([id_]) == [Document(id=id_, page_content="world")] + + # Check that default implementation of add_texts works + assert await store.aadd_texts(["hello", "world"], ids=["3", "4"]) == ["3", "4"] + assert await store.aget_by_ids(["3", "4"]) == [ + Document(id="3", page_content="hello"), + Document(id="4", page_content="world"), + ] + + # Add texts without ids + ids_ = await store.aadd_texts(["foo", "bar"]) + assert len(ids_) == 2 + assert await store.aget_by_ids(ids_) == [ + Document(id=ids_[0], page_content="foo"), + Document(id=ids_[1], page_content="bar"), + ] + + # Add texts with metadatas + ids_2 = await store.aadd_texts(["foo", "bar"], metadatas=[{"foo": "bar"}] * 2) + assert len(ids_2) == 2 + assert await store.aget_by_ids(ids_2) == [ + Document(id=ids_2[0], page_content="foo", metadata={"foo": "bar"}), + Document(id=ids_2[1], page_content="bar", metadata={"foo": "bar"}), + ] + + # Check that add_documents works + assert await store.aadd_documents([Document(id="5", page_content="baz")]) == ["5"] + + # Test add documents with id specified in both document and ids + original_document = Document(id="7", page_content="baz") + assert await store.aadd_documents([original_document], ids=["6"]) == ["6"] + assert original_document.id == "7" # original document should not be modified + assert await store.aget_by_ids(["6"]) == [Document(id="6", page_content="baz")] diff --git a/libs/langchain/langchain/chains/combine_documents/base.py b/libs/langchain/langchain/chains/combine_documents/base.py index 6746c9df8f5df..90e965996de45 100644 --- a/libs/langchain/langchain/chains/combine_documents/base.py +++ b/libs/langchain/langchain/chains/combine_documents/base.py @@ -3,6 +3,7 @@ from abc import ABC, abstractmethod from typing import Any, Dict, List, Optional, Tuple, Type +from langchain_core._api import deprecated from langchain_core.callbacks import ( AsyncCallbackManagerForChainRun, CallbackManagerForChainRun, @@ -157,12 +158,70 @@ async def _acall( return extra_return_dict +@deprecated( + since="0.2.7", + alternative=( + "example in API reference with more detail: " + "https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.base.AnalyzeDocumentChain.html" # noqa: E501 + ), + removal="1.0", +) class AnalyzeDocumentChain(Chain): """Chain that splits documents, then analyzes it in pieces. This chain is parameterized by a TextSplitter and a CombineDocumentsChain. This chain takes a single document as input, and then splits it up into chunks and then passes those chucks to the CombineDocumentsChain. + + This class is deprecated. See below for alternative implementations which + supports async and streaming modes of operation. + + If the underlying combine documents chain takes one ``input_documents`` argument + (e.g., chains generated by ``load_summarize_chain``): + + .. 
code-block:: python + + split_text = lambda x: text_splitter.create_documents([x]) + + summarize_document_chain = split_text | chain + + If the underlying chain takes additional arguments (e.g., ``load_qa_chain``, which + takes an additional ``question`` argument), we can use the following: + + .. code-block:: python + + from operator import itemgetter + from langchain_core.runnables import RunnableLambda, RunnableParallel + + split_text = RunnableLambda( + lambda x: text_splitter.create_documents([x]) + ) + summarize_document_chain = RunnableParallel( + question=itemgetter("question"), + input_documents=itemgetter("input_document") | split_text, + ) | chain.pick("output_text") + + To additionally return the input parameters, as ``AnalyzeDocumentChain`` does, + we can wrap this construction with ``RunnablePassthrough``: + + .. code-block:: python + + from operator import itemgetter + from langchain_core.runnables import ( + RunnableLambda, + RunnableParallel, + RunnablePassthrough, + ) + + split_text = RunnableLambda( + lambda x: text_splitter.create_documents([x]) + ) + summarize_document_chain = RunnablePassthrough.assign( + output_text=RunnableParallel( + question=itemgetter("question"), + input_documents=itemgetter("input_document") | split_text, + ) | chain.pick("output_text") + ) """ input_key: str = "input_document" #: :meta private: diff --git a/libs/langchain/langchain/chains/qa_generation/base.py b/libs/langchain/langchain/chains/qa_generation/base.py index bdc3444cf700e..b66b8a5442599 100644 --- a/libs/langchain/langchain/chains/qa_generation/base.py +++ b/libs/langchain/langchain/chains/qa_generation/base.py @@ -3,6 +3,7 @@ import json from typing import Any, Dict, List, Optional +from langchain_core._api import deprecated from langchain_core.callbacks import CallbackManagerForChainRun from langchain_core.language_models import BaseLanguageModel from langchain_core.prompts import BasePromptTemplate @@ -14,8 +15,53 @@ from langchain.chains.qa_generation.prompt import PROMPT_SELECTOR +@deprecated( + since="0.2.7", + alternative=( + "example in API reference with more detail: " + "https://api.python.langchain.com/en/latest/chains/langchain.chains.qa_generation.base.QAGenerationChain.html" # noqa: E501 + ), + removal="1.0", +) class QAGenerationChain(Chain): - """Base class for question-answer generation chains.""" + """Base class for question-answer generation chains. + + This class is deprecated. See below for an alternative implementation. + + Advantages of this implementation include: + + - Supports async and streaming; + - Surfaces prompt and text splitter for easier customization; + - Use of JsonOutputParser supports JSONPatch operations in streaming mode, + as well as robustness to markdown. + + .. code-block:: python + + from langchain.chains.qa_generation.prompt import CHAT_PROMPT as prompt + # Note: import PROMPT if using a legacy non-chat model. 
+ from langchain_core.output_parsers import JsonOutputParser + from langchain_core.runnables import ( + RunnableLambda, + RunnableParallel, + RunnablePassthrough, + ) + from langchain_core.runnables.base import RunnableEach + from langchain_openai import ChatOpenAI + from langchain_text_splitters import RecursiveCharacterTextSplitter + + llm = ChatOpenAI() + text_splitter = RecursiveCharacterTextSplitter(chunk_overlap=500) + split_text = RunnableLambda( + lambda x: text_splitter.create_documents([x]) + ) + + chain = RunnableParallel( + text=RunnablePassthrough(), + questions=( + split_text | RunnableEach(bound=prompt | llm | JsonOutputParser()) + ) + ) + """ llm_chain: LLMChain """LLM Chain that generates responses from user input and context.""" diff --git a/libs/partners/ibm/langchain_ibm/chat_models.py b/libs/partners/ibm/langchain_ibm/chat_models.py index 7e7c4374eedd3..e7fa66ea4ed0b 100644 --- a/libs/partners/ibm/langchain_ibm/chat_models.py +++ b/libs/partners/ibm/langchain_ibm/chat_models.py @@ -26,6 +26,7 @@ from langchain_core.language_models import LanguageModelInput from langchain_core.language_models.chat_models import ( BaseChatModel, + LangSmithParams, generate_from_stream, ) from langchain_core.messages import ( @@ -310,8 +311,18 @@ def is_lc_serializable(cls) -> bool: @property def _llm_type(self) -> str: + """Return type of chat model.""" return "watsonx-chat" + def _get_ls_params( + self, stop: Optional[List[str]] = None, **kwargs: Any + ) -> LangSmithParams: + """Get standard params for tracing.""" + params = super()._get_ls_params(stop=stop, **kwargs) + params["ls_provider"] = "ibm" + params["ls_model_name"] = self.model_id + return params + @property def lc_secrets(self) -> Dict[str, str]: """A map of constructor argument names to secret ids. @@ -457,11 +468,8 @@ def _generate( if "tool_choice" in kwargs: del kwargs["tool_choice"] - if "params" in kwargs: - del kwargs["params"] - response = self.watsonx_model.generate( - prompt=chat_prompt, params=params, **kwargs + prompt=chat_prompt, **(kwargs | {"params": params}) ) return self._create_chat_result(response) @@ -472,11 +480,37 @@ def _stream( run_manager: Optional[CallbackManagerForLLMRun] = None, **kwargs: Any, ) -> Iterator[ChatGenerationChunk]: - message_dicts, params = self._create_message_dicts(messages, stop) + message_dicts, params = self._create_message_dicts(messages, stop, **kwargs) chat_prompt = self._create_chat_prompt(message_dicts) + tools = kwargs.get("tools") + + if tools: + chat_prompt = f"""[AVAILABLE_TOOLS] +{json.dumps(tools[0], indent=2)} +[/AVAILABLE_TOOLS] +[INST]<<SYS>>You are Mixtral Chat function calling, an AI language model developed by +Mistral AI. You are a cautious assistant. You carefully follow instructions. You are +helpful and harmless and you follow ethical guidelines and promote positive behavior. +<</SYS>> + +To use these tools you must always respond in JSON format containing `"type"` and +`"function"` key-value pairs. Also `"function"` key-value pair always containing +`"name"` and `"arguments"` key-value pairs. + +Between subsequent JSONs should be one blank line. + +Remember, even when answering to the user, you must still use this only JSON format!
+ +{chat_prompt}[/INST]""" + + if "tools" in kwargs: + del kwargs["tools"] + if "tool_choice" in kwargs: + del kwargs["tool_choice"] + for chunk in self.watsonx_model.generate_text_stream( - prompt=chat_prompt, raw_response=True, params=params, **kwargs + prompt=chat_prompt, raw_response=True, **(kwargs | {"params": params}) ): if not isinstance(chunk, dict): chunk = chunk.dict() @@ -497,7 +531,9 @@ def _stream( message=chunk, generation_info=generation_info or None ) if run_manager: - run_manager.on_llm_new_token(chunk.text, chunk=chunk, logprobs=logprobs) + run_manager.on_llm_new_token( + chunk.content, chunk=chunk, logprobs=logprobs + ) yield chunk @@ -532,7 +568,9 @@ def _create_chat_prompt(self, messages: List[Dict[str, Any]]) -> str: prompt += message["content"] + "\n[/INST]\n" else: - prompt = ChatPromptValue(messages=convert_to_messages(messages)).to_string() + prompt = ChatPromptValue( + messages=convert_to_messages(messages) + [AIMessage(content="")] + ).to_string() return prompt @@ -567,6 +605,13 @@ def _create_chat_result(self, response: Union[dict]) -> ChatResult: sum_of_total_generated_tokens += res["generated_token_count"] if "input_token_count" in res: sum_of_total_input_tokens += res["input_token_count"] + total_token = sum_of_total_generated_tokens + sum_of_total_input_tokens + if total_token and isinstance(message, AIMessage): + message.usage_metadata = { + "input_tokens": sum_of_total_input_tokens, + "output_tokens": sum_of_total_generated_tokens, + "total_tokens": total_token, + } gen = ChatGeneration( message=message, generation_info=generation_info, diff --git a/libs/partners/ibm/langchain_ibm/llms.py b/libs/partners/ibm/langchain_ibm/llms.py index 039aa02680522..5ed987465c9e6 100644 --- a/libs/partners/ibm/langchain_ibm/llms.py +++ b/libs/partners/ibm/langchain_ibm/llms.py @@ -266,10 +266,15 @@ def get_count_value(key: str, result: Dict[str, Any]) -> int: } def _get_chat_params( - self, stop: Optional[List[str]] = None + self, stop: Optional[List[str]] = None, **kwargs: Any ) -> Optional[Dict[str, Any]]: - params: Optional[Dict[str, Any]] = {**self.params} if self.params else None + params = {**self.params} if self.params else {} + params = params | {**kwargs.get("params", {})} if stop is not None: + if params and "stop_sequences" in params: + raise ValueError( + "`stop_sequences` found in both the input and default params." 
+ ) params = (params or {}) | {"stop_sequences": stop} return params @@ -355,7 +360,7 @@ def _generate( response = watsonx_llm.generate(["What is a molecule"]) """ - params = self._get_chat_params(stop=stop) + params = self._get_chat_params(stop=stop, **kwargs) should_stream = stream if stream is not None else self.streaming if should_stream: if len(prompts) > 1: @@ -378,7 +383,7 @@ def _generate( return LLMResult(generations=[[generation]]) else: response = self.watsonx_model.generate( - prompt=prompts, params=params, **kwargs + prompt=prompts, **(kwargs | {"params": params}) ) return self._create_llm_result(response) @@ -403,9 +408,9 @@ def _stream( for chunk in response: print(chunk, end='') """ - params = self._get_chat_params(stop=stop) + params = self._get_chat_params(stop=stop, **kwargs) for stream_resp in self.watsonx_model.generate_text_stream( - prompt=prompt, raw_response=True, params=params, **kwargs + prompt=prompt, raw_response=True, **(kwargs | {"params": params}) ): if not isinstance(stream_resp, dict): stream_resp = stream_resp.dict() diff --git a/libs/partners/ibm/poetry.lock b/libs/partners/ibm/poetry.lock index af684d8ac364d..f4c3dc04c1cfb 100644 --- a/libs/partners/ibm/poetry.lock +++ b/libs/partners/ibm/poetry.lock @@ -1,4 +1,4 @@ -# This file is automatically @generated by Poetry 1.8.2 and should not be changed by hand. +# This file is automatically @generated by Poetry 1.6.1 and should not be changed by hand. [[package]] name = "annotated-types" @@ -13,13 +13,13 @@ files = [ [[package]] name = "certifi" -version = "2024.6.2" +version = "2024.7.4" description = "Python package for providing Mozilla's CA Bundle." optional = false python-versions = ">=3.6" files = [ - {file = "certifi-2024.6.2-py3-none-any.whl", hash = "sha256:ddc6c8ce995e6987e7faf5e3f1b02b302836a0e5d98ece18392cb1a36c72ad56"}, - {file = "certifi-2024.6.2.tar.gz", hash = "sha256:3cd43f1c6fa7dedc5899d69d3ad0398fd018ad1a17fba83ddaf78aa46c747516"}, + {file = "certifi-2024.7.4-py3-none-any.whl", hash = "sha256:c198e21b1289c2ab85ee4e67bb4b4ef3ead0892059901a8d5b622f24a1101e90"}, + {file = "certifi-2024.7.4.tar.gz", hash = "sha256:5a1e7645bc0ec61a09e26c36f6106dd4cf40c6db3a1fb6352b0244e7fb057c7b"}, ] [[package]] @@ -223,13 +223,13 @@ ibm-cos-sdk-core = "2.13.5" [[package]] name = "ibm-watsonx-ai" -version = "1.0.9" +version = "1.0.10" description = "IBM watsonx.ai API Client" optional = false python-versions = ">=3.10" files = [ - {file = "ibm_watsonx_ai-1.0.9-py3-none-any.whl", hash = "sha256:3b9d42a60418430ddb37b1f9d7e98ade241089518903ebd937d77ff16b07e20b"}, - {file = "ibm_watsonx_ai-1.0.9.tar.gz", hash = "sha256:c0733a028d34ac75904812a6e3d213feffa401498a9e02a3e3d8d4d2d6d83d32"}, + {file = "ibm_watsonx_ai-1.0.10-py3-none-any.whl", hash = "sha256:6358838fc8e5a88f336a55063f3a8efee01311a1f6cab65acb5a7fbb7129670f"}, + {file = "ibm_watsonx_ai-1.0.10.tar.gz", hash = "sha256:e432396399efa342e5e41c6383cf2425890acd49e2b972eb6ea63c9f8409105f"}, ] [package.dependencies] @@ -246,8 +246,8 @@ urllib3 = "*" [package.extras] fl-crypto = ["pyhelayers (==1.5.0.3)"] fl-crypto-rt24-1 = ["pyhelayers (==1.5.3.1)"] -fl-rt23-1-py3-10 = ["GPUtil", "cryptography (==42.0.5)", "ddsketch (==2.0.4)", "diffprivlib (==0.5.1)", "environs (==9.5.0)", "gym", "image (==1.5.33)", "joblib (==1.1.1)", "lz4", "msgpack (==1.0.7)", "msgpack-numpy (==0.4.8)", "numcompress (==0.1.2)", "numpy (==1.23.5)", "pandas (==1.5.3)", "parse (==1.19.0)", "pathlib2 (==2.3.6)", "protobuf (==4.22.1)", "psutil", "pyYAML (==6.0.1)", "pytest (==6.2.5)", "requests 
(==2.31.0)", "scikit-learn (==1.1.1)", "scipy (==1.10.1)", "setproctitle", "skops (==0.9.0)", "skorch (==0.12.0)", "tabulate (==0.8.9)", "tensorflow (==2.12.0)", "torch (==2.0.1)", "websockets (==10.1)"] -fl-rt24-1-py3-11 = ["GPUtil", "cryptography (==42.0.5)", "ddsketch (==2.0.4)", "diffprivlib (==0.5.1)", "environs (==9.5.0)", "gym", "image (==1.5.33)", "joblib (==1.3.2)", "lz4", "msgpack (==1.0.7)", "msgpack-numpy (==0.4.8)", "numcompress (==0.1.2)", "numpy (==1.26.4)", "pandas (==2.1.4)", "parse (==1.19.0)", "pathlib2 (==2.3.6)", "protobuf (==4.22.1)", "psutil", "pyYAML (==6.0.1)", "pytest (==6.2.5)", "requests (==2.31.0)", "scikit-learn (==1.3.2)", "scipy (==1.11.4)", "setproctitle", "skops (==0.9.0)", "skorch (==0.12.0)", "tabulate (==0.8.9)", "tensorflow (==2.14.1)", "torch (==2.1.2)", "websockets (==10.1)"] +fl-rt23-1-py3-10 = ["GPUtil", "cryptography (==42.0.5)", "ddsketch (==2.0.4)", "diffprivlib (==0.5.1)", "environs (==9.5.0)", "gym", "image (==1.5.33)", "joblib (==1.1.1)", "lz4", "msgpack (==1.0.7)", "msgpack-numpy (==0.4.8)", "numcompress (==0.1.2)", "numpy (==1.23.5)", "pandas (==1.5.3)", "parse (==1.19.0)", "pathlib2 (==2.3.6)", "protobuf (==4.22.1)", "psutil", "pyYAML (==6.0.1)", "pytest (==6.2.5)", "requests (==2.32.3)", "scikit-learn (==1.1.1)", "scipy (==1.10.1)", "setproctitle", "skops (==0.9.0)", "skorch (==0.12.0)", "tabulate (==0.8.9)", "tensorflow (==2.12.0)", "torch (==2.0.1)", "websockets (==10.1)"] +fl-rt24-1-py3-11 = ["GPUtil", "cryptography (==42.0.5)", "ddsketch (==2.0.4)", "diffprivlib (==0.5.1)", "environs (==9.5.0)", "gym", "image (==1.5.33)", "joblib (==1.3.2)", "lz4", "msgpack (==1.0.7)", "msgpack-numpy (==0.4.8)", "numcompress (==0.1.2)", "numpy (==1.26.4)", "pandas (==2.1.4)", "parse (==1.19.0)", "pathlib2 (==2.3.6)", "protobuf (==4.22.1)", "psutil", "pyYAML (==6.0.1)", "pytest (==6.2.5)", "requests (==2.32.3)", "scikit-learn (==1.3.2)", "scipy (==1.11.4)", "setproctitle", "skops (==0.9.0)", "skorch (==0.12.0)", "tabulate (==0.8.9)", "tensorflow (==2.14.1)", "torch (==2.1.2)", "websockets (==10.1)"] [[package]] name = "idna" @@ -352,13 +352,13 @@ url = "../../core" [[package]] name = "langsmith" -version = "0.1.82" +version = "0.1.83" description = "Client library to connect to the LangSmith LLM Tracing and Evaluation Platform." 
optional = false python-versions = "<4.0,>=3.8.1" files = [ - {file = "langsmith-0.1.82-py3-none-any.whl", hash = "sha256:9b3653e7d316036b0c60bf0bc3e280662d660f485a4ebd8e5c9d84f9831ae79c"}, - {file = "langsmith-0.1.82.tar.gz", hash = "sha256:c02e2bbc488c10c13b52c69d271eb40bd38da078d37b6ae7ae04a18bd48140be"}, + {file = "langsmith-0.1.83-py3-none-any.whl", hash = "sha256:f54d8cd8479b648b6339f3f735d19292c3516d080f680933ecdca3eab4b67ed3"}, + {file = "langsmith-0.1.83.tar.gz", hash = "sha256:5cdd947212c8ad19adb992c06471c860185a777daa6859bb47150f90daf64bf3"}, ] [package.dependencies] @@ -488,57 +488,62 @@ files = [ [[package]] name = "orjson" -version = "3.10.5" +version = "3.10.6" description = "Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy" optional = false python-versions = ">=3.8" files = [ - {file = "orjson-3.10.5-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:545d493c1f560d5ccfc134803ceb8955a14c3fcb47bbb4b2fee0232646d0b932"}, - {file = "orjson-3.10.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f4324929c2dd917598212bfd554757feca3e5e0fa60da08be11b4aa8b90013c1"}, - {file = "orjson-3.10.5-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8c13ca5e2ddded0ce6a927ea5a9f27cae77eee4c75547b4297252cb20c4d30e6"}, - {file = "orjson-3.10.5-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:b6c8e30adfa52c025f042a87f450a6b9ea29649d828e0fec4858ed5e6caecf63"}, - {file = "orjson-3.10.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:338fd4f071b242f26e9ca802f443edc588fa4ab60bfa81f38beaedf42eda226c"}, - {file = "orjson-3.10.5-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:6970ed7a3126cfed873c5d21ece1cd5d6f83ca6c9afb71bbae21a0b034588d96"}, - {file = "orjson-3.10.5-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:235dadefb793ad12f7fa11e98a480db1f7c6469ff9e3da5e73c7809c700d746b"}, - {file = "orjson-3.10.5-cp310-none-win32.whl", hash = "sha256:be79e2393679eda6a590638abda16d167754393f5d0850dcbca2d0c3735cebe2"}, - {file = "orjson-3.10.5-cp310-none-win_amd64.whl", hash = "sha256:c4a65310ccb5c9910c47b078ba78e2787cb3878cdded1702ac3d0da71ddc5228"}, - {file = "orjson-3.10.5-cp311-cp311-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:cdf7365063e80899ae3a697def1277c17a7df7ccfc979990a403dfe77bb54d40"}, - {file = "orjson-3.10.5-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b68742c469745d0e6ca5724506858f75e2f1e5b59a4315861f9e2b1df77775a"}, - {file = "orjson-3.10.5-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7d10cc1b594951522e35a3463da19e899abe6ca95f3c84c69e9e901e0bd93d38"}, - {file = "orjson-3.10.5-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:dcbe82b35d1ac43b0d84072408330fd3295c2896973112d495e7234f7e3da2e1"}, - {file = "orjson-3.10.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:10c0eb7e0c75e1e486c7563fe231b40fdd658a035ae125c6ba651ca3b07936f5"}, - {file = "orjson-3.10.5-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:53ed1c879b10de56f35daf06dbc4a0d9a5db98f6ee853c2dbd3ee9d13e6f302f"}, - {file = "orjson-3.10.5-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:099e81a5975237fda3100f918839af95f42f981447ba8f47adb7b6a3cdb078fa"}, - {file = "orjson-3.10.5-cp311-none-win32.whl", hash = "sha256:1146bf85ea37ac421594107195db8bc77104f74bc83e8ee21a2e58596bfb2f04"}, - 
{file = "orjson-3.10.5-cp311-none-win_amd64.whl", hash = "sha256:36a10f43c5f3a55c2f680efe07aa93ef4a342d2960dd2b1b7ea2dd764fe4a37c"}, - {file = "orjson-3.10.5-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:68f85ecae7af14a585a563ac741b0547a3f291de81cd1e20903e79f25170458f"}, - {file = "orjson-3.10.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:28afa96f496474ce60d3340fe8d9a263aa93ea01201cd2bad844c45cd21f5268"}, - {file = "orjson-3.10.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9cd684927af3e11b6e754df80b9ffafd9fb6adcaa9d3e8fdd5891be5a5cad51e"}, - {file = "orjson-3.10.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:3d21b9983da032505f7050795e98b5d9eee0df903258951566ecc358f6696969"}, - {file = "orjson-3.10.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1ad1de7fef79736dde8c3554e75361ec351158a906d747bd901a52a5c9c8d24b"}, - {file = "orjson-3.10.5-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2d97531cdfe9bdd76d492e69800afd97e5930cb0da6a825646667b2c6c6c0211"}, - {file = "orjson-3.10.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:d69858c32f09c3e1ce44b617b3ebba1aba030e777000ebdf72b0d8e365d0b2b3"}, - {file = "orjson-3.10.5-cp312-none-win32.whl", hash = "sha256:64c9cc089f127e5875901ac05e5c25aa13cfa5dbbbd9602bda51e5c611d6e3e2"}, - {file = "orjson-3.10.5-cp312-none-win_amd64.whl", hash = "sha256:b2efbd67feff8c1f7728937c0d7f6ca8c25ec81373dc8db4ef394c1d93d13dc5"}, - {file = "orjson-3.10.5-cp38-cp38-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:03b565c3b93f5d6e001db48b747d31ea3819b89abf041ee10ac6988886d18e01"}, - {file = "orjson-3.10.5-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:584c902ec19ab7928fd5add1783c909094cc53f31ac7acfada817b0847975f26"}, - {file = "orjson-3.10.5-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:5a35455cc0b0b3a1eaf67224035f5388591ec72b9b6136d66b49a553ce9eb1e6"}, - {file = "orjson-3.10.5-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1670fe88b116c2745a3a30b0f099b699a02bb3482c2591514baf5433819e4f4d"}, - {file = "orjson-3.10.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:185c394ef45b18b9a7d8e8f333606e2e8194a50c6e3c664215aae8cf42c5385e"}, - {file = "orjson-3.10.5-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:ca0b3a94ac8d3886c9581b9f9de3ce858263865fdaa383fbc31c310b9eac07c9"}, - {file = "orjson-3.10.5-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:dfc91d4720d48e2a709e9c368d5125b4b5899dced34b5400c3837dadc7d6271b"}, - {file = "orjson-3.10.5-cp38-none-win32.whl", hash = "sha256:c05f16701ab2a4ca146d0bca950af254cb7c02f3c01fca8efbbad82d23b3d9d4"}, - {file = "orjson-3.10.5-cp38-none-win_amd64.whl", hash = "sha256:8a11d459338f96a9aa7f232ba95679fc0c7cedbd1b990d736467894210205c09"}, - {file = "orjson-3.10.5-cp39-cp39-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:85c89131d7b3218db1b24c4abecea92fd6c7f9fab87441cfc342d3acc725d807"}, - {file = "orjson-3.10.5-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fb66215277a230c456f9038d5e2d84778141643207f85336ef8d2a9da26bd7ca"}, - {file = "orjson-3.10.5-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:51bbcdea96cdefa4a9b4461e690c75ad4e33796530d182bdd5c38980202c134a"}, - {file = 
"orjson-3.10.5-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:dbead71dbe65f959b7bd8cf91e0e11d5338033eba34c114f69078d59827ee139"}, - {file = "orjson-3.10.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5df58d206e78c40da118a8c14fc189207fffdcb1f21b3b4c9c0c18e839b5a214"}, - {file = "orjson-3.10.5-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:c4057c3b511bb8aef605616bd3f1f002a697c7e4da6adf095ca5b84c0fd43595"}, - {file = "orjson-3.10.5-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:b39e006b00c57125ab974362e740c14a0c6a66ff695bff44615dcf4a70ce2b86"}, - {file = "orjson-3.10.5-cp39-none-win32.whl", hash = "sha256:eded5138cc565a9d618e111c6d5c2547bbdd951114eb822f7f6309e04db0fb47"}, - {file = "orjson-3.10.5-cp39-none-win_amd64.whl", hash = "sha256:cc28e90a7cae7fcba2493953cff61da5a52950e78dc2dacfe931a317ee3d8de7"}, - {file = "orjson-3.10.5.tar.gz", hash = "sha256:7a5baef8a4284405d96c90c7c62b755e9ef1ada84c2406c24a9ebec86b89f46d"}, + {file = "orjson-3.10.6-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:fb0ee33124db6eaa517d00890fc1a55c3bfe1cf78ba4a8899d71a06f2d6ff5c7"}, + {file = "orjson-3.10.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9c1c4b53b24a4c06547ce43e5fee6ec4e0d8fe2d597f4647fc033fd205707365"}, + {file = "orjson-3.10.6-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:eadc8fd310edb4bdbd333374f2c8fec6794bbbae99b592f448d8214a5e4050c0"}, + {file = "orjson-3.10.6-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:61272a5aec2b2661f4fa2b37c907ce9701e821b2c1285d5c3ab0207ebd358d38"}, + {file = "orjson-3.10.6-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:57985ee7e91d6214c837936dc1608f40f330a6b88bb13f5a57ce5257807da143"}, + {file = "orjson-3.10.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:633a3b31d9d7c9f02d49c4ab4d0a86065c4a6f6adc297d63d272e043472acab5"}, + {file = "orjson-3.10.6-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:1c680b269d33ec444afe2bdc647c9eb73166fa47a16d9a75ee56a374f4a45f43"}, + {file = "orjson-3.10.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:f759503a97a6ace19e55461395ab0d618b5a117e8d0fbb20e70cfd68a47327f2"}, + {file = "orjson-3.10.6-cp310-none-win32.whl", hash = "sha256:95a0cce17f969fb5391762e5719575217bd10ac5a189d1979442ee54456393f3"}, + {file = "orjson-3.10.6-cp310-none-win_amd64.whl", hash = "sha256:df25d9271270ba2133cc88ee83c318372bdc0f2cd6f32e7a450809a111efc45c"}, + {file = "orjson-3.10.6-cp311-cp311-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:b1ec490e10d2a77c345def52599311849fc063ae0e67cf4f84528073152bb2ba"}, + {file = "orjson-3.10.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:55d43d3feb8f19d07e9f01e5b9be4f28801cf7c60d0fa0d279951b18fae1932b"}, + {file = "orjson-3.10.6-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ac3045267e98fe749408eee1593a142e02357c5c99be0802185ef2170086a863"}, + {file = "orjson-3.10.6-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c27bc6a28ae95923350ab382c57113abd38f3928af3c80be6f2ba7eb8d8db0b0"}, + {file = "orjson-3.10.6-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d27456491ca79532d11e507cadca37fb8c9324a3976294f68fb1eff2dc6ced5a"}, + {file = "orjson-3.10.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = 
"sha256:05ac3d3916023745aa3b3b388e91b9166be1ca02b7c7e41045da6d12985685f0"}, + {file = "orjson-3.10.6-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1335d4ef59ab85cab66fe73fd7a4e881c298ee7f63ede918b7faa1b27cbe5212"}, + {file = "orjson-3.10.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:4bbc6d0af24c1575edc79994c20e1b29e6fb3c6a570371306db0993ecf144dc5"}, + {file = "orjson-3.10.6-cp311-none-win32.whl", hash = "sha256:450e39ab1f7694465060a0550b3f6d328d20297bf2e06aa947b97c21e5241fbd"}, + {file = "orjson-3.10.6-cp311-none-win_amd64.whl", hash = "sha256:227df19441372610b20e05bdb906e1742ec2ad7a66ac8350dcfd29a63014a83b"}, + {file = "orjson-3.10.6-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:ea2977b21f8d5d9b758bb3f344a75e55ca78e3ff85595d248eee813ae23ecdfb"}, + {file = "orjson-3.10.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b6f3d167d13a16ed263b52dbfedff52c962bfd3d270b46b7518365bcc2121eed"}, + {file = "orjson-3.10.6-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f710f346e4c44a4e8bdf23daa974faede58f83334289df80bc9cd12fe82573c7"}, + {file = "orjson-3.10.6-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7275664f84e027dcb1ad5200b8b18373e9c669b2a9ec33d410c40f5ccf4b257e"}, + {file = "orjson-3.10.6-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0943e4c701196b23c240b3d10ed8ecd674f03089198cf503105b474a4f77f21f"}, + {file = "orjson-3.10.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:446dee5a491b5bc7d8f825d80d9637e7af43f86a331207b9c9610e2f93fee22a"}, + {file = "orjson-3.10.6-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:64c81456d2a050d380786413786b057983892db105516639cb5d3ee3c7fd5148"}, + {file = "orjson-3.10.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:960db0e31c4e52fa0fc3ecbaea5b2d3b58f379e32a95ae6b0ebeaa25b93dfd34"}, + {file = "orjson-3.10.6-cp312-none-win32.whl", hash = "sha256:a6ea7afb5b30b2317e0bee03c8d34c8181bc5a36f2afd4d0952f378972c4efd5"}, + {file = "orjson-3.10.6-cp312-none-win_amd64.whl", hash = "sha256:874ce88264b7e655dde4aeaacdc8fd772a7962faadfb41abe63e2a4861abc3dc"}, + {file = "orjson-3.10.6-cp38-cp38-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:66680eae4c4e7fc193d91cfc1353ad6d01b4801ae9b5314f17e11ba55e934183"}, + {file = "orjson-3.10.6-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:caff75b425db5ef8e8f23af93c80f072f97b4fb3afd4af44482905c9f588da28"}, + {file = "orjson-3.10.6-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3722fddb821b6036fd2a3c814f6bd9b57a89dc6337b9924ecd614ebce3271394"}, + {file = "orjson-3.10.6-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c2c116072a8533f2fec435fde4d134610f806bdac20188c7bd2081f3e9e0133f"}, + {file = "orjson-3.10.6-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:6eeb13218c8cf34c61912e9df2de2853f1d009de0e46ea09ccdf3d757896af0a"}, + {file = "orjson-3.10.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:965a916373382674e323c957d560b953d81d7a8603fbeee26f7b8248638bd48b"}, + {file = "orjson-3.10.6-cp38-cp38-musllinux_1_2_aarch64.whl", hash = "sha256:03c95484d53ed8e479cade8628c9cea00fd9d67f5554764a1110e0d5aa2de96e"}, + {file = "orjson-3.10.6-cp38-cp38-musllinux_1_2_x86_64.whl", hash = "sha256:e060748a04cccf1e0a6f2358dffea9c080b849a4a68c28b1b907f272b5127e9b"}, + {file 
= "orjson-3.10.6-cp38-none-win32.whl", hash = "sha256:738dbe3ef909c4b019d69afc19caf6b5ed0e2f1c786b5d6215fbb7539246e4c6"}, + {file = "orjson-3.10.6-cp38-none-win_amd64.whl", hash = "sha256:d40f839dddf6a7d77114fe6b8a70218556408c71d4d6e29413bb5f150a692ff7"}, + {file = "orjson-3.10.6-cp39-cp39-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:697a35a083c4f834807a6232b3e62c8b280f7a44ad0b759fd4dce748951e70db"}, + {file = "orjson-3.10.6-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fd502f96bf5ea9a61cbc0b2b5900d0dd68aa0da197179042bdd2be67e51a1e4b"}, + {file = "orjson-3.10.6-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f215789fb1667cdc874c1b8af6a84dc939fd802bf293a8334fce185c79cd359b"}, + {file = "orjson-3.10.6-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a2debd8ddce948a8c0938c8c93ade191d2f4ba4649a54302a7da905a81f00b56"}, + {file = "orjson-3.10.6-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5410111d7b6681d4b0d65e0f58a13be588d01b473822483f77f513c7f93bd3b2"}, + {file = "orjson-3.10.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bb1f28a137337fdc18384079fa5726810681055b32b92253fa15ae5656e1dddb"}, + {file = "orjson-3.10.6-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:bf2fbbce5fe7cd1aa177ea3eab2b8e6a6bc6e8592e4279ed3db2d62e57c0e1b2"}, + {file = "orjson-3.10.6-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:79b9b9e33bd4c517445a62b90ca0cc279b0f1f3970655c3df9e608bc3f91741a"}, + {file = "orjson-3.10.6-cp39-none-win32.whl", hash = "sha256:30b0a09a2014e621b1adf66a4f705f0809358350a757508ee80209b2d8dae219"}, + {file = "orjson-3.10.6-cp39-none-win_amd64.whl", hash = "sha256:49e3bc615652617d463069f91b867a4458114c5b104e13b7ae6872e5f79d0844"}, + {file = "orjson-3.10.6.tar.gz", hash = "sha256:e54b63d0a7c6c54a5f5f726bc93a2078111ef060fec4ecbf34c5db800ca3b3a7"}, ] [[package]] @@ -637,109 +642,122 @@ testing = ["pytest", "pytest-benchmark"] [[package]] name = "pydantic" -version = "2.7.4" +version = "2.8.2" description = "Data validation using Python type hints" optional = false python-versions = ">=3.8" files = [ - {file = "pydantic-2.7.4-py3-none-any.whl", hash = "sha256:ee8538d41ccb9c0a9ad3e0e5f07bf15ed8015b481ced539a1759d8cc89ae90d0"}, - {file = "pydantic-2.7.4.tar.gz", hash = "sha256:0c84efd9548d545f63ac0060c1e4d39bb9b14db8b3c0652338aecc07b5adec52"}, + {file = "pydantic-2.8.2-py3-none-any.whl", hash = "sha256:73ee9fddd406dc318b885c7a2eab8a6472b68b8fb5ba8150949fc3db939f23c8"}, + {file = "pydantic-2.8.2.tar.gz", hash = "sha256:6f62c13d067b0755ad1c21a34bdd06c0c12625a22b0fc09c6b149816604f7c2a"}, ] [package.dependencies] annotated-types = ">=0.4.0" -pydantic-core = "2.18.4" -typing-extensions = ">=4.6.1" +pydantic-core = "2.20.1" +typing-extensions = [ + {version = ">=4.6.1", markers = "python_version < \"3.13\""}, + {version = ">=4.12.2", markers = "python_version >= \"3.13\""}, +] [package.extras] email = ["email-validator (>=2.0.0)"] [[package]] name = "pydantic-core" -version = "2.18.4" +version = "2.20.1" description = "Core functionality for Pydantic validation and serialization" optional = false python-versions = ">=3.8" files = [ - {file = "pydantic_core-2.18.4-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:f76d0ad001edd426b92233d45c746fd08f467d56100fd8f30e9ace4b005266e4"}, - {file = "pydantic_core-2.18.4-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:59ff3e89f4eaf14050c8022011862df275b552caef8082e37b542b066ce1ff26"}, 
- {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a55b5b16c839df1070bc113c1f7f94a0af4433fcfa1b41799ce7606e5c79ce0a"}, - {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:4d0dcc59664fcb8974b356fe0a18a672d6d7cf9f54746c05f43275fc48636851"}, - {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8951eee36c57cd128f779e641e21eb40bc5073eb28b2d23f33eb0ef14ffb3f5d"}, - {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4701b19f7e3a06ea655513f7938de6f108123bf7c86bbebb1196eb9bd35cf724"}, - {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e00a3f196329e08e43d99b79b286d60ce46bed10f2280d25a1718399457e06be"}, - {file = "pydantic_core-2.18.4-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:97736815b9cc893b2b7f663628e63f436018b75f44854c8027040e05230eeddb"}, - {file = "pydantic_core-2.18.4-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:6891a2ae0e8692679c07728819b6e2b822fb30ca7445f67bbf6509b25a96332c"}, - {file = "pydantic_core-2.18.4-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:bc4ff9805858bd54d1a20efff925ccd89c9d2e7cf4986144b30802bf78091c3e"}, - {file = "pydantic_core-2.18.4-cp310-none-win32.whl", hash = "sha256:1b4de2e51bbcb61fdebd0ab86ef28062704f62c82bbf4addc4e37fa4b00b7cbc"}, - {file = "pydantic_core-2.18.4-cp310-none-win_amd64.whl", hash = "sha256:6a750aec7bf431517a9fd78cb93c97b9b0c496090fee84a47a0d23668976b4b0"}, - {file = "pydantic_core-2.18.4-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:942ba11e7dfb66dc70f9ae66b33452f51ac7bb90676da39a7345e99ffb55402d"}, - {file = "pydantic_core-2.18.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b2ebef0e0b4454320274f5e83a41844c63438fdc874ea40a8b5b4ecb7693f1c4"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a642295cd0c8df1b86fc3dced1d067874c353a188dc8e0f744626d49e9aa51c4"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5f09baa656c904807e832cf9cce799c6460c450c4ad80803517032da0cd062e2"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:98906207f29bc2c459ff64fa007afd10a8c8ac080f7e4d5beff4c97086a3dabd"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:19894b95aacfa98e7cb093cd7881a0c76f55731efad31073db4521e2b6ff5b7d"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0fbbdc827fe5e42e4d196c746b890b3d72876bdbf160b0eafe9f0334525119c8"}, - {file = "pydantic_core-2.18.4-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:f85d05aa0918283cf29a30b547b4df2fbb56b45b135f9e35b6807cb28bc47951"}, - {file = "pydantic_core-2.18.4-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:e85637bc8fe81ddb73fda9e56bab24560bdddfa98aa64f87aaa4e4b6730c23d2"}, - {file = "pydantic_core-2.18.4-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:2f5966897e5461f818e136b8451d0551a2e77259eb0f73a837027b47dc95dab9"}, - {file = "pydantic_core-2.18.4-cp311-none-win32.whl", hash = "sha256:44c7486a4228413c317952e9d89598bcdfb06399735e49e0f8df643e1ccd0558"}, - {file = "pydantic_core-2.18.4-cp311-none-win_amd64.whl", hash = 
"sha256:8a7164fe2005d03c64fd3b85649891cd4953a8de53107940bf272500ba8a788b"}, - {file = "pydantic_core-2.18.4-cp311-none-win_arm64.whl", hash = "sha256:4e99bc050fe65c450344421017f98298a97cefc18c53bb2f7b3531eb39bc7805"}, - {file = "pydantic_core-2.18.4-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:6f5c4d41b2771c730ea1c34e458e781b18cc668d194958e0112455fff4e402b2"}, - {file = "pydantic_core-2.18.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2fdf2156aa3d017fddf8aea5adfba9f777db1d6022d392b682d2a8329e087cef"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4748321b5078216070b151d5271ef3e7cc905ab170bbfd27d5c83ee3ec436695"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:847a35c4d58721c5dc3dba599878ebbdfd96784f3fb8bb2c356e123bdcd73f34"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3c40d4eaad41f78e3bbda31b89edc46a3f3dc6e171bf0ecf097ff7a0ffff7cb1"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:21a5e440dbe315ab9825fcd459b8814bb92b27c974cbc23c3e8baa2b76890077"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:01dd777215e2aa86dfd664daed5957704b769e726626393438f9c87690ce78c3"}, - {file = "pydantic_core-2.18.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4b06beb3b3f1479d32befd1f3079cc47b34fa2da62457cdf6c963393340b56e9"}, - {file = "pydantic_core-2.18.4-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:564d7922e4b13a16b98772441879fcdcbe82ff50daa622d681dd682175ea918c"}, - {file = "pydantic_core-2.18.4-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:0eb2a4f660fcd8e2b1c90ad566db2b98d7f3f4717c64fe0a83e0adb39766d5b8"}, - {file = "pydantic_core-2.18.4-cp312-none-win32.whl", hash = "sha256:8b8bab4c97248095ae0c4455b5a1cd1cdd96e4e4769306ab19dda135ea4cdb07"}, - {file = "pydantic_core-2.18.4-cp312-none-win_amd64.whl", hash = "sha256:14601cdb733d741b8958224030e2bfe21a4a881fb3dd6fbb21f071cabd48fa0a"}, - {file = "pydantic_core-2.18.4-cp312-none-win_arm64.whl", hash = "sha256:c1322d7dd74713dcc157a2b7898a564ab091ca6c58302d5c7b4c07296e3fd00f"}, - {file = "pydantic_core-2.18.4-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:823be1deb01793da05ecb0484d6c9e20baebb39bd42b5d72636ae9cf8350dbd2"}, - {file = "pydantic_core-2.18.4-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:ebef0dd9bf9b812bf75bda96743f2a6c5734a02092ae7f721c048d156d5fabae"}, - {file = "pydantic_core-2.18.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ae1d6df168efb88d7d522664693607b80b4080be6750c913eefb77e34c12c71a"}, - {file = "pydantic_core-2.18.4-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f9899c94762343f2cc2fc64c13e7cae4c3cc65cdfc87dd810a31654c9b7358cc"}, - {file = "pydantic_core-2.18.4-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:99457f184ad90235cfe8461c4d70ab7dd2680e28821c29eca00252ba90308c78"}, - {file = "pydantic_core-2.18.4-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:18f469a3d2a2fdafe99296a87e8a4c37748b5080a26b806a707f25a902c040a8"}, - {file = "pydantic_core-2.18.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b7cdf28938ac6b8b49ae5e92f2735056a7ba99c9b110a474473fd71185c1af5d"}, - {file = 
"pydantic_core-2.18.4-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:938cb21650855054dc54dfd9120a851c974f95450f00683399006aa6e8abb057"}, - {file = "pydantic_core-2.18.4-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:44cd83ab6a51da80fb5adbd9560e26018e2ac7826f9626bc06ca3dc074cd198b"}, - {file = "pydantic_core-2.18.4-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:972658f4a72d02b8abfa2581d92d59f59897d2e9f7e708fdabe922f9087773af"}, - {file = "pydantic_core-2.18.4-cp38-none-win32.whl", hash = "sha256:1d886dc848e60cb7666f771e406acae54ab279b9f1e4143babc9c2258213daa2"}, - {file = "pydantic_core-2.18.4-cp38-none-win_amd64.whl", hash = "sha256:bb4462bd43c2460774914b8525f79b00f8f407c945d50881568f294c1d9b4443"}, - {file = "pydantic_core-2.18.4-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:44a688331d4a4e2129140a8118479443bd6f1905231138971372fcde37e43528"}, - {file = "pydantic_core-2.18.4-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:a2fdd81edd64342c85ac7cf2753ccae0b79bf2dfa063785503cb85a7d3593223"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:86110d7e1907ab36691f80b33eb2da87d780f4739ae773e5fc83fb272f88825f"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:46387e38bd641b3ee5ce247563b60c5ca098da9c56c75c157a05eaa0933ed154"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:123c3cec203e3f5ac7b000bd82235f1a3eced8665b63d18be751f115588fea30"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:dc1803ac5c32ec324c5261c7209e8f8ce88e83254c4e1aebdc8b0a39f9ddb443"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:53db086f9f6ab2b4061958d9c276d1dbe3690e8dd727d6abf2321d6cce37fa94"}, - {file = "pydantic_core-2.18.4-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:abc267fa9837245cc28ea6929f19fa335f3dc330a35d2e45509b6566dc18be23"}, - {file = "pydantic_core-2.18.4-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:a0d829524aaefdebccb869eed855e2d04c21d2d7479b6cada7ace5448416597b"}, - {file = "pydantic_core-2.18.4-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:509daade3b8649f80d4e5ff21aa5673e4ebe58590b25fe42fac5f0f52c6f034a"}, - {file = "pydantic_core-2.18.4-cp39-none-win32.whl", hash = "sha256:ca26a1e73c48cfc54c4a76ff78df3727b9d9f4ccc8dbee4ae3f73306a591676d"}, - {file = "pydantic_core-2.18.4-cp39-none-win_amd64.whl", hash = "sha256:c67598100338d5d985db1b3d21f3619ef392e185e71b8d52bceacc4a7771ea7e"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:574d92eac874f7f4db0ca653514d823a0d22e2354359d0759e3f6a406db5d55d"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:1f4d26ceb5eb9eed4af91bebeae4b06c3fb28966ca3a8fb765208cf6b51102ab"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:77450e6d20016ec41f43ca4a6c63e9fdde03f0ae3fe90e7c27bdbeaece8b1ed4"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d323a01da91851a4f17bf592faf46149c9169d68430b3146dcba2bb5e5719abc"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:43d447dd2ae072a0065389092a231283f62d960030ecd27565672bd40746c507"}, - {file = 
"pydantic_core-2.18.4-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:578e24f761f3b425834f297b9935e1ce2e30f51400964ce4801002435a1b41ef"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:81b5efb2f126454586d0f40c4d834010979cb80785173d1586df845a632e4e6d"}, - {file = "pydantic_core-2.18.4-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:ab86ce7c8f9bea87b9d12c7f0af71102acbf5ecbc66c17796cff45dae54ef9a5"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:90afc12421df2b1b4dcc975f814e21bc1754640d502a2fbcc6d41e77af5ec312"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:51991a89639a912c17bef4b45c87bd83593aee0437d8102556af4885811d59f5"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:293afe532740370aba8c060882f7d26cfd00c94cae32fd2e212a3a6e3b7bc15e"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b48ece5bde2e768197a2d0f6e925f9d7e3e826f0ad2271120f8144a9db18d5c8"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:eae237477a873ab46e8dd748e515c72c0c804fb380fbe6c85533c7de51f23a8f"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:834b5230b5dfc0c1ec37b2fda433b271cbbc0e507560b5d1588e2cc1148cf1ce"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:e858ac0a25074ba4bce653f9b5d0a85b7456eaddadc0ce82d3878c22489fa4ee"}, - {file = "pydantic_core-2.18.4-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:2fd41f6eff4c20778d717af1cc50eca52f5afe7805ee530a4fbd0bae284f16e9"}, - {file = "pydantic_core-2.18.4.tar.gz", hash = "sha256:ec3beeada09ff865c344ff3bc2f427f5e6c26401cc6113d77e372c3fdac73864"}, + {file = "pydantic_core-2.20.1-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:3acae97ffd19bf091c72df4d726d552c473f3576409b2a7ca36b2f535ffff4a3"}, + {file = "pydantic_core-2.20.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:41f4c96227a67a013e7de5ff8f20fb496ce573893b7f4f2707d065907bffdbd6"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5f239eb799a2081495ea659d8d4a43a8f42cd1fe9ff2e7e436295c38a10c286a"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:53e431da3fc53360db73eedf6f7124d1076e1b4ee4276b36fb25514544ceb4a3"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f1f62b2413c3a0e846c3b838b2ecd6c7a19ec6793b2a522745b0869e37ab5bc1"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5d41e6daee2813ecceea8eda38062d69e280b39df793f5a942fa515b8ed67953"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3d482efec8b7dc6bfaedc0f166b2ce349df0011f5d2f1f25537ced4cfc34fd98"}, + {file = "pydantic_core-2.20.1-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:e93e1a4b4b33daed65d781a57a522ff153dcf748dee70b40c7258c5861e1768a"}, + {file = "pydantic_core-2.20.1-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:e7c4ea22b6739b162c9ecaaa41d718dfad48a244909fe7ef4b54c0b530effc5a"}, + {file = "pydantic_core-2.20.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = 
"sha256:4f2790949cf385d985a31984907fecb3896999329103df4e4983a4a41e13e840"}, + {file = "pydantic_core-2.20.1-cp310-none-win32.whl", hash = "sha256:5e999ba8dd90e93d57410c5e67ebb67ffcaadcea0ad973240fdfd3a135506250"}, + {file = "pydantic_core-2.20.1-cp310-none-win_amd64.whl", hash = "sha256:512ecfbefef6dac7bc5eaaf46177b2de58cdf7acac8793fe033b24ece0b9566c"}, + {file = "pydantic_core-2.20.1-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:d2a8fa9d6d6f891f3deec72f5cc668e6f66b188ab14bb1ab52422fe8e644f312"}, + {file = "pydantic_core-2.20.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:175873691124f3d0da55aeea1d90660a6ea7a3cfea137c38afa0a5ffabe37b88"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:37eee5b638f0e0dcd18d21f59b679686bbd18917b87db0193ae36f9c23c355fc"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:25e9185e2d06c16ee438ed39bf62935ec436474a6ac4f9358524220f1b236e43"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:150906b40ff188a3260cbee25380e7494ee85048584998c1e66df0c7a11c17a6"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8ad4aeb3e9a97286573c03df758fc7627aecdd02f1da04516a86dc159bf70121"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d3f3ed29cd9f978c604708511a1f9c2fdcb6c38b9aae36a51905b8811ee5cbf1"}, + {file = "pydantic_core-2.20.1-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:b0dae11d8f5ded51699c74d9548dcc5938e0804cc8298ec0aa0da95c21fff57b"}, + {file = "pydantic_core-2.20.1-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:faa6b09ee09433b87992fb5a2859efd1c264ddc37280d2dd5db502126d0e7f27"}, + {file = "pydantic_core-2.20.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:9dc1b507c12eb0481d071f3c1808f0529ad41dc415d0ca11f7ebfc666e66a18b"}, + {file = "pydantic_core-2.20.1-cp311-none-win32.whl", hash = "sha256:fa2fddcb7107e0d1808086ca306dcade7df60a13a6c347a7acf1ec139aa6789a"}, + {file = "pydantic_core-2.20.1-cp311-none-win_amd64.whl", hash = "sha256:40a783fb7ee353c50bd3853e626f15677ea527ae556429453685ae32280c19c2"}, + {file = "pydantic_core-2.20.1-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:595ba5be69b35777474fa07f80fc260ea71255656191adb22a8c53aba4479231"}, + {file = "pydantic_core-2.20.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:a4f55095ad087474999ee28d3398bae183a66be4823f753cd7d67dd0153427c9"}, + {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f9aa05d09ecf4c75157197f27cdc9cfaeb7c5f15021c6373932bf3e124af029f"}, + {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e97fdf088d4b31ff4ba35db26d9cc472ac7ef4a2ff2badeabf8d727b3377fc52"}, + {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:bc633a9fe1eb87e250b5c57d389cf28998e4292336926b0b6cdaee353f89a237"}, + {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d573faf8eb7e6b1cbbcb4f5b247c60ca8be39fe2c674495df0eb4318303137fe"}, + {file = "pydantic_core-2.20.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:26dc97754b57d2fd00ac2b24dfa341abffc380b823211994c4efac7f13b9e90e"}, + {file = 
"pydantic_core-2.20.1-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:33499e85e739a4b60c9dac710c20a08dc73cb3240c9a0e22325e671b27b70d24"}, + {file = "pydantic_core-2.20.1-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:bebb4d6715c814597f85297c332297c6ce81e29436125ca59d1159b07f423eb1"}, + {file = "pydantic_core-2.20.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:516d9227919612425c8ef1c9b869bbbee249bc91912c8aaffb66116c0b447ebd"}, + {file = "pydantic_core-2.20.1-cp312-none-win32.whl", hash = "sha256:469f29f9093c9d834432034d33f5fe45699e664f12a13bf38c04967ce233d688"}, + {file = "pydantic_core-2.20.1-cp312-none-win_amd64.whl", hash = "sha256:035ede2e16da7281041f0e626459bcae33ed998cca6a0a007a5ebb73414ac72d"}, + {file = "pydantic_core-2.20.1-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:0827505a5c87e8aa285dc31e9ec7f4a17c81a813d45f70b1d9164e03a813a686"}, + {file = "pydantic_core-2.20.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:19c0fa39fa154e7e0b7f82f88ef85faa2a4c23cc65aae2f5aea625e3c13c735a"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4aa223cd1e36b642092c326d694d8bf59b71ddddc94cdb752bbbb1c5c91d833b"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c336a6d235522a62fef872c6295a42ecb0c4e1d0f1a3e500fe949415761b8a19"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7eb6a0587eded33aeefea9f916899d42b1799b7b14b8f8ff2753c0ac1741edac"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:70c8daf4faca8da5a6d655f9af86faf6ec2e1768f4b8b9d0226c02f3d6209703"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e9fa4c9bf273ca41f940bceb86922a7667cd5bf90e95dbb157cbb8441008482c"}, + {file = "pydantic_core-2.20.1-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:11b71d67b4725e7e2a9f6e9c0ac1239bbc0c48cce3dc59f98635efc57d6dac83"}, + {file = "pydantic_core-2.20.1-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:270755f15174fb983890c49881e93f8f1b80f0b5e3a3cc1394a255706cabd203"}, + {file = "pydantic_core-2.20.1-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:c81131869240e3e568916ef4c307f8b99583efaa60a8112ef27a366eefba8ef0"}, + {file = "pydantic_core-2.20.1-cp313-none-win32.whl", hash = "sha256:b91ced227c41aa29c672814f50dbb05ec93536abf8f43cd14ec9521ea09afe4e"}, + {file = "pydantic_core-2.20.1-cp313-none-win_amd64.whl", hash = "sha256:65db0f2eefcaad1a3950f498aabb4875c8890438bc80b19362cf633b87a8ab20"}, + {file = "pydantic_core-2.20.1-cp38-cp38-macosx_10_12_x86_64.whl", hash = "sha256:4745f4ac52cc6686390c40eaa01d48b18997cb130833154801a442323cc78f91"}, + {file = "pydantic_core-2.20.1-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:a8ad4c766d3f33ba8fd692f9aa297c9058970530a32c728a2c4bfd2616d3358b"}, + {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:41e81317dd6a0127cabce83c0c9c3fbecceae981c8391e6f1dec88a77c8a569a"}, + {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:04024d270cf63f586ad41fff13fde4311c4fc13ea74676962c876d9577bcc78f"}, + {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:eaad4ff2de1c3823fddf82f41121bdf453d922e9a238642b1dedb33c4e4f98ad"}, + {file = 
"pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26ab812fa0c845df815e506be30337e2df27e88399b985d0bb4e3ecfe72df31c"}, + {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3c5ebac750d9d5f2706654c638c041635c385596caf68f81342011ddfa1e5598"}, + {file = "pydantic_core-2.20.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2aafc5a503855ea5885559eae883978c9b6d8c8993d67766ee73d82e841300dd"}, + {file = "pydantic_core-2.20.1-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:4868f6bd7c9d98904b748a2653031fc9c2f85b6237009d475b1008bfaeb0a5aa"}, + {file = "pydantic_core-2.20.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:aa2f457b4af386254372dfa78a2eda2563680d982422641a85f271c859df1987"}, + {file = "pydantic_core-2.20.1-cp38-none-win32.whl", hash = "sha256:225b67a1f6d602de0ce7f6c1c3ae89a4aa25d3de9be857999e9124f15dab486a"}, + {file = "pydantic_core-2.20.1-cp38-none-win_amd64.whl", hash = "sha256:6b507132dcfc0dea440cce23ee2182c0ce7aba7054576efc65634f080dbe9434"}, + {file = "pydantic_core-2.20.1-cp39-cp39-macosx_10_12_x86_64.whl", hash = "sha256:b03f7941783b4c4a26051846dea594628b38f6940a2fdc0df00b221aed39314c"}, + {file = "pydantic_core-2.20.1-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:1eedfeb6089ed3fad42e81a67755846ad4dcc14d73698c120a82e4ccf0f1f9f6"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:635fee4e041ab9c479e31edda27fcf966ea9614fff1317e280d99eb3e5ab6fe2"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:77bf3ac639c1ff567ae3b47f8d4cc3dc20f9966a2a6dd2311dcc055d3d04fb8a"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7ed1b0132f24beeec5a78b67d9388656d03e6a7c837394f99257e2d55b461611"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:c6514f963b023aeee506678a1cf821fe31159b925c4b76fe2afa94cc70b3222b"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:10d4204d8ca33146e761c79f83cc861df20e7ae9f6487ca290a97702daf56006"}, + {file = "pydantic_core-2.20.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:2d036c7187b9422ae5b262badb87a20a49eb6c5238b2004e96d4da1231badef1"}, + {file = "pydantic_core-2.20.1-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:9ebfef07dbe1d93efb94b4700f2d278494e9162565a54f124c404a5656d7ff09"}, + {file = "pydantic_core-2.20.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:6b9d9bb600328a1ce523ab4f454859e9d439150abb0906c5a1983c146580ebab"}, + {file = "pydantic_core-2.20.1-cp39-none-win32.whl", hash = "sha256:784c1214cb6dd1e3b15dd8b91b9a53852aed16671cc3fbe4786f4f1db07089e2"}, + {file = "pydantic_core-2.20.1-cp39-none-win_amd64.whl", hash = "sha256:d2fe69c5434391727efa54b47a1e7986bb0186e72a41b203df8f5b0a19a4f669"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:a45f84b09ac9c3d35dfcf6a27fd0634d30d183205230a0ebe8373a0e8cfa0906"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:d02a72df14dfdbaf228424573a07af10637bd490f0901cee872c4f434a735b94"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d2b27e6af28f07e2f195552b37d7d66b150adbaa39a6d327766ffd695799780f"}, + {file = 
"pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:084659fac3c83fd674596612aeff6041a18402f1e1bc19ca39e417d554468482"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:242b8feb3c493ab78be289c034a1f659e8826e2233786e36f2893a950a719bb6"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:38cf1c40a921d05c5edc61a785c0ddb4bed67827069f535d794ce6bcded919fc"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:e0bbdd76ce9aa5d4209d65f2b27fc6e5ef1312ae6c5333c26db3f5ade53a1e99"}, + {file = "pydantic_core-2.20.1-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:254ec27fdb5b1ee60684f91683be95e5133c994cc54e86a0b0963afa25c8f8a6"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-macosx_10_12_x86_64.whl", hash = "sha256:407653af5617f0757261ae249d3fba09504d7a71ab36ac057c938572d1bc9331"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-macosx_11_0_arm64.whl", hash = "sha256:c693e916709c2465b02ca0ad7b387c4f8423d1db7b4649c551f27a529181c5ad"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5b5ff4911aea936a47d9376fd3ab17e970cc543d1b68921886e7f64bd28308d1"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:177f55a886d74f1808763976ac4efd29b7ed15c69f4d838bbd74d9d09cf6fa86"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:964faa8a861d2664f0c7ab0c181af0bea66098b1919439815ca8803ef136fc4e"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:4dd484681c15e6b9a977c785a345d3e378d72678fd5f1f3c0509608da24f2ac0"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:f6d6cff3538391e8486a431569b77921adfcdef14eb18fbf19b7c0a5294d4e6a"}, + {file = "pydantic_core-2.20.1-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:a6d511cc297ff0883bc3708b465ff82d7560193169a8b93260f74ecb0a5e08a7"}, + {file = "pydantic_core-2.20.1.tar.gz", hash = "sha256:26ca695eeee5f9f1aeeb211ffc12f10bcb6f71e2989988fda61dabd65db878d4"}, ] [package.dependencies] @@ -867,7 +885,6 @@ files = [ {file = "PyYAML-6.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:bf07ee2fef7014951eeb99f56f39c9bb4af143d8aa3c21b1677805985307da34"}, {file = "PyYAML-6.0.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:855fb52b0dc35af121542a76b9a84f8d1cd886ea97c84703eaa6d88e37a2ad28"}, {file = "PyYAML-6.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40df9b996c2b73138957fe23a16a4f0ba614f4c0efce1e9406a184b6d07fa3a9"}, - {file = "PyYAML-6.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a08c6f0fe150303c1c6b71ebcd7213c2858041a7e01975da3a99aed1e7a378ef"}, {file = "PyYAML-6.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6c22bec3fbe2524cde73d7ada88f6566758a8f7227bfbf93a408a9d86bcc12a0"}, {file = "PyYAML-6.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:8d4e9c88387b0f5c7d5f281e55304de64cf7f9c0021a3525bd3b1c542da3b0e4"}, {file = "PyYAML-6.0.1-cp312-cp312-win32.whl", hash = "sha256:d483d2cdf104e7c9fa60c544d92981f12ad66a457afae824d146093b8c294c54"}, diff --git a/libs/partners/ibm/pyproject.toml b/libs/partners/ibm/pyproject.toml index c55358efd385e..7542bd1eb6b9b 100644 --- a/libs/partners/ibm/pyproject.toml +++ b/libs/partners/ibm/pyproject.toml @@ -4,7 
+4,7 @@ build-backend = "poetry.core.masonry.api" [tool.poetry] name = "langchain-ibm" -version = "0.1.8" +version = "0.1.9" description = "An integration package connecting IBM watsonx.ai and LangChain" authors = [ "IBM",] readme = "README.md" diff --git a/libs/partners/ibm/tests/integration_tests/test_chat_models.py b/libs/partners/ibm/tests/integration_tests/test_chat_models.py index 391608dd6c7cc..440472fb2018a 100644 --- a/libs/partners/ibm/tests/integration_tests/test_chat_models.py +++ b/libs/partners/ibm/tests/integration_tests/test_chat_models.py @@ -52,6 +52,34 @@ def test_01a_generate_chat_with_invoke_params() -> None: assert response +def test_01b_generate_chat_with_invoke_params() -> None: + from ibm_watsonx_ai.metanames import GenTextParamsMetaNames + + params_1 = { + GenTextParamsMetaNames.MIN_NEW_TOKENS: 1, + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + } + params_2 = { + GenTextParamsMetaNames.MIN_NEW_TOKENS: 1, + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + } + chat = ChatWatsonx( + model_id=MODEL_ID, + url=URL, # type: ignore[arg-type] + project_id=WX_PROJECT_ID, + params=params_1, # type: ignore[arg-type] + ) + messages = [ + ("system", "You are a helpful assistant that translates English to French."), + ( + "human", + "Translate this sentence from English to French. I love programming.", + ), + ] + response = chat.invoke(messages, params=params_2) + assert response + + def test_02_generate_chat_with_few_inputs() -> None: chat = ChatWatsonx(model_id=MODEL_ID, url=URL, project_id=WX_PROJECT_ID) # type: ignore[arg-type] message = HumanMessage(content="Hello") @@ -75,6 +103,19 @@ def test_05_generate_chat_with_stream() -> None: assert isinstance(chunk.content, str) +def test_05a_generate_chat_with_stream_with_param() -> None: + from ibm_watsonx_ai.metanames import GenTextParamsMetaNames + + params = { + GenTextParamsMetaNames.MIN_NEW_TOKENS: 1, + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + } + chat = ChatWatsonx(model_id=MODEL_ID, url=URL, project_id=WX_PROJECT_ID) # type: ignore[arg-type] + response = chat.stream("What's the weather in san francisco", params=params) + for chunk in response: + assert isinstance(chunk.content, str) + + def test_10_chaining() -> None: chat = ChatWatsonx(model_id=MODEL_ID, url=URL, project_id=WX_PROJECT_ID) # type: ignore[arg-type] prompt = ChatPromptTemplate.from_messages( diff --git a/libs/partners/ibm/tests/integration_tests/test_llms.py b/libs/partners/ibm/tests/integration_tests/test_llms.py index dc4c071ce7b68..cf0a39e9971f3 100644 --- a/libs/partners/ibm/tests/integration_tests/test_llms.py +++ b/libs/partners/ibm/tests/integration_tests/test_llms.py @@ -52,6 +52,45 @@ def test_watsonxllm_invoke_with_params() -> None: assert len(response) > 0 +def test_watsonxllm_invoke_with_params_2() -> None: + parameters = { + GenTextParamsMetaNames.DECODING_METHOD: "sample", + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + GenTextParamsMetaNames.MIN_NEW_TOKENS: 5, + } + + watsonxllm = WatsonxLLM( + model_id=MODEL_ID, + url="https://us-south.ml.cloud.ibm.com", # type: ignore[arg-type] + project_id=WX_PROJECT_ID, + ) + response = watsonxllm.invoke("What color sunflower is?", params=parameters) + print(f"\nResponse: {response}") + assert isinstance(response, str) + assert len(response) > 0 + + +def test_watsonxllm_invoke_with_params_3() -> None: + parameters_1 = { + GenTextParamsMetaNames.DECODING_METHOD: "sample", + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + } + parameters_2 = { + GenTextParamsMetaNames.MIN_NEW_TOKENS: 5, + } + + watsonxllm = 
WatsonxLLM( + model_id=MODEL_ID, + url="https://us-south.ml.cloud.ibm.com", # type: ignore[arg-type] + project_id=WX_PROJECT_ID, + params=parameters_1, + ) + response = watsonxllm.invoke("What color sunflower is?", params=parameters_2) + print(f"\nResponse: {response}") + assert isinstance(response, str) + assert len(response) > 0 + + def test_watsonxllm_generate() -> None: watsonxllm = WatsonxLLM( model_id=MODEL_ID, @@ -66,6 +105,25 @@ def test_watsonxllm_generate() -> None: assert len(response_text) > 0 +def test_watsonxllm_generate_with_param() -> None: + parameters = { + GenTextParamsMetaNames.DECODING_METHOD: "sample", + GenTextParamsMetaNames.MAX_NEW_TOKENS: 10, + GenTextParamsMetaNames.MIN_NEW_TOKENS: 5, + } + watsonxllm = WatsonxLLM( + model_id=MODEL_ID, + url="https://us-south.ml.cloud.ibm.com", # type: ignore[arg-type] + project_id=WX_PROJECT_ID, + ) + response = watsonxllm.generate(["What color sunflower is?"], params=parameters) + print(f"\nResponse: {response}") + response_text = response.generations[0][0].text + print(f"Response text: {response_text}") + assert isinstance(response, LLMResult) + assert len(response_text) > 0 + + def test_watsonxllm_generate_with_multiple_prompts() -> None: watsonxllm = WatsonxLLM( model_id=MODEL_ID, diff --git a/libs/standard-tests/langchain_standard_tests/integration_tests/vectorstores.py b/libs/standard-tests/langchain_standard_tests/integration_tests/vectorstores.py index d65eb12934947..81c3e82885789 100644 --- a/libs/standard-tests/langchain_standard_tests/integration_tests/vectorstores.py +++ b/libs/standard-tests/langchain_standard_tests/integration_tests/vectorstores.py @@ -46,15 +46,21 @@ def test_vectorstore_is_empty(self, vectorstore: VectorStore) -> None: def test_add_documents(self, vectorstore: VectorStore) -> None: """Test adding documents into the vectorstore.""" - documents = [ + original_documents = [ Document(page_content="foo", metadata={"id": 1}), Document(page_content="bar", metadata={"id": 2}), ] - vectorstore.add_documents(documents) + ids = vectorstore.add_documents(original_documents) documents = vectorstore.similarity_search("bar", k=2) assert documents == [ - Document(page_content="bar", metadata={"id": 2}), + Document(page_content="bar", metadata={"id": 2}, id=ids[1]), + Document(page_content="foo", metadata={"id": 1}, id=ids[0]), + ] + # Verify that the original document object does not get mutated! 
+ # (e.g., an ID is added to the original document object) + assert original_documents == [ Document(page_content="foo", metadata={"id": 1}), + Document(page_content="bar", metadata={"id": 2}), ] def test_vectorstore_still_empty(self, vectorstore: VectorStore) -> None: @@ -71,10 +77,11 @@ def test_deleting_documents(self, vectorstore: VectorStore) -> None: Document(page_content="foo", metadata={"id": 1}), Document(page_content="bar", metadata={"id": 2}), ] - vectorstore.add_documents(documents, ids=["1", "2"]) + ids = vectorstore.add_documents(documents, ids=["1", "2"]) + assert ids == ["1", "2"] vectorstore.delete(["1"]) documents = vectorstore.similarity_search("foo", k=1) - assert documents == [Document(page_content="bar", metadata={"id": 2})] + assert documents == [Document(page_content="bar", metadata={"id": 2}, id="2")] def test_deleting_bulk_documents(self, vectorstore: VectorStore) -> None: """Test that we can delete several documents at once.""" @@ -87,7 +94,7 @@ def test_deleting_bulk_documents(self, vectorstore: VectorStore) -> None: vectorstore.add_documents(documents, ids=["1", "2", "3"]) vectorstore.delete(["1", "2"]) documents = vectorstore.similarity_search("foo", k=1) - assert documents == [Document(page_content="baz", metadata={"id": 3})] + assert documents == [Document(page_content="baz", metadata={"id": 3}, id="3")] def test_delete_missing_content(self, vectorstore: VectorStore) -> None: """Deleting missing content should not raise an exception.""" @@ -106,25 +113,8 @@ def test_add_documents_with_ids_is_idempotent( vectorstore.add_documents(documents, ids=["1", "2"]) documents = vectorstore.similarity_search("bar", k=2) assert documents == [ - Document(page_content="bar", metadata={"id": 2}), - Document(page_content="foo", metadata={"id": 1}), - ] - - def test_add_documents_without_ids_gets_duplicated( - self, vectorstore: VectorStore - ) -> None: - """Adding documents without specifying IDs should duplicate content.""" - documents = [ - Document(page_content="foo", metadata={"id": 1}), - Document(page_content="bar", metadata={"id": 2}), - ] - - vectorstore.add_documents(documents) - vectorstore.add_documents(documents) - documents = vectorstore.similarity_search("bar", k=2) - assert documents == [ - Document(page_content="bar", metadata={"id": 2}), - Document(page_content="bar", metadata={"id": 2}), + Document(page_content="bar", metadata={"id": 2}, id="2"), + Document(page_content="foo", metadata={"id": 1}, id="1"), ] def test_add_documents_by_id_with_mutation(self, vectorstore: VectorStore) -> None: @@ -149,9 +139,11 @@ def test_add_documents_by_id_with_mutation(self, vectorstore: VectorStore) -> No documents = vectorstore.similarity_search("new foo", k=2) assert documents == [ Document( - page_content="new foo", metadata={"id": 1, "some_other_field": "foo"} + id="1", + page_content="new foo", + metadata={"id": 1, "some_other_field": "foo"}, ), - Document(page_content="bar", metadata={"id": 2}), + Document(id="2", page_content="bar", metadata={"id": 2}), ] @@ -190,15 +182,22 @@ async def test_vectorstore_is_empty(self, vectorstore: VectorStore) -> None: async def test_add_documents(self, vectorstore: VectorStore) -> None: """Test adding documents into the vectorstore.""" - documents = [ + original_documents = [ Document(page_content="foo", metadata={"id": 1}), Document(page_content="bar", metadata={"id": 2}), ] - await vectorstore.aadd_documents(documents) + ids = await vectorstore.aadd_documents(original_documents) documents = await 
vectorstore.asimilarity_search("bar", k=2) assert documents == [ - Document(page_content="bar", metadata={"id": 2}), + Document(page_content="bar", metadata={"id": 2}, id=ids[1]), + Document(page_content="foo", metadata={"id": 1}, id=ids[0]), + ] + + # Verify that the original document object does not get mutated! + # (e.g., an ID is added to the original document object) + assert original_documents == [ Document(page_content="foo", metadata={"id": 1}), + Document(page_content="bar", metadata={"id": 2}), ] async def test_vectorstore_still_empty(self, vectorstore: VectorStore) -> None: @@ -215,10 +214,11 @@ async def test_deleting_documents(self, vectorstore: VectorStore) -> None: Document(page_content="foo", metadata={"id": 1}), Document(page_content="bar", metadata={"id": 2}), ] - await vectorstore.aadd_documents(documents, ids=["1", "2"]) + ids = await vectorstore.aadd_documents(documents, ids=["1", "2"]) + assert ids == ["1", "2"] await vectorstore.adelete(["1"]) documents = await vectorstore.asimilarity_search("foo", k=1) - assert documents == [Document(page_content="bar", metadata={"id": 2})] + assert documents == [Document(page_content="bar", metadata={"id": 2}, id="2")] async def test_deleting_bulk_documents(self, vectorstore: VectorStore) -> None: """Test that we can delete several documents at once.""" @@ -231,7 +231,7 @@ async def test_deleting_bulk_documents(self, vectorstore: VectorStore) -> None: await vectorstore.aadd_documents(documents, ids=["1", "2", "3"]) await vectorstore.adelete(["1", "2"]) documents = await vectorstore.asimilarity_search("foo", k=1) - assert documents == [Document(page_content="baz", metadata={"id": 3})] + assert documents == [Document(page_content="baz", metadata={"id": 3}, id="3")] async def test_delete_missing_content(self, vectorstore: VectorStore) -> None: """Deleting missing content should not raise an exception.""" @@ -250,25 +250,8 @@ async def test_add_documents_with_ids_is_idempotent( await vectorstore.aadd_documents(documents, ids=["1", "2"]) documents = await vectorstore.asimilarity_search("bar", k=2) assert documents == [ - Document(page_content="bar", metadata={"id": 2}), - Document(page_content="foo", metadata={"id": 1}), - ] - - async def test_add_documents_without_ids_gets_duplicated( - self, vectorstore: VectorStore - ) -> None: - """Adding documents without specifying IDs should duplicate content.""" - documents = [ - Document(page_content="foo", metadata={"id": 1}), - Document(page_content="bar", metadata={"id": 2}), - ] - - await vectorstore.aadd_documents(documents) - await vectorstore.aadd_documents(documents) - documents = await vectorstore.asimilarity_search("bar", k=2) - assert documents == [ - Document(page_content="bar", metadata={"id": 2}), - Document(page_content="bar", metadata={"id": 2}), + Document(page_content="bar", metadata={"id": 2}, id="2"), + Document(page_content="foo", metadata={"id": 1}, id="1"), ] async def test_add_documents_by_id_with_mutation( @@ -295,7 +278,9 @@ async def test_add_documents_by_id_with_mutation( documents = await vectorstore.asimilarity_search("new foo", k=2) assert documents == [ Document( - page_content="new foo", metadata={"id": 1, "some_other_field": "foo"} + id="1", + page_content="new foo", + metadata={"id": 1, "some_other_field": "foo"}, ), - Document(page_content="bar", metadata={"id": 2}), + Document(id="2", page_content="bar", metadata={"id": 2}), ]