diff --git a/docs/docs/guides/safety/amazon_comprehend_chain.ipynb b/docs/docs/guides/safety/amazon_comprehend_chain.ipynb index ad6833a7b8def..b4bf7de26c483 100644 --- a/docs/docs/guides/safety/amazon_comprehend_chain.ipynb +++ b/docs/docs/guides/safety/amazon_comprehend_chain.ipynb @@ -7,7 +7,9 @@ "source": [ "# Amazon Comprehend Moderation Chain\n", "\n", - "This notebook shows how to use [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect and handle `Personally Identifiable Information` (`PII`) and toxicity.\n", + ">[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.\n", + "\n", + "This notebook shows how to use `Amazon Comprehend` to detect and handle `Personally Identifiable Information` (`PII`) and toxicity.\n", "\n", "## Setting up" ] @@ -1417,7 +1419,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/chat/bedrock.ipynb b/docs/docs/integrations/chat/bedrock.ipynb index 990cc767689d0..02dfb5b9fbde7 100644 --- a/docs/docs/integrations/chat/bedrock.ipynb +++ b/docs/docs/integrations/chat/bedrock.ipynb @@ -7,7 +7,15 @@ "source": [ "# Bedrock Chat\n", "\n", - "[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case" + ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n", + "> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n", + "> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n", + "> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n", + "> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n", + "> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n", + "> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n", + "> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n", + "> generative AI capabilities into your applications using the AWS services you are already familiar with.\n" ] }, { @@ -131,7 +139,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/document_loaders/amazon_textract.ipynb b/docs/docs/integrations/document_loaders/amazon_textract.ipynb new file mode 100644 index 0000000000000..392d08b3bdc7b --- /dev/null +++ b/docs/docs/integrations/document_loaders/amazon_textract.ipynb @@ -0,0 +1,884 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1f3cebbe-079a-4bfe-b1a1-07bdac882ce2", + "metadata": {}, + "source": [ + "# Amazon Textract \n", + "\n", + ">[Amazon Textract](https://docs.aws.amazon.com/managedservices/latest/userguide/textract.html) is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.\n", + ">\n", + ">It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, `Textract` uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. \n", + "\n", + "This sample demonstrates the use of `Amazon Textract` in combination with LangChain as a DocumentLoader.\n", + "\n", + "`Textract` supports`PDF`, `TIF`F, `PNG` and `JPEG` format.\n", + "\n", + "`Textract` supports these [document sizes, languages and characters](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a1aa66d4-85f2-42ad-a8d3-de7cea8d6c35", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install boto3 openai tiktoken python-dotenv" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "e4305a0d-37da-41f9-a52c-7d166d7dbabf", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install \"amazon-textract-caller>=0.2.0\"" + ] + }, + { + "cell_type": "markdown", + "id": "400b25c6-befa-4730-a201-39ff112c8858", + "metadata": {}, + "source": [ + "## Sample 1\n", + "\n", + "The first example uses a local file, which internally will be send to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n", + "\n", + "Local files or URL endpoints like HTTP:// are limited to one page documents for Textract.\n", + "Multi-page documents have to reside on S3. This sample file is a jpeg." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1becee92-e82f-42d4-9b4e-b23d77cbe88d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.document_loaders import AmazonTextractPDFLoader\n", + "\n", + "loader = AmazonTextractPDFLoader(\"example_data/alejandro_rosalez_sample-small.jpeg\")\n", + "documents = loader.load()" + ] + }, + { + "cell_type": "markdown", + "id": "d566dc56-c9a9-44ec-84fb-a81928f90d40", + "metadata": {}, + "source": [ + "Output from the file" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1272ce8c-d298-4059-ac0a-780bf5f82302", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "documents" + ] + }, + { + "cell_type": "markdown", + "id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4", + "metadata": {}, + "source": [ + "## Sample 2\n", + "The next sample loads a file from an HTTPS endpoint. \n", + "It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "10374bfb-b325-451f-8bd0-c686710ab68c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.document_loaders import AmazonTextractPDFLoader\n", + "\n", + "loader = AmazonTextractPDFLoader(\n", + " \"https://amazon-textract-public-content.s3.us-east-2.amazonaws.com/langchain/alejandro_rosalez_sample_1.jpg\"\n", + ")\n", + "documents = loader.load()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "16a2b6a3-7514-4c2c-a427-6847169af473", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "documents" + ] + }, + { + "cell_type": "markdown", + "id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141", + "metadata": {}, + "source": [ + "## Sample 3\n", + "\n", + "Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "8185e3e6-9599-4a47-8969-d6dcef3e6404", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "\n", + "textract_client = boto3.client(\"textract\", region_name=\"us-east-2\")\n", + "\n", + "file_path = \"s3://amazon-textract-public-content/langchain/layout-parser-paper.pdf\"\n", + "loader = AmazonTextractPDFLoader(file_path, client=textract_client)\n", + "documents = loader.load()" + ] + }, + { + "cell_type": "markdown", + "id": "b8901eec-070d-4fd6-9d65-52211d332441", + "metadata": {}, + "source": [ + "Now getting the number of pages to validate the response (printing out the full response would be quite long...). We expect 16 pages." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "b23c01c8-cf69-4fe2-8141-4621edb7d79c", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "16" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(documents)" + ] + }, + { + "cell_type": "markdown", + "id": "b3e41b4d-b159-4274-89be-80d8159134ef", + "metadata": {}, + "source": [ + "## Using the AmazonTextractPDFLoader in an LangChain chain (e. g. OpenAI)\n", + "\n", + "The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n", + "Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "53c47b24-cc06-4256-9e5b-a82fc80bc55d", + "metadata": {}, + "outputs": [], + "source": [ + "# You can store your OPENAI_API_KEY in a .env file as well\n", + "# import os\n", + "# from dotenv import load_dotenv\n", + "\n", + "# load_dotenv()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "a9ae004c-246c-4c7f-8458-191cd7424a9b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Or set the OpenAI key in the environment directly\n", + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"your-OpenAI-API-key\"" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "d52b089c-10ca-45fb-8669-8a1c5fee10d5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "' The authors are Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li, Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., Zettlemoyer, L., Lukasz Garncarek, Powalski, R., Stanislawek, T., Topolski, B., Halama, P., Gralinski, F., Graves, A., Fernández, S., Gomez, F., Schmidhuber, J., Harley, A.W., Ufkes, A., Derpanis, K.G., He, K., Gkioxari, G., Dollár, P., Girshick, R., He, K., Zhang, X., Ren, S., Sun, J., Kay, A., Lamiroy, B., Lopresti, D., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., Thomas, D., Zwaard, K., Li, M., Cui, L., Huang,'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from langchain.chains.question_answering import load_qa_chain\n", + "from langchain.llms import OpenAI\n", + "\n", + "chain = load_qa_chain(llm=OpenAI(), chain_type=\"map_reduce\")\n", + "query = [\"Who are the autors?\"]\n", + "\n", + "chain.run(input_documents=documents, question=query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a09d18b-ab7b-468e-ae66-f92abf666b9b", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/document_loaders/pdf-amazonTextractPDFLoader.ipynb b/docs/docs/integrations/document_loaders/pdf-amazonTextractPDFLoader.ipynb deleted file mode 100644 index 87c6cf2c3946f..0000000000000 --- a/docs/docs/integrations/document_loaders/pdf-amazonTextractPDFLoader.ipynb +++ /dev/null @@ -1,1020 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "1f3cebbe-079a-4bfe-b1a1-07bdac882ce2", - "metadata": {}, - "source": [ - "# Amazon Textract \n", - "\n", - "Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. You can quickly automate document processing and act on the information extracted, whether you’re automating loans processing or extracting information from invoices and receipts. Textract can extract the data in minutes instead of hours or days.\n", - "\n", - "This sample demonstrates the use of Amazon Textract in combination with LangChain as a DocumentLoader.\n", - "\n", - "Textract supports PDF, TIFF, PNG and JPEG format.\n", - "\n", - "Check https://docs.aws.amazon.com/textract/latest/dg/limits-document.html for supported document sizes, languages and characters." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "c049beaf-f904-4ce6-91ca-805da62084c2", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier Pillow>=9.4.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier pypdf>=2.5.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n", - "Obtaining file:///Users/schadem/code/github/schadem/langchain/libs/langchain\n", - " Installing build dependencies ... \u001b[?25ldone\n", - "\u001b[?25h Checking if build backend supports build_editable ... \u001b[?25ldone\n", - "\u001b[?25h Getting requirements to build editable ... \u001b[?25ldone\n", - "\u001b[?25h Preparing editable metadata (pyproject.toml) ... \u001b[?25ldone\n", - "\u001b[?25hRequirement already satisfied: PyYAML>=5.3 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (6.0.1)\n", - "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (2.0.22)\n", - "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (3.8.6)\n", - "Requirement already satisfied: amazon-textract-textractor<2 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (1.4.1)\n", - "Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain==0.0.267)\n", - " Obtaining dependency information for dataclasses-json<0.6.0,>=0.5.7 from https://files.pythonhosted.org/packages/97/5f/e7cc90f36152810cab08b6c9c1125e8bcb9d76f8b3018d101b5f877b386c/dataclasses_json-0.5.14-py3-none-any.whl.metadata\n", - " Downloading dataclasses_json-0.5.14-py3-none-any.whl.metadata (22 kB)\n", - "Requirement already satisfied: langsmith<0.1.0,>=0.0.21 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (0.0.44)\n", - "Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (2.8.7)\n", - "Requirement already satisfied: numpy<2,>=1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (1.24.4)\n", - "Requirement already satisfied: pydantic<3,>=1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (1.10.13)\n", - "Requirement already satisfied: requests<3,>=2 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (2.31.0)\n", - "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from langchain==0.0.267) (8.2.3)\n", - "Requirement already satisfied: attrs>=17.3.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (23.1.0)\n", - "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (3.3.0)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (6.0.4)\n", - "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (4.0.3)\n", - "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (1.9.2)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (1.4.0)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain==0.0.267) (1.3.1)\n", - "Requirement already satisfied: Pillow in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (10.1.0)\n", - "Requirement already satisfied: XlsxWriter<3.1,>=3.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (3.0.9)\n", - "Collecting amazon-textract-caller<0.1.0,>=0.0.27 (from amazon-textract-textractor<2->langchain==0.0.267)\n", - " Using cached amazon_textract_caller-0.0.29-py2.py3-none-any.whl (13 kB)\n", - "Requirement already satisfied: amazon-textract-pipeline-pagedimensions in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (0.0.8)\n", - "Requirement already satisfied: amazon-textract-response-parser<0.2.0,>=0.1.45 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (0.1.48)\n", - "Requirement already satisfied: editdistance==0.6.2 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (0.6.2)\n", - "Requirement already satisfied: tabulate<0.10,>=0.9 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-textractor<2->langchain==0.0.267) (0.9.0)\n", - "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.267) (3.20.1)\n", - "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.267) (0.9.0)\n", - "Requirement already satisfied: typing-extensions>=4.2.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from pydantic<3,>=1->langchain==0.0.267) (4.8.0)\n", - "Requirement already satisfied: idna<4,>=2.5 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from requests<3,>=2->langchain==0.0.267) (3.4)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from requests<3,>=2->langchain==0.0.267) (1.26.18)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from requests<3,>=2->langchain==0.0.267) (2023.7.22)\n", - "Requirement already satisfied: boto3>=1.26.35 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (1.28.67)\n", - "Requirement already satisfied: botocore in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (1.31.67)\n", - "Requirement already satisfied: packaging>=17.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.267) (23.2)\n", - "Requirement already satisfied: mypy-extensions>=0.3.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain==0.0.267) (1.0.0)\n", - "Requirement already satisfied: pypdf>=2.5.* in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-pipeline-pagedimensions->amazon-textract-textractor<2->langchain==0.0.267) (3.16.4)\n", - "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from boto3>=1.26.35->amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (1.0.1)\n", - "Requirement already satisfied: s3transfer<0.8.0,>=0.7.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from boto3>=1.26.35->amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (0.7.0)\n", - "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from botocore->amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (2.8.2)\n", - "Requirement already satisfied: six>=1.5 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from python-dateutil<3.0.0,>=2.1->botocore->amazon-textract-caller<0.1.0,>=0.0.27->amazon-textract-textractor<2->langchain==0.0.267) (1.16.0)\n", - "Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)\n", - "Building wheels for collected packages: langchain\n", - " Building editable for langchain (pyproject.toml) ... \u001b[?25ldone\n", - "\u001b[?25h Created wheel for langchain: filename=langchain-0.0.267-py3-none-any.whl size=5553 sha256=daaf68d6658b27d69a4a092aa0a39e31f32b96868ef195102d2a17cf119f9d86\n", - " Stored in directory: /private/var/folders/s4/y_t_mj094c95t80n023c9wym0000gr/T/pip-ephem-wheel-cache-v1ynlirx/wheels/9f/73/28/b1d250633de6bd5759f959e16889c6c841dd0e0ffb6474185a\n", - "Successfully built langchain\n", - "\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier Pillow>=9.4.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier pypdf>=2.5.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0mInstalling collected packages: dataclasses-json, amazon-textract-caller, langchain\n", - " Attempting uninstall: dataclasses-json\n", - " Found existing installation: dataclasses-json 0.6.1\n", - " Uninstalling dataclasses-json-0.6.1:\n", - " Successfully uninstalled dataclasses-json-0.6.1\n", - " Attempting uninstall: amazon-textract-caller\n", - " Found existing installation: amazon-textract-caller 0.2.0\n", - " Uninstalling amazon-textract-caller-0.2.0:\n", - " Successfully uninstalled amazon-textract-caller-0.2.0\n", - " Attempting uninstall: langchain\n", - " Found existing installation: langchain 0.0.319\n", - " Uninstalling langchain-0.0.319:\n", - " Successfully uninstalled langchain-0.0.319\n", - "Successfully installed amazon-textract-caller-0.0.29 dataclasses-json-0.5.14 langchain-0.0.267\n", - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n" - ] - } - ], - "source": [ - "# !pip install langchain boto3 openai tiktoken python-dotenv -q\n", - "!pip install boto3 openai tiktoken python-dotenv -q\n", - "!pip install -e /Users/schadem/code/github/schadem/langchain/libs/langchain" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "e4305a0d-37da-41f9-a52c-7d166d7dbabf", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting amazon-textract-caller>=0.2.0\n", - " Obtaining dependency information for amazon-textract-caller>=0.2.0 from https://files.pythonhosted.org/packages/35/42/17daacf400060ee1f768553980b7bd6bb77d5b80bcb8a82d8a9665e5bb9b/amazon_textract_caller-0.2.0-py2.py3-none-any.whl.metadata\n", - " Using cached amazon_textract_caller-0.2.0-py2.py3-none-any.whl.metadata (7.1 kB)\n", - "Requirement already satisfied: boto3>=1.26.35 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-caller>=0.2.0) (1.28.67)\n", - "Requirement already satisfied: botocore in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-caller>=0.2.0) (1.31.67)\n", - "Requirement already satisfied: amazon-textract-response-parser>=0.1.39 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-caller>=0.2.0) (0.1.48)\n", - "Requirement already satisfied: marshmallow<4,>=3.14 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from amazon-textract-response-parser>=0.1.39->amazon-textract-caller>=0.2.0) (3.20.1)\n", - "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from boto3>=1.26.35->amazon-textract-caller>=0.2.0) (1.0.1)\n", - "Requirement already satisfied: s3transfer<0.8.0,>=0.7.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from boto3>=1.26.35->amazon-textract-caller>=0.2.0) (0.7.0)\n", - "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from botocore->amazon-textract-caller>=0.2.0) (2.8.2)\n", - "Requirement already satisfied: urllib3<2.1,>=1.25.4 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from botocore->amazon-textract-caller>=0.2.0) (1.26.18)\n", - "Requirement already satisfied: packaging>=17.0 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from marshmallow<4,>=3.14->amazon-textract-response-parser>=0.1.39->amazon-textract-caller>=0.2.0) (23.2)\n", - "Requirement already satisfied: six>=1.5 in /Users/schadem/.pyenv/versions/3.11.1/envs/langchain/lib/python3.11/site-packages (from python-dateutil<3.0.0,>=2.1->botocore->amazon-textract-caller>=0.2.0) (1.16.0)\n", - "Using cached amazon_textract_caller-0.2.0-py2.py3-none-any.whl (13 kB)\n", - "\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier Pillow>=9.4.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33mDEPRECATION: amazon-textract-pipeline-pagedimensions 0.0.8 has a non-standard dependency specifier pypdf>=2.5.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of amazon-textract-pipeline-pagedimensions or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n", - "\u001b[0mInstalling collected packages: amazon-textract-caller\n", - " Attempting uninstall: amazon-textract-caller\n", - " Found existing installation: amazon-textract-caller 0.0.29\n", - " Uninstalling amazon-textract-caller-0.0.29:\n", - " Successfully uninstalled amazon-textract-caller-0.0.29\n", - "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", - "amazon-textract-textractor 1.4.1 requires amazon-textract-caller<0.1.0,>=0.0.27, but you have amazon-textract-caller 0.2.0 which is incompatible.\u001b[0m\u001b[31m\n", - "\u001b[0mSuccessfully installed amazon-textract-caller-0.2.0\n", - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n" - ] - } - ], - "source": [ - "!pip install \"amazon-textract-caller>=0.2.0\"" - ] - }, - { - "cell_type": "markdown", - "id": "400b25c6-befa-4730-a201-39ff112c8858", - "metadata": {}, - "source": [ - "## Sample 1\n", - "\n", - "The first example uses a local file, which internally will be send to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n", - "\n", - "Local files or URL endpoints like HTTP:// are limited to one page documents for Textract.\n", - "Multi-page documents have to reside on S3. This sample file is a jpeg." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "1becee92-e82f-42d4-9b4e-b23d77cbe88d", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "ename": "ImportError", - "evalue": "cannot import name 'DocumentIntelligenceParser' from 'langchain.document_loaders.parsers.pdf' (/Users/schadem/code/github/schadem/langchain/libs/langchain/langchain/document_loaders/parsers/pdf.py)", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[6], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m AmazonTextractPDFLoader\n\u001b[1;32m 2\u001b[0m loader \u001b[38;5;241m=\u001b[39m AmazonTextractPDFLoader(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mexample_data/alejandro_rosalez_sample-small.jpeg\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 3\u001b[0m documents \u001b[38;5;241m=\u001b[39m loader\u001b[38;5;241m.\u001b[39mload()\n", - "File \u001b[0;32m~/code/github/schadem/langchain/libs/langchain/langchain/document_loaders/__init__.py:46\u001b[0m\n\u001b[1;32m 44\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbigquery\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m BigQueryLoader\n\u001b[1;32m 45\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbilibili\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m BiliBiliLoader\n\u001b[0;32m---> 46\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblackboard\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m BlackboardLoader\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblob_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 48\u001b[0m Blob,\n\u001b[1;32m 49\u001b[0m BlobLoader,\n\u001b[1;32m 50\u001b[0m FileSystemBlobLoader,\n\u001b[1;32m 51\u001b[0m YoutubeAudioLoader,\n\u001b[1;32m 52\u001b[0m )\n\u001b[1;32m 53\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblockchain\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m BlockchainDocumentLoader\n", - "File \u001b[0;32m~/code/github/schadem/langchain/libs/langchain/langchain/document_loaders/blackboard.py:9\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocstore\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Document\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdirectory\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m DirectoryLoader\n\u001b[0;32m----> 9\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpdf\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m PyPDFLoader\n\u001b[1;32m 10\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mweb_base\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m WebBaseLoader\n\u001b[1;32m 13\u001b[0m \u001b[38;5;28;01mclass\u001b[39;00m \u001b[38;5;21;01mBlackboardLoader\u001b[39;00m(WebBaseLoader):\n", - "File \u001b[0;32m~/code/github/schadem/langchain/libs/langchain/langchain/document_loaders/pdf.py:17\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mbase\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m BaseLoader\n\u001b[1;32m 16\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mblob_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Blob\n\u001b[0;32m---> 17\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mparsers\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpdf\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m (\n\u001b[1;32m 18\u001b[0m AmazonTextractPDFParser,\n\u001b[1;32m 19\u001b[0m DocumentIntelligenceParser,\n\u001b[1;32m 20\u001b[0m PDFMinerParser,\n\u001b[1;32m 21\u001b[0m PDFPlumberParser,\n\u001b[1;32m 22\u001b[0m PyMuPDFParser,\n\u001b[1;32m 23\u001b[0m PyPDFium2Parser,\n\u001b[1;32m 24\u001b[0m PyPDFParser,\n\u001b[1;32m 25\u001b[0m )\n\u001b[1;32m 26\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01munstructured\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m UnstructuredFileLoader\n\u001b[1;32m 27\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mutils\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m get_from_dict_or_env\n", - "\u001b[0;31mImportError\u001b[0m: cannot import name 'DocumentIntelligenceParser' from 'langchain.document_loaders.parsers.pdf' (/Users/schadem/code/github/schadem/langchain/libs/langchain/langchain/document_loaders/parsers/pdf.py)" - ] - } - ], - "source": [ - "from langchain.document_loaders import AmazonTextractPDFLoader\n", - "\n", - "loader = AmazonTextractPDFLoader(\"example_data/alejandro_rosalez_sample-small.jpeg\")\n", - "documents = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "d566dc56-c9a9-44ec-84fb-a81928f90d40", - "metadata": {}, - "source": [ - "Output from the file" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "1272ce8c-d298-4059-ac0a-780bf5f82302", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "documents" - ] - }, - { - "cell_type": "markdown", - "id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4", - "metadata": {}, - "source": [ - "## Sample 2\n", - "The next sample loads a file from an HTTPS endpoint. \n", - "It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "10374bfb-b325-451f-8bd0-c686710ab68c", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from langchain.document_loaders import AmazonTextractPDFLoader\n", - "\n", - "loader = AmazonTextractPDFLoader(\n", - " \"https://amazon-textract-public-content.s3.us-east-2.amazonaws.com/langchain/alejandro_rosalez_sample_1.jpg\"\n", - ")\n", - "documents = loader.load()" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "16a2b6a3-7514-4c2c-a427-6847169af473", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "documents" - ] - }, - { - "cell_type": "markdown", - "id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141", - "metadata": {}, - "source": [ - "## Sample 3\n", - "\n", - "Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "8185e3e6-9599-4a47-8969-d6dcef3e6404", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import boto3\n", - "\n", - "textract_client = boto3.client(\"textract\", region_name=\"us-east-2\")\n", - "\n", - "file_path = \"s3://amazon-textract-public-content/langchain/layout-parser-paper.pdf\"\n", - "loader = AmazonTextractPDFLoader(file_path, client=textract_client)\n", - "documents = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "b8901eec-070d-4fd6-9d65-52211d332441", - "metadata": {}, - "source": [ - "Now getting the number of pages to validate the response (printing out the full response would be quite long...). We expect 16 pages." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "b23c01c8-cf69-4fe2-8141-4621edb7d79c", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "16" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(documents)" - ] - }, - { - "cell_type": "markdown", - "id": "b3e41b4d-b159-4274-89be-80d8159134ef", - "metadata": {}, - "source": [ - "## Using the AmazonTextractPDFLoader in an LangChain chain (e. g. OpenAI)\n", - "\n", - "The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n", - "Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "53c47b24-cc06-4256-9e5b-a82fc80bc55d", - "metadata": {}, - "outputs": [], - "source": [ - "# You can store your OPENAI_API_KEY in a .env file as well\n", - "# import os\n", - "# from dotenv import load_dotenv\n", - "\n", - "# load_dotenv()" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "a9ae004c-246c-4c7f-8458-191cd7424a9b", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "# Or set the OpenAI key in the environment directly\n", - "import os\n", - "\n", - "os.environ[\"OPENAI_API_KEY\"] = \"your-OpenAI-API-key\"" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "d52b089c-10ca-45fb-8669-8a1c5fee10d5", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "' The authors are Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li, Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., Zettlemoyer, L., Lukasz Garncarek, Powalski, R., Stanislawek, T., Topolski, B., Halama, P., Gralinski, F., Graves, A., Fernández, S., Gomez, F., Schmidhuber, J., Harley, A.W., Ufkes, A., Derpanis, K.G., He, K., Gkioxari, G., Dollár, P., Girshick, R., He, K., Zhang, X., Ren, S., Sun, J., Kay, A., Lamiroy, B., Lopresti, D., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., Thomas, D., Zwaard, K., Li, M., Cui, L., Huang,'" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from langchain.chains.question_answering import load_qa_chain\n", - "from langchain.llms import OpenAI\n", - "\n", - "chain = load_qa_chain(llm=OpenAI(), chain_type=\"map_reduce\")\n", - "query = [\"Who are the autors?\"]\n", - "\n", - "chain.run(input_documents=documents, question=query)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1a09d18b-ab7b-468e-ae66-f92abf666b9b", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "availableInstances": [ - { - "_defaultOrder": 0, - "_isFastLaunch": true, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 4, - "name": "ml.t3.medium", - "vcpuNum": 2 - }, - { - "_defaultOrder": 1, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 8, - "name": "ml.t3.large", - "vcpuNum": 2 - }, - { - "_defaultOrder": 2, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.t3.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 3, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.t3.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 4, - "_isFastLaunch": true, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 8, - "name": "ml.m5.large", - "vcpuNum": 2 - }, - { - "_defaultOrder": 5, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.m5.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 6, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.m5.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 7, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 64, - "name": "ml.m5.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 8, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 128, - "name": "ml.m5.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 9, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 192, - "name": "ml.m5.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 10, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 256, - "name": "ml.m5.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 11, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 384, - "name": "ml.m5.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 12, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 8, - "name": "ml.m5d.large", - "vcpuNum": 2 - }, - { - "_defaultOrder": 13, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.m5d.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 14, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.m5d.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 15, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 64, - "name": "ml.m5d.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 16, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 128, - "name": "ml.m5d.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 17, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 192, - "name": "ml.m5d.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 18, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 256, - "name": "ml.m5d.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 19, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 384, - "name": "ml.m5d.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 20, - "_isFastLaunch": false, - "category": "General purpose", - "gpuNum": 0, - "hideHardwareSpecs": true, - "memoryGiB": 0, - "name": "ml.geospatial.interactive", - "supportedImageNames": [ - "sagemaker-geospatial-v1-0" - ], - "vcpuNum": 0 - }, - { - "_defaultOrder": 21, - "_isFastLaunch": true, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 4, - "name": "ml.c5.large", - "vcpuNum": 2 - }, - { - "_defaultOrder": 22, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 8, - "name": "ml.c5.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 23, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.c5.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 24, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.c5.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 25, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 72, - "name": "ml.c5.9xlarge", - "vcpuNum": 36 - }, - { - "_defaultOrder": 26, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 96, - "name": "ml.c5.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 27, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 144, - "name": "ml.c5.18xlarge", - "vcpuNum": 72 - }, - { - "_defaultOrder": 28, - "_isFastLaunch": false, - "category": "Compute optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 192, - "name": "ml.c5.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 29, - "_isFastLaunch": true, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.g4dn.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 30, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.g4dn.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 31, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 64, - "name": "ml.g4dn.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 32, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 128, - "name": "ml.g4dn.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 33, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 4, - "hideHardwareSpecs": false, - "memoryGiB": 192, - "name": "ml.g4dn.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 34, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 256, - "name": "ml.g4dn.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 35, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 61, - "name": "ml.p3.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 36, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 4, - "hideHardwareSpecs": false, - "memoryGiB": 244, - "name": "ml.p3.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 37, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 8, - "hideHardwareSpecs": false, - "memoryGiB": 488, - "name": "ml.p3.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 38, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 8, - "hideHardwareSpecs": false, - "memoryGiB": 768, - "name": "ml.p3dn.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 39, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.r5.large", - "vcpuNum": 2 - }, - { - "_defaultOrder": 40, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.r5.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 41, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 64, - "name": "ml.r5.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 42, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 128, - "name": "ml.r5.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 43, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 256, - "name": "ml.r5.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 44, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 384, - "name": "ml.r5.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 45, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 512, - "name": "ml.r5.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 46, - "_isFastLaunch": false, - "category": "Memory Optimized", - "gpuNum": 0, - "hideHardwareSpecs": false, - "memoryGiB": 768, - "name": "ml.r5.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 47, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 16, - "name": "ml.g5.xlarge", - "vcpuNum": 4 - }, - { - "_defaultOrder": 48, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 32, - "name": "ml.g5.2xlarge", - "vcpuNum": 8 - }, - { - "_defaultOrder": 49, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 64, - "name": "ml.g5.4xlarge", - "vcpuNum": 16 - }, - { - "_defaultOrder": 50, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 128, - "name": "ml.g5.8xlarge", - "vcpuNum": 32 - }, - { - "_defaultOrder": 51, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 1, - "hideHardwareSpecs": false, - "memoryGiB": 256, - "name": "ml.g5.16xlarge", - "vcpuNum": 64 - }, - { - "_defaultOrder": 52, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 4, - "hideHardwareSpecs": false, - "memoryGiB": 192, - "name": "ml.g5.12xlarge", - "vcpuNum": 48 - }, - { - "_defaultOrder": 53, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 4, - "hideHardwareSpecs": false, - "memoryGiB": 384, - "name": "ml.g5.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 54, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 8, - "hideHardwareSpecs": false, - "memoryGiB": 768, - "name": "ml.g5.48xlarge", - "vcpuNum": 192 - }, - { - "_defaultOrder": 55, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 8, - "hideHardwareSpecs": false, - "memoryGiB": 1152, - "name": "ml.p4d.24xlarge", - "vcpuNum": 96 - }, - { - "_defaultOrder": 56, - "_isFastLaunch": false, - "category": "Accelerated computing", - "gpuNum": 8, - "hideHardwareSpecs": false, - "memoryGiB": 1152, - "name": "ml.p4de.24xlarge", - "vcpuNum": 96 - } - ], - "instance_type": "ml.t3.medium", - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/docs/integrations/llms/amazon_api_gateway.ipynb b/docs/docs/integrations/llms/amazon_api_gateway.ipynb index e9d801ed9d1e5..a4046548e550d 100644 --- a/docs/docs/integrations/llms/amazon_api_gateway.ipynb +++ b/docs/docs/integrations/llms/amazon_api_gateway.ipynb @@ -11,9 +11,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the \"front door\" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.\n", + ">[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any >scale. APIs act as the \"front door\" for applications to access data, business logic, or functionality from your backend services. Using `API Gateway`, you can create RESTful APIs and >WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.\n", "\n", - "API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management. API Gateway has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data transferred out and, with the API Gateway tiered pricing model, you can reduce your cost as your API usage scales." + ">`API Gateway` handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization >and access control, throttling, monitoring, and API version management. `API Gateway` has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data >transferred out and, with the `API Gateway` tiered pricing model, you can reduce your cost as your API usage scales." ] }, { diff --git a/docs/docs/integrations/llms/bedrock.ipynb b/docs/docs/integrations/llms/bedrock.ipynb index 5a721c187e342..6439815a9ac06 100644 --- a/docs/docs/integrations/llms/bedrock.ipynb +++ b/docs/docs/integrations/llms/bedrock.ipynb @@ -11,7 +11,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case" + ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n", + "> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n", + "> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n", + "> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n", + "> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n", + "> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n", + "> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n", + "> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n", + "> generative AI capabilities into your applications using the AWS services you are already familiar with.\n" ] }, { @@ -116,7 +124,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/platforms/aws.mdx b/docs/docs/integrations/platforms/aws.mdx index 5c6d292d0a503..cb9d9c1d048d9 100644 --- a/docs/docs/integrations/platforms/aws.mdx +++ b/docs/docs/integrations/platforms/aws.mdx @@ -1,11 +1,22 @@ # AWS -All functionality related to [Amazon AWS](https://aws.amazon.com/) platform +The `LangChain` integrations related to [Amazon AWS](https://aws.amazon.com/) platform. ## LLMs ### Bedrock +>[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of +> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, +> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to +> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, +> you can easily experiment with and evaluate top FMs for your use case, privately customize them with +> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build +> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is +> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy +> generative AI capabilities into your applications using the AWS services you are already familiar with. + + See a [usage example](/docs/integrations/llms/bedrock). ```python @@ -14,32 +25,28 @@ from langchain.llms.bedrock import Bedrock ### Amazon API Gateway -[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications. - -API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management. API Gateway has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data transferred out and, with the API Gateway tiered pricing model, you can reduce your cost as your API usage scales. +>[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for +> developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" +> for applications to access data, business logic, or functionality from your backend services. Using +> `API Gateway`, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication +> applications. `API Gateway` supports containerized and serverless workloads, as well as web applications. +> +> `API Gateway` handles all the tasks involved in accepting and processing up to hundreds of thousands of +> concurrent API calls, including traffic management, CORS support, authorization and access control, +> throttling, monitoring, and API version management. `API Gateway` has no minimum fees or startup costs. +> You pay for the API calls you receive and the amount of data transferred out and, with the `API Gateway` +> tiered pricing model, you can reduce your cost as your API usage scales. -See a [usage example](/docs/integrations/llms/amazon_api_gateway_example). +See a [usage example](/docs/integrations/llms/amazon_api_gateway). ```python from langchain.llms import AmazonAPIGateway - -api_url = "https://.execute-api..amazonaws.com/LATEST/HF" -# These are sample parameters for Falcon 40B Instruct Deployed from Amazon SageMaker JumpStart -model_kwargs = { - "max_new_tokens": 100, - "num_return_sequences": 1, - "top_k": 50, - "top_p": 0.95, - "do_sample": False, - "return_full_text": True, - "temperature": 0.2, -} -llm = AmazonAPIGateway(api_url=api_url, model_kwargs=model_kwargs) ``` ### SageMaker Endpoint ->[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a system that can build, train, and deploy machine learning (ML) models with fully managed infrastructure, tools, and workflows. +>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a system that can build, train, and deploy +> machine learning (ML) models with fully managed infrastructure, tools, and workflows. We use `SageMaker` to host our model and expose it as the `SageMaker Endpoint`. @@ -50,6 +57,16 @@ from langchain.llms import SagemakerEndpoint from langchain.llms.sagemaker_endpoint import LLMContentHandler ``` +## Chat models + +### Bedrock Chat + +See a [usage example](/docs/integrations/chat/bedrock). + +```python +from langchain.chat_models import BedrockChat +``` + ## Text Embedding Models ### Bedrock @@ -67,11 +84,32 @@ from langchain.embeddings import SagemakerEndpointEmbeddings from langchain.llms.sagemaker_endpoint import ContentHandlerBase ``` +## Chains + +### Amazon Comprehend Moderation Chain + +>[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural-language processing (NLP) service that +> uses machine learning to uncover valuable insights and connections in text. + + +We need to install the `boto3` and `nltk` libraries. + +```bash +pip install boto3 nltk +``` + +See a [usage example](/docs/guides/safety/amazon_comprehend_chain). + +```python +from langchain_experimental.comprehend_moderation import AmazonComprehendModerationChain +``` ## Document loaders ### AWS S3 Directory and File ->[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) is an object storage service. + +>[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) +> is an object storage service. >[AWS S3 Directory](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) >[AWS S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html) @@ -83,6 +121,17 @@ See a [usage example for S3FileLoader](/docs/integrations/document_loaders/aws_s from langchain.document_loaders import S3DirectoryLoader, S3FileLoader ``` +### Amazon Textract + +>[Amazon Textract](https://docs.aws.amazon.com/managedservices/latest/userguide/textract.html) is a machine +> learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. + +See a [usage example](/docs/integrations/document_loaders/amazon_textract). + +```python +from langchain.document_loaders import AmazonTextractPDFLoader +``` + ## Memory ### AWS DynamoDB @@ -103,3 +152,112 @@ See a [usage example](/docs/integrations/memory/aws_dynamodb). ```python from langchain.memory import DynamoDBChatMessageHistory ``` + +## Retrievers + +### Amazon Kendra + +> [Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html) is an intelligent search service +> provided by `Amazon Web Services` (`AWS`). It utilizes advanced natural language processing (NLP) and machine +> learning algorithms to enable powerful search capabilities across various data sources within an organization. +> `Kendra` is designed to help users find the information they need quickly and accurately, +> improving productivity and decision-making. + +> With `Kendra`, we can search across a wide range of content types, including documents, FAQs, knowledge bases, +> manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and +> contextual meanings to provide highly relevant search results. + +We need to install the `boto3` library. + +```bash +pip install boto3 +``` + +See a [usage example](/docs/integrations/retrievers/amazon_kendra_retriever). + +```python +from langchain.retrievers import AmazonKendraRetriever +``` + +### Amazon Bedrock (Knowledge Bases) + +> [Knowledge bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an +> `Amazon Web Services` (`AWS`) offering which lets you quickly build RAG applications by using your +> private data to customize foundation model response. + +We need to install the `boto3` library. + +```bash +pip install boto3 +``` + +See a [usage example](/docs/integrations/retrievers/amazon_bedrock_knowledge_bases). + +```python +from langchain.retrievers import AmazonKnowledgeBasesRetriever +``` + +## Vector stores + +### Amazon OpenSearch Service + +> [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) performs +> interactive log analytics, real-time application monitoring, website search, and more. `OpenSearch` is +> an open source, +> distributed search and analytics suite derived from `Elasticsearch`. `Amazon OpenSearch Service` offers the +> latest versions of `OpenSearch`, support for many versions of `Elasticsearch`, as well as +> visualization capabilities powered by `OpenSearch Dashboards` and `Kibana`. + +We need to install several python libraries. + +```bash +pip install boto3 requests requests-aws4auth +``` + +See a [usage example](/docs/integrations/vectorstores/opensearch#using-aos-amazon-opensearch-service). + +```python +from langchain.vectorstores import OpenSearchVectorSearch +``` + +## Tools + +### AWS Lambda + +>[`Amazon AWS Lambda`](https://aws.amazon.com/pm/lambda/) is a serverless computing service provided by +> `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without +> provisioning or managing servers. This serverless architecture enables you to focus on writing and +> deploying code, while AWS automatically takes care of scaling, patching, and managing the +> infrastructure required to run your applications. + +We need to install `boto3` python library. + +```bash +pip install boto3 +``` + +See a [usage example](/docs/integrations/tools/awslambda). + + +## Callbacks + +### SageMaker Tracking + +>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed service that is used to quickly +> and easily build, train and deploy machine learning (ML) models. + +>[Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html) is a capability +> of `Amazon SageMaker` that lets you organize, track, +> compare and evaluate ML experiments and model versions. + +We need to install several python libraries. + +```bash +pip install google-search-results sagemaker +``` + +See a [usage example](/docs/integrations/callbacks/sagemaker_tracking). + +```python +from langchain.callbacks import SageMakerCallbackHandler +``` diff --git a/docs/docs/integrations/retrievers/amazon_kendra_retriever.ipynb b/docs/docs/integrations/retrievers/amazon_kendra_retriever.ipynb index 41cc1d3d55b71..e954b900f4ab0 100644 --- a/docs/docs/integrations/retrievers/amazon_kendra_retriever.ipynb +++ b/docs/docs/integrations/retrievers/amazon_kendra_retriever.ipynb @@ -7,9 +7,9 @@ "source": [ "# Amazon Kendra\n", "\n", - "> Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n", + "> [Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html) is an intelligent search service provided by `Amazon Web Services` (`AWS`). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. `Kendra` is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n", "\n", - "> With Kendra, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results." + "> With `Kendra`, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results." ] }, { @@ -74,11 +74,24 @@ } ], "metadata": { - "language_info": { - "name": "python" + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "orig_nbformat": 4 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/docs/integrations/retrievers/bedrock.ipynb b/docs/docs/integrations/retrievers/bedrock.ipynb index 47a280bbb72d5..36c5e2f42aac0 100644 --- a/docs/docs/integrations/retrievers/bedrock.ipynb +++ b/docs/docs/integrations/retrievers/bedrock.ipynb @@ -5,13 +5,13 @@ "id": "b6636c27-35da-4ba7-8313-eca21660cab3", "metadata": {}, "source": [ - "# Amazon Bedrock (Knowledge Bases)\n", + "# Bedrock (Knowledge Bases)\n", "\n", "> [Knowledge bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an Amazon Web Services (AWS) offering which lets you quickly build RAG applications by using your private data to customize FM response.\n", "\n", - "> Implementing RAG requires organizations to perform several cumbersome steps to convert data into embeddings (vectors), store the embeddings in a specialized vector database, and build custom integrations into the database to search and retrieve text relevant to the user’s query. This can be time-consuming and inefficient.\n", + "> Implementing `RAG` requires organizations to perform several cumbersome steps to convert data into embeddings (vectors), store the embeddings in a specialized vector database, and build custom integrations into the database to search and retrieve text relevant to the user’s query. This can be time-consuming and inefficient.\n", "\n", - "> With Knowledge Bases for Amazon Bedrock, simply point to the location of your data in Amazon S3, and Knowledge Bases for Amazon Bedrock takes care of the entire ingestion workflow into your vector database. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. For retrievals, use the Langchain - Amazon Bedrock integration via the Retrieve API to retrieve relevant results for a user query from knowledge bases.\n", + "> With `Knowledge Bases for Amazon Bedrock`, simply point to the location of your data in `Amazon S3`, and `Knowledge Bases for Amazon Bedrock` takes care of the entire ingestion workflow into your vector database. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. For retrievals, use the Langchain - Amazon Bedrock integration via the Retrieve API to retrieve relevant results for a user query from knowledge bases.\n", "\n", "> Knowledge base can be configured through [AWS Console](https://aws.amazon.com/console/) or by using [AWS SDKs](https://aws.amazon.com/developer/tools/)." ] @@ -108,7 +108,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.13" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/text_embedding/bedrock.ipynb b/docs/docs/integrations/text_embedding/bedrock.ipynb index 84edf31d144a3..8a0b15086924f 100644 --- a/docs/docs/integrations/text_embedding/bedrock.ipynb +++ b/docs/docs/integrations/text_embedding/bedrock.ipynb @@ -7,7 +7,16 @@ "source": [ "# Bedrock\n", "\n", - ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n" + ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n", + "> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n", + "> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n", + "> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n", + "> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n", + "> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n", + "> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n", + "> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n", + "> generative AI capabilities into your applications using the AWS services you are already familiar with.\n", + "\n" ] }, { diff --git a/docs/docs/integrations/tools/awslambda.ipynb b/docs/docs/integrations/tools/awslambda.ipynb index f22a5f3c73e6b..79bea4372068f 100644 --- a/docs/docs/integrations/tools/awslambda.ipynb +++ b/docs/docs/integrations/tools/awslambda.ipynb @@ -11,11 +11,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - ">`Amazon AWS Lambda` is a serverless computing service provided by `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without provisioning or managing servers. This serverless architecture enables you to focus on writing and deploying code, while AWS automatically takes care of scaling, patching, and managing the infrastructure required to run your applications.\n", + ">[`Amazon AWS Lambda`](https://aws.amazon.com/pm/lambda/) is a serverless computing service provided by `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without provisioning or managing servers. This serverless architecture enables you to focus on writing and deploying code, while AWS automatically takes care of scaling, patching, and managing the infrastructure required to run your applications.\n", "\n", "This notebook goes over how to use the `AWS Lambda` Tool.\n", "\n", - "By including a `awslambda` in the list of tools provided to an Agent, you can grant your Agent the ability to invoke code running in your AWS Cloud for whatever purposes you need.\n", + "By including the `AWS Lambda` in the list of tools provided to an Agent, you can grant your Agent the ability to invoke code running in your AWS Cloud for whatever purposes you need.\n", "\n", "When an Agent uses the `AWS Lambda` tool, it will provide an argument of type string which will in turn be passed into the Lambda function via the event parameter.\n", "\n", diff --git a/docs/docs/integrations/vectorstores/opensearch.ipynb b/docs/docs/integrations/vectorstores/opensearch.ipynb index fdaeef28d9929..945bc75e7ae7f 100644 --- a/docs/docs/integrations/vectorstores/opensearch.ipynb +++ b/docs/docs/integrations/vectorstores/opensearch.ipynb @@ -63,7 +63,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "aac9563e", "metadata": {}, "outputs": [], @@ -321,12 +321,30 @@ "id": "5f590d35", "metadata": { "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, "pycharm": { "name": "#%% md\n" } }, "source": [ - "## Using AOSS (Amazon OpenSearch Service Serverless)" + "## Using AOSS (Amazon OpenSearch Service Serverless)\n", + "\n", + "It is an example of the `AOSS` with `faiss` engine and `efficient_filter`.\n", + "\n", + "\n", + "We need to install several `python` packages." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "279bfc5c-b7f4-4553-ad15-2df7baebec47", + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install requests requests-aws4auth" ] }, { @@ -335,13 +353,16 @@ "id": "de397be7", "metadata": { "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ - "# This is just an example to show how to use AOSS with faiss engine and efficient_filter, you need to set proper values.\n", + "from requests_aws4auth import AWS4Auth\n", "\n", "service = \"aoss\" # must set the service as 'aoss'\n", "region = \"us-east-2\"\n", @@ -374,7 +395,10 @@ "cell_type": "markdown", "id": "0aa012c8", "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "source": [ "## Using AOS (Amazon OpenSearch Service)" @@ -386,6 +410,9 @@ "id": "2c47e408", "metadata": { "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, "pycharm": { "name": "#%%\n" } @@ -436,9 +463,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.6" + "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +} diff --git a/docs/vercel.json b/docs/vercel.json index 219d32f82e306..a7dd36d288f45 100644 --- a/docs/vercel.json +++ b/docs/vercel.json @@ -1576,6 +1576,10 @@ "source": "/en/latest/modules/chains/generic/serialization.html", "destination": "/docs/modules/chains/how_to/serialization" }, + { + "source": "/docs/integrations/document_loaders/pdf-amazonTextractPDFLoader", + "destination": "/docs/integrations/document_loaders/amazon_textract" + }, { "source": "/en/latest/modules/indexes/document_loaders/examples/acreom.html", "destination": "/docs/integrations/document_loaders/acreom"