Add Oracle Database Document Loader and Parser #7251

Open: wants to merge 5 commits into main
@@ -0,0 +1,73 @@
---
hide_table_of_contents: true
---

# Oracle AI

This example goes over how to load documents using Oracle AI Vector Search.

## Setup

You'll need to install the [oracledb](https://www.npmjs.com/package/oracledb) package along with `@langchain/community` and `@langchain/core`:

```bash npm2yarn
npm install @langchain/community @langchain/core oracledb
```

## Usage

### Connect to Oracle Database

You'll need to provide the username, password, hostname, and service name:

```typescript
import oracledb from "oracledb";

let connection: oracledb.Connection;

// Replace the placeholders with your information
const username = "<username>";
const password = "<password>";
const dsn = "<hostname>/<service_name>";

try {
  connection = await oracledb.getConnection({
    user: username,
    password,
    connectString: dsn,
  });
  console.log("Connection successful");
} catch (err) {
  console.error("Connection failed:", err);
  throw err;
}
```
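The `dsn` above uses Oracle's Easy Connect syntax, `host[:port]/service_name`; when the port is left out, the listener default of 1521 applies. Rather than hard-coding credentials, you could assemble the connection options from environment variables. The following is a minimal sketch under that assumption: `buildConnectString`, `connectionConfigFromEnv`, and the `ORACLE_*` variable names are hypothetical and are not part of `oracledb` or LangChain.

```typescript
// Hypothetical helper, not part of oracledb or LangChain: builds an Oracle
// Easy Connect string of the form "host[:port]/service_name".
interface EasyConnectParts {
  host: string;
  serviceName: string;
  port?: number; // left out of the string when undefined (the listener then defaults to 1521)
}

function buildConnectString({ host, serviceName, port }: EasyConnectParts): string {
  if (!host || !serviceName) {
    throw new Error("host and serviceName are required");
  }
  if (port !== undefined && (!Number.isInteger(port) || port < 1 || port > 65535)) {
    throw new Error(`invalid port: ${port}`);
  }
  const hostPart = port === undefined ? host : `${host}:${port}`;
  return `${hostPart}/${serviceName}`;
}

// Assembles the options object passed to oracledb.getConnection().
// The ORACLE_* variable names are hypothetical; rename them to suit your setup.
function connectionConfigFromEnv(env: Record<string, string | undefined>): {
  user: string;
  password: string;
  connectString: string;
} {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) {
      throw new Error(`missing environment variable ${name}`);
    }
    return value;
  };
  const rawPort = env["ORACLE_PORT"];
  return {
    user: required("ORACLE_USER"),
    password: required("ORACLE_PASSWORD"),
    connectString: buildConnectString({
      host: required("ORACLE_HOST"),
      serviceName: required("ORACLE_SERVICE_NAME"),
      port: rawPort ? Number(rawPort) : undefined,
    }),
  };
}
```

Under these assumptions, the connection step would become `connection = await oracledb.getConnection(connectionConfigFromEnv(process.env));`. Validating the inputs up front makes a missing variable or a mistyped port fail with a specific message before any network call is attempted.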

### Load Documents

There are three ways to load documents:

- Loading a local file.
- Loading from a local directory.
- Loading from the Oracle Database.

When loading from the Oracle Database, you must provide the table name, the owner's name, and the name of the column to load. Optionally, you can provide extra column names to include in the returned documents' metadata:

```typescript
import {
  OracleDocLoader,
  OracleLoadFromType,
} from "@langchain/community/document_loaders/web/oracleai";

/*
// Loading a local file (replace <filepath> with the path of the file you want to load)
const loader = new OracleDocLoader(connection, "<filepath>", OracleLoadFromType.FILE);

// Loading from a local directory (replace <dirpath> with the path of the directory you want to load from)
const loader = new OracleDocLoader(connection, "<dirpath>", OracleLoadFromType.DIR);
*/

// Loading from an Oracle Database table (replace the placeholders with your information;
// optionally add a [metadata_cols] parameter to include extra columns as metadata)
const loader = new OracleDocLoader(
  connection,
  "<tablename>",
  OracleLoadFromType.TABLE,
  "<owner_name>",
  "<colname>"
);

// load() returns a Promise, so await it before reading the documents
const docs = await loader.load();
console.log("Number of docs loaded:", docs.length);
console.log("Document-0:", docs[0].pageContent); // text content of the first document
```
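`load()` resolves to an array of LangChain documents, each carrying a `pageContent` string and a `metadata` object. Before passing the results on (for example, to a text splitter or vector store), it can help to sanity-check them. The sketch below counts the documents, flags blank ones, and collects the metadata keys, which is one way to confirm that any columns passed as `metadata_cols` actually came through (assuming they surface as metadata keys named after the columns). `DocLike` is a local stand-in for the `Document` type so the snippet runs on its own, and `summarizeDocs` is a hypothetical helper, not part of the loader.

```typescript
// Local stand-in for the Document shape in @langchain/core/documents
// (a pageContent string plus a metadata object), so this sketch runs on its own.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface LoadSummary {
  count: number;
  emptyCount: number; // documents whose pageContent is blank after trimming
  totalChars: number; // sum of pageContent lengths
  metadataKeys: string[]; // sorted union of metadata keys across all documents
}

// Hypothetical helper, not part of the loader: summarizes an array of loaded documents.
function summarizeDocs(docs: DocLike[]): LoadSummary {
  const keys = new Set<string>();
  let emptyCount = 0;
  let totalChars = 0;
  for (const doc of docs) {
    if (doc.pageContent.trim().length === 0) {
      emptyCount += 1;
    }
    totalChars += doc.pageContent.length;
    for (const key of Object.keys(doc.metadata)) {
      keys.add(key);
    }
  }
  return {
    count: docs.length,
    emptyCount,
    totalChars,
    metadataKeys: Array.from(keys).sort(),
  };
}
```

With the loader above, this would be called as `const summary = summarizeDocs(docs);`. Once you're done with the database, release the session with `await connection.close();`.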

40 changes: 40 additions & 0 deletions examples/src/document_loaders/oracleai.ts
@@ -0,0 +1,40 @@
import oracledb from "oracledb";
import { OracleDocLoader, OracleLoadFromType } from "@langchain/community/document_loaders/web/oracleai";

let connection: oracledb.Connection;

// Replace the placeholders with your information
const username = "<username>";
const pwd = "<password>";
const dsn = "<hostname>/<service_name>";

try {
  connection = await oracledb.getConnection({
    user: username,

Review comment (Collaborator): Please run yarn format

    password: pwd,
    connectString: dsn,
  });
  console.log("Connection successful");
} catch (err) {
  console.error("Connection failed:", err);
  throw err;
}

// Loading a local file (replace the path below with the file you want to load)
const loader = new OracleDocLoader(
  connection,
  "src/document_loaders/example_data/bitcoin.pdf",
  OracleLoadFromType.FILE
);

/*
// Loading from a local directory (replace <dirpath> with the path of the directory you want to load from)
const loader = new OracleDocLoader(connection, "<dirpath>", OracleLoadFromType.DIR);
// Loading from an Oracle Database table (replace the placeholders with your information; optionally add a [metadata_cols] parameter to include columns as metadata)
const loader = new OracleDocLoader(connection, "<tablename>", OracleLoadFromType.TABLE, "<owner_name>", "<colname>");
*/

// load() returns a Promise, so await it before reading the documents
const docs = await loader.load();
console.log("Number of docs loaded:", docs.length);
console.log("Document-0:", docs[0].pageContent); // text content of the first document


4 changes: 4 additions & 0 deletions libs/langchain-community/.gitignore
@@ -922,6 +922,10 @@ document_loaders/web/notionapi.cjs
document_loaders/web/notionapi.js
document_loaders/web/notionapi.d.ts
document_loaders/web/notionapi.d.cts
document_loaders/web/oracleai.cjs
document_loaders/web/oracleai.js
document_loaders/web/oracleai.d.ts
document_loaders/web/oracleai.d.cts
document_loaders/web/pdf.cjs
document_loaders/web/pdf.js
document_loaders/web/pdf.d.ts
2 changes: 2 additions & 0 deletions libs/langchain-community/langchain.config.js
@@ -286,6 +286,7 @@ export const config = {
"document_loaders/web/github": "document_loaders/web/github",
"document_loaders/web/taskade": "document_loaders/web/taskade",
"document_loaders/web/notionapi": "document_loaders/web/notionapi",
"document_loaders/web/oracleai": "document_loaders/web/oracleai",
"document_loaders/web/pdf": "document_loaders/web/pdf",
"document_loaders/web/recursive_url": "document_loaders/web/recursive_url",
"document_loaders/web/s3": "document_loaders/web/s3",
@@ -505,6 +506,7 @@ export const config = {
"document_loaders/web/pdf",
"document_loaders/web/taskade",
"document_loaders/web/notionapi",
"document_loaders/web/oracleai",
"document_loaders/web/recursive_url",
"document_loaders/web/s3",
"document_loaders/web/sitemap",
16 changes: 16 additions & 0 deletions libs/langchain-community/package.json
@@ -39,9 +39,11 @@
"binary-extensions": "^2.2.0",
"expr-eval": "^2.0.2",
"flat": "^5.0.2",
"htmlparser2": "^9.1.0",
"js-yaml": "^4.1.0",
"langchain": ">=0.2.3 <0.3.0 || >=0.3.4 <0.4.0",
"langsmith": "^0.2.0",
"oracledb": "^6.7.0",

Review comment (Collaborator): These should not be direct dependencies

Reply (Author): Thank you for the feedback, where should we move this to instead?

"uuid": "^10.0.0",
"zod": "^3.22.3",
"zod-to-json-schema": "^3.22.5"
@@ -120,6 +122,7 @@
"@types/jsonwebtoken": "^9",
"@types/lodash": "^4",
"@types/mozilla-readability": "^0.2.1",
"@types/oracledb": "^6",
"@types/pdf-parse": "^1.1.1",
"@types/pg": "^8.11.0",
"@types/pg-copy-streams": "^1.2.2",
@@ -2791,6 +2794,15 @@
"import": "./document_loaders/web/notionapi.js",
"require": "./document_loaders/web/notionapi.cjs"
},
"./document_loaders/web/oracleai": {
"types": {
"import": "./document_loaders/web/oracleai.d.ts",
"require": "./document_loaders/web/oracleai.d.cts",
"default": "./document_loaders/web/oracleai.d.ts"
},
"import": "./document_loaders/web/oracleai.js",
"require": "./document_loaders/web/oracleai.cjs"
},
"./document_loaders/web/pdf": {
"types": {
"import": "./document_loaders/web/pdf.d.ts",
@@ -4025,6 +4037,10 @@
"document_loaders/web/notionapi.js",
"document_loaders/web/notionapi.d.ts",
"document_loaders/web/notionapi.d.cts",
"document_loaders/web/oracleai.cjs",
"document_loaders/web/oracleai.js",
"document_loaders/web/oracleai.d.ts",
"document_loaders/web/oracleai.d.cts",
"document_loaders/web/pdf.cjs",
"document_loaders/web/pdf.js",
"document_loaders/web/pdf.d.ts",
Binary file not shown.
@@ -0,0 +1,28 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="keywords" />
<meta />
<title>Sample HTML Page</title>
</head>
<body>
<header>
<h1>Welcome to My Sample HTML Page</h1>
</header>

<main>
<h2>Introduction</h2>
<p>
This is a small HTML file with a header, main content section, and a
footer.
</p>
<p>Feel free to modify and experiment with the code!</p>
</main>

<footer>
<p>Footer Content - &copy; 2024</p>
</footer>
</body>
</html>
@@ -0,0 +1,4 @@
Foo
Bar
Baz
