doc: Google Drive and OneDrive documentation (docqai#169)
* doc: Add user guide for file storage services Google Drive and OneDrive.
* doc: Add dev guide for setting up Google Drive API.
* doc: Add dev guide for setting up Microsoft OneDrive.
osala-eng authored Dec 4, 2023
1 parent 1eb9b3e commit 3102c02
Showing 5 changed files with 104 additions and 8 deletions.
Binary file added docs/assets/azure_register_an_application.png
71 changes: 71 additions & 0 deletions docs/developer-guide/file-storage-services.md
@@ -0,0 +1,71 @@
# File Storage Services

Docq supports multiple cloud file storage services as a data source. This section covers how to set up the supported file storage services.

- [Google Drive](#file-storage-google-drive)
  - [Setup and Configure Google Cloud project](#setup-and-configure-google-cloud-project)
  - [Configure Docq web application for Google Drive](#configure-docq-web-application-for-google-drive)
- [OneDrive](#file-storage-onedrive)
  - [Setup and Configure Microsoft Azure Application](#setup-and-configure-microsoft-azure-application)
  - [Configure Docq web application for OneDrive](#configure-docq-web-application-for-onedrive)


## File storage: Google Drive

This guide helps developers integrate Google Drive with Docq. It focuses on setting up the Google Drive API and obtaining the necessary credentials.

### Setup and Configure Google Cloud project

- [Create a Google Cloud project](https://console.cloud.google.com/projectcreate) for your Docq web application.
- [Enable the Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com) in the project you just created.
- Go to Menu > APIs & Services > [OAuth consent screen](https://console.cloud.google.com/apis/credentials/consent), click Create, and complete the app registration form with the following scopes:
  - `https://www.googleapis.com/auth/drive.readonly`
  - `https://www.googleapis.com/auth/userinfo.email`
  - `openid`
- Go to Menu > APIs & Services > [Credentials](https://console.cloud.google.com/apis/credentials).
- Click `+ CREATE CREDENTIALS` > OAuth client ID, then fill in the form with the following details:
  - Application type: Web application
  - Authorized redirect URIs: the Docq application URL followed by `/Admin_Spaces/`, e.g. `http://localhost:8501/Admin_Spaces/`
- Click Create and download the `credentials.json` file.

A more detailed guide can be found in the [Google Drive API Python quickstart](https://developers.google.com/drive/api/quickstart/python).

### Configure Docq web application for Google Drive

After setting up the Google Cloud project and enabling the Google Drive API, configure the Docq web application by setting the following environment variables:

- `DOCQ_GOOGLE_APPLICATION_CREDENTIALS`: The path to the credentials.json file.
- `DOCQ_GOOGLE_AUTH_REDIRECT_URL`: The redirect URL, e.g. `http://localhost:8501/Admin_Spaces/`. This must exactly match one of the Authorized redirect URIs configured in the Google Cloud Console.

Note: The Google Drive data source will be automatically disabled if either of the above environment variables is not set.
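
Because the redirect URL has to match the Google Cloud Console entry exactly, it can help to compare the two before starting Docq. The following is a minimal sketch, not part of Docq: the helper name `redirect_uri_is_registered` is hypothetical, and it assumes the downloaded file uses the usual "Web application" layout with a `redirect_uris` list under a top-level `web` key.

```python
import json
import os
from pathlib import Path


def redirect_uri_is_registered(credentials_path: str, redirect_url: str) -> bool:
    """Return True if redirect_url appears verbatim in the client file's redirect URIs."""
    data = json.loads(Path(credentials_path).read_text(encoding="utf-8"))
    # Assumption: a "Web application" OAuth client file keeps its settings under "web".
    registered = data.get("web", {}).get("redirect_uris", [])
    # Exact string comparison: a missing trailing slash counts as a mismatch.
    return redirect_url in registered


if __name__ == "__main__":
    path = os.environ.get("DOCQ_GOOGLE_APPLICATION_CREDENTIALS")
    url = os.environ.get("DOCQ_GOOGLE_AUTH_REDIRECT_URL")
    if not path or not url:
        print("Google Drive variables are not set")
    elif redirect_uri_is_registered(path, url):
        print("Redirect URL matches the credentials file")
    else:
        print("Redirect URL is not listed in the credentials file")
```

Running the script with both variables exported reports whether the configured redirect URL is one of the URIs recorded in the credentials file.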


## File storage: OneDrive

This guide helps developers integrate OneDrive with Docq. It focuses on setting up the Microsoft Graph API and obtaining the necessary credentials.

### Setup and Configure Microsoft Azure Application

- [Register an Application](https://entra.microsoft.com/#view/Microsoft_AAD_RegisteredApps/CreateApplicationBlade/isMSAApp~/false) in the Microsoft Entra admin center.

![Register an Application](../assets/azure_register_an_application.png)
- Configure the following under `Redirect URI`:
  - Select `Web` as the platform.
  - Enter the Docq application URL followed by `/Admin_Spaces/`, e.g. `http://localhost:8501/Admin_Spaces/`.
- Select `API Permissions` in the side navigation and add the following permissions:
  - `Files.Read`
  - `User.Read`
  - `offline_access`
- Select `Certificates & secrets` in the side navigation, create a new client secret, and save it for later.

A more detailed guide can be found in the Microsoft Graph documentation on [registering an application](https://learn.microsoft.com/en-us/graph/auth-register-app-v2#register-an-application).
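
To show how the redirect URI and the three permissions fit together, here is a rough sketch of the sign-in URL for the Microsoft identity platform authorization code flow. It is illustrative only and does not describe how Docq builds its requests: the function name `build_authorize_url` is hypothetical, and the `common` tenant segment is an assumption that depends on the account types chosen during registration.

```python
from urllib.parse import urlencode

# Assumption: "common" accepts both work/school and personal accounts;
# a single-tenant registration would use its tenant ID here instead.
AUTHORIZE_ENDPOINT = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"

# The permissions added under `API Permissions`.
SCOPES = ["Files.Read", "User.Read", "offline_access"]


def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL a user is sent to in order to sign in and grant access."""
    params = {
        "client_id": client_id,
        "response_type": "code",       # authorization code flow
        "redirect_uri": redirect_uri,  # must match the registered Redirect URI
        "response_mode": "query",
        "scope": " ".join(SCOPES),     # scopes are space-separated
        "state": state,                # opaque value echoed back to the app
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_authorize_url("my-client-id", "http://localhost:8501/Admin_Spaces/", "example-state"))
```

If the `redirect_uri` sent in this request differs from the registered value, the sign-in is rejected, which is why the registered path needs to match exactly.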

### Configure Docq web application for OneDrive

After registering the application and adding the Microsoft Graph API permissions, configure the Docq web application by setting the following environment variables:

- `DOCQ_MS_ONEDRIVE_CLIENT_ID`: The client ID of the application you registered.
- `DOCQ_MS_ONEDRIVE_CLIENT_SECRET`: The client secret you created for the registered application.
- `DOCQ_MS_ONEDRIVE_REDIRECT_URI`: The redirect URL, e.g. `http://localhost:8501/Admin_Spaces/`. This must exactly match the Redirect URI configured for the registered application.

Note: The OneDrive data source will be automatically disabled if any of the above environment variables are not set.
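
The all-or-nothing rule in the note can be checked up front. The sketch below is an assumption about how such a check might look, not Docq's actual code: `missing_onedrive_vars` is a hypothetical helper, and treating an empty value as unset is also an assumption.

```python
import os
from typing import Mapping

ONEDRIVE_VARS = (
    "DOCQ_MS_ONEDRIVE_CLIENT_ID",
    "DOCQ_MS_ONEDRIVE_CLIENT_SECRET",
    "DOCQ_MS_ONEDRIVE_REDIRECT_URI",
)


def missing_onedrive_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """List the OneDrive variables that are unset or empty (empty counts as unset here)."""
    return [name for name in ONEDRIVE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_onedrive_vars()
    if missing:
        print("OneDrive data source would be disabled; missing:", ", ".join(missing))
    else:
        print("All OneDrive variables are set")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.
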
1 change: 1 addition & 0 deletions docs/index.md
@@ -15,3 +15,4 @@ For the developers and other tech-savvy audience, Docq is like **WordPress for g
[User Guide](./user-guide/getting-started.md) - This section is for _end-users_ (employees) and admins of the Docq app. It covers how to deploy and configure Docq and how to use the AI chat functionality to help with your daily work.

[Developer Guide](./developer-guide/getting-started.md) - This section is for those wanting to understand the Docq code base, make code changes to customise Docq, extend Docq, and use Docq as a platform to build AI powered applications on top.
- [File Storage Services](./developer-guide/file-storage-services.md) - Docq supports multiple file storage services. This section covers how to set up the supported file storage services.
38 changes: 30 additions & 8 deletions docs/user-guide/config-spaces.md
@@ -5,12 +5,12 @@ A [Spaces](../overview/key-features.md#spaces-as-data-compartmentation) in Docq
To create a space you need to have Admin privileges in Docq.

- Navigate to 'Admin Overview' > click the 'Shared Spaces' tab > click '+ New Space'
- Add a name that help you easily identify the space.
- Add a name that helps you easily identify the space.
- Add a summary with any additional details. This is helpful when managing several spaces.
- Finally select a data source. Most data sources will require additional config which is data source dependent. Each supported data source has a section below with configuration details.
- Finally, select a data source. Most data sources will require additional config which is data source dependent. Each supported data source has a section below with configuration details.
- Click 'Create Space' to complete.

At the moment, data sources other than MANUAL_UPLOAD require manually re-indexing by navigating to 'Manage Documents' and clicking the 'Reindex' botton.
At the moment, data sources other than MANUAL_UPLOAD require manually re-indexing by navigating to 'Manage Documents' and clicking the 'Reindex' button.

![Admin overview create space screenshot](./../assets/admin-overview-create-space.png)

@@ -20,7 +20,7 @@ Azure blob config screen in Docq

![Azure blob config screenshot](../assets/azure-blob-config-screen.png)

To get the values you will need access to the Azure portal where the Blob container is configured. If you don't have access you will need help from your friendly IT admin or cloud infrastructure engineer that does.
To get the values you will need access to the Azure portal where the Blob container is configured. If you don't have access, you will need help from your friendly IT admin or cloud infrastructure engineer that does.

- Login to the Azure portal with a login that has sufficient access to view (or create) resources in the Azure Storage Accounts service. Blob containers live under a Storage Account.
- Navigate to 'Storage Accounts' then click on the storage account with the blob container you want to link to Docq.
@@ -29,7 +29,7 @@ To get the values you will need access to the Azure portal where the Blob contai
- **Storage Account URL**: `https://<Storage account name GOES HERE>.blob.core.windows.net` replace `<Storage account name GOES HERE>` with the value from the 'Storage account name' field in the Azure portal.
- **Blob Container Name**: paste the container name here. It's shown in Storage account > Containers in the Azure portal.
- **Credential** - there are two types of supported values:
- Access Key - This option gives broad access and might not be suitable in some situations. For example if the storage account has other services and/or other containers with sensitive information.
- Access Key - This option gives broad access and might not be suitable in some situations. For example, if the storage account has other services and/or other containers with sensitive information.
- from the 'Access keys' section, key1 > Key > click the 'show' button then copy button > paste into Docq

Storage account 'Access Keys' screen in the Azure portal:
@@ -38,7 +38,7 @@ Storage account 'Access Keys' screen in the Azure portal:
## Data source: Web Scraper

- **Data Source**: `WEB_SCRAPER`
- **Website URL**: The root URL with links to pages you want to in the space. Multiple URLs can be provided as a comma separated list.
- **Website URL**: The root URL with links to pages you want to include in the space. Multiple URLs can be provided as a comma-separated list.
- **Extract Template Name**: type `readthedocs.io` or `default`.
- **Include Filter Regex**: only URLs that match this regex will be scraped. Leave blank to scrape all links. Uses Python RegEx.
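
As a rough illustration of the include filter, the sketch below keeps only URLs that match a Python regular expression and keeps everything when the filter is blank. It is an assumption about the behaviour, not Docq's implementation: `filter_urls` is a hypothetical helper, and the guide does not say whether matching is anchored, so this version uses `re.search` (match anywhere in the URL).

```python
import re


def filter_urls(urls: list[str], include_regex: str) -> list[str]:
    """Keep URLs matching include_regex; a blank filter keeps every URL."""
    if not include_regex:
        return list(urls)
    pattern = re.compile(include_regex)
    # Assumption: the pattern may match anywhere in the URL (re.search).
    return [url for url in urls if pattern.search(url)]


if __name__ == "__main__":
    candidates = [
        "https://example.com/docs/intro",
        "https://example.com/blog/post",
        "https://example.com/docs/api",
    ]
    for url in filter_urls(candidates, r"/docs/"):
        print(url)
```

Trying a pattern against a few sample URLs this way is a quick check before re-indexing a space.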

@@ -47,11 +47,33 @@ Storage account 'Access Keys' screen in the Azure portal:
This one is similar to the `WEB_SCRAPER` but tuned specifically to handle knowledge-base-type sites.

- **Data Source**: `KNOWLEDGE_BASE_SCRAPER`
- **Website URL**: The root URL with links to pages you want to in the space. Multiple URLs can be provided as a comma separated list.
- **Website URL**: The root URL with links to pages you want to include in the space. Multiple URLs can be provided as a comma-separated list.
- **Extract Template Name**: type `GenericKnowledgeBaseExtractor`.
- **Include Filter Regex**: only URLs that match this regex will be scraped. Leave blank to scrape all links. Uses Python RegEx.
- **Title CSS Selector**: a CSS class string that matches the element you want to pull title text from. Defaults to `<h1>`. The value is added as metadata in the index, which helps tune results.
- **Subtitle CSS Selector** a CSS class string that matches the element you want to pull subtitle text from. Defaults to <h2>. The value is added as metadata in the index tuning results.
- **Subtitle CSS Selector**: a CSS class string that matches the element you want to pull subtitle text from. Defaults to `<h2>`. The value is added as metadata in the index, which helps tune results.

## Data source: Google Drive

- **Data Source**: `GOOGLE_DRIVE`
- **Credential**: Use the `Sign in with Google` button to access your Google Drive account.
  - Follow the on-screen prompts to sign in with your Google account, granting Docq read access to your Google Drive.
  - The credential obtained applies only to the space being created. Each additional space requires a separate sign-in.
- **Select a folder**: Choose a folder from your Google Drive for indexing.
  - Click the 'Select a folder' dropdown and pick the desired folder.
  - Only root folders are supported for indexing; subfolders are not currently supported.
  - Once the space is created, the selected folder cannot be changed. However, you can add more content to this folder and re-index it.

## Data source: OneDrive

- **Data Source**: `ONEDRIVE`
- **Credential**: Use the `Sign in with Microsoft` button to sign in with your Microsoft account.
  - Follow the on-screen prompts to grant Docq read access to your OneDrive.
  - The credential obtained applies only to the space being created. Each additional space requires a separate sign-in.
- **Select a folder**: Choose a folder from your OneDrive for indexing.
  - Click the 'Select a folder' dropdown and choose the intended folder.
  - Only root folders are supported for indexing; subfolders are not currently supported.
  - Once the space is created, the selected folder cannot be changed. However, you can add more content to this folder and re-index it.

## Data source: AWS S3

2 changes: 2 additions & 0 deletions docs/user-guide/data-sources.md
@@ -6,3 +6,5 @@ Docq supports associating data to a Space is a varierty of ways. These are calle
- [Azure Blob](./config-spaces.md#data-source-azure-blob-container)
- [Web Scraper](./config-spaces.md#data-source-web-scraper)
- [Knowledgebase Scraper](./config-spaces.md#data-source-knowledgebase-scraper)
- [Google Drive](./config-spaces.md#data-source-google-drive)
- [OneDrive](./config-spaces.md#data-source-onedrive)
