An example Python app demonstrating how to integrate Pangea's AuthN and AuthZ services into a LangChain app to filter out RAG documents based on user permissions.
-
Python v3.12 or greater.
-
pip v24.2 or uv v0.4.29.
-
A Pangea account with AuthN and AuthZ enabled.
-
An OpenAI API key.
-
A Google Drive folder containing spreadsheets.
-
Note down the ID of the folder for later (see the LangChain docs for a guide on how to get the ID from the URL).
-
Each spreadsheet should be named after a user and have two rows. For example:

Alice PTO
Employee  Hours
Alice     25

Bob PTO
Employee  Hours
Bob       100
-
Two Google identities (e.g. Alice and Bob)
- One user (e.g. Alice) will act as the admin, own the folder, and have full access to all spreadsheets within it.
- The other user (e.g. Bob) will act as an employee with read access to the folder and to their own spreadsheet only.
-
A Google Cloud project with the Google Drive API and Google Sheets API enabled.
-
A Google service account:
-
In your Google Cloud project, go to IAM & Admin > Service Accounts (using the navigation menu in the top left) and create a new service account.
-
On the service accounts page, select your new service account, click KEYS, and add a new key. Save the key as credentials.json in your Python app folder.
Your credentials.json file should look similar to this:
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "l3JYno7aIrRSZkAGFHSNPcjYS6lrpL1UnqbkWW1b",
  "private_key": "-----BEGIN PRIVATE KEY-----\n[...]\n-----END PRIVATE KEY-----\n",
  "client_email": "my-service-account@my-project.iam.gserviceaccount.com",
  "client_id": "1234567890",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/my-service-account%40my-project.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}
-
Share the Google Drive folder with the service account’s email, granting it Editor access so it can query file permissions as needed.
Bonus: see langchain-python-service-authn for an example of how to store such a credential more securely in Pangea Vault instead.
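Two of the prerequisites above can be sanity-checked before running the app. The sketch below is not part of the sample app: `folder_id_from_url` and `check_service_account_key` are hypothetical helpers. It assumes folder URLs of the form `https://drive.google.com/drive/folders/<ID>` and treats a subset of the fields in the sample key file above as the required minimum.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Fields taken from the sample credentials.json above. Treating these as the
# minimum for a usable service account key is an assumption of this sketch.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def folder_id_from_url(url: str) -> str:
    """Return the folder ID from a URL like .../drive/folders/<ID>?usp=sharing."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts or parts.index("folders") + 1 >= len(parts):
        raise ValueError(f"no folder ID found in {url!r}")
    return parts[parts.index("folders") + 1]


def check_service_account_key(path: str = "credentials.json") -> str:
    """Validate the key file and return the email to share the folder with."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return data["client_email"]
```

The email returned by `check_service_account_key` is the address the Google Drive folder needs to be shared with.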
-
After activating AuthN:
- Under AuthN > General > Signup Settings, enable "Allow Signups". This way users won't need to be manually added.
- Under AuthN > General > Redirect (Callback) Settings, add http://localhost:3000 as a redirect.
- Under AuthN > General > Social (OAuth), enable Google.
- Under AuthN > Overview, note the "Client Token" and "Hosted Login" values for later.
This app assumes that the authorization schema is set to the built-in File Drive schema.
Under AuthZ > Overview, note the "Default Token" value for later.
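To make the File Drive assumption more concrete, access in such a schema can be thought of as (resource, relation, subject) tuples derived from Google Drive permissions. The sketch below uses plain dataclasses rather than the Pangea SDK. The relation names (`owner`, `editor`, `reader`), the role mapping, and the names `AccessTuple` and `tuples_for_file` are illustrative assumptions, not the app's actual code or the schema's definitive vocabulary.

```python
from dataclasses import dataclass

# Google Drive permission roles mapped to assumed File Drive relation names.
# Both the mapping and the relation names are assumptions for illustration.
DRIVE_ROLE_TO_RELATION = {
    "owner": "owner",
    "writer": "editor",
    "reader": "reader",
}


@dataclass(frozen=True)
class AccessTuple:
    resource_type: str  # e.g. "file"
    resource_id: str    # e.g. a Google Drive file ID
    relation: str       # e.g. "reader"
    subject_type: str   # e.g. "user"
    subject_id: str     # e.g. an email address


def tuples_for_file(file_id: str, permissions: list[dict]) -> list[AccessTuple]:
    """Turn Drive-style permission records into access tuples."""
    result = []
    for perm in permissions:
        relation = DRIVE_ROLE_TO_RELATION.get(perm.get("role", ""))
        email = perm.get("emailAddress")
        if relation and email:  # skip roles or entries this sketch does not model
            result.append(AccessTuple("file", file_id, relation, "user", email))
    return result
```

Permission records are expected to carry `role` and `emailAddress` keys, as Google Drive permission resources do; anything else is skipped.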
git clone https://github.com/pangeacyber/authz-rag-app.git
cd authz-rag-app
If using pip:
python -m venv .venv
source .venv/bin/activate
pip install .
Or, if using uv:
uv sync
source .venv/bin/activate
Usage: python -m authz_rag_app [OPTIONS]

Options:
  --google-drive-folder-id TEXT  The ID of the Google Drive folder to fetch
                                 documents from. [required]
  --authn-client-token TEXT      Pangea AuthN Client API token. May also be
                                 set via the `PANGEA_AUTHN_CLIENT_TOKEN`
                                 environment variable. [required]
  --authn-hosted-login TEXT      Pangea AuthN Hosted Login URL. May also be
                                 set via the `PANGEA_AUTHN_HOSTED_LOGIN`
                                 environment variable. [required]
  --authz-token SECRET           Pangea AuthZ API token. May also be set via
                                 the `PANGEA_AUTHZ_TOKEN` environment
                                 variable. [required]
  --pangea-domain TEXT           Pangea API domain. May also be set via the
                                 `PANGEA_DOMAIN` environment variable.
                                 [default: aws.us.pangea.cloud; required]
  --model TEXT                   OpenAI model. [default: gpt-4o-mini;
                                 required]
  --openai-api-key SECRET        OpenAI API key. May also be set via the
                                 `OPENAI_API_KEY` environment variable.
                                 [required]
  --help                         Show this message and exit.
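Because several required options can be supplied through environment variables, a small preflight check can report which ones are unset before the app is launched. This is a sketch, not part of the app: `missing_settings` is a hypothetical helper, and the variable names are taken from the help text above. `PANGEA_DOMAIN` is left out because it has a default.

```python
import os

# Required options that the help text says may be set via the environment.
REQUIRED_ENV_VARS = (
    "PANGEA_AUTHN_CLIENT_TOKEN",
    "PANGEA_AUTHN_HOSTED_LOGIN",
    "PANGEA_AUTHZ_TOKEN",
    "OPENAI_API_KEY",
)


def missing_settings(environ=None) -> list[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Set these or pass the matching options:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```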
-
Set the following environment variables (or pass the values as command-line arguments):
PANGEA_AUTHN_CLIENT_TOKEN
PANGEA_AUTHN_HOSTED_LOGIN
PANGEA_AUTHZ_TOKEN
OPENAI_API_KEY
-
Run the app, passing the ID of the Google Drive folder that was set up earlier (this sample uses a fake value):
python -m authz_rag_app --google-drive-folder-id 1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5
-
A new tab will open in the system's default web browser where one can perform login via Google. Log in as the user who has Editor access to the Google Drive folder. The tab may be closed once the login flow is complete.
-
Another tab will open to log in via Pangea AuthN. Select the "Continue with Google" option and log in again. The Google user selected here does not need to be the same as the one used in the previous step. If you pick the same user, all documents will be available in the subsequent steps, which would not illustrate any access control. Instead, choose one of the accounts that has only Reader access to their own PTO spreadsheet. Again, the tab may be closed once the login flow is complete.
-
Then a chat prompt will appear:
Ask a question about PTO availability:
-
Whoever logged in via Pangea AuthN in the previous login step can ask about their PTO balance. For example, if Alice logged in and has 21 days remaining according to her Google Sheet, she might ask:
Ask a question about PTO availability: How many PTO days do I have left?
You have 21 PTO days left.
-
But if they try to ask for another employee's balance, like Bob's, the answer will not be disclosed:
Ask a question about PTO availability: How much PTO does Bob have left?
The context does not provide information about Bob's Paid Time Off (PTO) balance. Therefore, I cannot determine how much PTO Bob has left. You may not be authorized to know the answer.
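The behavior in the last two steps comes from filtering retrieved documents by the logged-in user's permissions before they reach the model. A minimal sketch of that idea follows. It uses plain Python stand-ins rather than LangChain or the Pangea SDK, and `Doc`, `filter_authorized`, and the `is_allowed` callback are hypothetical names. In the real app the permission question would be answered by Pangea AuthZ rather than a local table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Doc:
    file_id: str
    text: str


def filter_authorized(
    docs: list[Doc], user: str, is_allowed: Callable[[str, str], bool]
) -> list[Doc]:
    """Keep only the documents that `user` may read."""
    return [doc for doc in docs if is_allowed(user, doc.file_id)]


# Stand-in permission table of (user, file_id) pairs with read access. It
# mirrors the example above, where each employee can read only their own sheet.
READ_ACCESS = {("alice", "alice-pto"), ("bob", "bob-pto")}


def demo_is_allowed(user: str, file_id: str) -> bool:
    return (user, file_id) in READ_ACCESS


docs = [Doc("alice-pto", "Alice 21"), Doc("bob-pto", "Bob 100")]
context_for_alice = filter_authorized(docs, "alice", demo_is_allowed)
```

Only the documents in `context_for_alice` would be passed to the model as context, so a question about Bob finds nothing to answer from.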
- After login, the Google token is stored in token.json. If you encounter "access denied" errors, delete token.json before you try again.
- The file authorization policy is cached in Pangea AuthZ. If you change your Google Drive folder or create new files, visit Pangea AuthZ Settings to reset your authorization schema.
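Clearing the cached Google token can be scripted. The helper below is a sketch with a hypothetical name, `clear_cached_google_token`, and it assumes token.json sits in the directory the app was run from.

```python
from pathlib import Path


def clear_cached_google_token(path: str = "token.json") -> bool:
    """Delete the cached token file; return True if a file was removed."""
    token = Path(path)
    existed = token.is_file()
    token.unlink(missing_ok=True)  # no error if the file is already gone
    return existed
```

The next run of the app should then prompt for the Google login again.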