docs[patch]: microsoft platform page update (#14476)
Added `presidio` and `OneNote` references to `microsoft.mdx`; added link
and description to the `presidio` notebook

---------

Co-authored-by: Erick Friis <[email protected]>
leo-gan and efriis authored Dec 9, 2023
1 parent 84a57f5 commit 2fa8173
Showing 2 changed files with 38 additions and 1 deletion.
@@ -8,6 +8,8 @@
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/privacy/presidio_data_anonymization/index.ipynb)\n",
"\n",
">[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’) helps to ensure sensitive data is properly managed and governed. It provides fast identification and anonymization modules for private entities in text and images such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.\n",
"\n",
"## Use case\n",
"\n",
"Data anonymization is crucial before passing information to a language model like GPT-4 because it helps protect privacy and maintain confidentiality. If data is not anonymized, sensitive information such as names, addresses, contact numbers, or other identifiers linked to specific individuals could potentially be learned and misused. Hence, by obscuring or removing this personally identifiable information (PII), data can be used freely without compromising individuals' privacy rights or breaching data protection laws and regulations.\n",
@@ -530,7 +532,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
}
},
"nbformat": 4,
35 changes: 35 additions & 0 deletions docs/docs/integrations/platforms/microsoft.mdx
@@ -151,6 +151,20 @@ See a [usage example](/docs/integrations/document_loaders/microsoft_powerpoint).
from langchain.document_loaders import UnstructuredPowerPointLoader
```

### Microsoft OneNote

First, let's install dependencies:

```bash
pip install bs4 msal
```

See a [usage example](/docs/integrations/document_loaders/onenote).

```python
from langchain.document_loaders.onenote import OneNoteLoader
```


## Vector stores

@@ -259,4 +273,25 @@ from langchain.agents.agent_toolkits import PowerBIToolkit
from langchain.utilities.powerbi import PowerBIDataset
```

## More

### Microsoft Presidio

>[Presidio](https://microsoft.github.io/presidio/) (Origin from Latin praesidium ‘protection, garrison’)
> helps to ensure sensitive data is properly managed and governed. It provides fast identification and
> anonymization modules for private entities in text and images such as credit card numbers, names,
> locations, social security numbers, bitcoin wallets, US phone numbers, financial data and more.

First, install several Python packages and download a `spaCy` model.

```bash
pip install langchain-experimental openai presidio-analyzer presidio-anonymizer spacy Faker
python -m spacy download en_core_web_lg
```

See [usage examples](/docs/guides/privacy/presidio_data_anonymization/).

```python
from langchain_experimental.data_anonymizer import PresidioAnonymizer, PresidioReversibleAnonymizer
```
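
To illustrate what reversible anonymization means in practice, here is a minimal standard-library sketch. It is not Presidio's implementation and does not call `PresidioAnonymizer` or `PresidioReversibleAnonymizer`; the regex patterns, the placeholder format, and the function names `anonymize` and `deanonymize` are assumptions made up for this example.

```python
import re

# Hypothetical patterns for illustration only; Presidio ships its own recognizers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders and return the reverse mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def substitute(match: re.Match, label: str = label) -> str:
            original = match.group(0)
            # Reuse the existing placeholder if this value was already seen.
            for placeholder, value in mapping.items():
                if value == original:
                    return placeholder
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = original
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values from the placeholder mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


source = "Contact jane@example.com or 555-123-4567."
masked, mapping = anonymize(source)
restored = deanonymize(masked, mapping)
```

The masked text is what would be sent to a language model; the mapping stays local so that placeholders in the model's response can be swapped back afterwards.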
