Include info about privacy in the docs (#631)
* Add a page about privacy and organize some of the documentation
* Add notice about telemetry
* Improve copy for privacy section, link to telemetry section
sabaimran authored Jan 29, 2024
1 parent 4fb8d5c commit 9ad44f0
Showing 4 changed files with 54 additions and 13 deletions.
31 changes: 31 additions & 0 deletions documentation/docs/get-started/privacy_security.md
@@ -0,0 +1,31 @@
---
sidebar_position: 4
slug: /privacy
---

# Privacy

If you're using Khoj to index your personal data, that data almost certainly includes sensitive and private information.

Khoj is designed to be a personal AI, so one of our cornerstone principles is to make it as privacy-friendly as possible. That's why you can *always* choose to run Khoj on your own hardware and never share your data outside of your device. You can generate your embeddings directly on your machine, and then use an offline chat client so that your data never leaves your machine. See the [self-hosting instructions](./setup.mdx) to get started.

Here's what to consider if you're using Khoj, whether self-hosted or on our cloud:
1. Some of your relevant indexed data may be included as context when you chat with Khoj. This means that it may be sent to OpenAI if you use one of the OpenAI models.
1. We collect completely anonymized usage telemetry and send it to [PostHog](https://posthog.com/). This includes data like unique chat requests, unique search requests, and unique requests to index data. We collect usage data to understand how people are using Khoj and to help us prioritize features.
- We do not log your IP address, nor upload any of your personal data to PostHog.
- You can see our telemetry aggregation code [here](https://github.com/khoj-ai/khoj/blob/master/src/khoj/routers/helpers.py#L71) and see our telemetry server [here](https://github.com/khoj-ai/khoj/blob/master/src/telemetry/telemetry.py).
- If you're self-hosting, you can opt out of telemetry by following [these instructions](./miscellaneous/telemetry).


Self-hosting isn't for everyone, so we've still taken steps to make Khoj privacy-friendly, even if you choose to use our [cloud offering](https://app.khoj.dev/login). Here's what to consider when using Khoj Cloud:
1. Your embeddings are generated by an open source model within our own dedicated endpoint [hosted on AWS with Huggingface](https://huggingface.co/inference-endpoints/dedicated). The Huggingface Inference endpoints are stateless, so they keep no persistent memory of your data.
1. Your embeddings and the associated raw text are stored in a secure Postgres DB in our private AWS cloud. Your data is sharded on a unique user ID. We store the raw text in your files to improve file syncing and provide context when you chat with Khoj.
1. When you use the single-sign-on option with Google, we only receive your name, a link to your profile photo, and your email address.


:::tip[Info]
Your data is yours. We do not sell your data or use it for training models. Khoj is a sustainable, open-source alternative to closed-source, commercial personal AI. We have no interest in selling your data to make a quick buck.
:::


We have lots of ideas for how to make Khoj really robust as a personal AI and cloud offering, while also keeping it trustless and privacy-centric. Please [reach out](mailto:[email protected]) if this is important to you and you'd like to help us build it.
16 changes: 7 additions & 9 deletions documentation/docs/get-started/setup.mdx
@@ -5,6 +5,10 @@ sidebar_position: 1
# Self-Host
Learn about how to self-host Khoj on your own machine.

Benefits to self-hosting:
1. **Privacy**: Your data will never have to leave your private network. You can even use Khoj without an internet connection if deployed on your personal computer.
2. **Customization**: You can customize Khoj to your liking, from models, to host URL, to feature enablement.

```mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
@@ -163,9 +167,9 @@ Note: To start Khoj automatically in the background use [Task scheduler](https:/

You can use our desktop executables to select file paths and folders to index. You can simply select the folders or files, and they'll be automatically uploaded to the server. Once you specify a file or file path, you don't need to update the configuration again; it will grab any data diffs dynamically over time.

**To download the latest desktop client, go to https://download.khoj.dev** and the correct executable for your OS will automatically start downloading. Once downloaded, you can configure your folders for indexing using the settings tab. To set your chat configuration, you'll have to use the web interface for the Khoj server you set up in the previous step.
**To download the latest desktop client, go to https://download.khoj.dev** and the correct executable for your OS will automatically start downloading. You can also go to https://khoj.dev/downloads to explicitly download your image of choice. Once downloaded, you can configure your folders for indexing using the settings tab. To set your chat configuration, you'll have to use the web interface for the Khoj server you set up in the previous step.

To use the desktop client, you need to go to your Khoj server's settings page (http://localhost:42110/config) and copy the API key. Then, paste it into the desktop client's settings page. Once you've done that, you can select files and folders to index.
To use the desktop client, you need to go to your Khoj server's settings page (http://localhost:42110/config) and copy the API key. Then, paste it into the desktop client's settings page. Once you've done that, you can select files and folders to index. Set the desktop client settings to use `http://127.0.0.1:42110` as the host URL.

### 3. Configure
1. Go to http://localhost:42110/server/admin and log in with your admin credentials.
@@ -191,7 +195,7 @@ The optional steps below allow using Khoj from within an existing application li
[Install](/clients/emacs#setup) khoj.el

#### Setup host URL
To configure your host URL on your clients when self-hosting, use `http://127.0.0.1:42110`. This is the default value for the `KHOJ_HOST` environment variable. Note that `localhost` will not work.
To configure your host URL on your clients when self-hosting, use `http://127.0.0.1:42110`. Port 42110 is the default port for the Khoj server. Note that `localhost` will not work.

### 5. Use Khoj 🚀

@@ -269,13 +273,7 @@ You can head to http://localhost:42110 to use the web interface. You can also us
```
- **Refer**: [Issue with Fix](https://github.com/khoj-ai/khoj/issues/82#issuecomment-1241890946) for more details

#### Search starts giving wonky results
- **Fix**: Open [/api/update?force=true](http://localhost:42110/api/update?force=true) in browser to regenerate index from scratch
- **Note**: *This is a fix for when you perceive the search results have degraded. Not if you think they've always given wonky results*

#### Khoj in Docker errors out with "Killed" in error message
- **Fix**: Increase RAM available to Docker Containers in Docker Settings
- **Refer**: [StackOverflow Solution](https://stackoverflow.com/a/50770267), [Configure Resources on Docker for Mac](https://docs.docker.com/desktop/mac/#resources)

#### Khoj errors out complaining about Tensors mismatch or null
- **Mitigation**: Disable `image` search using the desktop GUI
2 changes: 1 addition & 1 deletion documentation/docs/miscellaneous/telemetry.md
@@ -14,7 +14,7 @@ We don't send any personal information or any information from/about your conten

## Disable Telemetry

You can opt out of telemetry at any time. To do so,
If you're self-hosting Khoj, you can opt out of telemetry at any time. To do so,
1. Open `~/.khoj/khoj.yml`
2. Set `should-log-telemetry` to `false`
3. Save the file and restart Khoj
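
The steps above can also be scripted. The following is a minimal Python sketch, not part of Khoj itself: the helper names `disable_telemetry` and `disable_telemetry_in_file` are hypothetical, and it assumes `should-log-telemetry` sits on its own line in `~/.khoj/khoj.yml`. It rewrites only that line rather than parsing the YAML, so the indentation and the rest of the file stay untouched.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a line such as "  should-log-telemetry: true" and captures its indentation.
# Assumption: the key appears on its own line, at any nesting level.
_KEY_LINE = re.compile(r"^([ \t]*)should-log-telemetry[ \t]*:.*$", re.MULTILINE)


def disable_telemetry(config_text: str) -> str:
    """Return config_text with should-log-telemetry set to false."""
    return _KEY_LINE.sub(
        lambda m: f"{m.group(1)}should-log-telemetry: false", config_text
    )


def disable_telemetry_in_file(path: Optional[Path] = None) -> bool:
    """Rewrite the config file in place; return True if it changed."""
    if path is None:
        # Location taken from step 1 above.
        path = Path.home() / ".khoj" / "khoj.yml"
    original = path.read_text(encoding="utf-8")
    updated = disable_telemetry(original)
    if updated == original:
        return False
    path.write_text(updated, encoding="utf-8")
    return True
```

Calling `disable_telemetry_in_file()` covers steps 1 and 2; you still need to restart Khoj for the change to take effect. If the key is missing from the file, the function leaves the file alone and returns `False`.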
18 changes: 15 additions & 3 deletions documentation/docusaurus.config.js
@@ -87,7 +87,7 @@ const config = {
items: [
{
href: 'https://github.com/khoj-ai/khoj',
label: '📜 Code',
label: '📖 GitHub',
position: 'right',
},
{
@@ -112,6 +112,10 @@ const config = {
label: 'Get Started',
to: '/',
},
{
label: 'Privacy',
to: '/privacy',
},
{
label: 'Features',
to: '/features/all_features',
@@ -145,6 +149,14 @@ const config = {
label: 'Twitter',
href: 'https://twitter.com/khoj_ai',
},
{
label: 'GitHub',
href: 'https://github.com/khoj-ai/khoj/issues',
},
{
label: 'Email',
href: 'mailto:[email protected]',
}
],
},
{
@@ -155,11 +167,11 @@ const config = {
// to: '/blog',
// },
{
label: 'Cloud',
label: 'Khoj Cloud',
href: 'https://app.khoj.dev/login',
},
{
label: 'Code',
label: 'Open Source',
href: 'https://github.com/khoj-ai/khoj',
},
{