Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Working with Encrypted DataFrames | ||
||
|
||
Concrete ML extends the pandas data-frame functionality with the ability to build encrypted data-frames and operate on them using FHE. This API lets data scientists use familiar pandas-like operations while the data stays encrypted throughout the whole process. | ||
|
||
|
||
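Encrypted data-frames are a storage format for encrypted tabular data. They can be exchanged with third parties without revealing the encrypted values; column names and string-to-integer mappings, however, remain in clear text (see Current Limitations). | ||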
|
||
|
||
Potential applications include: | ||
|
||
- Encrypted storage of tabular datasets | ||
- Joint data analysis efforts between multiple parties | ||
- Data preparation steps before machine learning tasks, such as inference or training | ||
- Secure outsourcing of data analysis to untrusted third parties | ||
|
||
## Encrypt and Decrypt a DataFrame | ||
||
|
||
To encrypt a pandas `DataFrame`, first construct a `ClientEngine`, which manages the keys, then call its `encrypt_from_pandas` method: | ||
|
||
```python
from concrete.ml.pandas import ClientEngine
from io import StringIO
import pandas

data_left = """index,total_bill,tip,sex,smoker
1,12.54,2.5,Male,No
2,11.17,1.5,Female,No
3,20.29,2.75,Female,No
"""

# Load your pandas DataFrame
df = pandas.read_csv(StringIO(data_left))

# Obtain client object
client = ClientEngine(keys_path="my_keys")

# Encrypt the DataFrame
df_encrypted = client.encrypt_from_pandas(df)

# Decrypt the DataFrame to produce a pandas DataFrame
df_decrypted = client.decrypt_to_pandas(df_encrypted)
```
|
||
## Supported Data Types and Schema Definition | ||
|
||
Concrete ML's encrypted `DataFrame` operations support a specific set of data types: | ||
|
||
- **Integer**: Integers are supported within a specific range determined by the encryption scheme's quantization parameters. The default range is 1 to 15, with 0 reserved for `NaN`. Values outside this range cause a `ValueError` to be raised during the pre-processing stage. | ||
|
||
- **Quantized Float**: Floating-point numbers are quantized to integers within the supported range. This is achieved by computing a scale and zero point for each column, which are used to map the floating-point numbers to the quantized integer space. | ||
|
||
- **String Enum**: String columns are mapped to integers starting from 1. This mapping is stored and later used for de-quantization. If the number of unique strings exceeds 15, a `ValueError` is raised. | ||
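
The mappings described above can be sketched in plain Python. This is only an illustration of affine quantization and string enumeration under the 1 to 15 range stated here, not Concrete ML's actual implementation: the helper names (`quantize_floats`, `dequantize_floats`, `encode_strings`), the alphabetical ordering of categories, and the way the scale and zero point are derived are all assumptions.

<!--pytest-codeblocks:cont-->

```python
import math

N_BITS = 4                       # assumed precision, matching the 1..15 range
Q_MIN, Q_MAX = 1, 2**N_BITS - 1  # 0 is reserved for NaN


def quantize_floats(values):
    """Map floats to integers in [Q_MIN, Q_MAX]; NaN becomes 0."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return [0] * len(values), 1.0, Q_MIN
    v_min, v_max = min(finite), max(finite)
    # A constant column has zero spread; fall back to a scale of 1.0.
    scale = (v_max - v_min) / (Q_MAX - Q_MIN) or 1.0
    zero_point = Q_MIN - round(v_min / scale)
    quantized = []
    for v in values:
        if math.isnan(v):
            quantized.append(0)
        else:
            q = round(v / scale) + zero_point
            # Clip to guard against floating-point rounding at the edges.
            quantized.append(min(max(q, Q_MIN), Q_MAX))
    return quantized, scale, zero_point


def dequantize_floats(q_values, scale, zero_point):
    """Invert quantize_floats, up to the quantization error."""
    return [math.nan if q == 0 else (q - zero_point) * scale for q in q_values]


def encode_strings(values):
    """Map each distinct string to an integer starting from 1."""
    categories = sorted(set(values))  # ordering is an assumption
    if len(categories) > Q_MAX:
        raise ValueError(f"{len(categories)} unique strings exceed the limit of {Q_MAX}")
    mapping = {s: i for i, s in enumerate(categories, start=Q_MIN)}
    return [mapping[v] for v in values], mapping


tips_q, scale, zero_point = quantize_floats([2.5, 1.5, 2.75])
tips_restored = dequantize_floats(tips_q, scale, zero_point)
sex_q, sex_mapping = encode_strings(["Male", "Female", "Female"])
```

In this sketch the restored values differ from the originals by at most half a quantization step (`scale / 2`), which gives a feel for what 4 bits of precision mean for a float column.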
|
||
## Supported Operations on Encrypted Data-frames | ||
||
|
||
> **Outsourced execution**: The merge operation on Encrypted DataFrames can be **securely** performed on a third-party server. This means that the server can execute the merge without ever having access to the unencrypted data. The server only requires the encrypted DataFrames. | ||
|
||
Encrypted DataFrames support a subset of operations that are available for pandas DataFrames. The following operations are currently supported: | ||
|
||
|
||
- `merge`: performs a left or right join of two data-frames | ||
||
|
||
<!--pytest-codeblocks:cont-->

```python
data_right = """index,day,time,size
2,Thur,Lunch,2
5,Sat,Dinner,3
9,Sun,Dinner,2"""

# Encrypt the second DataFrame with the same client
df_encrypted2 = client.encrypt_from_pandas(pandas.read_csv(StringIO(data_right)))

# Left join of the two encrypted DataFrames on the "index" column
df_encrypted_merged = df_encrypted.merge(df_encrypted2, how="left", on="index")
```
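
For reference, the same left join can be run on the unencrypted inputs with plain pandas. This sketch uses only standard pandas calls and the CSV data from the examples above; the variable names are illustrative. It is reasonable to assume that the decrypted result of the encrypted merge has the same shape, with values subject to quantization error, but that is an assumption rather than something this page states.

<!--pytest-codeblocks:cont-->

```python
from io import StringIO

import pandas

left_csv = """index,total_bill,tip,sex,smoker
1,12.54,2.5,Male,No
2,11.17,1.5,Female,No
3,20.29,2.75,Female,No
"""
right_csv = """index,day,time,size
2,Thur,Lunch,2
5,Sat,Dinner,3
9,Sun,Dinner,2"""

plain_left = pandas.read_csv(StringIO(left_csv))
plain_right = pandas.read_csv(StringIO(right_csv))

# Left join: every row of plain_left is kept, and the right-hand columns
# are NaN wherever "index" has no match in plain_right.
plain_merged = plain_left.merge(plain_right, how="left", on="index")
print(plain_merged)
```

Only `index` 2 appears in both inputs, so the `day`, `time` and `size` columns are filled for that row and missing for the other two.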
|
||
## Serialization of Encrypted Data-frames | ||
||
|
||
Encrypted `DataFrame` objects can be serialized to a file format for storage or transfer. When serialized, they contain the encrypted data and [evaluation keys](../getting-started/concepts.md#cryptography-concepts) necessary to perform computations. | ||
|
||
|
||
> **Security**: Serialized data-frames do not contain any secret keys, so a third party that receives them cannot decrypt the values they hold. Note that column names and string-to-integer mappings are stored in clear text. | ||
|
||
### Saving and loading Data-frames | ||
|
||
To save an encrypted `DataFrame` to a file, or to load it back, use the following commands: | ||
|
||
<!--pytest-codeblocks:cont-->

```python
from concrete.ml.pandas import load_encrypted_dataframe

# Save
df_encrypted_merged.save("df_encrypted_merged")

# Load
df_encrypted_merged = load_encrypted_dataframe("df_encrypted_merged")

# Decrypt the loaded DataFrame
df_decrypted = client.decrypt_to_pandas(df_encrypted_merged)
```
|
||
## Error Handling | ||
||
|
||
The library is designed to raise specific errors when encountering issues during the pre-processing and post-processing stages: | ||
|
||
|
||
- `ValueError`: Raised when a column contains integer values outside the allowed range, when a column has too many unique strings, or when a column has an unsupported data type. It is also raised when an operation is attempted on a data type that the operation does not support. | ||
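
To catch these problems before calling `encrypt_from_pandas`, a data-frame can be checked in clear first. The helper below is a hypothetical sketch, not part of Concrete ML: the function name `find_encryption_issues`, the limits it uses (taken from the default 1 to 15 range described on this page) and the exact set of accepted dtypes are assumptions.

```python
import pandas
from pandas.api import types as pdt

INT_MIN, INT_MAX = 1, 15  # assumed default integer range; 0 is reserved for NaN
MAX_CATEGORIES = 15       # assumed limit on unique strings per column


def find_encryption_issues(df):
    """Return descriptions of columns likely to trigger a ValueError."""
    issues = []
    for name in df.columns:
        col = df[name].dropna()
        if pdt.is_integer_dtype(col):
            if len(col) > 0 and (col.min() < INT_MIN or col.max() > INT_MAX):
                issues.append(f"{name}: integers outside [{INT_MIN}, {INT_MAX}]")
        elif pdt.is_float_dtype(col):
            # Floats are quantized per column, so no range check is done here.
            continue
        elif pdt.is_string_dtype(col) or pdt.is_object_dtype(col):
            if col.nunique() > MAX_CATEGORIES:
                issues.append(f"{name}: more than {MAX_CATEGORIES} unique strings")
        else:
            issues.append(f"{name}: unsupported dtype {col.dtype}")
    return issues


example = pandas.DataFrame({"age": [3, 42], "city": ["Paris", "Lyon"]})
for issue in find_encryption_issues(example):
    print(issue)
```

Returning a list instead of raising on the first problem makes it possible to fix every offending column in one pass.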
|
||
|
||
## Example Workflow | ||
||
|
||
An example workflow, in which two clients encrypt two `DataFrame` objects, a server merges them, and the result is then decrypted, is available in the notebook [EncryptedPandas.ipynb](../advanced_examples/EncryptedPandas.ipynb). | ||
|
||
|
||
## Current Limitations | ||
||
|
||
While this API offers a new, secure way to work on remotely stored, encrypted data, it currently has significant limitations: | ||
|
||
|
||
- **Precision of Values**: The precision for numerical values is limited to 4 bits. | ||
||
- **Supported Operations**: The `merge` operation is the only one available. | ||
||
- **Index Handling**: Index values are not preserved; users should move any relevant data from the index to a dedicated new column before encrypting. | ||
||
- **Integer Range**: The range of integers that can be encrypted is between 1 and 15. | ||
||
- **Uniqueness for `merge`**: The `merge` operation requires that the columns to merge on contain unique values. Because these values must also fall within the 1 to 15 integer range, data-frames are currently limited to 15 rows. | ||
||
- **Metadata Security**: Column names and the mapping of strings to integers are not encrypted and are sent to the server in clear text. | ||
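
The index and uniqueness limitations above can be handled in clear before encryption. The sketch below uses standard pandas calls (`reset_index`, `Series.is_unique`); the helper `check_merge_key`, the column name `customer_id` and the 15-row limit constant are illustrative assumptions, not part of Concrete ML.

```python
import pandas

MAX_ROWS = 15  # assumed from the limitation above: unique keys in the 1..15 range

df = pandas.DataFrame(
    {"tip": [2.5, 1.5, 2.75]},
    index=pandas.Index([1, 2, 3], name="customer_id"),
)

# Index values are not preserved, so move them into a regular column first.
df = df.reset_index()


def check_merge_key(frame, key):
    """Raise ValueError if `key` cannot serve as a merge column."""
    if not frame[key].is_unique:
        raise ValueError(f"column {key!r} contains duplicate values")
    if len(frame) > MAX_ROWS:
        raise ValueError(f"{len(frame)} rows exceed the limit of {MAX_ROWS}")


check_merge_key(df, "customer_id")
```

After `reset_index()`, `customer_id` is an ordinary column and can be passed as the `on` argument of `merge`.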
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,11 @@ | |
|
||
<figure><img src="../.gitbook/assets/doc_header_CML.png" alt=""><figcaption></figcaption></figure> | ||
|
||
Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](../built-in-models/linear.md), [tree-based models](../built-in-models/tree.md), and [neural networks](../built-in-models/neural-networks.md)). Concrete ML supports converting models for inference with FHE but can also [train some models](../built-in-models/training.md) on encrypted data. | ||
Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to: | ||
|
||
|
||
- automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how this works for [linear models](../built-in-models/linear.md), [tree-based models](../built-in-models/tree.md), and [neural networks](../built-in-models/neural-networks.md)). | ||
||
- [train models](../built-in-models/training.md) on encrypted data. | ||
- [pre-process encrypted data](../built-in-models/encrypted_dataframe.md) through a data-frame paradigm. | ||
||
|
||
Fully Homomorphic Encryption is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. You can learn more about FHE in [this introduction](https://www.zama.ai/post/tfhe-deep-dive-part-1) or by joining the [FHE.org](https://fhe.org) community. | ||
|
||
|
Working with encrypted DataFrames