setup docs

Algue-Rythme · Mar 19, 2024 · 5f87494
1 parent 75c9faf

Showing 19 changed files with 295 additions and 206 deletions.
diff --git a/.github/workflows/python-linters.yml b/.github/workflows/python-linters.yml
@@ -0,0 +1,28 @@
+name: lip-dp linters
+
+on:
+  push:
+    branches:
+      - main
+      - release-no-advertising
+  pull_request:
+    branches:
+      - main
+      - release-no-advertising
+
+jobs:
+  checks:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python 3.11
+      uses: actions/setup-python@v4
+      with:
+        python-version: 3.11
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install tox
+    - name: Check lint
+      run: tox -e py311-lint
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -2,9 +2,9 @@ name: tests
 
 on:
   push:
-    branches: ["release-no-advertising"]
+    branches: ["main", "release-no-advertising"]
   pull_request:
-    branches: ["release-no-advertising"]
+    branches: ["main", "release-no-advertising"]
 
 jobs:
   build-and-test:

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -45,7 +45,7 @@ repos:
     rev: v3.0.0a5
     hooks:
       - id: pylint
-        args: [--enable=unused-import --max-line-length=100, --disable=all]
+        args: [--disable=all]
 
 
   # - repo: https://github.com/commitizen-tools/commitizen

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -4,14 +4,14 @@ Thanks for taking the time to contribute!
 
 From opening a bug report to creating a pull request: every contribution is
 appreciated and welcome. If you're planning to implement a new feature or change
-the api please create an [issue first](https://https://github.com/deel-ai/dp-lipschitz/issues/new). This way we can ensure that your precious
+the API, please create an [issue first](https://github.com/Algue-Rythme/lip-dp/issues). This way we can ensure that your precious
 work is not in vain.
 
 
 ## Setup with make
 
-- Clone the repo `git clone https://github.com/deel-ai/lipdp.git`.
-- Go to your freshly downloaded repo `cd lipdp`
+- Clone the repo `git clone git@github.com:Algue-Rythme/lip-dp.git`.
+- Go to your freshly downloaded repo `cd lip-dp`
 - Create a virtual environment and install the necessary dependencies for development:
 
   `make prepare-dev && source lipdp_dev_env/bin/activate`.
@@ -26,9 +26,8 @@ This command activate your virtual environment and launch the `tox` command.
 
 
 `tox`, on the other hand, will do the following:
-- run pytest on the tests folder with python 3.6, python 3.7 and python 3.8
-> Note: If you do not have those 3 interpreters the tests would be only performs with your current interpreter
-- run pylint on the deel-datasets main files, also with python 3.6, python 3.7 and python 3.8
+- run pytest on the tests folder
+- run pylint on the `deel/lipdp` package sources
 > Note: pylint may report false-positive errors. If the linting check fails, please read the pylint output first to find the cause.
 
 Please, make sure you run all the tests at least once before opening a pull request.
@@ -42,7 +41,7 @@ Basically, it will check that your code follow a certain number of convention. A
 
 After getting some feedback, push to your fork and submit a pull request. We
 may suggest some changes or improvements or alternatives, but for small changes
-your pull request should be accepted quickly (see [Governance policy](https://github.com/deel-ai/lipdp/blob/master/GOVERNANCE.md)).
+your pull request should be accepted quickly (see [Governance policy](https://github.com/Algue-Rythme/lip-dp/blob/release-no-advertising/GOVERNANCE.md)).
 
 Something that will increase the chance that your pull request is accepted:
 
@@ -51,4 +50,3 @@ Something that will increase the chance that your pull request is accepted:
 - Follow the existing coding style and run `make check_all` to check all files format.
 - Write a [good commit message](https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html) (we follow a lowercase convention).
 - For a major fix/feature make sure your PR has an issue and if it doesn't, please create one. This would help discussion with the community, and polishing ideas in case of a new feature.
-
diff --git a/README.md b/README.md
@@ -1,22 +1,26 @@
 <p align="center">
-<img src="./docs/assets/lipdp_logo.png" alt="lipdp_logo" width="350"/></p>
+<img src="./docs/assets/lipdp_logo.png" alt="lipdp_logo" width="300"/></p>
 <!-- Badge section -->
 <div align="center">
     <a href="#">
-        <img src="https://img.shields.io/badge/Python-3.9|3.10|3.11-efefef">
+        <img src="https://img.shields.io/badge/Python-3.9 | 3.10 | 3.11-efefef">
     </a>
     <a href="https://github.com/Algue-Rythme/lip-dp/actions/workflows/tests.yml">
         <img alt="Tests" src="https://github.com/Algue-Rythme/lip-dp/actions/workflows/tests.yml/badge.svg?branch=release-no-advertising">
     </a>
+    <a href="https://github.com/Algue-Rythme/lip-dp/actions/workflows/python-linters.yml">
+        <img alt="Linter" src="https://github.com/Algue-Rythme/lip-dp/actions/workflows/python-linters.yml/badge.svg?branch=release-no-advertising">
+    </a>
     <a href="#">
         <img src="https://img.shields.io/badge/License-MIT-efefef">
     </a>
 </div>
 <br>
 
 <!-- Short description of your library -->
 <p align="center">
   <b>LipDP</b> is a Python toolkit dedicated to robust and certifiable learning under privacy guarantees.  
+</p>
 
 
 This package is the code for the paper "*DP-SGD Without Clipping: The Lipschitz Neural Network Way*" by Louis Béthune, Thomas Massena, Thibaut Boissin, Aurélien Bellet, Franck Mamalet, Yannick Prudent, Corentin Friedrich, Mathieu Serrurier, David Vigouroux, published at the **International Conference on Learning Representations (ICLR 2024)**. The paper is available on [arxiv](https://arxiv.org/abs/2305.16202).   

diff --git a/deel/lipdp/__init__.py b/deel/lipdp/__init__.py
@@ -69,8 +69,6 @@
 )
 from deel.lipdp.sensitivity import (
     get_max_epochs,
-    gradient_norm_check,
-    check_layer_gradient_norm,
 )
 from deel.lipdp.utils import (
     CertifiableAUROC,

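The two helpers dropped from the public API here (and deleted from `sensitivity.py` below) checked that the spectral norm of each per-sample Jacobian of a layer's output with respect to its weights stays below a theoretical bound. As a minimal NumPy sketch of the same check, assuming a bias-free dense layer `y = x @ W` (for which every singular value of the per-sample Jacobian equals `||x||_2`), with hypothetical helper names that are not part of lipdp:

```python
import numpy as np


def dense_weight_jacobians(x_batch: np.ndarray, d_out: int) -> np.ndarray:
    """Per-sample Jacobians of y = x @ W w.r.t. vec(W), flattened row-major.

    Hypothetical helper. Returns an array of shape (batch, d_out, d_in * d_out).
    """
    batch, d_in = x_batch.shape
    jac = np.zeros((batch, d_out, d_in * d_out))
    for j in range(d_out):
        # dy_j / dW[i, j] = x_i; every other entry of W does not affect y_j.
        jac[:, j, np.arange(d_in) * d_out + j] = x_batch
    return jac


def jacobian_norm_within_bound(bound: float, jacobians: np.ndarray, atol: float = 1e-5) -> bool:
    """True if every per-sample Jacobian has spectral norm below bound + atol."""
    # np.linalg.svd works on stacked matrices; only singular values are needed.
    sigmas = np.linalg.svd(jacobians, compute_uv=False)
    return bool(sigmas.max() < bound + atol)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
jacs = dense_weight_jacobians(x, d_out=3)
max_input_norm = np.linalg.norm(x, axis=1).max()
print(jacobian_norm_within_bound(max_input_norm, jacs))        # True
print(jacobian_norm_within_bound(0.5 * max_input_norm, jacs))  # False
```

In the removed code, `gradient_norm_check` called `check_layer_gradient_norm` without using its boolean result, so the bound itself was never enforced there; a sketch like this one returns the result so the caller can assert on it.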
diff --git a/deel/lipdp/dynamic.py b/deel/lipdp/dynamic.py
@@ -20,6 +20,7 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
+"""Dynamic gradient clipping for differential privacy."""
 import random
 from abc import abstractmethod
 
@@ -66,9 +67,11 @@ def on_train_begin(self, logs=None):
 
     def get_gradloss(self):
         """Computes the norm of gradient of the loss with respect to the model's output.
-        
-        Warning: this method is unsafe from a privacy perspective, as the true gradient bound is computed.
-        It is meant to be used with privacy-preserving methods only, such as the ones implemented in this module.
+
+        Warning: this method is unsafe from a privacy perspective,
+            as the true gradient bound is computed.
+        It is meant to be used with privacy-preserving methods only,
+            such as the ones implemented in this module.
         """
         batch = next(iter(self.ds_train.take(1)))
         imgs, labels = batch

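`get_gradloss` measures the norm of the loss gradient with respect to the model's output on one training batch. A minimal NumPy sketch of that quantity, assuming softmax cross-entropy on logits (the loss the callback actually uses is not visible in this diff), where the per-sample gradient has the closed form `softmax(z) - y` and its L2 norm never exceeds `sqrt(2)`; the function names are hypothetical:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def max_output_gradient_norm(logits: np.ndarray, labels_onehot: np.ndarray) -> float:
    """Largest per-sample L2 norm of d(cross-entropy)/d(logits) over the batch."""
    grads = softmax(logits) - labels_onehot  # closed-form gradient for each sample
    return float(np.linalg.norm(grads, axis=1).max())


rng = np.random.default_rng(0)
logits = rng.normal(scale=5.0, size=(16, 10))
labels = np.eye(10)[rng.integers(0, 10, size=16)]
print(max_output_gradient_norm(logits, labels))  # at most sqrt(2) for this loss
```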
diff --git a/deel/lipdp/model.py b/deel/lipdp/model.py
@@ -20,6 +20,7 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
+"""Model class for differentially private training with Lipschitz constraints."""
 from dataclasses import dataclass
 
 import numpy as np

diff --git a/deel/lipdp/pipeline.py b/deel/lipdp/pipeline.py
@@ -354,9 +354,11 @@ def load_and_prepare_images_data(
         nb_samples_train=ds_info.splits["train"].num_examples,
         nb_samples_test=ds_info.splits["test"].num_examples,
         class_names=ds_info.features["label"].names,
-        nb_steps_per_epochs=ds_train.cardinality().numpy()
-        if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
-        else ds_info.splits["train"].num_examples / batch_size,
+        nb_steps_per_epochs=(
+            ds_train.cardinality().numpy()
+            if ds_train.cardinality() > 0  # negative when infinite (-1) or unknown (-2)
+            else ds_info.splits["train"].num_examples / batch_size
+        ),
         batch_size=batch_size,
         max_norm=bound_val,
     )
@@ -493,9 +495,11 @@ def prepare_tabular_data(
         nb_samples_train=x_train.shape[0],
         nb_samples_test=x_test.shape[0],
         class_names=[str(i) for i in range(nb_classes)],
-        nb_steps_per_epochs=ds_train.cardinality().numpy()
-        if ds_train.cardinality() > 0  # handle case cardinality return -1 (unknown)
-        else x_train.shape[0] / batch_size,
+        nb_steps_per_epochs=(
+            ds_train.cardinality().numpy()
+            if ds_train.cardinality() > 0  # negative when infinite (-1) or unknown (-2)
+            else x_train.shape[0] / batch_size
+        ),
         batch_size=batch_size,
         max_norm=bound_val,
     )

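Both hunks fall back to `num_examples / batch_size` when `ds_train.cardinality()` is not positive. In TensorFlow, `tf.data.INFINITE_CARDINALITY` is -1 and `tf.data.UNKNOWN_CARDINALITY` is -2, so both cases reach the fallback, and the fallback as written yields a float. A framework-free sketch of the same logic, with a hypothetical helper name, which rounds up so that a final partial batch counts as a step (an assumption; whether the pipeline drops remainder batches is not shown here):

```python
import math

# Same values as tf.data.INFINITE_CARDINALITY and tf.data.UNKNOWN_CARDINALITY.
INFINITE_CARDINALITY = -1
UNKNOWN_CARDINALITY = -2


def steps_per_epoch(cardinality: int, num_examples: int, batch_size: int) -> int:
    """Batches per epoch; estimated from the dataset size when cardinality is unusable."""
    if cardinality > 0:
        return int(cardinality)
    # Infinite (-1), unknown (-2) or empty (0): estimate from the number of examples.
    return math.ceil(num_examples / batch_size)


print(steps_per_epoch(391, 50_000, 128))                  # 391
print(steps_per_epoch(UNKNOWN_CARDINALITY, 50_000, 128))  # 391
```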
diff --git a/deel/lipdp/sensitivity.py b/deel/lipdp/sensitivity.py
@@ -25,6 +25,7 @@
 import numpy as np
 import tensorflow as tf
 
+from deel.lipdp.model import compute_gradient_bounds
 from deel.lipdp.model import get_eps_delta
 
 
@@ -91,58 +92,13 @@ def fun(epoch):
         elif error < atol:
             # This branch should never be taken if fun is a non-decreasing function of the number of epochs.
             # fun is mathematically non-decreasing, but numerical inaccuracy can lead to this case.
-            print(f"Numerical inaccuracy with error {error:.7f} in the dichotomy search: using a conservative value.")
+            print(
+                f"Numerical inaccuracy with error {error:.7f} in the dichotomy search: using a conservative value."
+            )
             return epochs_min - 1
         else:
-            assert False, f"Numerical inaccuracy with error {error:.7f}>{atol:.3f} in the dichotomy search."
+            assert (
+                False
+            ), f"Numerical inaccuracy with error {error:.7f}>{atol:.3f} in the dichotomy search."
 
     return epochs_max
-
-
-def gradient_norm_check(upper_bounds, model, examples):
-    """Verifies that the values of per-sample gradients on a layer never exceede a value
-    determined by the theoretical work.
-
-    Args :
-        upper_bounds: maximum gradient bounds for each layer (dictionnary of 'layers name ': 'bounds' pairs).
-        model: The model containing the layers we are interested in. Layers must only have one trainable variable.
-        examples: a batch of examples to test on.  
-    Returns :
-        Boolean value. True corresponds to upper bound has been validated.
-    """
-    activations = examples
-    var_seen = set()
-    for layer in model.layers:
-        post_activations = layer(activations, training=True)
-        assert len(layer.trainable_variables) < 2
-        if len(layer.trainable_variables) == 1:
-            assert len(layer.trainable_variables) == 1
-            train_var = layer.trainable_variables[0]
-            var_name = layer.trainable_variables[0].name
-            var_seen.add(var_name)
-            bound = upper_bounds[var_name]
-            check_layer_gradient_norm(bound, layer, activations)
-        activations = post_activations
-    for var_name in upper_bounds:
-        assert var_name in var_seen
-
-
-def check_layer_gradient_norm(S, layer, activations):
-    trainable_vars = layer.trainable_variables[0]
-    with tf.GradientTape() as tape:        
-        y_pred = layer(activations, training=True)
-        flat_pred = tf.reshape(y_pred, (y_pred.shape[0], -1))
-    jacobians = tape.jacobian(flat_pred, trainable_vars)
-    assert jacobians.shape[0] == activations.shape[0]
-    assert jacobians.shape[1] == np.prod(y_pred.shape[1:])
-    assert np.prod(jacobians.shape[2:]) == np.prod(trainable_vars.shape)
-    jacobians = tf.reshape(
-        jacobians,
-        (y_pred.shape[0], -1, np.prod(trainable_vars.shape)),
-        name="Reshaped_Gradient",
-    )
-    J_sigma = tf.linalg.svd(jacobians, full_matrices=False, compute_uv=False, name=None)
-    J_2norm = tf.reduce_max(J_sigma, axis=-1)
-    J_2norm = tf.reduce_max(J_2norm).numpy()
-    atol = 1e-5
-    return J_2norm < S+atol
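When a formatter splits a long `assert` over several lines, a trailing comma inside the parentheses, as in `assert (False,), msg`, turns the condition into a one-element tuple. Any non-empty tuple is truthy, so such an assertion can never fire (CPython also emits a `SyntaxWarning` for this pattern). A small standalone illustration, using a hypothetical function name, that raises `AssertionError` explicitly so the check also survives `python -O`, which strips `assert` statements:

```python
# A one-element tuple is truthy regardless of its content, so
# `assert (False,), msg` checks bool((False,)), which is always True.
print(bool((False,)))  # True
print(bool(False))     # False


def fail_dichotomy(error: float, atol: float) -> None:
    """Hypothetical failure path for a dichotomy search that cannot converge."""
    # An explicit raise avoids the tuple pitfall and is not removed by `python -O`.
    raise AssertionError(
        f"Numerical inaccuracy with error {error:.7f}>{atol:.3f} in the dichotomy search."
    )


try:
    fail_dichotomy(0.0123, 0.001)
except AssertionError as exc:
    print(exc)
```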
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
@@ -4,14 +4,14 @@ Thanks for taking the time to contribute!
 
 From opening a bug report to creating a pull request: every contribution is
 appreciated and welcome. If you're planning to implement a new feature or change
-the api please create an [issue first](https://https://github.com/deel-ai/dp-lipschitz/issues/new). This way we can ensure that your precious
+the API, please create an [issue first](https://github.com/Algue-Rythme/lip-dp/issues). This way we can ensure that your precious
 work is not in vain.
 
 
 ## Setup with make
 
-- Clone the repo `git clone https://github.com/deel-ai/dp-lipschitz.git`.
-- Go to your freshly downloaded repo `cd lipdp`
+- Clone the repo `git clone git@github.com:Algue-Rythme/lip-dp.git`.
+- Go to your freshly downloaded repo `cd lip-dp`
 - Create a virtual environment and install the necessary dependencies for development:
 
   `make prepare-dev && source lipdp_dev_env/bin/activate`.
@@ -26,9 +26,8 @@ This command activate your virtual environment and launch the `tox` command.
 
 
 `tox`, on the other hand, will do the following:
-- run pytest on the tests folder with python 3.6, python 3.7 and python 3.8
-> Note: If you do not have those 3 interpreters the tests would be only performs with your current interpreter
-- run pylint on the deel-datasets main files, also with python 3.6, python 3.7 and python 3.8
+- run pytest on the tests folder
+- run pylint on the `deel/lipdp` package sources
 > Note: pylint may report false-positive errors. If the linting check fails, please read the pylint output first to find the cause.
 
 Please, make sure you run all the tests at least once before opening a pull request.
@@ -42,7 +41,7 @@ Basically, it will check that your code follow a certain number of convention. A
 
 After getting some feedback, push to your fork and submit a pull request. We
 may suggest some changes or improvements or alternatives, but for small changes
-your pull request should be accepted quickly (see [Governance policy](https://github.com/deel-ai/lipdp/blob/master/GOVERNANCE.md)).
+your pull request should be accepted quickly (see [Governance policy](https://github.com/Algue-Rythme/lip-dp/blob/release-no-advertising/GOVERNANCE.md)).
 
 Something that will increase the chance that your pull request is accepted:
 

diff --git a/docs/assets/residuals.png b/docs/assets/residuals.png