From 0825a62185912072c4c39e35a7f0800f40f74e26 Mon Sep 17 00:00:00 2001
From: Jeremy Sadler <53983960+jezsadler@users.noreply.github.com>
Date: Fri, 27 Sep 2024 16:54:12 -0700
Subject: [PATCH] Squashed commit of the following:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 321a2e213df0a4707184595c399aa52e9bdad69b
Author: Jiří Němeček
Date: Sat Aug 24 19:15:47 2024 +0200

Fixing 404 errors of links to notebooks in the documentation (#143)

I assume that the notebooks have been moved, but the documentation links did not reflect that.

**Legal Acknowledgement**\
By contributing to this software project, I agree my contributions are submitted under the BSD license. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

commit caebfc411a4e3bb43db57cf5c2082e2c7e5e41d0
Author: Andrew Lee
Date: Thu Aug 22 13:28:24 2024 -0400

Replace _BlockData with BlockData (#144)

Pyomo recently made ComponentData classes public (https://github.com/Pyomo/pyomo/pull/3221), which will be part of the upcoming release. Currently, this causes the following error to occur in OMLT:

```
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
```

The Pyomo team is working to address this issue; however, OMLT should update its code as well, since deprecation warnings will otherwise be emitted when using the old class names. The fix is to replace all instances of `_BlockData` with `BlockData` (just removing the underscore). This applies to any other instances of Pyomo component data objects as well (although I could only find 2 instances of these in the OMLT code).

**Legal Acknowledgement**\
By contributing to this software project, I agree my contributions are submitted under the BSD license. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

Co-authored-by: jalving

commit c6d274fbc2ce078827dbabe1e38efeec952eb26d
Author: Emma Johnson <12833636+emma58@users.noreply.github.com>
Date: Thu Aug 22 10:56:10 2024 -0400

Add tolerance to enforce strict inequalities in linear tree formulations (#163)

This PR adds a tolerance at which to enforce "strict" inequalities in linear model trees: that is, the right branch will require that the feature value be greater than or equal to the bound plus this tolerance (epsilon). This means that users can tune epsilon to ensure that the MIP solution matches the tree prediction.

Additionally, the PR simplifies the implementation of the hybrid big-M linear tree formulation by using two modern pyomo.gdp transformations. This does mean that the linear tree formulations will rely on pyomo>=6.7.1, if that's okay.
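As a minimal sketch of how a user might tune this tolerance (not taken from the patch itself): it assumes the `lineartree` and scikit-learn packages plus OMLT's `omlt.linear_tree` module, and the `LinearTreeDefinition`, `LinearTreeGDPFormulation`, `transformation`, and `epsilon` names shown here should be checked against the released OMLT API.

```python
# Hedged sketch: assumes lineartree, scikit-learn, and OMLT's linear_tree API
# (LinearTreeDefinition / LinearTreeGDPFormulation with an `epsilon` keyword).
import numpy as np
import pyomo.environ as pyo
from lineartree import LinearTreeRegressor
from sklearn.linear_model import LinearRegression

from omlt import OmltBlock
from omlt.linear_tree import LinearTreeDefinition, LinearTreeGDPFormulation

# train a small linear model tree on toy data
X = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
y = np.sin(3 * X).ravel() + X.ravel() ** 2
regressor = LinearTreeRegressor(base_estimator=LinearRegression()).fit(X, y)

# wrap the trained tree and build a big-M GDP formulation; epsilon nudges the
# "strict" right-branch inequalities so the MIP solution matches the tree
definition = LinearTreeDefinition(regressor, scaled_input_bounds={0: (-2.0, 2.0)})
formulation = LinearTreeGDPFormulation(definition, transformation="bigm", epsilon=1e-4)

m = pyo.ConcreteModel()
m.tree = OmltBlock()
m.tree.build_formulation(formulation)
m.obj = pyo.Objective(expr=m.tree.outputs[0], sense=pyo.minimize)
```

A larger epsilon makes the MIP more conservative near branching thresholds, at the cost of slightly shrinking each leaf's feasible region.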
**Legal Acknowledgement**\
By contributing to this software project, I agree my contributions are submitted under the BSD license. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

---------

Co-authored-by: Emma Johnson

commit d43643a474bca8da9fbfdf53a83cb1a51e4f3780
Author: Lukas Turcani
Date: Tue Aug 20 23:53:51 2024 +0100

Clean up package boilerplate (#149)

This PR does a couple of things to clean up the boilerplate related to packaging OMLT; see the sections below for detailed explanations of the changes.

* Remove `setup.cfg`, `setup.py`, `docs/requirements.txt`, `tox.ini` in favour of `pyproject.toml`.
* Place `conda` requirements into `environment.yml`
* Create new workflows `tests.yml` and `publish_release.yml`
* Add quality checks using `ruff`, `mypy`, `doctest`
* Use `just` for developer experience
* Update the `Development` section of `README` to talk about `just`
* Clean up `conf.py`
* Move `pull_request_template.md`
* Allow publishing of the package to PyPI by pushing a new version tag

# Other comments

* consider internal package structure
* force squash merge of PRs - this keeps git history for the `main` branch nice and clean

# Using `pyproject.toml`

`pyproject.toml` is the simplest way to provide package metadata for a Python package. It is easy to read and also provides sections for configuring tools such as `pytest`, `ruff` and `mypy` all in one place. It works seamlessly with the modern Python ecosystem.

I set up `pyproject.toml` to automatically detect the version of the code from git tags. There is no need to duplicate version numbers across the repo: just add a new tag and everything will be updated. In addition, when a new git tag is pushed to the GitHub repo, the new `publish_release` workflow will be triggered and a new PyPI version released. (See more on this below.)

I also set it up so that the version is automatically written to a file called `src/omlt/_version.py`, which holds the `__version__` variable. This file is autogenerated and therefore added to `.gitignore`. The `__version__` variable is then re-exported in `src/omlt/__init__.py` so that our users have access to it.

I tried to preserve all the information stored in `setup.cfg` and the other deleted files -- let me know if there is something I missed!

## Optional dependencies

The `pyproject.toml` file allows the creation of optional dependencies. For example, our users can install

```bash
pip install omlt[keras]
# or
pip install omlt[torch]
# or
pip install omlt[linear-tree,keras-gpu]
```

Of course, any combination of optional dependencies is valid too. This allows our users to install the dependencies specific to their use case.

Note that:

* I made `onnx` and `onnxruntime` required dependencies because, from my understanding, they are almost always used
* I added an optional dependency set called `dev` which developers can use to install all developer tools and all dependencies -- you need this to run all the tests, for example
* There is also `dev-gpu`, which installs the GPU version of tensorflow in case the developer has a GPU

The available optional dependencies are:

* `linear-tree`, installs the linear tree dependency
* `keras`, installs tensorflow and keras
* `keras-gpu`, installs tensorflow for the GPU and keras
* `torch`, installs torch and torch geometric
* `dev-tools` - this is not to be used directly but allows easy re-use of dev tools in other optional dependencies, namely `dev` and `dev-gpu`
* `docs` - installs dependencies required to compile docs
* `dev` - dependencies needed for developing the project, such as tooling
* `dev-gpu` - same as `dev` but installed with GPU support

Our documentation probably needs to be updated to tell users they should install omlt with some combination of the `linear-tree`, `keras`, `keras-gpu`, `torch` optional dependencies depending on what features of the package they are using.
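As an illustrative aside, one common pattern for code that depends on an optional extra is a guarded import with a helpful error message. The sketch below is hypothetical (the `require_keras` helper and message text are not from OMLT's codebase); it only shows the general idea behind keeping `keras` optional.

```python
# Hypothetical sketch of a guarded optional import; OMLT's real import guards
# may look different. `keras` is only present after `pip install omlt[keras]`.
try:
    import keras
except ImportError:  # the optional extra was not installed
    keras = None


def require_keras() -> None:
    """Raise a helpful error if the optional keras extra is missing."""
    if keras is None:
        raise ImportError(
            "This feature needs the optional keras dependencies: "
            "pip install omlt[keras]"
        )
```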
# Quality checks with `ruff`, `mypy` and `doctest`

I've enabled `ruff`, `mypy` and `doctest`. Currently there are no doctests, but it's good to have this set up so that it runs in case any are added in the future.

Both `ruff` and `mypy` are failing because there are a number of things which need to be fixed. For both `ruff` and `mypy` I have disabled some checks which it would be good to enable eventually but are probably a fair amount of work to fix -- these have comments in `pyproject.toml`. The remaining failing checks are ones which I would recommend fixing ASAP. There are two approaches: merge now and fix these errors later, or keep a separate branch where they are incrementally fixed. Up to you to decide what you prefer.

I told ruff to check for `google` style docstrings. I think these are the best because they have good readability and, in my opinion, work the best with type hints.

# Using `just` instead of `tox`

https://github.com/casey/just is a simple command runner. It allows the developers to define and re-use common operations; for example, I can define a `check` recipe and then run

```bash
just check
```

in my command line and it will run all the tests.

The beauty of this is that `just` is extremely simple. If you read the file, it is basically a sequence of bash instructions for each recipe. This makes the recipes really transparent and easy to understand, and they work as code-as-documentation. Users can just read a recipe and run the commands one by one to get the same effect without having `just` installed. There is no magic, which helps with debugging issues. It's also language agnostic.

`just` comes as a small stand-alone binary, which makes it a very non-intrusive tool to have on your computer that does not need any dependencies. The downside is that it does not provide automatic management for Python environments, which I believe tox does provide. The other side of this is that we allow developers to use their favorite tools for managing venvs rather than prescribing certain tools for this repo (the difference with `just` being that it is an essentially optional tool that also serves as documentation). I may be overly opinionated on this one, so feel free to push back.

# Cleaning up `docs/conf.py`

I removed a bunch of the commented-out code. This makes it easier to see what the configuration is and also prevents the commented-out options from becoming out of date when a new release of sphinx is made.
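For concreteness, a minimal sketch of what a pared-down Sphinx `conf.py` might retain once the commented-out defaults are gone; the project name, extensions and theme listed here are illustrative assumptions rather than exactly what this repo keeps.

```python
# docs/conf.py -- hedged sketch of a trimmed configuration; the actual OMLT
# conf.py may keep different options. Only explicitly chosen settings remain.
project = "OMLT"
author = "The OMLT developers"

extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # understand google-style docstrings
    "sphinx.ext.viewcode",  # link documented objects to their source
]

exclude_patterns = ["_build"]
html_theme = "sphinx_rtd_theme"
```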
# Moving `pull_request_template.md`

I moved this into the `.github` folder because it is GitHub configuration. Very optional, but it makes more sense to me.

# `readthedocs` automated action

This guide https://docs.readthedocs.io/en/stable/guides/pull-requests.html shows how to set it up. It requires admin permissions on readthedocs -- I can jump on a call to help with this.

# Publishing to `PYPI` with a git tag

For this, an API key for PYPI needs to be created and added to the repo's secrets -- I can jump on a call to help with this.

# Consider an `_internal` package structure

One way to make it easier to manage private vs public code in a repository is to create an `_internal` folder where all the code goes. This way all code can be shared easily and moved between modules, and it is private by default, so changes to internal code do not break users. Public modules then just re-export code from the `_internal` submodules. You can see an example of this structure here: https://github.com/lukasturcani/stk. Not a huge issue, but I find it very helpful for managing what things are actually exposed to users as the code-base grows.

**Legal Acknowledgement**\
By contributing to this software project, I agree my contributions are submitted under the BSD license. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

---------

Co-authored-by: Jeremy Sadler <53983960+jezsadler@users.noreply.github.com>

---
 .coveragerc                                        |  28 --
 docs/api_doc/omlt.block.rst                        |   2 +-
 docs/notebooks.rst                                 |   4 +-
 .../notebooks/data/build_sin_quadratic_csv.py      |   4 +-
 .../auto-thermal-reformer-relu.ipynb               | 108 +++--
 .../neuralnet/auto-thermal-reformer.ipynb          | 108 +++--
 docs/notebooks/neuralnet/build_network.ipynb       |  42 +-
 .../graph_neural_network_formulation.ipynb         |  19 +-
 docs/notebooks/neuralnet/import_network.ipynb      |  60 +--
 docs/notebooks/neuralnet/index_handling.ipynb      |   3 +-
 .../mnist_example_convolutional.ipynb              | 159 +++---
 .../neuralnet/mnist_example_dense.ipynb            | 161 ++++----
 .../neural_network_formulations.ipynb              | 389 +++++++++++-------
 docs/notebooks/trees/bo_with_trees.ipynb           | 168 ++++----
 .../trees/linear_tree_formulations.ipynb           | 279 +++++++------
 pyproject.toml                                     |  52 ++-
 src/omlt/block.py                                  |   5 +-
 src/omlt/gbt/gbt_formulation.py                    |  60 ++-
 src/omlt/io/__init__.py                            |  11 +-
 src/omlt/io/onnx_parser.py                         |  19 +-
 .../torch_geometric/build_gnn_formulation.py       |   4 +-
 .../torch_geometric/torch_geometric_reader.py      |  53 ++-
 src/omlt/linear_tree/lt_definition.py              |   4 +-
 src/omlt/linear_tree/lt_formulation.py             | 124 ++----
 src/omlt/neuralnet/activations/__init__.py         |   2 +
 src/omlt/neuralnet/activations/relu.py             |   2 +-
 src/omlt/neuralnet/layer.py                        |  21 +-
 src/omlt/neuralnet/layers/full_space.py            |  28 +-
 src/omlt/neuralnet/layers/partition_based.py       |   4 +-
 src/omlt/neuralnet/nn_formulation.py               |  18 +-
 src/omlt/scaling.py                                |   6 +-
 tests/io/test_onnx_parser.py                       |   5 +-
 tests/io/test_torch_geometric.py                   |  12 +-
 tests/linear_tree/test_lt_formulation.py           |  62 ++-
 tests/neuralnet/test_network_definition.py         |   2 -
 tests/neuralnet/test_nn_formulation.py             |  31 +-
 tests/neuralnet/test_onnx.py                       |   1 +
 tests/neuralnet/train_keras_models.py              |  38 +-
 tests/test_block.py                                |  12 +-
 tests/test_formulation.py                          |
2 + 40 files changed, 1155 insertions(+), 957 deletions(-) delete mode 100644 .coveragerc diff --git a/.coveragerc b/.coveragerc deleted file mode 100644 index e21d4ef1..00000000 --- a/.coveragerc +++ /dev/null @@ -1,28 +0,0 @@ -# .coveragerc to control coverage.py -[run] -branch = True -source = optml -# omit = bad_file.py - -[paths] -source = - src/ - */site-packages/ - -[report] -# Regexes for lines to exclude from consideration -exclude_lines = - # Have to re-enable the standard pragma - pragma: no cover - - # Don't complain about missing debug-only code: - def __repr__ - if self\.debug - - # Don't complain if tests don't hit defensive assertion code: - raise AssertionError - raise NotImplementedError - - # Don't complain if non-runnable code isn't run: - if 0: - if __name__ == .__main__.: diff --git a/docs/api_doc/omlt.block.rst b/docs/api_doc/omlt.block.rst index bb111700..4824f793 100644 --- a/docs/api_doc/omlt.block.rst +++ b/docs/api_doc/omlt.block.rst @@ -8,7 +8,7 @@ OMLT Block :show-inheritance: .. note:: - `OmltBlock` is the name used to declare the custom Pyomo block which is exposed to the user. The block functionality is given by `OmltBlockData` which inherits from Pyomo `_BlockData`. + `OmltBlock` is the name used to declare the custom Pyomo block which is exposed to the user. The block functionality is given by `OmltBlockData` which inherits from Pyomo `BlockData`. .. autoclass:: omlt.block.OmltBlockData :members: diff --git a/docs/notebooks.rst b/docs/notebooks.rst index f7da92f4..ae587d87 100644 --- a/docs/notebooks.rst +++ b/docs/notebooks.rst @@ -14,7 +14,7 @@ The first set of notebooks demonstrates the basic mechanics of OMLT and shows ho * `index_handling.ipynb `_ shows how to use `IndexMapper` to handle the mappings between indexes. -* `bo_with_trees.ipynb `_ incorporates gradient-boosted trees into a Bayesian optimization loop to optimize the Rosenbrock function. +* `bo_with_trees.ipynb `_ incorporates gradient-boosted trees into a Bayesian optimization loop to optimize the Rosenbrock function. * `linear_tree_formulations.ipynb `_ showcases the different linear model decision tree formulations available in OMLT. @@ -24,7 +24,7 @@ The second set of notebooks gives application-specific examples: * `mnist_example_convolutional.ipynb `_ trains a convolutional neural network on MNIST and uses OMLT to find adversarial examples. -* `graph_neural_network_formulation.ipynb `_ transforms graph neural networks into OMLT and builds formulation to solve optimization problems. +* `graph_neural_network_formulation.ipynb `_ transforms graph neural networks into OMLT and builds formulation to solve optimization problems. * `auto-thermal-reformer.ipynb `_ develops a neural network surrogate (using sigmoid activations) with data from a process model built using `IDAES-PSE `_. 
diff --git a/docs/notebooks/data/build_sin_quadratic_csv.py b/docs/notebooks/data/build_sin_quadratic_csv.py index 72e6c554..261525eb 100644 --- a/docs/notebooks/data/build_sin_quadratic_csv.py +++ b/docs/notebooks/data/build_sin_quadratic_csv.py @@ -9,9 +9,7 @@ rng = np.random.default_rng() sin_quads = pd.DataFrame(x, columns=["x"]) sin_quads["y"] = ( - np.sin(w * x) - + x**2 - + np.array([rng.uniform() * 0.1 for _ in range(n_samples)]) + np.sin(w * x) + x**2 + np.array([rng.uniform() * 0.1 for _ in range(n_samples)]) ) plt.plot(sin_quads["x"], sin_quads["y"]) diff --git a/docs/notebooks/neuralnet/auto-thermal-reformer-relu.ipynb b/docs/notebooks/neuralnet/auto-thermal-reformer-relu.ipynb index 78c4e1a9..f49a771e 100644 --- a/docs/notebooks/neuralnet/auto-thermal-reformer-relu.ipynb +++ b/docs/notebooks/neuralnet/auto-thermal-reformer-relu.ipynb @@ -78,19 +78,20 @@ ], "source": [ "import os\n", - "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # suppress CUDA warnings from tensorflow\n", + "\n", + "os.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = \"2\" # suppress CUDA warnings from tensorflow\n", "\n", "# import the necessary packages\n", - "from omlt import OmltBlock, OffsetScaling\n", - "from omlt.io.keras import load_keras_sequential\n", - "from omlt.neuralnet import ReluBigMFormulation\n", - "import pyomo.environ as pyo\n", "import pandas as pd\n", - "import tensorflow.keras as keras\n", - "from tensorflow.keras.models import Sequential\n", + "import pyomo.environ as pyo\n", + "from tensorflow import keras\n", "from tensorflow.keras.layers import Dense\n", + "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.optimizers import Adam\n", - "from tensorflow.keras.callbacks import ModelCheckpoint" + "\n", + "from omlt import OffsetScaling, OmltBlock\n", + "from omlt.io.keras import load_keras_sequential\n", + "from omlt.neuralnet import ReluBigMFormulation" ] }, { @@ -151,10 +152,23 @@ ], "source": [ "# read in our csv data\n", - "columns = ['Bypass Fraction', 'NG Steam Ratio', 'Steam Flow',\n", - " 'Reformer Duty','AR', 'C2H6', 'C3H8', 'C4H10',\n", - " 'CH4', 'CO', 'CO2', 'H2', 'H2O', 'N2']\n", - "df = pd.read_csv('../data/reformer.csv', usecols=columns)\n", + "columns = [\n", + " \"Bypass Fraction\",\n", + " \"NG Steam Ratio\",\n", + " \"Steam Flow\",\n", + " \"Reformer Duty\",\n", + " \"AR\",\n", + " \"C2H6\",\n", + " \"C3H8\",\n", + " \"C4H10\",\n", + " \"CH4\",\n", + " \"CO\",\n", + " \"CO2\",\n", + " \"H2\",\n", + " \"H2O\",\n", + " \"N2\",\n", + "]\n", + "df = pd.read_csv(\"../data/reformer.csv\", usecols=columns)\n", "print(df)" ] }, @@ -169,9 +183,21 @@ "outputs": [], "source": [ "# separate the data into inputs and outputs\n", - "inputs = ['Bypass Fraction', 'NG Steam Ratio']\n", - "outputs = [ 'Steam Flow', 'Reformer Duty','AR', 'C2H6', 'C3H8', 'C4H10',\n", - " 'CH4', 'CO', 'CO2', 'H2', 'H2O', 'N2']\n", + "inputs = [\"Bypass Fraction\", \"NG Steam Ratio\"]\n", + "outputs = [\n", + " \"Steam Flow\",\n", + " \"Reformer Duty\",\n", + " \"AR\",\n", + " \"C2H6\",\n", + " \"C3H8\",\n", + " \"C4H10\",\n", + " \"CH4\",\n", + " \"CO\",\n", + " \"CO2\",\n", + " \"H2\",\n", + " \"H2O\",\n", + " \"N2\",\n", + "]\n", "dfin = df[inputs]\n", "dfout = df[outputs]" ] @@ -198,8 +224,8 @@ "\n", "# capture the minimum and maximum values of the scaled inputs\n", "# so we don't use the model outside the valid range\n", - "scaled_lb = dfin.min()[inputs].values\n", - "scaled_ub = dfin.max()[inputs].values" + "scaled_lb = dfin.min()[inputs].to_numpy()\n", + "scaled_ub = 
dfin.max()[inputs].to_numpy()" ] }, { @@ -222,13 +248,13 @@ ], "source": [ "# create our Keras Sequential model\n", - "nn = Sequential(name='reformer_relu_4_20')\n", - "nn.add(Dense(units=10, input_dim=len(inputs), activation='relu'))\n", - "nn.add(Dense(units=10, activation='relu'))\n", - "nn.add(Dense(units=10, activation='relu'))\n", - "nn.add(Dense(units=10, activation='relu'))\n", + "nn = Sequential(name=\"reformer_relu_4_20\")\n", + "nn.add(Dense(units=10, input_dim=len(inputs), activation=\"relu\"))\n", + "nn.add(Dense(units=10, activation=\"relu\"))\n", + "nn.add(Dense(units=10, activation=\"relu\"))\n", + "nn.add(Dense(units=10, activation=\"relu\"))\n", "nn.add(Dense(units=len(outputs)))\n", - "nn.compile(optimizer=Adam(), loss='mse')" + "nn.compile(optimizer=Adam(), loss=\"mse\")" ] }, { @@ -449,8 +475,8 @@ ], "source": [ "# train our model\n", - "x = dfin.values\n", - "y = dfout.values\n", + "x = dfin.to_numpy()\n", + "y = dfout.to_numpy()\n", "\n", "history = nn.fit(x, y, epochs=100)" ] @@ -468,7 +494,7 @@ "# save the model to disk\n", "# While not technically necessary, this shows how we can load a previously saved model into\n", "# our optimization formulation)\n", - "nn.save('reformer_nn_relu.keras')" + "nn.save(\"reformer_nn_relu.keras\")" ] }, { @@ -522,22 +548,24 @@ "outputs": [], "source": [ "# load the Keras model\n", - "nn_reformer = keras.models.load_model('reformer_nn_relu.keras', compile=False)\n", + "nn_reformer = keras.models.load_model(\"reformer_nn_relu.keras\", compile=False)\n", "\n", "# Note: The neural network is in the scaled space. We want access to the\n", "# variables in the unscaled space. Therefore, we need to tell OMLT about the\n", "# scaling factors\n", "scaler = OffsetScaling(\n", - " offset_inputs={i: x_offset[inputs[i]] for i in range(len(inputs))},\n", - " factor_inputs={i: x_factor[inputs[i]] for i in range(len(inputs))},\n", - " offset_outputs={i: y_offset[outputs[i]] for i in range(len(outputs))},\n", - " factor_outputs={i: y_factor[outputs[i]] for i in range(len(outputs))}\n", - " )\n", + " offset_inputs={i: x_offset[inputs[i]] for i in range(len(inputs))},\n", + " factor_inputs={i: x_factor[inputs[i]] for i in range(len(inputs))},\n", + " offset_outputs={i: y_offset[outputs[i]] for i in range(len(outputs))},\n", + " factor_outputs={i: y_factor[outputs[i]] for i in range(len(outputs))},\n", + ")\n", "\n", "scaled_input_bounds = {i: (scaled_lb[i], scaled_ub[i]) for i in range(len(inputs))}\n", "\n", "# create a network definition from the Keras model\n", - "net = load_keras_sequential(nn_reformer, scaling_object=scaler, scaled_input_bounds=scaled_input_bounds)\n", + "net = load_keras_sequential(\n", + " nn_reformer, scaling_object=scaler, scaled_input_bounds=scaled_input_bounds\n", + ")\n", "\n", "# create the variables and constraints for the neural network in Pyomo\n", "m.reformer.build_formulation(ReluBigMFormulation(net))" @@ -554,8 +582,8 @@ "outputs": [], "source": [ "# now add the objective and the constraints\n", - "h2_idx = outputs.index('H2')\n", - "n2_idx = outputs.index('N2')\n", + "h2_idx = outputs.index(\"H2\")\n", + "n2_idx = outputs.index(\"N2\")\n", "m.obj = pyo.Objective(expr=m.reformer.outputs[h2_idx], sense=pyo.maximize)\n", "m.con = pyo.Constraint(expr=m.reformer.outputs[n2_idx] <= 0.34)" ] @@ -571,7 +599,7 @@ "outputs": [], "source": [ "# now solve the optimization problem (this may take some time)\n", - "solver = pyo.SolverFactory('cbc')\n", + "solver = pyo.SolverFactory(\"cbc\")\n", "status = solver.solve(m, 
tee=False)" ] }, @@ -596,10 +624,10 @@ } ], "source": [ - "print('Bypass Fraction:', pyo.value(m.reformer.inputs[0]))\n", - "print('NG Steam Ratio:', pyo.value(m.reformer.inputs[1]))\n", - "print('H2 Concentration:', pyo.value(m.reformer.outputs[h2_idx]))\n", - "print('N2 Concentration:', pyo.value(m.reformer.outputs[n2_idx]))" + "print(\"Bypass Fraction:\", pyo.value(m.reformer.inputs[0]))\n", + "print(\"NG Steam Ratio:\", pyo.value(m.reformer.inputs[1]))\n", + "print(\"H2 Concentration:\", pyo.value(m.reformer.outputs[h2_idx]))\n", + "print(\"N2 Concentration:\", pyo.value(m.reformer.outputs[n2_idx]))" ] } ], diff --git a/docs/notebooks/neuralnet/auto-thermal-reformer.ipynb b/docs/notebooks/neuralnet/auto-thermal-reformer.ipynb index 650f5700..6f7d4320 100644 --- a/docs/notebooks/neuralnet/auto-thermal-reformer.ipynb +++ b/docs/notebooks/neuralnet/auto-thermal-reformer.ipynb @@ -67,19 +67,20 @@ "outputs": [], "source": [ "import os\n", - "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # suppress CUDA warnings from tensorflow\n", + "\n", + "os.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = \"2\" # suppress CUDA warnings from tensorflow\n", "\n", "# import the necessary packages\n", - "from omlt import OmltBlock, OffsetScaling\n", - "from omlt.io.keras import load_keras_sequential\n", - "from omlt.neuralnet import FullSpaceSmoothNNFormulation\n", - "import pyomo.environ as pyo\n", "import pandas as pd\n", - "import tensorflow.keras as keras\n", - "from tensorflow.keras.models import Sequential\n", + "import pyomo.environ as pyo\n", + "from tensorflow import keras\n", "from tensorflow.keras.layers import Dense\n", + "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.optimizers import Adam\n", - "from tensorflow.keras.callbacks import ModelCheckpoint" + "\n", + "from omlt import OffsetScaling, OmltBlock\n", + "from omlt.io.keras import load_keras_sequential\n", + "from omlt.neuralnet import FullSpaceSmoothNNFormulation" ] }, { @@ -140,10 +141,23 @@ ], "source": [ "# read in our csv data\n", - "columns = ['Bypass Fraction', 'NG Steam Ratio', 'Steam Flow',\n", - " 'Reformer Duty','AR', 'C2H6', 'C3H8', 'C4H10',\n", - " 'CH4', 'CO', 'CO2', 'H2', 'H2O', 'N2']\n", - "df = pd.read_csv('../data/reformer.csv', usecols=columns)\n", + "columns = [\n", + " \"Bypass Fraction\",\n", + " \"NG Steam Ratio\",\n", + " \"Steam Flow\",\n", + " \"Reformer Duty\",\n", + " \"AR\",\n", + " \"C2H6\",\n", + " \"C3H8\",\n", + " \"C4H10\",\n", + " \"CH4\",\n", + " \"CO\",\n", + " \"CO2\",\n", + " \"H2\",\n", + " \"H2O\",\n", + " \"N2\",\n", + "]\n", + "df = pd.read_csv(\"../data/reformer.csv\", usecols=columns)\n", "print(df)" ] }, @@ -158,9 +172,21 @@ "outputs": [], "source": [ "# separate the data into inputs and outputs\n", - "inputs = ['Bypass Fraction', 'NG Steam Ratio']\n", - "outputs = [ 'Steam Flow', 'Reformer Duty','AR', 'C2H6', 'C3H8', 'C4H10',\n", - " 'CH4', 'CO', 'CO2', 'H2', 'H2O', 'N2']\n", + "inputs = [\"Bypass Fraction\", \"NG Steam Ratio\"]\n", + "outputs = [\n", + " \"Steam Flow\",\n", + " \"Reformer Duty\",\n", + " \"AR\",\n", + " \"C2H6\",\n", + " \"C3H8\",\n", + " \"C4H10\",\n", + " \"CH4\",\n", + " \"CO\",\n", + " \"CO2\",\n", + " \"H2\",\n", + " \"H2O\",\n", + " \"N2\",\n", + "]\n", "dfin = df[inputs]\n", "dfout = df[outputs]" ] @@ -187,8 +213,8 @@ "\n", "# capture the minimum and maximum values of the scaled inputs\n", "# so we don't use the model outside the valid range\n", - "scaled_lb = dfin.min()[inputs].values\n", - "scaled_ub = dfin.max()[inputs].values" + "scaled_lb = 
dfin.min()[inputs].to_numpy()\n", + "scaled_ub = dfin.max()[inputs].to_numpy()" ] }, { @@ -211,13 +237,13 @@ ], "source": [ "# create our Keras Sequential model\n", - "nn = Sequential(name='reformer_sigmoid_4_20')\n", - "nn.add(Dense(units=20, input_dim=len(inputs), activation='sigmoid'))\n", - "nn.add(Dense(units=20, activation='sigmoid'))\n", - "nn.add(Dense(units=20, activation='sigmoid'))\n", - "nn.add(Dense(units=20, activation='sigmoid'))\n", + "nn = Sequential(name=\"reformer_sigmoid_4_20\")\n", + "nn.add(Dense(units=20, input_dim=len(inputs), activation=\"sigmoid\"))\n", + "nn.add(Dense(units=20, activation=\"sigmoid\"))\n", + "nn.add(Dense(units=20, activation=\"sigmoid\"))\n", + "nn.add(Dense(units=20, activation=\"sigmoid\"))\n", "nn.add(Dense(units=len(outputs)))\n", - "nn.compile(optimizer=Adam(), loss='mse')" + "nn.compile(optimizer=Adam(), loss=\"mse\")" ] }, { @@ -438,8 +464,8 @@ ], "source": [ "# train our model\n", - "x = dfin.values\n", - "y = dfout.values\n", + "x = dfin.to_numpy()\n", + "y = dfout.to_numpy()\n", "\n", "history = nn.fit(x, y, epochs=100)" ] @@ -457,7 +483,7 @@ "# save the model to disk\n", "# While not technically necessary, this shows how we can load a previously saved model into\n", "# our optimization formulation)\n", - "nn.save('reformer_nn.keras')" + "nn.save(\"reformer_nn.keras\")" ] }, { @@ -511,22 +537,24 @@ "outputs": [], "source": [ "# load the Keras model\n", - "nn_reformer = keras.models.load_model('reformer_nn.keras', compile=False)\n", + "nn_reformer = keras.models.load_model(\"reformer_nn.keras\", compile=False)\n", "\n", "# Note: The neural network is in the scaled space. We want access to the\n", "# variables in the unscaled space. Therefore, we need to tell OMLT about the\n", "# scaling factors\n", "scaler = OffsetScaling(\n", - " offset_inputs={i: x_offset[inputs[i]] for i in range(len(inputs))},\n", - " factor_inputs={i: x_factor[inputs[i]] for i in range(len(inputs))},\n", - " offset_outputs={i: y_offset[outputs[i]] for i in range(len(outputs))},\n", - " factor_outputs={i: y_factor[outputs[i]] for i in range(len(outputs))}\n", - " )\n", + " offset_inputs={i: x_offset[inputs[i]] for i in range(len(inputs))},\n", + " factor_inputs={i: x_factor[inputs[i]] for i in range(len(inputs))},\n", + " offset_outputs={i: y_offset[outputs[i]] for i in range(len(outputs))},\n", + " factor_outputs={i: y_factor[outputs[i]] for i in range(len(outputs))},\n", + ")\n", "\n", "scaled_input_bounds = {i: (scaled_lb[i], scaled_ub[i]) for i in range(len(inputs))}\n", "\n", "# create a network definition from the Keras model\n", - "net = load_keras_sequential(nn_reformer, scaling_object=scaler, scaled_input_bounds=scaled_input_bounds)\n", + "net = load_keras_sequential(\n", + " nn_reformer, scaling_object=scaler, scaled_input_bounds=scaled_input_bounds\n", + ")\n", "\n", "# create the variables and constraints for the neural network in Pyomo\n", "m.reformer.build_formulation(FullSpaceSmoothNNFormulation(net))" @@ -543,8 +571,8 @@ "outputs": [], "source": [ "# now add the objective and the constraints\n", - "h2_idx = outputs.index('H2')\n", - "n2_idx = outputs.index('N2')\n", + "h2_idx = outputs.index(\"H2\")\n", + "n2_idx = outputs.index(\"N2\")\n", "m.obj = pyo.Objective(expr=m.reformer.outputs[h2_idx], sense=pyo.maximize)\n", "m.con = pyo.Constraint(expr=m.reformer.outputs[n2_idx] <= 0.34)" ] @@ -687,7 +715,7 @@ ], "source": [ "# now solve the optimization problem\n", - "solver = pyo.SolverFactory('ipopt')\n", + "solver = pyo.SolverFactory(\"ipopt\")\n", 
"status = solver.solve(m, tee=True)" ] }, @@ -712,10 +740,10 @@ } ], "source": [ - "print('Bypass Fraction:', pyo.value(m.reformer.inputs[0]))\n", - "print('NG Steam Ratio:', pyo.value(m.reformer.inputs[1]))\n", - "print('H2 Concentration:', pyo.value(m.reformer.outputs[h2_idx]))\n", - "print('N2 Concentration:', pyo.value(m.reformer.outputs[n2_idx]))" + "print(\"Bypass Fraction:\", pyo.value(m.reformer.inputs[0]))\n", + "print(\"NG Steam Ratio:\", pyo.value(m.reformer.inputs[1]))\n", + "print(\"H2 Concentration:\", pyo.value(m.reformer.outputs[h2_idx]))\n", + "print(\"N2 Concentration:\", pyo.value(m.reformer.outputs[n2_idx]))" ] } ], diff --git a/docs/notebooks/neuralnet/build_network.ipynb b/docs/notebooks/neuralnet/build_network.ipynb index 4b0bd499..008042c1 100644 --- a/docs/notebooks/neuralnet/build_network.ipynb +++ b/docs/notebooks/neuralnet/build_network.ipynb @@ -37,11 +37,11 @@ "metadata": {}, "outputs": [], "source": [ - "import pyomo.environ as pyo\n", "import numpy as np\n", + "import pyomo.environ as pyo\n", "\n", "from omlt.neuralnet import NetworkDefinition\n", - "from omlt.neuralnet.layer import InputLayer, DenseLayer, IndexMapper" + "from omlt.neuralnet.layer import DenseLayer, IndexMapper, InputLayer" ] }, { @@ -75,10 +75,14 @@ "metadata": {}, "outputs": [], "source": [ - "net = NetworkDefinition(scaled_input_bounds={(0,0):(-1.0, 1.0), \n", - " (0,1):(-1.0, 1.0),\n", - " (1,0):(-1.0, 1.0),\n", - " (1,1):(-1.0, 1.0)})" + "net = NetworkDefinition(\n", + " scaled_input_bounds={\n", + " (0, 0): (-1.0, 1.0),\n", + " (0, 1): (-1.0, 1.0),\n", + " (1, 0): (-1.0, 1.0),\n", + " (1, 1): (-1.0, 1.0),\n", + " }\n", + ")" ] }, { @@ -156,7 +160,7 @@ } ], "source": [ - "Image(filename='../images/input-layer.png', height=300)" + "Image(filename=\"../images/input-layer.png\", height=300)" ] }, { @@ -224,7 +228,7 @@ } ], "source": [ - "Image(filename='../images/dense-layer-0.png', height=250)" + "Image(filename=\"../images/dense-layer-0.png\", height=250)" ] }, { @@ -238,7 +242,7 @@ " output_size=[2, 1],\n", " activation=\"linear\",\n", " weights=np.array([[1.0], [-0.5]]),\n", - " biases=np.array([[0.1], [0.25]])\n", + " biases=np.array([[0.1], [0.25]]),\n", ")" ] }, @@ -288,7 +292,7 @@ } ], "source": [ - "Image(filename='../images/network-structure-0.png', height=200)" + "Image(filename=\"../images/network-structure-0.png\", height=200)" ] }, { @@ -368,10 +372,10 @@ "y = x\n", "z = np.maximum(0, y)\n", "\n", - "plt.plot(x, y, label='Before Activation')\n", - "plt.plot(x, z, label='After Activation')\n", - "plt.xlabel('x')\n", - "plt.ylabel('y')\n", + "plt.plot(x, y, label=\"Before Activation\")\n", + "plt.plot(x, z, label=\"After Activation\")\n", + "plt.xlabel(\"x\")\n", + "plt.ylabel(\"y\")\n", "plt.legend()" ] }, @@ -427,7 +431,7 @@ } ], "source": [ - "Image(filename='../images/dense-layer-1.png', height=250)" + "Image(filename=\"../images/dense-layer-1.png\", height=250)" ] }, { @@ -476,7 +480,7 @@ } ], "source": [ - "Image(filename='../images/network-structure-1.png', height=200)" + "Image(filename=\"../images/network-structure-1.png\", height=200)" ] }, { @@ -672,8 +676,10 @@ } ], "source": [ - "m.neural_net.layer[m.neural_net.layers.at(1)].z.pprint() #Note, the input layer does not have zhat\n", - "m.neural_net.layer[m.neural_net.layers.at(2)].zhat.pprint() " + "m.neural_net.layer[\n", + " m.neural_net.layers.at(1)\n", + "].z.pprint() # Note, the input layer does not have zhat\n", + "m.neural_net.layer[m.neural_net.layers.at(2)].zhat.pprint()" ] }, { diff --git 
a/docs/notebooks/neuralnet/graph_neural_network_formulation.ipynb b/docs/notebooks/neuralnet/graph_neural_network_formulation.ipynb index 69cb9675..2699d171 100644 --- a/docs/notebooks/neuralnet/graph_neural_network_formulation.ipynb +++ b/docs/notebooks/neuralnet/graph_neural_network_formulation.ipynb @@ -54,13 +54,13 @@ ], "source": [ "import numpy as np\n", + "import pyomo.environ as pyo\n", "import torch\n", "from torch.nn import Linear, ReLU, Sigmoid\n", - "from torch_geometric.nn import Sequential, GCNConv\n", - "from torch_geometric.nn import global_mean_pool\n", - "from omlt.io.torch_geometric import gnn_with_fixed_graph\n", - "import pyomo.environ as pyo\n", + "from torch_geometric.nn import GCNConv, Sequential, global_mean_pool\n", + "\n", "from omlt import OmltBlock\n", + "from omlt.io.torch_geometric import gnn_with_fixed_graph\n", "\n", "\n", "def GCN_Sequential(activation, pooling):\n", @@ -78,7 +78,7 @@ " activation(),\n", " Linear(2, 1),\n", " ],\n", - " )\n" + " )" ] }, { @@ -478,14 +478,13 @@ "outputs": [], "source": [ "import numpy as np\n", + "import pyomo.environ as pyo\n", "import torch\n", - "from torch.nn import Linear, ReLU\n", - "from torch_geometric.nn import Sequential, SAGEConv\n", - "from torch_geometric.nn import global_add_pool\n", - "from omlt.io.torch_geometric import gnn_with_non_fixed_graph\n", + "from torch.nn import ReLU\n", + "from torch_geometric.nn import SAGEConv, global_add_pool\n", "\n", - "import pyomo.environ as pyo\n", "from omlt import OmltBlock\n", + "from omlt.io.torch_geometric import gnn_with_non_fixed_graph\n", "\n", "\n", "def SAGE_Sequential(activation, pooling):\n", diff --git a/docs/notebooks/neuralnet/import_network.ipynb b/docs/notebooks/neuralnet/import_network.ipynb index 3f056572..673fa974 100644 --- a/docs/notebooks/neuralnet/import_network.ipynb +++ b/docs/notebooks/neuralnet/import_network.ipynb @@ -170,7 +170,7 @@ "source": [ "import pandas as pd\n", "\n", - "df = pd.read_csv('../data/diabetes.csv')\n", + "df = pd.read_csv(\"../data/diabetes.csv\")\n", "\n", "df.head()" ] @@ -215,8 +215,8 @@ "metadata": {}, "outputs": [], "source": [ - "X = df.iloc[:, :8].values\n", - "Y = df.iloc[:, 8].values" + "X = df.iloc[:, :8].to_numpy()\n", + "Y = df.iloc[:, 8].to_numpy()" ] }, { @@ -265,7 +265,7 @@ "\n", "lb = np.min(X, axis=0)\n", "ub = np.max(X, axis=0)\n", - "input_bounds = [(l, u) for l, u in zip(lb, ub)]\n", + "input_bounds = list(zip(lb, ub))\n", "input_bounds" ] }, @@ -292,7 +292,7 @@ } ], "source": [ - "from omlt.io import write_onnx_model_with_bounds, load_onnx_neural_network_with_bounds" + "from omlt.io import load_onnx_neural_network_with_bounds, write_onnx_model_with_bounds" ] }, { @@ -350,17 +350,17 @@ ], "source": [ "import os\n", - "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}\n", - "import keras\n", - "from keras.models import Sequential\n", + "\n", + "os.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = \"3\" # or any {'0', '1', '2'}\n", "from keras.layers import Dense\n", + "from keras.models import Sequential\n", "\n", "model = Sequential()\n", - "model.add(Dense(12, input_dim=8, activation='relu'))\n", - "model.add(Dense(8, activation='relu'))\n", - "model.add(Dense(1, activation='linear'))\n", + "model.add(Dense(12, input_dim=8, activation=\"relu\"))\n", + "model.add(Dense(8, activation=\"relu\"))\n", + "model.add(Dense(1, activation=\"linear\"))\n", "\n", - "model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])" + "model.compile(loss=\"binary_crossentropy\", optimizer=\"adam\", 
metrics=[\"accuracy\"])" ] }, { @@ -730,13 +730,13 @@ "# Add output_names for compatibility:\n", "model.output_names = [output.name for output in model.outputs]\n", "\n", - "from tensorflow import TensorSpec\n", "import tf2onnx\n", + "from tensorflow import TensorSpec\n", "\n", "spec = [TensorSpec(input.shape, input.dtype, input.name) for input in model.inputs]\n", "onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec)\n", "\n", - "with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:\n", + "with tempfile.NamedTemporaryFile(suffix=\".onnx\", delete=False) as f:\n", " write_onnx_model_with_bounds(f.name, onnx_model, input_bounds)\n", " print(f\"Wrote ONNX model with bounds at {f.name}\")" ] @@ -770,7 +770,7 @@ } ], "source": [ - "Image(filename='../images/simple-neural-network.png', height=600)" + "Image(filename=\"../images/simple-neural-network.png\", height=600)" ] }, { @@ -816,10 +816,11 @@ ], "source": [ "import torch\n", - "import torch.nn as nn\n", "import torch.nn.functional as F\n", + "from torch import nn\n", "from torch.utils.data import DataLoader, TensorDataset\n", "\n", + "\n", "class PyTorchModel(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", @@ -830,18 +831,20 @@ " def forward(self, x):\n", " x = F.relu(self.dense_0(x))\n", " x = F.relu(self.dense_1(x))\n", - " x = self.out(x)\n", - " return x\n", + " return self.out(x)\n", + "\n", "\n", "model = PyTorchModel()\n", "loss_function = nn.L1Loss()\n", - "optimizer = torch.optim.Adam(model.parameters(),lr=0.01)\n", + "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n", "\n", - "dataset = TensorDataset(torch.as_tensor(X, dtype=torch.float32), torch.as_tensor(Y, dtype=torch.float32))\n", + "dataset = TensorDataset(\n", + " torch.as_tensor(X, dtype=torch.float32), torch.as_tensor(Y, dtype=torch.float32)\n", + ")\n", "dataloader = DataLoader(dataset, batch_size=10)\n", "\n", "for epoch in range(150):\n", - " for id_batch, (x_batch, y_batch) in enumerate(dataloader):\n", + " for x_batch, y_batch in dataloader:\n", " y_batch_pred = model(x_batch)\n", " loss = loss_function(y_batch_pred, y_batch.view(*y_batch_pred.shape))\n", " optimizer.zero_grad()\n", @@ -849,7 +852,7 @@ " optimizer.step()\n", "\n", " if epoch % 10 == 0:\n", - " print(f\"Epoch number: {epoch} loss : {loss.item()}\")\n" + " print(f\"Epoch number: {epoch} loss : {loss.item()}\")" ] }, { @@ -878,17 +881,14 @@ "# model input used for exporting\n", "x = torch.randn(10, 8, requires_grad=True)\n", "pytorch_model = None\n", - "with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:\n", + "with tempfile.NamedTemporaryFile(suffix=\".onnx\", delete=False) as f:\n", " torch.onnx.export(\n", " model,\n", " x,\n", " f,\n", - " input_names=['input'],\n", - " output_names=['output'],\n", - " dynamic_axes={\n", - " 'input': {0: 'batch_size'},\n", - " 'output': {0: 'batch_size'}\n", - " }\n", + " input_names=[\"input\"],\n", + " output_names=[\"output\"],\n", + " dynamic_axes={\"input\": {0: \"batch_size\"}, \"output\": {0: \"batch_size\"}},\n", " )\n", " write_onnx_model_with_bounds(f.name, None, input_bounds)\n", " print(f\"Wrote PyTorch model to {f.name}\")\n", @@ -917,7 +917,7 @@ } ], "source": [ - "Image(filename='../images/torch-neural-network.png', height=500)" + "Image(filename=\"../images/torch-neural-network.png\", height=500)" ] }, { diff --git a/docs/notebooks/neuralnet/index_handling.ipynb b/docs/notebooks/neuralnet/index_handling.ipynb index 36ed4338..7b3a9b6a 100644 --- 
a/docs/notebooks/neuralnet/index_handling.ipynb +++ b/docs/notebooks/neuralnet/index_handling.ipynb @@ -29,8 +29,9 @@ "outputs": [], "source": [ "import numpy as np\n", + "\n", "from omlt.neuralnet import NetworkDefinition\n", - "from omlt.neuralnet.layer import IndexMapper, InputLayer, DenseLayer, PoolingLayer2D" + "from omlt.neuralnet.layer import DenseLayer, IndexMapper, InputLayer, PoolingLayer2D" ] }, { diff --git a/docs/notebooks/neuralnet/mnist_example_convolutional.ipynb b/docs/notebooks/neuralnet/mnist_example_convolutional.ipynb index 1de8f770..cf44882c 100644 --- a/docs/notebooks/neuralnet/mnist_example_convolutional.ipynb +++ b/docs/notebooks/neuralnet/mnist_example_convolutional.ipynb @@ -47,26 +47,29 @@ } ], "source": [ - "#Import requisite packages\n", - "#data manipulation\n", - "import numpy as np\n", + "# Import requisite packages\n", + "# data manipulation\n", "import tempfile\n", "\n", - "#pytorch for training neural network\n", - "import torch, torch.onnx\n", - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "import torch.optim as optim\n", - "from torchvision import datasets, transforms\n", - "from torch.optim.lr_scheduler import StepLR\n", + "import numpy as np\n", "\n", - "#pyomo for optimization\n", + "# pyomo for optimization\n", "import pyomo.environ as pyo\n", "\n", - "#omlt for interfacing our neural network with pyomo\n", + "# pytorch for training neural network\n", + "import torch\n", + "import torch.onnx\n", + "from torch import nn, optim\n", + "from torch.optim.lr_scheduler import StepLR\n", + "from torchvision import datasets, transforms\n", + "\n", + "# omlt for interfacing our neural network with pyomo\n", "from omlt import OmltBlock\n", - "from omlt.neuralnet import FullSpaceNNFormulation\n", - "from omlt.io.onnx import write_onnx_model_with_bounds, load_onnx_neural_network_with_bounds" + "from omlt.io.onnx import (\n", + " load_onnx_neural_network_with_bounds,\n", + " write_onnx_model_with_bounds,\n", + ")\n", + "from omlt.neuralnet import FullSpaceNNFormulation" ] }, { @@ -84,14 +87,16 @@ "metadata": {}, "outputs": [], "source": [ - "#set training and test batch sizes\n", - "train_kwargs = {'batch_size': 64}\n", - "test_kwargs = {'batch_size': 1000}\n", + "# set training and test batch sizes\n", + "train_kwargs = {\"batch_size\": 64}\n", + "test_kwargs = {\"batch_size\": 1000}\n", "\n", - "#build DataLoaders for training and test sets\n", - "dataset1 = datasets.MNIST('../data', train=True, download=True, transform=transforms.ToTensor())\n", - "dataset2 = datasets.MNIST('../data', train=False, transform=transforms.ToTensor())\n", - "train_loader = torch.utils.data.DataLoader(dataset1,**train_kwargs, shuffle=True)\n", + "# build DataLoaders for training and test sets\n", + "dataset1 = datasets.MNIST(\n", + " \"../data\", train=True, download=True, transform=transforms.ToTensor()\n", + ")\n", + "dataset2 = datasets.MNIST(\"../data\", train=False, transform=transforms.ToTensor())\n", + "train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs, shuffle=True)\n", "test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)" ] }, @@ -110,28 +115,28 @@ "source": [ "hidden_size = 10\n", "\n", + "\n", "class Net(nn.Module):\n", - " #define layers of neural network\n", + " # define layers of neural network\n", " def __init__(self):\n", " super().__init__()\n", - " self.conv1 = nn.Conv2d(1, 2, (4,4), (2,2), 0)\n", - " self.conv2 = nn.Conv2d(2, 2, (4,4), (2,2), 0)\n", - " self.hidden1 = nn.Linear(5*5*2, hidden_size)\n", - " 
self.output = nn.Linear(hidden_size, 10)\n", + " self.conv1 = nn.Conv2d(1, 2, (4, 4), (2, 2), 0)\n", + " self.conv2 = nn.Conv2d(2, 2, (4, 4), (2, 2), 0)\n", + " self.hidden1 = nn.Linear(5 * 5 * 2, hidden_size)\n", + " self.output = nn.Linear(hidden_size, 10)\n", " self.relu = nn.ReLU()\n", " self.softmax = nn.LogSoftmax(dim=1)\n", "\n", - " #define forward pass of neural network\n", + " # define forward pass of neural network\n", " def forward(self, x):\n", " self.x1 = self.conv1(x)\n", " self.x2 = self.relu(self.x1)\n", " self.x3 = self.conv2(self.x2)\n", " self.x4 = self.relu(self.x3)\n", - " self.x5 = self.hidden1(self.x4.view((-1,5*5*2)))\n", + " self.x5 = self.hidden1(self.x4.view((-1, 5 * 5 * 2)))\n", " self.x6 = self.relu(self.x5)\n", " self.x7 = self.output(self.x6)\n", - " x = self.softmax(self.x7) \n", - " return x" + " return self.softmax(self.x7)" ] }, { @@ -147,33 +152,38 @@ "metadata": {}, "outputs": [], "source": [ - "#training function computes loss and its gradient on batch, and prints status after every 200 batches\n", + "# training function computes loss and its gradient on batch, and prints status after every 200 batches\n", "def train(model, train_loader, optimizer, epoch):\n", - " model.train(); criterion = nn.NLLLoss()\n", + " model.train()\n", + " criterion = nn.NLLLoss()\n", " for batch_idx, (data, target) in enumerate(train_loader):\n", " optimizer.zero_grad()\n", " output = model(data)\n", " loss = criterion(output, target)\n", " loss.backward()\n", " optimizer.step()\n", - " if batch_idx % 200 == 0:\n", - " print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n", - " epoch, batch_idx * len(data), len(train_loader.dataset),\n", - " 100. * batch_idx / len(train_loader), loss.item()))\n", + " if batch_idx % 200 == 0:\n", + " print(\n", + " f\"Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} ({100.0 * batch_idx / len(train_loader):.0f}%)]\\tLoss: {loss.item():.6f}\"\n", + " )\n", "\n", - "#testing function computes loss and prints overall model accuracy on test set\n", + "\n", + "# testing function computes loss and prints overall model accuracy on test set\n", "def test(model, test_loader):\n", - " model.eval(); criterion = nn.NLLLoss(reduction='sum')\n", - " test_loss = 0; correct = 0\n", + " model.eval()\n", + " criterion = nn.NLLLoss(reduction=\"sum\")\n", + " test_loss = 0\n", + " correct = 0\n", " with torch.no_grad():\n", " for data, target in test_loader:\n", " output = model(data)\n", - " test_loss += criterion(output, target).item() \n", - " pred = output.argmax(dim=1, keepdim=True) \n", + " test_loss += criterion(output, target).item()\n", + " pred = output.argmax(dim=1, keepdim=True)\n", " correct += pred.eq(target.view_as(pred)).sum().item()\n", " test_loss /= len(test_loader.dataset)\n", - " print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n", - " test_loss, correct, len(test_loader.dataset), 100. 
* correct / len(test_loader.dataset))) " + " print(\n", + " f\"\\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({100.0 * correct / len(test_loader.dataset):.0f}%)\\n\"\n", + " )" ] }, { @@ -237,12 +247,12 @@ } ], "source": [ - "#define model and optimizer\n", + "# define model and optimizer\n", "model = Net()\n", "optimizer = optim.Adadelta(model.parameters(), lr=1)\n", "scheduler = StepLR(optimizer, step_size=1, gamma=0.7)\n", "\n", - "#train CNN model for five epochs\n", + "# train CNN model for five epochs\n", "for epoch in range(5):\n", " train(model, train_loader, optimizer, epoch)\n", " test(model, test_loader)\n", @@ -283,27 +293,27 @@ ], "source": [ "class NoSoftmaxNet(nn.Module):\n", - " #define layers of neural network\n", + " # define layers of neural network\n", " def __init__(self):\n", " super().__init__()\n", - " self.conv1 = nn.Conv2d(1, 2, (4,4), (2,2), 0)\n", - " self.conv2 = nn.Conv2d(2, 2, (4,4), (2,2), 0)\n", + " self.conv1 = nn.Conv2d(1, 2, (4, 4), (2, 2), 0)\n", + " self.conv2 = nn.Conv2d(2, 2, (4, 4), (2, 2), 0)\n", " self.hidden1 = nn.Linear(5 * 5 * 2, hidden_size)\n", - " self.output = nn.Linear(hidden_size, 10)\n", + " self.output = nn.Linear(hidden_size, 10)\n", " self.relu = nn.ReLU()\n", "\n", - " #define forward pass of neural network\n", + " # define forward pass of neural network\n", " def forward(self, x):\n", " self.x1 = self.conv1(x)\n", " self.x2 = self.relu(self.x1)\n", " self.x3 = self.conv2(self.x2)\n", " self.x4 = self.relu(self.x3)\n", - " self.x5 = self.hidden1(self.x4.view((-1,5*5*2)))\n", + " self.x5 = self.hidden1(self.x4.view((-1, 5 * 5 * 2)))\n", " self.x6 = self.relu(self.x5)\n", - " x = self.output(self.x6) \n", - " return x\n", + " return self.output(self.x6)\n", + "\n", "\n", - "#create neural network without LogSoftmax and load parameters from existing model\n", + "# create neural network without LogSoftmax and load parameters from existing model\n", "model2 = NoSoftmaxNet()\n", "model2.load_state_dict(model.state_dict())" ] @@ -331,24 +341,24 @@ "metadata": {}, "outputs": [], "source": [ - "#load image and true label from test set with index 'problem_index'\n", + "# load image and true label from test set with index 'problem_index'\n", "problem_index = 0\n", "image = dataset2[problem_index][0].detach().numpy()\n", "label = dataset2[problem_index][1]\n", "\n", - "#define input region defined by infinity norm\n", + "# define input region defined by infinity norm\n", "epsilon_infty = 1e-3\n", "lb = np.maximum(0, image - epsilon_infty)\n", "ub = np.minimum(1, image + epsilon_infty)\n", "\n", - "#save input bounds as dictionary, note that the first index 0 corresponds to the single-channel input\n", + "# save input bounds as dictionary, note that the first index 0 corresponds to the single-channel input\n", "input_bounds = {}\n", "for i in range(28):\n", " for j in range(28):\n", - " input_bounds[(0,i,j)] = (float(lb[0][i,j]), float(ub[0][i,j])) \n", - " \n", - "#define dummy input tensor \n", - "x = dataset2[problem_index][0].view(-1,1,28,28)" + " input_bounds[(0, i, j)] = (float(lb[0][i, j]), float(ub[0][i, j]))\n", + "\n", + "# define dummy input tensor\n", + "x = dataset2[problem_index][0].view(-1, 1, 28, 28)" ] }, { @@ -364,22 +374,19 @@ "metadata": {}, "outputs": [], "source": [ - "with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:\n", - " #export neural network to ONNX\n", + "with tempfile.NamedTemporaryFile(suffix=\".onnx\", delete=False) as f:\n", + " # export neural 
network to ONNX\n", " torch.onnx.export(\n", " model2,\n", " x,\n", " f,\n", - " input_names=['input'],\n", - " output_names=['output'],\n", - " dynamic_axes={\n", - " 'input': {0: 'batch_size'},\n", - " 'output': {0: 'batch_size'}\n", - " }\n", + " input_names=[\"input\"],\n", + " output_names=[\"output\"],\n", + " dynamic_axes={\"input\": {0: \"batch_size\"}, \"output\": {0: \"batch_size\"}},\n", " )\n", - " #write ONNX model and its bounds using OMLT\n", + " # write ONNX model and its bounds using OMLT\n", " write_onnx_model_with_bounds(f.name, None, input_bounds)\n", - " #load the network definition from the ONNX model\n", + " # load the network definition from the ONNX model\n", " network_definition = load_onnx_neural_network_with_bounds(f.name)" ] }, @@ -798,12 +805,12 @@ } ], "source": [ - "#create pyomo model\n", + "# create pyomo model\n", "m = pyo.ConcreteModel()\n", "\n", - "#create an OMLT block for the neural network and build its formulation\n", + "# create an OMLT block for the neural network and build its formulation\n", "m.nn = OmltBlock()\n", - "m.nn.build_formulation(formulation) " + "m.nn.build_formulation(formulation)" ] }, { @@ -820,7 +827,7 @@ "outputs": [], "source": [ "adversary = (label + 1) % 10\n", - "m.obj = pyo.Objective(expr=(-(m.nn.outputs[0,adversary]-m.nn.outputs[0,label])))" + "m.obj = pyo.Objective(expr=(-(m.nn.outputs[0, adversary] - m.nn.outputs[0, label])))" ] }, { @@ -1003,7 +1010,7 @@ } ], "source": [ - "solver = pyo.SolverFactory('cbc')\n", + "solver = pyo.SolverFactory(\"cbc\")\n", "solver.solve(m, tee=True)" ] }, diff --git a/docs/notebooks/neuralnet/mnist_example_dense.ipynb b/docs/notebooks/neuralnet/mnist_example_dense.ipynb index e7af1f06..fecd4467 100644 --- a/docs/notebooks/neuralnet/mnist_example_dense.ipynb +++ b/docs/notebooks/neuralnet/mnist_example_dense.ipynb @@ -46,26 +46,29 @@ } ], "source": [ - "#Import requisite packages\n", - "#data manipulation\n", - "import numpy as np\n", + "# Import requisite packages\n", + "# data manipulation\n", "import tempfile\n", "\n", - "#pytorch for training neural network\n", - "import torch, torch.onnx\n", - "import torch.nn as nn\n", - "import torch.nn.functional as F\n", - "import torch.optim as optim\n", - "from torchvision import datasets, transforms\n", - "from torch.optim.lr_scheduler import StepLR\n", + "import numpy as np\n", "\n", - "#pyomo for optimization\n", + "# pyomo for optimization\n", "import pyomo.environ as pyo\n", "\n", - "#omlt for interfacing our neural network with pyomo\n", + "# pytorch for training neural network\n", + "import torch\n", + "import torch.onnx\n", + "from torch import nn, optim\n", + "from torch.optim.lr_scheduler import StepLR\n", + "from torchvision import datasets, transforms\n", + "\n", + "# omlt for interfacing our neural network with pyomo\n", "from omlt import OmltBlock\n", - "from omlt.neuralnet import FullSpaceNNFormulation\n", - "from omlt.io.onnx import write_onnx_model_with_bounds, load_onnx_neural_network_with_bounds" + "from omlt.io.onnx import (\n", + " load_onnx_neural_network_with_bounds,\n", + " write_onnx_model_with_bounds,\n", + ")\n", + "from omlt.neuralnet import FullSpaceNNFormulation" ] }, { @@ -83,14 +86,16 @@ "metadata": {}, "outputs": [], "source": [ - "#set training and test batch sizes\n", - "train_kwargs = {'batch_size': 64}\n", - "test_kwargs = {'batch_size': 1000}\n", + "# set training and test batch sizes\n", + "train_kwargs = {\"batch_size\": 64}\n", + "test_kwargs = {\"batch_size\": 1000}\n", "\n", - "#build DataLoaders for 
training and test sets\n", - "dataset1 = datasets.MNIST('../data', train=True, download=True, transform=transforms.ToTensor())\n", - "dataset2 = datasets.MNIST('../data', train=False, transform=transforms.ToTensor())\n", - "train_loader = torch.utils.data.DataLoader(dataset1,**train_kwargs, shuffle=True)\n", + "# build DataLoaders for training and test sets\n", + "dataset1 = datasets.MNIST(\n", + " \"../data\", train=True, download=True, transform=transforms.ToTensor()\n", + ")\n", + "dataset2 = datasets.MNIST(\"../data\", train=False, transform=transforms.ToTensor())\n", + "train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs, shuffle=True)\n", "test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)" ] }, @@ -109,25 +114,25 @@ "source": [ "hidden_size = 50\n", "\n", + "\n", "class Net(nn.Module):\n", - " #define layers of neural network\n", + " # define layers of neural network\n", " def __init__(self):\n", " super().__init__()\n", - " self.hidden1 = nn.Linear(784, hidden_size)\n", - " self.hidden2 = nn.Linear(hidden_size, hidden_size)\n", - " self.output = nn.Linear(hidden_size, 10)\n", + " self.hidden1 = nn.Linear(784, hidden_size)\n", + " self.hidden2 = nn.Linear(hidden_size, hidden_size)\n", + " self.output = nn.Linear(hidden_size, 10)\n", " self.relu = nn.ReLU()\n", " self.softmax = nn.LogSoftmax(dim=1)\n", "\n", - " #define forward pass of neural network\n", + " # define forward pass of neural network\n", " def forward(self, x):\n", " x = self.hidden1(x)\n", " x = self.relu(x)\n", " x = self.hidden2(x)\n", " x = self.relu(x)\n", " x = self.output(x)\n", - " x = self.softmax(x) \n", - " return x" + " return self.softmax(x)" ] }, { @@ -143,33 +148,38 @@ "metadata": {}, "outputs": [], "source": [ - "#training function computes loss and its gradient on batch, and prints status after every 200 batches\n", + "# training function computes loss and its gradient on batch, and prints status after every 200 batches\n", "def train(model, train_loader, optimizer, epoch):\n", - " model.train(); criterion = nn.NLLLoss()\n", + " model.train()\n", + " criterion = nn.NLLLoss()\n", " for batch_idx, (data, target) in enumerate(train_loader):\n", " optimizer.zero_grad()\n", - " output = model(data.view(-1, 28*28))\n", + " output = model(data.view(-1, 28 * 28))\n", " loss = criterion(output, target)\n", " loss.backward()\n", " optimizer.step()\n", - " if batch_idx % 200 == 0:\n", - " print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n", - " epoch, batch_idx * len(data), len(train_loader.dataset),\n", - " 100. 
* batch_idx / len(train_loader), loss.item()))\n", + " if batch_idx % 200 == 0:\n", + " print(\n", + " f\"Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} ({100.0 * batch_idx / len(train_loader):.0f}%)]\\tLoss: {loss.item():.6f}\"\n", + " )\n", + "\n", "\n", - "#testing function computes loss and prints overall model accuracy on test set\n", + "# testing function computes loss and prints overall model accuracy on test set\n", "def test(model, test_loader):\n", - " model.eval(); criterion = nn.NLLLoss( reduction='sum')\n", - " test_loss = 0; correct = 0\n", + " model.eval()\n", + " criterion = nn.NLLLoss(reduction=\"sum\")\n", + " test_loss = 0\n", + " correct = 0\n", " with torch.no_grad():\n", " for data, target in test_loader:\n", - " output = model(data.view(-1, 28*28))\n", - " test_loss += criterion(output, target).item() \n", - " pred = output.argmax(dim=1, keepdim=True) \n", + " output = model(data.view(-1, 28 * 28))\n", + " test_loss += criterion(output, target).item()\n", + " pred = output.argmax(dim=1, keepdim=True)\n", " correct += pred.eq(target.view_as(pred)).sum().item()\n", " test_loss /= len(test_loader.dataset)\n", - " print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n", - " test_loss, correct, len(test_loader.dataset), 100. * correct / len(test_loader.dataset))) " + " print(\n", + " f\"\\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({100.0 * correct / len(test_loader.dataset):.0f}%)\\n\"\n", + " )" ] }, { @@ -233,12 +243,12 @@ } ], "source": [ - "#define model and optimizer\n", + "# define model and optimizer\n", "model = Net()\n", "optimizer = optim.Adadelta(model.parameters(), lr=1)\n", "scheduler = StepLR(optimizer, step_size=1, gamma=0.7)\n", "\n", - "#train neural network for five epochs\n", + "# train neural network for five epochs\n", "for epoch in range(5):\n", " train(model, train_loader, optimizer, epoch)\n", " test(model, test_loader)\n", @@ -279,24 +289,24 @@ ], "source": [ "class NoSoftmaxNet(nn.Module):\n", - " #define layers of neural network\n", + " # define layers of neural network\n", " def __init__(self):\n", " super().__init__()\n", - " self.hidden1 = nn.Linear(784, hidden_size)\n", - " self.hidden2 = nn.Linear(hidden_size, hidden_size)\n", - " self.output = nn.Linear(hidden_size, 10)\n", + " self.hidden1 = nn.Linear(784, hidden_size)\n", + " self.hidden2 = nn.Linear(hidden_size, hidden_size)\n", + " self.output = nn.Linear(hidden_size, 10)\n", " self.relu = nn.ReLU()\n", "\n", - " #define forward pass of neural network\n", + " # define forward pass of neural network\n", " def forward(self, x):\n", " x = self.hidden1(x)\n", " x = self.relu(x)\n", " x = self.hidden2(x)\n", " x = self.relu(x)\n", - " x = self.output(x)\n", - " return x\n", + " return self.output(x)\n", + "\n", "\n", - "#create neural network without LogSoftmax and load parameters from existing model\n", + "# create neural network without LogSoftmax and load parameters from existing model\n", "model2 = NoSoftmaxNet()\n", "model2.load_state_dict(model.state_dict())" ] @@ -324,23 +334,23 @@ "metadata": {}, "outputs": [], "source": [ - "#load image and true label from test set with index 'problem_index'\n", + "# load image and true label from test set with index 'problem_index'\n", "problem_index = 0\n", - "image = dataset2[problem_index][0].view(-1,28*28).detach().numpy()\n", + "image = dataset2[problem_index][0].view(-1, 28 * 28).detach().numpy()\n", "label = dataset2[problem_index][1]\n", 
"\n", - "#define input region defined by infinity norm\n", + "# define input region defined by infinity norm\n", "epsilon_infty = 5e-2\n", "lb = np.maximum(0, image - epsilon_infty)\n", "ub = np.minimum(1, image + epsilon_infty)\n", "\n", - "#save input bounds as dictionary\n", + "# save input bounds as dictionary\n", "input_bounds = {}\n", - "for i in range(28*28):\n", - " input_bounds[i] = (float(lb[0][i]), float(ub[0][i])) \n", - " \n", - "#define dummy input tensor \n", - "x_temp = dataset2[problem_index][0].view(-1,28*28)" + "for i in range(28 * 28):\n", + " input_bounds[i] = (float(lb[0][i]), float(ub[0][i]))\n", + "\n", + "# define dummy input tensor\n", + "x_temp = dataset2[problem_index][0].view(-1, 28 * 28)" ] }, { @@ -356,22 +366,19 @@ "metadata": {}, "outputs": [], "source": [ - "with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:\n", - " #export neural network to ONNX\n", + "with tempfile.NamedTemporaryFile(suffix=\".onnx\", delete=False) as f:\n", + " # export neural network to ONNX\n", " torch.onnx.export(\n", " model2,\n", " x_temp,\n", " f,\n", - " input_names=['input'],\n", - " output_names=['output'],\n", - " dynamic_axes={\n", - " 'input': {0: 'batch_size'},\n", - " 'output': {0: 'batch_size'}\n", - " }\n", + " input_names=[\"input\"],\n", + " output_names=[\"output\"],\n", + " dynamic_axes={\"input\": {0: \"batch_size\"}, \"output\": {0: \"batch_size\"}},\n", " )\n", - " #write ONNX model and its bounds using OMLT\n", + " # write ONNX model and its bounds using OMLT\n", " write_onnx_model_with_bounds(f.name, None, input_bounds)\n", - " #load the network definition from the ONNX model\n", + " # load the network definition from the ONNX model\n", " network_definition = load_onnx_neural_network_with_bounds(f.name)" ] }, @@ -777,12 +784,12 @@ } ], "source": [ - "#create pyomo model\n", + "# create pyomo model\n", "m = pyo.ConcreteModel()\n", "\n", - "#create an OMLT block for the neural network and build its formulation\n", + "# create an OMLT block for the neural network and build its formulation\n", "m.nn = OmltBlock()\n", - "m.nn.build_formulation(formulation) " + "m.nn.build_formulation(formulation)" ] }, { @@ -799,7 +806,7 @@ "outputs": [], "source": [ "adversary = (label + 1) % 10\n", - "m.obj = pyo.Objective(expr=(-(m.nn.outputs[adversary]-m.nn.outputs[label])))" + "m.obj = pyo.Objective(expr=(-(m.nn.outputs[adversary] - m.nn.outputs[label])))" ] }, { @@ -961,7 +968,7 @@ } ], "source": [ - "pyo.SolverFactory('cbc').solve(m, tee=True)" + "pyo.SolverFactory(\"cbc\").solve(m, tee=True)" ] } ], diff --git a/docs/notebooks/neuralnet/neural_network_formulations.ipynb b/docs/notebooks/neuralnet/neural_network_formulations.ipynb index 3317acd9..07613122 100644 --- a/docs/notebooks/neuralnet/neural_network_formulations.ipynb +++ b/docs/notebooks/neuralnet/neural_network_formulations.ipynb @@ -45,6 +45,7 @@ { "cell_type": "code", "execution_count": 1, + "id": "7fb27b941602401d91542211134fc71a", "metadata": { "pycharm": { "name": "#%%\n" @@ -62,31 +63,37 @@ } ], "source": [ - "#Start by importing the following libraries\n", - "#data manipulation and plotting\n", - "import pandas as pd\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", + "# Start by importing the following libraries\n", + "# data manipulation and plotting\n", "import matplotlib\n", - "matplotlib.rc('font', size=24)\n", - "plt.rc('axes', titlesize=24)\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", "\n", - "#tensorflow objects\n", 
- "from tensorflow.keras.models import Sequential, Model\n", + "matplotlib.rc(\"font\", size=24)\n", + "plt.rc(\"axes\", titlesize=24)\n", + "\n", + "# tensorflow objects\n", + "# pyomo for optimization\n", + "import pyomo.environ as pyo\n", "from tensorflow.keras.layers import Dense, Input\n", + "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.optimizers import Adam\n", "\n", - "#pyomo for optimization\n", - "import pyomo.environ as pyo\n", + "import omlt\n", "\n", - "#omlt for interfacing our neural network with pyomo\n", + "# omlt for interfacing our neural network with pyomo\n", "from omlt import OmltBlock\n", - "from omlt.neuralnet import NetworkDefinition, FullSpaceNNFormulation, \\\n", - "FullSpaceSmoothNNFormulation, ReducedSpaceSmoothNNFormulation, ReluBigMFormulation,\\\n", - "ReluComplementarityFormulation, ReluPartitionFormulation\n", - "from omlt.neuralnet.activations import ComplementarityReLUActivation\n", "from omlt.io.keras import keras_reader\n", - "import omlt" + "from omlt.neuralnet import (\n", + " FullSpaceNNFormulation,\n", + " FullSpaceSmoothNNFormulation,\n", + " ReducedSpaceSmoothNNFormulation,\n", + " ReluBigMFormulation,\n", + " ReluComplementarityFormulation,\n", + " ReluPartitionFormulation,\n", + ")\n", + "from omlt.neuralnet.activations import ComplementarityReLUActivation" ] }, { @@ -116,6 +123,7 @@ { "cell_type": "code", "execution_count": 2, + "id": "acae54e37e7d407bbb7b55eff062a284", "metadata": { "pycharm": { "name": "#%%\n" @@ -123,7 +131,7 @@ }, "outputs": [], "source": [ - "df = pd.read_csv(\"../data/sin_quadratic.csv\",index_col=[0]);" + "df = pd.read_csv(\"../data/sin_quadratic.csv\", index_col=[0]);" ] }, { @@ -141,6 +149,7 @@ { "cell_type": "code", "execution_count": 3, + "id": "9a63283cbaf04dbcab1f6479b197f3a8", "metadata": { "pycharm": { "name": "#%%\n" @@ -159,27 +168,27 @@ } ], "source": [ - "#retrieve input 'x' and output 'y' from the dataframe\n", + "# retrieve input 'x' and output 'y' from the dataframe\n", "x = df[\"x\"]\n", "y = df[\"y\"]\n", "\n", - "#calculate mean and standard deviation, add scaled 'x' and scaled 'y' to the dataframe\n", + "# calculate mean and standard deviation, add scaled 'x' and scaled 'y' to the dataframe\n", "mean_data = df.mean(axis=0)\n", "std_data = df.std(axis=0)\n", - "df[\"x_scaled\"] = (df['x'] - mean_data['x']) / std_data['x']\n", - "df[\"y_scaled\"] = (df['y'] - mean_data['y']) / std_data['y']\n", + "df[\"x_scaled\"] = (df[\"x\"] - mean_data[\"x\"]) / std_data[\"x\"]\n", + "df[\"y_scaled\"] = (df[\"y\"] - mean_data[\"y\"]) / std_data[\"y\"]\n", "\n", - "#create plots for unscaled and scaled data\n", - "f, (ax1, ax2) = plt.subplots(1, 2,figsize = (16,8))\n", + "# create plots for unscaled and scaled data\n", + "f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))\n", "\n", "ax1.plot(x, y)\n", "ax1.set_xlabel(\"x\")\n", - "ax1.set_ylabel(\"y\");\n", + "ax1.set_ylabel(\"y\")\n", "ax1.set_title(\"Training Data\")\n", "\n", "ax2.plot(df[\"x_scaled\"], df[\"y_scaled\"])\n", "ax2.set_xlabel(\"x_scaled\")\n", - "ax2.set_ylabel(\"y_scaled\");\n", + "ax2.set_ylabel(\"y_scaled\")\n", "ax2.set_title(\"Scaled Training Data\")\n", "\n", "plt.tight_layout()" @@ -205,6 +214,7 @@ { "cell_type": "code", "execution_count": 4, + "id": "8dd0d8092fe74a7c96281538738b07e2", "metadata": { "pycharm": { "name": "#%%\n" @@ -212,34 +222,35 @@ }, "outputs": [], "source": [ - "#sigmoid neural network\n", - "nn1 = Sequential(name='sin_wave_sigmoid')\n", + "# sigmoid neural network\n", + "nn1 = 
Sequential(name=\"sin_wave_sigmoid\")\n", "nn1.add(Input(np.array((1,))))\n", - "nn1.add(Dense(50, activation='sigmoid'))\n", - "nn1.add(Dense(50, activation='sigmoid'))\n", + "nn1.add(Dense(50, activation=\"sigmoid\"))\n", + "nn1.add(Dense(50, activation=\"sigmoid\"))\n", "nn1.add(Dense(1))\n", - "nn1.compile(optimizer=Adam(), loss='mse')\n", + "nn1.compile(optimizer=Adam(), loss=\"mse\")\n", "\n", - "#relu neural network\n", - "nn2 = Sequential(name='sin_wave_relu')\n", + "# relu neural network\n", + "nn2 = Sequential(name=\"sin_wave_relu\")\n", "nn2.add(Input(np.array((1,))))\n", - "nn2.add(Dense(30, activation='relu'))\n", - "nn2.add(Dense(30, activation='relu'))\n", + "nn2.add(Dense(30, activation=\"relu\"))\n", + "nn2.add(Dense(30, activation=\"relu\"))\n", "nn2.add(Dense(1))\n", - "nn2.compile(optimizer=Adam(), loss='mse')\n", + "nn2.compile(optimizer=Adam(), loss=\"mse\")\n", "\n", - "#mixed neural network\n", - "nn3 = Sequential(name='sin_wave_mixed')\n", + "# mixed neural network\n", + "nn3 = Sequential(name=\"sin_wave_mixed\")\n", "nn3.add(Input(np.array((1,))))\n", - "nn3.add(Dense(50, activation='sigmoid'))\n", - "nn3.add(Dense(50, activation='relu'))\n", + "nn3.add(Dense(50, activation=\"sigmoid\"))\n", + "nn3.add(Dense(50, activation=\"relu\"))\n", "nn3.add(Dense(1))\n", - "nn3.compile(optimizer=Adam(), loss='mse')" + "nn3.compile(optimizer=Adam(), loss=\"mse\")" ] }, { "cell_type": "code", "execution_count": 5, + "id": "72eea5119410473aa328ad9291626812", "metadata": { "pycharm": { "name": "#%%\n" @@ -850,15 +861,14 @@ "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 923us/step - loss: 0.0090\n", "Epoch 150/150\n", "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 929us/step - loss: 0.0087\n" - ] } ], "source": [ - "#train all three neural networks\n", - "history1 = nn1.fit(x=df['x_scaled'], y=df['y_scaled'],verbose=1, epochs=75)\n", - "history2 = nn2.fit(x=df['x_scaled'], y=df['y_scaled'],verbose=1, epochs=75)\n", - "history3 = nn3.fit(x=df['x_scaled'], y=df['y_scaled'],verbose=1, epochs=150)" + "# train all three neural networks\n", + "history1 = nn1.fit(x=df[\"x_scaled\"], y=df[\"y_scaled\"], verbose=1, epochs=75)\n", + "history2 = nn2.fit(x=df[\"x_scaled\"], y=df[\"y_scaled\"], verbose=1, epochs=75)\n", + "history3 = nn3.fit(x=df[\"x_scaled\"], y=df[\"y_scaled\"], verbose=1, epochs=150)" ] }, { @@ -877,6 +887,7 @@ { "cell_type": "code", "execution_count": 6, + "id": "8edb47106e1a46a883d545849b8ab81b", "metadata": { "pycharm": { "name": "#%%\n" @@ -894,23 +905,24 @@ } ], "source": [ - "#note: we calculate the unscaled output for each neural network to check the predictions\n", - "#nn1\n", - "y_predict_scaled_sigmoid = nn1.predict(x=df['x_scaled'])\n", - "y_predict_sigmoid = y_predict_scaled_sigmoid*(std_data['y']) + mean_data['y']\n", - "\n", - "#nn2\n", - "y_predict_scaled_relu = nn2.predict(x=df['x_scaled'])\n", - "y_predict_relu = y_predict_scaled_relu*(std_data['y']) + mean_data['y']\n", - "\n", - "#nn3\n", - "y_predict_scaled_mixed = nn3.predict(x=df['x_scaled'])\n", - "y_predict_mixed = y_predict_scaled_mixed*(std_data['y']) + mean_data['y']" + "# note: we calculate the unscaled output for each neural network to check the predictions\n", + "# nn1\n", + "y_predict_scaled_sigmoid = nn1.predict(x=df[\"x_scaled\"])\n", + "y_predict_sigmoid = y_predict_scaled_sigmoid * (std_data[\"y\"]) + mean_data[\"y\"]\n", + "\n", + "# nn2\n", + "y_predict_scaled_relu = 
nn2.predict(x=df[\"x_scaled\"])\n", + "y_predict_relu = y_predict_scaled_relu * (std_data[\"y\"]) + mean_data[\"y\"]\n", + "\n", + "# nn3\n", + "y_predict_scaled_mixed = nn3.predict(x=df[\"x_scaled\"])\n", + "y_predict_mixed = y_predict_scaled_mixed * (std_data[\"y\"]) + mean_data[\"y\"]" ] }, { "cell_type": "code", "execution_count": 7, + "id": "10185d26023b46108eb7d9f57d49d2b3", "metadata": { "pycharm": { "name": "#%%\n" @@ -929,12 +941,12 @@ } ], "source": [ - "#create a single plot with the original data and each neural network's predictions\n", - "fig,ax = plt.subplots(1,figsize = (8,8))\n", - "ax.plot(x,y,linewidth = 3.0,label = \"data\", alpha = 0.5)\n", - "ax.plot(x,y_predict_relu,linewidth = 3.0,linestyle=\"dotted\",label = \"relu\")\n", - "ax.plot(x,y_predict_sigmoid,linewidth = 3.0,linestyle=\"dotted\",label = \"sigmoid\")\n", - "ax.plot(x,y_predict_mixed,linewidth = 3.0,linestyle=\"dotted\",label = \"mixed\")\n", + "# create a single plot with the original data and each neural network's predictions\n", + "fig, ax = plt.subplots(1, figsize=(8, 8))\n", + "ax.plot(x, y, linewidth=3.0, label=\"data\", alpha=0.5)\n", + "ax.plot(x, y_predict_relu, linewidth=3.0, linestyle=\"dotted\", label=\"relu\")\n", + "ax.plot(x, y_predict_sigmoid, linewidth=3.0, linestyle=\"dotted\", label=\"sigmoid\")\n", + "ax.plot(x, y_predict_mixed, linewidth=3.0, linestyle=\"dotted\", label=\"mixed\")\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"y\")\n", "plt.legend();" @@ -1051,6 +1063,7 @@ { "cell_type": "code", "execution_count": 8, + "id": "8763a12b2bbd4a93a75aff182afb95dc", "metadata": { "pycharm": { "name": "#%%\n" @@ -1067,17 +1080,23 @@ } ], "source": [ - "#create an omlt scaling object\n", - "scaler = omlt.scaling.OffsetScaling(offset_inputs=[mean_data['x']],\n", - " factor_inputs=[std_data['x']],\n", - " offset_outputs=[mean_data['y']],\n", - " factor_outputs=[std_data['y']])\n", - "\n", - "#create the input bounds. note that the key `0` corresponds to input `0` and that we also scale the input bounds\n", - "input_bounds={0:((min(df['x']) - mean_data['x'])/std_data['x'],\n", - " (max(df['x']) - mean_data['x'])/std_data['x'])};\n", + "# create an omlt scaling object\n", + "scaler = omlt.scaling.OffsetScaling(\n", + " offset_inputs=[mean_data[\"x\"]],\n", + " factor_inputs=[std_data[\"x\"]],\n", + " offset_outputs=[mean_data[\"y\"]],\n", + " factor_outputs=[std_data[\"y\"]],\n", + ")\n", + "\n", + "# create the input bounds. 
note that the key `0` corresponds to input `0` and that we also scale the input bounds\n", + "input_bounds = {\n", + " 0: (\n", + " (min(df[\"x\"]) - mean_data[\"x\"]) / std_data[\"x\"],\n", + " (max(df[\"x\"]) - mean_data[\"x\"]) / std_data[\"x\"],\n", + " )\n", + "}\n", "print(scaler)\n", - "print(\"Scaled input bounds: \",input_bounds)" + "print(\"Scaled input bounds: \", input_bounds)" ] }, { @@ -1099,6 +1118,7 @@ { "cell_type": "code", "execution_count": 9, + "id": "7623eae2785240b9bd12b16a66d81610", "metadata": { "pycharm": { "name": "#%%\n" @@ -1181,39 +1201,43 @@ } ], "source": [ - "#create a network definition\n", - "net_sigmoid = keras_reader.load_keras_sequential(nn1,scaler,input_bounds)\n", + "# create a network definition\n", + "net_sigmoid = keras_reader.load_keras_sequential(nn1, scaler, input_bounds)\n", "\n", - "#create a pyomo model with variables x and y\n", + "# create a pyomo model with variables x and y\n", "model1_reduced = pyo.ConcreteModel()\n", - "model1_reduced.x = pyo.Var(initialize = 0)\n", - "model1_reduced.y = pyo.Var(initialize = 0)\n", + "model1_reduced.x = pyo.Var(initialize=0)\n", + "model1_reduced.y = pyo.Var(initialize=0)\n", "model1_reduced.obj = pyo.Objective(expr=(model1_reduced.y))\n", "\n", - "#create an OmltBlock\n", + "# create an OmltBlock\n", "model1_reduced.nn = OmltBlock()\n", "\n", - "#use the reduced-space formulation\n", + "# use the reduced-space formulation\n", "formulation1_reduced = ReducedSpaceSmoothNNFormulation(net_sigmoid)\n", "model1_reduced.nn.build_formulation(formulation1_reduced)\n", "\n", - "#connect pyomo variables to the neural network\n", + "\n", + "# connect pyomo variables to the neural network\n", "@model1_reduced.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.nn.inputs[0]\n", "\n", + "\n", "@model1_reduced.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "#solve the model and query the solution\n", - "status_1_reduced = pyo.SolverFactory('ipopt').solve(model1_reduced, tee=True)\n", - "solution_1_reduced = (pyo.value(model1_reduced.x),pyo.value(model1_reduced.y))" + "\n", + "# solve the model and query the solution\n", + "status_1_reduced = pyo.SolverFactory(\"ipopt\").solve(model1_reduced, tee=True)\n", + "solution_1_reduced = (pyo.value(model1_reduced.x), pyo.value(model1_reduced.y))" ] }, { "cell_type": "code", "execution_count": 10, + "id": "7cdc8c89c7104fffa095e18ddfef8986", "metadata": { "pycharm": { "name": "#%%\n" @@ -1234,13 +1258,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Reduced Space Solution:\")\n", - "print(\"# of variables: \",model1_reduced.nvariables())\n", - "print(\"# of constraints: \",model1_reduced.nconstraints())\n", + "print(\"# of variables: \", model1_reduced.nvariables())\n", + "print(\"# of constraints: \", model1_reduced.nconstraints())\n", "print(\"x = \", solution_1_reduced[0])\n", "print(\"y = \", solution_1_reduced[1])\n", - "print(\"Solve Time: \", status_1_reduced['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_1_reduced[\"Solver\"][0][\"Time\"])" ] }, { @@ -1261,6 +1285,7 @@ { "cell_type": "code", "execution_count": 11, + "id": "b118ea5561624da68c537baed56e602f", "metadata": { "pycharm": { "name": "#%%\n" @@ -1447,32 +1472,36 @@ } ], "source": [ - "net_sigmoid = keras_reader.load_keras_sequential(nn1,scaler,input_bounds)\n", + "net_sigmoid = keras_reader.load_keras_sequential(nn1, scaler, input_bounds)\n", "\n", 
"model1_full = pyo.ConcreteModel()\n", - "model1_full.x = pyo.Var(initialize = 0)\n", - "model1_full.y = pyo.Var(initialize = 0)\n", + "model1_full.x = pyo.Var(initialize=0)\n", + "model1_full.y = pyo.Var(initialize=0)\n", "model1_full.obj = pyo.Objective(expr=(model1_full.y))\n", "model1_full.nn = OmltBlock()\n", "\n", "formulation2_full = FullSpaceSmoothNNFormulation(net_sigmoid)\n", "model1_full.nn.build_formulation(formulation2_full)\n", "\n", + "\n", "@model1_full.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.nn.inputs[0]\n", "\n", + "\n", "@model1_full.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "status_1_full = pyo.SolverFactory('ipopt').solve(model1_full, tee=True)\n", - "solution_1_full = (pyo.value(model1_full.x),pyo.value(model1_full.y))" + "\n", + "status_1_full = pyo.SolverFactory(\"ipopt\").solve(model1_full, tee=True)\n", + "solution_1_full = (pyo.value(model1_full.x), pyo.value(model1_full.y))" ] }, { "cell_type": "code", "execution_count": 12, + "id": "938c804e27f84196a10c8828c723f798", "metadata": { "pycharm": { "name": "#%%\n" @@ -1493,13 +1522,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Full Space Solution:\")\n", - "print(\"# of variables: \",model1_full.nvariables())\n", - "print(\"# of constraints: \",model1_full.nconstraints())\n", + "print(\"# of variables: \", model1_full.nvariables())\n", + "print(\"# of constraints: \", model1_full.nconstraints())\n", "print(\"x = \", solution_1_full[0])\n", "print(\"y = \", solution_1_full[1])\n", - "print(\"Solve Time: \", status_1_full['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_1_full[\"Solver\"][0][\"Time\"])" ] }, { @@ -1521,6 +1550,7 @@ { "cell_type": "code", "execution_count": 13, + "id": "504fb2a444614c0babb325280ed9130a", "metadata": { "pycharm": { "name": "#%%\n" @@ -1668,32 +1698,36 @@ } ], "source": [ - "net_relu = keras_reader.load_keras_sequential(nn2,scaler,input_bounds)\n", + "net_relu = keras_reader.load_keras_sequential(nn2, scaler, input_bounds)\n", "\n", "model2_comp = pyo.ConcreteModel()\n", - "model2_comp.x = pyo.Var(initialize = 0)\n", - "model2_comp.y = pyo.Var(initialize = 0)\n", + "model2_comp.x = pyo.Var(initialize=0)\n", + "model2_comp.y = pyo.Var(initialize=0)\n", "model2_comp.obj = pyo.Objective(expr=(model2_comp.y))\n", "model2_comp.nn = OmltBlock()\n", "\n", "formulation2_comp = ReluComplementarityFormulation(net_relu)\n", "model2_comp.nn.build_formulation(formulation2_comp)\n", "\n", + "\n", "@model2_comp.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.nn.inputs[0]\n", "\n", + "\n", "@model2_comp.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "status_2_comp = pyo.SolverFactory('ipopt').solve(model2_comp, tee=True)\n", - "solution_2_comp = (pyo.value(model2_comp.x),pyo.value(model2_comp.y))" + "\n", + "status_2_comp = pyo.SolverFactory(\"ipopt\").solve(model2_comp, tee=True)\n", + "solution_2_comp = (pyo.value(model2_comp.x), pyo.value(model2_comp.y))" ] }, { "cell_type": "code", "execution_count": 14, + "id": "59bbdb311c014d738909a11f9e486628", "metadata": { "pycharm": { "name": "#%%\n" @@ -1714,13 +1748,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"ReLU Complementarity Solution:\")\n", - "print(\"# of variables: \",model2_comp.nvariables())\n", - "print(\"# of 
constraints: \",model2_comp.nconstraints())\n", + "print(\"# of variables: \", model2_comp.nvariables())\n", + "print(\"# of constraints: \", model2_comp.nconstraints())\n", "print(\"x = \", solution_2_comp[0])\n", "print(\"y = \", solution_2_comp[1])\n", - "print(\"Solve Time: \", status_2_comp['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_2_comp[\"Solver\"][0][\"Time\"])" ] }, { @@ -1741,6 +1775,7 @@ { "cell_type": "code", "execution_count": 15, + "id": "b43b363d81ae4b689946ece5c682cd59", "metadata": { "pycharm": { "name": "#%%\n" @@ -1748,32 +1783,36 @@ }, "outputs": [], "source": [ - "net_relu = keras_reader.load_keras_sequential(nn2,scaler,input_bounds)\n", + "net_relu = keras_reader.load_keras_sequential(nn2, scaler, input_bounds)\n", "\n", "model2_bigm = pyo.ConcreteModel()\n", - "model2_bigm.x = pyo.Var(initialize = 0)\n", - "model2_bigm.y = pyo.Var(initialize = 0)\n", + "model2_bigm.x = pyo.Var(initialize=0)\n", + "model2_bigm.y = pyo.Var(initialize=0)\n", "model2_bigm.obj = pyo.Objective(expr=(model2_bigm.y))\n", "model2_bigm.nn = OmltBlock()\n", "\n", "formulation2_bigm = ReluBigMFormulation(net_relu)\n", "model2_bigm.nn.build_formulation(formulation2_bigm)\n", "\n", + "\n", "@model2_bigm.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.nn.inputs[0]\n", "\n", + "\n", "@model2_bigm.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "status_2_bigm = pyo.SolverFactory('cbc').solve(model2_bigm, tee=False)\n", - "solution_2_bigm = (pyo.value(model2_bigm.x),pyo.value(model2_bigm.y))" + "\n", + "status_2_bigm = pyo.SolverFactory(\"cbc\").solve(model2_bigm, tee=False)\n", + "solution_2_bigm = (pyo.value(model2_bigm.x), pyo.value(model2_bigm.y))" ] }, { "cell_type": "code", "execution_count": 16, + "id": "8a65eabff63a45729fe45fb5ade58bdc", "metadata": { "pycharm": { "name": "#%%\n" @@ -1794,13 +1833,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"ReLU BigM Solution:\")\n", - "print(\"# of variables: \",model2_bigm.nvariables())\n", - "print(\"# of constraints: \",model2_bigm.nconstraints())\n", + "print(\"# of variables: \", model2_bigm.nvariables())\n", + "print(\"# of constraints: \", model2_bigm.nconstraints())\n", "print(\"x = \", solution_2_bigm[0])\n", "print(\"y = \", solution_2_bigm[1])\n", - "print(\"Solve Time: \", status_2_bigm['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_2_bigm[\"Solver\"][0][\"Time\"])" ] }, { @@ -1822,6 +1861,7 @@ { "cell_type": "code", "execution_count": 17, + "id": "c3933fab20d04ec698c2621248eb3be0", "metadata": { "pycharm": { "name": "#%%\n" @@ -1956,46 +1996,54 @@ } ], "source": [ - "net_relu_partition = keras_reader.load_keras_sequential(nn2,scaler,input_bounds)\n", + "net_relu_partition = keras_reader.load_keras_sequential(nn2, scaler, input_bounds)\n", + "\n", "\n", - "#create a function that partitions a vector of weights w` into `n` partitions\n", - "#by default, the `ReluPartitionFormulation` will use this function with n=2\n", + "# create a function that partitions a vector of weights w` into `n` partitions\n", + "# by default, the `ReluPartitionFormulation` will use this function with n=2\n", "def partition_split_func(w, n):\n", " sorted_indexes = np.argsort(w)\n", " n = min(n, len(sorted_indexes))\n", " return np.array_split(sorted_indexes, n)\n", "\n", - "#change the number of partitions and create a function we can pass to the formulation\n", + "\n", + "# change the number 
of partitions and create a function we can pass to the formulation\n", "#'N = 1' corresponds to BigM, 'N = n_inputs' corresponds to a convex hull formulation\n", "N = 1\n", "split_func = lambda w: partition_split_func(w, N)\n", "\n", "model2_partition = pyo.ConcreteModel()\n", - "model2_partition.x = pyo.Var(initialize = 0)\n", - "model2_partition.y = pyo.Var(initialize = 0)\n", + "model2_partition.x = pyo.Var(initialize=0)\n", + "model2_partition.y = pyo.Var(initialize=0)\n", "model2_partition.obj = pyo.Objective(expr=(model2_partition.y))\n", "model2_partition.nn = OmltBlock()\n", "\n", - "formulation2_partition = ReluPartitionFormulation(net_relu_partition, split_func=split_func)\n", + "formulation2_partition = ReluPartitionFormulation(\n", + " net_relu_partition, split_func=split_func\n", + ")\n", "model2_partition.nn.build_formulation(formulation2_partition)\n", "\n", + "\n", "@model2_partition.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.nn.inputs[0]\n", "\n", + "\n", "@model2_partition.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "solver = pyo.SolverFactory('cbc')\n", + "\n", + "solver = pyo.SolverFactory(\"cbc\")\n", "solver.options[\"printingOptions\"] = \"normal\"\n", - "status_2_partition=solver.solve(model2_partition, tee=True)\n", - "solution_2_partition = (pyo.value(model2_partition.x),pyo.value(model2_partition.y))" + "status_2_partition = solver.solve(model2_partition, tee=True)\n", + "solution_2_partition = (pyo.value(model2_partition.x), pyo.value(model2_partition.y))" ] }, { "cell_type": "code", "execution_count": 18, + "id": "4dd4641cc4064e0191573fe9c69df29b", "metadata": { "pycharm": { "name": "#%%\n" @@ -2016,13 +2064,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"ReLU Partition Solution:\")\n", - "print(\"# of variables: \",model2_partition.nvariables())\n", - "print(\"# of constraints: \",model2_partition.nconstraints())\n", + "print(\"# of variables: \", model2_partition.nvariables())\n", + "print(\"# of constraints: \", model2_partition.nconstraints())\n", "print(\"x = \", solution_2_partition[0])\n", "print(\"y = \", solution_2_partition[1])\n", - "print(\"Solve Time: \", status_2_partition['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_2_partition[\"Solver\"][0][\"Time\"])" ] }, { @@ -2041,6 +2089,7 @@ { "cell_type": "code", "execution_count": 19, + "id": "8309879909854d7188b41380fd92a7c3", "metadata": { "pycharm": { "name": "#%%\n" @@ -2173,34 +2222,39 @@ } ], "source": [ - "net_mixed = keras_reader.load_keras_sequential(nn3,scaler,input_bounds)\n", + "net_mixed = keras_reader.load_keras_sequential(nn3, scaler, input_bounds)\n", "\n", "model3_mixed = pyo.ConcreteModel()\n", - "model3_mixed.x = pyo.Var(initialize = 0)\n", - "model3_mixed.y = pyo.Var(initialize = 0)\n", + "model3_mixed.x = pyo.Var(initialize=0)\n", + "model3_mixed.y = pyo.Var(initialize=0)\n", "model3_mixed.obj = pyo.Objective(expr=(model3_mixed.y))\n", "model3_mixed.nn = OmltBlock()\n", "\n", - "formulation3_mixed = FullSpaceNNFormulation(net_mixed,activation_constraints={\n", - " \"relu\": ComplementarityReLUActivation()})\n", + "formulation3_mixed = FullSpaceNNFormulation(\n", + " net_mixed, activation_constraints={\"relu\": ComplementarityReLUActivation()}\n", + ")\n", "model3_mixed.nn.build_formulation(formulation3_mixed)\n", "\n", + "\n", "@model3_mixed.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == 
mdl.nn.inputs[0]\n", "\n", + "\n", "@model3_mixed.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.nn.outputs[0]\n", "\n", - "solver = pyo.SolverFactory('ipopt')\n", - "status_3_mixed = solver.solve(model3_mixed, tee='true')\n", - "solution_3_mixed = (pyo.value(model3_mixed.x),pyo.value(model3_mixed.y))" + "\n", + "solver = pyo.SolverFactory(\"ipopt\")\n", + "status_3_mixed = solver.solve(model3_mixed, tee=\"true\")\n", + "solution_3_mixed = (pyo.value(model3_mixed.x), pyo.value(model3_mixed.y))" ] }, { "cell_type": "code", "execution_count": 20, + "id": "3ed186c9a28b402fb0bc4494df01f08d", "metadata": { "pycharm": { "name": "#%%\n" @@ -2221,13 +2275,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Mixed NN Solution:\")\n", - "print(\"# of variables: \",model3_mixed.nvariables())\n", - "print(\"# of constraints: \",model3_mixed.nconstraints())\n", + "print(\"# of variables: \", model3_mixed.nvariables())\n", + "print(\"# of constraints: \", model3_mixed.nconstraints())\n", "print(\"x = \", solution_3_mixed[0])\n", "print(\"y = \", solution_3_mixed[1])\n", - "print(\"Solve Time: \", status_3_mixed['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_3_mixed[\"Solver\"][0][\"Time\"])" ] }, { @@ -2253,6 +2307,7 @@ { "cell_type": "code", "execution_count": 21, + "id": "cb1e1581032b452c9409d6c6813c49d1", "metadata": { "pycharm": { "name": "#%%\n" @@ -2271,28 +2326,50 @@ } ], "source": [ - "#create a plot with 3 subplots\n", - "fig,axs = plt.subplots(1,3,figsize = (24,8))\n", + "# create a plot with 3 subplots\n", + "fig, axs = plt.subplots(1, 3, figsize=(24, 8))\n", "\n", - "#nn1 - sigmoid\n", - "axs[0].plot(x,y_predict_sigmoid,linewidth = 3.0,linestyle=\"dotted\",color = \"orange\")\n", + "# nn1 - sigmoid\n", + "axs[0].plot(x, y_predict_sigmoid, linewidth=3.0, linestyle=\"dotted\", color=\"orange\")\n", "axs[0].set_title(\"sigmoid\")\n", - "axs[0].scatter([solution_1_reduced[0]],[solution_1_reduced[1]],color = \"black\",s = 300, label=\"reduced space\")\n", - "axs[0].scatter([solution_1_full[0]],[solution_1_full[1]],color = \"blue\",s = 300, label=\"full space\")\n", + "axs[0].scatter(\n", + " [solution_1_reduced[0]],\n", + " [solution_1_reduced[1]],\n", + " color=\"black\",\n", + " s=300,\n", + " label=\"reduced space\",\n", + ")\n", + "axs[0].scatter(\n", + " [solution_1_full[0]], [solution_1_full[1]], color=\"blue\", s=300, label=\"full space\"\n", + ")\n", "axs[0].legend()\n", "\n", - "#nn2 - relu\n", - "axs[1].plot(x,y_predict_relu,linewidth = 3.0,linestyle=\"dotted\",color = \"green\")\n", + "# nn2 - relu\n", + "axs[1].plot(x, y_predict_relu, linewidth=3.0, linestyle=\"dotted\", color=\"green\")\n", "axs[1].set_title(\"relu\")\n", - "axs[1].scatter([solution_2_comp[0]],[solution_2_comp[1]],color = \"black\",s = 300, label=\"complementarity\")\n", - "axs[1].scatter([solution_2_bigm[0]],[solution_2_bigm[1]],color = \"blue\",s = 300, label=\"bigm\")\n", - "axs[1].scatter([solution_2_partition[0]],[solution_2_partition[1]],color = \"purple\",s = 300, label=\"partition\")\n", + "axs[1].scatter(\n", + " [solution_2_comp[0]],\n", + " [solution_2_comp[1]],\n", + " color=\"black\",\n", + " s=300,\n", + " label=\"complementarity\",\n", + ")\n", + "axs[1].scatter(\n", + " [solution_2_bigm[0]], [solution_2_bigm[1]], color=\"blue\", s=300, label=\"bigm\"\n", + ")\n", + "axs[1].scatter(\n", + " [solution_2_partition[0]],\n", + " [solution_2_partition[1]],\n", + " color=\"purple\",\n", + " 
s=300,\n", + " label=\"partition\",\n", + ")\n", "axs[1].legend()\n", "\n", - "#nn3 - mixed\n", - "axs[2].plot(x,y_predict_mixed,linewidth = 3.0,linestyle=\"dotted\", color = \"red\")\n", + "# nn3 - mixed\n", + "axs[2].plot(x, y_predict_mixed, linewidth=3.0, linestyle=\"dotted\", color=\"red\")\n", "axs[2].set_title(\"mixed\")\n", - "axs[2].scatter([solution_3_mixed[0]],[solution_3_mixed[1]],color = \"black\",s = 300);" + "axs[2].scatter([solution_3_mixed[0]], [solution_3_mixed[1]], color=\"black\", s=300);" ] }, { diff --git a/docs/notebooks/trees/bo_with_trees.ipynb b/docs/notebooks/trees/bo_with_trees.ipynb index 11801d96..0f52e595 100644 --- a/docs/notebooks/trees/bo_with_trees.ipynb +++ b/docs/notebooks/trees/bo_with_trees.ipynb @@ -47,29 +47,33 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", "import random\n", "\n", + "import numpy as np\n", + "\n", + "\n", "def f(x, y):\n", - " return (1-x)**2 + 100* ((y-x**2))**2\n", + " return (1 - x) ** 2 + 100 * (y - x**2) ** 2\n", + "\n", + "\n", + "f_bnds = [(-2.048, 2.048) for _ in range(2)]\n", "\n", - "f_bnds = [(-2.048,2.048) for _ in range(2)]\n", "\n", "def generate_samples(num_samples, bb_bnds):\n", - " data = {'X': [], 'y': []}\n", + " data = {\"X\": [], \"y\": []}\n", "\n", " for _ in range(num_samples):\n", " sample = []\n", "\n", " # iterate through all dimension bounds\n", - " for idx, var_bnds in enumerate(bb_bnds):\n", + " for _, var_bnds in enumerate(bb_bnds):\n", " val = random.uniform(var_bnds[0], var_bnds[1])\n", "\n", " # populate the sample\n", " sample.append(val)\n", "\n", - " data['X'].append(sample)\n", - " data['y'].append(f(sample[0], sample[1]))\n", + " data[\"X\"].append(sample)\n", + " data[\"y\"].append(f(sample[0], sample[1]))\n", " return data" ] }, @@ -89,29 +93,28 @@ "metadata": {}, "outputs": [], "source": [ - "import lightgbm as lgb\n", "import warnings\n", "\n", + "import lightgbm as lgb\n", + "\n", + "\n", "def train_tree(data):\n", " with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", - " PARAMS = {'objective': 'regression',\n", - " 'metric': 'rmse',\n", - " 'boosting': 'gbdt',\n", - " 'num_trees': 50,\n", - " 'max_depth': 3,\n", - " 'min_data_in_leaf': 2,\n", - " 'random_state': 100,\n", - " 'verbose': -1}\n", - " train_x = np.asarray(data['X'])\n", - " train_data = lgb.Dataset(train_x, \n", - " label=data['y'],\n", - " params={'verbose': -1})\n", - "\n", - " model = lgb.train(PARAMS, \n", - " train_data,\n", - " verbose_eval=False)\n", - " return model" + " PARAMS = {\n", + " \"objective\": \"regression\",\n", + " \"metric\": \"rmse\",\n", + " \"boosting\": \"gbdt\",\n", + " \"num_trees\": 50,\n", + " \"max_depth\": 3,\n", + " \"min_data_in_leaf\": 2,\n", + " \"random_state\": 100,\n", + " \"verbose\": -1,\n", + " }\n", + " train_x = np.asarray(data[\"X\"])\n", + " train_data = lgb.Dataset(train_x, label=data[\"y\"], params={\"verbose\": -1})\n", + "\n", + " return lgb.train(PARAMS, train_data, verbose_eval=False)" ] }, { @@ -133,14 +136,12 @@ "from onnxmltools.convert.lightgbm.convert import convert\n", "from skl2onnx.common.data_types import FloatTensorType\n", "\n", + "\n", "def get_onnx_model(lgb_model):\n", " # export onnx model\n", " float_tensor_type = FloatTensorType([None, lgb_model.num_feature()])\n", - " initial_types = [('float_input', float_tensor_type)]\n", - " onnx_model = convert(lgb_model, \n", - " initial_types=initial_types, \n", - " target_opset=8)\n", - " return onnx_model" + " initial_types = [(\"float_input\", float_tensor_type)]\n", 
+ " return convert(lgb_model, initial_types=initial_types, target_opset=8)" ] }, { @@ -160,9 +161,10 @@ "source": [ "def write_onnx_to_file(onnx_model, path, file_name=\"output.onnx\"):\n", " from pathlib import Path\n", + "\n", " with open(Path(path) / file_name, \"wb\") as onnx_file:\n", " onnx_file.write(onnx_model.SerializeToString())\n", - " print(f'Onnx model written to {onnx_file.name}')" + " print(f\"Onnx model written to {onnx_file.name}\")" ] }, { @@ -182,15 +184,16 @@ "outputs": [], "source": [ "import pyomo.environ as pe\n", + "\n", "from omlt.block import OmltBlock\n", "from omlt.gbt import GBTBigMFormulation, GradientBoostedTreeModel\n", "\n", + "\n", "def add_tree_model(opt_model, onnx_model, input_bounds):\n", " # init omlt block and gbt model based on the onnx format\n", " opt_model.gbt = OmltBlock()\n", - " gbt_model = GradientBoostedTreeModel(onnx_model, \n", - " scaled_input_bounds=input_bounds)\n", - " \n", + " gbt_model = GradientBoostedTreeModel(onnx_model, scaled_input_bounds=input_bounds)\n", + "\n", " # omlt uses a big-m formulation to encode the tree models\n", " formulation = GBTBigMFormulation(gbt_model)\n", " opt_model.gbt.build_formulation(formulation)" @@ -211,27 +214,26 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "\n", "def add_unc_metric(opt_model, data):\n", - " \n", " # compute mean and std for standardization\n", - " data_x = np.asarray(data['X'])\n", + " data_x = np.asarray(data[\"X\"])\n", " std = np.std(data_x, axis=0)\n", " mean = np.mean(data_x, axis=0)\n", - " \n", + "\n", " # alpha capture the uncertainty value\n", - " alpha_bound = abs(0.5*np.var(data['y']))\n", - " opt_model.alpha = pe.Var(within=pe.NonNegativeReals, bounds=(0,alpha_bound))\n", + " alpha_bound = abs(0.5 * np.var(data[\"y\"]))\n", + " opt_model.alpha = pe.Var(within=pe.NonNegativeReals, bounds=(0, alpha_bound))\n", " opt_model.unc_constr = pe.ConstraintList()\n", - " \n", + "\n", " for x in data_x:\n", " x_var = opt_model.gbt.inputs\n", " opt_model.unc_constr.add(\n", - " opt_model.alpha <= \\\n", - " sum((x[idx]-(x_var[idx]-mean[idx])/std[idx])*\\\n", - " (x[idx]-(x_var[idx]-mean[idx])/std[idx]) \n", - " for idx in range(len(x_var)) )\n", + " opt_model.alpha\n", + " <= sum(\n", + " (x[idx] - (x_var[idx] - mean[idx]) / std[idx])\n", + " * (x[idx] - (x_var[idx] - mean[idx]) / std[idx])\n", + " for idx in range(len(x_var))\n", + " )\n", " )" ] }, @@ -270,11 +272,13 @@ "random.seed(10)\n", "data = generate_samples(5, f_bnds)\n", "\n", + "\n", "def plot_progress(data, input_bounds):\n", " # plot contour line and data points\n", " import matplotlib.pyplot as plt\n", + "\n", " fig = plt.figure()\n", - " ax = fig.add_axes([0,0,2,2])\n", + " ax = fig.add_axes([0, 0, 2, 2])\n", "\n", " # create mesh\n", " s = 0.01\n", @@ -283,19 +287,20 @@ " X, Y = np.meshgrid(X, Y)\n", "\n", " # rosenbrock function\n", - " Z = f(X,Y)\n", + " Z = f(X, Y)\n", "\n", " # plot contour line\n", - " clevf = np.arange(Z.min(),Z.max(), 10)\n", + " clevf = np.arange(Z.min(), Z.max(), 10)\n", " CS = plt.contourf(X, Y, Z, clevf)\n", " fig.colorbar(CS)\n", "\n", " # plot initial data set\n", - " ax.scatter([x[0] for x in data['X']], [x[1] for x in data['X']], c='r', s=100)\n", + " ax.scatter([x[0] for x in data[\"X\"]], [x[1] for x in data[\"X\"]], c=\"r\", s=100)\n", "\n", - " plt.rcParams.update({'font.size': 15})\n", + " plt.rcParams.update({\"font.size\": 15})\n", " plt.show()\n", - " \n", + "\n", + "\n", "plot_progress(data, f_bnds)" ] }, @@ -324,29 +329,28 @@ " # building the 
optimization model\n", " onnx_model = get_onnx_model(lgb_model)\n", " opt_model = pe.ConcreteModel()\n", - " \n", + "\n", " add_tree_model(opt_model, onnx_model, f_bnds)\n", - " \n", + "\n", " if has_unc:\n", " add_unc_metric(opt_model, data)\n", - " opt_model.obj = pe.Objective(expr=opt_model.gbt.outputs[0] - 1.96*opt_model.alpha)\n", + " opt_model.obj = pe.Objective(\n", + " expr=opt_model.gbt.outputs[0] - 1.96 * opt_model.alpha\n", + " )\n", "\n", " # add uncertainty leads to non-convex MIQP, i.e. solvers like Gurobi can solve this\n", - " solver = pe.SolverFactory('gurobi')\n", - " solver.options['NonConvex'] = 2\n", + " solver = pe.SolverFactory(\"gurobi\")\n", + " solver.options[\"NonConvex\"] = 2\n", " solution = solver.solve(opt_model, tee=False)\n", " else:\n", " opt_model.obj = pe.Objective(expr=opt_model.gbt.outputs[0])\n", "\n", " # without uncerainty we can use cbc to solve the model\n", - " solver = pe.SolverFactory('cbc')\n", - " solution = solver.solve(opt_model, tee=False) \n", - " \n", + " solver = pe.SolverFactory(\"cbc\")\n", + " solution = solver.solve(opt_model, tee=False)\n", + "\n", " # extract solution from solved model\n", - " next_x = [opt_model.gbt.inputs[idx].value \n", - " for idx in range(len(opt_model.gbt.inputs))]\n", - " \n", - " return next_x" + " return [opt_model.gbt.inputs[idx].value for idx in range(len(opt_model.gbt.inputs))]" ] }, { @@ -400,40 +404,43 @@ "source": [ "from tqdm.notebook import tqdm\n", "\n", - "for itr in tqdm(range(80)):\n", + "for _ in tqdm(range(80)):\n", " # training the tree ensemble\n", " lgb_model = train_tree(data)\n", - " \n", + "\n", " # minimize the trained model\n", " next_x = minimize_model(f_bnds, lgb_model, has_unc=False)\n", - " \n", + "\n", " # evaluating the following input\n", " next_y = f(next_x[0], next_x[1])\n", "\n", - " data['X'].append(next_x)\n", - " data['y'].append(next_y)\n", - " \n", - "def plot_progress(data): \n", + " data[\"X\"].append(next_x)\n", + " data[\"y\"].append(next_y)\n", + "\n", + "\n", + "def plot_progress(data):\n", " # set up plot\n", " import matplotlib.pyplot as plt\n", + "\n", " fig = plt.figure()\n", - " ax = fig.add_axes([0,0,2,2])\n", + " ax = fig.add_axes([0, 0, 2, 2])\n", " plt.ylabel(\"Black-Box Function Objective\")\n", " plt.xlabel(\"# Iterations\")\n", "\n", " # extract best_y\n", " min_y = []\n", - " curr_min = data['y'][0]\n", - " for y in data['y']:\n", - " curr_min = min(y,curr_min)\n", + " curr_min = data[\"y\"][0]\n", + " for y in data[\"y\"]:\n", + " curr_min = min(y, curr_min)\n", " min_y.append(curr_min)\n", - " \n", + "\n", " # plot steps to show progress\n", - " ax.step(np.arange(len(data['y'])), min_y, linewidth=2, color=\"b\")\n", - " plt.axhline(y=0.0, color='r', linewidth=3, linestyle='--')\n", - " \n", + " ax.step(np.arange(len(data[\"y\"])), min_y, linewidth=2, color=\"b\")\n", + " plt.axhline(y=0.0, color=\"r\", linewidth=3, linestyle=\"--\")\n", + "\n", " plt.show()\n", "\n", + "\n", "plot_progress(data)" ] }, @@ -469,7 +476,8 @@ ], "source": [ "from IPython.display import Image\n", - "Image(filename='images/bo-with-trees.png', height=300)" + "\n", + "Image(filename=\"images/bo-with-trees.png\", height=300)" ] }, { diff --git a/docs/notebooks/trees/linear_tree_formulations.ipynb b/docs/notebooks/trees/linear_tree_formulations.ipynb index f98373e1..e7f263ae 100644 --- a/docs/notebooks/trees/linear_tree_formulations.ipynb +++ b/docs/notebooks/trees/linear_tree_formulations.ipynb @@ -55,26 +55,30 @@ }, "outputs": [], "source": [ - "#Start by importing the 
following libraries\n", - "#data manipulation and plotting\n", - "import pandas as pd\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", + "# Start by importing the following libraries\n", + "# data manipulation and plotting\n", "import matplotlib\n", - "matplotlib.rc('font', size=24)\n", - "plt.rc('axes', titlesize=24)\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "\n", + "matplotlib.rc(\"font\", size=24)\n", + "plt.rc(\"axes\", titlesize=24)\n", "\n", - "#linear-tree objects\n", + "# linear-tree objects\n", + "# pyomo for optimization\n", + "import pyomo.environ as pyo\n", "from lineartree import LinearTreeRegressor\n", "from sklearn.linear_model import LinearRegression\n", "\n", - "#pyomo for optimization\n", - "import pyomo.environ as pyo\n", + "import omlt\n", "\n", - "#omlt for interfacing our linear tree with pyomo\n", + "# omlt for interfacing our linear tree with pyomo\n", "from omlt import OmltBlock\n", - "from omlt.linear_tree import LinearTreeGDPFormulation, LinearTreeHybridBigMFormulation, LinearTreeDefinition\n", - "import omlt" + "from omlt.linear_tree import (\n", + " LinearTreeDefinition,\n", + " LinearTreeGDPFormulation,\n", + " LinearTreeHybridBigMFormulation,\n", + ")" ] }, { @@ -114,7 +118,7 @@ }, "outputs": [], "source": [ - "df = pd.read_csv(\"../data/sin_quadratic.csv\",index_col=[0])" + "df = pd.read_csv(\"../data/sin_quadratic.csv\", index_col=[0])" ] }, { @@ -152,18 +156,18 @@ } ], "source": [ - "#retrieve input 'x' and output 'y' from the dataframe\n", + "# retrieve input 'x' and output 'y' from the dataframe\n", "x = df[\"x\"]\n", "y = df[\"y\"]\n", "\n", - "#calculate mean and standard deviation, add scaled 'x' and scaled 'y' to the dataframe\n", + "# calculate mean and standard deviation, add scaled 'x' and scaled 'y' to the dataframe\n", "mean_data = df.mean(axis=0)\n", "std_data = df.std(axis=0)\n", - "df[\"x_scaled\"] = (df['x'] - mean_data['x']) / std_data['x']\n", - "df[\"y_scaled\"] = (df['y'] - mean_data['y']) / std_data['y']\n", + "df[\"x_scaled\"] = (df[\"x\"] - mean_data[\"x\"]) / std_data[\"x\"]\n", + "df[\"y_scaled\"] = (df[\"y\"] - mean_data[\"y\"]) / std_data[\"y\"]\n", "\n", - "#create plots for unscaled and scaled data\n", - "f, (ax1, ax2) = plt.subplots(1, 2,figsize = (16,8))\n", + "# create plots for unscaled and scaled data\n", + "f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))\n", "\n", "ax1.plot(x, y)\n", "ax1.set_xlabel(\"x\")\n", @@ -202,12 +206,10 @@ }, "outputs": [], "source": [ - "#Build the linear-tree model\n", - "regr = LinearTreeRegressor(LinearRegression(), \n", - " criterion='mse', \n", - " max_bins=120, \n", - " min_samples_leaf=30, \n", - " max_depth=8)" + "# Build the linear-tree model\n", + "regr = LinearTreeRegressor(\n", + " LinearRegression(), criterion=\"mse\", max_bins=120, min_samples_leaf=30, max_depth=8\n", + ")" ] }, { @@ -221,12 +223,12 @@ }, "outputs": [], "source": [ - "#Data needs to be in array and reshaped\n", - "x_scaled = df[\"x_scaled\"].to_numpy().reshape(-1,1)\n", - "y_scaled = df[\"y_scaled\"].to_numpy().reshape(-1,1)\n", + "# Data needs to be in array and reshaped\n", + "x_scaled = df[\"x_scaled\"].to_numpy().reshape(-1, 1)\n", + "y_scaled = df[\"y_scaled\"].to_numpy().reshape(-1, 1)\n", "\n", - "#train the linear tree on the scaled data\n", - "history1 = regr.fit(x_scaled,y_scaled)" + "# train the linear tree on the scaled data\n", + "history1 = regr.fit(x_scaled, y_scaled)" ] }, { @@ -278,9 +280,9 @@ }, "outputs": [], "source": [ - "#note: we calculate 
the unscaled output for each neural network to check the predictions\n", + "# note: we calculate the unscaled output for each neural network to check the predictions\n", "y_predict_scaled_lt = regr.predict(x_scaled)\n", - "y_predict_lt = y_predict_scaled_lt*(std_data['y']) + mean_data['y']" + "y_predict_lt = y_predict_scaled_lt * (std_data[\"y\"]) + mean_data[\"y\"]" ] }, { @@ -315,10 +317,10 @@ } ], "source": [ - "#create a single plot with the original data and each neural network's predictions\n", - "fig,ax = plt.subplots(1,figsize = (8,8))\n", - "ax.plot(x,y,linewidth = 3.0,label = \"data\", alpha = 0.5)\n", - "ax.plot(x,y_predict_lt,linewidth = 3.0,linestyle=\"dotted\",label = \"linear-tree\")\n", + "# create a single plot with the original data and each neural network's predictions\n", + "fig, ax = plt.subplots(1, figsize=(8, 8))\n", + "ax.plot(x, y, linewidth=3.0, label=\"data\", alpha=0.5)\n", + "ax.plot(x, y_predict_lt, linewidth=3.0, linestyle=\"dotted\", label=\"linear-tree\")\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"y\")\n", "plt.legend()" @@ -392,17 +394,23 @@ } ], "source": [ - "#create an omlt scaling object\n", - "scaler = omlt.scaling.OffsetScaling(offset_inputs=[mean_data['x']],\n", - " factor_inputs=[std_data['x']],\n", - " offset_outputs=[mean_data['y']],\n", - " factor_outputs=[std_data['y']])\n", - "\n", - "#create the input bounds. note that the key `0` corresponds to input `0` and that we also scale the input bounds\n", - "input_bounds={0:((min(df['x']) - mean_data['x'])/std_data['x'],\n", - " (max(df['x']) - mean_data['x'])/std_data['x'])};\n", + "# create an omlt scaling object\n", + "scaler = omlt.scaling.OffsetScaling(\n", + " offset_inputs=[mean_data[\"x\"]],\n", + " factor_inputs=[std_data[\"x\"]],\n", + " offset_outputs=[mean_data[\"y\"]],\n", + " factor_outputs=[std_data[\"y\"]],\n", + ")\n", + "\n", + "# create the input bounds. 
note that the key `0` corresponds to input `0` and that we also scale the input bounds\n", + "input_bounds = {\n", + " 0: (\n", + " (min(df[\"x\"]) - mean_data[\"x\"]) / std_data[\"x\"],\n", + " (max(df[\"x\"]) - mean_data[\"x\"]) / std_data[\"x\"],\n", + " )\n", + "}\n", "print(scaler)\n", - "print(\"Scaled input bounds: \",input_bounds)" + "print(\"Scaled input bounds: \", input_bounds)" ] }, { @@ -569,34 +577,37 @@ } ], "source": [ - "#create a LinearTreeDefinition Object\n", - "ltmodel = LinearTreeDefinition(regr,scaler,input_bounds)\n", + "# create a LinearTreeDefinition Object\n", + "ltmodel = LinearTreeDefinition(regr, scaler, input_bounds)\n", "\n", - "#create a pyomo model with variables x and y\n", + "# create a pyomo model with variables x and y\n", "model1 = pyo.ConcreteModel()\n", - "model1.x = pyo.Var(initialize = 0)\n", - "model1.y = pyo.Var(initialize = 0)\n", + "model1.x = pyo.Var(initialize=0)\n", + "model1.y = pyo.Var(initialize=0)\n", "model1.obj = pyo.Objective(expr=(model1.y))\n", "\n", - "#create an OmltBlock\n", + "# create an OmltBlock\n", "model1.lt = OmltBlock()\n", "\n", - "#use the GDP formulation with a big-M, transformation\n", - "formulation1_lt = LinearTreeGDPFormulation(ltmodel, transformation='bigm')\n", + "# use the GDP formulation with a big-M, transformation\n", + "formulation1_lt = LinearTreeGDPFormulation(ltmodel, transformation=\"bigm\")\n", "model1.lt.build_formulation(formulation1_lt)\n", "\n", - "#connect pyomo variables to the neural network\n", + "\n", + "# connect pyomo variables to the neural network\n", "@model1.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.lt.inputs[0]\n", "\n", + "\n", "@model1.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.lt.outputs[0]\n", "\n", - "#solve the model and query the solution\n", - "status_1_bigm = pyo.SolverFactory('cbc').solve(model1, tee=True)\n", - "solution_1_bigm = (pyo.value(model1.x),pyo.value(model1.y))" + "\n", + "# solve the model and query the solution\n", + "status_1_bigm = pyo.SolverFactory(\"cbc\").solve(model1, tee=True)\n", + "solution_1_bigm = (pyo.value(model1.x), pyo.value(model1.y))" ] }, { @@ -623,13 +634,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Big-M Transformation Solution:\")\n", - "print(\"# of variables: \",model1.nvariables())\n", - "print(\"# of constraints: \",model1.nconstraints())\n", + "print(\"# of variables: \", model1.nvariables())\n", + "print(\"# of constraints: \", model1.nconstraints())\n", "print(\"x = \", solution_1_bigm[0])\n", "print(\"y = \", solution_1_bigm[1])\n", - "print(\"Solve Time: \", status_1_bigm['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_1_bigm[\"Solver\"][0][\"Time\"])" ] }, { @@ -724,31 +735,34 @@ } ], "source": [ - "#create a pyomo model with variables x and y\n", + "# create a pyomo model with variables x and y\n", "model2 = pyo.ConcreteModel()\n", - "model2.x = pyo.Var(initialize = 0)\n", - "model2.y = pyo.Var(initialize = 0)\n", + "model2.x = pyo.Var(initialize=0)\n", + "model2.y = pyo.Var(initialize=0)\n", "model2.obj = pyo.Objective(expr=(model2.y))\n", "\n", - "#create an OmltBlock\n", + "# create an OmltBlock\n", "model2.lt = OmltBlock()\n", "\n", - "#use the GDP formulation with a hull transformation\n", - "formulation2_lt = LinearTreeGDPFormulation(ltmodel, transformation='hull')\n", + "# use the GDP formulation with a hull transformation\n", + "formulation2_lt = 
LinearTreeGDPFormulation(ltmodel, transformation=\"hull\")\n", "model2.lt.build_formulation(formulation2_lt)\n", "\n", - "#connect pyomo variables to the neural network\n", + "\n", + "# connect pyomo variables to the neural network\n", "@model2.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.lt.inputs[0]\n", "\n", + "\n", "@model2.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.lt.outputs[0]\n", "\n", - "#solve the model and query the solution\n", - "status_2_hull = pyo.SolverFactory('cbc').solve(model2, tee=True)\n", - "solution_2_hull = (pyo.value(model2.x),pyo.value(model2.y))" + "\n", + "# solve the model and query the solution\n", + "status_2_hull = pyo.SolverFactory(\"cbc\").solve(model2, tee=True)\n", + "solution_2_hull = (pyo.value(model2.x), pyo.value(model2.y))" ] }, { @@ -771,13 +785,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Hull Transformation Solution:\")\n", - "print(\"# of variables: \",model2.nvariables())\n", - "print(\"# of constraints: \",model2.nconstraints())\n", + "print(\"# of variables: \", model2.nvariables())\n", + "print(\"# of constraints: \", model2.nconstraints())\n", "print(\"x = \", solution_2_hull[0])\n", "print(\"y = \", solution_2_hull[1])\n", - "print(\"Solve Time: \", status_2_hull['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_2_hull[\"Solver\"][0][\"Time\"])" ] }, { @@ -939,35 +953,38 @@ } ], "source": [ - "#create a pyomo model with variables x and y\n", + "# create a pyomo model with variables x and y\n", "model_c = pyo.ConcreteModel()\n", - "model_c.x = pyo.Var(initialize = 0)\n", - "model_c.y = pyo.Var(initialize = 0)\n", + "model_c.x = pyo.Var(initialize=0)\n", + "model_c.y = pyo.Var(initialize=0)\n", "model_c.obj = pyo.Objective(expr=(model_c.y))\n", "\n", - "#create an OmltBlock\n", + "# create an OmltBlock\n", "model_c.lt = OmltBlock()\n", "\n", - "#use the GDP formulation with a custom transformation\n", - "formulation_c_lt = LinearTreeGDPFormulation(ltmodel, transformation='custom')\n", + "# use the GDP formulation with a custom transformation\n", + "formulation_c_lt = LinearTreeGDPFormulation(ltmodel, transformation=\"custom\")\n", "model_c.lt.build_formulation(formulation_c_lt)\n", "\n", - "#connect pyomo variables to the neural network\n", + "\n", + "# connect pyomo variables to the neural network\n", "@model_c.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.lt.inputs[0]\n", "\n", + "\n", "@model_c.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.lt.outputs[0]\n", "\n", + "\n", "# NOTE: Since we passed the 'custom' transformation option, the user must\n", "# transform the model or the omlt block before passing the model to the solver\n", - "pyo.TransformationFactory('gdp.bigm').apply_to(model_c)\n", + "pyo.TransformationFactory(\"gdp.bigm\").apply_to(model_c)\n", "\n", - "#solve the model and query the solution\n", - "status_c_bigm = pyo.SolverFactory('cbc').solve(model_c, tee=True)\n", - "solution_c_bigm = (pyo.value(model_c.x),pyo.value(model_c.y))" + "# solve the model and query the solution\n", + "status_c_bigm = pyo.SolverFactory(\"cbc\").solve(model_c, tee=True)\n", + "solution_c_bigm = (pyo.value(model_c.x), pyo.value(model_c.y))" ] }, { @@ -990,13 +1007,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"BigM Transformation Solution:\")\n", - "print(\"# of variables: 
\",model_c.nvariables())\n", - "print(\"# of constraints: \",model_c.nconstraints())\n", + "print(\"# of variables: \", model_c.nvariables())\n", + "print(\"# of constraints: \", model_c.nconstraints())\n", "print(\"x = \", solution_c_bigm[0])\n", "print(\"y = \", solution_c_bigm[1])\n", - "print(\"Solve Time: \", status_c_bigm['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_c_bigm[\"Solver\"][0][\"Time\"])" ] }, { @@ -1292,31 +1309,34 @@ } ], "source": [ - "#create a pyomo model with variables x and y\n", + "# create a pyomo model with variables x and y\n", "model3 = pyo.ConcreteModel()\n", - "model3.x = pyo.Var(initialize = 0)\n", - "model3.y = pyo.Var(initialize = 0)\n", + "model3.x = pyo.Var(initialize=0)\n", + "model3.y = pyo.Var(initialize=0)\n", "model3.obj = pyo.Objective(expr=(model3.y))\n", "\n", - "#create an OmltBlock\n", + "# create an OmltBlock\n", "model3.lt = OmltBlock()\n", "\n", - "#use the Hybrid Big-M formulation\n", + "# use the Hybrid Big-M formulation\n", "formulation3_lt = LinearTreeHybridBigMFormulation(ltmodel)\n", "model3.lt.build_formulation(formulation3_lt)\n", "\n", - "#connect pyomo variables to the neural network\n", + "\n", + "# connect pyomo variables to the neural network\n", "@model3.Constraint()\n", "def connect_inputs(mdl):\n", " return mdl.x == mdl.lt.inputs[0]\n", "\n", + "\n", "@model3.Constraint()\n", "def connect_outputs(mdl):\n", " return mdl.y == mdl.lt.outputs[0]\n", "\n", - "#solve the model and query the solution\n", - "status_3_hyb = pyo.SolverFactory('scip').solve(model3, tee=True)\n", - "solution_3_hyb = (pyo.value(model3.x),pyo.value(model3.y))" + "\n", + "# solve the model and query the solution\n", + "status_3_hyb = pyo.SolverFactory(\"scip\").solve(model3, tee=True)\n", + "solution_3_hyb = (pyo.value(model3.x), pyo.value(model3.y))" ] }, { @@ -1339,13 +1359,13 @@ } ], "source": [ - "#print out model size and solution values\n", + "# print out model size and solution values\n", "print(\"Hull Transformation Solution:\")\n", - "print(\"# of variables: \",model3.nvariables())\n", - "print(\"# of constraints: \",model3.nconstraints())\n", + "print(\"# of variables: \", model3.nvariables())\n", + "print(\"# of constraints: \", model3.nconstraints())\n", "print(\"x = \", solution_3_hyb[0])\n", "print(\"y = \", solution_3_hyb[1])\n", - "print(\"Solve Time: \", status_3_hyb['Solver'][0]['Time'])" + "print(\"Solve Time: \", status_3_hyb[\"Solver\"][0][\"Time\"])" ] }, { @@ -1394,26 +1414,53 @@ } ], "source": [ - "#create a plot with 3 subplots\n", - "fig,axs = plt.subplots(1,3,figsize = (24,8))\n", + "# create a plot with 3 subplots\n", + "fig, axs = plt.subplots(1, 3, figsize=(24, 8))\n", "\n", - "#GDP Representation - Big-M Transformation\n", - "axs[0].plot(x,y_predict_lt,linewidth = 3.0,linestyle=\"dotted\",color = \"orange\", label='Fitted Model')\n", + "# GDP Representation - Big-M Transformation\n", + "axs[0].plot(\n", + " x,\n", + " y_predict_lt,\n", + " linewidth=3.0,\n", + " linestyle=\"dotted\",\n", + " color=\"orange\",\n", + " label=\"Fitted Model\",\n", + ")\n", "axs[0].set_title(\"Big-M\")\n", - "axs[0].scatter([solution_1_bigm[0]],[solution_1_bigm[1]],color = \"black\",s = 300, label='Optimum')\n", + "axs[0].scatter(\n", + " [solution_1_bigm[0]], [solution_1_bigm[1]], color=\"black\", s=300, label=\"Optimum\"\n", + ")\n", "axs[0].legend()\n", "\n", - "#GDP Representation - Hull Transformation\n", - "axs[1].plot(x,y_predict_lt,linewidth = 3.0,linestyle=\"dotted\",color = \"orange\", label='Fitted Model')\n", + "# GDP 
Representation - Hull Transformation\n", + "axs[1].plot(\n", + " x,\n", + " y_predict_lt,\n", + " linewidth=3.0,\n", + " linestyle=\"dotted\",\n", + " color=\"orange\",\n", + " label=\"Fitted Model\",\n", + ")\n", "axs[1].set_title(\"Convex Hull\")\n", - "axs[1].scatter([solution_2_hull[0]],[solution_2_hull[1]],color = \"black\",s = 300, label='Optimum')\n", + "axs[1].scatter(\n", + " [solution_2_hull[0]], [solution_2_hull[1]], color=\"black\", s=300, label=\"Optimum\"\n", + ")\n", "axs[1].legend()\n", "\n", "\n", - "#Hybrid Big-M Representation\n", - "axs[2].plot(x,y_predict_lt,linewidth = 3.0,linestyle=\"dotted\",color = \"orange\", label='Fitted Model')\n", + "# Hybrid Big-M Representation\n", + "axs[2].plot(\n", + " x,\n", + " y_predict_lt,\n", + " linewidth=3.0,\n", + " linestyle=\"dotted\",\n", + " color=\"orange\",\n", + " label=\"Fitted Model\",\n", + ")\n", "axs[2].set_title(\"Hybrid Big-M\")\n", - "axs[2].scatter([solution_3_hyb[0]],[solution_3_hyb[1]],color = \"black\",s = 300, label='Optimum')\n", + "axs[2].scatter(\n", + " [solution_3_hyb[0]], [solution_3_hyb[1]], color=\"black\", s=300, label=\"Optimum\"\n", + ")\n", "axs[2].legend()" ] }, diff --git a/pyproject.toml b/pyproject.toml index c504866e..2b215f71 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -11,8 +11,8 @@ authors = [ dependencies = [ "networkx", "numpy", - # TODO: Remove constraint when fix to https://github.com/Pyomo/pyomo/issues/3262 is released - "pyomo==6.6.2", + # Pyomo release that included #3262 fix and has transformations for linear trees + "pyomo>=6.7.3", "onnx", "onnxruntime", ] @@ -120,6 +120,26 @@ convention = "google" ] "docs/conf.py" = ["D100", "INP001"] "src/omlt/neuralnet/layer.py" = ["N802"] +"docs/notebooks/data/build_sin_quadratic_csv.py" = ["INP001"] +"docs/notebooks/*" = [ + "T201", + "F811", + "E402", + "ICN001", + "E501", + "PD901", + "E731", + "F841", + "FBT002", + "PTH123", + "S311", + "N812", + "A001", + "E741", + "N802", + "PERF401", + "PLR2004", +] [tool.mypy] show_error_codes = true @@ -151,4 +171,30 @@ module = [ ignore_missing_imports = true [tool.pytest.ini_options] -addopts = "--cov omlt --cov-report term-missing --verbose" +addopts = "--cov omlt --cov-report term-missing --cov-config pyproject.toml --verbose" + +[tool.coverage.run] +branch = true + +[tool.coverage.paths] +source = [ + "src/", + "*/site-packages/", +] + +[tool.coverage.report] +# Regexes for lines to exclude from consideration +exclude_lines = [ + # Have to re-enable the standard pragma + "pragma: no cover", + + # Don't complain about missing debug-only code: + "def __repr__", + + # Don't complain if tests don't hit defensive assertion code: + "raise AssertionError", + "raise NotImplementedError", + + # Don't complain if non-runnable code isn't run: + "if __name__ == .__main__.:", +] diff --git a/src/omlt/block.py b/src/omlt/block.py index b8bb391d..e056c806 100644 --- a/src/omlt/block.py +++ b/src/omlt/block.py @@ -24,13 +24,12 @@ class is used in combination with a formulation object to construct the pyo.assert_optimal_termination(status) """ - import pyomo.environ as pyo -from pyomo.core.base.block import _BlockData, declare_custom_block +from pyomo.core.base.block import BlockData, declare_custom_block @declare_custom_block(name="OmltBlock") -class OmltBlockData(_BlockData): +class OmltBlockData(BlockData): def __init__(self, component): super().__init__(component) self.__formulation = None diff --git a/src/omlt/gbt/gbt_formulation.py b/src/omlt/gbt/gbt_formulation.py index 4e1069fe..d81e26fb 100644 --- 
a/src/omlt/gbt/gbt_formulation.py +++ b/src/omlt/gbt/gbt_formulation.py @@ -15,17 +15,15 @@ class GBTBigMFormulation(_PyomoFormulation): constraints to enforce splitting rules according to: References: - ---------- - * Misic, V. "Optimization of tree ensembles." - Operations Research 68.5 (2020): 1605-1624. - * Mistry, M., et al. "Mixed-integer convex nonlinear optimization with - gradient-boosted trees embedded." - INFORMS Journal on Computing (2020). - - Parameters - ---------- - tree_ensemble_structure : GradientBoostedTreeModel - the tree ensemble definition + * Misic, V. "Optimization of tree ensembles." + Operations Research 68.5 (2020): 1605-1624. + * Mistry, M., et al. "Mixed-integer convex nonlinear optimization with + gradient-boosted trees embedded." + INFORMS Journal on Computing (2020). + + Parameters: + tree_ensemble_structure (GradientBoostedTreeModel): + the tree ensemble definition """ def __init__(self, gbt_model): @@ -63,7 +61,7 @@ def _build_formulation(self): ) -def add_formulation_to_block(block, model_definition, input_vars, output_vars): +def add_formulation_to_block(block, model_definition, input_vars, output_vars): # noqa: C901, PLR0915 r"""Adds the gradient-boosted trees formulation to the given Pyomo block. .. math:: @@ -89,23 +87,21 @@ def add_formulation_to_block(block, model_definition, input_vars, output_vars): References: - ---------- - * Misic, V. "Optimization of tree ensembles." - Operations Research 68.5 (2020): 1605-1624. - * Mistry, M., et al. "Mixed-integer convex nonlinear optimization with - gradient-boosted trees embedded." - INFORMS Journal on Computing (2020). - - Parameters - ---------- - block : Block - the Pyomo block - tree_ensemble_structure : GradientBoostedTreeModel - the tree ensemble definition - input_vars : Var - the input variables of the Pyomo block - output_vars : Var - the output variables of the Pyomo block + * Misic, V. "Optimization of tree ensembles." + Operations Research 68.5 (2020): 1605-1624. + * Mistry, M., et al. "Mixed-integer convex nonlinear optimization with + gradient-boosted trees embedded." + INFORMS Journal on Computing (2020). 
+ + Parameters: + block (Block): + the Pyomo block + tree_ensemble_structure (GradientBoostedTreeModel): + the tree ensemble definition + input_vars (Var): + the input variables of the Pyomo block + output_vars (Var): + the output variables of the Pyomo block """ if isinstance(model_definition, GradientBoostedTreeModel): @@ -129,13 +125,7 @@ def add_formulation_to_block(block, model_definition, input_vars, output_vars): nodes_node_ids = np.array(attr["nodes_nodeids"].ints) nodes_false_node_ids = np.array(attr["nodes_falsenodeids"].ints) nodes_true_node_ids = np.array(attr["nodes_truenodeids"].ints) - nodes_hitrates = np.array(attr["nodes_hitrates"].floats) - nodes_missing_value_tracks_true = np.array( - attr["nodes_missing_value_tracks_true"].ints - ) - n_targets = attr["n_targets"].i - target_ids = np.array(attr["target_ids"].ints) target_node_ids = np.array(attr["target_nodeids"].ints) target_tree_ids = np.array(attr["target_treeids"].ints) target_weights = np.array(attr["target_weights"].floats) diff --git a/src/omlt/io/__init__.py b/src/omlt/io/__init__.py index b568fb90..64fa72e1 100644 --- a/src/omlt/io/__init__.py +++ b/src/omlt/io/__init__.py @@ -16,7 +16,12 @@ from omlt.io.keras import load_keras_sequential __all__ = [ - "keras_available", "onnx_available", "torch_available", "torch_geometric_available", - "load_onnx_neural_network", "load_onnx_neural_network_with_bounds", - "write_onnx_model_with_bounds", "load_keras_sequential" + "keras_available", + "onnx_available", + "torch_available", + "torch_geometric_available", + "load_onnx_neural_network", + "load_onnx_neural_network_with_bounds", + "write_onnx_model_with_bounds", + "load_keras_sequential", ] diff --git a/src/omlt/io/onnx_parser.py b/src/omlt/io/onnx_parser.py index f35fadb9..85b37526 100644 --- a/src/omlt/io/onnx_parser.py +++ b/src/omlt/io/onnx_parser.py @@ -1,5 +1,5 @@ import math -from typing import Any +from typing import TYPE_CHECKING, Any import numpy as np from onnx import numpy_helper @@ -13,6 +13,9 @@ ) from omlt.neuralnet.network_definition import NetworkDefinition +if TYPE_CHECKING: + from collections.abc import Callable + _ACTIVATION_OP_TYPES = ["Relu", "Sigmoid", "LogSoftmax", "Tanh", "Softplus"] _POOLING_OP_TYPES = ["MaxPool"] DENSE_INPUT_DIMENSIONS = 2 @@ -33,8 +36,7 @@ class NetworkParser: """Network Parser. References: - ---------- - * https://github.com/onnx/onnx/blob/master/docs/Operators.md + * https://github.com/onnx/onnx/blob/master/docs/Operators.md """ def __init__(self): @@ -51,7 +53,7 @@ def _reset_state(self): self._node_stack = [] self._node_map = {} - def parse_network(self, graph, scaling_object, input_bounds): + def parse_network(self, graph, scaling_object, input_bounds): # noqa: C901, PLR0912, PLR0915 self._reset_state() self._graph = graph @@ -92,7 +94,6 @@ def parse_network(self, graph, scaling_object, input_bounds): f'All dimensions in graph "{graph.name}" input tensor have 0 value.' 
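A minimal usage sketch for the gradient-boosted tree formulation documented above, assuming `GradientBoostedTreeModel` wraps a loaded ONNX tree-ensemble model (as the `isinstance` check in `add_formulation_to_block` suggests); the file name is illustrative only.

```python
# Illustrative sketch: the ONNX file name is an assumption, and the model is
# expected to contain a tree-ensemble regressor.
import onnx
import pyomo.environ as pyo

from omlt import OmltBlock
from omlt.gbt import GBTBigMFormulation, GradientBoostedTreeModel

onnx_model = onnx.load("tree_ensemble.onnx")
gbt_model = GradientBoostedTreeModel(onnx_model)

m = pyo.ConcreteModel()
m.gbt = OmltBlock()
m.gbt.build_formulation(GBTBigMFormulation(gbt_model))
```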
) raise ValueError(msg) - assert network_input is None network_input = InputLayer(size) self._node_map[input_node.name] = network_input network.add_layer(network_input) @@ -193,7 +194,7 @@ def _visit_node(self, node, next_nodes): return new_layer, new_layer_inputs - def _consume_dense_nodes( + def _consume_dense_nodes( # noqa: C901, PLR0912 self, node: Any, next_nodes: Any ) -> tuple[Any, Any, list[Any]]: """Starting from a MatMul node, consume nodes to form a dense Ax + b node.""" @@ -342,7 +343,7 @@ def _consume_gemm_dense_nodes(self, node, next_nodes): return next_nodes, dense_layer, [input_layer] - def _consume_conv_nodes(self, node, next_nodes): + def _consume_conv_nodes(self, node, next_nodes): # noqa: PLR0912, C901, PLR0915 """Consume Conv nodes. Starting from a Conv node, consume nodes to form a convolution node with @@ -484,7 +485,7 @@ def _consume_reshape_nodes(self, node, next_nodes): self._node_map[node.output[0]] = (transformer, input_layer) return next_nodes - def _consume_pool_nodes(self, node, next_nodes): + def _consume_pool_nodes(self, node, next_nodes): # noqa: PLR0912, C901, PLR0915 """Consume MaxPool nodes. Starting from a MaxPool node, consume nodes to form a pooling node with @@ -569,7 +570,7 @@ def _consume_pool_nodes(self, node, next_nodes): ) raise ValueError(msg) - output_shape_wrapper = math.floor + output_shape_wrapper: Callable[[float], int] = math.floor if "ceil_mode" in attr and attr["ceil_mode"] == 1: output_shape_wrapper = math.ceil diff --git a/src/omlt/io/torch_geometric/build_gnn_formulation.py b/src/omlt/io/torch_geometric/build_gnn_formulation.py index 66e48775..545925de 100644 --- a/src/omlt/io/torch_geometric/build_gnn_formulation.py +++ b/src/omlt/io/torch_geometric/build_gnn_formulation.py @@ -7,7 +7,7 @@ from omlt.neuralnet import FullSpaceNNFormulation -def gnn_with_non_fixed_graph( +def gnn_with_non_fixed_graph( # noqa: PLR0913 block, nn, N, @@ -78,7 +78,7 @@ def gnn_with_non_fixed_graph( return block -def gnn_with_fixed_graph( +def gnn_with_fixed_graph( # noqa: PLR0913 block, nn, N, diff --git a/src/omlt/io/torch_geometric/torch_geometric_reader.py b/src/omlt/io/torch_geometric/torch_geometric_reader.py index d37ec960..5ce9b315 100644 --- a/src/omlt/io/torch_geometric/torch_geometric_reader.py +++ b/src/omlt/io/torch_geometric/torch_geometric_reader.py @@ -102,7 +102,7 @@ def _process_gnn_parameters(gnn_weights_uv, gnn_weights_vv, gnn_biases, gnn_norm _OP_TYPES = _LAYER_OP_TYPES_FIXED_GRAPH + _ACTIVATION_OP_TYPES + _POOLING_OP_TYPES -def load_torch_geometric_sequential( +def load_torch_geometric_sequential( # noqa: C901, PLR0913, PLR0912, PLR0915 nn, N, A=None, @@ -154,12 +154,12 @@ def load_torch_geometric_sequential( net.add_layer(prev_layer) operations = [] - for l in nn: + for layer in nn: op_name = None - if l.__class__.__name__ == "function": - op_name = l.__name__ + if layer.__class__.__name__ == "function": + op_name = layer.__name__ else: - op_name = l.__class__.__name__ + op_name = layer.__class__.__name__ if op_name not in _OP_TYPES: msg = f"Operation {op_name} is not supported." @@ -167,18 +167,20 @@ def load_torch_geometric_sequential( operations.append(op_name) if A is None: + supported_layers = { + "Linear", + *_ACTIVATION_OP_TYPES, + *_POOLING_OP_TYPES, + } # If A is None, then the graph is not fixed. # Only layers in _LAYER_OP_TYPES_NON_FIXED_GRAPH are supported. # Only "sum" aggregation is supported. # Since all weights and biases are possibly needed, A is set to correspond to a # complete graph. 
- for index, l in enumerate(nn): - if ( - operations[index] - in ["Linear"] + _ACTIVATION_OP_TYPES + _POOLING_OP_TYPES - ): + for index, layer in enumerate(nn): + if operations[index] in supported_layers: # nonlinear activation results in a MINLP - if operations[index] in ["Sigmoid", "LogSoftmax", "Softplus", "Tanh"]: + if operations[index] in {"Sigmoid", "LogSoftmax", "Softplus", "Tanh"}: warnings.warn( "nonlinear activation results in a MINLP", stacklevel=2 ) @@ -188,13 +190,13 @@ def load_torch_geometric_sequential( if operations[index] not in _LAYER_OP_TYPES_NON_FIXED_GRAPH: msg = "this layer is not supported when the graph is not fixed." raise ValueError(msg) - if l.aggr != "sum": + if layer.aggr != "sum": msg = "this aggregation is not supported when the graph is not fixed" raise ValueError(msg) A = np.ones((N, N)) - np.eye(N) - for index, l in enumerate(nn): + for index, layer in enumerate(nn): if operations[index] in _ACTIVATION_OP_TYPES: # Skip activation layers since they are already handled in last layer continue @@ -205,8 +207,8 @@ def load_torch_geometric_sequential( activation = operations[index + 1].lower() if operations[index] == "Linear": - gnn_weights = l.weight.detach().numpy() - gnn_biases = l.bias.detach().numpy() + gnn_weights = layer.weight.detach().numpy() + gnn_biases = layer.bias.detach().numpy() # A linear layer is either applied on each node's features (i.e., # prev_layer.output_size[-1] = N * gnn_weights.shape[1]) # or the features after pooling (i.e., @@ -224,12 +226,8 @@ def load_torch_geometric_sequential( biases=biases, ) elif operations[index] == "GCNConv": - assert not l.improved - assert not l.cached - assert l.add_self_loops - assert l.normalize - gnn_weights = l.lin.weight.detach().numpy() - gnn_biases = l.bias.detach().numpy() + gnn_weights = layer.lin.weight.detach().numpy() + gnn_biases = layer.bias.detach().numpy() gnn_norm = _compute_gcn_norm(A) weights, biases = _process_gnn_parameters( gnn_weights, gnn_weights, gnn_biases, gnn_norm @@ -244,15 +242,12 @@ def load_torch_geometric_sequential( N=N, ) elif operations[index] == "SAGEConv": - assert not l.normalize - assert not l.project - assert l.aggr in _AGGREGATION_OP_TYPES - gnn_weights_uv = l.lin_l.weight.detach().numpy() - gnn_biases = l.lin_l.bias.detach().numpy() + gnn_weights_uv = layer.lin_l.weight.detach().numpy() + gnn_biases = layer.lin_l.bias.detach().numpy() gnn_weights_vv = np.zeros(shape=gnn_weights_uv.shape) - if l.root_weight: - gnn_weights_vv = l.lin_r.weight.detach().numpy() - gnn_norm = _compute_sage_norm(A, l.aggr) + if layer.root_weight: + gnn_weights_vv = layer.lin_r.weight.detach().numpy() + gnn_norm = _compute_sage_norm(A, layer.aggr) weights, biases = _process_gnn_parameters( gnn_weights_uv, gnn_weights_vv, gnn_biases, gnn_norm ) diff --git a/src/omlt/linear_tree/lt_definition.py b/src/omlt/linear_tree/lt_definition.py index cf1b5a4a..09adbb78 100644 --- a/src/omlt/linear_tree/lt_definition.py +++ b/src/omlt/linear_tree/lt_definition.py @@ -230,7 +230,7 @@ def _reassign_none_bounds(leaves, input_bounds): return leaves -def _parse_tree_data(model, input_bounds): +def _parse_tree_data(model, input_bounds): # noqa: C901, PLR0915, PLR0912 """Parse tree data. This function creates the data structures with the information required @@ -241,7 +241,7 @@ def _parse_tree_data(model, input_bounds): Arguments: model: Trained linear-tree model or dic containing linear-tree model summary (e.g. 
dict = model.summary()) - input_bounds: + input_bounds: The input bounds Returns: leaves - Dict containing the following information for each leaf: diff --git a/src/omlt/linear_tree/lt_formulation.py b/src/omlt/linear_tree/lt_formulation.py index 5960a442..d72df160 100644 --- a/src/omlt/linear_tree/lt_formulation.py +++ b/src/omlt/linear_tree/lt_formulation.py @@ -49,7 +49,7 @@ class LinearTreeGDPFormulation(_PyomoFormulation): optimization development. Optimization and Engineering, 23:607-642 """ - def __init__(self, lt_definition, transformation="bigm"): + def __init__(self, lt_definition, transformation="bigm", epsilon=0): """Create a LinearTreeGDPFormulation object. Arguments: @@ -59,6 +59,9 @@ def __init__(self, lt_definition, transformation="bigm"): transformation: choose which Pyomo.GDP formulation to apply. Supported transformations are bigm, hull, mbigm, and custom (default: {'bigm'}) + epsilon: Tolerance to use in enforcing that choosing the right + branch of a linear tree node can only happen if the feature + is strictly greater than the branch value.(default: 0) Raises: Exception: If transformation not in supported transformations @@ -66,6 +69,7 @@ def __init__(self, lt_definition, transformation="bigm"): super().__init__() self.model_definition = lt_definition self.transformation = transformation + self.epsilon = epsilon # Ensure that the GDP transformation given is supported supported_transformations = ["bigm", "hull", "mbigm", "custom"] @@ -101,6 +105,7 @@ def _build_formulation(self): input_vars=self.block.scaled_inputs, output_vars=self.block.scaled_outputs, transformation=self.transformation, + epsilon=self.epsilon, ) @@ -133,14 +138,21 @@ class LinearTreeHybridBigMFormulation(_PyomoFormulation): """ - def __init__(self, lt_definition): + def __init__(self, lt_definition, epsilon=0): """Create a LinearTreeHybridBigMFormulation object. Arguments: lt_definition: LinearTreeDefinition Object + + Keyword Arguments: + epsilon: Tolerance to use in enforcing that choosing the right + branch of a linear tree node can only happen if the feature + is strictly greater than the branch value.(default: 0) + """ super().__init__() self.model_definition = lt_definition + self.epsilon = epsilon @property def input_indexes(self): @@ -164,13 +176,18 @@ def _build_formulation(self): self.model_definition.scaled_input_bounds, ) - _add_hybrid_formulation_to_block( + _add_gdp_formulation_to_block( block=self.block, model_definition=self.model_definition, input_vars=self.block.scaled_inputs, output_vars=self.block.scaled_outputs, + transformation="custom", + epsilon=self.epsilon, ) + pe.TransformationFactory("gdp.bound_pretransformation").apply_to(self.block) + pe.TransformationFactory("gdp.binary_multiplication").apply_to(self.block) + def _build_output_bounds(model_def, input_bounds): """Build output bounds. 
@@ -206,18 +223,16 @@ def _build_output_bounds(model_def, input_bounds): else: upper_bound += slopes[k] * input_bounds[k][1] + intercept lower_bound += slopes[k] * input_bounds[k][0] + intercept - if upper_bound >= bounds[1]: - bounds[1] = upper_bound - if lower_bound <= bounds[0]: - bounds[0] = lower_bound + bounds[1] = max(bounds[1], upper_bound) + bounds[0] = min(bounds[0], lower_bound) upper_bound = 0 lower_bound = 0 return bounds -def _add_gdp_formulation_to_block( - block, model_definition, input_vars, output_vars, transformation +def _add_gdp_formulation_to_block( # noqa: PLR0913 + block, model_definition, input_vars, output_vars, transformation, epsilon ): """This function adds the GDP representation to the OmltBlock using Pyomo.GDP. @@ -227,6 +242,9 @@ def _add_gdp_formulation_to_block( input_vars: input variables to the linear tree model output_vars: output variable of the linear tree model transformation: Transformation to apply + epsilon: Tolerance to use in enforcing that choosing the right + branch of a linear tree node can only happen if the feature + is strictly greater than the branch value. """ leaves = model_definition.leaves @@ -256,7 +274,7 @@ def _add_gdp_formulation_to_block( # and the linear model expression. def disjuncts_rule(dsj, tree, leaf): def lb_rule(dsj, feat): - return input_vars[feat] >= leaves[tree][leaf]["bounds"][feat][0] + return input_vars[feat] >= leaves[tree][leaf]["bounds"][feat][0] + epsilon dsj.lb_constraint = pe.Constraint(features, rule=lb_rule) @@ -287,89 +305,3 @@ def disjunction_rule(b, tree): if transformation != "custom": pe.TransformationFactory(transformation_string).apply_to(block) - - -def _add_hybrid_formulation_to_block(block, model_definition, input_vars, output_vars): - """This function adds the Hybrid BigM representation to the OmltBlock. - - Arguments: - block: OmltBlock - model_definition: LinearTreeDefinition Object - input_vars: input variables to the linear tree model - output_vars: output variable of the linear tree model - """ - leaves = model_definition.leaves - input_bounds = model_definition.scaled_input_bounds - n_inputs = model_definition.n_inputs - - # The set of trees - tree_ids = list(leaves.keys()) - # Create a list of tuples that contains the tree and leaf indices. Note that - # the leaf indices depend on the tree in the ensemble. - t_l = [(tree, leaf) for tree in tree_ids for leaf in leaves[tree]] - - features = np.arange(0, n_inputs) - - # Use the input_bounds and the linear models in the leaves to calculate - # the lower and upper bounds on the output variable. Required for Pyomo.GDP - output_bounds = _build_output_bounds(model_definition, input_bounds) - - # Ouptuts are automatically scaled based on whether inputs are scaled - block.outputs.setub(output_bounds[1]) - block.outputs.setlb(output_bounds[0]) - block.scaled_outputs.setub(output_bounds[1]) - block.scaled_outputs.setlb(output_bounds[0]) - - # Create the intermeditate variables. z is binary that indicates which leaf - # in tree t is returned. 
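A short sketch of how the new `epsilon` keyword can be passed to either linear-tree formulation, assuming a trained linear-tree regressor `regr` and an `input_bounds` dict of the form used in the notebook and tests above.

```python
# Illustrative sketch: `regr` and `input_bounds` are assumed to exist, e.g. a
# trained lineartree.LinearTreeRegressor and {0: (lower_bound, upper_bound)}.
import pyomo.environ as pyo

from omlt import OmltBlock
from omlt.linear_tree import (
    LinearTreeDefinition,
    LinearTreeGDPFormulation,
    LinearTreeHybridBigMFormulation,
)

ltmodel = LinearTreeDefinition(regr, unscaled_input_bounds=input_bounds)

# A small positive epsilon enforces the "strict" inequality on right branches,
# keeping the MIP solution consistent with the tree prediction at split points.
gdp_formulation = LinearTreeGDPFormulation(
    ltmodel, transformation="hull", epsilon=1e-4
)
hybrid_formulation = LinearTreeHybridBigMFormulation(ltmodel, epsilon=1e-4)

m = pyo.ConcreteModel()
m.lt = OmltBlock()
m.lt.build_formulation(gdp_formulation)
```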
intermediate_output is the output of tree t and - # the total output of the model is the sum of the intermediate_output vars - block.z = pe.Var(t_l, within=pe.Binary) - block.intermediate_output = pe.Var(tree_ids) - - @block.Constraint(features, tree_ids) - def lower_bound_constraints(mdl, feat, tree): - leaf_ids = list(leaves[tree].keys()) - return ( - sum( - leaves[tree][leaf]["bounds"][feat][0] * mdl.z[tree, leaf] - for leaf in leaf_ids - ) - <= input_vars[feat] - ) - - @block.Constraint(features, tree_ids) - def upper_bound_constraints(mdl, feat, tree): - leaf_ids = list(leaves[tree].keys()) - return ( - sum( - leaves[tree][leaf]["bounds"][feat][1] * mdl.z[tree, leaf] - for leaf in leaf_ids - ) - >= input_vars[feat] - ) - - @block.Constraint(tree_ids) - def linear_constraint(mdl, tree): - leaf_ids = list(leaves[tree].keys()) - return block.intermediate_output[tree] == sum( - ( - sum( - leaves[tree][leaf]["slope"][feat] * input_vars[feat] - for feat in features - ) - + leaves[tree][leaf]["intercept"] - ) - * block.z[tree, leaf] - for leaf in leaf_ids - ) - - @block.Constraint(tree_ids) - def only_one_leaf_per_tree(b, tree): - leaf_ids = list(leaves[tree].keys()) - return sum(block.z[tree, leaf] for leaf in leaf_ids) == 1 - - @block.Constraint() - def output_sum_of_trees(b): - return output_vars[0] == sum( - block.intermediate_output[tree] for tree in tree_ids - ) diff --git a/src/omlt/neuralnet/activations/__init__.py b/src/omlt/neuralnet/activations/__init__.py index 740022ad..07e1e7b9 100644 --- a/src/omlt/neuralnet/activations/__init__.py +++ b/src/omlt/neuralnet/activations/__init__.py @@ -7,6 +7,8 @@ """ from typing import Any +from typing import Any + from .linear import linear_activation_constraint, linear_activation_function from .relu import ComplementarityReLUActivation, bigm_relu_activation_constraint from .smooth import ( diff --git a/src/omlt/neuralnet/activations/relu.py b/src/omlt/neuralnet/activations/relu.py index 733abb91..b02b0591 100644 --- a/src/omlt/neuralnet/activations/relu.py +++ b/src/omlt/neuralnet/activations/relu.py @@ -116,7 +116,7 @@ def __init__(self, transform=None): transform = "mpec.simple_nonlinear" self.transform = transform - def __call__(self, net_block, net, layer_block, layer): + def __call__(self, net_block, net, layer_block, layer): # noqa: ARG002 layer_block._complementarity = mpec.Complementarity( layer.output_indexes, rule=_relu_complementarity ) diff --git a/src/omlt/neuralnet/layer.py b/src/omlt/neuralnet/layer.py index d7a52750..3b6faf09 100644 --- a/src/omlt/neuralnet/layer.py +++ b/src/omlt/neuralnet/layer.py @@ -17,6 +17,7 @@ """ import itertools +from typing import ClassVar import numpy as np @@ -195,7 +196,7 @@ class DenseLayer(Layer): map indexes from this layer index to the input layer index size """ - def __init__( + def __init__( # noqa: PLR0913 self, input_size, output_size, @@ -321,7 +322,7 @@ class GNNLayer(DenseLayer): map indexes from this layer index to the input layer index size """ - def __init__( + def __init__( # noqa: PLR0913 self, input_size, output_size, @@ -380,7 +381,6 @@ def _eval_with_adjacency(self, x, A): if self.input_index_mapper is not None else x[:] ) - assert x_reshaped.shape == tuple(self.input_size) y = np.zeros(shape=self.output_size) for output_index in self.output_indexes: for input_index in self.input_indexes: @@ -447,7 +447,7 @@ def kernel_depth(self): """Return the depth of the kernel.""" raise NotImplementedError - def kernel_index_with_input_indexes(self, out_d, out_r, out_c): + def 
kernel_index_with_input_indexes(self, out_d, out_r, out_c): # noqa: ARG002 """Kernel index with input indexes. Returns an iterator over the index within the kernel and input index @@ -468,16 +468,12 @@ def kernel_index_with_input_indexes(self, out_d, out_r, out_c): start_in_d = 0 start_in_r = out_r * rows_stride start_in_c = out_c * cols_stride - mapper = lambda x: x - if self.input_index_mapper is not None: - mapper = self.input_index_mapper for k_d in range(kernel_d): for k_r in range(kernel_r): for k_c in range(kernel_c): input_index = (start_in_d + k_d, start_in_r + k_r, start_in_c + k_c) - assert len(input_index) == len(self.input_size) # don't yield an out-of-bounds input index; # can happen if ceil mode is enabled for pooling layers # as this could require using a partial kernel @@ -542,9 +538,9 @@ class PoolingLayer2D(Layer2D): map indexes from this layer index to the input layer index size """ - _POOL_FUNCTIONS = {"max": max} + _POOL_FUNCTIONS: ClassVar = {"max": max} - def __init__( + def __init__( # noqa: PLR0913 self, input_size, output_size, @@ -618,7 +614,7 @@ class ConvLayer2D(Layer2D): map indexes from this layer index to the input layer index size """ - def __init__( + def __init__( # noqa: PLR0913 self, input_size, output_size, @@ -677,7 +673,8 @@ def __str__(self): return ( f"ConvLayer(input_size={self.input_size}, output_size={self.output_size}," f" strides={self.strides}, kernel_shape={self.kernel_shape})" - ) + ) + def _eval_at_index(self, x, out_d, out_r, out_c): acc = 0.0 for k, index in self.kernel_with_input_indexes(out_d, out_r, out_c): diff --git a/src/omlt/neuralnet/layers/full_space.py b/src/omlt/neuralnet/layers/full_space.py index 25fd2dbb..978f1ce2 100644 --- a/src/omlt/neuralnet/layers/full_space.py +++ b/src/omlt/neuralnet/layers/full_space.py @@ -238,25 +238,24 @@ def full_space_maxpool2d_layer(net_block, net, layer_block, layer): \end{align*} where: + :math:`w` is the convolution kernel on the preceding convolutional layer; :math:`d` is the number of features in each of the :math:`N` max pooling windows; :math:`x_{i}` is the set of :math:`d` features in the :math:`i`-th max pooling - window; + window; :math:`\Delta^{d}` is the :math:`d`-dimensional simplex; and [L_{i},U_{i}] are the - bounds on x_{i}. + bounds on x_{i}. NOTE This formulation is adapted from the Anderson et al. (2020) formulation, section 5.1, with the following changes: - OMLT presently does not support biases on convolutional layers. Bias terms from - the original formulation are removed. - + the original formulation are removed. - The original formulation formulates the max of :math:`w^{l}\cdot x + b^{l}`, varying the weights :math:`w` and biases :math:`b` and keeping the input :math:`x` constant. Since convolutional layers have constant weights and biases convolved with varying portions of the feature map, this formulation formulates the max of :math:`w\cdot x^{l} + b`. - - Due to the above 2 changes, the calculation of :math:`N^{l,k}` is changed. """ @@ -270,9 +269,6 @@ def full_space_maxpool2d_layer(net_block, net, layer_block, layer): " are not supported." 
) raise ValueError(msg) - # TODO @cog-imperial: add support for non-increasing activation functions on - # preceding convolutional layer - # https://github.com/cog-imperial/OMLT/issues/154 # note kernel indexes are the same set of values for any output index, so wlog get # kernel indexes for (0, 0, 0) @@ -318,30 +314,32 @@ def full_space_maxpool2d_layer(net_block, net, layer_block, layer): == 1 ) - for l, input_index in layer.kernel_index_with_input_indexes( + for layer_index, input_index in layer.kernel_index_with_input_indexes( out_d, out_r, out_c ): mapped_input_index = layer.input_index_mapper(input_index) # Since biases are zero, # input_layer_block.z[input_index] is equal to w dot x in the formulation. - layer_block._zhat_upper_bound[output_index, l] = layer_block.zhat[ + layer_block._zhat_upper_bound[output_index, layer_index] = layer_block.zhat[ output_index ] <= input_layer_block.z[mapped_input_index] + sum( layer_block.q_maxpool[output_index, k] - * _calculate_n_plus(output_index, l, k, layer, input_layer_block) + * _calculate_n_plus( + output_index, layer_index, k, layer, input_layer_block + ) for k in layer_block._kernel_indexes ) - layer_block._zhat_lower_bound[output_index, l] = ( + layer_block._zhat_lower_bound[output_index, layer_index] = ( layer_block.zhat[output_index] >= input_layer_block.z[mapped_input_index] ) -def _calculate_n_plus(out_index, l, k, layer, input_layer_block): - if l == k: +def _calculate_n_plus(out_index, kernel_index, k, layer, input_layer_block): + if kernel_index == k: return 0 - x_l_index = layer.input_index_mapper(layer.get_input_index(out_index, l)) + x_l_index = layer.input_index_mapper(layer.get_input_index(out_index, kernel_index)) x_k_index = layer.input_index_mapper(layer.get_input_index(out_index, k)) return max( x_k_bound - x_l_bound diff --git a/src/omlt/neuralnet/layers/partition_based.py b/src/omlt/neuralnet/layers/partition_based.py index 1430332a..f5d7d06c 100644 --- a/src/omlt/neuralnet/layers/partition_based.py +++ b/src/omlt/neuralnet/layers/partition_based.py @@ -14,7 +14,7 @@ def default_partition_split_func(w, n): return np.array_split(sorted_indexes, n) -def partition_based_dense_relu_layer(net_block, net, layer_block, layer, split_func): +def partition_based_dense_relu_layer(net_block, net, layer_block, layer, split_func): # noqa: C901, PLR0915 r"""Partition-based ReLU activation formulation. Generates the constraints for the ReLU activation function: @@ -75,7 +75,7 @@ def partition_based_dense_relu_layer(net_block, net, layer_block, layer, split_f prev_layer_block = net_block.layer[id(prev_layer)] @layer_block.Block(layer.output_indexes) - def output_node_block(b, *output_index): + def output_node_block(b, *output_index): # noqa: PLR0915 # dense layers multiply only the last dimension of # their inputs weights = layer.weights[:, output_index[-1]] diff --git a/src/omlt/neuralnet/nn_formulation.py b/src/omlt/neuralnet/nn_formulation.py index 8e835d23..bed521b2 100644 --- a/src/omlt/neuralnet/nn_formulation.py +++ b/src/omlt/neuralnet/nn_formulation.py @@ -1,3 +1,5 @@ +from functools import partial + import pyomo.environ as pyo from omlt.formulation import _PyomoFormulation, _setup_scaled_inputs_outputs @@ -136,7 +138,7 @@ def output_indexes(self): return network_outputs[0].output_indexes -def _build_neural_network_formulation( +def _build_neural_network_formulation( # noqa: C901 block, network_structure, layer_constraints, activation_constraints ): """Adds the neural network formulation to the given Pyomo block. 
@@ -394,20 +396,12 @@ def z(b, *output_index): @block.Constraint(output_layer.output_indexes) def output_assignment(b, *output_index): - pb = b.parent_block() + b.parent_block() return ( b.scaled_outputs[output_index] == b.layer[id(output_layer)].z[output_index] ) - # @property - # def layer_constraints(self): - # return self._layer_constraints - - # @property - # def activation_constraints(self): - # return self._activation_constraints - @property def input_indexes(self): """The indexes of the formulation inputs.""" @@ -471,11 +465,11 @@ def __init__(self, network_structure, split_func=None): self.__scaled_input_bounds = network_structure.scaled_input_bounds if split_func is None: - split_func = lambda w: default_partition_split_func(w, 2) + split_func = partial(default_partition_split_func, n=2) self.__split_func = split_func - def _build_formulation(self): + def _build_formulation(self): # noqa: C901 _setup_scaled_inputs_outputs( self.block, self.__scaling_object, self.__scaled_input_bounds ) diff --git a/src/omlt/scaling.py b/src/omlt/scaling.py index 5ffaafbe..443f14e3 100644 --- a/src/omlt/scaling.py +++ b/src/omlt/scaling.py @@ -120,8 +120,7 @@ def get_unscaled_input_expressions(self, scaled_input_vars): scaled_x = scaled_input_vars return { - k: scaled_x[k] * self.__x_factor[k] + self.__x_offset[k] - for k in scaled_x + k: scaled_x[k] * self.__x_factor[k] + self.__x_offset[k] for k in scaled_x } def get_scaled_output_expressions(self, output_vars): @@ -163,6 +162,5 @@ def get_unscaled_output_expressions(self, scaled_output_vars): scaled_y = scaled_output_vars return { - k: scaled_y[k] * self.__y_factor[k] + self.__y_offset[k] - for k in scaled_y + k: scaled_y[k] * self.__y_factor[k] + self.__y_offset[k] for k in scaled_y } diff --git a/tests/io/test_onnx_parser.py b/tests/io/test_onnx_parser.py index 3227e67d..fc74b34c 100644 --- a/tests/io/test_onnx_parser.py +++ b/tests/io/test_onnx_parser.py @@ -287,6 +287,7 @@ def test_consume_maxpool_wrong_dims(datadir): parser._nodes["node1"][1].input.append("abcd") expected_msg_maxpool = ( "node1 input has 2 dimensions, only nodes with 1 input " - "dimension can be used as starting points for parsing.") - with pytest.raises(ValueError, match = expected_msg_maxpool): + "dimension can be used as starting points for parsing." 
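The `functools.partial` call introduced above is equivalent to the removed lambda; a small sketch under that assumption, importing `default_partition_split_func` from its module path as shown earlier in this patch.

```python
# Equivalence sketch: partial(default_partition_split_func, n=2) behaves like
# the old `lambda w: default_partition_split_func(w, 2)`.
from functools import partial

import numpy as np

from omlt.neuralnet.layers.partition_based import default_partition_split_func

split_func = partial(default_partition_split_func, n=2)

w = np.array([0.5, -1.0, 0.25, 2.0])
assert all(
    np.array_equal(a, b)
    for a, b in zip(split_func(w), default_partition_split_func(w, 2))
)
```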
+ ) + with pytest.raises(ValueError, match=expected_msg_maxpool): parser._consume_pool_nodes(parser._nodes["node1"][1], parser._nodes["node1"][2]) diff --git a/tests/io/test_torch_geometric.py b/tests/io/test_torch_geometric.py index be098406..53d9db58 100644 --- a/tests/io/test_torch_geometric.py +++ b/tests/io/test_torch_geometric.py @@ -75,7 +75,7 @@ def _test_torch_geometric_reader(nn, activation, pooling): A = np.ones((N, N), dtype=int) net = load_torch_geometric_sequential(nn, N, A) layers = list(net.layers) - assert len(layers) == 7 + assert len(layers) == 7 # noqa: PLR2004 assert layers[1].weights.shape == (8, 16) assert layers[2].weights.shape == (16, 16) assert layers[3].weights.shape == (16, 16) @@ -112,8 +112,8 @@ def _test_gnn_with_fixed_graph(nn): m.nn = OmltBlock() A = np.eye(N, dtype=int) gnn_with_fixed_graph(m.nn, nn, N, A, scaled_input_bounds=input_bounds) - assert m.nvariables() == 282 - assert m.nconstraints() == 614 + assert m.nvariables() == 282 # noqa: PLR2004 + assert m.nconstraints() == 614 # noqa: PLR2004 @pytest.mark.skipif( @@ -131,8 +131,8 @@ def _test_gnn_with_non_fixed_graph(nn): m = pyo.ConcreteModel() m.nn = OmltBlock() gnn_with_non_fixed_graph(m.nn, nn, N, scaled_input_bounds=input_bounds) - assert m.nvariables() == 282 - assert m.nconstraints() == 620 + assert m.nvariables() == 282 # noqa: PLR2004 + assert m.nconstraints() == 620 # noqa: PLR2004 @pytest.mark.skipif( @@ -189,7 +189,7 @@ def _test_gnn_value_error(nn, error_info, error_type="ValueError"): for i in range(input_size[0]): input_bounds[(i)] = (-1.0, 1.0) if error_type == "ValueError": - with pytest.raises(ValueError) as excinfo: + with pytest.raises(ValueError) as excinfo: # noqa: PT011 load_torch_geometric_sequential( nn=nn, N=N, diff --git a/tests/linear_tree/test_lt_formulation.py b/tests/linear_tree/test_lt_formulation.py index 30e3a1a2..e9959133 100644 --- a/tests/linear_tree/test_lt_formulation.py +++ b/tests/linear_tree/test_lt_formulation.py @@ -5,6 +5,8 @@ if lineartree_available: from lineartree import LinearTreeRegressor + from sklearn.linear_model import LinearRegression + from omlt.linear_tree import ( LinearTreeDefinition, LinearTreeGDPFormulation, @@ -84,7 +86,7 @@ def linear_model_tree(X, y): @pytest.mark.skipif(not lineartree_available, reason="Need Linear-Tree Package") -def test_linear_tree_model_single_var(): +def test_linear_tree_model_single_var(): # noqa: C901 # construct a LinearTreeDefinition regr_small = linear_model_tree(X=X_small, y=y_small) input_bounds = {0: (min(X_small)[0], max(X_small)[0])} @@ -192,6 +194,60 @@ def connect_outputs(mdl): assert y_pred[0] == pytest.approx(solution_1_bigm[1]) +def get_epsilon_test_model(formulation_lt): + model1 = pe.ConcreteModel() + model1.x = pe.Var(initialize=0) + model1.y = pe.Var(initialize=0) + model1.obj = pe.Objective(expr=model1.y, sense=pe.maximize) + model1.lt = OmltBlock() + model1.lt.build_formulation(formulation_lt) + + @model1.Constraint() + def connect_inputs(mdl): + return mdl.x == mdl.lt.inputs[0] + + @model1.Constraint() + def connect_outputs(mdl): + return mdl.y == mdl.lt.outputs[0] + + model1.x.fix(1.058749) + + return model1 + + +@pytest.mark.skipif( + not lineartree_available or not cbc_available, + reason="Need Linear-Tree Package and cbc", +) +def test_nonzero_epsilon(): + regr_small = linear_model_tree(X=X_small, y=y_small) + input_bounds = {0: (min(X_small)[0], max(X_small)[0])} + ltmodel_small = LinearTreeDefinition(regr_small, unscaled_input_bounds=input_bounds) + formulation_bad = 
LinearTreeGDPFormulation( + ltmodel_small, transformation="bigm", epsilon=0 + ) + formulation1_lt = LinearTreeGDPFormulation( + ltmodel_small, transformation="bigm", epsilon=1e-4 + ) + + model_good = get_epsilon_test_model(formulation1_lt) + model_bad = get_epsilon_test_model(formulation_bad) + + status_1_bigm = pe.SolverFactory("cbc").solve(model_bad) + pe.assert_optimal_termination(status_1_bigm) + solution_1_bigm = (pe.value(model_bad.x), pe.value(model_bad.y)) + y_pred = regr_small.predict(np.array(solution_1_bigm[0]).reshape(1, -1)) + # Without an epsilon, the model cheats and does not match the tree prediction + assert y_pred[0] != pytest.approx(solution_1_bigm[1]) + + status = pe.SolverFactory("cbc").solve(model_good) + pe.assert_optimal_termination(status) + solution = (pe.value(model_good.x), pe.value(model_good.y)) + y_pred = regr_small.predict(np.array(solution[0]).reshape(1, -1)) + # With epsilon, the model matches the tree prediction + assert y_pred[0] == pytest.approx(solution[1]) + + @pytest.mark.skipif( not lineartree_available or not cbc_available, reason="Need Linear-Tree Package and cbc", @@ -381,7 +437,7 @@ def test_scaling(): @pytest.mark.skipif(not lineartree_available, reason="Need Linear-Tree Package") -def test_linear_tree_model_multi_var(): +def test_linear_tree_model_multi_var(): # noqa: C901 # construct a LinearTreeDefinition regr = linear_model_tree(X=X, y=Y) input_bounds = {0: (min(X[:, 0]), max(X[:, 0])), 1: (min(X[:, 1]), max(X[:, 1]))} @@ -626,7 +682,7 @@ def connect_outputs(mdl): @pytest.mark.skipif(not lineartree_available, reason="Need Linear-Tree Package") -def test_summary_dict_as_argument(): +def test_summary_dict_as_argument(): # noqa: C901 # construct a LinearTreeDefinition regr = linear_model_tree(X=X, y=Y) input_bounds = {0: (min(X[:, 0]), max(X[:, 0])), 1: (min(X[:, 1]), max(X[:, 1]))} diff --git a/tests/neuralnet/test_network_definition.py b/tests/neuralnet/test_network_definition.py index ee073c5e..a18d7eea 100644 --- a/tests/neuralnet/test_network_definition.py +++ b/tests/neuralnet/test_network_definition.py @@ -12,8 +12,6 @@ ALMOST_EXACTLY_EQUAL = 1e-8 -# TODO @cog-imperial: Build more tests with different activations and edge cases -# https://github.com/cog-imperial/OMLT/issues/158 def test_two_node_full_space(): """Two node full space network. diff --git a/tests/neuralnet/test_nn_formulation.py b/tests/neuralnet/test_nn_formulation.py index 315bb176..2ac66377 100644 --- a/tests/neuralnet/test_nn_formulation.py +++ b/tests/neuralnet/test_nn_formulation.py @@ -1,4 +1,6 @@ import re +from functools import partial +from typing import TYPE_CHECKING import numpy as np import pyomo.environ as pyo @@ -48,6 +50,25 @@ THREE_NODE_VARS = 81 THREE_NODE_CONSTRAINTS = 120 +if TYPE_CHECKING: + from omlt.formulation import _PyomoFormulation + +formulations = { + "FullSpace": FullSpaceNNFormulation, + "ReducedSpace": ReducedSpaceNNFormulation, + "relu": ReluPartitionFormulation, +} + +NEAR_EQUAL = 1e-6 +FULLSPACE_SMOOTH_VARS = 15 +FULLSPACE_SMOOTH_CONSTRAINTS = 14 +FULLSPACE_RELU_VARS = 19 +FULLSPACE_RELU_CONSTRAINTS = 26 +REDUCED_VARS = 6 +REDUCED_CONSTRAINTS = 5 +THREE_NODE_VARS = 81 +THREE_NODE_CONSTRAINTS = 120 + def two_node_network(activation, input_value): """Two node network. 
@@ -344,8 +365,6 @@ def test_maxpool_full_space_nn_formulation(): net, y = _maxpool_conv_network(inputs) m.neural_net_block.build_formulation(FullSpaceNNFormulation(net)) - # assert m.nvariables() == 15 - # assert m.nconstraints() == 14 for inputs_d in range(inputs.shape[0]): for inputs_r in range(inputs.shape[1]): @@ -523,7 +542,7 @@ def test_partition_based_unbounded_below(): prev_layer_block = m.neural_net_block.layer[prev_layer_id] prev_layer_block.z.setlb(-interval.inf) - split_func = lambda w: default_partition_split_func(w, 2) + split_func = partial(default_partition_split_func, n=2) expected_msg = "Expression is unbounded below." with pytest.raises(ValueError, match=expected_msg): @@ -544,7 +563,7 @@ def test_partition_based_unbounded_above(): prev_layer_block = m.neural_net_block.layer[prev_layer_id] prev_layer_block.z.setub(interval.inf) - split_func = lambda w: default_partition_split_func(w, 2) + split_func = partial(default_partition_split_func, n=2) expected_msg = "Expression is unbounded above." with pytest.raises(ValueError, match=expected_msg): @@ -563,7 +582,7 @@ def test_partition_based_bias_unbounded_below(): m.neural_net_block.build_formulation(formulation) test_layer.biases[0] = -interval.inf - split_func = lambda w: default_partition_split_func(w, 2) + split_func = partial(default_partition_split_func, n=2) expected_msg = "Expression is unbounded below." with pytest.raises(ValueError, match=expected_msg): @@ -582,7 +601,7 @@ def test_partition_based_bias_unbounded_above(): m.neural_net_block.build_formulation(formulation) test_layer.biases[0] = interval.inf - split_func = lambda w: default_partition_split_func(w, 2) + split_func = partial(default_partition_split_func, n=2) expected_msg = "Expression is unbounded above." with pytest.raises(ValueError, match=expected_msg): partition_based_dense_relu_layer( diff --git a/tests/neuralnet/test_onnx.py b/tests/neuralnet/test_onnx.py index 7d33675f..b2ee298e 100644 --- a/tests/neuralnet/test_onnx.py +++ b/tests/neuralnet/test_onnx.py @@ -7,6 +7,7 @@ if onnx_available: import onnxruntime as ort + from omlt.io.onnx import ( load_onnx_neural_network, load_onnx_neural_network_with_bounds, diff --git a/tests/neuralnet/train_keras_models.py b/tests/neuralnet/train_keras_models.py index 81469c6a..fdcad286 100644 --- a/tests/neuralnet/train_keras_models.py +++ b/tests/neuralnet/train_keras_models.py @@ -3,11 +3,13 @@ from keras.layers import Conv2D, Dense from keras.models import Sequential from keras.optimizers import Adamax +from pyomo.common.fileutils import this_file_dir + from omlt.io import write_onnx_model_with_bounds from pyomo.common.fileutils import this_file_dir -def train_models(): +def train_models(): # noqa: PLR0915 x, y, x_test = get_neural_network_data("131") nn = Sequential(name="keras_linear_131") nn.add( @@ -34,9 +36,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/keras_linear_131.keras") x, y, x_test = get_neural_network_data("131") @@ -66,9 +66,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/keras_linear_131_sigmoid.keras") x, y, x_test = 
get_neural_network_data("131") @@ -99,9 +97,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save( this_file_dir() + "/models/keras_linear_131_sigmoid_output_activation.keras" ) @@ -133,9 +129,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/keras_linear_131_relu.keras") x, y, x_test = get_neural_network_data("131") @@ -166,9 +160,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/keras_linear_131_relu_output_activation.keras") x, y, x_test = get_neural_network_data("131") @@ -199,9 +191,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save( this_file_dir() + "/models/keras_linear_131_sigmoid_softplus_output_activation.keras" @@ -260,9 +250,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/big.keras") x, y, x_test = get_neural_network_data("2353") @@ -302,9 +290,7 @@ def train_models(): ) ) nn.compile(optimizer=Adamax(learning_rate=0.01), loss="mae") - nn.fit( - x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15 - ) + nn.fit(x=x, y=y, validation_split=0.2, batch_size=16, verbose=1, epochs=15) nn.save(this_file_dir() + "/models/keras_linear_2353.keras") @@ -336,7 +322,7 @@ def train_conv(): input_bounds[0, i, j] = (0.0, 1.0) with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f: write_onnx_model_with_bounds(f.name, onnx_model, input_bounds) - print(f"Wrote ONNX model with bounds at {f.name}") + print(f"Wrote ONNX model with bounds at {f.name}") # noqa: T201 if __name__ == "__main__": diff --git a/tests/test_block.py b/tests/test_block.py index 9711345c..ed19449b 100644 --- a/tests/test_block.py +++ b/tests/test_block.py @@ -69,17 +69,17 @@ def test_input_output_auto_creation(): formulation1 = DummyFormulation() formulation1._clear_inputs() expected_msg = ( - "OmltBlock must have at least one input to build a formulation. " - f"{formulation1} has no inputs." - ) + "OmltBlock must have at least one input to build a formulation. " + f"{formulation1} has no inputs." + ) with pytest.raises(ValueError, match=expected_msg): m.b3.build_formulation(formulation1) formulation2 = DummyFormulation() formulation2._clear_outputs() expected_msg = ( - "OmltBlock must have at least one output to build a formulation. " - f"{formulation2} has no outputs." - ) + "OmltBlock must have at least one output to build a formulation. " + f"{formulation2} has no outputs." 
+ ) with pytest.raises(ValueError, match=expected_msg): m.b3.build_formulation(formulation2) diff --git a/tests/test_formulation.py b/tests/test_formulation.py index 155e596c..e2639ba0 100644 --- a/tests/test_formulation.py +++ b/tests/test_formulation.py @@ -1,4 +1,6 @@ import pytest +from pyomo.environ import ConcreteModel, Objective, SolverFactory, value + from omlt import OmltBlock from omlt.formulation import _setup_scaled_inputs_outputs from omlt.scaling import OffsetScaling