-
Notifications
You must be signed in to change notification settings - Fork 131
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
added environment variable to disable extensions autoload (#1095)
* added environment variable to disable extensions autoload --------- Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
- Loading branch information
Showing
6 changed files
with
247 additions
and
54 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
===================== | ||
Extension autoloading | ||
===================== | ||
|
||
Under ``hamilton.plugins``, there are many modules named ``*_extensions`` (e.g., ``hamilton.plugins.pandas_extensions``, ``hamilton.plugins.mlflow_extensions``). They implement Hamilton features for 3rd party libraries, including ``@extract_columns``, materializers (``to.parquet``, ``from_.mlflow``), and more. | ||
|
||
|
||
Autoloading behavior | ||
-------------------- | ||
|
||
By default, Hamilton attempts to load all extensions one-by-one. This means that as you have more Python packages in your environment (e.g., ``pandas``, ``pyspark``, ``mlflow``, ``xgboost``), importing Hamilton appears to become slower because it actually imports many packages. | ||
|
||
This behavior can be less desirable when your Hamilton dataflow doesn't use any of these packages, but you need them in your Python environment nonetheless. For example, if only ``pandas`` is needed for your dataflow, but you have ``mlflow`` and ``xgboost`` in your environment their respective extensions will be loaded each time. | ||
|
||
|
||
Disable autoloading | ||
-------------------- | ||
|
||
Disabling extension autoloading allows to import Hamilton without any extensions, which can reduce import time from 2-3 sec to less than 0.5 sec. This speedup is welcomed when you need to restart a notebook's kernel often or you're operating in a low RAM environment (some Python packages are larger than 50Mbs). | ||
|
||
There are three ways to opt-out: programmatically, environment variables, configuration file. You must opt-out before having any other ``hamilton`` import. | ||
|
||
1. Programmatically | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. code-block:: python | ||
from hamilton import registry | ||
registry.disable_autoload() | ||
2. Environment variables | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
From the console | ||
|
||
.. code-block:: console | ||
export HAMILTON_AUTOLOAD_EXTENSIONS=0 | ||
Programmatically via Python ``os.environ``. | ||
|
||
.. code-block:: python | ||
import os | ||
os.environ["HAMILTON_AUTOLOAD_EXTENSIONS"] = "0" | ||
Programmatically in Jupyter notebooks | ||
|
||
.. code-block:: python | ||
%env HAMILTON_AUTOLOAD_EXTENSIONS=0 | ||
3. Configuration file | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Using the following command disables autoloading via the configuration file ``./hamilton.conf``. Hamilton won't autoload extensions anymore (i.e., you won't need to use approach 1 or 2 each time). | ||
|
||
.. code-block:: console | ||
hamilton-disable-autoload-extensions | ||
To revert this configuration use the following command | ||
|
||
.. code-block:: console | ||
hamilton-enable-autoload-extensions | ||
To reenable autoloading in specific files, you can delete the environment variable or use ``registry.enable_autoload()`` before calling ``registry.initialize()`` | ||
|
||
.. code-block:: python | ||
from hamilton import registry | ||
registry.enable_autoload() | ||
registry.initialize() | ||
Manually loading extensions | ||
---------------------------- | ||
|
||
If you disabled autoloading, extensions need to be loaded manually. You should load them before having any other ``hamilton`` import to avoid hard-to-track bugs. There are two ways. | ||
|
||
1. Importing the extension | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
.. code-block:: python | ||
from hamilton.plugins import pandas_extensions, mlflow_extensions | ||
2. Registering the extension | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
This approach has good IDE support via ``typing.Literal`` | ||
|
||
.. code-block:: python | ||
from hamilton import registry | ||
registry.load_extensions("mlflow") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
import pytest | ||
|
||
from hamilton import registry | ||
|
||
|
||
@pytest.mark.parametrize("entrypoint", ["config_disable_autoload", "config_enable_autoload"]) | ||
def test_command_entrypoints_arent_renamed(entrypoint: str): | ||
"""Ensures that functions associated with an entrypoint in | ||
pyproject.toml aren't renamed. | ||
This doesn't prevent the entrypoints from being renamed | ||
""" | ||
assert hasattr(registry, entrypoint) |