Update documentation of hooks LIFO order #3013

Merged: 27 commits, Sep 18, 2023
Changes from 22 commits
Commits
9d4d1b0
Update release note
noklam Sep 6, 2023
6c079b3
Merge branch 'main' into noklam/document-the-lifo-order-1940
noklam Sep 6, 2023
2b2d51e
add placeholder
noklam Sep 6, 2023
cf6e755
update link
noklam Sep 6, 2023
2983609
add hook execution order
noklam Sep 7, 2023
8cca7e3
update template
noklam Sep 7, 2023
8c87790
Update template
noklam Sep 7, 2023
d274023
add pyproject.toml example
noklam Sep 7, 2023
37b0fc6
update plugin list
noklam Sep 7, 2023
3bcbd05
Merge branch 'main' into noklam/document-the-lifo-order-1940
noklam Sep 7, 2023
ead8abe
Introducing a weasel word
stichbury Sep 8, 2023
661a890
Further changes to Hooks docs
stichbury Sep 8, 2023
77ad03f
fix typo with review suggestions.
noklam Sep 8, 2023
66c1f1e
Apply suggestions from code review
noklam Sep 8, 2023
fcc8672
Add a new command `make language-lint` for doc lint. Trigger only man…
noklam Sep 8, 2023
5511c97
Merge branch 'noklam/document-the-lifo-order-1940' of https://github.…
noklam Sep 8, 2023
424c14f
fix makefile
noklam Sep 8, 2023
6a83c1b
fix lint
noklam Sep 8, 2023
7db3db2
Fix broken links
noklam Sep 8, 2023
f253436
Fix links
noklam Sep 8, 2023
930d1cf
Remove setup.py mention, reference it to 0.18.13 documentation
noklam Sep 8, 2023
18746e5
Merge branch 'main' into noklam/document-the-lifo-order-1940
noklam Sep 13, 2023
1ffb3b3
remove debug message
noklam Sep 14, 2023
38c5e3a
Fix release note
noklam Sep 14, 2023
4f45096
Apply suggestions from code review
noklam Sep 14, 2023
7460c59
Apply suggestions from code review
noklam Sep 14, 2023
1e4bce4
Merge branch 'main' into noklam/document-the-lifo-order-1940
noklam Sep 18, 2023
6 changes: 6 additions & 0 deletions Makefile
@@ -70,3 +70,9 @@ sign-off:
echo '--trailer "Signed-off-by: $$(git config user.name) <$$(git config user.email)>" \c' >> .git/hooks/commit-msg
echo '--in-place "$$1"' >> .git/hooks/commit-msg
chmod +x .git/hooks/commit-msg

language-lint: dir ?= docs

# Allow "make language-lint dir=docs/source/hooks" syntax
language-lint:
vale $(dir)
1 change: 1 addition & 0 deletions RELEASE.md
@@ -16,6 +16,7 @@
* Updated dataset factories to resolve nested catalog config properly.

## Documentation changes
* Added documentation to clarify execution orders of hooks.
## Breaking changes to the API
## Upcoming deprecations for Kedro 0.19.0
## Community contributions
53 changes: 23 additions & 30 deletions docs/source/extend_kedro/plugins.md
@@ -29,13 +29,12 @@ def to_json(metadata):
pipeline = pipelines["__default__"]
print(pipeline.to_json())
```
From version 0.18.14, Kedro replaced `setup.py` with `pyproject.toml`. The plugin needs to provide entry points in either file. If you are using `setup.py`, please refer to the [`0.18.13` version of the documentation](https://docs.kedro.org/en/0.18.13/extend_kedro/plugins.html).

The plugin provides the following `entry_points` config in `setup.py`:

```python
setup(
entry_points={"kedro.project_commands": ["kedrojson = kedrojson.plugin:commands"]}
)
To add the entry point to `pyproject.toml`, the plugin needs to provide the following `entry_points` configuration:
```toml
[project.entry-points."kedro.project_commands"]
kedrojson = kedrojson.plugin.commands
Member
oops, typo

Contributor Author
I did test it with the plugin hooks🥲 maybe I mess it up when I copy-paste it back to the doc. Thanks for spotting this.

```
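
The typo flagged in this review thread can be checked with the standard library alone: an entry point value has the form `module:attribute`, and the colon is what separates the importable module from the object inside it. The sketch below is illustrative only. It builds `EntryPoint` objects by hand instead of reading a real `pyproject.toml`, it assumes Python 3.9 or later (for the `module` and `attr` properties), and `split_entry_point` is a hypothetical helper name that is not part of Kedro.

```python
from importlib.metadata import EntryPoint


def split_entry_point(value: str):
    """Return (module, attr) as importlib.metadata parses an entry point value."""
    # Name and group mirror the kedrojson example; only `value` matters here.
    ep = EntryPoint(name="kedrojson", value=value, group="kedro.project_commands")
    return ep.module, ep.attr


# With a colon, the module and the object inside it are separated.
with_colon = split_entry_point("kedrojson.plugin:commands")

# Without a colon, the whole dotted path is treated as a module name and
# no attribute is recorded, so loading would try to import a module
# called `kedrojson.plugin.commands`.
without_colon = split_entry_point("kedrojson.plugin.commands")
```

In `pyproject.toml` the value is also a TOML string, so it would normally be written in quotes, for example `kedrojson = "kedrojson.plugin:commands"`.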

Once the plugin is installed, you can run it as follows:
@@ -73,12 +72,11 @@ starters = [

The `directory` argument is optional and should be used when you have multiple templates in one repository as for the [official kedro-starters](https://github.com/kedro-org/kedro-starters). If you only have one template, your top-level directory will be treated as the template. For an example, see the [pandas-iris starter](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris).

In your `setup.py`, you need to register the specifications to `kedro.starters`.
In your `pyproject.toml`, you need to register the specifications to `kedro.starters`:

```python
setup(
entry_points={"kedro.starters": ["starter = plugin:starters"]},
)
```toml
[project.entry-points."kedro.starters"]
starter = plugin.starters
Member
Shouldn't this be plugin:starters? (Same typo as above)

```

After that you can use this starter with `kedro new --starter=test_plugin_starter`.
@@ -127,10 +125,14 @@ We use the following command convention: `kedro <plugin-name> <command>`, with `

## Hooks

You can develop hook implementations and have them automatically registered to the project context when the plugin is installed. To enable this for your custom plugin, simply add the following entry in your `setup.py`:
You can develop hook implementations and have them automatically registered to the project context when the plugin is installed.

```python
setup(entry_points={"kedro.hooks": ["plugin_name = plugin_name.plugin:hooks"]})
To enable this for your custom plugin, simply add the following entry in your `pyproject.toml`:

To use `pyproject.toml`, specify:
```toml
[project.entry-points."kedro.hooks"]
plugin_name = "plugin_name.plugin:hooks"
```

where `plugin.py` is the module where you declare hook implementations:
@@ -156,13 +158,15 @@ hooks = MyHooks()

## CLI Hooks

You can also develop hook implementations to extend Kedro's CLI behaviour in your plugin. To find available CLI hooks, please visit [kedro.framework.cli.hooks](/kedro.framework.cli.hooks). To register CLI hooks developed in your plugin with Kedro, add the following entry in your project's `setup.py`:
You can also develop hook implementations to extend Kedro's CLI behaviour in your plugin. To find available CLI hooks, please visit [kedro.framework.cli.hooks](/kedro.framework.cli.hooks). To register CLI hooks developed in your plugin with Kedro, add the following entry in your project's `pyproject.toml`:

```python
setup(entry_points={"kedro.cli_hooks": ["plugin_name = plugin_name.plugin:cli_hooks"]})

```toml
[project.entry-points."kedro.cli_hooks"]
plugin_name = "plugin_name.plugin:cli_hooks"
```

where `plugin.py` is the module where you declare hook implementations:
(where `plugin.py` is the module where you declare hook implementations):

```python
import logging
@@ -204,28 +208,17 @@ connectors are implementations of the `AbstractDataset`

## Community-developed plugins

See the full list of plugins using the GitHub tag [kedro-plugin](https://github.com/topics/kedro-plugin).
There are many community-developed plugins available and a comprehensive list of plugins is published on the [`awesome-kedro`](https://github.com/kedro-org/awesome-kedro) GitHub repository. The list below is a small snapshot of some of those under active maintenance.


```{note}
Your plugin needs to have an [Apache 2.0 compatible license](https://www.apache.org/legal/resolved.html#category-a) to be considered for this list.
```

- [Kedro-Pandas-Profiling](https://github.com/BrickFrog/kedro-pandas-profiling), by [Justin Malloy](https://github.com/BrickFrog), uses [Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) to profile datasets in the Kedro catalog
- [find-kedro](https://github.com/WaylonWalker/find-kedro), by [Waylon Walker](https://github.com/WaylonWalker), automatically constructs pipelines using `pytest`-style pattern matching
- [kedro-static-viz](https://github.com/WaylonWalker/kedro-static-viz), by [Waylon Walker](https://github.com/WaylonWalker), generates a static [Kedro-Viz](https://github.com/kedro-org/kedro-viz) site (HTML, CSS, JS)
- [steel-toes](https://github.com/WaylonWalker/steel-toes), by [Waylon Walker](https://github.com/WaylonWalker), prevents stepping on toes by automatically branching data paths
- [kedro-wings](https://github.com/tamsanh/kedro-wings), by [Tam-Sanh Nguyen](https://github.com/tamsanh), simplifies and speeds up pipeline creation by auto-generating catalog datasets
- [kedro-great](https://github.com/tamsanh/kedro-great), by [Tam-Sanh Nguyen](https://github.com/tamsanh), integrates Kedro with [Great Expectations](https://greatexpectations.io), enabling catalog-based expectation generation and data validation on pipeline run
- [Kedro-Accelerator](https://github.com/deepyaman/kedro-accelerator), by [Deepyaman Datta](https://github.com/deepyaman), speeds up pipelines by parallelizing I/O in the background
- [kedro-dataframe-dropin](https://github.com/mzjp2/kedro-dataframe-dropin), by [Zain Patel](https://github.com/mzjp2), lets you swap out pandas datasets for modin or RAPIDs equivalents for specialised use to speed up your workflows (e.g on GPUs)
- [kedro-mlflow](https://github.com/Galileo-Galilei/kedro-mlflow), by [Yolan Honoré-Rougé](https://github.com/galileo-galilei) and [Takieddine Kadiri](https://github.com/takikadiri), facilitates [MLflow](https://www.mlflow.org/) integration within a Kedro project. Its main features are modular configuration, automatic parameters tracking, datasets versioning, Kedro pipelines packaging and serving and automatic synchronization between training and inference pipelines for high reproducibility of machine learning experiments and ease of deployment. A tutorial is provided in the [kedro-mlflow-tutorial repo](https://github.com/Galileo-Galilei/kedro-mlflow-tutorial). You can find more information in the [kedro-mlflow documentation](https://kedro-mlflow.readthedocs.io/en/stable/).
- [Kedro-Neptune](https://github.com/neptune-ai/kedro-neptune), by [Jakub Czakon](https://github.com/jakubczakon) and [Rafał Jankowski](https://github.com/Raalsky), lets you have all the benefits of a nicely organized Kedro pipeline with Neptune: a powerful user interface built for ML metadata management. It lets you browse and filter pipeline executions, compare nodes and pipelines on metrics and parameters, and visualize pipeline metadata like learning curves, node outputs, and charts. For more information, tutorials and videos, go to the [Kedro-Neptune documentation](https://docs.neptune.ai/integrations-and-supported-tools/automation-pipelines/kedro).
- [kedro-dolt](https://www.dolthub.com/blog/2021-06-16-kedro-dolt-plugin/), by [Max Hoffman](https://github.com/max-hoffman) and [Oscar Batori](https://github.com/oscarbatori), allows you to expand the data versioning abilities of data scientists and engineers
- [kedro-kubeflow](https://github.com/getindata/kedro-kubeflow), by [GetInData](https://github.com/getindata), lets you run and schedule pipelines on Kubernetes clusters using [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/overview/)
- [kedro-airflow-k8s](https://github.com/getindata/kedro-airflow-k8s), by [GetInData](https://github.com/getindata), enables running a Kedro pipeline with Airflow on a Kubernetes cluster
- [kedro-vertexai](https://github.com/getindata/kedro-vertexai), by [GetInData](https://github.com/getindata), enables running a Kedro pipeline with Vertex AI Pipelines service
- [kedro-azureml](https://github.com/getindata/kedro-azureml), by [GetInData](https://github.com/getindata), enables running a Kedro pipeline with Azure ML Pipelines service
- [kedro-sagemaker](https://github.com/getindata/kedro-sagemaker), by [GetInData](https://github.com/getindata), enables running a Kedro pipeline with Amazon SageMaker service
- [kedro-partitioned](https://github.com/ProjetaAi/kedro-partitioned), by [Gabriel Daiha Alves](https://github.com/gabrieldaiha) and [Nickolas da Rocha Machado](https://github.com/nickolasrm), extends the functionality on processing partitioned data.
- [kedro-auto-catalog](https://github.com/WaylonWalker/kedro-auto-catalog), by [Waylon Walker](https://github.com/WaylonWalker) A configurable replacement for `kedro catalog create` that allows you to create default dataset types other than MemoryDataset.
2 changes: 1 addition & 1 deletion docs/source/hooks/common_use_cases.md
@@ -189,7 +189,7 @@ class AzureSecretsHook:
}
```

Finally, [register the hook](./introduction.md#registering-your-hook-implementations-with-kedro) in your `settings.py` file:
Finally, [register the hook](./introduction.md#registering-the-hook-implementation-with-kedro) in your `settings.py` file:

```python
from my_project.hooks import AzureSecretsHook
8 changes: 4 additions & 4 deletions docs/source/hooks/examples.md
@@ -164,7 +164,7 @@ class DataValidationHooks:
)
```

* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-your-hook-implementations-with-kedro) and run Kedro.
* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-the-hook-implementation-with-kedro) and run Kedro.

`Great Expectations` example report:

@@ -298,7 +298,7 @@ class PipelineMonitoringHooks:
self._client.incr("run")
```

* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-your-hook-implementations-with-kedro) and run Kedro.
* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-the-hook-implementation-with-kedro) and run Kedro.

`Grafana` example page:

@@ -365,7 +365,7 @@ class ModelTrackingHooks:
mlflow.end_run()
```

* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-your-hook-implementations-with-kedro) and run Kedro.
* Register Hooks implementation, as described in the [hooks documentation](introduction.md#registering-the-hook-implementation-with-kedro) and run Kedro.

`MLflow` example page:

@@ -409,4 +409,4 @@ In the example above, the `before_node_run` hook implementation must return data
```


To apply the changes once you have implemented a new hook, you must register it, as described in the [hooks documentation](introduction.md#registering-your-hook-implementations-with-kedro), and then run Kedro.
To apply the changes once you have implemented a new hook, you must register it, as described in the [hooks documentation](introduction.md#registering-the-hook-implementation-with-kedro), and then run Kedro.
2 changes: 1 addition & 1 deletion docs/source/hooks/index.md
@@ -1,6 +1,6 @@
# Hooks

Hooks are a mechanism to add extra behaviour to Kedro's main execution in an easy and consistent manner. Some examples might include:
Hooks are a mechanism to add extra behaviour to Kedro's main execution in a very easy and consistent manner. Some examples might include:

* Adding a log statement after the data catalog is loaded.
* Adding data validation to the inputs before a node runs, and to the outputs after a node has run. This makes it possible to integrate with other tools like [Great-Expectations](https://docs.greatexpectations.io/en/latest/).
43 changes: 31 additions & 12 deletions docs/source/hooks/introduction.md
@@ -2,12 +2,9 @@

## Concepts

A Hook consists of a Hook specification, and Hook implementation. To add Hooks to your project, you must:
A Hook consists of a Hook specification, and Hook implementation.

* Create or modify the file `src/<package_name>/hooks.py` to define a Hook implementation for an existing Kedro-defined Hook specification
* Register your Hook implementation in the [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md) file under the `HOOKS` key

### Hook specification
## Hook specifications

Kedro defines Hook specifications for particular execution points where users can inject additional behaviour. Currently, the following Hook specifications are provided in [kedro.framework.hooks](/kedro.framework.hooks):

@@ -36,18 +33,24 @@ The naming convention for error hooks is `on_<noun>_error`, in which:
[kedro.framework.hooks](/kedro.framework.hooks) lists the full specifications for which you can inject additional behaviours by providing an implementation.


#### CLI hooks
### CLI hooks

Lastly, Kedro defines a small set of CLI hooks that inject additional behaviour around execution of a Kedro CLI command:
Kedro defines a small set of CLI hooks that inject additional behaviour around execution of a Kedro CLI command:

* `before_command_run`
* `after_command_run`

This is what the [`kedro-telemetry` plugin](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry) relies on under the hood in order to be able to collect CLI usage statistics.

### Hook implementation
## Hook implementation

To add Hooks to your Kedro project, you must:

* Create or modify the file `src/<package_name>/hooks.py` to define a Hook implementation for the particular Hook specification that describes the point at which you want to inject additional behaviour
* Register that Hook implementation in the [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md) file under the `HOOKS` key

You should provide an implementation for the specification that describes the point at which you want to inject additional behaviour. The Hook implementation should have the same name as the specification. The Hook must provide a concrete implementation with a subset of the corresponding specification's parameters (you do not need to use them all).
### Define the Hook implementation
The Hook implementation should have the same name as the specification. The Hook must provide a concrete implementation with a subset of the corresponding specification's parameters (you do not need to use them all).

To declare a Hook implementation, use the `@hook_impl` decorator.

@@ -92,13 +95,13 @@ The name of a module that contains Hooks implementation is arbitrary and is not

We recommend that you group related Hook implementations under a namespace, preferably a class, within a `hooks.py` file that you create in your project.

#### Registering your Hook implementations with Kedro
### Registering the Hook implementation with Kedro

Hook implementations should be registered with Kedro using the [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md) file under the `HOOKS` key.

You can register more than one implementation for the same specification. They will be called in LIFO (last-in, first-out) order.

The following example sets up a Hook so that the `after_data_catalog_created` implementation is called every time after a data catalog is created.
The following example sets up a Hook so that the `after_data_catalog_created` implementation is called, every time, after a data catalog is created.

```python
# src/<package_name>/settings.py
@@ -113,6 +116,9 @@ Kedro also has auto-discovery enabled by default. This means that any installed
Auto-discovered Hooks will run *first*, followed by the ones specified in `settings.py`.
```

#### Auto-registered Hook with plugin
You can auto-register a Hook (pip-installable) by creating a [Kedro plugin](https://docs.kedro.org/en/stable/extend_kedro/plugins.html#hooks). Kedro provides `kedro.hooks` entrypoints to extend this easily.


#### Disable auto-registered plugins' Hooks

@@ -126,6 +132,19 @@ DISABLE_HOOKS_FOR_PLUGINS = ("<plugin_name>",)

where `<plugin_name>` is the name of an installed plugin for which the auto-registered Hooks must be disabled.

## Hook execution order
Hooks follow a Last-In-First-Out (LIFO) order, which means the first registered Hook will be executed last.

Hooks are registered in the following order:

1. Project Hooks in `settings.py` - If you have `HOOKS = (hook_a, hook_b,)`, `hook_b` will be executed before `hook_a`
2. Plugin Hooks registered in `kedro.hooks`, which follow alphabetical order

In general, Hook execution order is not guaranteed and you should not rely on it. If you need to make sure a particular Hook is executed first or last, you can use the [`tryfirst` or `trylast` argument](https://pluggy.readthedocs.io/en/stable/index.html#call-time-order) for `hook_impl`.
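
The ordering described in this section can be reproduced with `pluggy` on its own. The following is a minimal sketch, not Kedro's registration code: the project name `"demo"`, the `DemoSpecs` specification and the three Hook classes are all hypothetical, and it assumes `pluggy` is installed. It registers implementations in a given order and records the order in which they are called.

```python
import pluggy

# Hypothetical project name; Kedro uses its own markers (`hook_impl`).
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DemoSpecs:
    @hookspec
    def after_catalog_created(self, calls):
        """Hypothetical specification; `calls` collects the call order."""


class HookA:
    @hookimpl
    def after_catalog_created(self, calls):
        calls.append("hook_a")


class HookB:
    @hookimpl
    def after_catalog_created(self, calls):
        calls.append("hook_b")


class HookFirst:
    # `tryfirst=True` asks pluggy to call this implementation as early as possible.
    @hookimpl(tryfirst=True)
    def after_catalog_created(self, calls):
        calls.append("hook_first")


def call_order(*hooks):
    """Register `hooks` in the given order and return the order they run in."""
    manager = pluggy.PluginManager("demo")
    manager.add_hookspecs(DemoSpecs)
    for hook in hooks:
        manager.register(hook)
    calls = []
    manager.hook.after_catalog_created(calls=calls)
    return calls


# Registered as (hook_a, hook_b): the last one registered runs first.
lifo_order = call_order(HookA(), HookB())

# Registered first, but `tryfirst=True` moves it to the front of the calls.
tryfirst_order = call_order(HookFirst(), HookA(), HookB())
```

Here `lifo_order` mirrors the `HOOKS = (hook_a, hook_b,)` case above, and `tryfirst_order` shows how an implementation can opt out of the default ordering.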

## Under the hood

Under the hood, we use [pytest's pluggy](https://pluggy.readthedocs.io/en/latest/) to implement Kedro's Hook mechanism. We recommend reading their documentation if you have more questions about the underlying implementation.
Under the hood, we use [pytest's pluggy](https://pluggy.readthedocs.io/en/latest/) to implement Kedro's Hook mechanism. We recommend reading their documentation to find out more about the underlying implementation.

### Plugin Hooks
Plugin Hooks are registered using [`importlib_metadata`'s `EntryPoints` API](https://docs.python.org/3/library/importlib.metadata.html).
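
As a rough illustration of what entry point discovery looks like, the sketch below lists the names of installed entry points in a group using only the standard library. It is not Kedro's own registration code, and `list_entry_point_names` is a hypothetical helper; the branch on `select` is there because the return type of `entry_points()` differs between Python versions.

```python
from importlib.metadata import entry_points


def list_entry_point_names(group: str = "kedro.hooks"):
    """Return the sorted names of installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a `select` method.
        selected = eps.select(group=group)
    else:
        # Older versions return a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Names of installed plugins that expose Hooks; empty if none are installed.
hook_plugins = list_entry_point_names("kedro.hooks")
```

Sorting the names is a choice made for this sketch so that the output is stable between runs.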
1 change: 1 addition & 0 deletions kedro/framework/hooks/manager.py
@@ -73,6 +73,7 @@ def _register_hooks_setuptools(
plugin_names = set()
disabled_plugin_names = set()
for plugin, dist in plugininfo:
print("DEBUG!!!", plugin, dist)
Member
I'm guessing you only added this for debugging? 😄

Contributor Author
Thanks! 😅 I was working between different branches must have left this accidentally.


if dist.project_name in disabled_plugins:
# `unregister()` is used instead of `set_blocked()` because
# we want to disable hooks for specific plugin based on project
@@ -27,5 +27,7 @@ docs = [
"myst-parser~=0.17.2",
]

[project.entry-points."kedro.hooks"]

[tool.setuptools.dynamic]
dependencies = {file = "requirements.txt"}
@@ -5,6 +5,7 @@
# Instantiated project hooks.
# For example, after creating a hooks.py and defining a ProjectHooks class there, do
# from {{cookiecutter.python_package}}.hooks import ProjectHooks
# Hooks are executed in a Last-In-First-Out (LIFO) order.
# HOOKS = (ProjectHooks(),)

# Installed plugins for which to disable hook auto-registration.