Add dynamic artifacts naming, documentation and tests #3201

Open. Wants to merge 17 commits into base: develop. Showing changes from 15 commits.

3 changes: 2 additions & 1 deletion .gitbook.yaml
@@ -51,7 +51,8 @@ redirects:
   how-to/build-pipelines/schedule-a-pipeline: how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md
   how-to/build-pipelines/delete-a-pipeline: how-to/pipeline-development/build-pipelines/delete-a-pipeline.md
   how-to/build-pipelines/compose-pipelines: how-to/pipeline-development/build-pipelines/compose-pipelines.md
-  how-to/build-pipelines/dynamically-assign-artifact-names: how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md
+  how-to/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
+  how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
   how-to/build-pipelines/retry-steps: how-to/pipeline-development/build-pipelines/retry-steps.md
   how-to/build-pipelines/run-pipelines-asynchronously: how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md
   how-to/build-pipelines/control-execution-order-of-steps: how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md
@@ -0,0 +1,146 @@
---
description: Understand how you can name your ZenML artifacts.
---

# How Artifact Naming works in ZenML

In ZenML pipelines, you often need to reuse the same step multiple times with different inputs, resulting in multiple artifacts. However, the default naming convention for artifacts can make it challenging to track and differentiate between these outputs, especially when they need to be used in subsequent pipelines. This page describes how to name your output artifacts statically or dynamically, depending on your needs.

ZenML uses type annotations in function definitions to determine artifact names. Output artifacts with the same name are saved with incremented version numbers.

ZenML provides flexible options for naming output artifacts, supporting both static and dynamic naming strategies:
- Names can be generated dynamically at runtime
- String templates are supported, with both standard and custom placeholders
- Naming works for both single-output and multiple-output steps
- Annotations define the naming strategy without modifying the step's core logic
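
To make the annotation mechanism more concrete, here is a minimal standard-library sketch of how a name can be read from an `Annotated` return type. It does not import ZenML, and `output_names` is a hypothetical helper written for illustration; ZenML's own annotation parsing may work differently.

```python
from typing import Annotated, Tuple, get_args, get_origin, get_type_hints


def output_names(func):
    """Collect string metadata from a function's return annotation.

    Hypothetical helper for illustration only, not ZenML's implementation.
    """
    hints = get_type_hints(func, include_extras=True)
    return_hint = hints.get("return")
    if return_hint is None:
        return []
    # A Tuple[...] return type carries one annotation per output.
    if get_origin(return_hint) is tuple:
        candidates = get_args(return_hint)
    else:
        candidates = (return_hint,)
    names = []
    for candidate in candidates:
        if get_origin(candidate) is Annotated:
            # get_args(Annotated[T, meta, ...]) returns (T, meta, ...)
            metadata = get_args(candidate)[1:]
            names.extend(m for m in metadata if isinstance(m, str))
    return names


def single() -> Annotated[str, "static_output_name"]:
    return "null"


def double() -> Tuple[Annotated[str, "first"], Annotated[str, "second_{date}"]]:
    return "a", "b"


print(output_names(single))  # ['static_output_name']
print(output_names(double))  # ['first', 'second_{date}']
```

The step body never mentions the artifact name, which is the sense in which the annotation defines naming without touching core logic.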

## Naming Strategies

### Static Naming
Static names are defined directly as string literals.

```python
@step
def static_single() -> Annotated[str, "static_output_name"]:
    return "null"
```

### Dynamic Naming
Dynamic names can be generated using string templates with standard or custom placeholders.

#### String Templates Using Standard Placeholders
Use the following placeholders that ZenML will replace automatically:

* `{date}` will resolve to the current date, e.g. `2024_11_18`
* `{time}` will resolve to the current time, e.g. `11_07_09_326492`

```python
str_namer = "placeholder_name_{date}_{time}"

@step
def dynamic_single_string() -> Annotated[str, str_namer]:
    return "null"
```
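
As a rough illustration of what resolving the standard placeholders involves, the sketch below fills `{date}` and `{time}` with plain `str.format`. The helper name `resolve_standard_placeholders` is hypothetical, and the `strftime` formats are an assumption inferred from the example values above (`2024_11_18` and `11_07_09_326492`), not taken from ZenML's source.

```python
from datetime import datetime


def resolve_standard_placeholders(template: str, now: datetime) -> str:
    """Fill {date} and {time} in a name template (hypothetical helper)."""
    return template.format(
        date=now.strftime("%Y_%m_%d"),     # assumed format, e.g. 2024_11_18
        time=now.strftime("%H_%M_%S_%f"),  # assumed format, e.g. 11_07_09_326492
    )


# A fixed timestamp keeps the example deterministic.
fixed = datetime(2024, 11, 18, 11, 7, 9, 326492)
print(resolve_standard_placeholders("placeholder_name_{date}_{time}", fixed))
# placeholder_name_2024_11_18_11_07_09_326492
```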

#### String Templates Using Custom Placeholders
Use any placeholders you like; ZenML will replace them if substitutions are provided to the step via the `name_subs` parameter:

```python
str_namer = "placeholder_name_{custom_placeholder}_{time}"

@step(name_subs={"custom_placeholder": "some_substitute"})
def dynamic_single_string() -> Annotated[str, str_namer]:
    return "null"
```

Another option is to use `with_options` to dynamically redefine the placeholder, like this:

```python
str_namer = "{stage}_dataset"

@step
def extract_data(source: str) -> Annotated[str, str_namer]:
    ...
    return "my data"

@pipeline
def extraction_pipeline():
    extract_data.with_options(name_subs={"stage": "train"})(source="s3://train")
    extract_data.with_options(name_subs={"stage": "test"})(source="s3://test")
```
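
The effect of passing different `name_subs` to the same template can be sketched without ZenML. `resolve_custom_placeholders` below is a hypothetical helper. Raising an error for a missing substitution is a choice made for this sketch; the documentation above does not say how ZenML handles that case.

```python
from typing import Dict


def resolve_custom_placeholders(template: str, name_subs: Dict[str, str]) -> str:
    """Fill custom placeholders in a name template (hypothetical helper)."""
    try:
        return template.format(**name_subs)
    except KeyError as err:
        # Sketch-only behavior: fail loudly when a placeholder has no value.
        raise ValueError(f"No substitution provided for placeholder {err}") from err


str_namer = "{stage}_dataset"
print(resolve_custom_placeholders(str_namer, {"stage": "train"}))  # train_dataset
print(resolve_custom_placeholders(str_namer, {"stage": "test"}))   # test_dataset
```

One template therefore yields a distinct artifact name per invocation, which is what keeps the train and test outputs apart.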

### Multiple Output Handling

If you plan to return multiple artifacts from your ZenML step, you can flexibly combine all of the naming options outlined above, like this:

```python
@step
def mixed_tuple() -> Tuple[
    Annotated[str, "static_output_name"],
    Annotated[str, "placeholder_name_{date}_{time}"],
]:
    return "static_namer", "str_namer"
```

## Naming in cached runs

If a ZenML step runs with caching enabled and the cache is used, the names of its output artifacts (both static and dynamic) remain the same as in the original run.

```python
from typing_extensions import Annotated
from typing import Tuple

from zenml import step, pipeline
from zenml.models import PipelineRunResponse


@step(name_subs={"custom_placeholder": "resolution"})
def demo() -> Tuple[
    Annotated[int, "dummy_{date}_{time}"],
    Annotated[int, "dummy_{custom_placeholder}"],
]:
    return 42, 43


@pipeline
def my_pipeline():
    demo()


if __name__ == "__main__":
    run_without_cache: PipelineRunResponse = my_pipeline.with_options(
        enable_cache=False
    )()
    run_with_cache: PipelineRunResponse = my_pipeline.with_options(enable_cache=True)()

    assert set(run_without_cache.steps["demo"].outputs.keys()) == set(
        run_with_cache.steps["demo"].outputs.keys()
    )
    print(list(run_without_cache.steps["demo"].outputs.keys()))
```

These two runs produce output similar to the following:
```
Initiating a new run for the pipeline: my_pipeline.
Caching is disabled by default for my_pipeline.
Using user: default
Using stack: default
orchestrator: default
artifact_store: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
Step demo has started.
Step demo has finished in 0.038s.
Pipeline run has finished in 0.064s.
Initiating a new run for the pipeline: my_pipeline.
Using user: default
Using stack: default
orchestrator: default
artifact_store: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
Using cached version of step demo.
All steps of the pipeline run were cached.
['dummy_2024_11_21_14_27_33_750134', 'dummy_resolution']
```
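
The behavior seen in this output, where the cached run reports the timestamp of the original run, can be modeled with a small standard-library sketch. Everything here is hypothetical (the `_resolved_names` dictionary, the `run_step` function, and the date and time formats); it only illustrates that a name is resolved once and then reused, and is not how ZenML implements caching.

```python
from datetime import datetime
from typing import Dict

# Hypothetical in-memory cache: step name -> resolved output name.
_resolved_names: Dict[str, str] = {}


def run_step(step_name: str, template: str, now: datetime, enable_cache: bool) -> str:
    """Return the output name, reusing the stored one on a cache hit."""
    if enable_cache and step_name in _resolved_names:
        return _resolved_names[step_name]
    resolved = template.format(
        date=now.strftime("%Y_%m_%d"), time=now.strftime("%H_%M_%S_%f")
    )
    _resolved_names[step_name] = resolved
    return resolved


first = run_step(
    "demo", "dummy_{date}_{time}",
    datetime(2024, 11, 21, 14, 27, 33, 750134), enable_cache=False,
)
second = run_step(
    "demo", "dummy_{date}_{time}",
    datetime(2024, 11, 21, 14, 30, 0), enable_cache=True,
)
print(first)            # dummy_2024_11_21_14_27_33_750134
print(first == second)  # True
```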

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

This file was deleted.

2 changes: 1 addition & 1 deletion docs/book/toc.md
@@ -94,7 +94,6 @@
* [Schedule a pipeline](how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md)
* [Deleting a pipeline](how-to/pipeline-development/build-pipelines/delete-a-pipeline.md)
* [Compose pipelines](how-to/pipeline-development/build-pipelines/compose-pipelines.md)
-* [Dynamically assign artifact names](how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md)
* [Automatically retry steps](how-to/pipeline-development/build-pipelines/retry-steps.md)
* [Run pipelines asynchronously](how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md)
* [Control execution order of steps](how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md)
@@ -123,6 +122,7 @@
* [How ZenML stores data](how-to/data-artifact-management/handle-data-artifacts/artifact-versioning.md)
* [Return multiple outputs from a step](how-to/data-artifact-management/handle-data-artifacts/return-multiple-outputs-from-a-step.md)
* [Delete an artifact](how-to/data-artifact-management/handle-data-artifacts/delete-an-artifact.md)
+* [Artifacts naming](how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md)
* [Organize data with tags](how-to/data-artifact-management/handle-data-artifacts/tagging.md)
* [Get arbitrary artifacts in a step](how-to/data-artifact-management/handle-data-artifacts/get-arbitrary-artifacts-in-a-step.md)
* [Handle custom data types](how-to/data-artifact-management/handle-data-artifacts/handle-custom-data-types.md)
32 changes: 31 additions & 1 deletion src/zenml/artifacts/artifact_config.py
@@ -13,6 +13,7 @@
# permissions and limitations under the License.
"""Artifact Config classes to support Model Control Plane feature."""

+import re
from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, Field, model_validator
@@ -21,6 +22,7 @@
from zenml.logger import get_logger
from zenml.metadata.metadata_types import MetadataType
from zenml.utils.pydantic_utils import before_validator_handler
+from zenml.utils.string_utils import format_name_template

logger = get_logger(__name__)

@@ -45,7 +47,10 @@ def my_step() -> Annotated[
```

     Attributes:
-        name: The name of the artifact.
+        name: The name of the artifact:
+            - static string e.g. "name"
+            - dynamic callable e.g. lambda: "name"+str(42)
+            - dynamic string e.g. "name_{date}_{time}"
         version: The version of the artifact.
         tags: The tags of the artifact.
         run_metadata: Metadata to add to the artifact.
@@ -111,3 +116,28 @@ def _remove_old_attributes(cls, data: Dict[str, Any]) -> Dict[str, Any]:
             data.setdefault("artifact_type", ArtifactType.SERVICE)

         return data
+
+    def _evaluated_name(self, name_subs: Dict[str, str]) -> Optional[str]:
+        """Evaluated name of the artifact.
+
+        Args:
+            name_subs: Extra placeholders to use in the name template.
+
+        Returns:
+            The evaluated name of the artifact.
+        """
+        if self.name:
+            return format_name_template(self.name, **name_subs)
+        return self.name
+
+    @property
+    def _original_name(self) -> Optional[str]:
+        """Original name of the dynamic artifact.
+
+        Returns:
+            The original name of the dynamic artifact.
+        """
+        pattern = r"\{[^}]+\}"
+        if re.findall(pattern, str(self.name)):
+            return self.name
+        return None
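
For reference, the placeholder check in `_original_name` can be exercised on its own. The snippet below reuses the same regular expression from the diff; the wrapper `find_placeholders` and the sample names are hypothetical and exist only for illustration.

```python
import re

# Same pattern as in `_original_name`: a "{", one or more non-"}" characters, then "}".
PLACEHOLDER_PATTERN = r"\{[^}]+\}"


def find_placeholders(name):
    """Return every {placeholder} found in a name (hypothetical helper)."""
    # str() mirrors the diff, so a missing name (None) is handled without errors.
    return re.findall(PLACEHOLDER_PATTERN, str(name))


print(find_placeholders("name_{date}_{time}"))  # ['{date}', '{time}']
print(find_placeholders("static_output_name"))  # []
print(find_placeholders(None))                  # []
```

A name is treated as dynamic as soon as one placeholder matches, which is why `_original_name` returns the template only in that case and `None` otherwise.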