Commit

Merge pull request #7 from getindata/release-0.2.0
Release 0.2.0
szczeles authored Feb 8, 2023
2 parents 62513b7 + c310093 commit d4b7876
Showing 25 changed files with 961 additions and 72 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.1.1
current_version = 0.2.0

[bumpversion:file:pyproject.toml]

2 changes: 1 addition & 1 deletion .copier-answers.yml
@@ -7,7 +7,7 @@ description: Kedro plugin with AWS SageMaker Pipelines support
docs_url: https://kedro-sagemaker.readthedocs.io/
full_name: Kedro SageMaker Pipelines plugin
github_url: https://github.com/getindata/kedro-sagemaker
initial_version: 0.1.1
initial_version: 0.2.0
keywords:
- kedro
- sagemaker
10 changes: 1 addition & 9 deletions .gitignore
@@ -164,13 +164,5 @@ credentials.json
/mlruns/
/ga4/

# terraform
terraform/terraform.tfstate.backup
terraform/.terraform.lock.hcl
terraform/.terraform/providers/registry.terraform.io/hashicorp/google-beta/4.21.0/darwin_amd64/terraform-provider-google-beta_v4.21.0_x5
terraform/.terraform/providers/registry.terraform.io/hashicorp/google/4.21.0/darwin_amd64/terraform-provider-google_v4.21.0_x5
terraform/terraform.tfstate

.idea
conf/azure/credentials.yml

tests/mlruns
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/pycqa/isort
rev: 5.10.1
rev: 5.12.0
hooks:
- id: isort
args: ["--profile", "black", "--line-length=79"]
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,11 @@

## [Unreleased]

## [0.2.0] - 2023-02-08

- Support for Mlflow with shared run across pipeline steps
- Fixed ability to overwrite docker image in `kedro sagemaker run`

## [0.1.1] - 2022-12-30

- Pass missing environment to the internal entrypoint
@@ -15,7 +20,9 @@

- Project seed prepared

[Unreleased]: https://github.com/getindata/kedro-sagemaker/compare/0.1.1...HEAD
[Unreleased]: https://github.com/getindata/kedro-sagemaker/compare/0.2.0...HEAD

[0.2.0]: https://github.com/getindata/kedro-sagemaker/compare/0.1.1...0.2.0

[0.1.1]: https://github.com/getindata/kedro-sagemaker/compare/0.1.0...0.1.1

Binary file added docs/images/pipeline_with_mlflow.gif
2 changes: 1 addition & 1 deletion docs/source/02_installation.md
@@ -10,7 +10,7 @@
First, you need to install base Kedro package

```console
$ pip install ">=0.18.3,<0.19"
$ pip install "kedro>=0.18.3,<0.19"
```

## Plugin installation
6 changes: 5 additions & 1 deletion docs/source/03_quickstart.rst
@@ -5,7 +5,7 @@ Before you start, make sure that you have the following:

- AWS CLI installed
- AWS SageMaker domain
- SageMaker Execution role ARN (in a form `arn:aws:iam::<ID>:role/service-role/AmazonSageMaker-ExecutionRole-<NUMBERS>`)
- SageMaker Execution role ARN (in a form `arn:aws:iam::<ID>:role/service-role/AmazonSageMaker-ExecutionRole-<NUMBERS>`). If you don't have one, follow the `official AWS docs <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-create-execution-role>`__.
- S3 bucket that the above role has R/W access
- Docker installed
- Amazon Elastic Container Registry (`Amazon ECR <https://aws.amazon.com/ecr/>`__) repository created that the above role has read access and you have write access
@@ -100,6 +100,10 @@ Finally, you will see similar logs in your terminal:
|Kedro SageMaker Pipelines execution|

Additionally, if you have the `kedro-mlflow <https://kedro-mlflow.readthedocs.io/en/stable/>`__ plugin installed, an additional node called `start-mlflow-run` will appear on the execution graph. Its job is to log the SageMaker Pipeline Execution ARN (so you can link Mlflow runs with runs in SageMaker) and to make sure that all nodes use a common Mlflow run.

|Kedro SageMaker Pipeline with Mlflow|

.. |Kedro SageMaker Pipelines execution| image:: ../images/sagemaker_running_pipeline.gif

.. |Kedro SageMaker Pipeline with Mlflow| image:: ../images/pipeline_with_mlflow.gif
2 changes: 1 addition & 1 deletion kedro_sagemaker/__init__.py
@@ -1 +1 @@
__version__ = "0.1.1"
__version__ = "0.2.0"
42 changes: 39 additions & 3 deletions kedro_sagemaker/cli.py
@@ -12,6 +12,7 @@
from kedro_sagemaker.cli_functions import (
docker_autobuild,
get_context_and_pipeline,
lookup_mlflow_run_id,
parse_extra_params,
write_file_and_confirm_overwrite,
)
@@ -20,14 +21,17 @@
from kedro_sagemaker.constants import (
KEDRO_SAGEMAKER_ARGS,
KEDRO_SAGEMAKER_DEBUG,
KEDRO_SAGEMAKER_EXECUTION_ARN,
KEDRO_SAGEMAKER_S3_TEMP_DIR_NAME,
KEDRO_SAGEMAKER_WORKING_DIRECTORY,
MLFLOW_TAG_EXECUTION_ARN,
)
from kedro_sagemaker.docker import DOCKERFILE_TEMPLATE, DOCKERIGNORE_TEMPLATE
from kedro_sagemaker.runner import SageMakerPipelinesRunner
from kedro_sagemaker.utils import (
CliContext,
KedroContextManager,
is_mlflow_enabled,
parse_flat_parameters,
)

@@ -196,9 +200,11 @@ def run(
)

is_ok = client.run(
local,
wait_for_completion,
lambda p: click.echo(f"Pipeline ARN: {p.describe()['PipelineArn']}"),
is_local=local,
wait_for_completion=wait_for_completion,
on_pipeline_started=lambda p: click.echo(
f"Pipeline ARN: {p.describe()['PipelineArn']}"
),
)

if is_ok:
@@ -341,5 +347,35 @@ def execute(ctx: CliContext, pipeline: str, node: str, params: str):
    with KedroContextManager(
        ctx.metadata.package_name, env=ctx.env, extra_params=parameters
    ) as mgr:
        if is_mlflow_enabled():
            env_key, env_value = lookup_mlflow_run_id(
                mgr.context, os.getenv(KEDRO_SAGEMAKER_EXECUTION_ARN)
            )
            if env_value is not None:
                click.echo(f"Mlflow run id: {env_value}")
                os.environ[env_key] = env_value

        runner = SageMakerPipelinesRunner()
        mgr.session.run(pipeline, node_names=[node], runner=runner)


@sagemaker_group.command(hidden=True)
@click.pass_obj
def mlflow_start(ctx: CliContext):
    """
    Registers new mlflow run with Sagemaker Execution ARN inside the tags
    """
    import mlflow
    from kedro_mlflow.config.kedro_mlflow_config import KedroMlflowConfig

    with KedroContextManager(ctx.metadata.package_name, env=ctx.env) as mgr:
        mlflow_conf: KedroMlflowConfig = mgr.context.mlflow

    run = mlflow.start_run(
        experiment_id=mlflow.get_experiment_by_name(
            mlflow_conf.tracking.experiment.name
        ).experiment_id,
        nested=False,
    )
    mlflow.set_tag(MLFLOW_TAG_EXECUTION_ARN, os.environ[KEDRO_SAGEMAKER_EXECUTION_ARN])
    click.echo(f"Started run: {run.info.run_id}")
28 changes: 28 additions & 0 deletions kedro_sagemaker/cli_functions.py
@@ -1,11 +1,14 @@
import importlib
import json
import logging
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple

import click
from sagemaker.workflow.pipeline import Pipeline as SageMakerPipeline

from kedro_sagemaker.constants import MLFLOW_TAG_EXECUTION_ARN
from kedro_sagemaker.generator import KedroSageMakerGenerator
from kedro_sagemaker.utils import (
CliContext,
@@ -14,6 +17,8 @@
docker_push,
)

logger = logging.getLogger()


def parse_extra_params(params, silent=False):
if params and (parameters := json.loads(params.strip("'"))):
@@ -94,3 +99,26 @@ def write_file_and_confirm_overwrite(
filepath.write_text(contents)
elif on_denied_overwrite:
on_denied_overwrite(filepath)


def lookup_mlflow_run_id(context, sagemaker_execution_arn: str):
    import mlflow
    from kedro_mlflow.config.kedro_mlflow_config import KedroMlflowConfig

    mlflow_conf: KedroMlflowConfig = context.mlflow
    mlflow_runs = mlflow.search_runs(
        experiment_names=[mlflow_conf.tracking.experiment.name],
        filter_string=f'tags.`{MLFLOW_TAG_EXECUTION_ARN}` = "{sagemaker_execution_arn}"',
        max_results=1,
        output_format="list",
    )
    importlib.reload(mlflow.tracking.request_header.registry)

    if len(mlflow_runs) == 0:
        logger.warning(
            "Unable to find parent mlflow run id for the current execution (%s)",
            sagemaker_execution_arn,
        )
        return mlflow.tracking._RUN_ID_ENV_VAR, None

    return mlflow.tracking._RUN_ID_ENV_VAR, mlflow_runs[0].info.run_id
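
Note on the shared-run mechanism: the changes in `cli.py` and `cli_functions.py` appear to work together as a tag-and-lookup handshake. One step starts a run tagged with the SageMaker execution ARN, and every later step searches for that tag and exports the run id through an environment variable. The sketch below illustrates only that flow and is not the plugin's code. `FakeRun`, `FakeTracking`, `lookup_run_id` and `attach_step_to_shared_run` are hypothetical names, the in-memory store stands in for a real Mlflow tracking server, and `MLFLOW_RUN_ID` is assumed to be the variable that `mlflow.tracking._RUN_ID_ENV_VAR` refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, MutableMapping, Optional, Tuple

# Tag name copied from constants.py in this commit.
MLFLOW_TAG_EXECUTION_ARN = "sagemaker_execution_arn"
# Assumption: the variable named by mlflow.tracking._RUN_ID_ENV_VAR.
RUN_ID_ENV_VAR = "MLFLOW_RUN_ID"


@dataclass
class FakeRun:
    """Hypothetical stand-in for an Mlflow run: an id plus its tags."""

    run_id: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class FakeTracking:
    """Hypothetical in-memory replacement for an Mlflow tracking server."""

    runs: List[FakeRun] = field(default_factory=list)

    def start_run(self, execution_arn: str) -> FakeRun:
        # Plays the role of `mlflow_start`: create a run tagged with the ARN.
        run = FakeRun(
            run_id=f"run-{len(self.runs) + 1}",
            tags={MLFLOW_TAG_EXECUTION_ARN: execution_arn},
        )
        self.runs.append(run)
        return run

    def search_by_tag(self, key: str, value: str) -> List[FakeRun]:
        # Plays the role of the tag filter passed to `mlflow.search_runs`.
        return [run for run in self.runs if run.tags.get(key) == value]


def lookup_run_id(
    tracking: FakeTracking, execution_arn: str
) -> Tuple[str, Optional[str]]:
    # Plays the role of `lookup_mlflow_run_id`: (env var name, run id or None).
    matches = tracking.search_by_tag(MLFLOW_TAG_EXECUTION_ARN, execution_arn)
    return RUN_ID_ENV_VAR, (matches[0].run_id if matches else None)


def attach_step_to_shared_run(
    tracking: FakeTracking,
    execution_arn: str,
    environ: MutableMapping[str, str],
) -> Optional[str]:
    # Plays the role of the new block in `execute`: export the run id only
    # when a matching parent run exists, otherwise leave the environment alone.
    env_key, env_value = lookup_run_id(tracking, execution_arn)
    if env_value is not None:
        environ[env_key] = env_value
    return env_value


if __name__ == "__main__":
    tracking = FakeTracking()
    arn = "arn:aws:sagemaker:example"  # made-up value, not a real ARN
    tracking.start_run(arn)
    step_env: Dict[str, str] = {}
    print(attach_step_to_shared_run(tracking, arn, step_env), step_env)
```

Passing the environment as a plain mapping keeps the sketch testable without touching `os.environ`; the commit itself writes to `os.environ` directly.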
2 changes: 2 additions & 0 deletions kedro_sagemaker/constants.py
@@ -5,6 +5,8 @@
KEDRO_SAGEMAKER_DEBUG = f"{KEDRO_SAGEMAKER}_DEBUG"
KEDRO_SAGEMAKER_WORKING_DIRECTORY = f"{KEDRO_SAGEMAKER}_WD"
KEDRO_SAGEMAKER_PARAMETERS = f"{KEDRO_SAGEMAKER}_PARAMETERS"
KEDRO_SAGEMAKER_EXECUTION_ARN = f"{KEDRO_SAGEMAKER}_EXECUTION_ARN"
KEDRO_SAGEMAKER_PARAM_KEY_PREFIX = f"{KEDRO_SAGEMAKER}_PARAM_KEY_"
KEDRO_SAGEMAKER_PARAM_VALUE_PREFIX = f"{KEDRO_SAGEMAKER}_PARAM_VALUE_"
KEDRO_SAGEMAKER_S3_TEMP_DIR_NAME = "kedro-sagemaker-tmp"
MLFLOW_TAG_EXECUTION_ARN = "sagemaker_execution_arn"