
[QST] Support Triton ensemble runtime for SageMaker multi-model deployment #393

mvidela31 opened this issue Dec 27, 2024 · 1 comment

❓ Questions & Help

I was wondering whether it is possible to support SageMaker multi-model deployment using a Triton ensemble of Merlin models.

SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with ensemble hosting. I tried to use that hosting mode with Triton ensembles of Merlin models, but according to the latest update of the Merlin SageMaker example implementation (#1040), the `--model-control-mode=explicit` flag (required by multi-model hosting for dynamic model loading) was removed.

I hypothesize that this incompatibility arises because the generated Merlin `executor_model` is not a proper Triton ensemble (its `config.pbtxt` neither sets the required `platform: "ensemble"` nor contains an `ensemble_scheduling {...}` section), but is just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models.
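For reference, a proper ensemble config declares the ensemble platform and wires the step models together through an `ensemble_scheduling` section. A minimal sketch (model names are those from the Merlin export; tensor names are illustrative, not the real schema):

```
name: "ensemble_model"
platform: "ensemble"
input [ ... ]    # same inputs as the workflow step
output [ ... ]   # same outputs as the model step
ensemble_scheduling {
  step [
    {
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "item_id" value: "item_id" }
      output_map { key: "item_id__values" value: "item_id__values" }
      output_map { key: "item_id__offsets" value: "item_id__offsets" }
    },
    {
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "item_id__values" value: "item_id__values" }
      input_map { key: "item_id__offsets" value: "item_id__offsets" }
      output_map { key: "next_item_scores" value: "next_item_scores" }
    }
  ]
}
```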

EDIT: I realized that in #255 the Triton ensemble runtime was deprecated and replaced by the current executor model. Would it be possible to support exporting the recommender system artifacts as a Triton ensemble, at least for Transformers4rec systems deployment?

@mvidela31 (Author)

I verified that writing the Triton ensemble with its `config.pbtxt` file makes it possible to deploy Transformers4rec systems in multi-model hosting mode on SageMaker.

For anyone facing the same problem, below are the auxiliary functions I used to build the Triton ensemble from the outputs of the merlin-systems export method:

import os

from nvtabular.workflow import Workflow
from transformers4rec import torch as tr


def export_t4rec_triton_ensemble(
    data_workflow: Workflow,
    model: tr.Model,
    executor_model_path: str,
    output_path: str = ".",
) -> None:
    """
    Export a Triton ensemble with the `NVTabular` data processing step and the
    `Transformers4rec` model inference step.

    Parameters
    ----------
    data_workflow: nvtabular.workflow.Workflow
        Data processing workflow.
    model: transformers4rec.torch.Model
        Recommender model.
    executor_model_path: str
        Exported `Transformers4rec` execution model directory.
    output_path: str
        Output path to save the generated Triton ensemble files.
    """
    ensemble_cfg = get_t4rec_triton_ensemble_config(
        data_workflow=data_workflow,
        model=model,
        executor_model_config_path=os.path.join(executor_model_path, "config.pbtxt"),
    )
    os.makedirs(os.path.join(output_path, "ensemble_model"), exist_ok=True)
    os.makedirs(os.path.join(output_path, "ensemble_model", "1"), exist_ok=True)
    with open(os.path.join(output_path, "ensemble_model", "config.pbtxt"), "w") as f:
        f.write(ensemble_cfg)


def get_t4rec_triton_ensemble_config(
    data_workflow: Workflow,
    model: tr.Model,
    executor_model_config_path: str,
) -> str:
    """
    Generate the `config.pbtxt` contents for the `Transformers4rec` Triton ensemble.

    Parameters
    ----------
    data_workflow: nvtabular.workflow.Workflow
        Data processing workflow.
    model: transformers4rec.torch.Model
        Recommender model.
    executor_model_config_path: str
        `config.pbtxt` file path of the exported `Transformers4rec` execution model.

    Returns
    -------
    str
        Triton ensemble `config.pbtxt` file contents.
    """
    cfg = 'name: "ensemble_model"\nplatform: "ensemble"\n'
    # Ensemble inputs/outputs: reuse the tensor declarations from the executor
    # model config (skipping its name/platform header and trailing backend settings).
    with open(executor_model_config_path, "r") as f:
        executor_cfg = f.read()
    cfg += "\n".join(executor_cfg.split("\n")[2:-4]) + "\n"
    # Ensemble scheduling
    cfg += "ensemble_scheduling {\n\tstep [\n"
    ## 0_transformworkflowtriton model step
    cfg += "\t\t{\n"
    cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % (
        "0_transformworkflowtriton",
        "-1",
    )
    for col in data_workflow.input_schema.column_names:
        cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
            col,
            col,
        )
    # The workflow emits ragged columns as `__values`/`__offsets` tensor pairs.
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
    cfg += "\t\t},\n"
    ## 1_predictpytorchtriton model step
    cfg += "\t\t{\n"
    cfg += '\t\t\tmodel_name: "%s"\n\t\t\tmodel_version: %s\n' % ("1_predictpytorchtriton", "-1")
    for col in model.input_schema.column_names:
        for suffix in ("__values", "__offsets"):
            cfg += '\t\t\tinput_map {\n\t\t\t\tkey: "%s%s"\n\t\t\t\tvalue: "%s%s"\n\t\t\t}\n' % (
                col,
                suffix,
                col,
                suffix,
            )
    for col in model.output_schema.column_names:
        cfg += '\t\t\toutput_map {\n\t\t\t\tkey: "%s"\n\t\t\t\tvalue: "%s"\n\t\t\t}\n' % (
            col,
            col,
        )
    cfg += "\t\t}\n"
    cfg += "\t]\n}"
    cfg = cfg.replace("\t", "  ")
    return cfg
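With these helpers, the model repository (the directory packaged and uploaded to S3 for SageMaker multi-model hosting) would look roughly as follows. The `0_*`/`1_*` directories come from the merlin-systems export; the `ensemble_model` layout is assumed from the code above:

```
model_repository/
├── 0_transformworkflowtriton/
│   ├── config.pbtxt
│   └── 1/...
├── 1_predictpytorchtriton/
│   ├── config.pbtxt
│   └── 1/...
└── ensemble_model/
    ├── config.pbtxt   # written by export_t4rec_triton_ensemble
    └── 1/             # empty version directory required by Triton
```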

Nevertheless, it would be awesome to support exporting the Triton ensemble artifacts via the merlin-systems library.
