🚀 Feature request
I was wondering if it is possible to support SageMaker multi-model deployment using a Triton ensemble of Merlin models.
SageMaker already supports multiple hosting modes for model deployment with Triton Inference Server, including multi-model endpoints with the ensemble hosting mode. I tried to use that hosting mode with Triton ensembles of Merlin models, but according to the last update of the Merlin SageMaker example implementation #1040, the `--model-control-mode=explicit` flag (required for the dynamic model loading that multi-model hosting depends on) was removed.
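For reference, a minimal boto3 sketch of the multi-model hosting mode described above; the image URI, role ARN, and S3 prefix are placeholders, and the exact Triton container configuration may differ:

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values -- substitute your own account, region, role, and bucket.
triton_image = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
model_data_prefix = "s3://<bucket>/triton-models/"  # S3 prefix holding per-model .tar.gz archives

# Mode="MultiModel" makes SageMaker load models dynamically from the S3 prefix,
# which is why the underlying Triton server needs --model-control-mode=explicit.
sm.create_model(
    ModelName="merlin-triton-mme",
    ExecutionRoleArn="<execution-role-arn>",
    PrimaryContainer={
        "Image": triton_image,
        "ModelDataUrl": model_data_prefix,
        "Mode": "MultiModel",
    },
)
```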
I hypothesize that this incompatibility arises because the generated Merlin `executor_model` is not a proper Triton ensemble: its `config.pbtxt` file neither declares `platform: "ensemble"` nor contains the required `ensemble_scheduling: {...}` section. Instead, it is just another Triton model that executes the `0_transformworkflowtriton` and `1_predictpytorchtriton` steps internally. Therefore, the `executor_model` is not automatically recognized as an ensemble of the `0_transformworkflowtriton` and `1_predictpytorchtriton` models to be executed.
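For comparison, a hypothetical `config.pbtxt` sketch of what a proper Triton ensemble wrapping the two exported steps might look like; the tensor names, dtypes, and dims below are illustrative assumptions, not the actual Merlin export:

```protobuf
# Hypothetical ensemble definition; tensor names/dtypes/dims are placeholders.
name: "ensemble_model"
platform: "ensemble"
input [
  { name: "raw_features", data_type: TYPE_STRING, dims: [ -1 ] }
]
output [
  { name: "predictions", data_type: TYPE_FP32, dims: [ -1 ] }
]
ensemble_scheduling {
  step [
    {
      # First step: the exported workflow transform model
      model_name: "0_transformworkflowtriton"
      model_version: -1
      input_map { key: "raw_features" value: "raw_features" }
      output_map { key: "transformed_features" value: "transformed_features" }
    },
    {
      # Second step: the exported PyTorch prediction model,
      # fed by the first step's output tensor
      model_name: "1_predictpytorchtriton"
      model_version: -1
      input_map { key: "model_input" value: "transformed_features" }
      output_map { key: "model_output" value: "predictions" }
    }
  ]
}
```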
EDIT: I realized that in merlin-systems PR#255 the Triton ensemble runtime was deprecated and replaced with the current executor model. Would it be possible to support the option of exporting the recommender system artifacts as a Triton ensemble, at least for Transformers4Rec systems deployment?
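For context on the end goal: with a multi-model endpoint, each request selects a model dynamically via `TargetModel`, which is what the explicit model-control mode enables on the Triton side. A minimal sketch, with placeholder names and a placeholder payload (a real call must send a Triton-protocol inference request matching the ensemble's input signature):

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder request body; the real body must follow the Triton inference
# protocol expected by the container (e.g. a binary tensor payload).
payload = b"<serialized inference request>"

response = runtime.invoke_endpoint(
    EndpointName="merlin-triton-mme-endpoint",  # hypothetical endpoint name
    TargetModel="t4rec_ensemble.tar.gz",        # selects one archive under the S3 prefix
    ContentType="application/octet-stream",
    Body=payload,
)
print(response["Body"].read())
```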