
(WIP) PyTorch back-end operator #151

Closed
karlhigley wants to merge 7 commits into main from feature/pytorch

Conversation

karlhigley
Contributor

No description provided.

@karlhigley requested a review from nv-alaiacano July 27, 2022 17:46
@nvidia-merlin-bot

CI Results
GitHub pull request #151 of commit e56024a48b504d27710428f20259694b18eed702, no merge conflicts.
Running as SYSTEM
Setting status of e56024a48b504d27710428f20259694b18eed702 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/156/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse e56024a48b504d27710428f20259694b18eed702^{commit} # timeout=10
Checking out Revision e56024a48b504d27710428f20259694b18eed702 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e56024a48b504d27710428f20259694b18eed702 # timeout=10
Commit message: "(WIP) PyTorch back-end operator"
 > git rev-list --no-walk 25a9acc37ce9bd422753ab87b61b754dd1b953b5 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins2057429621542053081.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 52 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 11%]
tests/unit/systems/test_ensemble_ops.py .. [ 15%]
tests/unit/systems/test_export.py . [ 17%]
tests/unit/systems/test_graph.py . [ 19%]
tests/unit/systems/test_inference_ops.py ... [ 25%]
tests/unit/systems/test_model_registry.py . [ 26%]
tests/unit/systems/test_op_runner.py .... [ 34%]
tests/unit/systems/fil/test_fil.py .......................... [ 84%]
tests/unit/systems/fil/test_forest.py ... [ 90%]
tests/unit/systems/tf/test_tf_op.py ... [ 96%]
tests/unit/systems/torch/test_torch.py

PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(64)test_pytorch_op_exports_own_config()
-> export_path = pathlib.Path(tmpdir) / triton_op.export_name
(Pdb)
FF [100%]

=================================== FAILURES ===================================
______________________ test_pytorch_op_exports_own_config ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_pytorch_op_exports_own_co0')

def test_pytorch_op_exports_own_config(tmpdir):
    triton_op = ptorch_op.PredictPyTorch(model, model_input_schema, model_output_schema)

    triton_op.export(tmpdir, None, None)

    import pdb

    pdb.set_trace()

    # Export creates directory
>   export_path = pathlib.Path(tmpdir) / triton_op.export_name

tests/unit/systems/torch/test_torch.py:64:


tests/unit/systems/torch/test_torch.py:64: in test_pytorch_op_exports_own_config
export_path = pathlib.Path(tmpdir) / triton_op.export_name
/usr/lib/python3.8/bdb.py:88: in trace_dispatch
return self.dispatch_line(frame)


self = <_pytest.debugging.pytestPDB._get_pdb_wrapper_class.<locals>.PytestPdbWrapper object at 0x7fb1171a3460>
frame = <frame at 0x63b2af50, file '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py', line 64, code test_pytorch_op_exports_own_config>

def dispatch_line(self, frame):
    """Invoke user function and return trace function for line event.

    If the debugger stops on the current line, invoke
    self.user_line(). Raise BdbQuit if self.quitting is set.
    Return self.trace_dispatch to continue tracing in this scope.
    """
    if self.stop_here(frame) or self.break_here(frame):
        self.user_line(frame)
>       if self.quitting: raise BdbQuit

E bdb.BdbQuit

/usr/lib/python3.8/bdb.py:113: BdbQuit
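
This first failure is the leftover debugger call rather than the operator itself: with pytest capturing IO, exiting the pdb.set_trace() prompt raises bdb.BdbQuit and fails the test. A minimal sketch of the test with the debugging lines removed, assuming the same module-level ptorch_op, model, model_input_schema, and model_output_schema used above; the final assert is an assumed completion, since the traceback cuts off before the test's assertions:

import pathlib

def test_pytorch_op_exports_own_config(tmpdir):
    triton_op = ptorch_op.PredictPyTorch(model, model_input_schema, model_output_schema)

    triton_op.export(tmpdir, None, None)

    # Export should create a directory named after the operator
    export_path = pathlib.Path(tmpdir) / triton_op.export_name
    assert export_path.exists()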
__________________________________ test_torch __________________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_torch0')

def test_torch(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)

    inputs = [
        grpcclient.InferInput(
            "input", input_data.shape, triton.np_to_triton_dtype(input_data.dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data)

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
>   with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:109:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-13/test_torch0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
>                       raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
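
The traceback truncates run_triton_server at the raise, so the rest of the helper is not visible here. A sketch of how the readiness loop plausibly continues, assuming it polls the gRPC endpoint once per second and shuts the server process down afterward (the actual body of merlin/systems/triton/utils.py may differ):

import contextlib
import os
import subprocess
import time

import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

@contextlib.contextmanager
def run_triton_server(modelpath, server_path="tritonserver"):
    # Start a Triton server on the given model repository and yield a client.
    cmdline = [server_path, "--model-repository", str(modelpath)]
    with subprocess.Popen(cmdline, env=os.environ.copy()) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                for _ in range(60):
                    if process.poll() is not None:
                        raise RuntimeError(
                            f"Tritonserver failed to start (ret={process.returncode})"
                        )
                    try:
                        if client.is_server_ready():
                            break  # server is up; hand the client to the caller
                    except InferenceServerException:
                        pass  # gRPC endpoint not accepting connections yet
                    time.sleep(1)
                else:
                    raise RuntimeError("Tritonserver did not become ready within 60s")
                yield client
        finally:
            process.terminate()  # assumed cleanup after the test body finishes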
----------------------------- Captured stderr call -----------------------------
I0727 17:59:54.013319 29881 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fdca6000000' with size 268435456
I0727 17:59:54.014080 29881 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0727 17:59:54.016411 29881 model_repository_manager.cc:1191] loading: example_model:1
E0727 17:59:54.116859 29881 model_repository_manager.cc:1348] failed to load 'example_model' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-13/test_torch0/example_model/1, /tmp/pytest-of-jenkins/pytest-13/test_torch0/example_model, /opt/tritonserver/backends/pytorch
I0727 17:59:54.116985 29881 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0727 17:59:54.117016 29881 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0727 17:59:54.117069 29881 server.cc:626]
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| example_model | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-13/test_torch0/example_model/1, /tmp/pytest-of-jenkins/pytest-13/test_torch0/example_model, /opt/tritonserver/backends/pytorch |
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 17:59:54.176360 29881 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0727 17:59:54.177254 29881 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-13/test_torch0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 17:59:54.177289 29881 server.cc:257] Waiting for in-flight requests to complete.
I0727 17:59:54.177297 29881 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0727 17:59:54.177307 29881 server.cc:288] All models are stopped, unloading models
I0727 17:59:54.177313 29881 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0727 17:59:55.201034 29881 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0727 17:59:55.201094 29881 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
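
The stderr above shows this second failure is environmental rather than a code bug: the Triton build on this CI node ships no PyTorch backend, so libtriton_pytorch.so is missing from every searched path. One way to keep the suite green on such hosts is a skipif guard alongside the existing TRITON_SERVER_PATH check; the backend path below is an assumption taken from the search paths in the log:

import os

import pytest

# Assumed location, based on the last search path Triton prints above.
PYTORCH_BACKEND_PATH = "/opt/tritonserver/backends/pytorch/libtriton_pytorch.so"

pytorch_backend_available = os.path.exists(PYTORCH_BACKEND_PATH)

@pytest.mark.skipif(
    not pytorch_backend_available,
    reason="Triton PyTorch backend (libtriton_pytorch.so) not installed",
)
def test_torch(tmpdir):
    ...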
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_exports_own_config
FAILED tests/unit/systems/torch/test_torch.py::test_torch - RuntimeError: Tri...
============ 2 failed, 50 passed, 19 warnings in 234.91s (0:03:54) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins13820146091652038189.sh

@nvidia-merlin-bot

CI Results
GitHub pull request #151 of commit b999a47601e45836b8455832c1bd686cca8ba10e, no merge conflicts.
Running as SYSTEM
Setting status of b999a47601e45836b8455832c1bd686cca8ba10e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/157/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse b999a47601e45836b8455832c1bd686cca8ba10e^{commit} # timeout=10
Checking out Revision b999a47601e45836b8455832c1bd686cca8ba10e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b999a47601e45836b8455832c1bd686cca8ba10e # timeout=10
Commit message: "Merge branch 'main' into feature/pytorch"
 > git rev-list --no-walk e56024a48b504d27710428f20259694b18eed702 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins5680607808064206387.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 52 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 11%]
tests/unit/systems/test_ensemble_ops.py .. [ 15%]
tests/unit/systems/test_export.py . [ 17%]
tests/unit/systems/test_graph.py . [ 19%]
tests/unit/systems/test_inference_ops.py ... [ 25%]
tests/unit/systems/test_model_registry.py . [ 26%]
tests/unit/systems/test_op_runner.py .... [ 34%]
tests/unit/systems/fil/test_fil.py .......................... [ 84%]
tests/unit/systems/fil/test_forest.py ... [ 90%]
tests/unit/systems/tf/test_tf_op.py ... [ 96%]
tests/unit/systems/torch/test_torch.py

PDB set_trace (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>
/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(64)test_pytorch_op_exports_own_config()
-> export_path = pathlib.Path(tmpdir) / triton_op.export_name
(Pdb)
FF [100%]

=================================== FAILURES ===================================
______________________ test_pytorch_op_exports_own_config ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_pytorch_op_exports_own_co0')

def test_pytorch_op_exports_own_config(tmpdir):
    triton_op = ptorch_op.PredictPyTorch(model, model_input_schema, model_output_schema)

    triton_op.export(tmpdir, None, None)

    import pdb

    pdb.set_trace()

    # Export creates directory
>   export_path = pathlib.Path(tmpdir) / triton_op.export_name

tests/unit/systems/torch/test_torch.py:64:


tests/unit/systems/torch/test_torch.py:64: in test_pytorch_op_exports_own_config
export_path = pathlib.Path(tmpdir) / triton_op.export_name
/usr/lib/python3.8/bdb.py:88: in trace_dispatch
return self.dispatch_line(frame)


self = <_pytest.debugging.pytestPDB._get_pdb_wrapper_class.<locals>.PytestPdbWrapper object at 0x7f24cc353670>
frame = <frame at 0x62eb9ae0, file '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py', line 64, code test_pytorch_op_exports_own_config>

def dispatch_line(self, frame):
    """Invoke user function and return trace function for line event.

    If the debugger stops on the current line, invoke
    self.user_line(). Raise BdbQuit if self.quitting is set.
    Return self.trace_dispatch to continue tracing in this scope.
    """
    if self.stop_here(frame) or self.break_here(frame):
        self.user_line(frame)
>       if self.quitting: raise BdbQuit

E bdb.BdbQuit

/usr/lib/python3.8/bdb.py:113: BdbQuit
__________________________________ test_torch __________________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_torch0')

def test_torch(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)

    inputs = [
        grpcclient.InferInput(
            "input", input_data.shape, triton.np_to_triton_dtype(input_data.dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data)

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
>   with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:109:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-14/test_torch0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
>                       raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0727 18:05:07.550367 31488 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0324000000' with size 268435456
I0727 18:05:07.551124 31488 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0727 18:05:07.553421 31488 model_repository_manager.cc:1191] loading: example_model:1
E0727 18:05:07.653855 31488 model_repository_manager.cc:1348] failed to load 'example_model' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-14/test_torch0/example_model/1, /tmp/pytest-of-jenkins/pytest-14/test_torch0/example_model, /opt/tritonserver/backends/pytorch
I0727 18:05:07.653970 31488 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0727 18:05:07.654006 31488 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0727 18:05:07.654073 31488 server.cc:626]
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| example_model | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-14/test_torch0/example_model/1, /tmp/pytest-of-jenkins/pytest-14/test_torch0/example_model, /opt/tritonserver/backends/pytorch |
+---------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 18:05:07.716004 31488 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0727 18:05:07.716881 31488 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-14/test_torch0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 18:05:07.716917 31488 server.cc:257] Waiting for in-flight requests to complete.
I0727 18:05:07.716924 31488 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0727 18:05:07.716934 31488 server.cc:288] All models are stopped, unloading models
I0727 18:05:07.716940 31488 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0727 18:05:08.743943 31488 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0727 18:05:08.744003 31488 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_exports_own_config
FAILED tests/unit/systems/torch/test_torch.py::test_torch - RuntimeError: Tri...
============ 2 failed, 50 passed, 19 warnings in 250.79s (0:04:10) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins16877300768792265468.sh

@karlhigley added the enhancement label Jul 27, 2022
@nvidia-merlin-bot

CI Results
GitHub pull request #151 of commit 3a5a6026257268dec0e19b8aad41a5051fa58c2c, no merge conflicts.
Running as SYSTEM
Setting status of 3a5a6026257268dec0e19b8aad41a5051fa58c2c to PENDING with url https://10.20.13.93:8080/job/merlin_systems/158/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse 3a5a6026257268dec0e19b8aad41a5051fa58c2c^{commit} # timeout=10
Checking out Revision 3a5a6026257268dec0e19b8aad41a5051fa58c2c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3a5a6026257268dec0e19b8aad41a5051fa58c2c # timeout=10
Commit message: "Merge remote-tracking branch 'origin/feature/pytorch' into feature/pytorch"
 > git rev-list --no-walk b999a47601e45836b8455832c1bd686cca8ba10e # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins1287057132715756526.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 53 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 11%]
tests/unit/systems/test_ensemble_ops.py .. [ 15%]
tests/unit/systems/test_export.py . [ 16%]
tests/unit/systems/test_graph.py . [ 18%]
tests/unit/systems/test_inference_ops.py ... [ 24%]
tests/unit/systems/test_model_registry.py . [ 26%]
tests/unit/systems/test_op_runner.py .... [ 33%]
tests/unit/systems/fil/test_fil.py .......................... [ 83%]
tests/unit/systems/fil/test_forest.py ... [ 88%]
tests/unit/systems/tf/test_tf_op.py ... [ 94%]
tests/unit/systems/torch/test_torch.py .FF [100%]

=================================== FAILURES ===================================
______________________________ test_torch_backend ______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_torch_backend0')

def test_torch_backend(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)

    inputs = [
        grpcclient.InferInput(
            "input", input_data.shape, triton.np_to_triton_dtype(input_data.dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data)

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
>   with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:109:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-19/test_torch_backend0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
>                       raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0727 21:16:51.300829 19200 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f754e000000' with size 268435456
I0727 21:16:51.301589 19200 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0727 21:16:51.303861 19200 model_repository_manager.cc:1191] loading: example_model:1
E0727 21:16:51.404345 19200 model_repository_manager.cc:1348] failed to load 'example_model' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-19/test_torch_backend0/example_model/1, /tmp/pytest-of-jenkins/pytest-19/test_torch_backend0/example_model, /opt/tritonserver/backends/pytorch
I0727 21:16:51.404475 19200 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0727 21:16:51.404513 19200 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0727 21:16:51.404579 19200 server.cc:626]
+---------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| example_model | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model 'example_model', searched: /tmp/pytest-of-jenkins/pytest-19/test_torch_backend0/example_model/1, /tmp/pytest-of-jenkins/pytest-19/test_torch_backend0/example_model, /opt/tritonserver/backends/pytorch |
+---------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 21:16:51.465050 19200 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0727 21:16:51.465945 19200 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-19/test_torch_backend0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 21:16:51.465980 19200 server.cc:257] Waiting for in-flight requests to complete.
I0727 21:16:51.465988 19200 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0727 21:16:51.465998 19200 server.cc:288] All models are stopped, unloading models
I0727 21:16:51.466004 19200 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0727 21:16:52.485024 19200 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0727 21:16:52.485089 19200 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
___________________________ test_pytorch_op_serving ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0')

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
def test_pytorch_op_serving(tmpdir):
    from merlin.core.dispatch import make_df
    from merlin.systems.dag.ensemble import Ensemble
    from tests.unit.systems.utils.triton import _run_ensemble_on_tritonserver

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_scripted = torch.jit.script(model)
    model_scripted.save(model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_path, model_input_schema, model_output_schema
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)

    inputs = [
        grpcclient.InferInput(
            "input", input_data.shape, triton.np_to_triton_dtype(input_data.dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data)

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
>   with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:145:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
>                       raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0727 21:16:54.771100 19214 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1e4e000000' with size 268435456
I0727 21:16:54.771866 19214 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0727 21:16:54.774276 19214 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
E0727 21:16:54.874609 19214 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0/0_predictpytorch, /opt/tritonserver/backends/pytorch
E0727 21:16:54.874697 19214 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0727 21:16:54.874741 19214 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0727 21:16:54.874760 19214 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0727 21:16:54.874823 19214 server.cc:626]
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0/0_predictpytorch, /opt/tritonserver/backends/pytorch |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 21:16:54.934231 19214 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0727 21:16:54.935109 19214 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-19/test_pytorch_op_serving0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0727 21:16:54.935143 19214 server.cc:257] Waiting for in-flight requests to complete.
I0727 21:16:54.935151 19214 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0727 21:16:54.935161 19214 server.cc:288] All models are stopped, unloading models
I0727 21:16:54.935167 19214 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0727 21:16:55.960258 19214 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0727 21:16:55.960318 19214 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
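
The second error in this run (ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version) is a cascade of the first: the exported ensemble references the PyTorch model, which never loads without the backend. The export itself can still be sanity-checked without a running server; a sketch, with the directory names taken from the log and the layout assumed to follow Triton's one-directory-per-model repository convention:

from pathlib import Path

def check_exported_repo(tmpdir):
    repo = Path(tmpdir)
    # One directory per model, each holding a config.pbtxt and a version dir;
    # names taken from the log, layout assumed from Triton's repository format.
    for model in ("ensemble_model", "0_predictpytorch"):
        assert (repo / model / "config.pbtxt").is_file()
        assert (repo / model / "1").is_dir()
    # The PyTorch model's version dir should hold the TorchScript archive.
    assert (repo / "0_predictpytorch" / "1" / "model.pt").is_file()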
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving
/usr/local/lib/python3.8/dist-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_torch_backend - RuntimeEr...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving - Runt...
============ 2 failed, 51 passed, 20 warnings in 252.59s (0:04:12) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins6490429663934475607.sh

@karlhigley closed this Jul 28, 2022
@karlhigley reopened this Jul 28, 2022
@nvidia-merlin-bot

CI Results
GitHub pull request #151 of commit 5b345d6ce665f0e014446d175bea0a6935b8bf0a, no merge conflicts.
Running as SYSTEM
Setting status of 5b345d6ce665f0e014446d175bea0a6935b8bf0a to PENDING with url https://10.20.13.93:8080/job/merlin_systems/159/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse 5b345d6ce665f0e014446d175bea0a6935b8bf0a^{commit} # timeout=10
Checking out Revision 5b345d6ce665f0e014446d175bea0a6935b8bf0a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5b345d6ce665f0e014446d175bea0a6935b8bf0a # timeout=10
Commit message: "(WIP) Add Python back-end to operator"
 > git rev-list --no-walk 3a5a6026257268dec0e19b8aad41a5051fa58c2c # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins16926813798588220287.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 57 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 10%]
tests/unit/systems/test_ensemble_ops.py .. [ 14%]
tests/unit/systems/test_export.py . [ 15%]
tests/unit/systems/test_graph.py . [ 17%]
tests/unit/systems/test_inference_ops.py ... [ 22%]
tests/unit/systems/test_model_registry.py . [ 24%]
tests/unit/systems/test_op_runner.py .... [ 31%]
tests/unit/systems/fil/test_fil.py .......................... [ 77%]
tests/unit/systems/fil/test_forest.py ... [ 82%]
tests/unit/systems/tf/test_tf_op.py ... [ 87%]
tests/unit/systems/torch/test_torch.py ..FFFFF [100%]

=================================== FAILURES ===================================
______________________________ test_torch_backend ______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_torch_backend0')

def test_torch_backend(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
>   inputs[0].set_data_from_numpy(input_data)

tests/unit/systems/torch/test_torch.py:130:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1701: in set_data_from_numpy
raise_error("input_tensor must be a numpy array")


msg = 'input_tensor must be a numpy array'

def raise_error(msg):
    """
    Raise error with the provided message
    """
>   raise InferenceServerException(msg=msg) from None

E tritonclient.utils.InferenceServerException: input_tensor must be a numpy array

/usr/local/lib/python3.8/dist-packages/tritonclient/utils/__init__.py:35: InferenceServerException
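
Unlike the later failures, this one is a plain test bug: after input_data became a dict, the call still passes the whole dict to set_data_from_numpy, which accepts only a numpy array; the parametrized test further down already indexes it correctly. The fix is one line:

inputs[0].set_data_from_numpy(input_data["input"])  # pass the array, not the dict wrapping it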
______________________ test_pytorch_op_serving[True-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0')
use_path = True, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0728 15:39:28.325910 17365 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1be6000000' with size 268435456
I0728 15:39:28.326669 17365 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 15:39:28.329071 17365 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
E0728 15:39:28.429420 17365 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0/0_predictpytorch, /opt/tritonserver/backends/pytorch
E0728 15:39:28.429507 17365 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 15:39:28.429560 17365 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 15:39:28.429584 17365 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0728 15:39:28.429642 17365 server.cc:626]
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0/0_predictpytorch, /opt/tritonserver/backends/pytorch |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:28.489069 17365 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 15:39:28.489911 17365 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_T0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:28.489946 17365 server.cc:257] Waiting for in-flight requests to complete.
I0728 15:39:28.489954 17365 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 15:39:28.489965 17365 server.cc:288] All models are stopped, unloading models
I0728 15:39:28.489971 17365 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 15:39:29.514141 17365 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:29.514203 17365 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
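
Editor's note: the root cause here is environmental, not code under test — the Triton image in CI ships no native PyTorch (libtorch) backend, so libtriton_pytorch.so is absent from /opt/tritonserver/backends/pytorch and the model fails to load before any inference runs. A hedged sketch of a skip-guard for such environments; TRITON_BACKEND_DIR and the probe below are illustrative, not part of the repo:

    import os

    import pytest

    # Hypothetical guard (not in the repo): skip serving tests when the server
    # image lacks the native PyTorch backend that caused the failure above.
    TRITON_BACKEND_DIR = os.environ.get("TRITON_BACKEND_DIR", "/opt/tritonserver/backends")
    pytorch_backend_missing = not os.path.isdir(os.path.join(TRITON_BACKEND_DIR, "pytorch"))

    @pytest.mark.skipif(pytorch_backend_missing, reason="Triton PyTorch backend not installed")
    def test_pytorch_serving_smoke():
        ...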
_____________________ test_pytorch_op_serving[True-False] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_F0')
use_path = True, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_F0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0728 15:39:31.419910 17379 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6324000000' with size 268435456
I0728 15:39:31.420717 17379 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 15:39:31.423281 17379 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0728 15:39:31.530474 17379 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
E0728 15:39:31.531442 17379 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py
E0728 15:39:31.531510 17379 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 15:39:31.531561 17379 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 15:39:31.531607 17379 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:31.531652 17379 server.cc:626]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:31.591182 17379 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 15:39:31.592028 17379 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_True_F0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:31.592066 17379 server.cc:257] Waiting for in-flight requests to complete.
I0728 15:39:31.592073 17379 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 15:39:31.592084 17379 server.cc:288] All models are stopped, unloading models
I0728 15:39:31.592090 17379 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 15:39:32.617041 17379 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:32.617102 17379 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
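
Editor's note: this parametrization exports the operator for Triton's Python backend, which requires a model.py inside the model's version directory; the export produced the config but not that file. A sketch of the layout and the skeleton the backend looks for (the TritonPythonModel class and its method names are the Python backend's convention; the bodies here are placeholders):

    # Expected layout for a Python-backend model (what the error says is missing):
    #
    #   <model_repository>/0_predictpytorch/
    #   |-- config.pbtxt
    #   `-- 1/
    #       `-- model.py

    class TritonPythonModel:
        """Skeleton of the model.py that Triton's Python backend expects."""

        def initialize(self, args):
            # args carries model_repository / model_version / the config JSON;
            # the real operator would load the saved torch model here.
            pass

        def execute(self, requests):
            # For each pb_utils.InferenceRequest, run the model and return a
            # matching pb_utils.InferenceResponse (placeholder body).
            return [None for _ in requests]

        def finalize(self):
            pass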
_____________________ test_pytorch_op_serving[False-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0')
use_path = False, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0728 15:39:34.512449 17393 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6cc6000000' with size 268435456
I0728 15:39:34.513214 17393 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 15:39:34.515607 17393 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
E0728 15:39:34.615980 17393 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0/0_predictpytorch, /opt/tritonserver/backends/pytorch
E0728 15:39:34.616095 17393 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 15:39:34.616149 17393 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 15:39:34.616172 17393 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0728 15:39:34.616229 17393 server.cc:626]
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0/0_predictpytorch, /opt/tritonserver/backends/pytorch |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:34.676107 17393 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 15:39:34.676945 17393 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:34.676980 17393 server.cc:257] Waiting for in-flight requests to complete.
I0728 15:39:34.676988 17393 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 15:39:34.676998 17393 server.cc:288] All models are stopped, unloading models
I0728 15:39:34.677005 17393 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 15:39:35.695594 17393 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:35.695655 17393 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
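
Editor's note: this parametrization fails for the same reason as the [True-True] case above — no native PyTorch backend in the server image — so the skip-guard sketch there would apply here as well.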
_____________________ test_pytorch_op_serving[False-False] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_1')
use_path = False, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
    with run_triton_server(tmpdir) as client:
      response = client.infer(model_name, inputs, outputs=outputs)

tests/unit/systems/torch/test_torch.py:182:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322: in infer
raise_error_grpc(rpc_error)


rpc_error = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Failed to process the reques...t-of-jenkins/pytest-1/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute\n","grpc_status":13}"

def raise_error_grpc(rpc_error):
  raise get_error_grpc(rpc_error) from None

E tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2
E
E At:
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(32): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62: InferenceServerException
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
I0728 15:39:37.611455 17407 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6176000000' with size 268435456
I0728 15:39:37.612215 17407 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 15:39:37.614608 17407 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0728 15:39:37.721362 17407 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
I0728 15:39:40.706813 17407 model_repository_manager.cc:1345] successfully loaded '0_predictpytorch' version 1
I0728 15:39:40.707145 17407 model_repository_manager.cc:1191] loading: ensemble_model:1
I0728 15:39:40.807569 17407 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0728 15:39:40.807708 17407 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 15:39:40.807814 17407 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:40.807881 17407 server.cc:626]
+------------------+---------+--------+
| Model | Version | Status |
+------------------+---------+--------+
| 0_predictpytorch | 1 | READY |
| ensemble_model | 1 | READY |
+------------------+---------+--------+

I0728 15:39:40.869806 17407 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 15:39:40.870646 17407 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 15:39:40.871562 17407 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0728 15:39:40.872108 17407 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0728 15:39:40.913293 17407 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0728 15:39:41.893443 17407 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:41.893508 17407 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0728 15:39:42.893672 17407 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:42.893729 17407 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0728 15:39:43.912791 17407 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 15:39:43.912846 17407 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
0728 15:39:45.099953 17447 pb_stub.cc:749] Failed to process the request(s) for model '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2

At:
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(32): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/tmp/pytest-of-jenkins/pytest-1/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

I0728 15:39:45.101305 17407 server.cc:257] Waiting for in-flight requests to complete.
I0728 15:39:45.101354 17407 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 15:39:45.101373 17407 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0728 15:39:45.101476 17407 model_repository_manager.cc:1223] unloading: 0_predictpytorch:1
I0728 15:39:45.101555 17407 server.cc:288] All models are stopped, unloading models
I0728 15:39:45.101575 17407 server.cc:295] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0728 15:39:45.101616 17407 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0728 15:39:46.101671 17407 server.cc:295] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
free(): invalid pointer
I0728 15:39:46.771450 17407 model_repository_manager.cc:1328] successfully unloaded '0_predictpytorch' version 1
I0728 15:39:47.101797 17407 server.cc:295] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
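
Editor's note: this is the only parametrization that actually loads and serves; the request then dies inside the model's forward with a libtorch size mismatch, which suggests the (1, 3) rows handed to the model do not match the Linear layer's expected input features. A local reproduction sketch, assuming (our guess — the log does not confirm it) a layer with in_features=2:

    import torch

    # A (1, 3) input against a Linear layer expecting 2 input features raises
    # the same class of error as above ("size mismatch" / "mat1 and mat2
    # shapes cannot be multiplied", depending on the torch version).
    layer = torch.nn.Linear(2, 1)
    x = torch.ones(1, 3)
    try:
        layer(x)
    except RuntimeError as err:
        print(err)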
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
/usr/local/lib/python3.8/dist-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_torch_backend - tritoncli...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-False]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-False]
============ 5 failed, 52 passed, 20 warnings in 265.83s (0:04:25) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins8161918272762406304.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #151 of commit 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782, no merge conflicts.
Running as SYSTEM
Setting status of 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/160/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782^{commit} # timeout=10
Checking out Revision 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782 # timeout=10
Commit message: "fix isort"
 > git rev-list --no-walk 5b345d6ce665f0e014446d175bea0a6935b8bf0a # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins2674006302020910338.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 57 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 10%]
tests/unit/systems/test_ensemble_ops.py .. [ 14%]
tests/unit/systems/test_export.py . [ 15%]
tests/unit/systems/test_graph.py . [ 17%]
tests/unit/systems/test_inference_ops.py ... [ 22%]
tests/unit/systems/test_model_registry.py . [ 24%]
tests/unit/systems/test_op_runner.py .... [ 31%]
tests/unit/systems/fil/test_fil.py .......................... [ 77%]
tests/unit/systems/fil/test_forest.py ... [ 82%]
tests/unit/systems/tf/test_tf_op.py ... [ 87%]
tests/unit/systems/torch/test_torch.py ..FFFFF [100%]

=================================== FAILURES ===================================
______________________________ test_torch_backend ______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_torch_backend0')

def test_torch_backend(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
  inputs[0].set_data_from_numpy(input_data)

tests/unit/systems/torch/test_torch.py:130:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1701: in set_data_from_numpy
raise_error("input_tensor must be a numpy array")


msg = 'input_tensor must be a numpy array'

def raise_error(msg):
    """
    Raise error with the provided message
    """
  raise InferenceServerException(msg=msg) from None

E tritonclient.utils.InferenceServerException: input_tensor must be a numpy array

/usr/local/lib/python3.8/dist-packages/tritonclient/utils/__init__.py:35: InferenceServerException
______________________ test_pytorch_op_serving[True-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0')
use_path = True, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0728 21:12:38.240066 8641 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f50ee000000' with size 268435456
I0728 21:12:38.240847 8641 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 21:12:38.243302 8641 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
E0728 21:12:38.343702 8641 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0/0_predictpytorch, /opt/tritonserver/backends/pytorch
E0728 21:12:38.343792 8641 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 21:12:38.343839 8641 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 21:12:38.343858 8641 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0728 21:12:38.343927 8641 server.cc:626]
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0/0_predictpytorch, /opt/tritonserver/backends/pytorch |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:38.403038 8641 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 21:12:38.403915 8641 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_T0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:38.403951 8641 server.cc:257] Waiting for in-flight requests to complete.
I0728 21:12:38.403958 8641 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 21:12:38.403969 8641 server.cc:288] All models are stopped, unloading models
I0728 21:12:38.403975 8641 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 21:12:39.422370 8641 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:39.422428 8641 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
_____________________ test_pytorch_op_serving[True-False] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_F0')
use_path = True, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_F0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0728 21:12:41.336342 8655 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f5aa6000000' with size 268435456
I0728 21:12:41.337109 8655 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 21:12:41.339512 8655 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0728 21:12:41.446718 8655 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
E0728 21:12:41.447685 8655 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py
E0728 21:12:41.447753 8655 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 21:12:41.447805 8655 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 21:12:41.447853 8655 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:41.447898 8655 server.cc:626]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:41.509035 8655 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 21:12:41.509886 8655 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_True_F0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:41.509920 8655 server.cc:257] Waiting for in-flight requests to complete.
I0728 21:12:41.509927 8655 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 21:12:41.509938 8655 server.cc:288] All models are stopped, unloading models
I0728 21:12:41.509944 8655 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 21:12:42.528874 8655 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:42.528939 8655 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
_____________________ test_pytorch_op_serving[False-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0')
use_path = False, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:181:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
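
The traceback cuts the helper off at the line that raises; a self-contained sketch of how such a start-and-poll helper typically continues, assuming tritonclient's gRPC client and a `tritonserver` binary on PATH (an illustration, not the repository's exact utils.py):

```python
import contextlib
import subprocess
import time

import tritonclient.grpc as grpcclient


@contextlib.contextmanager
def run_triton_server_sketch(modelpath, url="localhost:8001", retries=60):
    """Start tritonserver on `modelpath` and yield a gRPC client once ready."""
    cmdline = ["tritonserver", "--model-repository", str(modelpath)]
    with subprocess.Popen(cmdline) as process:
        try:
            with grpcclient.InferenceServerClient(url) as client:
                for _ in range(retries):
                    if process.poll() is not None:
                        # Server exited early, as in the failure above.
                        raise RuntimeError(
                            f"Tritonserver failed to start (ret={process.returncode})"
                        )
                    try:
                        if client.is_server_ready():
                            break
                    except Exception:
                        pass  # gRPC endpoint not accepting connections yet
                    time.sleep(1)
                else:
                    raise RuntimeError("Tritonserver never became ready")
                yield client
        finally:
            process.terminate()
```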
----------------------------- Captured stderr call -----------------------------
I0728 21:12:44.445809 8669 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fec84000000' with size 268435456
I0728 21:12:44.446568 8669 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 21:12:44.448962 8669 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
E0728 21:12:44.549417 8669 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0/0_predictpytorch, /opt/tritonserver/backends/pytorch
E0728 21:12:44.549510 8669 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0728 21:12:44.549578 8669 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 21:12:44.549605 8669 server.cc:583]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0728 21:12:44.549678 8669 server.cc:626]
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Invalid argument: unable to find 'libtriton_pytorch.so' for model '0_predictpytorch', searched: /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0/0_predictpytorch/1, /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0/0_predictpytorch, /opt/tritonserver/backends/pytorch |
+------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:44.610318 8669 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 21:12:44.611173 8669 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:44.611208 8669 server.cc:257] Waiting for in-flight requests to complete.
I0728 21:12:44.611216 8669 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 21:12:44.611226 8669 server.cc:288] All models are stopped, unloading models
I0728 21:12:44.611231 8669 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0728 21:12:45.636627 8669 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:45.636689 8669 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
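
The searched-paths error above suggests that with `torchscript=True` the exported config targets Triton's LibTorch ("pytorch") backend, whose shared library is not installed in this image; the backend table earlier in the log lists only the python backend. A sketch of the distinction, with field values assumed rather than copied from the operator's actual output:

```python
from tritonclient.grpc import model_config_pb2

# torchscript=True path: appears to target the LibTorch backend, which needs
# /opt/tritonserver/backends/pytorch/libtriton_pytorch.so to be present.
torchscript_config = model_config_pb2.ModelConfig(
    name="0_predictpytorch", backend="pytorch"
)

# torchscript=False path: served through the python backend (listed above),
# which instead requires a model.py shim in the version directory.
python_config = model_config_pb2.ModelConfig(
    name="0_predictpytorch", backend="python"
)
```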
_____________________ test_pytorch_op_serving[False-False] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_1')
use_path = False, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)


    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
    with run_triton_server(tmpdir) as client:
      response = client.infer(model_name, inputs, outputs=outputs)

tests/unit/systems/torch/test_torch.py:182:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322: in infer
raise_error_grpc(rpc_error)


rpc_error = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Failed to process the reques...t-of-jenkins/pytest-4/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute\n","grpc_status":13}"

def raise_error_grpc(rpc_error):
  raise get_error_grpc(rpc_error) from None

E tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2
E
E At:
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(32): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62: InferenceServerException
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
I0728 21:12:47.539949 8683 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7ff834000000' with size 268435456
I0728 21:12:47.540808 8683 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0728 21:12:47.543383 8683 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0728 21:12:47.650401 8683 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
I0728 21:12:50.604628 8683 model_repository_manager.cc:1345] successfully loaded '0_predictpytorch' version 1
I0728 21:12:50.604953 8683 model_repository_manager.cc:1191] loading: ensemble_model:1
I0728 21:12:50.705357 8683 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0728 21:12:50.705491 8683 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0728 21:12:50.705589 8683 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:50.705656 8683 server.cc:626]
+------------------+---------+--------+
| Model | Version | Status |
+------------------+---------+--------+
| 0_predictpytorch | 1 | READY |
| ensemble_model | 1 | READY |
+------------------+---------+--------+

I0728 21:12:50.767807 8683 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0728 21:12:50.768695 8683 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0728 21:12:50.769744 8683 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0728 21:12:50.770136 8683 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0728 21:12:50.811121 8683 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0728 21:12:51.789502 8683 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:51.789563 8683 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0728 21:12:52.789727 8683 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:52.789778 8683 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0728 21:12:53.809189 8683 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0728 21:12:53.809243 8683 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
E0728 21:12:55.012212 8723 pb_stub.cc:749] Failed to process the request(s) for model '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2

At:
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(32): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/tmp/pytest-of-jenkins/pytest-4/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

I0728 21:12:55.013492 8683 server.cc:257] Waiting for in-flight requests to complete.
I0728 21:12:55.013507 8683 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0728 21:12:55.013523 8683 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0728 21:12:55.013575 8683 model_repository_manager.cc:1223] unloading: 0_predictpytorch:1
I0728 21:12:55.013642 8683 server.cc:288] All models are stopped, unloading models
I0728 21:12:55.013652 8683 server.cc:295] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0728 21:12:55.013733 8683 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0728 21:12:56.013737 8683 server.cc:295] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
free(): invalid pointer
I0728 21:12:56.713807 8683 model_repository_manager.cc:1328] successfully unloaded '0_predictpytorch' version 1
I0728 21:12:57.013858 8683 server.cc:295] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
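
The size-mismatch failure above is raised inside the model's `forward` (test_torch.py line 32) when `torch.nn.Linear` receives a tensor whose trailing dimension doesn't match the layer's `in_features`, apparently after the python-backend `model.py` reshapes the sparse input. A minimal standalone reproduction of the same class of error (layer sizes are illustrative, not the test model's):

```python
import torch

layer = torch.nn.Linear(3, 1)   # weight is 1x3, so inputs need inner dim 3
good = torch.randn(2, 3)
bad = torch.randn(2)            # wrong size for this layer

print(layer(good).shape)        # torch.Size([2, 1])
layer(bad)                      # raises a RuntimeError about mismatched sizes
```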
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
/usr/local/lib/python3.8/dist-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_torch_backend - tritoncli...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-False]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-False]
============ 5 failed, 52 passed, 20 warnings in 265.21s (0:04:25) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins5216530116755053725.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #151 of commit ca721c01210ebf35bfc39cb058192a3c08fff8ac, no merge conflicts.
Running as SYSTEM
Setting status of ca721c01210ebf35bfc39cb058192a3c08fff8ac to PENDING with url https://10.20.13.93:8080/job/merlin_systems/161/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse ca721c01210ebf35bfc39cb058192a3c08fff8ac^{commit} # timeout=10
Checking out Revision ca721c01210ebf35bfc39cb058192a3c08fff8ac (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f ca721c01210ebf35bfc39cb058192a3c08fff8ac # timeout=10
Commit message: "linting cleanup"
 > git rev-list --no-walk 96b6d37b407d7ef50c50c7aef5cbeb83f59a7782 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins8302880701330980524.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 57 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py ...F [ 10%]
tests/unit/systems/test_ensemble_ops.py FF [ 14%]
tests/unit/systems/test_export.py . [ 15%]
tests/unit/systems/test_graph.py . [ 17%]
tests/unit/systems/test_inference_ops.py ... [ 22%]
tests/unit/systems/test_model_registry.py . [ 24%]
tests/unit/systems/test_op_runner.py ...F [ 31%]
tests/unit/systems/fil/test_fil.py .......................... [ 77%]
tests/unit/systems/fil/test_forest.py FFF [ 82%]
tests/unit/systems/tf/test_tf_op.py ... [ 87%]
tests/unit/systems/torch/test_torch.py FFFFFFF [100%]

=================================== FAILURES ===================================
_____________________ test_workflow_with_forest_inference ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_workflow_with_forest_infe0')

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
def test_workflow_with_forest_inference(tmpdir):
    rows = 200
    num_features = 16
    X, y = sklearn.datasets.make_regression(
        n_samples=rows,
        n_features=num_features,
        n_informative=num_features // 3,
        random_state=0,
    )
    feature_names = [str(i) for i in range(num_features)]
    df = pd.DataFrame(X, columns=feature_names, dtype=np.float32)
    dataset = Dataset(df)

    # Fit GBDT Model
    model = xgboost.XGBRegressor()
    model.fit(X, y)

    input_column_schemas = [ColumnSchema(col, dtype=np.float32) for col in feature_names]
    input_schema = Schema(input_column_schemas)
    selector = ColumnSelector(feature_names)

    workflow_ops = feature_names >> wf_ops.LogOp()
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    triton_chain = selector >> TransformWorkflow(workflow) >> PredictForest(model, input_schema)

    triton_ens = Ensemble(triton_chain, input_schema)

    request_df = df[:5]
  triton_ens.export(tmpdir)

tests/unit/systems/test_ensemble.py:218:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/fil.py:104: in export
return super().export(
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f832807ee50>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
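
This `TypeError`, which recurs in every export failure below, has a single root cause: `text_format.PrintMessage` writes `str`, but the destination file appears to have been opened in binary mode (plausibly introduced by the "linting cleanup" commit under test). A minimal sketch of the two mode-consistent options, using an assumed `ModelConfig` message:

```python
from google.protobuf import text_format
from tritonclient.grpc import model_config_pb2

config = model_config_pb2.ModelConfig(name="0_predictpytorch", backend="python")

# Option 1: text-mode handle, matching text_format's str output.
with open("config.pbtxt", "w") as o:
    text_format.PrintMessage(config, o)

# Option 2: keep a binary handle and encode the rendered message explicitly.
with open("config.pbtxt", "wb") as o:
    o.write(text_format.MessageToString(config).encode("utf-8"))
```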
____________________________ test_softmax_sampling _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_softmax_sampling0')

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
def test_softmax_sampling(tmpdir):
    request_schema = Schema(
        [
            ColumnSchema("movie_ids", dtype=np.int32),
            ColumnSchema("output_1", dtype=np.float32),
        ]
    )

    combined_features = {
        "movie_ids": np.random.randint(0, 10000, 100).astype(np.int32),
        "output_1": np.random.random(100).astype(np.float32),
    }

    request = make_df(combined_features)

    ordering = ["movie_ids"] >> SoftmaxSampling(relevance_col="output_1", topk=10, temperature=20.0)

    ensemble = Ensemble(ordering, request_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/test_ensemble_ops.py:50:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/softmax_sampling.py:66: in export
return super().export(path, input_schema, output_schema, self_params, node_id, version)
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f833019b820>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
____________________________ test_filter_candidates ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_filter_candidates0')

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
def test_filter_candidates(tmpdir):
    request_schema = Schema(
        [
            ColumnSchema("candidate_ids", dtype=np.int32),
            ColumnSchema("movie_ids", dtype=np.int32),
        ]
    )

    candidate_ids = np.random.randint(1, 100000, 100).astype(np.int32)
    movie_ids_1 = np.zeros(100, dtype=np.int32)
    movie_ids_1[:20] = np.unique(candidate_ids)[:20]

    combined_features = {
        "candidate_ids": candidate_ids,
        "movie_ids": movie_ids_1,
    }

    request = make_df(combined_features)

    filtering = ["candidate_ids"] >> FilterCandidates(filter_out=["movie_ids"])

    ensemble = Ensemble(filtering, request_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/test_ensemble_ops.py:82:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/session_filter.py:218: in export
return super().export(path, input_schema, output_schema, self_params, node_id, version)
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83c01857f0>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
__________________ test_op_runner_single_node_export[parquet] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_op_runner_single_node_exp0')
dataset = <merlin.io.dataset.Dataset object at 0x7f8378326e20>
engine = 'parquet'

@pytest.mark.parametrize("engine", ["parquet"])
def test_op_runner_single_node_export(tmpdir, dataset, engine):
    # assert against produced config
    schema = dataset.schema
    for name in schema.column_names:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    inputs = ["x", "y"]

    node = inputs >> PlusTwoOp()

    graph = Graph(node)
    graph.construct_schema(dataset.schema)
  config = node.export(tmpdir)

tests/unit/systems/test_op_runner.py:169:


merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83c80d5580>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
____________________________ test_load_from_config _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_load_from_config0')

def test_load_from_config(tmpdir):
    rows = 200
    num_features = 16
    X, y = sklearn.datasets.make_regression(
        n_samples=rows,
        n_features=num_features,
        n_informative=num_features // 3,
        random_state=0,
    )
    model = xgboost.XGBRegressor()
    model.fit(X, y)
    feature_names = [str(i) for i in range(num_features)]
    input_schema = Schema([ColumnSchema(col, dtype=np.float32) for col in feature_names])
    output_schema = Schema([ColumnSchema("output__0", dtype=np.float32)])
  config = PredictForest(model, input_schema).export(
        tmpdir, input_schema, output_schema, node_id=2
    )

tests/unit/systems/fil/test_forest.py:52:


merlin/systems/dag/ops/fil.py:104: in export
return super().export(
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f830bd2ac70>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
_________________________________ test_export __________________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_export0')

def test_export(tmpdir):
    rows = 200
    num_features = 16
    X, y = sklearn.datasets.make_regression(
        n_samples=rows,
        n_features=num_features,
        n_informative=num_features // 3,
        random_state=0,
    )
    model = xgboost.XGBRegressor()
    model.fit(X, y)
    feature_names = [str(i) for i in range(num_features)]
    input_schema = Schema([ColumnSchema(col, dtype=np.float32) for col in feature_names])
    output_schema = Schema([ColumnSchema("output__0", dtype=np.float32)])
  _ = PredictForest(model, input_schema).export(tmpdir, input_schema, output_schema, node_id=2)

tests/unit/systems/fil/test_forest.py:86:


merlin/systems/dag/ops/fil.py:104: in export
return super().export(
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f8360162fd0>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
________________________________ test_ensemble _________________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_ensemble0')

def test_ensemble(tmpdir):
    rows = 200
    num_features = 16
    X, y = sklearn.datasets.make_regression(
        n_samples=rows,
        n_features=num_features,
        n_informative=num_features // 3,
        random_state=0,
    )
    feature_names = [str(i) for i in range(num_features)]
    df = pd.DataFrame(X, columns=feature_names)
    dataset = Dataset(df)

    # Fit GBDT Model
    model = xgboost.XGBRegressor()
    model.fit(X, y)

    input_schema = Schema([ColumnSchema(col, dtype=np.float32) for col in feature_names])
    selector = ColumnSelector(feature_names)

    workflow_ops = ["0", "1", "2"] >> wf_ops.LogOp()
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    triton_chain = selector >> TransformWorkflow(workflow) >> PredictForest(model, input_schema)

    triton_ens = Ensemble(triton_chain, input_schema)
  triton_ens.export(tmpdir)

tests/unit/systems/fil/test_forest.py:127:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/fil.py:104: in export
return super().export(
merlin/systems/dag/ops/operator.py:277: in export
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83b8085160>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
___________________ test_pytorch_op_exports_own_config[True] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_exports_own_co0')
torchscript = True

@pytest.mark.parametrize("torchscript", [True, False])
def test_pytorch_op_exports_own_config(tmpdir, torchscript):
    model_to_use = model_scripted if torchscript else model

    triton_op = ptorch_op.PredictPyTorch(
        model_to_use, torchscript, model_input_schema, model_output_schema
    )
  triton_op.export(tmpdir, None, None)

tests/unit/systems/torch/test_torch.py:91:


merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:259: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f830a751610>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
__________________ test_pytorch_op_exports_own_config[False] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_exports_own_co1')
torchscript = False

@pytest.mark.parametrize("torchscript", [True, False])
def test_pytorch_op_exports_own_config(tmpdir, torchscript):
    model_to_use = model_scripted if torchscript else model

    triton_op = ptorch_op.PredictPyTorch(
        model_to_use, torchscript, model_input_schema, model_output_schema
    )
  triton_op.export(tmpdir, None, None)

tests/unit/systems/torch/test_torch.py:91:


merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:170: in _export_model_config
else self._export_python_config(name, output_path, sparse_max, use_fix_dtypes, version)
merlin/systems/dag/ops/pytorch.py:215: in _export_python_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83c80f80d0>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
______________________________ test_torch_backend ______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_torch_backend0')

def test_torch_backend(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "wb") as f:
      f.write(model_config)

E TypeError: a bytes-like object is required, not 'str'

tests/unit/systems/torch/test_torch.py:121: TypeError
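
Here the mode mismatch is visible in the test itself: `model_config` is a `str`, but the handle is opened with `"wb"`. A sketch of both consistent forms, with stand-in values for the test's `config_path` and `model_config`:

```python
from pathlib import Path

config_path = Path("/tmp/0_predictpytorch/config.pbtxt")  # stand-in path
config_path.parent.mkdir(parents=True, exist_ok=True)
model_config = 'name: "0_predictpytorch"\n'               # stand-in pbtxt text

# model_config is a str, so the handle must be text-mode...
with open(config_path, "w") as f:
    f.write(model_config)

# ...or stay binary and encode explicitly.
with open(config_path, "wb") as f:
    f.write(model_config.encode("utf-8"))
```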
______________________ test_pytorch_op_serving[True-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_True_T0')
use_path = True, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:259: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83c077fc10>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
_____________________ test_pytorch_op_serving[True-False] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_True_F0')
use_path = True, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:170: in _export_model_config
else self._export_python_config(name, output_path, sparse_max, use_fix_dtypes, version)
merlin/systems/dag/ops/pytorch.py:212: in _export_python_config
json.dump(model_info, o)


obj = {'sparse_max': {'input': 3}, 'use_fix_dtypes': False}
fp = <_io.BufferedWriter name='/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model_info.json'>
skipkeys = False, ensure_ascii = True, check_circular = True, allow_nan = True
cls = None, indent = None, separators = None, default = None

def dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
    instead of raising a ``TypeError``.

    If ``ensure_ascii`` is false, then the strings written to ``fp`` can
    contain non-ASCII characters if they appear in strings contained in
    ``obj``. Otherwise, all such characters are escaped in JSON strings.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation.

    If specified, ``separators`` should be an ``(item_separator, key_separator)``
    tuple.  The default is ``(', ', ': ')`` if *indent* is ``None`` and
    ``(',', ': ')`` otherwise.  To get the most compact JSON representation,
    you should specify ``(',', ':')`` to eliminate whitespace.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is true (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

    """
    # cached encoder
    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        default is None and not sort_keys and not kw):
        iterable = _default_encoder.iterencode(obj)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators,
            default=default, sort_keys=sort_keys, **kw).iterencode(obj)
    # could accelerate with writelines in some versions of Python, at
    # a debuggability cost
    for chunk in iterable:
      fp.write(chunk)

E TypeError: a bytes-like object is required, not 'str'

/usr/lib/python3.8/json/__init__.py:180: TypeError
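
Same mode mismatch, different writer: `json.dump` emits `str` chunks, so the binary `BufferedWriter` shown in the traceback can't accept them. A minimal sketch with the same payload as above:

```python
import json

model_info = {"sparse_max": {"input": 3}, "use_fix_dtypes": False}

# json.dump writes str, so the target must be opened in text mode.
with open("model_info.json", "w") as o:
    json.dump(model_info, o)
```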
_____________________ test_pytorch_op_serving[False-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_False_0')
use_path = False, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:259: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f83c0009250>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f84806aa8e0>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
_____________________ test_pytorch_op_serving[False-False] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_False_1')
use_path = False, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:170: in _export_model_config
else self._export_python_config(name, output_path, sparse_max, use_fix_dtypes, version)
merlin/systems/dag/ops/pytorch.py:212: in _export_python_config
json.dump(model_info, o)


obj = {'sparse_max': {'input': 3}, 'use_fix_dtypes': False}
fp = <_io.BufferedWriter name='/tmp/pytest-of-jenkins/pytest-5/test_pytorch_op_serving_False_1/0_predictpytorch/1/model_info.json'>
skipkeys = False, ensure_ascii = True, check_circular = True, allow_nan = True
cls = None, indent = None, separators = None, default = None

def dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
    instead of raising a ``TypeError``.

    If ``ensure_ascii`` is false, then the strings written to ``fp`` can
    contain non-ASCII characters if they appear in strings contained in
    ``obj``. Otherwise, all such characters are escaped in JSON strings.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation.

    If specified, ``separators`` should be an ``(item_separator, key_separator)``
    tuple.  The default is ``(', ', ': ')`` if *indent* is ``None`` and
    ``(',', ': ')`` otherwise.  To get the most compact JSON representation,
    you should specify ``(',', ':')`` to eliminate whitespace.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is true (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

    """
    # cached encoder
    if (not skipkeys and ensure_ascii and
        check_circular and allow_nan and
        cls is None and indent is None and separators is None and
        default is None and not sort_keys and not kw):
        iterable = _default_encoder.iterencode(obj)
    else:
        if cls is None:
            cls = JSONEncoder
        iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
            check_circular=check_circular, allow_nan=allow_nan, indent=indent,
            separators=separators,
            default=default, sort_keys=sort_keys, **kw).iterencode(obj)
    # could accelerate with writelines in some versions of Python, at
    # a debuggability cost
    for chunk in iterable:
      fp.write(chunk)

E TypeError: a bytes-like object is required, not 'str'

/usr/lib/python3.8/json/__init__.py:180: TypeError
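
The failure above is a file-mode bug: json.dump emits str chunks, so the handle it writes to must be opened in text mode, not binary. A minimal sketch of the bug and the fix, assuming the exporter opened model_info.json with "wb" (the exact open() call isn't shown in this log):

import json

model_info = {"sparse_max": {"input": 3}, "use_fix_dtypes": False}

# Fails: a binary handle expects bytes, but json.dump writes str.
# with open("model_info.json", "wb") as o:
#     json.dump(model_info, o)  # TypeError: a bytes-like object is required, not 'str'

# Works: open the file in text mode instead.
with open("model_info.json", "w", encoding="utf-8") as o:
    json.dump(model_info, o)
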
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
/usr/local/lib/python3.8/dist-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_with_forest_inference
FAILED tests/unit/systems/test_ensemble_ops.py::test_softmax_sampling - TypeE...
FAILED tests/unit/systems/test_ensemble_ops.py::test_filter_candidates - Type...
FAILED tests/unit/systems/test_op_runner.py::test_op_runner_single_node_export[parquet]
FAILED tests/unit/systems/fil/test_forest.py::test_load_from_config - TypeErr...
FAILED tests/unit/systems/fil/test_forest.py::test_export - TypeError: a byte...
FAILED tests/unit/systems/fil/test_forest.py::test_ensemble - TypeError: a by...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_exports_own_config[True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_exports_own_config[False]
FAILED tests/unit/systems/torch/test_torch.py::test_torch_backend - TypeError...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-False]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-False]
============ 14 failed, 43 passed, 20 warnings in 225.58s (0:03:45) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins18212149937410205207.sh

@karlhigley karlhigley linked an issue Jul 29, 2022 that may be closed by this pull request
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #151 of commit c909ee1b10fc006d71213dd1977168849a8b11c0, no merge conflicts.
Running as SYSTEM
Setting status of c909ee1b10fc006d71213dd1977168849a8b11c0 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/163/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse c909ee1b10fc006d71213dd1977168849a8b11c0^{commit} # timeout=10
Checking out Revision c909ee1b10fc006d71213dd1977168849a8b11c0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c909ee1b10fc006d71213dd1977168849a8b11c0 # timeout=10
Commit message: "fix open paths"
 > git rev-list --no-walk 41bd6d7e424ed099d0922a5c35eb7bc98a2ef63d # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins12832759688316659970.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 57 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py . [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py .... [ 10%]
tests/unit/systems/test_ensemble_ops.py .. [ 14%]
tests/unit/systems/test_export.py . [ 15%]
tests/unit/systems/test_graph.py . [ 17%]
tests/unit/systems/test_inference_ops.py ... [ 22%]
tests/unit/systems/test_model_registry.py . [ 24%]
tests/unit/systems/test_op_runner.py .... [ 31%]
tests/unit/systems/fil/test_fil.py .......................... [ 77%]
tests/unit/systems/fil/test_forest.py ... [ 82%]
tests/unit/systems/tf/test_tf_op.py ... [ 87%]
tests/unit/systems/torch/test_torch.py F.FFFFF [100%]

=================================== FAILURES ===================================
___________________ test_pytorch_op_exports_own_config[True] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_exports_own_co0')
torchscript = True

@pytest.mark.parametrize("torchscript", [True, False])
def test_pytorch_op_exports_own_config(tmpdir, torchscript):
    model_to_use = model_scripted if torchscript else model

    triton_op = ptorch_op.PredictPyTorch(
        model_to_use, torchscript, model_input_schema, model_output_schema
    )
  triton_op.export(tmpdir, None, None)

tests/unit/systems/torch/test_torch.py:91:


merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:261: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f30081471c0>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f30fec56850>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
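
Same root cause as the json.dump failure in the earlier run: google.protobuf.text_format.PrintMessage also writes str, so the config.pbtxt handle must be text-mode as well. The "fix open paths" commit evidently fixed the JSON path (exports_own_config[False] now passes above), but the torchscript config path still opens its file in binary mode. A hedged sketch of the fix (the model_config_pb2 import path is an assumption; the log only shows the PrintMessage call):

import tritonclient.grpc.model_config_pb2 as model_config  # assumed import path
from google.protobuf import text_format

config = model_config.ModelConfig(name="0_predictpytorch")

with open("config.pbtxt", "w", encoding="utf-8") as o:  # "w", not "wb"
    text_format.PrintMessage(config, o)
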
______________________________ test_torch_backend ______________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_torch_backend0')

def test_torch_backend(tmpdir):
    model_repository = Path(tmpdir)

    model_dir = model_repository / model_name
    model_version_dir = model_dir / "1"
    model_version_dir.mkdir(parents=True, exist_ok=True)

    # Write config out
    config_path = model_dir / "config.pbtxt"
    with open(str(config_path), "w", encoding="utf-8") as f:
        f.write(model_config)

    # Write model
    model_scripted = torch.jit.script(model)
    model_scripted.save(str(model_version_dir / "model.pt"))

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
  inputs[0].set_data_from_numpy(input_data)

tests/unit/systems/torch/test_torch.py:134:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1701: in set_data_from_numpy
raise_error("input_tensor must be a numpy array")


msg = 'input_tensor must be a numpy array'

def raise_error(msg):
    """
    Raise error with the provided message
    """
  raise InferenceServerException(msg=msg) from None

E tritonclient.utils.InferenceServerException: input_tensor must be a numpy array

/usr/local/lib/python3.8/dist-packages/tritonclient/utils/__init__.py:35: InferenceServerException
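
Unlike the export failures, this one is in the test itself: the whole input_data dict was passed to set_data_from_numpy, which only accepts a numpy array. A corrected sketch (np_to_triton_dtype is imported from tritonclient.utils here as an assumption about what the test's triton helper resolves to):

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]], dtype=np.float32)}

inputs = [
    grpcclient.InferInput(
        "input", input_data["input"].shape, np_to_triton_dtype(input_data["input"].dtype)
    )
]
inputs[0].set_data_from_numpy(input_data["input"])  # the array, not the enclosing dict

The test_pytorch_op_serving tests later in this run already make the call this way.
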
______________________ test_pytorch_op_serving[True-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_T0')
use_path = True, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:261: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f2fa8157f70>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f30fec56850>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
_____________________ test_pytorch_op_serving[True-False] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_F0')
use_path = True, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
  with run_triton_server(tmpdir) as client:

tests/unit/systems/torch/test_torch.py:184:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_F0')

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0729 19:33:34.333899 7465 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f15d6000000' with size 268435456
I0729 19:33:34.334686 7465 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0729 19:33:34.337142 7465 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0729 19:33:34.444207 7465 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
E0729 19:33:34.445186 7465 model_repository_manager.cc:1348] failed to load '0_predictpytorch' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py
E0729 19:33:34.445258 7465 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '0_predictpytorch' which has no loaded version
I0729 19:33:34.445314 7465 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0729 19:33:34.445363 7465 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0729 19:33:34.445411 7465 server.cc:626]
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_predictpytorch | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_F0/0_predictpytorch/1/model.py |
+------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0729 19:33:34.505238 7465 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0729 19:33:34.506072 7465 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_True_F0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0729 19:33:34.506107 7465 server.cc:257] Waiting for in-flight requests to complete.
I0729 19:33:34.506115 7465 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0729 19:33:34.506125 7465 server.cc:288] All models are stopped, unloading models
I0729 19:33:34.506131 7465 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0729 19:33:35.524538 7465 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0729 19:33:35.524601 7465 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
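
The "model.py does not exist" error above means the exported config.pbtxt declared Triton's python backend, but nothing wrote the backend's entrypoint into the version directory alongside the saved model. For orientation, a rough skeleton of what the python backend requires at 0_predictpytorch/1/model.py (the class and method names are fixed by the backend; the body is illustrative, not Merlin's generated template):

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_repository"] and args["model_version"] locate model.pt here
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "input")
            # ... run the loaded PyTorch model on input_tensor.as_numpy() ...
            output = pb_utils.Tensor("OUTPUT__0", input_tensor.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[output]))
        return responses
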
_____________________ test_pytorch_op_serving[False-True] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_False_0')
use_path = False, torchscript = True

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
  ens_config, node_configs = ensemble.export(tmpdir)

tests/unit/systems/torch/test_torch.py:169:


merlin/systems/dag/ensemble.py:105: in export
node_config = node.export(export_path, node_id=node_id, version=version)
merlin/systems/dag/node.py:44: in export
return self.op.export(
merlin/systems/dag/ops/pytorch.py:151: in export
return self._export_model_config(
merlin/systems/dag/ops/pytorch.py:168: in _export_model_config
self._export_torchscript_config(name, output_path)
merlin/systems/dag/ops/pytorch.py:261: in _export_torchscript_config
text_format.PrintMessage(config, o)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:239: in PrintMessage
printer.PrintMessage(message)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:454: in PrintMessage
self.PrintField(field, value)
/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:545: in PrintField
self._PrintFieldName(field)


self = <google.protobuf.text_format._Printer object at 0x7f3024246310>
field = <google.protobuf.descriptor.FieldDescriptor object at 0x7f30fec56850>

def _PrintFieldName(self, field):
  """Print field name."""
  out = self.out
out.write(' ' * self.indent)

E TypeError: a bytes-like object is required, not 'str'

/usr/local/lib/python3.8/dist-packages/google/protobuf/text_format.py:517: TypeError
_____________________ test_pytorch_op_serving[False-False] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_False_1')
use_path = False, torchscript = False

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("torchscript", [True, False])
@pytest.mark.parametrize("use_path", [True, False])
def test_pytorch_op_serving(tmpdir, use_path, torchscript):
    # from merlin.core.dispatch import make_df

    model_name = "0_predictpytorch"
    model_path = str(tmpdir / "model.pt")

    model_to_use = model_scripted if torchscript else model
    model_or_path = model_path if use_path else model_to_use

    if use_path:
        try:
            # jit-compiled version of a model
            model_to_use.save(model_path)
        except AttributeError:
            # non-jit-compiled version of a model
            torch.save(model_to_use, model_path)

    predictions = ["input"] >> ptorch_op.PredictPyTorch(
        model_or_path, torchscript, model_input_schema, model_output_schema, sparse_max={"input": 3}
    )
    ensemble = Ensemble(predictions, model_input_schema)
    ens_config, node_configs = ensemble.export(tmpdir)

    input_data = {"input": np.array([[2.0, 3.0, 4.0], [4.0, 8.0, 1.0]]).astype(np.float32)}
    # input_data = {"input": np.array([2.0, 3.0, 4.0]).astype(np.float32)}

    inputs = [
        grpcclient.InferInput(
            "input", input_data["input"].shape, triton.np_to_triton_dtype(input_data["input"].dtype)
        )
    ]
    inputs[0].set_data_from_numpy(input_data["input"])

    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    response = None
    with run_triton_server(tmpdir) as client:
      response = client.infer(model_name, inputs, outputs=outputs)

tests/unit/systems/torch/test_torch.py:185:


/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322: in infer
raise_error_grpc(rpc_error)


rpc_error = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Failed to process the reques...t-of-jenkins/pytest-3/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute\n","grpc_status":13}"

def raise_error_grpc(rpc_error):
  raise get_error_grpc(rpc_error) from None

E tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2
E
E At:
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(33): forward
E /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
E /tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62: InferenceServerException
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
I0729 19:33:37.488098 7479 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0ef6000000' with size 268435456
I0729 19:33:37.488850 7479 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0729 19:33:37.491296 7479 model_repository_manager.cc:1191] loading: 0_predictpytorch:1
I0729 19:33:37.598363 7479 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_predictpytorch (GPU device 0)
I0729 19:33:40.556592 7479 model_repository_manager.cc:1345] successfully loaded '0_predictpytorch' version 1
I0729 19:33:40.556794 7479 model_repository_manager.cc:1191] loading: ensemble_model:1
I0729 19:33:40.657180 7479 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0729 19:33:40.657301 7479 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0729 19:33:40.657362 7479 server.cc:583]
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0729 19:33:40.657404 7479 server.cc:626]
+------------------+---------+--------+
| Model | Version | Status |
+------------------+---------+--------+
| 0_predictpytorch | 1 | READY |
| ensemble_model | 1 | READY |
+------------------+---------+--------+

I0729 19:33:40.716414 7479 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0729 19:33:40.717290 7479 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_False_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0729 19:33:40.718238 7479 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0729 19:33:40.718462 7479 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0729 19:33:40.759290 7479 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0729 19:33:41.742973 7479 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0729 19:33:41.743038 7479 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0729 19:33:42.743204 7479 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0729 19:33:42.743261 7479 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0729 19:33:43.769611 7479 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0729 19:33:43.769667 7479 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
E0729 19:33:45.010420 7519 pb_stub.cc:749] Failed to process the request(s) for model '0_predictpytorch', message: RuntimeError: size mismatch, got 1, 1x3,2

At:
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py(114): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/torch/test_torch.py(33): forward
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1130): _call_impl
/tmp/pytest-of-jenkins/pytest-3/test_pytorch_op_serving_False_1/0_predictpytorch/1/model.py(177): execute

I0729 19:33:45.011783 7479 server.cc:257] Waiting for in-flight requests to complete.
I0729 19:33:45.011813 7479 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0729 19:33:45.011848 7479 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0729 19:33:45.011952 7479 model_repository_manager.cc:1223] unloading: 0_predictpytorch:1
I0729 19:33:45.012062 7479 server.cc:288] All models are stopped, unloading models
I0729 19:33:45.012086 7479 server.cc:295] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0729 19:33:45.012131 7479 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0729 19:33:46.012179 7479 server.cc:295] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
free(): invalid pointer
I0729 19:33:46.655115 7479 model_repository_manager.cc:1328] successfully unloaded '0_predictpytorch' version 1
I0729 19:33:47.012306 7479 server.cc:295] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
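
The size mismatch above is a shape problem rather than a serving problem: the request sends rows of width 3, while the test model's linear layer appears to expect 2 input features (the trailing ",2" in the message). A minimal reproduction sketch, assuming the model wraps a torch.nn.Linear with in_features=2:

import torch

linear = torch.nn.Linear(in_features=2, out_features=1)  # in_features=2 assumed from the error

x = torch.tensor([[2.0, 3.0, 4.0]])  # a 1x3 row, like the request above

try:
    linear(x)
except RuntimeError as e:
    print(e)  # a 1x3 input can't be multiplied by a weight expecting 2 features

Whether the fix belongs in the model definition or in the sparse_max={"input": 3} reshaping is something the log alone can't settle.
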
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py: 1 warning
tests/unit/systems/test_ensemble.py: 2 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/fil/test_fil.py::test_binary_classifier_default[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_binary_classifier_with_proba[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_multi_classifier[sklearn_forest_classifier-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_regressor[sklearn_forest_regressor-get_model_params4]
tests/unit/systems/fil/test_fil.py::test_model_file[sklearn_forest_regressor-checkpoint.tl]
/usr/local/lib/python3.8/dist-packages/sklearn/utils/deprecation.py:103: FutureWarning: Attribute n_features_ was deprecated in version 1.0 and will be removed in 1.2. Use n_features_in_ instead.
warnings.warn(msg, category=FutureWarning)

tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
/usr/local/lib/python3.8/dist-packages/torch/serialization.py:707: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_exports_own_config[True]
FAILED tests/unit/systems/torch/test_torch.py::test_torch_backend - tritoncli...
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[True-False]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-True]
FAILED tests/unit/systems/torch/test_torch.py::test_pytorch_op_serving[False-False]
============ 6 failed, 51 passed, 20 warnings in 249.46s (0:04:09) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins8063346701326083820.sh

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Jul 29, 2022
@viswa-nvidia

arbitration: which initiative is this under?

@karlhigley
Contributor Author

@viswa-nvidia It's linked back up through #153 to NVIDIA-Merlin/Merlin#255

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #151 of commit 4e51fc6d866e80d05fee4c733ffc407d2221c7b0, no merge conflicts.
Running as SYSTEM
Setting status of 4e51fc6d866e80d05fee4c733ffc407d2221c7b0 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/256/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse 4e51fc6d866e80d05fee4c733ffc407d2221c7b0^{commit} # timeout=10
Checking out Revision 4e51fc6d866e80d05fee4c733ffc407d2221c7b0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4e51fc6d866e80d05fee4c733ffc407d2221c7b0 # timeout=10
Commit message: "fix open paths"
 > git rev-list --no-walk 50855cccba4dd20a8759095bfee872d76e71a4b7 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins15130522378921920782.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 82 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_an_xgboost_model_with_merlin_systems.py F [ 2%]
[ 2%]
tests/unit/examples/test_serving_ranking_models_with_merlin_systems.py F [ 3%]
[ 3%]
tests/unit/systems/test_ensemble.py FF.Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins15111406350942464083.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #151 of commit a99e7a547d506cd777c6c53b20322acb4d58b5ad, no merge conflicts.
Running as SYSTEM
Setting status of a99e7a547d506cd777c6c53b20322acb4d58b5ad to PENDING with url https://10.20.13.93:8080/job/merlin_systems/257/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/151/*:refs/remotes/origin/pr/151/* # timeout=10
 > git rev-parse a99e7a547d506cd777c6c53b20322acb4d58b5ad^{commit} # timeout=10
Checking out Revision a99e7a547d506cd777c6c53b20322acb4d58b5ad (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a99e7a547d506cd777c6c53b20322acb4d58b5ad # timeout=10
Commit message: "Merge branch 'main' into feature/pytorch"
 > git rev-list --no-walk 4e51fc6d866e80d05fee4c733ffc407d2221c7b0 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins11390376864038596964.sh
PYTHONPATH=:/usr/local/lib/python3.8/dist-packages/:/usr/local/hugectr/lib:/var/jenkins_home/workspace/merlin_systems/systems
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 82 items

tests/unit/test_version.py . [ 1%]
tests/unit/examples/test_serving_an_xgboost_model_with_merlin_systems.py Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins4026410414144722676.sh

@karlhigley karlhigley closed this Aug 15, 2022
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Create a PredictPytorch operator
4 participants