TensorRT inference for a batch size = 16 fails. Works when batch size = 1 #11

Open
BipinJG opened this issue Jun 26, 2020 · 0 comments

BipinJG commented Jun 26, 2020

I am trying to extract feature vectors from my ResNet-50-based CNN, which I optimized with TensorRT 7.0.

I get the correct output when a single input is given to the TRT model. But when I give a batch of inputs to the model, I get the correct output only for the first sample of the batch; the remaining outputs are all zeros.

I have also built my TRT engine with "builder.max_batch_size = 16" and the explicit-batch flag "EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)".
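
For reference, this is roughly how the engine was built from the ONNX model (a condensed sketch from memory; the ONNX path and workspace size are placeholders, not my exact values):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path):  # onnx_path is a placeholder
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(EXPLICIT_BATCH) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 16
        builder.max_workspace_size = 1 << 30  # 1 GiB, placeholder value
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        # build_cuda_engine is the TensorRT 7.0-era API I am using here
        return builder.build_cuda_engine(network)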

How do I get the correct outputs for all the samples in the batch?

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

imgs = np.ones([16, 3, 256, 128])  # batch_size = 16
# expected output shape: (16, 3072)

trt_logger = trt.Logger(trt.Logger.INFO)
def load_engine(trt_logger):
    TRTbin = 'resnet50_onnx_trt/resnet50mid.model.tar-60.trt'
    with open(TRTbin, 'rb') as f, trt.Runtime(trt_logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = load_engine(trt_logger)
context = engine.create_execution_context()

class HostDeviceMem(object):
    """Simple helper data class that's a little nicer to use than a 2-tuple."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

def alloc_buf_N(engine):
    """Allocates all host/device in/out buffers required for an engine."""
    inputs = []
    outputs = []
    bindings = []

    stream = cuda.Stream()

    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        # size = 1572864 = 16*3*256*128 for inputs
        # size = 49152 = 16*3072 for outputs

        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # dtype = <class 'numpy.float32'> for both input and output

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        # host_mem = [0. 0. 0. ... 0. 0. 0.],
        # host_mem.shape = (1572864,) and (49152,) for inputs and outputs respectively

        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
            # print("inputs alloc_buf_N", inputs)
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
            # print("outputs alloc_buf_N", outputs)

    return inputs, outputs, bindings, stream
 
def do_inference_v2(engine, context, inputs, bindings, outputs, stream):
    """
    Inputs and outputs are expected to be lists of HostDeviceMem objects.
    """
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]

    # Run inference.
    context.execute_async(batch_size=16, bindings=bindings, stream_handle=stream.handle)

    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]

    # Synchronize the stream
    stream.synchronize()

    # Return only the host outputs.
    return [out.host for out in outputs]

inputs = imgs.astype(np.float32)
engine = load_engine(trt_logger)
context = engine.create_execution_context()

inputs_alloc_buf, outputs_alloc_buf, bindings_alloc_buf, stream_alloc_buf = alloc_buf_N(engine)

inputs_alloc_buf[0].host = np.ascontiguousarray(inputs)

trt_feature = do_inference_v2(engine, context, inputs_alloc_buf, bindings_alloc_buf, outputs_alloc_buf, stream_alloc_buf)
print("len(trt_feature)",len(trt_feature))
trt_feature = np.asarray(trt_feature)
trt_feature = trt_feature.reshape(16,3072)

print("trt_feature[0][0:15]",trt_feature[0][0:15])
print("trt_feature[1][0:15]",trt_feature[1][0:15])
print("trt_feature.shape",trt_feature.shape)

This gives me the following output:

len(trt_feature) 1

trt_feature[0][0:15] [ 0.  0.  0.  23.91  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]
trt_feature[1][0:15] [ 0.  0.  0.  0.     0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0. ]

The output "trt_feature[0][0:15] [ 0. 0. 0. 23.91 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]" is correct. However, it does not work for remaining samples in the batch.
The output was supposed to be a list of length 16 (each of dimension 3072), i.e. "len(trt_feature)" = 16. But I got the length of the output equal to 1, i.e. "len(trt_feature)" = 1.
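
To make the check explicit, this is the sanity check I expected to pass on the reshaped result above (a small sketch; every row should contain some non-zero activations, but currently only row 0 does):

for i in range(trt_feature.shape[0]):
    # count non-zero entries per sample; expected to be > 0 for all 16 rows
    print(i, np.count_nonzero(trt_feature[i]))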
