I am trying to extract feature vectors from my ResNet50-based CNN, optimized with TensorRT 7.0.
I get correct output when a single input is given to the TRT model. But when I give a batch as input, only the first sample of the batch gets the correct output; the outputs for the remaining samples are all zeros.
I built my TRT engine with `builder.max_batch_size = 16` and `EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)`.
How do I get the correct outputs for all the samples in the batch?
```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

imgs = np.ones([16, 3, 256, 128])  # batch_size = 16
# expected output shape: (16, 3072)

trt_logger = trt.Logger(trt.Logger.INFO)


def load_engine(trt_logger):
    TRTbin = 'resnet50_onnx_trt/resnet50mid.model.tar-60.trt'
    with open(TRTbin, 'rb') as f, trt.Runtime(trt_logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())


engine = load_engine(trt_logger)
context = engine.create_execution_context()


class HostDeviceMem(object):
    """Simple helper data class that's a little nicer to use than a 2-tuple."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()


def alloc_buf_N(engine):
    """Allocates all host/device in/out buffers required for an engine."""
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        # size = 1572864 = 16*3*256*128 for inputs
        # size = 49152 = 16*3072 for outputs
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # dtype = <class 'numpy.float32'> for both input and output
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        # host_mem = [0. 0. 0. ... 0. 0. 0.]
        # host_mem.shape = (1572864,) and (49152,) for inputs and outputs respectively
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream
```
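The buffer sizes computed in `alloc_buf_N` can be checked without a GPU. In TensorRT's implicit-batch mode the binding shapes exclude the batch dimension, so buffers are scaled by `max_batch_size`; in explicit-batch mode (the `EXPLICIT_BATCH` flag used here) the batch dimension is already part of each binding's shape. The sketch below is a minimal pure-NumPy illustration, not TensorRT code: `volume` stands in for the product `trt.volume` returns for a fully specified shape, `buffer_elements` is a hypothetical helper, and the binding shapes `(1, 3, 256, 128)` and `(1, 3072)` are an assumption (for example, an ONNX model exported with batch size 1), chosen because they are one way to arrive at the sizes noted in the comments.

```python
import numpy as np


def volume(shape):
    """Number of elements in a tensor of the given shape (product of all dims)."""
    return int(np.prod(shape, dtype=np.int64))


def buffer_elements(binding_shape, max_batch_size, explicit_batch):
    """Hypothetical helper: elements a single binding buffer must hold.

    Implicit batch: binding_shape has no batch dim, so scale by max_batch_size.
    Explicit batch: binding_shape already contains the batch dim.
    """
    if explicit_batch:
        return volume(binding_shape)
    return volume(binding_shape) * max_batch_size


# ASSUMED binding shapes for an explicit-batch engine built from a batch-1 model.
input_shape = (1, 3, 256, 128)
output_shape = (1, 3072)

# Elements such an engine would read/write per call:
print(buffer_elements(input_shape, 16, explicit_batch=True))   # 98304
print(buffer_elements(output_shape, 16, explicit_batch=True))  # 3072

# What alloc_buf_N allocates (volume * max_batch_size), matching its comments:
print(volume(input_shape) * 16)   # 1572864
print(volume(output_shape) * 16)  # 49152
```

If the engine's bindings really had batch dimension 1, each call would fill only the first 3072 output values, which would be consistent with zeros for the other 15 samples; printing `engine.get_binding_shape(binding)` is a direct way to confirm or rule this out.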
```python
def do_inference_v2(engine, context, inputs, bindings, outputs, stream):
    """
    Inputs and outputs are expected to be lists of HostDeviceMem objects.
    """
    # Transfer input data to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference.
    context.execute_async(batch_size=16, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]


inputs = imgs.astype(np.float32)
inputs_alloc_buf, outputs_alloc_buf, bindings_alloc_buf, stream_alloc_buf = alloc_buf_N(engine)
inputs_alloc_buf[0].host = np.ascontiguousarray(inputs)
trt_feature = do_inference_v2(engine, context, inputs_alloc_buf, bindings_alloc_buf, outputs_alloc_buf, stream_alloc_buf)
print("len(trt_feature)", len(trt_feature))
trt_feature = np.asarray(trt_feature)
trt_feature = trt_feature.reshape(16, 3072)
print("trt_feature[0][0:15]", trt_feature[0][0:15])
print("trt_feature[1][0:15]", trt_feature[1][0:15])
print("trt_feature.shape", trt_feature.shape)
```
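To see exactly which samples came back empty, rather than printing only rows 0 and 1, the reshaped `(16, 3072)` array can be checked row by row. This is a minimal NumPy sketch; `zero_rows` is a hypothetical helper, and the demo array is synthetic data shaped like the output described in this issue, not real model output.

```python
import numpy as np


def zero_rows(features):
    """Return the indices of rows (samples) whose values are all exactly zero."""
    features = np.asarray(features)
    return np.flatnonzero(~features.any(axis=1)).tolist()


# Synthetic stand-in for the flat 49152-element output buffer:
# only the first sample's 3072 values contain a non-zero entry.
flat = np.zeros(16 * 3072, dtype=np.float32)
flat[3] = 23.91
feats = flat.reshape(16, 3072)

print(len(zero_rows(feats)))  # 15
print(zero_rows(feats)[:3])   # [1, 2, 3]
```

Running `zero_rows(trt_feature)` on the real output would show whether every sample after the first is empty or only some of them are.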
The output `trt_feature[0][0:15]` is `[ 0. 0. 0. 23.91 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]`, which is correct. However, it does not work for the remaining samples in the batch: their outputs are all zeros.
I expected the output to be a list of length 16 (one 3072-dimensional vector per sample), i.e. `len(trt_feature) == 16`. Instead, `len(trt_feature)` is 1.
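`do_inference_v2` returns `[out.host for out in outputs]`, a list with one entry per output binding, where each entry is the flat host buffer for that binding. Assuming the engine has a single output binding (as the size comments in `alloc_buf_N` suggest), `len(trt_feature)` is 1 regardless of batch size, and the 16 per-sample vectors only appear after reshaping. A NumPy-only sketch of that shape bookkeeping, using a zero-filled array as a stand-in for the 49152-element host buffer:

```python
import numpy as np

batch_size, feat_dim = 16, 3072

# Stand-in for the return value of do_inference_v2 with one output binding:
# a list holding a single flat buffer of batch_size * feat_dim floats.
returned = [np.zeros(batch_size * feat_dim, dtype=np.float32)]
print(len(returned))      # 1 (one entry per output binding)
print(returned[0].shape)  # (49152,)

# Per-sample vectors come from reshaping that one buffer.
per_sample = returned[0].reshape(batch_size, feat_dim)
print(len(per_sample))      # 16
print(per_sample[1].shape)  # (3072,)

# np.asarray(list).reshape(16, 3072) gives the same result,
# because the list wraps exactly one buffer.
same = np.asarray(returned).reshape(batch_size, feat_dim)
print(np.array_equal(same, per_sample))  # True
```

So a length of 1 is expected from this helper; the zeros in rows 1 to 15 are a separate question about what the engine writes into that single buffer.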