Significant FPS drop and issues with tracking #26
What model did you use?
I don't know what `pred` is. If you want a (1, 19, 19, x) shape, use:
Also see ref: #23 (comment). As for speeding it up, I'll test it out ASAP.
I use yolov4-tiny with ReLU activation, converted to tflite. From what I remember from Netron, it has 2 outputs. The script I am using expects 3 outputs, which is probably the cause of the second issue I am facing. Here is the script:
The error occurs in the tflite part, but I posted the whole script since it might prove useful for others too.
Update: I have looked at this code more over the last few days, and I've noticed that it is written specifically for tflite models with 3 outputs/branches, while my yolov4-tiny model has 2 outputs. I will see how I can modify the above script to run my model, but the speed (FPS) is still extremely low (0.15 FPS with inference). Any idea how to fix that part?
It's only 0.15? On Coral?
Sorry, the FPS on Coral is 0.45. Still relatively low; not that big of an improvement.
HW: AMD Ryzen 7 2700X

I think the computation time excluding inference is too long. How do you install scipy on Coral?

Result:

FPS: 3.09, inference: 0.13 s, compute: 0.32
FPS: 3.06, inference: 0.14 s, compute: 0.33
FPS: 3.08, inference: 0.13 s, compute: 0.32
FPS: 2.93, inference: 0.13 s, compute: 0.34
FPS: 3.06, inference: 0.13 s, compute: 0.33
FPS: 3.18, inference: 0.13 s, compute: 0.31
FPS: 2.87, inference: 0.13 s, compute: 0.35

Script:

```python
import time

import cv2
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from yolov4.tf import YOLOv4

from deep_sort import preprocessing, nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker
from tools import generate_detections as gdet

yolo = YOLOv4(tiny=True)
yolo.classes = "dataset/coco.names"
yolo.make_model(activation1="relu")
yolo.load_weights(
    r"C:\Users\windows\google_drive\Hard_Soft\NN\yolov4\yolov4-tiny-relu.weights",
    weights_type="yolo",
)

# Definition of the parameters
max_cosine_distance = 0.4
nn_budget = None
nms_max_overlap = 1.0

# initialize deep sort
model_filename = "model_data/mars-small128.pb"
encoder = gdet.create_box_encoder(model_filename, batch_size=1)

# calculate cosine distance metric
metric = nn_matching.NearestNeighborDistanceMetric(
    "cosine", max_cosine_distance, nn_budget
)

# initialize tracker
tracker = Tracker(metric)

# load configuration for object detector
input_size = yolo.input_size
video_path = r"C:/Users/windows/Desktop/test.mp4"

# begin video capture
vid = cv2.VideoCapture(video_path)
out = None

# initialize color map
cmap = plt.get_cmap("tab20b")
colors = [cmap(i)[:3] for i in np.linspace(0, 1, 20)]

# while video is running
while True:
    start_time = time.time()
    return_value, frame = vid.read()
    if return_value:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = Image.fromarray(frame)
    else:
        print("Video has ended or failed, try a different video format!")
        break
    original_h, original_w, _ = frame.shape

    # each box is (x, y, w, h, class_id, probability) in normalized coordinates
    _bboxes = yolo.predict(frame)
    mid_time = time.time()

    # convert normalized center-x, center-y, w, h ---> pixel xmin, ymin, width, height
    num_objects = len(_bboxes)
    bboxes = [
        [
            (box[0] - box[2] / 2) * original_w,
            (box[1] - box[3] / 2) * original_h,
            box[2] * original_w,
            box[3] * original_h,
        ]
        for box in _bboxes
    ]
    bboxes = np.array(bboxes)
    scores = np.array([box[5] for box in _bboxes])
    classes = np.array([int(box[4]) for box in _bboxes])

    # store all predictions in one parameter for simplicity when calling functions
    pred_bbox = [bboxes, scores, classes, num_objects]

    # read in all class names from config
    class_names = yolo.classes

    # by default allow all classes in .names file
    # allowed_classes = list(class_names.values())
    # custom allowed classes (edit line below to customize the tracker)
    allowed_classes = ["person", "bicycle"]

    # loop through objects and use class index to get class name,
    # allowing only classes in the allowed_classes list
    names = []
    deleted_indx = []
    for i in range(num_objects):
        class_indx = classes[i]
        class_name = class_names[class_indx]
        if class_name not in allowed_classes:
            deleted_indx.append(i)
        else:
            names.append(class_name)
    names = np.array(names)
    count = len(names)

    # delete detections that are not in allowed_classes
    # (scores must be filtered too, so they stay aligned with bboxes and names)
    bboxes = np.delete(bboxes, deleted_indx, axis=0)
    scores = np.delete(scores, deleted_indx, axis=0)

    # encode yolo detections and feed to tracker
    features = encoder(frame, bboxes)
    detections = [
        Detection(bbox, score, class_name, feature)
        for bbox, score, class_name, feature in zip(bboxes, scores, names, features)
    ]

    # run non-maxima suppression
    boxs = np.array([d.tlwh for d in detections])
    scores = np.array([d.confidence for d in detections])
    classes = np.array([d.class_name for d in detections])
    indices = preprocessing.non_max_suppression(
        boxs, classes, nms_max_overlap, scores
    )
    detections = [detections[i] for i in indices]

    # call the tracker
    tracker.predict()
    tracker.update(detections)

    # update tracks
    for track in tracker.tracks:
        if not track.is_confirmed() or track.time_since_update > 1:
            continue
        bbox = track.to_tlbr()
        class_name = track.get_class()

        # draw bbox on screen
        color = colors[int(track.track_id) % len(colors)]
        color = [i * 255 for i in color]
        cv2.rectangle(
            frame,
            (int(bbox[0]), int(bbox[1])),
            (int(bbox[2]), int(bbox[3])),
            color,
            2,
        )
        cv2.rectangle(
            frame,
            (int(bbox[0]), int(bbox[1] - 30)),
            (
                int(bbox[0]) + (len(class_name) + len(str(track.track_id))) * 17,
                int(bbox[1]),
            ),
            color,
            -1,
        )
        cv2.putText(
            frame,
            class_name + "-" + str(track.track_id),
            (int(bbox[0]), int(bbox[1] - 10)),
            0,
            0.75,
            (255, 255, 255),
            2,
        )

    # calculate frames per second of running detections
    # ("compute" here prints the total frame time, i.e. 1 / fps)
    fps = 1.0 / (time.time() - start_time)
    print(
        "FPS: {:.2f}, inference: {:.2f} s, compute: {:.2f}".format(
            fps, mid_time - start_time, 1 / fps
        )
    )
    result = np.asarray(frame)
    result = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    cv2.imshow("Output Video", result)

    # press q to quit
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
```
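For anyone unfamiliar with what the `preprocessing.non_max_suppression` call in the script does: it greedily keeps the highest-scoring box and drops boxes that overlap it too much. Here is a minimal pure-Python sketch of that idea on (x, y, w, h) boxes; it is my own illustration, not the actual deep_sort implementation (which also handles per-class suppression):

```python
def iou(a, b):
    # a, b: (x, y, w, h) top-left boxes; returns intersection-over-union
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, max_overlap):
    # Greedy NMS: keep the highest-scoring box, discard boxes whose
    # overlap with it exceeds max_overlap, repeat with the remainder.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= max_overlap]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2]
```

Note that with `nms_max_overlap = 1.0`, as in the script, nothing is ever suppressed, since IoU cannot exceed 1.0.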
Hey, I tested the model with the inference you provided on my PC, giving 0.15 FPS, and then on Coral, giving 0.45 FPS. That is 3x better, but still extremely low. For the PC test I used my GPU (GTX 1660 Super); sorry for the misunderstanding. As for Coral, I do not think there is a way to install scipy on it at the moment, so I might just go with Kalman trackers or basic centroid tracking. What bothers me a bit, though, is the aforementioned low-FPS issue.
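Since scipy won't install on Coral, basic centroid tracking really is a lightweight fallback. A minimal sketch of the idea, stdlib only (no scipy, no Kalman filter, no appearance features): match each new detection to the nearest existing track centroid, greedily, and spawn a new ID when nothing is close enough. The class and parameter names here are my own, not from any repo:

```python
import math
from itertools import count

class CentroidTracker:
    """Minimal greedy nearest-centroid tracker (illustrative sketch)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # pixels; beyond this, a new ID is created
        self.tracks = {}                  # track_id -> (cx, cy)
        self._ids = count()

    def update(self, centroids):
        assigned = {}
        free = dict(self.tracks)  # tracks not yet matched this frame
        for c in centroids:
            # Greedy nearest-neighbour match against unmatched tracks.
            best_id, best_d = None, self.max_distance
            for tid, t in free.items():
                d = math.dist(c, t)
                if d < best_d:
                    best_id, best_d = tid, d
            if best_id is None:
                best_id = next(self._ids)  # no close track: start a new one
            else:
                free.pop(best_id)
            assigned[best_id] = c
        self.tracks = assigned  # unmatched old tracks are dropped immediately
        return assigned

tracker = CentroidTracker()
print(tracker.update([(10, 10), (100, 100)]))  # two new IDs: 0 and 1
print(tracker.update([(12, 11), (101, 99)]))   # same IDs follow the moved objects
```

This is O(tracks x detections) per frame and loses identity on occlusion, but it has essentially zero compute cost compared to DeepSORT's feature encoder.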
I modified the script above, and I can say that it works with tflite models now, which is a positive result. The drawback is that the FPS is still extremely low, to the point that the window stops responding:
Thanks for the script (tested on GPU).
Could you explain how to solve this problem? I got the same error: ValueError: Shapes (1, 19, 19) and (1, 38, 38) are incompatible
Good evening,
I have been trying to use the converted model for object detection with DeepSORT today, without success. Before that, I tried testing it as you described, using the inference command. However, when used with videos, it takes a huge amount of time to advance the frame and track the changes. As for DeepSORT, I referred to https://github.com/theAIGuysCode/yolov4-deepsort and his tracker script, only to get the following error:
ValueError: Shapes (1, 19, 19) and (1, 38, 38) are incompatible
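For context on those two shapes, my reading (an assumption, not taken from either repo): a YOLO head's grid size is `input_size // stride`, so with a 608x608 input, yolov4-tiny's two heads (strides 32 and 16) produce 19x19 and 38x38 grids, while full yolov4 (strides 32, 16, 8) adds a 76x76 grid. A (1, 19, 19) vs (1, 38, 38) mismatch therefore suggests outputs from different heads are being combined, or a two-head tiny model is being fed to three-head code:

```python
def grid_sizes(input_size, strides):
    # Each detection head predicts on a grid of input_size // stride cells.
    return [input_size // s for s in strides]

print(grid_sizes(608, (32, 16)))     # yolov4-tiny heads -> [19, 38]
print(grid_sizes(608, (32, 16, 8)))  # full yolov4 heads -> [19, 38, 76]
```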
After that, I tried running the above script (basically the same as hunglc007's script) with the correction for int8 models specified in hunglc007#214 (I recall you referenced someone about this in one of the issues). Running it gave me the following error:
If I swap the number 2 with 1 or 0, it does eventually bring up an image, but with extremely inaccurate detections. I haven't tried this with video, for safety purposes :P ...
These issues and the FPS are the crucial ones for me. Thank you for your great work, though; the conversion is successful and the model is working.
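On the int8 correction referenced from hunglc007#214: my understanding (an assumption, not a quote from that PR) is that int8 TFLite outputs must be dequantized using each output tensor's scale and zero point before decoding, following the standard TFLite convention `real_value = scale * (quantized_value - zero_point)`. A minimal sketch; the scale and zero-point values below are made up for illustration (in practice they come from the interpreter's output details):

```python
def dequantize(q_values, scale, zero_point):
    # Standard TFLite affine dequantization:
    #   real_value = scale * (quantized_value - zero_point)
    return [scale * (q - zero_point) for q in q_values]

# Illustrative values only; real scale/zero_point come from the model.
print(dequantize([0, 128, 255], 1 / 255.0, 0))
print(dequantize([-128, 0, 127], 0.5, -128))
```

Skipping this step (or applying the wrong tensor's parameters to an output) would explain wildly inaccurate detections like the ones described above.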