
[Bug] FormatShape bug for optical flow #2630

Closed
3 tasks done
makecent opened this issue Aug 8, 2023 · 5 comments

@makecent
Contributor

makecent commented Aug 8, 2023

Branch

main branch (1.x version, such as v1.0.0, or dev-1.x branch)

Prerequisite

Environment

N/A

Describe the bug

Because normalization has moved from the pipeline to the data preprocessor, a processing step that optical flow needs, which the Normalize pipeline step performed in the older version, is now missing:

In the older version, the Normalize pipeline step stacks flow_x and flow_y:

if modality == 'Flow':
    num_imgs = len(results['imgs'])
    assert num_imgs % 2 == 0
    assert self.mean.shape[0] == 2
    assert self.std.shape[0] == 2
    n = num_imgs // 2
    h, w = results['imgs'][0].shape
    x_flow = np.empty((n, h, w), dtype=np.float32)
    y_flow = np.empty((n, h, w), dtype=np.float32)
    for i in range(n):
        x_flow[i] = results['imgs'][2 * i]
        y_flow[i] = results['imgs'][2 * i + 1]
    x_flow = (x_flow - self.mean[0]) / self.std[0]
    y_flow = (y_flow - self.mean[1]) / self.std[1]
    if self.adjust_magnitude:
        x_flow = x_flow * results['scale_factor'][0]
        y_flow = y_flow * results['scale_factor'][1]
    imgs = np.stack([x_flow, y_flow], axis=-1)
    results['imgs'] = imgs
    args = dict(
        mean=self.mean,
        std=self.std,
        to_bgr=self.to_bgr,
        adjust_magnitude=self.adjust_magnitude)
    results['img_norm_cfg'] = args
    return results
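The loop above assumes the flow frames arrive interleaved as [x_0, y_0, x_1, y_1, ...], each a single-channel H x W array. As a minimal sketch of just the split-and-stack part (mean/std and magnitude adjustment omitted), the same result can be obtained with strided slicing; the helper name stack_flow is hypothetical and not part of mmaction2.

```python
import numpy as np


def stack_flow(imgs):
    """Stack interleaved flow frames [x0, y0, x1, y1, ...] into (n, h, w, 2).

    Hypothetical helper: mirrors the loop in the old Normalize step,
    without normalization or magnitude adjustment.
    """
    frames = np.asarray(imgs, dtype=np.float32)  # (2n, h, w)
    assert frames.shape[0] % 2 == 0, 'expected an even number of flow frames'
    x_flow = frames[0::2]  # even indices: horizontal component
    y_flow = frames[1::2]  # odd indices: vertical component
    return np.stack([x_flow, y_flow], axis=-1)


# Usage: 3 time steps (6 single-channel images) of size 4 x 5
imgs = [np.full((4, 5), i, dtype=np.uint8) for i in range(6)]
stacked = stack_flow(imgs)
print(stacked.shape)  # (3, 4, 5, 2)
```

Slicing avoids the Python loop and the two preallocated buffers, but produces the same (n, h, w, 2) array that FormatShape later expects.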

In the 1.x version, this stacking step is lost because the Normalize pipeline step is no longer used, which causes a dimension error in FormatShape:

  File "/home/louis/miniconda3/envs/mmengine/lib/python3.8/site-packages/mmaction/datasets/transforms/formatting.py", line 260, in transform
    imgs = np.transpose(imgs, (0, 1, 5, 2, 3, 4))
  File "<__array_function__ internals>", line 180, in transpose
  File "/home/louis/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 660, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/home/louis/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: axes don't match array
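The error can be reproduced in isolation. The sketch below assumes that, for NCTHW, FormatShape reshapes the frames to (-1, num_clips, clip_len) + frame.shape before the transpose shown in the traceback; that reshape is an assumption about formatting.py, not a quote from it. Unstacked single-channel flow frames yield a 5-D array, so the 6-axis permutation fails, whereas frames stacked to H x W x 2 yield the expected 6-D array.

```python
import numpy as np

num_clips, clip_len, h, w = 3, 5, 8, 8

# Unstacked flow: two single-channel frames (x and y) per sampled time step
flat_flow = np.zeros((num_clips * clip_len * 2, h, w), dtype=np.float32)
imgs = flat_flow.reshape((-1, num_clips, clip_len) + flat_flow.shape[1:])
print(imgs.shape)  # (2, 3, 5, 8, 8): 5 dims, no trailing channel axis

try:
    np.transpose(imgs, (0, 1, 5, 2, 3, 4))  # 6 axes for a 5-D array
except ValueError as err:
    print('ValueError:', err)

# Stacked flow: one H x W x 2 frame per time step, like RGB with C=2
stacked = np.zeros((num_clips * clip_len, h, w, 2), dtype=np.float32)
imgs = stacked.reshape((-1, num_clips, clip_len) + stacked.shape[1:])
out = np.transpose(imgs, (0, 1, 5, 2, 3, 4))
print(out.shape)  # (1, 3, 2, 5, 8, 8)
```

Note that the -1 in the reshape silently absorbs the extra factor of 2 from the unstacked frames, so the failure only surfaces at the transpose.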

Reproduces the problem - code sample

No response

Reproduces the problem - command or script

No response

Reproduces the problem - error message

No response

Additional information

No response

@Dai-Wenxun
Collaborator

For formatting optical flow, you need to set the input_format of FormatShape to NCHW_Flow, as shown here. The code in this branch stacks the optical flow along the last dimension, as your #2631 does.
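As a rough illustration of what an NCHW_Flow layout could look like (an assumption based on the description above, not the branch's actual code): flow is stacked along the last dimension, then the time axis is folded into the channel axis, so each clip becomes a 2D-CNN input with clip_len x 2 channels. The function name format_nchw_flow is hypothetical.

```python
import numpy as np


def format_nchw_flow(imgs, num_clips, clip_len):
    """Hypothetical sketch: interleaved flow frames -> (M, clip_len * 2, H, W).

    imgs: single-channel H x W frames ordered [x0, y0, x1, y1, ...].
    """
    frames = np.asarray(imgs, dtype=np.float32)
    # Stack x/y along the last dimension: (N_crops * N_clips * T, H, W, 2)
    flow = np.stack([frames[0::2], frames[1::2]], axis=-1)
    # (N_crops, N_clips, T, H, W, 2)
    flow = flow.reshape((-1, num_clips, clip_len) + flow.shape[1:])
    # (N_crops, N_clips, T, 2, H, W)
    flow = np.transpose(flow, (0, 1, 2, 5, 3, 4))
    # Merge crops/clips into M and time/components into C' = T * 2
    m = flow.shape[0] * flow.shape[1]
    return flow.reshape((m, clip_len * 2) + flow.shape[4:])


imgs = [np.zeros((8, 8), dtype=np.float32) for _ in range(3 * 5 * 2)]
print(format_nchw_flow(imgs, num_clips=3, clip_len=5).shape)  # (3, 10, 8, 8)
```

Because T comes before C in the transpose, the resulting channels are ordered x_0, y_0, x_1, y_1, and so on.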

@Dai-Wenxun
Collaborator

Dai-Wenxun commented Aug 11, 2023

For the normalization of optical flow, I think we can implement it as follows:

clip_len = 5
format_shape = 'NCHW_flow'

model = dict(
    type='Recognizer2D',
    backbone=dict(...),
    cls_head=dict(...),
    data_preprocessor=dict(
        type='ActionDataPreprocessor',
        mean=[128, 128] * clip_len,
        std=[128, 128] * clip_len,
        format_shape=format_shape))

train_pipeline = [
    dict(type='SampleFrames', clip_len=clip_len, frame_interval=1, num_clips=3),
    ...
    dict(type='FormatShape', input_format=format_shape),
    ...
]
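Assuming the NCHW_flow layout interleaves x and y along the channel axis (x_0, y_0, x_1, y_1, ...), repeating a two-element mean/std clip_len times lines up each entry with the matching flow component. The NumPy sketch below imitates per-channel normalization with illustrative, deliberately unequal x/y statistics (made-up values, chosen only to make the alignment visible); it does not call ActionDataPreprocessor.

```python
import numpy as np

clip_len = 5
mean = np.array([100.0, 150.0] * clip_len)  # illustrative x/y means
std = np.array([10.0, 20.0] * clip_len)     # illustrative x/y stds

# One clip in an assumed NCHW_flow layout: (M, clip_len * 2, H, W)
batch = np.empty((1, clip_len * 2, 4, 4))
batch[:, 0::2] = 100.0  # every x channel equals the x mean
batch[:, 1::2] = 170.0  # every y channel is one y-std above the y mean

# Per-channel normalization, broadcasting mean/std over H and W
normed = (batch - mean.reshape(-1, 1, 1)) / std.reshape(-1, 1, 1)
print(normed[0, :, 0, 0])  # x channels -> 0.0, y channels -> 1.0
```

With equal statistics such as [128, 128], the ordering does not matter, but the length of mean/std must still equal clip_len * 2.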

@Dai-Wenxun
Collaborator

Since NCHW_flow is not yet defined in ActionDataPreprocessor, could you help us implement it in mmaction2? Its behavior should be equivalent to that of NCHW.
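One way to read "equivalent to NCHW": the preprocessor mainly needs to know how to reshape mean/std for broadcasting, and NCHW_flow can reuse the NCHW branch. The sketch below is a hypothetical stand-alone NumPy function, not ActionDataPreprocessor's actual code; the NCTHW branch reshaping to (-1, 1, 1, 1) is an assumption about how a channel-first 5-D input would be handled.

```python
import numpy as np


def normalize(batch, mean, std, format_shape):
    """Hypothetical per-channel normalization keyed on format_shape."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if format_shape in ('NCHW', 'NCHW_flow'):
        # Channel axis is followed by H, W
        view = (-1, 1, 1)
    elif format_shape == 'NCTHW':
        # Channel axis is followed by T, H, W
        view = (-1, 1, 1, 1)
    else:
        raise ValueError(f'Unsupported format_shape: {format_shape}')
    return (batch - mean.reshape(view)) / std.reshape(view)


flow = np.full((2, 10, 4, 4), 256.0, dtype=np.float32)  # (N, T * 2, H, W)
out = normalize(flow, [128, 128] * 5, [128, 128] * 5, 'NCHW_flow')
print(out.shape, out.max())  # (2, 10, 4, 4) 1.0
```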

@Dai-Wenxun
Collaborator

Of course, if you have any better ideas for processing optical flow, feel free to let me know. Thank you, bro!

@makecent
Contributor Author

@Dai-Wenxun I am a little confused about NCHW_Flow: why does it not contain a T dimension? In my understanding, optical flow should use the same format as RGB frames, i.e. NCTHW, with C=3 for RGB and C=2 for flow.

As for a better idea, I think my PR #2631 is simple and effective. To work with optical flow, we can simply set FormatShape to NCTHW, as for RGB, and for normalization use a two-element mean=[x, x] and std=[x, x]. I have tested my PR with optical flow inputs and it works.
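Under the NCTHW alternative described above, the channel axis holds only the two flow components, so a two-element mean/std works for any clip_len. A minimal NumPy sketch, assuming the flow frames have already been stacked to H x W x 2 as #2631 does, and using illustrative statistics and pixel values:

```python
import numpy as np

num_clips, clip_len, h, w = 3, 5, 4, 4
mean = np.array([128.0, 128.0])  # one entry per flow component (x, y)
std = np.array([128.0, 128.0])

# Stacked flow, one H x W x 2 frame per time step: (N_clips * T, H, W, 2)
flow = np.full((num_clips * clip_len, h, w, 2), 192.0)

# (N_crops, N_clips, T, H, W, C) -> (N_crops, N_clips, C, T, H, W)
imgs = flow.reshape((-1, num_clips, clip_len) + flow.shape[1:])
imgs = np.transpose(imgs, (0, 1, 5, 2, 3, 4))
# Merge crops and clips into the batch axis: (M, C, T, H, W)
imgs = imgs.reshape((-1,) + imgs.shape[2:])

# Channel-wise normalization; the channel axis is followed by T, H, W
normed = (imgs - mean.reshape(-1, 1, 1, 1)) / std.reshape(-1, 1, 1, 1)
print(normed.shape)  # (3, 2, 5, 4, 4)
```

The trade-off versus NCHW_Flow is that T stays a separate axis, which suits 3D backbones, whereas folding T into channels suits 2D recognizers.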

Development

No branches or pull requests

2 participants