# 3D Motion and Its 2D Projection {#sec-3D_motion_and_its_2D_projection}
## Introduction
As objects move in the world, or as the camera moves, the projection of
the dynamic scene onto the two-dimensional (2D) camera plane produces a
sequence of images with temporally varying pixel brightness. Before
diving into how to estimate motion from pixels, it is useful to
understand the image formation process. Studying how three-dimensional
(3D) motion projects into the camera will allow us to understand the
difference between a moving camera and a moving object, and what types
of constraints one might use to estimate motion.
## 3D Motion and Its 2D Projection {#d-motion-and-its-2d-projection}
A 3D point follows a trajectory $\mathbf{P}(t) = (X(t),Y(t),Z(t))$
in camera coordinates (@fig-optical_flow_basic_motion_point). As the
point moves, it has an instantaneous 3D velocity
$\dot{\mathbf{P}} = (\dot{X}(t), \dot{Y}(t), \dot{Z}(t))$. The
projection of this point onto the image plane is
$\mathbf{p}(t) = (x(t),y(t))$, and it moves with the 2D
instantaneous velocity $\dot{\mathbf{p}} = (\dot{x}(t), \dot{y}(t))$,
where all derivatives are taken with respect to time, $t$.
![A 3D point, $\mathbf{P}$, moving in the world projects to a moving 2D point, $\mathbf{p}$, in the camera plane.](figures/optical_flow/basic_motion_point.png){width="50%" #fig-optical_flow_basic_motion_point}
Using the equations of perspective projection, $x=f X/Z$ and $y=f Y/Z$
(assuming that the camera is at the origin of the world-coordinate
system), we can derive how the instantaneous velocity in the camera
plane relates to the motion of the point in 3D:
$$\begin{aligned}
\dot{x} &= f \frac{\dot{X} Z - \dot{Z} X}{Z^2} = \frac{f \dot{X} - \dot{Z} x}{Z}\\
\dot{y} &= f \frac{\dot{Y} Z - \dot{Z} Y}{Z^2} = \frac{f \dot{Y} - \dot{Z} y}{Z}
\end{aligned}$$
The second expression is obtained by substituting $x=f \, X/Z$ and
$y=f \, Y/Z$, which removes the dependency on the world coordinates
$X$ and $Y$; only the depth $Z$ remains. Here $f$ is the focal length.
Many derivations simplify the notation by setting $f=1$; we keep it to
make explicit which factors depend on the camera parameters and which
do not. We can write the last two equations in matrix form as:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= \frac{1}{Z}
\begin{bmatrix}
f & 0 & -x \\
0 & f & -y
\end{bmatrix}
\begin{bmatrix}
\dot{X} \\
\dot{Y} \\
\dot{Z}
\end{bmatrix}
\end{aligned}$${#eq-motionprojection}
This expression reveals a number of interesting
properties of the optical flow and how it relates to motion in the
world. For instance, points that move parallel to the camera plane
($\dot{Z} = 0$) project to an image motion parallel to their 3D motion,
with a magnitude inversely proportional to the distance $Z$:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= \frac{f}{Z}
\begin{bmatrix}
\dot{X} \\
\dot{Y}
\end{bmatrix}
\end{aligned}$${#eq-parallelmotion}
For points moving parallel to the $Z$ axis ($\dot{X} = \dot{Y} = 0$) we
get:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= -\frac{\dot{Z}}{Z}
\begin{bmatrix}
x \\
y
\end{bmatrix}
\end{aligned}$${#eq-forward_objects}
Therefore, points at the same distance $Z$ from the camera moving away
from or toward the camera with the same velocity ($\dot{Z} \neq 0$ and
$\dot{X} = \dot{Y} = 0$) project to points moving at different
velocities on the image plane: each image point moves radially, with a
speed proportional to its distance from the image center.
@fig-examples_3d_moving_points illustrates the geometry of the
projection of 3D motion into the camera for points moving parallel to
the camera plane (@fig-examples_3d_moving_points\[left\]) and parallel
to the optical axis of the camera
(@fig-examples_3d_moving_points\[right\]). The arrows on the image plane
show the projected velocities.
![(left) Geometry of the projection of 3D motion into the camera for points moving parallel to the camera plane, and (right) parallel to the optical axis of the camera.](figures/optical_flow/examples_3d_moving_points.png){width="100%" #fig-examples_3d_moving_points}
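As a sanity check on the matrix form in equation (@eq-motionprojection), the sketch below compares it with a central finite-difference derivative of the projected trajectory, and then reproduces the behavior of equation (@eq-forward_objects) for two points at the same depth moving straight toward the camera. It assumes NumPy; `project` and `image_velocity` are hypothetical helper names, and all numeric values are illustrative.

```python
import numpy as np

def project(P, f):
    """Perspective projection x = f X / Z, y = f Y / Z (camera at the origin)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def image_velocity(P, P_dot, f):
    """Image velocity from the 3D velocity, using the 2x3 matrix form above."""
    x, y = project(P, f)
    M = np.array([[f, 0.0, -x],
                  [0.0, f, -y]])
    return M @ P_dot / P[2]

# Illustrative values (not taken from the text).
f = 1.5
P = np.array([0.5, -0.2, 4.0])
P_dot = np.array([0.3, 0.1, -0.8])

# Central finite difference of the projection of P(t) = P + P_dot * t at t = 0.
dt = 1e-6
fd = (project(P + P_dot * dt, f) - project(P - P_dot * dt, f)) / (2 * dt)
analytic = image_velocity(P, P_dot, f)
print(np.allclose(analytic, fd, atol=1e-6))  # True

# Two points at the same depth moving straight toward the camera:
# the one on the optical axis stays still in the image, the off-axis one moves outward.
toward = np.array([0.0, 0.0, -1.0])
v_center = image_velocity(np.array([0.0, 0.0, 4.0]), toward, f)
v_edge = image_velocity(np.array([2.0, 0.0, 4.0]), toward, f)
print(v_center, v_edge)
```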
Let's examine a few scenarios in a bit more detail to gain some
familiarity with the relationship between 3D motion and the projected 2D
motion field.
### Vanishing Point
Let's consider a point moving in a straight line in 3D with constant
velocity: $\dot{\mathbf{P}} = (V_X, V_Y, V_Z)^\mathsf{T}$. If the point
is at $(X, Y, Z)^\mathsf{T}$ at time $t=0$, then at each time instant,
$t$, its location will be
$\mathbf{P}(t) = (X+V_Xt, Y+V_Yt, Z+V_Zt)^\mathsf{T}$, and its 2D
projection will be:
$$\begin{aligned}
x(t) &= f \frac{X + V_X t}{Z + V_Z t}\\
y(t) &= f \frac{Y + V_Y t}{Z + V_Z t}
\end{aligned}$$
If $\dot{Z}=0$, then the projected point will move with
constant velocity over time, as shown in equation (@eq-parallelmotion).
If $\dot{Z} \neq 0$ and the point moves away from the camera
($V_Z > 0$), then, as time goes to infinity, its projection will
converge to a **vanishing point**:
$$\begin{aligned}
\lim_{t \to \infty} x(t) &= f \frac{V_X}{V_Z} = x_{\infty}\\
\lim_{t \to \infty} y(t) &= f \frac{V_Y}{V_Z} = y_{\infty}
\end{aligned}$${#eq-motionvanishingpoint}
The following sketch (@fig-flying_bird) shows Gibson's bird (see
@fig-gibson_bird) flying away from the camera along a straight line. In
the image, the bird gets smaller as it flies away until it disappears at
the vanishing point. In this drawing, the vanishing point is within the
field of view of the camera.
![Projection onto the camera plane of the sequence produced by a bird flying away. The bird will vanish at the **vanishing point**.](figures/optical_flow/flying_bird.png){width="100%" #fig-flying_bird}
The vanishing point is the location
$\mathbf{p}_{\infty}=(x_{\infty}, y_{\infty})^\mathsf{T}$ toward which
the moving point converges. Its location is independent of the position
of the point at time $t=0$, and it depends only on the direction of the
3D velocity vector, $\mathbf{V} = (V_X, V_Y, V_Z)^\mathsf{T}$: scaling
$\mathbf{V}$ does not change the ratios $V_X/V_Z$ and $V_Y/V_Z$.
Therefore, if the scene contains multiple points at different locations
moving with the same velocity, they will converge to the same vanishing
point.
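The sketch below, assuming NumPy and arbitrary starting positions, projects three points that share the same 3D velocity and prints how far their projections are from the vanishing point predicted by equation (@eq-motionvanishingpoint) as time grows. It also checks that scaling the velocity leaves the vanishing point unchanged.

```python
import numpy as np

def project(P, f):
    """Project an (N, 3) array of camera-frame points to (N, 2) image points."""
    return f * P[:, :2] / P[:, 2:3]

f = 1.0
V = np.array([0.5, 0.25, 2.0])          # shared 3D velocity; V_Z > 0, so points recede
P0 = np.array([[1.0, 2.0, 3.0],         # illustrative starting positions at t = 0
               [-2.0, 0.5, 5.0],
               [0.0, -1.0, 2.0]])

p_inf = f * V[:2] / V[2]                # predicted vanishing point (f V_X / V_Z, f V_Y / V_Z)

for t in [0.0, 10.0, 1000.0, 1e6]:
    p = project(P0 + V * t, f)
    # Largest coordinate difference between any projected point and the vanishing point.
    print(t, np.abs(p - p_inf).max())

# Scaling the velocity leaves the vanishing point unchanged: only its direction matters.
V_fast = 3.0 * V
print(np.allclose(p_inf, f * V_fast[:2] / V_fast[2]))  # True
```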
### Camera Translation
Let's now assume that the scene is static and that only the camera is
moving. In this case, all the observed motion in the image is due to
the motion of the camera.
Let's assume the camera is moving in a straight line with a velocity
$\dot{\mathbf{T}} = \mathbf{V} = (V_X,V_Y,V_Z)^\mathsf{T}$. The
translation of the camera after time $t$ will be
$\mathbf{T} = \mathbf{V} t$. A point in space,
$\mathbf{P} = (X,Y,Z)^\mathsf{T}$, will move relative to the camera with
velocity $\dot{\mathbf{P}} = -\mathbf{V}$. A moving camera is therefore
equivalent to a static camera observing a scene in which all points move
with the same velocity, $-\mathbf{V}$.
Using equation (@eq-motionprojection) with $\dot{\mathbf{P}} = -\mathbf{V}$, the 2D motion field is:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= \frac{1}{Z}
\begin{bmatrix}
-f & 0 & x \\
0 & -f & y
\end{bmatrix}
\begin{bmatrix}
V_X\\
V_Y \\
V_Z
\end{bmatrix}
\end{aligned}$${#eq-2d_motion_field_equation}
We can also express the same relationship by making the contribution of
the image coordinates, $(x, y)$, more explicit. Rearranging the terms in
equation (@eq-2d_motion_field_equation) results in:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= -\frac{f}{Z}
\begin{bmatrix}
V_X\\
V_Y
\end{bmatrix}
+
\frac{V_Z}{Z}
\begin{bmatrix}
x\\
y
\end{bmatrix}
\end{aligned}$$
Equation (@eq-2d_motion_field_equation) gives a generic expression for
the observed motion in the camera plane produced by a moving camera
undergoing a translation (we will see later what happens if we also have
camera rotation). But let's first look at a few specific scenarios with
a camera following simple translation trajectories (and no rotations).
#### Lateral camera motion
Consider a camera translating laterally, as shown in
@fig-camera_lateral_translation. This will happen if you are looking
through a side window of a car at the scene passing by. In this case,
the forward velocity is zero, $V_Z=0$.
![Lateral camera motion parallel to the camera plane.](figures/optical_flow/camera_lateral_translation.png){width="100%" #fig-camera_lateral_translation}
Under lateral camera motion, using equation (@eq-parallelmotion) with
$\dot{\mathbf{P}} = -\mathbf{V}$, we have the following relationship
between the camera velocity and the apparent velocity of the projection
of a static 3D point in the image plane:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
-\frac{f}{Z}
\begin{bmatrix}
V_X \\
V_Y
\end{bmatrix}
\end{aligned}$$
The motion field depends on the depth $Z$ at each location, as
illustrated in @fig-camera_lateral_translation_flow. Objects close to
the camera appear to move faster than objects farther away, and objects
that are very far appear not to move at all. This parallax effect is the
same one used in stereo vision to recover depth. The 2D motion in the
image plane is in the direction opposite to the camera motion.
![Sketch of the motion field under lateral camera motion. Objects close to the camera will appear to be moving faster than objects farther away. Objects that are very far (like the cloud) will appear to be nearly stationary.](figures/optical_flow/camera_lateral_translation_flow.png){width="100%" #fig-camera_lateral_translation_flow}
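A minimal numerical illustration of this parallax, assuming NumPy and illustrative values for $f$, $V_X$, and the depths: the image speed falls off as $1/Z$. The last step assumes the camera speed $V_X$ is known (the text does not discuss where it would come from), in which case depth can be read off from the image speed.

```python
import numpy as np

f = 1.0
V_X = 2.0                                     # illustrative lateral camera speed (V_Y = V_Z = 0)
depths = np.array([2.0, 10.0, 100.0, 1e4])    # e.g. a nearby tree ... a distant cloud

# Horizontal image velocity of a static point at each depth: x_dot = -f V_X / Z.
x_dot = -f * V_X / depths
for Z, u in zip(depths, x_dot):
    print(f"Z = {Z:>8.0f}   x_dot = {u:+.4f}")

# Assumption: if f and V_X are known, each depth follows from its image speed.
Z_est = -f * V_X / x_dot
print(np.allclose(Z_est, depths))             # True
```

The sign of `x_dot` is negative for every depth, matching the observation that image motion is opposite to the camera motion.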
#### Camera forward motion and focus of expansion
For a camera moving forward, as illustrated in
@fig-camera_forward_translation, note that $V_X = V_Y = 0$. In this
case, the motion is only along the $Z$-axis, $V_Z \neq 0$.
![Camera moving forward, along the camera axis.](figures/optical_flow/camera_forward_translation.png){width="100%" #fig-camera_forward_translation}
Using equation (@eq-forward_objects), we get:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
\frac{V_Z}{Z}
\begin{bmatrix}
x \\
y
\end{bmatrix}
\end{aligned}$${#eq-motion_projection_focus_expansion}
Equation (@eq-motion_projection_focus_expansion) provides a few
interesting insights. First, the rate of expansion does not depend on
the focal length $f$. Second, the observed motion only depends on the
ratio $V_Z/Z$, which is the inverse of the **time to contact**. The time
to contact, $Z/V_Z$, is the time it will take the camera to reach an
object located at distance $Z$ when moving at velocity $V_Z$.
For a camera moving in an arbitrary direction, that is, with
$V_X \neq 0$, $V_Y \neq 0$, and $V_Z \neq 0$, using equation
(@eq-motionprojection) and equation (@eq-motionvanishingpoint) we get:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
\frac{V_Z}{Z}
\begin{bmatrix}
x - x_{\infty}\\
y - y_{\infty}
\end{bmatrix}
\end{aligned}$$
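The observed motion is zero at the **focus of expansion**,
$(x_{\infty}, y_{\infty}) = (f V_X/V_Z, f V_Y/V_Z)$, and at every other
location the motion vector lies along the line joining $(x, y)$ and the
focus of expansion.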
@fig-camera_forward_translation_flow illustrates the apparent motion
field for a camera moving toward the center of a wall. Points near the
center (which will be the point of impact) appear stationary, while
points in the periphery appear to move faster and away from the center.
The whole wall expands over time.
![Sketch of the motion field when the camera approaches a planar surface. The motion field indicates the rate of expansion of the image, and it is a function of the time to contact. In this example, the focus of expansion is on the center.](figures/optical_flow/camera_forward_translation_flow.png){width="100%" #fig-camera_forward_translation_flow}
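Because the motion field for arbitrary camera translation is proportional to $(x - x_{\infty}, y - y_{\infty})$, the focus of expansion can be estimated from a sampled flow field without knowing the depths. The sketch below uses a plain least-squares intersection of the flow lines; this is one simple choice, not necessarily the method used later in the book. It assumes NumPy, noise-free synthetic flow, an illustrative camera velocity, and random depths.

```python
import numpy as np

rng = np.random.default_rng(0)
f = 1.0
V = np.array([0.3, -0.1, 1.0])              # illustrative camera velocity (V_Z != 0)
n = 200
x = rng.uniform(-1.0, 1.0, n)               # image sample locations
y = rng.uniform(-1.0, 1.0, n)
Z = rng.uniform(2.0, 20.0, n)               # depths, unknown to the estimator

# Translational motion field: (1/Z) [[-f, 0, x], [0, -f, y]] V.
u = (-f * V[0] + x * V[2]) / Z
v = (-f * V[1] + y * V[2]) / Z

# Each flow vector (u, v) is parallel to (x - x0, y - y0), so
# u (y - y0) - v (x - x0) = 0, which is linear in the unknown (x0, y0):
#   v x0 - u y0 = v x - u y.
A = np.stack([v, -u], axis=1)
b = v * x - u * y
foe, *_ = np.linalg.lstsq(A, b, rcond=None)

print(foe)                      # estimated focus of expansion
print(f * V[:2] / V[2])         # predicted (f V_X / V_Z, f V_Y / V_Z)
```

With noisy flow, the same linear system still applies, but the least-squares solution would only approximate the focus of expansion.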
### Camera Rotation
Let's consider a general camera motion that includes both translation
and rotation. To compute the motion field with a compact expression, we
will make a number of simplifications that assume a small motion between
consecutive frames. After a small time interval, $\Delta t$, the camera
moves, and a static point, expressed in the camera coordinate system,
becomes:
$$\mathbf{P}_{t+\Delta t} = - \mathbf{T}_{\Delta t} + \mathbf{R}_{\Delta t} \mathbf{P}_t
$$
where $\mathbf{T}_{\Delta t}$ is the camera translation and
$\mathbf{R}_{\Delta t}$ is the camera rotation that took place over that
time interval, $\Delta t$. Writing the camera velocity as
$\mathbf{V} = \mathbf{T}_{\Delta t}/\Delta t$, the velocity of a 3D
point with respect to the camera will be:
$$
\dot{\mathbf{P}} = \frac{\mathbf{P}_{t+\Delta t} - \mathbf{P}_{t}}{\Delta t} = - \mathbf{V} + \frac{\mathbf{R}_{\Delta t} -\mathbf{I}}{\Delta t} \mathbf{P}_t
$${#eq-general_camera_motion_equation_full}
To parametrize the rotation, we use the **Euler angles**
(@fig-yaw_pitch_roll) and decompose the rotation into rotations about
the three axes (yaw, pitch, roll):
![Rotation expressed by Euler angles (yaw, pitch, roll).](figures/optical_flow/yaw_pitch_roll.png){width="40%" #fig-yaw_pitch_roll}
Each angle measures the rotation about one of the camera-coordinate
axes. Using this representation, the rotation matrix can be written
as:
$$\begin{aligned}
\mathbf{R}_{\Delta t} =
\begin{bmatrix}
\cos \theta_Z & \sin \theta_Z & 0\\
-\sin \theta_Z & \cos \theta_Z & 0\\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\cos \theta_Y & 0 & -\sin \theta_Y\\
0 & 1 & 0\\
\sin \theta_Y & 0 & \cos \theta_Y
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0\\
0 & \cos \theta_X & \sin \theta_X \\
0 & -\sin \theta_X & \cos \theta_X
\end{bmatrix}
\end{aligned}
$$
In this equation, the signs of the angles are chosen to
reflect that a rotation of the camera is equivalent to the opposite
rotation of the 3D point.
For a small $\Delta t$, the angles will be small, and we can approximate
the trigonometric functions by $\cos \alpha \approx 1$ and
$\sin \alpha \approx \alpha$. We can also neglect products such as
$\sin \alpha \sin \beta \approx 0$, because they are second-order terms:
$$\begin{aligned}
\mathbf{R}_{\Delta t} \approx
\begin{bmatrix}
1 & \theta_Z & 0\\
-\theta_Z & 1& 0\\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & -\theta_Y\\
0 & 1 & 0\\
\theta_Y & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0\\
0 & 1 & \theta_X \\
0 & -\theta_X & 1
\end{bmatrix}
\approx
\begin{bmatrix}
1 & \theta_Z & -\theta_Y\\
-\theta_Z & 1 & \theta_X \\
\theta_Y & -\theta_X & 1
\end{bmatrix}
\end{aligned}$$
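A quick numerical check of the small-angle approximation: the sketch below builds $\mathbf{R}_{\Delta t}$ from the three Euler-angle matrices above, compares it with its first-order form, and also compares $(\mathbf{R}_{\Delta t} - \mathbf{I})\mathbf{P}_t$ with $-\boldsymbol{\theta} \times \mathbf{P}_t$, where $\boldsymbol{\theta}=(\theta_X,\theta_Y,\theta_Z)$. It assumes NumPy; the mix of angles and the point are illustrative values. Because the neglected terms are second order, the error should shrink roughly 100 times for every 10 times smaller angle.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def R_exact(tx, ty, tz):
    """Product of the three Euler-angle rotations, in the order used in the text."""
    return rot_z(tz) @ rot_y(ty) @ rot_x(tx)

def R_first_order(tx, ty, tz):
    """First-order (small-angle) approximation of R_exact."""
    return np.array([[1, tz, -ty], [-tz, 1, tx], [ty, -tx, 1]])

direction = np.array([0.3, -0.5, 0.8])   # illustrative mix of the three angles
errors = []
for scale in [1e-1, 1e-2, 1e-3]:
    theta = scale * direction
    err = np.abs(R_exact(*theta) - R_first_order(*theta)).max()
    errors.append(err)
    print(scale, err)

# Second-order residual: a 10x smaller angle gives roughly a 100x smaller error.
print(errors[1] / errors[2])

# (R - I) P is approximately -theta x P for small angles.
P = np.array([0.4, -1.0, 3.0])
theta = 1e-3 * direction
lhs = (R_exact(*theta) - np.eye(3)) @ P
rhs = -np.cross(theta, P)
print(np.abs(lhs - rhs).max())
```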
$$\mathbf{P}_{t+\Delta t} - \mathbf{P}_{t} = - \mathbf{T}_{\Delta t} + (\mathbf{R}_{\Delta t} - \mathbf{I}) \mathbf{P}_t =
- \mathbf{T}_{\Delta t} -
\begin{bmatrix}
0 & -\theta_Z & \theta_Y\\
\theta_Z & 0 & -\theta_X \\
-\theta_Y & \theta_X & 0
\end{bmatrix}
\mathbf{P}_t$$
The last term corresponds to the cross product in matrix
form (note that we changed the sign of the matrix to make the cross
product form more obvious). Therefore, we can rewrite the previous
expression as:
$$\mathbf{P}_{t+\Delta t} - \mathbf{P}_{t} =
- \mathbf{T}_{\Delta t} - \boldsymbol{\theta} \times \mathbf{P}_t
$$
where
$\boldsymbol{\theta}=(\theta_X,\theta_Y, \theta_Z)$. Substituting this
expression into equation (@eq-general_camera_motion_equation_full), we
get the expression of the motion of a 3D point:
$$\dot{\mathbf{P}} = - \mathbf{V} - \mathbf{W} \times \mathbf{P}_t =
-\begin{bmatrix}
V_X\\
V_Y \\
V_Z
\end{bmatrix}
-
\begin{bmatrix}
-W_Z Y + W_Y Z\\
W_Z X - W_X Z \\
-W_Y X + W_X Y
\end{bmatrix}$$
where $\mathbf{W} = \boldsymbol{\theta}/\Delta t = (W_X,W_Y,W_Z)$ is the
angular velocity. Now we are ready to compute the 2D motion field by
using equation (@eq-motionprojection):
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= -\frac{1}{Z}
\begin{bmatrix}
f & 0 & -x \\
0 & f & -y
\end{bmatrix}
\begin{bmatrix}
V_X\\
V_Y \\
V_Z
\end{bmatrix}
-\frac{1}{Z}
\begin{bmatrix}
f & 0 & -x \\
0 & f & -y
\end{bmatrix}
\begin{bmatrix}
-W_Z Y + W_Y Z\\
W_Z X - W_X Z \\
-W_Y X + W_X Y
\end{bmatrix}
\end{aligned}$$
Using $x=f \, X/Z$ and $y=f \, Y/Z$, this expression can be rewritten
as:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= \frac{1}{Z}
\begin{bmatrix}
-f & 0 & x \\
0 & -f & y
\end{bmatrix}
\begin{bmatrix}
V_X\\
V_Y \\
V_Z
\end{bmatrix}
+
\frac{1}{f}
\begin{bmatrix}
xy & -(f^2+x^2) & f y \\
f^2+y^2 & -xy & -f x
\end{bmatrix}
\begin{bmatrix}
W_X\\
W_Y \\
W_Z
\end{bmatrix}
\end{aligned}$${#eq-2d_motion_field_from_translation_and_rotation}
This is the expression we were looking for. It relates
the 2D motion field to the camera translational velocity, $\mathbf{V}$,
and angular velocity, $\mathbf{W}$. The matrices are only a function of
the intrinsic camera parameters (focal length, $f$) and the image
coordinates, $(x, y)$; the depth $Z$ enters only through the
translational term. Note that this expression is only valid for small
displacements.
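The sketch below checks equation (@eq-2d_motion_field_from_translation_and_rotation) numerically: it moves a point with the 3D velocity $-\mathbf{V} - \mathbf{W} \times \mathbf{P}$, differentiates its projection with a central finite difference, and compares the result with the closed-form motion field. It assumes NumPy; `motion_field` is a hypothetical helper name, and all numeric values are illustrative.

```python
import numpy as np

def motion_field(x, y, Z, f, V, W):
    """Closed-form 2D motion field for camera translation V and angular velocity W."""
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]]) / Z
    B = np.array([[x * y, -(f**2 + x**2), f * y],
                  [f**2 + y**2, -x * y, -f * x]]) / f
    return A @ V + B @ W

f = 1.2
P = np.array([0.7, -0.4, 5.0])       # illustrative static point, camera coordinates
V = np.array([0.2, 0.1, 0.5])        # illustrative camera translational velocity
W = np.array([0.03, -0.02, 0.05])    # illustrative camera angular velocity

def project(P):
    return f * P[:2] / P[2]

# Velocity of the point relative to the camera, as derived above: -V - W x P.
P_dot = -V - np.cross(W, P)

# Central finite difference of the projection along that velocity.
dt = 1e-6
fd = (project(P + P_dot * dt) - project(P - P_dot * dt)) / (2 * dt)

x, y = project(P)
closed_form = motion_field(x, y, P[2], f, V, W)
print(np.allclose(closed_form, fd, atol=1e-6))  # True
```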
In the previous section we saw what happens when there is no rotation;
now we can focus on the case when there is only camera rotation, that is
$V_X=V_Y=V_Z=0$,
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
\frac{1}{f}
\begin{bmatrix}
xy & -(f^2+x^2) & f y \\
f^2+y^2 & -xy & -f x
\end{bmatrix}
\begin{bmatrix}
W_X\\
W_Y \\
W_Z
\end{bmatrix}
\end{aligned}$$
The first thing to notice is that the 2D motion field does not depend on
the 3D scene structure, $Z$: it is only a function of the rotational
velocity and the camera parameters. Therefore, under pure camera
rotation we cannot learn anything about the scene by observing the
motion field; the 2D motion field only informs us about the motion
of the camera.
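To see the depth independence concretely, the sketch below places two static points on the same viewing ray, one ten times deeper than the other, and compares their finite-difference image velocities under a pure rotation and under a pure translation. It assumes NumPy; the velocities and the point are illustrative values, and `image_flow` is a hypothetical helper.

```python
import numpy as np

f = 1.0
W = np.array([0.02, -0.04, 0.01])      # illustrative angular velocity
V = np.array([0.3, 0.0, 0.2])          # illustrative translation, for contrast

def project(P):
    return f * P[:2] / P[2]

def image_flow(P, V, W, dt=1e-6):
    """Finite-difference image velocity of a static point seen by a moving camera."""
    P_dot = -V - np.cross(W, P)
    return (project(P + P_dot * dt) - project(P - P_dot * dt)) / (2 * dt)

near = np.array([0.4, 0.3, 2.0])
far = 10.0 * near                       # same viewing ray, ten times deeper

rot_near = image_flow(near, np.zeros(3), W)
rot_far = image_flow(far, np.zeros(3), W)
trans_near = image_flow(near, V, np.zeros(3))
trans_far = image_flow(far, V, np.zeros(3))

print(np.allclose(rot_near, rot_far, atol=1e-6))      # True: rotation ignores depth
print(np.allclose(trans_near, trans_far, atol=1e-6))  # False: translation depends on depth
```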
Let's first consider rotation about the camera optical axis, that is,
$W_X=W_Y=0$. In this case the motion field is:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
\begin{bmatrix}
y \\
-x
\end{bmatrix}
W_Z
\end{aligned}$$
At each location $(x,y)$, the 2D motion vector is orthogonal to the
vector that connects that point with the origin (@fig-motion_wz), and
its magnitude, $|W_Z|\sqrt{x^2+y^2}$, grows with the distance to the
origin. The motion field does not depend on the focal length, $f$.
![Motion field for a rotating camera around the optical axis, $W_X=W_Y=0$.](figures/optical_flow/motion_wz.png){width="40%" #fig-motion_wz}
If $W_Z=0$, rotations about the $X$ or $Y$ axes produce similar 2D
motion fields, so let's consider $W_X=0$. We have:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
=
-\frac{1}{f}
\begin{bmatrix}
f^2+x^2 \\
xy
\end{bmatrix}
W_Y
\end{aligned}$$
In this case, the focal length $f$ has a strong
effect on the appearance of the motion field. For very large $f$
(relative to the image coordinates), we can approximate the 2D motion
field by $\dot{x} \approx -f W_Y$ and $\dot{y} \approx 0$. The resulting
motion field is approximately constant across the entire image and looks
like a lateral translation. For very small $f$, the quadratic terms
dominate away from the image center, and the motion field becomes
similar to the one produced by a homography and very different from
that of a lateral camera motion. The flows shown in
@fig-motion_wy_f03_f1_f3 correspond to $f=1/3$, $f=1$, and $f=3$.
![Motion flows corresponding to a rotation around the
$Y$-axis. (left) $f = 1/3$. (middle) $f = 1$. (right) $f = 3$.](figures/optical_flow/camera_motion_field_rotation_three_f.png){width="100%" #fig-motion_wy_f03_f1_f3}
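The effect of $f$ can be quantified by comparing the flow at the image center with the flow at an image corner. The sketch below assumes NumPy and, for illustration only, an image spanning $[-1, 1]$ in both coordinates; `rotation_y_field` is a hypothetical helper implementing the $W_Y$ field above. A ratio close to 1 and a small tilt indicate a nearly uniform, translation-like field.

```python
import numpy as np

def rotation_y_field(x, y, f, W_Y=1.0):
    """Motion field for a pure rotation about the camera Y-axis (W_X = W_Z = 0)."""
    return np.array([-(f**2 + x**2) / f, -x * y / f]) * W_Y

# Assumption: the image spans [-1, 1] in both coordinates, so (1, 1) is a corner.
ratios = {}
for f in [1/3, 1.0, 3.0]:
    center = rotation_y_field(0.0, 0.0, f)   # equals (-f W_Y, 0)
    corner = rotation_y_field(1.0, 1.0, f)
    ratios[f] = np.linalg.norm(corner) / np.linalg.norm(center)
    tilt = np.degrees(np.arctan2(abs(corner[1]), abs(corner[0])))
    print(f"f = {f:.2f}: |corner| / |center| = {ratios[f]:.2f}, corner tilt = {tilt:.1f} deg")
```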
Camera rotation around the $Y$-axis does not reveal the scene
structure, but it does reveal the camera parameters, since the shape of
the motion field depends on $f$. The magnitude of $W_Y$ only scales the
motion vectors; it does not change their orientation.
In the case of camera rotation with a large angle (or a large
$\Delta t$), the small-motion approximation no longer holds, but the
relationship between the image at time $t$ and the image at time
$t+\Delta t$ is still a homography.
### Motion under Varying Focal Length
Before moving on to motion estimation, let's consider one final
scenario: a static camera observing a static scene while the focal
length changes over time. What will the motion field be? Although
nothing moves in the scene, the changing focal length produces motion
in the image plane. If the focal length increases, it will seem as if we
are zooming into the scene. Would it be similar to a forward motion?
Starting from the perspective projection equation,
$$\begin{aligned}
\begin{bmatrix}
x \\
y
\end{bmatrix}
= \frac{f}{Z}
\begin{bmatrix}
X \\
Y
\end{bmatrix}
\end{aligned}$${#eq-pers_proj}
If we compute the temporal derivative when only $f$ varies over time,
we get:
$$\begin{aligned}
\begin{bmatrix}
\dot{x} \\
\dot{y}
\end{bmatrix}
= \frac{\dot{f}}{Z}
\begin{bmatrix}
X \\
Y
\end{bmatrix}
=
\frac{\dot{f}}{f}
\begin{bmatrix}
x \\
y
\end{bmatrix}
\end{aligned}$$
The last expression is obtained by using equation (@eq-pers_proj). As
the equation shows, changing the focal length only scales the projected
image about the image center, so it does not create any parallax. This
is the key difference from forward motion (equation
(@eq-motion_projection_focus_expansion)), whose rate of expansion,
$V_Z/Z$, changes with depth. As the sensor has a finite size, changing
the focal length results in a zoom and a crop. Because the motion field
does not depend on the 3D scene structure, images taken by a pinhole
camera from the same viewpoint but with different focal lengths do not
provide depth information about the scene. This is an example where
there is a 2D motion field even when there is no motion in the scene.
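The sketch below contrasts zoom with forward motion for two points on the same viewing ray at different depths. It assumes NumPy, finite-difference derivatives, and illustrative values for $f$, $\dot{f}$, and $V_Z$: zoom gives both points the same image velocity, $(\dot{f}/f)(x, y)$, while forward motion does not.

```python
import numpy as np

def project(P, f):
    return f * P[:2] / P[2]

f, f_dot = 1.0, 0.1              # illustrative focal length and its rate of change
V_Z = 0.5                        # illustrative forward camera speed, for comparison
near = np.array([0.4, -0.2, 2.0])
far = 5.0 * near                 # same image location, five times deeper
dt = 1e-6

def zoom_flow(P):
    """Image velocity of a static point when only the focal length changes."""
    return (project(P, f + f_dot * dt) - project(P, f - f_dot * dt)) / (2 * dt)

def forward_flow(P):
    """Image velocity of a static point seen by a camera moving forward at V_Z."""
    P_dot = np.array([0.0, 0.0, -V_Z])
    return (project(P + P_dot * dt, f) - project(P - P_dot * dt, f)) / (2 * dt)

print(np.allclose(zoom_flow(near), zoom_flow(far)))                   # True
print(np.allclose(forward_flow(near), forward_flow(far)))             # False
print(np.allclose(zoom_flow(near), (f_dot / f) * project(near, f)))   # True
```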
## Concluding Remarks
As we did in @sec-imaging, in this chapter we have focused on
formulating the problem of image formation: how does 3D motion in the
world appear once it is projected onto the image plane?
But the goal of vision is to invert this projection and recover the 3D
scene structure. In the upcoming chapters, we will continue on the path
begun in @sec-motion_estimation and study how to estimate motion from
pixels.