This code is based on the paper Optical Flow Estimation using a Spatial Pyramid Network.
[Unofficial PyTorch version] [Unofficial TensorFlow version]
- First things first: Setting up this code
- Easy Usage: Compute Optical Flow in 5 lines
- Fast Performance Usage: Compute Optical Flow at rocket speed
- Training: Train your own models using the spatial pyramid approach on multiple GPUs
- End2End SPyNet: An easily trainable end-to-end version of SPyNet
- Optical Flow Utilities: A set of Lua functions for working with optical flow
- References: For further reading
You need to have Torch installed.
Then install the other required packages:
cd extras/spybhwd
luarocks make
cd ../stnbhwd
luarocks make
spynet = require('spynet')
easyComputeFlow = spynet.easy_setup()
im1 = image.load('samples/00001_img1.ppm' )
im2 = image.load('samples/00001_img2.ppm' )
flow = easyComputeFlow(im1, im2)
To save your flow fields to a .flo file, use flowExtensions.writeFLO.
Set up SPyNet according to the image size and model. For optimal performance, resize your images so that both width and height are multiples of 32. You can also specify which model to use. The currently supported models are the fine-tuned models sintelFinal (default), sintelClean, and kittiFinal, and the base models chairsFinal and chairsClean.
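As a rough illustration of the resizing advice above, the following sketch rounds each image dimension to the nearest multiple of 32 and resizes with image.scale from the Torch image package. The helper name roundTo32 is hypothetical and not part of spynet, and rounding to the nearest multiple (rather than always up or down) is an assumption.

```lua
require 'image'

-- Hypothetical helper (not part of spynet): nearest multiple of 32, at least 32.
local function roundTo32(x)
  return math.max(32, math.floor(x / 32 + 0.5) * 32)
end

local im1 = image.load('samples/00001_img1.ppm')   -- 3xHxW tensor
local height, width = im1:size(2), im1:size(3)
local newWidth, newHeight = roundTo32(width), roundTo32(height)

-- image.scale takes the target width first, then the target height.
local im1Resized = image.scale(im1, newWidth, newHeight)
print(newWidth, newHeight)
```

Flow estimated on resized images is expressed in pixels of the resized grid, so it has to be rescaled accordingly before it is used at the original resolution.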
spynet = require('spynet')
computeFlow = spynet.setup(512, 384, 'sintelFinal') -- for 384x512 images
Now you can call computeFlow anytime to estimate optical flow between image pairs.
Load an image pair, stack the two images along the channel dimension, and normalize the result.
im1 = image.load('samples/00001_img1.ppm' )
im2 = image.load('samples/00001_img2.ppm' )
im = torch.cat(im1, im2, 1)
im = spynet.normalize(im)
SPyNet works with batches of data on CUDA, so add a batch dimension, move the tensor to the GPU, and compute the flow:
im = im:resize(1, im:size(1), im:size(2), im:size(3)):cuda()
flow = computeFlow(im)
You can also use batch mode. If your images im form a tensor of size Bx6xHxW, where B is the batch size and the 6 channels hold the stacked RGB image pair, you can call directly:
flow = computeFlow(im)
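A minimal sketch of assembling such a batch is shown below. It assumes 384x512 inputs to match the spynet.setup(512, 384, ...) call above. The pairPaths table is hypothetical, and the sample pair is simply repeated to make the batch larger than one.

```lua
require 'image'
local spynet = require('spynet')

-- Assumes 384x512 inputs, matching the setup call shown earlier.
local computeFlow = spynet.setup(512, 384, 'sintelFinal')

-- Hypothetical list of image pairs; the sample pair is repeated for illustration.
local pairPaths = {
  {'samples/00001_img1.ppm', 'samples/00001_img2.ppm'},
  {'samples/00001_img1.ppm', 'samples/00001_img2.ppm'},
}

-- Fill a Bx6xHxW tensor, one normalized image pair per batch entry.
local batch = torch.FloatTensor(#pairPaths, 6, 384, 512)
for i, p in ipairs(pairPaths) do
  local pair = torch.cat(image.load(p[1]), image.load(p[2]), 1)  -- 6xHxW
  batch[i]:copy(spynet.normalize(pair))
end

-- Move the whole batch to the GPU and estimate flow for all pairs at once.
local flow = computeFlow(batch:cuda())
print(flow:size())
```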
Training sequentially is faster than training end-to-end, since only a small number of parameters has to be learned at each level. To train level N, you need the trained models for levels 1 to N-1. The new model is also initialized from the pretrained model of level N-1.
For example, to train level 3, you need the trained models for L1 and L2, and the new model is initialized with modelL2_3.t7.
th main.lua -fineWidth 128 -fineHeight 96 -level 3 -netType volcon \
-cache checkpoint -data FLYING_CHAIRS_DIR \
-L1 models/modelL1_3.t7 -L2 models/modelL2_3.t7 \
-retrain models/modelL2_3.t7
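The -fineWidth and -fineHeight values in the command above are consistent with a pyramid in which each level doubles the resolution of the previous one, reaching 512x384 at level 5. Assuming that relationship and a five-level pyramid (both inferred from the example, not stated here), the resolution for each level can be tabulated as follows. levelResolution is a hypothetical helper.

```lua
-- Hypothetical helper: resolution at a pyramid level, assuming each level
-- doubles the previous one and the finest level (numLevels) is full size.
local function levelResolution(fullWidth, fullHeight, level, numLevels)
  local factor = 2 ^ (numLevels - level)
  return fullWidth / factor, fullHeight / factor
end

for level = 1, 5 do
  local w, h = levelResolution(512, 384, level, 5)
  print(string.format('level %d: -fineWidth %d -fineHeight %d', level, w, h))
end
```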
The end-to-end version of SPyNet is easily trainable and is available at anuragranj/end2end-spynet.
We provide flowExtensions.lua, which contains various functions that make working with optical flow in Torch/Lua easier. You can copy this file into your project directory and use it off the shelf.
flowX = require 'flowExtensions'
Given flow_x and flow_y of size MxN each, evaluate flow_magnitude of size MxN.
Given flow_x and flow_y of size MxN each, evaluate flow_angle of size MxN in degrees.
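The two quantities above can be sketched with plain Torch tensor operations. This is not the flowExtensions code, only an illustration of the per-pixel math. The angle convention used here (atan2 of y over x, range -180 to 180 degrees) is an assumption and may differ from what flowExtensions.lua returns.

```lua
require 'torch'

-- Toy 1x2 flow components; any pair of MxN tensors of equal size works.
local flow_x = torch.Tensor({{3, 0}})
local flow_y = torch.Tensor({{4, 1}})

-- Magnitude: per-pixel Euclidean norm of (flow_x, flow_y).
local flow_magnitude = torch.sqrt(torch.pow(flow_x, 2) + torch.pow(flow_y, 2))

-- Angle in degrees via atan2; sign and range convention are assumptions.
local flow_angle = torch.atan2(flow_y, flow_x):mul(180 / math.pi)

print(flow_magnitude)  -- magnitudes 5 and 1
print(flow_angle)      -- about 53.13 and 90 degrees
```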
Given flow_magnitude and flow_angle of size MxN each, return an image of size 3xMxN for visualizing optical flow. max (optional) specifies the maximum flow magnitude, and legend (optional) is a boolean that, when set, prints a legend on the image.
Given flow_x and flow_y of size MxN each, return an image of size 3xMxN for visualizing optical flow. max (optional) specifies the maximum flow magnitude.
Reads a .flo file. Loads the x and y components of optical flow into a 2-channel 2xMxN optical flow field. The first channel stores the x component and the second channel stores the y component.
Writes a 2xMxN flow field F, whose first and second channels hold the x and y components respectively, to filename, a .flo file.
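For reference, the Middlebury .flo layout is a float32 tag with value 202021.25, two int32 values for width and height, and then interleaved float32 x and y values in row-major order, all little-endian. The sketch below reads that layout with torch.DiskFile. readFloSketch is a hypothetical name and is not the flowExtensions implementation.

```lua
require 'torch'

-- Hypothetical reader for the Middlebury .flo layout; returns a 2xMxN tensor.
local function readFloSketch(filename)
  local f = torch.DiskFile(filename, 'r')
  f:binary()
  f:littleEndianEncoding()

  local tag = f:readFloat()
  assert(tag == 202021.25, 'unexpected .flo tag')
  local width = f:readInt()
  local height = f:readInt()

  -- The payload is height x width x 2 floats with x and y interleaved.
  local data = torch.FloatTensor(f:readFloat(width * height * 2))
  f:close()

  -- Reorder to channels-first: channel 1 is x, channel 2 is y.
  return data:view(height, width, 2):permute(3, 1, 2):contiguous()
end
```

Calling readFloSketch('some_file.flo') on a valid file should give a tensor with the same 2xMxN layout described above.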
Reads a .pfm file. Loads the x and y components of optical flow into a 2-channel 2xMxN optical flow field. The first channel stores the x component and the second channel stores the y component.
Rotates flow of size 2xMxN by angle in radians. Uses nearest-neighbor interpolation to avoid blurring at boundaries.
Scales flow of size 2xMxN by a factor of sc. opt (optional) specifies the interpolation method: simple (default), bilinear, or bicubic.
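Scaling a flow field usually involves two steps: resizing the field itself and multiplying the vectors by the same factor so that they remain in pixel units of the new grid. The sketch below shows this with image.scale. scaleFlowSketch is a hypothetical name, and whether flowExtensions.lua also rescales the vector values is an assumption here.

```lua
require 'image'

-- Hypothetical sketch (not the flowExtensions code): resize a 2xMxN flow field
-- by a factor sc and multiply the vectors by sc to keep pixel units consistent.
local function scaleFlowSketch(flow, sc, mode)
  local newHeight = math.floor(flow:size(2) * sc + 0.5)
  local newWidth = math.floor(flow:size(3) * sc + 0.5)
  -- mode is one of 'simple', 'bilinear' or 'bicubic' (image.scale modes).
  local scaled = image.scale(flow, newWidth, newHeight, mode or 'simple')
  return scaled:mul(sc)
end

local flow = torch.Tensor(2, 4, 6):fill(1)
local flowScaled = scaleFlowSketch(flow, 2)
print(flowScaled:size())  -- 2x8x12
```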
Scales flowBatch of size Bx2xMxN, a batch of B flow fields, by a factor of sc. Uses nearest-neighbor interpolation.
Our timing benchmark is set up on the Flying Chairs dataset. To run it, first download the dataset:
wget http://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs/FlyingChairs.zip
Then run the timing benchmark:
th timing_benchmark.lua -data YOUR_FLYING_CHAIRS_DATA_DIRECTORY
- Our warping code is based on qassemoquab/stnbhwd.
- The images in samples are from the Flying Chairs dataset: Dosovitskiy, Alexey, et al. "FlowNet: Learning optical flow with convolutional networks." 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015.
- Some parts of flowExtensions.lua are adapted from marcoscoffier/optical-flow with help from fguney.
- The unofficial PyTorch implementation is from sniklaus.
Free for non-commercial and scientific research purposes. For commercial use, please contact [email protected]. Check LICENSE file for details.
Ranjan, Anurag, and Michael J. Black. "Optical Flow Estimation using a Spatial Pyramid Network." arXiv preprint arXiv:1611.00850 (2016).