Skip to content

mihirk2309/Structure_From_Motion

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

Structure from Motion

Introduction:

The goal of this project is to estimate a three-dimensional structure from two-dimensional image sequences that are linked by changes in camera motion (orientation and translation). This is commonly referred to as Structure from motion. There are several algorithms that accomplish this. We hope to learn how to recreate 3D structures from a given dataset of 2D images in this project.

With a given set of 5 images from a monocular camera and their feature point correspondences, we reconstructed a 3D scene while also obtaining camera poses with respect to the scene.

The steps to perform SFM are as follows:-

  • Estimating Fundamental Matrix
  • Outlier Rejection
  • Estimating Essential Matrix
  • Estimating Camera Pose
  • Linear Triangulation
  • Unique Camera Pose by Cheirality condition
  • Non-Linear Triangulation
  • Linear PnP
  • PnP RANSAC
  • Non-Linear PnP
  • Visibility Matrix
  • Bundle Adjustment

Estimating Fundamental Matrix:

The fundamental matrix is a 3x3 matrix that relates the positions of corresponding points in two images of a scene taken from two different viewpoints. Specifically, if a point X in one image is matched with a point Y in the other image, the fundamental matrix will relate the homogeneous coordinates of these points by

Y' * F * X = 0

where F is the fundamental matrix, X and Y are the homogeneous coordinates of the points in one image, and Y' is the transpose of Y. The matrix F encodes the epipolar geometry between the two images, which can be used to compute epipolar lines and perform other tasks such as triangulation and camera pose estimation.

The fundamental matrix can be estimated from a set of point correspondences between the two images using techniques such as the eight-point algorithm or the normalized eight-point algorithm. These methods work by minimizing the geometric error between the points in the two images, subject to the constraint that the fundamental matrix has rank 2. The rank 2 constraint ensures that the matrix represents true epipolar geometry, rather than an arbitrary 3D projective transformation.

Outlier Rejection:

The outliers in the feature correspondences between 2 images have to be rejected. This is an important step to find the final fundamental matrix. We estimated the inlier-feature correspondences to use in the subsequent steps by estimating the Fundamental matrix and performing RANSAC with an epipolar constraint. Using the normalized 8-point algorithm, we computed the Fundamental matrix only for images 1 and 2. The 8-point algorithm involves using singular value decomposition to solve a linear solver matrix generated by stacking the kroenker product between 8-point correspondences.

Estimating Essential Matrix:

The Essential matrix is also a 3x3 matrix that relates the positions of corresponding points in two images of a scene taken from two different viewpoints.

Y' * E* X = 0 Here, X and Y are normalized image coordinates which was not the case with F.

The difference in a way can be said as, the essential matrix is a metric object pertaining to calibrated cameras, while the fundamental matrix describes the correspondence in more general and fundamental terms of projective geometry. This is captured mathematically by the relationship between a fundamental matrix F and its corresponding essential matrix E , which is:

E = (K)^T * F * K

and K being the intrinsic calibration matrix of the two images involved.

Estimating Camera Pose:

Given the rotation and translation, the Essential matrix E connects corresponding image points from both cameras. Decomposing E yields four mathematically possible poses which are the 4 combinations of the possible camera locations with respect to the 2 image planes.

Refining Camera Pose and Linear Triangulation:

To disambiguate 2 poses out of the 4 and get the 2 unique poses, we use the chierality condition. The Cheirality
Condition requires that the reconstructed points be visible to the cameras. We test this condition by triangulating the 3D
points using linear least squares to determine the sign of the depth Z in the camera coordinate system relative to the camera
center. The camera pose that gives the maximum number of positive depth points is selected.
We can perform Linear triangulation to obtain the 3D location of a world point given a point correspondence and the projection matrix estimated using the unique disambiguate camera pose (R, C) and intrinsics (K) parameters.

Non-Linear Triangulation:

Now, having obtained the 3D points from Linear triangulation from the selected pose, we can refine these 3D points by performing optimization. We try to minimize the reprojection error i.e. the geometric error is re-calculated by projecting the
points on the image plane. We use least squares optimization for this.

Linear PnP, PnP RANSAC, and Non-Linear PnP:

Given the intrinsic camera matrix K, 3D points, and their corresponding 2D points, we can compute the camera poses
which is termed the perspective-n-point problem. We solve the equations having 6 2D and 3D points to get an initially
estimated camera pose. PnP is prone to error as there are outliers in the given set of point correspondences. To over-
come this error, we can use RANSAC to make our camera pose more robust to outliers. Similarly to non-linear triangulation, we refine the output from LinearPnP in Non-linear PnP by minimizing the reprojection error.

Visibility Matrix and Bundle Adjustment:

We are one step closer to having the 3D reconstructed output of the scene now that we have the set of refined 3D points from various perspectives and camera poses. Bundle adjustment refines both camera poses and 3D points at the same time
by minimizing reprojection error. It simultaneously refines the 3D coordinates describing the scene geometry, relative motion
parameters, and optical characteristics of the cameras used to acquire the images, using an optimality criterion involving the
corresponding image projections of all points. The visibility matrix is a Boolean matrix that denotes if a particular feature is visible in a particular image. In bundle adjustment, we create a sparse matrix that records whether a 2D point observation belongs to a particular parameter.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages