Refactoring/Adding Documentation #813

Open
wants to merge 2 commits into master

Conversation


@travisdriver commented Oct 14, 2024

I'm trying to include some documentation on how we modularize the Structure-from-Motion problem to make it easier for users and contributors to get up to speed on GTSfM.

I tried to emulate Nerfstudio's documentation structure, but I'm open to suggestions.

Pages that need to be filled out:

@akshay-krishnan

It would be best to do this with multiple PRs, so we can first try to merge this and I will work on a multi-view optimizer page in parallel.

Global descriptor modules are implemented following the [`GlobalDescriptorBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/global_descriptor/global_descriptor_base.py) class and must be wrapped using a corresponding [`RetrieverBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/retriever_base.py) implementation. The global descriptor module takes in individual images and outputs their corresponding descriptors, while the retriever module takes these descriptors, computes image-pair similarity scores, and outputs the putative image pairs based on a specified threshold (see [`NetVLADGlobalDescriptor`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/global_descriptor/netvlad_global_descriptor.py) and [`NetVLADRetriever`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/netvlad_retriever.py)).

```python
class RetrieverBase(GTSFMProcess):
    ...  # abstract base class; see gtsfm/retriever/retriever_base.py for the full interface
```
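To make the wrapping relationship above concrete, here is a toy sketch of how a global descriptor module and a retriever could fit together. The class and method names below are illustrative assumptions, not the actual GTSfM interfaces; see the linked base classes for the real definitions.

```python
import numpy as np


class ToyGlobalDescriptor:
    """Maps a single image to one fixed-length global descriptor vector."""

    def describe(self, image: np.ndarray) -> np.ndarray:
        # A real module (e.g. NetVLAD) would run a network here; a normalized
        # intensity histogram stands in for the learned descriptor.
        hist, _ = np.histogram(image, bins=64, range=(0, 255), density=True)
        return hist / (np.linalg.norm(hist) + 1e-12)


class ToyRetriever:
    """Turns per-image global descriptors into putative image pairs."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold

    def get_image_pairs(self, descriptors: list[np.ndarray]) -> list[tuple[int, int]]:
        # Keep only the pairs whose descriptor similarity clears the threshold.
        pairs = []
        for i1 in range(len(descriptors)):
            for i2 in range(i1 + 1, len(descriptors)):
                similarity = float(np.dot(descriptors[i1], descriptors[i2]))
                if similarity >= self.threshold:
                    pairs.append((i1, i2))
        return pairs
```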
we should explain what a Retriever is if this is needed.


## What is a Correspondence Generator?

The Correspondence Generator is responsible for taking in putative image pairs from the [`ImagePairsGenerator`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/image_pairs_generator.py) and returning keypoints for each image and correspondences between each specified image pair. Correspondence generation is implemented by the [`CorrespondenceGeneratorBase`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/correspondence_generator/correspondence_generator_base.py) class defined below.
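The base class definition is not reproduced in this excerpt; as a rough orientation only, an interface of this shape might look like the sketch below (the class name and signature here are assumptions, not the actual GTSfM definitions in `correspondence_generator_base.py`).

```python
import abc

import numpy as np


class CorrespondenceGeneratorSketch(abc.ABC):
    """Illustrative stand-in for a correspondence generator interface."""

    @abc.abstractmethod
    def generate_correspondences(
        self,
        images: list[np.ndarray],
        image_pairs: list[tuple[int, int]],
    ) -> tuple[list[np.ndarray], dict[tuple[int, int], np.ndarray]]:
        """Return per-image keypoint arrays and, for each pair (i1, i2),
        an (N, 2) array of indices into the two images' keypoint lists."""
```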

nit: returning keypoints for each image and (their / keypoint) correspondences between each specified image pair.


## What is an Image Pair Generator?

The Image Pair Generator takes in images from the Loader and outputs putative image pairs for correspondence generation. Image pair generation is implemented by the [`ImagePairsGenerator`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/image_pairs_generator.py) class defined below, which wraps a specific [`Retriever`](https://github.com/borglab/gtsfm/blob/master/gtsfm/retriever/retriever_base.py) and, optionally, a [`GlobalDescriptor`](https://github.com/borglab/gtsfm/blob/master/gtsfm/frontend/global_descriptor/global_descriptor_base.py).
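Continuing the toy sketch from the retriever section above (again, names are assumptions for illustration rather than the actual GTSfM API), the wrapping could look like:

```python
import numpy as np


class ImagePairsGeneratorSketch:
    """Illustrative stand-in: wraps a retriever plus a global descriptor."""

    def __init__(self, retriever, global_descriptor) -> None:
        # In GTSfM the global descriptor is optional (e.g. a purely sequential
        # retriever needs no descriptors); this sketch assumes one is provided.
        self._retriever = retriever
        self._global_descriptor = global_descriptor

    def generate_image_pairs(self, images: list[np.ndarray]) -> list[tuple[int, int]]:
        # One global descriptor per image, then let the retriever threshold the
        # pairwise similarities into putative image pairs.
        descriptors = [self._global_descriptor.describe(image) for image in images]
        return self._retriever.get_image_pairs(descriptors)
```

For example, `ImagePairsGeneratorSketch(ToyRetriever(threshold=0.8), ToyGlobalDescriptor()).generate_image_pairs(images)` would return the index pairs passed on to correspondence generation (purely illustrative).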

It would be good to know what we assume from a reader of this documentation. It seems like they would already need some knowledge of SfM. If not, terms like "putative image pairs" are unclear. What makes a pair? (answer: view overlap, potential for keypoint correspondences).


## Global Image Descriptors

Global descriptors work similarly to local feature descriptors, except that these methods generate a single descriptor for each image. Distances between these global image descriptors can then be used as a metric for the expected "matchability" of image pairs during the correspondence generation phase, where a threshold can be used to reject potentially dissimilar image pairs before conducting correspondence generation. This reduces the likelihood of matching image pairs with little to no overlap, which could cause erroneous correspondences to be inserted into the back-end optimization, while also significantly reducing runtime compared to exhaustive matching.
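As a concrete toy illustration of the thresholding described above (the descriptors are random and the threshold value is an arbitrary assumption, not a GTSfM default):

```python
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.standard_normal((4, 128))                    # one 128-D descriptor per image
descriptors /= np.linalg.norm(descriptors, axis=1, keepdims=True)

similarity = descriptors @ descriptors.T                       # cosine similarity matrix
threshold = 0.2                                                # arbitrary, for illustration
i1_idx, i2_idx = np.where(np.triu(similarity, k=1) > threshold)
putative_pairs = list(zip(i1_idx.tolist(), i2_idx.tolist()))   # pairs kept for matching
print(putative_pairs)
```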
@akshay-krishnan commented Jan 12, 2025

nit: I don't see much of a similarity here; the descriptors are actually very different. To the point that I would say "unlike feature descriptors that learn a local descriptor for each image patch/pixel, global descriptors generate a single descriptor for each image". The difference seems more important than any similarity (which would only be in the model architecture).


a small but significant change: rotation averaging should occur before translation averaging.


this should be changed for all files.
