Research Paper Analysis

PotatoPalooza edited this page Jun 10, 2022 · 12 revisions

Purpose

This page analyzes papers that could be integrated into the backend of our project. Since the focus of the project is to get NeRF working in a web app, the most relevant factors are an open license, computational cost, and robustness to real-world data. Broken down further, the most important qualities are, in ranked order:

1. Source code is available and fully open source

2. Speed and memory use in training should be low enough to allow for ease of development

  • Sub-hour NeRF training time
  • Less than 8GB memory consumption
  • Not reliant on custom kernels or specific GPU models

3. Inference speed should be fast enough to allow for video rendering

  • Inference of at least 1 fps

4. Robustness to real world data

  • Quality should be on par with or better than the original NeRF paper
  • Should work with SFM techniques
  • Insensitive to depth inaccuracies or image artifacts
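
The measurable thresholds above can be written down as a small screening check. This is only a sketch and not part of the project code: the `Candidate` record, its field names, and the `unmet_requirements` helper are hypothetical, and the example values are placeholders rather than figures from any paper. The robustness criteria (item 4) are qualitative and are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record for one paper; None means the value is unknown."""
    name: str
    open_source: bool
    needs_custom_kernels: bool
    training_hours: Optional[float]
    memory_gb: Optional[float]
    inference_fps: Optional[float]


def unmet_requirements(c: Candidate) -> List[str]:
    """Return the requirements from the ranked list that the candidate fails.

    Unknown values (None) are reported as unmet so they get checked by hand.
    """
    failures = []
    if not c.open_source:
        failures.append("open source")
    if c.training_hours is None or c.training_hours >= 1.0:
        failures.append("sub-hour training")
    if c.memory_gb is None or c.memory_gb >= 8.0:
        failures.append("less than 8 GB memory")
    if c.needs_custom_kernels:
        failures.append("no custom kernels")
    if c.inference_fps is None or c.inference_fps < 1.0:
        failures.append("at least 1 fps inference")
    return failures


# Placeholder values for illustration only; inference speed is unknown here.
example = Candidate("example-method", True, False, 0.5, 4.0, None)
print(unmet_requirements(example))  # ['at least 1 fps inference']
```

Treating unknown values as failures is a deliberate choice in this sketch: several entries below list inference speed or memory as unknown, and those should be measured before a paper is accepted.
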

List of potential project papers

Paper Analysis

Note: some of the excerpts and descriptions are copied directly from the paper linked in each section.

  • PlenOctrees
    Desc: For real-time rendering of Neural Radiance Fields, they train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. This representation can be optimized directly and allows for slightly faster convergence.

    • Pros: This technique makes in-browser rendering possible
    • Cons: Long training time
    • Memory: 2GB quantized to 34-125MB
    • Training: 1.2 hours finetuning + 6 hours traditional method
    • Inference: Real-time 200+ FPS
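
To illustrate what "predicting a spherical harmonic representation" means, the sketch below turns stored coefficients into a view-dependent color. It is a simplified illustration, not the PlenOctrees code: it uses only degrees 0 and 1 (the paper uses higher degrees), the function names are hypothetical, and the sign convention of the degree-1 terms is one common choice.

```python
import numpy as np

# Real spherical harmonic constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # sqrt(3 / (4 * pi))


def sh_basis_deg1(direction):
    """Evaluate the 4 real SH basis functions (degrees 0 and 1) for a unit vector."""
    x, y, z = direction
    return np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])


def view_dependent_rgb(sh_coeffs, direction):
    """Turn per-point SH coefficients of shape (3, 4) into an RGB color.

    The weighted sum is passed through a sigmoid so each channel lies in (0, 1).
    """
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    raw = sh_coeffs @ sh_basis_deg1(direction)  # shape (3,)
    return 1.0 / (1.0 + np.exp(-raw))


# Hypothetical coefficients: one row per color channel.
coeffs = np.zeros((3, 4))
coeffs[0, 2] = 2.0  # red channel brightens when viewed along +z
print(view_dependent_rgb(coeffs, [0.0, 0.0, 1.0]))
print(view_dependent_rgb(coeffs, [0.0, 0.0, -1.0]))
```

Because the coefficients are stored per point and the viewing direction only enters through the fixed basis functions, no network evaluation is needed at render time, which is what makes the octree representation fast enough for the browser.
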
  • BARF: Bundle-Adjusting Neural Radiance Fields
    Desc: BARF can effectively optimize the neural scene representation and resolve large camera pose misalignment at the same time. A modified positional encoding, combined with a structured coarse-to-fine alignment schedule, allows the camera poses to be estimated at training time.

    • Pros: No accurate camera position needed. Better view synthesis on real world data. Compatible with other papers.
    • Cons: No improvements to speed
    • Memory: same as original NeRF
    • Training: same as original NeRF
    • Inference: same as original NeRF
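
The modified positional encoding can be sketched as follows. This is a minimal illustration assuming the smooth cosine ramp described in the BARF paper, where a progress value `alpha` gradually switches on higher frequency bands; the function names and the way `alpha` would be scheduled during training are assumptions, not the authors' code.

```python
import numpy as np


def coarse_to_fine_weights(alpha, num_freqs):
    """Weight per frequency band k = 0..num_freqs-1 for a progress value alpha.

    alpha runs from 0 (all bands off) to num_freqs (all bands fully on).
    """
    k = np.arange(num_freqs)
    t = np.clip(alpha - k, 0.0, 1.0)
    return (1.0 - np.cos(np.pi * t)) / 2.0


def weighted_positional_encoding(x, alpha, num_freqs):
    """Positional encoding of points x, shape (N, D), with band k scaled by its weight."""
    x = np.asarray(x, dtype=float)
    w = coarse_to_fine_weights(alpha, num_freqs)            # (L,)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (L,)
    angles = x[:, None, :] * freqs[None, :, None]           # (N, L, D)
    enc = np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)  # (N, L, 2D)
    enc = enc * w[None, :, None]
    # The raw coordinates are always passed through unweighted.
    return np.concatenate([x, enc.reshape(x.shape[0], -1)], axis=-1)


print(coarse_to_fine_weights(1.5, 4))
```

Early in training only the raw coordinates and low frequencies are active, so the pose gradients are smooth; high frequencies are blended in later to recover detail.
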
  • Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
    Desc: A versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. In effect, the fixed positional encoding is replaced by trainable feature vectors looked up in a hash table.

    • Pros: Extremely fast, moderate memory consumption, compatible with other papers, no loss in quality
    • Cons: Closed source, results rely on custom kernels
    • Memory: 8GB (variable)
    • Training: Seconds!
    • Inference: milliseconds!
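
A single level of the hash encoding can be sketched as below. This is an illustrative simplification, not the paper's CUDA implementation: it shows one resolution level only (the full method concatenates features from several levels and feeds them to a small MLP), the function names are hypothetical, and the table here is random rather than trained. The mixing primes follow the spatial hash given in the paper.

```python
import numpy as np

# Primes used to mix the integer coordinates; the first is 1.
PRIMES = (1, 2654435761, 805459861)


def hash_index(ix, iy, iz, table_size):
    """Map non-negative integer grid coordinates to a slot in the table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


def encode_point(point, table, resolution):
    """Trilinearly interpolate feature vectors stored at hashed grid corners.

    point: three coordinates in [0, 1); table: array of shape (T, F).
    """
    scaled = np.asarray(point, dtype=float) * resolution
    base = np.floor(scaled).astype(int)
    frac = scaled - base
    feature = np.zeros(table.shape[1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = base + np.array([dx, dy, dz])
                # Weight is frac along axes where the far corner is used, else 1 - frac.
                weight = np.prod(np.where([dx, dy, dz], frac, 1.0 - frac))
                idx = hash_index(int(corner[0]), int(corner[1]), int(corner[2]), len(table))
                feature += weight * table[idx]
    return feature


rng = np.random.default_rng(0)
table = rng.normal(size=(2 ** 10, 2))  # T = 1024 slots, F = 2 features (illustrative sizes)
print(encode_point([0.3, 0.6, 0.9], table, resolution=16))
```

Different corners may hash to the same slot; the method relies on training to resolve such collisions rather than handling them explicitly, which keeps the table size fixed regardless of grid resolution.
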
  • Plenoxels - Radiance Fields without Neural Networks

  • TensoRF Tensorial Radiance Fields
    Desc: Instead of using an MLP, this paper represents the radiance field of a scene as a 4D tensor. This tensor is then factorized into multiple compact low-rank tensors, which can be optimized directly from the input images.

    • Pros: Open source, fast, better quality, low memory consumption, compatible with ngp-nerf, no custom kernel
    • Cons: No machine learning, potentially difficult to extend, currently only supports bounded scenes with a single bounding box and cannot handle unbounded scenes with both foreground and background content
    • Memory: < 75 MB
    • Training: 10-30 min
    • Inference: unknown (probably fast?)
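
The memory saving from factorization can be illustrated with a small sketch. It assumes a CP-style decomposition of a single 3D grid (a sum of rank-one outer products); the paper also describes a vector-matrix variant that is not shown here, and the sizes `N` and `R` and all names are illustrative choices rather than the paper's settings.

```python
import numpy as np

N, R = 64, 8  # grid resolution per axis and number of rank-one components (illustrative)
rng = np.random.default_rng(0)

# Three factor matrices, one per spatial axis; these are what would be optimized.
vx, vy, vz = (rng.normal(size=(R, N)) for _ in range(3))


def query(i, j, k):
    """Value of the implied N x N x N grid at voxel (i, j, k), without building the grid."""
    return float(np.sum(vx[:, i] * vy[:, j] * vz[:, k]))


# The dense grid is built here only to check the factorized query against it.
dense = np.einsum("ri,rj,rk->ijk", vx, vy, vz)

dense_params = N ** 3        # 262144 values
factored_params = 3 * R * N  # 1536 values
print(dense_params, factored_params)
```

Storage grows linearly with the resolution for the factors but cubically for the dense grid, which is consistent with the small memory footprint listed above.
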
  • MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo

    • Pros: Trainable for multiple scenes, better than Pixel NeRF in quality
    • Cons: Worse than NeRF in quality
    • Memory: (all tests below done on 2080Ti)
    • Training: 15 min for decent results (sometimes as slow as NeRF, e.g. on 360 scenes)
    • Inference: unknown (likely same as NeRF)
  • EfficientNeRF: Efficient Neural Radiance Fields

    • Training: a few hours (sub-hour if combined with MVSNeRF)
    • Inference: Real-time 200+ FPS
  • Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
    Desc: Replace NeRF with voxel grid

    • Pros: At or above NeRF view synthesis quality, very flexible, supports forward-facing and unbounded 360 scenes
    • Cons: Custom kernel, code can only be used for research
    • Memory: (all tests below done on 2080Ti)
    • Training: < 15 min
    • Inference: near 1 fps
  • KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

  • DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks

  • VaxNeRF: Revisiting the Classic for Voxel-Accelerated Neural Radiance Field

  • JaxNeRF

    • Pros:
    • Cons:
    • Memory:
    • Training: 2.5 hours
    • Inference: 20 seconds
  • NSVF

  • DIVeR (no faster training)