
Research Paper Analysis

PotatoPalooza edited this page Jun 10, 2022 · 12 revisions


This page collects analyses of papers that we could integrate into the backend of our project. Since the focus of this project is to get NeRF working in a web app, the most relevant factors are an open license, computational resources, and robustness to real-world data. Broken down further, the most important qualities are, in ranked order:

1. Source code is available and fully open source

2. Speed and memory use in training should be low enough to allow for ease of development

  • Sub-hour NeRF training time
  • Less than 8GB memory consumption
  • Not reliant on custom kernels or specific GPU models

3. Inference speed should be fast enough to allow for video rendering

  • Inference of at least 1 fps

4. Robustness to real-world data

  • Quality should be on par with or better than the original NeRF paper
  • Should work with structure-from-motion (SfM) techniques
  • Insensitive to depth inaccuracies or image artifacts
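
The measurable thresholds above can be written down as a small checklist. The following is a minimal Python sketch, not part of the project: `PaperProfile`, `unmet_criteria`, and the example values are hypothetical names and numbers chosen for illustration. Criterion 4 (robustness) is qualitative and is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PaperProfile:
    """Hypothetical record of the numbers collected for one paper."""
    name: str
    open_source: bool
    training_minutes: Optional[float]  # None when the value is unknown
    memory_gb: Optional[float]         # None when the value is unknown
    needs_custom_kernels: bool
    inference_fps: Optional[float]     # None when the value is unknown


def unmet_criteria(p: PaperProfile) -> List[str]:
    """Return the criteria from the ranked list that are not confirmed."""
    problems = []
    if not p.open_source:
        problems.append("not fully open source")
    if p.training_minutes is None or p.training_minutes >= 60:
        problems.append("training not confirmed under one hour")
    if p.memory_gb is None or p.memory_gb >= 8:
        problems.append("memory not confirmed under 8 GB")
    if p.needs_custom_kernels:
        problems.append("relies on custom kernels or specific GPUs")
    if p.inference_fps is None or p.inference_fps < 1:
        problems.append("inference not confirmed at 1 fps or more")
    return problems


# Made-up example values, not measurements from any paper.
example = PaperProfile("example-method", True, 30.0, 4.0, False, 2.0)
print(unmet_criteria(example))
```

Unknown values are treated as unmet on purpose, so a paper only passes once every number has actually been confirmed.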

List of potential project papers

Paper Analysis

Note: some of the excerpts and descriptions are copied directly from the paper linked in each section.

  • PlenOctrees
    Desc: For real-time rendering of Neural Radiance Fields they train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. This representation can be optimized directly and allows for slightly faster convergence.

    • Pros: This technique makes in-browser rendering possible
    • Cons: Long training time
    • Memory: 2GB quantized to 34-125MB
    • Training: 1.2 hours finetuning + 6 hours traditional method
    • Inference: Real-time 200+ FPS
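
To illustrate the spherical harmonic idea in the PlenOctrees description: each point stores coefficients per color channel, and the view-dependent color is recovered by weighting the harmonic basis with those coefficients and applying a sigmoid. The sketch below is a minimal pure-Python illustration truncated to degree 1. The function names are hypothetical, and the sign and ordering of the degree-1 terms is one common convention that may differ from the authors' code.

```python
import math

SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)      # constant (degree-0) basis value
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))    # scale of the degree-1 basis


def sh_basis_deg1(direction):
    """Real spherical harmonic basis up to degree 1 for a view direction."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Assumed ordering and signs; conventions vary between implementations.
    return [SH_C0, SH_C1 * y, SH_C1 * z, SH_C1 * x]


def sh_to_rgb(coeffs, direction):
    """coeffs: three lists (R, G, B) of four coefficients each."""
    basis = sh_basis_deg1(direction)
    rgb = []
    for channel in coeffs:
        raw = sum(k * b for k, b in zip(channel, basis))
        rgb.append(1.0 / (1.0 + math.exp(-raw)))  # sigmoid keeps color in (0, 1)
    return rgb


# Only the constant term is set, so the color is the same from every direction.
dc_only = [[1.0, 0.0, 0.0, 0.0]] * 3
print(sh_to_rgb(dc_only, (0.0, 0.0, 1.0)))
print(sh_to_rgb(dc_only, (1.0, 0.0, 0.0)))
```

Because the viewing direction only enters through the fixed basis, the stored coefficients can be cached in a data structure and evaluated without running a network at render time.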
  • BARF : Bundle-Adjusting Neural Radiance Fields
    Desc: BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time. A modified positional encoding combined with a structured coarse-to-fine learning schedule allows the camera poses to be estimated at training time.

    • Pros: No accurate camera position needed. Better view synthesis on real world data. Compatible with other papers.
    • Cons: No improvements to speed
    • Memory: same as original NeRF
    • Training: same as original NeRF
    • Inference: same as original NeRF
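
The coarse-to-fine schedule in the BARF description can be sketched as a weight on each frequency band of the positional encoding: a progress parameter `alpha` grows from 0 to the number of bands during training, and band `k` is smoothly switched on as `alpha` passes `k`. This is a minimal sketch for a scalar input based on the weighting described in the paper. The function names are hypothetical, and including the raw input alongside the encoded bands is an assumption.

```python
import math


def barf_weight(alpha, k):
    """Smooth 0-to-1 weight for frequency band k at training progress alpha."""
    t = alpha - k
    if t < 0:
        return 0.0                                   # band not yet enabled
    if t < 1:
        return (1.0 - math.cos(t * math.pi)) / 2.0   # cosine ease-in
    return 1.0                                       # band fully enabled


def coarse_to_fine_encoding(x, num_freqs, alpha):
    """Positional encoding of scalar x with per-band coarse-to-fine weights."""
    out = [x]  # raw input kept as the first feature (assumption)
    for k in range(num_freqs):
        w = barf_weight(alpha, k)
        out.append(w * math.cos((2 ** k) * math.pi * x))
        out.append(w * math.sin((2 ** k) * math.pi * x))
    return out


# Early in training only the raw input passes through; later all bands do.
print(coarse_to_fine_encoding(0.25, num_freqs=4, alpha=0.0))
print(coarse_to_fine_encoding(0.25, num_freqs=4, alpha=4.0))
```

Starting with only smooth, low-frequency signal is what lets the pose gradients stay well behaved before fine detail is introduced.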
  • Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
    Desc: A versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multi-resolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. In effect, the fixed positional encoding is replaced by trainable feature vectors looked up through a hash table.

    • Pros: Extremely fast, moderate memory consumption, compatible with other papers, No loss in quality
    • Cons: Closed source, Results from custom kernel
    • Memory: 8GB (variable)
    • Training: Seconds!
    • Inference: Milliseconds!
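
The multiresolution hash encoding described above can be sketched in a few lines: at each resolution level, the grid corners around a point are hashed into a fixed-size table of feature vectors, the corner features are interpolated, and the per-level results are concatenated. The sketch below is a simplified 2D, pure-Python illustration, not the authors' CUDA implementation. `make_tables` and `encode` are hypothetical names, the resolution parameters are made up, and unlike the paper it hashes at every level even when a coarse grid would fit in the table directly.

```python
import math
import random

# Multipliers of the XOR-based spatial hash (first one is 1).
PRIMES = (1, 2654435761, 805459861)


def hash_index(coords, table_size):
    """XOR the integer grid coordinates scaled by primes, then wrap."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p
    return h % table_size


def make_tables(num_levels, table_size, feat_dim, seed=0):
    """One table of small random feature vectors per resolution level."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)]


def encode(point, tables, base_res=16, growth=2.0):
    """Encode a 2D point in [0, 1]^2 by bilinear lookup at every level."""
    features = []
    for level, table in enumerate(tables):
        res = int(math.floor(base_res * growth ** level))
        gx, gy = point[0] * res, point[1] * res
        x0, y0 = int(math.floor(gx)), int(math.floor(gy))
        fx, fy = gx - x0, gy - y0
        level_feat = [0.0] * len(table[0])
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                entry = table[hash_index((x0 + dx, y0 + dy), len(table))]
                for i, v in enumerate(entry):
                    level_feat[i] += wx * wy * v
        features.extend(level_feat)
    return features


tables = make_tables(num_levels=4, table_size=2 ** 10, feat_dim=2)
print(len(encode((0.3, 0.7), tables)))  # 4 levels * 2 features = 8
```

In the real method these table entries are trained by gradient descent together with the small network that consumes the concatenated features.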
  • Plenoxels: Radiance Fields without Neural Networks

  • TensoRF: Tensorial Radiance Fields
    Desc: Instead of using an MLP, this paper represents the radiance field of a scene as a 4D tensor. This tensor is then factorized into multiple compact low-rank tensors, which can be optimized directly against the input images.

    • Pros: Open source, fast, better quality, low memory consumption, compatible with ngp-nerf, no custom kernel
    • Cons: No machine learning, potentially difficult to extend, currently only supports bounded scenes with a single bounding box and cannot handle unbounded scenes with both foreground and background content
    • Memory: < 75 MB
    • Training: < 10-30 min
    • Inference: unknown (probably fast?)
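
To make the low-rank idea in the TensoRF description concrete: a dense grid with `n` cells per axis needs `n**3` values, while a sum of `R` rank-one components needs only `3 * n * R`. The sketch below is a minimal pure-Python illustration of a CP-style factorization of a 3D scalar grid. It is not the TensoRF code, the function names are hypothetical, and the grid size and rank in the usage lines are made-up values.

```python
def cp_entry(factors_x, factors_y, factors_z, i, j, k):
    """Value at grid cell (i, j, k) without building the dense tensor.

    Each rank-one component r contributes vx[r][i] * vy[r][j] * vz[r][k].
    """
    return sum(vx[i] * vy[j] * vz[k]
               for vx, vy, vz in zip(factors_x, factors_y, factors_z))


def dense_param_count(n):
    """Number of values in a dense n x n x n grid."""
    return n ** 3


def cp_param_count(n, rank):
    """Number of values in `rank` triples of length-n vectors."""
    return 3 * n * rank


# A single rank-one component on a 2 x 2 x 2 grid.
print(cp_entry([[1, 2]], [[3, 4]], [[5, 6]], 1, 0, 1))  # 2 * 3 * 6 = 36

# Hypothetical sizes: 300 cells per axis, 16 components.
print(dense_param_count(300), cp_param_count(300, 16))  # 27000000 14400
```

The gap between the two counts is where the small memory footprint comes from. The factor vectors are the trainable parameters, so optimization runs on them directly.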
  • MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo

    • Pros: Trainable for multiple scenes, better than pixelNeRF in quality
    • Cons: Worse than NeRF in quality
    • Memory: (all tests below done on 2080Ti)
    • Training: 15 min for decent results (sometimes as slow as NeRF, e.g. on 360 scenes)
    • Inference: unknown (likely same as NeRF)
  • EfficientNeRF: Efficient Neural Radiance Fields

    • Training: a few hours (sub-hour if combined with MVSNeRF)
    • Inference: Real-time 200+ FPS
  • Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
    Desc: Replace NeRF with a voxel grid

    • Pros: At or above NeRF view synthesis quality, very flexible, supports forward-facing and unbounded 360 scenes
    • Cons: Custom kernel, code can only be used for research
    • Memory: (all tests below done on 2080Ti)
    • Training: < 15 min
    • Inference: near 1 fps
  • KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

  • DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks

  • VaxNeRF: Revisiting the Classic for Voxel-Accelerated Neural Radiance Field

  • JaxNeRF

    • Pros:
    • Cons:
    • Memory:
    • Training: 2.5 hours
    • Inference: 20 seconds
  • NSVF

  • DIVeR (no faster training)