Cameras with multiple overlapping regions: Will it work? #81
Options attached.
After 2 hours it had not completed 10 iterations; what am I doing wrong?
@samhodge @samhodge-aiml I'm not super confident whether BARF would work well on your data, as the viewpoint coverage is not as dense as what we had been experimenting with before. My estimate of the runtime on a 3090 would be 8-10 hours, but I don't have one to benchmark with so I cannot say for sure (also it has been quite a while since I developed this project). The training shouldn't get stuck at 10 iterations though -- could you share the training log?
That is the thing: the GPU was loaded up (RTX 3090, 24 GB), but nothing was really being logged at all. I will try running it again and see if I can get something to share with you. There was no error and no TensorBoard logs to speak of, but there was a file in the output directory, so write permission was OK. Let me give you everything I have so far and we can get to the bottom of it. Thanks a million for the response.
here is the
Sitting at this point
One hour later, no progress. I will leave it running overnight and see if anything happens.
It has been running for over 10 hours now with no progress, so I am going to save the electricity.
This shouldn't happen. Could you help pinpoint which line it hangs at?
I can certainly keyboard interrupt the job and give you the stack trace
Could it be that the focal length for the camera is causing an unsolvable matrix?
Related? #76 (comment)
Might try this tomorrow: https://camp-nerf.github.io/
Yes, it is likely stuck in the loop as in #76. If you use batch size 1 the issue will likely go away -- I have not been able to figure out exactly where the bug is. CamP should be a quite decent improvement over BARF in joint camera optimization; I would definitely encourage you to try it out if they have the code released.
No code yet, so batch size of one it is.
Batch size of one didn't seem to work for me either.
Hi, while I was working with this codebase I faced a similar issue (training stuck in an endless loop). It turned out that during sampling along the ray, there was roughly exponential growth in depth for the last few samples, with the last ones as big as a few thousand (or even 10000 on one occasion). This caused gradients to explode during backpropagation, and some parameters became NaNs, so the calculated rays contained NaN values. I wasn't able to pinpoint a specific error in the implementation. Bear in mind that I was experimenting on a heavily modified architecture, so I encourage you to check for abnormal values; details are in the docs. There are a number of strategies to deal with this problem (assuming that gradient explosion is what is causing it); the simplest is to clip abnormal samples, which is a very fast workaround. This can affect the results, but erroneous samples make up a very small proportion of the total training data, so it shouldn't be too bad.
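The clipping workaround described above can be sketched as follows. This is a minimal numpy illustration, not code from the repository; the actual codebase operates on PyTorch tensors, and `max_depth` is an assumed, scene-dependent bound:

```python
import numpy as np

def clip_depth_samples(depths, max_depth=100.0):
    """Clamp abnormally large depth samples along each ray.

    depths: per-ray sample depths, shape (num_rays, num_samples).
    max_depth: assumed scene-dependent bound; samples beyond it are clipped.
    """
    return np.minimum(depths, max_depth)

# A last sample that has blown up to ~10000 gets clamped to the bound,
# while normal samples pass through unchanged.
depths = np.array([[0.5, 1.0, 2.0, 10000.0]])
clipped = clip_depth_samples(depths)
```

Because only the handful of runaway samples hit the bound, the effect on the rest of the training data is minimal, which matches the observation above.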
Thanks a million. Maybe tomorrow I can eke out a little time to see if I can turn this into a PR. The information is very generous, but I am not sure my skills are ready right now to debug and patch the issue; still, why die wondering, right? I will see what I can do.
@SwirtaB thanks for the feedback! I hadn't been able to deterministically reproduce this issue, and did not realize it had to do with the sampled coordinates. In this case, this line is likely the culprit, where the depth of the last sample is set to a very large number (1e10). @samhodge if you find that tweaking the code to lower it to e.g. 1e3 would help, please let me know and I'm happy to make a hotfix.
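To illustrate the role of that constant (hypothetical function and variable names; the real code works on PyTorch tensors): the length of the last sample interval is measured against a sentinel "far" depth, so a sentinel of 1e10 makes the final delta enormous, while lowering it keeps the deltas bounded:

```python
import numpy as np

FAR_DEPTH = 1e3  # lowered from 1e10, as suggested above

def interval_lengths(depths, far=FAR_DEPTH):
    """Distances between adjacent depth samples along each ray.

    The last interval is computed against the sentinel `far` depth, so the
    magnitude of that sentinel directly controls the size of the final delta.
    depths: sorted per-ray sample depths, shape (num_rays, num_samples).
    """
    tail = np.full((depths.shape[0], 1), far)
    padded = np.concatenate([depths, tail], axis=1)
    return np.diff(padded, axis=1)

deltas = interval_lengths(np.array([[0.5, 1.0, 2.0]]))
```

With `FAR_DEPTH = 1e10` the last delta would be on the order of 1e10, which is the kind of value that can blow up downstream gradients.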
Yeah I can certainly write a smoothstep function to roll it off to a limit. |
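A minimal sketch of that idea (illustrative only; `soft_start` and `limit` are assumed values): depths below `soft_start` pass through unchanged, and anything above is smoothly blended toward `limit` rather than hard-clipped:

```python
import numpy as np

def smoothstep(edge0, edge1, x):
    # Standard smoothstep: 0 below edge0, 1 above edge1, smooth in between.
    t = np.clip((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def rolloff_depth(depth, soft_start=50.0, limit=100.0):
    """Roll depths off toward `limit` smoothly instead of hard-clipping.

    soft_start/limit are assumed hyperparameters; tune them per scene.
    """
    w = smoothstep(soft_start, limit, depth)
    return depth * (1.0 - w) + limit * w
```

Unlike a hard clip, this keeps the mapping differentiable everywhere, which may matter since the samples feed into backpropagation.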
Trying this
That one didn't work. I have another idea: https://numpy.org/doc/stable/reference/generated/numpy.heaviside.html
Other things that do not work
@chenhsuanlin no problem. I gave your suggestion a try and it only delayed the problem for me; training hung much later. I then cross-checked your implementation of compositing against the NeRF article and their official implementation. By my understanding, the whole Equation 3 from the article reduces to alpha compositing. In their implementation they calculate it slightly differently (original impl), so I gave it a try. I commented out the T calculation and calculated prob as:
Unfortunately that didn't solve the problem, only delayed it again. That being said, any workaround that ensures proper sample values (either by clipping or something else) works quite well. Maybe that is the proper solution, since NeRFs are still neural networks and improper inputs can lead to all sorts of problems. EDIT:
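One cheap way to catch this class of failure early (a debugging sketch, not part of the codebase) is to assert that intermediate arrays stay finite, so training fails loudly at the first NaN instead of silently hanging:

```python
import numpy as np

def assert_finite(name, arr):
    """Raise immediately if an array contains NaN or Inf values.

    Intended as a temporary guard sprinkled through the training step to
    localize where the exploding samples first corrupt the computation.
    """
    if not np.isfinite(arr).all():
        raise ValueError(f"{name} contains non-finite values")

# Usage during a training step (hypothetical variable names):
# assert_finite("ray_depths", ray_depths)
# assert_finite("rgb", rgb)
```

Once the first offending array is identified, the clipping or roll-off workarounds discussed earlier in the thread can be applied at that point.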
I have a series of photos:
https://drive.google.com/drive/folders/1ZZgZUrFrnP47rx8bN5K6yvYnSC50a-9G?usp=drive_link
Which were taken with an iPhone 13 Pro Max.
I have used this dataset with Instant NGP from NVIDIA and with Gaussian Splatting to produce a good radiance field.
Do you think this dataset will work with the code in this repository?
My changes are recorded here
and I removed "IMG_" from the file names.
I am training the model now.
Do you have an estimate of how long this might take on an RTX 3090?
What viewer can I use to make renders from the radiance field produced from this training run?
Example image below, EXIF information should be intact:
Sam