Potential graph corruption at solve 2, resulting in three types of error messages in matlab_export_*.log #42
Comments
Did you try opening these files with MATLAB on your local machine? Does the file size of the corrupt files seem reasonable (e.g., compared to "good" .mat files)? Are you running the latest antrax version on all machines?
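The file-size comparison asked about above could be done with a small script like the one below. It is only a sketch: the function name `flag_suspicious_mat_files`, the flat folder layout, and the `ratio` threshold are assumptions for illustration and are not part of anTraX.

```python
from pathlib import Path
from statistics import median


def flag_suspicious_mat_files(folder, ratio=0.5):
    """Return (name, size) pairs for .mat files much smaller than the median.

    `ratio` is an arbitrary illustrative threshold, not a value from anTraX.
    """
    # Collect the size in bytes of every .mat file directly inside `folder`.
    sizes = {p.name: p.stat().st_size for p in Path(folder).glob("*.mat")}
    if not sizes:
        return []
    typical = median(sizes.values())
    # A file far below the typical size may have been truncated while writing.
    return sorted(
        (name, size) for name, size in sizes.items() if size < ratio * typical
    )
```

A small file is not proof of corruption (a short or sparse video could legitimately produce a small graph), so anything flagged here still needs to be opened in MATLAB to confirm.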
Yes, I tried opening these files on my local machine. I can open the graph files for which the error was "Variable 'G' not found" or "Variable 'trjs' not found", but the file that gave "File might be corrupt." is indeed corrupt. I am running the latest version from the 'master' branch on all machines.
Do the first two files contain the variables that were not found ('G' and 'trjs', respectively)? How does the solve 1 log file look for all three? Anything abnormal compared to the "good" videos? Any errors in the track logs?
This problem on SLURM systems has persisted since I developed anTraX, and I couldn’t find a good explanation for it. As Zimai noticed, it usually disappears upon a rerun, and it is probably related to highly fragmented graphs. My best guess is that something in MATLAB doesn’t work properly on these highly parallel systems, and that it relates to the .mat file format.
I had a plan to completely change the way anTraX stores its data, in the hope that it would solve the issue or at least not require rerunning everything. Unfortunately, I can’t say when I will have time to do this, or whether it will actually help.
Sorry I can’t help further at this point :-(
On Nov 3, 2022, at 3:28 PM, Jana Mach wrote:
Do the first two files contain the variables that were not found ('G' and 'trjs', respectively)? How does the solve 1 log file look for all three? Anything abnormal compared to the "good" videos? Any errors in the track logs?
Thank you, Asaf, for the insight! Following this, do you think it would help if I concatenated my videos into longer ones to reduce the number of parallel jobs, and checked whether that improves the outcome? Have you tried something like this before?
Hi! I wonder if anyone could help me fix a problem when running antrax on an HPC cluster. I encountered it when running a big continuous experiment (~288 videos, each containing 6000 frames) on a cluster managed by SLURM.
Although none of the previous steps produces any errors, three types of error pop up for some videos at the last step, propagation. Some of them seem to be related to graph corruption at solve step 2, since the problematic graphs from before step 2 seem to be fine when opened in MATLAB on my local machine.
Note that the problem is not present in all videos, only in a subset. After rerunning all the previous steps (track, classification, etc.) on the problematic videos, the problem sometimes resolves itself. However, I have to run all the steps twice (sometimes even more), which is quite time-consuming.
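To avoid rerunning everything, the affected videos could first be listed by scanning the export logs for the three messages. The sketch below assumes the `matlab_export_*.log` files sit in a single directory and that the messages appear with the wording quoted in this thread. The function name `scan_export_logs` and the labels in `ERROR_PATTERNS` are made up for illustration and are not part of anTraX.

```python
import re
from pathlib import Path

# Error strings as quoted in this thread; the exact wording in the logs may differ.
ERROR_PATTERNS = {
    "missing_G": re.compile(r"Variable 'G' not found"),
    "missing_trjs": re.compile(r"Variable 'trjs' not found"),
    "corrupt_file": re.compile(r"File might be corrupt"),
}


def scan_export_logs(log_dir):
    """Map each matlab_export_*.log file name to the error types found in it."""
    failures = {}
    for log in sorted(Path(log_dir).glob("matlab_export_*.log")):
        # Replace undecodable bytes so a damaged log does not abort the scan.
        text = log.read_text(errors="replace")
        hits = [name for name, pat in ERROR_PATTERNS.items() if pat.search(text)]
        if hits:
            failures[log.name] = hits
    return failures
```

Only the videos whose logs appear in the returned dictionary would then need to be rerun. If the logs are spread over subdirectories, `rglob` could replace `glob`.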
Here are the three types of error messages that I get:
Thanks a lot in advance!