Nemomodel gives me OverflowError: integer does not fit in 'int' #73
Comments
Hi - I'm assuming you're running the 'dev' branch? If so, this would probably be due to me trying to save memory, which didn't work out (caused more problems than it solved), and so I fixed this at the weekend. So I think if you just pull from 'dev', this should go away. Please let me know if not.
I installed through pip. Let me see if using the 'dev' branch improves the situation. Thanks.
Ok - it's unlikely to be what I said then, but I'm not sure what the issue would be without more info. Maybe you could post the whole traceback?
Indeed, I ran without the model-saving hack (the one that converts to np.float16). I am now running with it and waiting for the results. This is what I get from my previous pip installation:
19: Traceback (most recent call last):
19: File "/global/homes/o/omard/.conda/envs/act/bin/nemoModel", line 240, in <module>
19: comm.send(modelImage, dest = 0)
19: File "mpi4py/MPI/Comm.pyx", line 1406, in mpi4py.MPI.Comm.send
19: File "mpi4py/MPI/msgpickle.pxi", line 211, in mpi4py.MPI.PyMPI_send
19: File "mpi4py/MPI/msgpickle.pxi", line 147, in mpi4py.MPI.pickle_dump
19: File "mpi4py/MPI/msgbuffer.pxi", line 50, in mpi4py.MPI.downcast
19: OverflowError: integer 3566595060 does not fit in 'int'
(Note that I cloned Perlmutter's mpi4py environment.)
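As a quick sanity check on the traceback above, here is a minimal sketch of why the reported number triggers the error. It assumes the integer in the message is the byte length of the pickled `modelImage` and that mpi4py stores that length in a signed 32-bit C int on this code path; `C_INT_MAX` and `fits_in_c_int` are names made up for illustration.

```python
# Largest value a signed 32-bit C int can hold. Assumption: this is the
# type the message length is downcast to in the failing code path.
C_INT_MAX = 2**31 - 1  # 2147483647


def fits_in_c_int(nbytes: int) -> bool:
    """Return True if a byte count can be stored in a signed 32-bit int."""
    return 0 <= nbytes <= C_INT_MAX


# Byte count reported in the OverflowError message.
reported = 3566595060

print(fits_in_c_int(reported))   # False
print(reported / 2**30)          # size in GiB
```

If that assumption holds, any single pickled message larger than about 2 GiB would fail in the same way, regardless of the survey mask.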
Ok, I actually managed to run by doing
The total file size is 3.2 GB. Does this make sense to you? I am not sure if this is due to some limitation on Perlmutter (I doubt it), mpi4py, or something else (perhaps I ran my initial PS search wrongly...).
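For reference, one generic way to keep each message under the limit (not necessarily what was done here, and not something nemo itself is known to do) is to split a large payload into pieces. The sketch below only computes the byte ranges; `chunk_bounds` and `C_INT_MAX` are hypothetical names, and the 2**31 - 1 cap is the same assumption about a 32-bit length field.

```python
from typing import Iterator, Tuple

# Assumed per-message cap: the largest signed 32-bit integer.
C_INT_MAX = 2**31 - 1


def chunk_bounds(nbytes: int, max_chunk: int = C_INT_MAX) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) byte offsets covering [0, nbytes) in pieces of at most max_chunk."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    start = 0
    while start < nbytes:
        stop = min(start + max_chunk, nbytes)
        yield start, stop
        start = stop


# Byte count taken from the OverflowError in this thread.
bounds = list(chunk_bounds(3566595060))
print(len(bounds))  # number of messages needed
print(bounds)
```

Each (start, stop) range would then go out as its own message and be reassembled on the receiving rank; how best to do that with mpi4py depends on the installed version, so treat this as a starting point only.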
That's a mystery to me, because I've taken that out, as I mentioned above. I haven't been able to reproduce the OverflowError you've been getting, either on the sims I've been making or on the real data.
Hi all. I am a new user running on Perlmutter.
On running
srun -u -l -n 64 nemoModel "/pscratch/sd/o/omard/FGSIMS_OUT/agora/${nemo_run}/${nemo_run}_optimalCatalog.fits" $mask $beam "/pscratch/sd/o/omard/FGSIMS_OUT/agora/${nemo_run}/nemomodel_${freq}_snr4.fits" --min-snr 4.0 --freq $freq -M -n
(Note that I added the --min-snr argument by hand.)
I get
even if
54: ... rank 54 image complete (took 1895.205 sec)
54: ... rank = 54 sending sky model image
Any ideas how to debug this? I thought it might be related to my survey mask, but I still keep getting this even after reducing the area.
Thanks in advance.