I ran into an out of memory error using the Matrix.get method in a case where I have a lot of zero-weighted observations that I would like to carry through history-matching as metadata. It looks like the error occurs when np.diag is called:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
Cell In[59], line 6
3 print(len(obs_cov.x))
5 # reduce cov down to only include non-zero obsverations
----> 6 obs_cov = obs_cov.get(row_names=pst.nnz_obs_names, col_names=pst.nnz_obs_names, )
File ~\Documents\Development\repos\GMDSI_notebooks\dependencies\pyemu\pyemu\mat\mat_handler.py:1686, in Matrix.get(self, row_names, col_names, drop)
1684 return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
1685 if self.isdiagonal:
-> 1686 extract = np.diag(self.__x[:, 0])
1687 else:
1688 extract = self.__x.copy()
File ~\AppData\Local\anaconda3\envs\gmdsitut\lib\site-packages\numpy\lib\twodim_base.py:293, in diag(v, k)
291 if len(s) == 1:
292 n = s[0]+abs(k)
--> 293 res = zeros((n, n), v.dtype)
294 if k >= 0:
295 i = k
MemoryError: Unable to allocate 110. GiB for an array with shape (121252, 121252) and data type float64
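As a sanity check on that number, the arithmetic behind the allocation can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the shape is taken from the traceback above, and the variable names are made up for illustration.

```python
# Memory needed for a dense float64 matrix with the shape from the traceback.
n = 121252                 # number of rows/columns reported in the MemoryError
bytes_per_float64 = 8

dense_bytes = n * n * bytes_per_float64
dense_gib = dense_bytes / 2**30

# Storing only the diagonal needs n values instead of n * n.
diag_mib = n * bytes_per_float64 / 2**20

print(f"dense: {dense_gib:.1f} GiB, diagonal only: {diag_mib:.2f} MiB")
```

The dense figure comes out just under 110 GiB, in line with the error message, while the diagonal itself takes less than 1 MiB.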
It seems modifying the method to use scipy.sparse.diags can offer a workaround:
# assumes `import copy` and `import scipy.sparse as sp` at module level
def spget(self, row_names=None, col_names=None, drop=False):
    if row_names is None and col_names is None:
        raise Exception(
            "Matrix.get(): must pass at least" + " row_names or col_names"
        )
    if row_names is not None and not isinstance(row_names, list):
        row_names = [row_names]
    if col_names is not None and not isinstance(col_names, list):
        col_names = [col_names]
    if isinstance(self, Cov) and (row_names is None or col_names is None):
        if row_names is not None:
            idxs = self.indices(row_names, axis=0)
            names = row_names
        else:
            idxs = self.indices(col_names, axis=1)
            names = col_names
        if self.isdiagonal:
            extract = self.__x[idxs].copy()
        else:
            extract = self.__x[idxs, :].copy()
            extract = extract[:, idxs]
        if drop:
            self.drop(names, 0)
        return Cov(x=extract, names=names, isdiagonal=self.isdiagonal)
    if self.isdiagonal:
        extract = sp.diags(self.__x[:, 0], offsets=0, format='csr')  # switch from np to sp
    else:
        extract = self.__x.copy()
    if row_names is not None:
        row_idxs = self.indices(row_names, axis=0)
        extract = extract[row_idxs, :]  # this row is modified
        if drop:
            self.drop(row_names, axis=0)
    else:
        row_names = self.row_names
    if col_names is not None:
        col_idxs = self.indices(col_names, axis=1)
        extract = extract[:, col_idxs]  # this row is also modified
        if drop:
            self.drop(col_names, axis=1)
    else:
        col_names = copy.deepcopy(self.col_names)
    # only the diagonal branch produces a sparse matrix; the dense branch is already an ndarray
    if sp.issparse(extract):
        extract = extract.toarray()  # modified to use .toarray()
    return type(self)(x=extract, row_names=row_names, col_names=col_names)
Comparing the methods using a smaller number of observations (allowing the use of both get and spget):
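A check of this kind can be sketched with plain NumPy/SciPy arrays instead of pyemu objects. The names `diag_vals` and `idxs` are hypothetical stand-ins for the stored diagonal and for the indices that `indices()` would return; this is an illustration, not the comparison that was actually run.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
diag_vals = rng.random(1000)          # stand-in for the stored diagonal
idxs = np.array([3, 10, 500, 999])    # stand-in for the requested row/column indices

# Dense route, as in get(): build the full n x n matrix, then slice it.
dense = np.diag(diag_vals)
dense_extract = dense[idxs, :][:, idxs]

# Sparse route, as in spget(): build a CSR diagonal, slice, then densify the result.
sparse = sp.diags(diag_vals, offsets=0, format="csr")
sparse_extract = sparse[idxs, :][:, idxs].toarray()

same = np.array_equal(dense_extract, sparse_extract)
print(same)
```

Only the small extracted block is ever densified in the sparse route, so its peak memory scales with the number of requested names rather than with the full matrix size.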
I wanted to bring this to attention because quite a lot of use-cases seem to rely on this method, although I'm not sure whether this modification has any unforeseen impact further down the line.
Thanks @nikobenho . I've considered swapping to sparse storage for a while now because it does handle really large cases better (or at least using sparse for a few of the more critical pinch points like get())...pull requests welcome!
Maybe a similar issue(?), but I am hitting memory issues when I try to build a parcov using an .unc file with about 150,000 pars (small sample of the file below):
Curious if there is an easy way to build these as a bunch of smaller "blocks" and then combine them into a block diagonal matrix to get around the memory issue? Or if that is already what is being done and I am just out of luck.
The pyemu helpers already do the group/block based side-stepping trick to avoid having to form that full matrix for drawing realizations. The problem is that the Cov object holds the matrix as dense in memory, so it's going to be huge and mostly zeros...
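To illustrate why a block-diagonal covariance is a natural fit for sparse storage, here is a minimal sketch using `scipy.sparse.block_diag`. It is not pyemu's API, and nothing here implies the Cov object can hold the result without densifying it; the group sizes and block contents are invented for the example.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Build one small dense covariance block per (hypothetical) parameter group.
blocks = []
for size in (3, 4, 2):
    a = rng.random((size, size))
    blocks.append(a @ a.T + size * np.eye(size))  # symmetric positive definite

# Assemble the blocks into one sparse block-diagonal matrix.
cov_sparse = sp.block_diag(blocks, format="csr")

n = cov_sparse.shape[0]
stored = cov_sparse.nnz   # entries actually held in memory
print(n, stored, n * n)   # size, stored entries, entries a dense array would hold
```

The stored entry count grows with the sum of the squared block sizes, whereas a dense array grows with the square of the total parameter count.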