
Reading in a large number of reader files: memory limit #1245

Open
calquigs opened this issue Mar 6, 2024 · 4 comments

@calquigs (Contributor) commented Mar 6, 2024

I am working with SCHISM model files that contain a single time step each. At the moment I am reading in two months' worth of files using:

from opendrift.readers import reader_schism_native

data_path0 = '/<PATH>/schout_*.nc'
reader0 = reader_schism_native.Reader(data_path0, proj4='+proj=utm +zone=4 +ellps=WGS84 +datum=WGS84 +units=m +no_defs')

However, that kills the run due to exceeding the memory limit. Each timestep/model file is 270 MB, so is creating the reader attempting to allocate roughly 388 GB of memory? Is there a better way to create the readers so that only one timestep is accessed at a time?

@knutfrode (Collaborator)

In this case, the dataset is opened with xarray's open_mfdataset:
https://github.com/OpenDrift/opendrift/blob/master/opendrift/readers/reader_schism_native.py#L113
Maybe there is a memory leak there?

In the generic reader, some more options are provided to open_mfdataset:
https://github.com/OpenDrift/opendrift/blob/master/opendrift/readers/reader_netCDF_CF_generic.py#L100
Could you try whether any of these options solve the problem?
I do not have any SCHISM files available for testing.
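
For reference, a minimal sketch of the kind of lazy, chunked open_mfdataset call one could experiment with for the SCHISM files. The keyword choices below are standard xarray options and an assumption on my part, not necessarily the exact ones used in reader_netCDF_CF_generic.py:

import xarray as xr

# Hypothetical sketch: open the many single-timestep files lazily, keeping
# one dask chunk per time step so data is only loaded when it is requested.
ds = xr.open_mfdataset(
    '/<PATH>/schout_*.nc',
    chunks={'time': 1},    # one lazy chunk per timestep
    data_vars='minimal',   # only concatenate variables that have a time dimension
    coords='minimal',
    compat='override',     # take shared coordinates from the first file
    parallel=True,         # open the files in parallel via dask
)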

@calquigs (Contributor, Author)

I've tried adding those arguments and am still getting the same issue. To confirm: is the intended behavior to read the files in as needed, or does the simulation need to hold all the reader files in memory at once?

@calquigs (Contributor, Author)

Update: reading in 2000 hourly timesteps with 'schout_*.nc' still gets killed by the memory limit, but if I split the files across multiple readers covering smaller chunks of between 100 and 1000 files each (e.g. 'schout_??.nc', 'schout_???.nc', 'schout_1???.nc'), the memory limit is not reached and I'm able to successfully complete a simulation (see the sketch below)! It takes 20+ minutes to read in, though; does that seem reasonable for this amount of data?
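
A minimal sketch of this workaround, assuming an already-configured OpenDrift simulation object (here called o) and the glob patterns above; adapt the patterns to your own file numbering:

from opendrift.readers import reader_schism_native

proj4 = '+proj=utm +zone=4 +ellps=WGS84 +datum=WGS84 +units=m +no_defs'

# Glob patterns from the comment above, each matching roughly 100-1000 files,
# so that no single reader has to open all ~2000 files at once.
patterns = ['/<PATH>/schout_??.nc',
            '/<PATH>/schout_???.nc',
            '/<PATH>/schout_1???.nc']

readers = [reader_schism_native.Reader(p, proj4=proj4) for p in patterns]

# 'o' is assumed to be an existing simulation object (e.g. OceanDrift);
# add_reader also accepts a list of readers.
o.add_reader(readers)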

@knutfrode (Collaborator)

See this related issue: #1241 (comment)

So you could also try to install h5netcdf with conda install h5netcdf
and add engine="h5netcdf" to open_mfdataset in the SCHISM reader.
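
A minimal sketch of what that call could look like, reusing the glob pattern from the original report; the chunks argument is my own assumption to keep the read lazy and is not part of the suggestion:

import xarray as xr

# engine='h5netcdf' requires the h5netcdf package (conda install h5netcdf);
# chunks={'time': 1} (an assumption) keeps one lazy chunk per timestep.
ds = xr.open_mfdataset('/<PATH>/schout_*.nc',
                       engine='h5netcdf',
                       chunks={'time': 1})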
