A docker image to pre-process a collection of SIMRAD's EK60/EK80 acoustic raw files into an xarray
dataset using pyEcholab package. The dataset is then stored as zarr
/netcdf
files on disk.
In addition, pre-processing the Marec's LSSS work files into a pandas
dataframe as a parquet
file is now supported (see the disk mounting option below).
- Automatic range re-gridding (by default it uses the main channel’s range from the first raw file, see
MAX_RANGE_SRC
option below). - Sv processing and re-gridding the channels are done in parallel (using
Dask
’s delayed). - Automatic resuming from the last
ping_time
if the output file exists. - Batch processing is done by appending directly to the output file, should be memory efficient.
- The image of this repository is available at Docker Hub (https://hub.docker.com/r/crimac/test-preprocessor).
- Processing annotations from
.work
files into apandas
dataframe object (using: https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-annotationtools).
-
Two directories need to be mounted:
/datain
should be mounted to the data directory where the.raw
files are located./dataout
should be mounted to the directory where the output is written./workin
should be mounted to the directory where the.work
files are located (optional).
-
Choose the frequency of the main channel:
--env MAIN_FREQ=38000
-
Choose the range determination type:
# Set the maximum range as 500, --env MAX_RANGE_SRC=500 # or use the the main channel's maximum range from all the files (for historical data), --env MAX_RANGE_SRC=auto # or use the the main channel's maximum range from the first processed file (for historical data) --env MAX_RANGE_SRC=None
-
Select output type,
zarr
andNetCDF4
are supported:--env OUTPUT_TYPE=zarr --env OUTPUT_TYPE=netcdf4
-
Select file name output (optional, default to
out.<zarr/nc>
)--env OUTPUT_NAME=S2020842
-
Set if we want a visual overview of the Sv data (in a PNG format image)
--env WRITE_PNG=1 # enable or 0 to disable
docker run -it --name pyechopreprocess \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/EK60/EK60_RAWDATA:/datain \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/LSSS/WORK:/workin \
-v /localscratch/ibrahim-echo/out:/dataout \
--security-opt label=disable \
--env OUTPUT_TYPE=zarr \
--env MAIN_FREQ=38000 \
--env MAX_RANGE_SRC=500 \
--env OUTPUT_NAME=S2020842 \
--env WRITE_PNG=0 \
crimac/preprocessor