Warning
This tutorial is no longer maintained. It is possible that some parts are no longer working.
To obtain training and validation datasets for retraining vlne on the MiniProd5 campaign, you have two options:
- Regenerate the datasets manually from the mprod5 CAF files.
- Reuse previously generated datasets.
Instructions for each option are given below.
The scripts for generating the datasets manually from the mprod5 CAF files are committed to the NOvA devsrepo. You can fetch them with the following commands:
export DEVSREPO=svn+ssh://[email protected]/cvs/projects/novaart-devs
svn checkout "${DEVSREPO}/trunk/users/torbunov/vlne/scripts/mprod5"
Inside the fetched mprod5 directory you will find four files named exporter_vlne_{fd,nd}_{fhc,rhc}_nonswap.C, one for each combination of detector (Far or Near) and horn current (FHC or RHC). Each script generates the training and validation datasets for its configuration. These scripts are known to work with the R19-10-30-final-prod4.b release of NOvASoft and may also work with later releases.
Suppose you want to generate data for Far Detector FHC training. To do that, you need to submit a grid job that runs the script exporter_vlne_fd_fhc_nonswap.C in parallel. The job can be submitted to the grid with the following command:
submit_cafana.py \
--njobs 250 --print_jobsub --rel R19-10-30-final-prod4.b \
--outdir OUTDIR \
exporter_vlne_fd_fhc_nonswap.C
where OUTDIR is a directory under /pnfs where the job output files will be stored. Once the grid job has completed, you will find multiple csv files under OUTDIR with names dataset_vlne_fd_fhc_nonswap_*_of_*.csv. These output files need to be merged before they can be used for training.
The vlne package provides a bash script called merge_csv.sh that merges multiple csv files into one. You can find this script in the scripts/data directory of the vlne package. In addition to merging the output files, it compresses the result with the xz compressor.
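To illustrate what such a merge involves, here is a minimal sketch of an equivalent bash function. It is not the vlne merge_csv.sh and may behave differently from it: the name merge_csv_sketch is hypothetical, and it assumes that every input file starts with the same single header line, which is kept only once.

```bash
# Hypothetical stand-in for merge_csv.sh; NOT the vlne script itself.
# Usage: merge_csv_sketch OUTPUT.csv.xz INPUT.csv...
# Assumption: every input csv starts with the same single header line.
merge_csv_sketch() {
    local output="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: merge_csv_sketch OUTPUT.csv.xz INPUT.csv..." >&2
        return 1
    fi
    # NR == 1 keeps the very first line overall (the header of the first file);
    # FNR > 1 keeps every line after the header within each file.
    # The concatenated result is compressed with xz and written to OUTPUT.
    awk 'NR == 1 || FNR > 1' "$@" | xz -c > "$output"
}
```

Using awk rather than plain `cat` avoids duplicating the header once per job, and awk also terminates each record with a newline, so a file missing its final newline does not get glued to the next one.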
To merge the job output files with merge_csv.sh, run the following command:
bash merge_csv.sh MERGED_FILE_NAME.csv.xz OUTDIR/dataset_*.csv
Once merge_csv.sh has finished, you can use the resulting file MERGED_FILE_NAME.csv.xz to train vlne networks.
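A quick sanity check of the merged file before training might look like the following sketch. inspect_merged is a hypothetical helper; it relies only on standard xz options (-t to test archive integrity, -dc to decompress to standard output) and assumes the csv has a single header line.

```bash
# Hypothetical helper: inspect_merged FILE.csv.xz
# Assumption: the decompressed csv has exactly one header line.
inspect_merged() {
    local file="$1"
    # Verify the integrity of the xz archive; xz reports errors on stderr.
    xz -t "$file" || return 1
    # Print the header line (the column names).
    xz -dc "$file" | head -n 1
    # Count newline-terminated lines and subtract the header.
    local lines
    lines=$(xz -dc "$file" | wc -l)
    echo "data rows: $((lines - 1))"
}
```

For example, `inspect_merged MERGED_FILE_NAME.csv.xz` prints the column names followed by the number of data rows.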
The previously generated mprod5 datasets are stored in SAM under the SAM definition dataset_lstm_ee_mprod5. Four datasets are available:
- dataset_lstm_ee_mprod5_fd_fhc_nonswap.csv.xz -- FD FHC
- dataset_lstm_ee_mprod5_fd_rhc_nonswap.csv.xz -- FD RHC
- dataset_lstm_ee_mprod5_nd_fhc_nonswap.csv.xz -- ND FHC
- dataset_lstm_ee_mprod5_nd_rhc_nonswap.csv.xz -- ND RHC
To retrieve any of these datasets, use the ifdh_fetch command, for example:
ifdh_fetch dataset_lstm_ee_mprod5_fd_fhc_nonswap.csv.xz
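If you need all four datasets, a small loop saves typing each name. The sketch below uses hypothetical helper names; the file names are those of the four mprod5 datasets, and each file is fetched with a separate ifdh_fetch call, exactly as in the single-file example.

```bash
# Hypothetical helper: print the names of the four mprod5 datasets, one per line.
mprod5_dataset_names() {
    local det horn
    for det in fd nd; do
        for horn in fhc rhc; do
            echo "dataset_lstm_ee_mprod5_${det}_${horn}_nonswap.csv.xz"
        done
    done
}

# Hypothetical helper: fetch every dataset with one ifdh_fetch call per file,
# stopping at the first failure.
fetch_mprod5_datasets() {
    local name
    for name in $(mprod5_dataset_names); do
        ifdh_fetch "$name" || return 1
    done
}
```

A `for` loop is used instead of `while read` so that ifdh_fetch cannot accidentally consume the list of names from standard input.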