
Using config files instead of the command line

Grossfield Lab edited this page Jan 27, 2020 · 4 revisions

The command line options in LOOS are built on top of the program options package from BOOST, which supports both command line arguments and configuration files. While we've primarily designed LOOS to work using the command line, config files should work as well. This can be convenient on occasion -- for example, when selection strings contain characters that would be interpreted by the shell, as is the case with carbohydrates and nucleic acids. However, there are some lumps in the way it works that require documentation.

For example, consider the following command:

rdf full_system.psf trial_2_al.dcd 'name == "OH2"' 'name == "CA"' 0 10 40

This computes a fairly pointless radial distribution function (between water oxygens and alpha carbons), and bins it from 0 to 10 angstroms in 40 bins. The equivalent calculation can also be specified using a config file:

rdf --config rdf.config

where the contents of rdf.config are

model = full_system.psf
traj = trial_2_al.dcd
sel1 = name == "OH2"
sel2 = name == "CA"
hist-min = 0
hist-max = 10
num-bins = 40

There are some advantages to doing it this way. As mentioned above, you no longer have to worry about the selection strings going through the shell, and long selection strings are easier to read. Moreover, you can put the config file under version control, so you have a better record of what you did.
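As a minimal sketch of the quoting point, the config file from the example can be written from a script with a here-document. Quoting the delimiter ('EOF') tells the shell to leave the body alone, so the double quotes in the selection strings arrive in the file unchanged. The final rdf invocation is left commented out because it needs a LOOS installation and the input files named above.

```shell
# Write rdf.config with a quoted heredoc delimiter so the shell does not
# interpret anything inside the selection strings.
cat > rdf.config <<'EOF'
model = full_system.psf
traj = trial_2_al.dcd
sel1 = name == "OH2"
sel2 = name == "CA"
hist-min = 0
hist-max = 10
num-bins = 40
EOF

# Then run the tool as described above (requires LOOS):
# rdf --config rdf.config > rdf.out
```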

On the other hand, there are some downsides as well. First, nearly all LOOS tools echo the command line into their output. For the example given, the first line of the output would look like

# rdf 'full_system.psf' 'trial_2_al.dcd' 'name == "OH2"' 'name == "CA"' '0' '10' '40' - alan (Mon Jan 27 10:31:50 2020) {/home/alan/tmp} [3.1 200123]

in the case where the command line was used, which makes it easy to know exactly which command generated the subsequent data. If one uses the config file, only the --config invocation gets echoed into the output:

# rdf '--config' 'rdf.config' - alan (Mon Jan 27 10:38:11 2020) {/home/alan/tmp} [3.1 200123]

which is less useful -- it puts the onus on the user to have a record of the state of rdf.config on that date.
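One possible workaround is to copy the config file into the output yourself. The sketch below defines a hypothetical helper, run_with_config_record, which is not part of LOOS: it prints the config file as comment lines and then runs the command it is given. It assumes that whatever reads the output treats lines starting with # as comments, as the echoed header line above suggests.

```shell
# Hypothetical helper (not part of LOOS): print the config file as
# '#'-prefixed comment lines, then run the given command.
run_with_config_record() {
    config="$1"
    shift
    printf '# contents of %s:\n' "$config"
    sed 's/^/#   /' "$config"
    "$@"
}

# Intended usage (requires LOOS):
# run_with_config_record rdf.config rdf --config rdf.config > rdf.out
```

With this, the output file carries the settings that were in effect when the tool ran, even if rdf.config is edited later.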

Second, the names of the options themselves are not always obvious. If you run rdf (or nearly any other LOOS command), you get a help message that describes the usage and the various options:

trent 17% rdf
Usage- rdf [options] model trajectory first-selection second-selection histogram-min histogram-max histogram-bins 
Allowed Options:
  --config arg                     Options config file
  --fullhelp                       More detailed help
  -h [ --help ]                    Produce this message
  -v [ --verbosity ] arg (=0)      Verbosity of output (if available)
  -k [ --skip ] arg (=0)           Number of frames to skip
  --modeltype arg                  Model types:
                                   prmtop = Amber
                                   crd = CHARMM CRD
                                   pdb = CHARMM/NAMD PDB
                                   psf = CHARMM/NAMD PSF
                                   gro = Gromacs
                                   xyz = Tinker
                                   
  --trajtype arg                   Trajectory types:
                                   crd = Amber Traj (NetCDF/Amber)
                                   mdcrd = Amber Traj (NetCDF/Amber)
                                   nc = Amber Traj (NetCDF)
                                   netcdf = Amber Traj (NetCDF)
                                   inpcrd = Amber Restart
                                   rst = Amber Restart
                                   rst7 = Amber Restart
                                   dcd = CHARMM/NAMD DCD
                                   pdb = Concatenated PDB
                                   trr = Gromacs TRR
                                   xtc = Gromacs XTC
                                   arc = Tinker ARC
                                   
  -i [ --stride ] arg (=1)         Take every ith frame
  -r [ --range ] arg               Which frames to use (matlab style range, 
                                   overrides stride and skip)
  -w [ --weights ] arg             List of weights to change averaging
  --weights-list arg               File containing a list of trajectories and 
                                   their weights files
  --split-mode arg (=by-molecule)  how to split the selections (by-residue, 
                                   molecule, segment, none)
  --split-mode2 arg (=by-molecule) how to split the second selection 
                                   (by-residue, molecule, segment, none)

For optional flags, the name of the option listed is the name you would put into the config file, e.g. stride or split-mode. However, for the positional options -- the mandatory arguments that are not prefaced by a flag -- the name shown in the usage message does not have to match the option name used in the code, because we often use longer names for clarity. Thus, traj in the code appears as trajectory, and sel1 as first-selection, in the interest of making the help more readable for command line users. The config file needs the names used in the code.

Thus, if you use a config file and are told that a given option doesn't exist, the only real solution is to look in the code itself (LOOS/Tools/rdf.cpp), in the program options section at the top of the file, and see what the actual name of the positional option is. We recognize this isn't optimal, and will consider altering the command line messages in the future if we hear that it's an issue for users.
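A quick way to narrow the search is to grep the source for positional-option registrations. The sketch below assumes the tool registers them through Boost's positional_options_description, whose add method is called as .add("name", count). The helper name list_positional_names is hypothetical, and the pattern is only a heuristic. If a name does not turn up in the tool's own file, it may be registered elsewhere in the LOOS sources.

```shell
# Hypothetical helper (not part of LOOS): show source lines that look like
# Boost positional-option registrations, i.e. calls of the form .add("name", count).
list_positional_names() {
    grep -n '\.add("' "$1"
}

# Intended usage, with the path given in the text:
# list_positional_names LOOS/Tools/rdf.cpp
```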

More information on the config file format, and about BOOST's program options package in general, is available in the Boost.Program_options documentation.