-
Notifications
You must be signed in to change notification settings - Fork 25
Preparing to analyze a gromacs simulation
We will assume for the purposes of this tutorial that you are working with a protein in water.
- system.tpr : the tpr file for your system
- system.gro : the gro file you used for your system
- traj1.xtc, traj2.xtc, ... : the individual trajectory files
- merged.dcd : the merged trajectory file
- downsampled.dcd : the downsampled trajectory file
Strictly speaking, this step isn't necessary, because LOOS can read GRO files without difficulty. However, the PSF file format from CHARMM and NAMD has a bunch of additional information in it, most notably atomic charge, atomic mass, and connectivity information (where the bonds are). As a result, it's often convenient to use PSF files for analysis, because then this information can be used. *Moreover, if you're running a coarse-grained simulation, using this PSF will let VMD draw the bonds correctly.
LOOS provides a tool to generate a "fake" PSF file (fake in the sense that you couldn't use it to run a simulation) from your gromacs run:
gmx dump -s system.tpr | gmxdump2pdb.pl system_fake
This will create 2 new files: system_fake.pdb and system_fake.psf. Both contain connectivity information, and can be used to specify the system for further analysis.
PSF files also contain the notion of "segments", which are sets of residues/atoms that logically belong together. gmxdump2pdb will create segments for you for each molecule, compressing the names to fit in the 4 characters allowed by the PSF format. You can control this by adding the --segmap option to gmxdump2pdb. You can see further options using
gmxdump2pdb.pl --help
For the rest of this tutorial, we will assume the protein was assigned to segid "PROT".
As a rule, we don't run simulations in a single piece. Rather, we run the simulation in chunks, which can then be merged together into a single contiguous trajectory. However, if you have a large number of chunks (for instance if you've been running for months), doing a merge to add the prior night's run can be very slow, because the full merge is done each time.
The LOOS tool merge-traj resolves this problem by doing an append operation instead. It checks how many frames are in the existing merged trajectory, and walks through the listed source trajectories, and begins appending once it hits "new" data. For the sake of convenience, it can also perform some other simple operations, recentering the trajectory and fixing some imaging problems. At the moment, it can only write DCD files.
The simplest version of the command to do this would be
merge-traj system_fake.psf merged.dcd traj1.xtc traj2.xtc traj3.xtc
which would merge the 3 chunks (traj1, traj2, and traj3) to form merged.dcd. However, this is not quite correct, because xtc files usually contain a 0th frame (essentially the starting structure), which would make the merged trajectory contain duplicates. To fix this, we need to add the flag "--skip-first-frame 1".
However, this is not very convenient to put into a script, because you'd have to keep adding to the list of source dcd files. The obvious thing to do is to write something like
merge-traj --skip-first-frame 1 system_fake.psf merged.dcd traj*.xtc
This would work fine for the first 9 trajectories, but for the 10th one, the shell would order them incorrectly (traj10.xtc would come before traj2.xtc). There are a couple of solutions. You could use a smarter combination of globs and wildcards on the command line to make the files appear in the right order, or you could use a better naming convention so that they always sort correctly (traj_000001.xtc, traj_000002.xtc). Alternatively, merge-traj supports a couple of options to let you tell it how to sort the trajectories. For example, to sort using a regular expression, you would say
merge-traj --skip-first-frame --sort on --regex '(\d+).xtc$' system_fake.psf merged.dcd traj*.xtc
The regular expression tells merge-traj to treat all numbers right before the ".xtc" as indices for sorting.
When merge-traj starts running, it will tell you the order in which its operating on the trajectory files; check this to make sure your regular expression is doing what you intended.
If you also wanted to write a downsampled version of the trajectory (for example, writing every 10th frame) in order to make analysis go quicker, you would add additional arguments:
merge-traj --skip-first-frame 1 --sort on --regex '(\d+).xtc$' --downsampled-dcd downsampled.dcd system_fake.psf merged.dcd traj*.xtc
If you wanted a downsample value other than 10, you'd also need "--downsample-rate X", where X was the interval at which you should write to the downsampled trajectory. Note that it's crucial to keep the downsampled and full merged trajectories in sync by always writing both at the same time, because all bookkeeping is done relative to the fully merged trajectory.
It can be convenient for both analysis and viewing to remove the center of mass motion of the protein. This gives the command line
merge-traj --skip-first-frame 1 --sort on --regex '(\d+).xtc$' --downsampled-dcd downsampled.dcd --centering-selection 'segid == "PROT"' system_fake.psf merged.dcd traj*.xtc
If you were working with a membrane simulation, you might want to use one selection for centering in x & y (for instance, a bound protein) and another for the membrane normal (the lipids). In addition, if it is possible that the selection you're using for centering could be split across the periodic boundary, you would also need the "--selection-is-split" option. This is most likely if your selection contains multiple molecules.
Finally, GROMACS has a different convention for periodic boundary conditions than other simulation programs -- by default, it writes the trajectory with each atom re-imaged into the periodic box (NAMD does reimaging by molecule, as does AMBER). Since most LOOS tools assume that molecules are not broken across the periodic boundary, it's convenient to fix this at merge time by adding the fix-imaging flag
merge-traj --fix-imaging 1 --skip-first-frame 1 --sort on --regex '(\d+).xtc$' --downsampled-dcd downsampled.dcd --centering-selection 'segid == "PROT"' system_fake.psf merged.dcd traj*.xtc
As you can see, this is a bit of a mouthful -- we highly recommend putting it into a script and invoking it as part of your daily maintenance (rsync new data to your work space, merge and clean, run standard analyses).