Skip to content

Preparing to analyze a NAMD simulation

Grossfield Lab edited this page Dec 6, 2017 · 2 revisions

Preparation

We will assume for the purposes of this tutorial that you are working with a protein in water.

Key names we will use in this tutorial

  • system.psf : the PSF file for your system
  • traj1.dcd, traj2.dcd, ... : the individual trajectory files
  • merged.dcd : the merged trajectory file
  • downsampled.dcd : the downsampled trajectory file
  • PROT : the segment ID for the protein
  • BULK : the segment ID for the water

Merging trajectories and preparing for analysis

As a rule, we don't run simulations in a single piece. Rather, we run the simulation in chunks, which can then be merged together into a single contiguous trajectory. The canonical tool for doing this in NAMD is catdcd. It works fine, but if you have a large number of chunks (for instance if you've been running for months), doing a merge to add the prior night's run can be very slow, because the full merge is done each time.

The LOOS tool merge-traj resolves this problem by doing an append operation instead. It checks how many frames are in the existing merged trajectory, and walks through the listed source trajectories, and begins appending once it hits "new" data. For the sake of convenience, it can also perform some other simple operations, recentering the trajectory and fixing some imaging problems.

The simplest version of the command to do this would be

merge-traj system.psf merged.dcd traj1.dcd traj2.dcd traj3.dcd

which would merge the 3 chunks (traj1, traj2, and traj3) to form merged.dcd.

However, this is not very convenient to put into a script, because you'd have to keep adding to the list of source dcd files. The obvious thing to do is to write something like

merge-traj system.psf merged.dcd traj*.dcd

This would work fine for the first 9 trajectories, but for the 10th one, the shell would order them incorrectly (traj10.dcd would come before traj2.dcd). There are a couple of solutions. You could use a smarter combination of globs and wildcards on the command line to make the files appear in the right order, or you could use a better naming convention so that they always sort correctly (traj_000001.dcd, traj_000002.dcd). Alternatively, merge-traj supports a couple of options to let you tell it how to sort the trajectories. For example, to sort using a regular expression, you would say

merge-traj --sort on --regex '(\d+).dcd$' system.psf merged.dcd traj*.dcd

The regular expression tells merge-traj to treat all numbers right before the ".dcd" as indices for sorting.
When merge-traj starts running, it will tell you the order in which its operating on the trajectory files; check this to make sure your regular expression is doing what you intended.

If you also wanted to write a downsampled version of the trajectory (for example, writing every 10th frame) in order to make analysis go quicker, you would add additional arguments:

merge-traj --sort on --regex '(\d+).dcd$' --downsampled-dcd downsampled.dcd system.psf merged.dcd traj*.dcd

If you wanted a downsample value other than 10, you'd also need "--downsample-rate X", where X was the interval at which you should write to the downsampled trajectory. Note that it's crucial to keep the downsampled and full merged trajectories in sync by always writing both at the same time, because all bookkeeping is done relative to the fully merged trajectory.

Finally, it can be convenient for both analysis and viewing to remove the center of mass motion of the protein. This gives the final command line

merge-traj --sort on --regex '(\d+).dcd$' --downsampled-dcd downsampled.dcd --centering-selection 'segid == "PROT"' system.psf merged.dcd traj*.dcd

If you were working with a membrane simulation, you might want to use one selection for centering in x & y (for instance, a bound protein) and another for the membrane normal (the lipids). In addition, if it is possible that the selection you're using for centering could be split across the periodic boundary, you would also need the "--selection-is-split" option. When running with NAMD, this should only happen if your selection consists of multiple molecules.

As you can see, this is a bit of a mouthful -- we highly recommend putting into a script and invoking it as part of your daily maintenance (rsync new data to your work space, merge and clean, run standard analyses).