Skip to content

Working with trajectories in python

Grossfield Lab edited this page Jan 10, 2024 · 1 revision

When working with trajectories in python, you have 2 basic options. You can use the C++ Trajectory class, or the more pythonic pyloos.Trajectory and its derivatives (more on them later).

There's nothing wrong with using the C++ Trajectory in your python code. It's fast and convenient, but the code isn't especially pythonic. For example, a typical loop over frames in a trajectory would look like:

// assume you've already defined an AtomicGroup model and Trajectory traj
while (traj.readFrame()) {
    traj.updateGroupCoords(model);
    // do something with the coordinates in model
}

The readFrame call tries to read the next set of coordinates in from the file (returning false if it hits the end), then updateGroupCoords copies the coordinates into the associated AtomicGroup. If you supply an integer to readFrame, it will read that specific frame, so things like skipping, striding, and random access are easily implemented by the tool writer. It's worth noting that skip and stride are trivially supported in the C++ if you use loos' program options functionality.

The goal of the pyloos.Trajectory class is to remove some of the manual nature of this and make a trajectory act more like an iterator. The equivalent python code would look like:

for frame in traj:
    // do something with model

Note: in this case, frame is also an AtomicGroup. In both the C++ and python examples, if you've already made additional AtomicGroups before the loops (e.g. by selecting from model), you don't have to manually update them. Updating the root AtomicGroup is sufficient.

The real advantage to working with pyloos.Trajectory comes from the additional functionality available when you first instantiate it. Specifically, you can do something like:

// model is an AtomicGroup, traj_file is the name of the trajectory file, and skip_val and stride_val are input integers
traj = loos.pyloos.Trajectory(traj_file, model,
                              skip=skip_val, stride=stride_val)
for frame in traj:
    // do something useful

This is far more convenient than manually tracking which frame to read next. In addition, you can specify an arbitrary list of frames to read:

frame_list = [0, 5, 7, -2]
traj = loos.pyloos.Trajectory(traj_file, model, iterator=iterator(frame_list))
for frame in traj:
    // this will read frames 0, 5, 7, and the 2nd frame from the end

The are a few additional classes with more functionality available: -pyloos.VirtualTrajectory allows you to treat multiple pyloos.Trajectorys as if they were one, removing the need to loop over trajectories and then frames. -pyloos.AlignedVirtualTrajectory goes one step further and pre-aligns the structures for you so that your code doesn't have to do so explicitly. It can use either a reference structure or iteratively align the trajectories to their average. The latter is computationally more expensive but will often produce more meaningful RMSDs.

The two VirtualTrajectory classes support skip and stride, applied to the whole unified trajectory: specifying skip=5 will skip the first 5 frames of the combined trajectory, not of each individual trajectory. Similarly, if the number of frames in the first trajectory isn't a multiple of stride, you will not start reading the next trajectory at its 0th frame. If that's not the behavior you want -- if for example you want to skip the first 5 frames of each trajectory -- you would specify that skip when instantiating the individual pyloos.Trajectory objects.