Analyzing nucleic acid simulations

Grossfield Lab edited this page Jan 23, 2020 · 5 revisions

For the most part, working with nucleic acids in LOOS is the same as working with proteins. However, there is one big exception...

On selecting atoms with a ', or other characters requiring escapes, in their metadata

Nucleic acids (NAs) are special. In the deep time when choices were made about customary atom names, our forebears elected to use the ASCII character ' to indicate 'prime,' which is how the literature customarily refers to the sugar atoms of the nucleic acid backbone. If you've ever used a shell, you might think this is sort of annoying, because it means a lot of escaping in your command-line inputs (CLIs) to LOOS programs handling NAs. You'd be quite wrong unless you know WAY too much about how shells handle strings: it's very annoying. Many of the things you'd figure should pass a single quote (or other escaped character) through to the parsers that handle selection strings inside LOOS do not. Here are several that, as far as we know, actually do work:

Use a config file

LOOS tools that take command-line arguments are nearly all built from an input-reading class (arguably one of the library's finest features) that is based in turn on Boost's program_options library. This input class can read all CLIs from a standard configuration file.

To figure out if the tool you're trying to use has this feature just obtain its short help on the command line by typing its name and giving no arguments. If one of the options is --config then you can use configuration files with your program of interest. Simply provide the file name of the configuration you've created as an argument to the config flag and you are away. You can include 's and other troublesome characters directly in your sel-string lines.
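As a rough sketch of what such a file might look like: Boost-style configuration files are plain key = value lines with # comments. The key name selection and the file name o4prime.cfg below are assumptions made for illustration, so check your tool's help output for the option names it actually accepts.

```shell
# Write the config file. The quoted 'EOF' delimiter keeps bash from
# interpreting anything inside the here-document, single quotes included.
cat > o4prime.cfg <<'EOF'
# ASSUMPTION: the key is the long name of the tool's selection option
selection = name == "O4'"
EOF

# Hypothetical invocation (verify against the tool's short help):
#   model-select --config o4prime.cfg na-tester.pdb
cat o4prime.cfg
```

No shell escaping is needed inside the file itself, which is the whole appeal.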

For what it's worth, this is what your loyal correspondent prefers, since it's self-documenting, easy to look at (compared to the ugly formatting of an expansive CLI), and easy to change (no need to hold the arrow key down for about one whole day to find the position you'd like to edit in a mostly correct input line). Plus, you can store the config file in git for even better records of what you did.

Stuff for making bash behave itself

If, like most of us, you use bash, you will find that you can't just throw a \ in front of that single quote to make everything work: inside a single-quoted string, bash treats the backslash as a literal character. The following are CLI-based solutions to this problem. Apparently tcsh handles strings differently and therefore doesn't cause this issue, but if you're using tcsh you've got other problems. Alan, care to comment? No, I gave up tcsh because conda doesn't play nicely with it. -A
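To see why the backslash alone doesn't help, it can be handy to look at what bash actually stores. A minimal sketch (the variable name lit is just for illustration):

```shell
# Inside single quotes, bash keeps the backslash as an ordinary character
# and ends the string at the very next single quote.
lit='O4\'
printf '%s\n' "$lit"    # the stored value still contains the backslash

# So in 'name == "O4\'"' the string closes right after the backslash, and
# the trailing "' opens a new double-quoted string that is never closed.
```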

Using a bash ANSI-C quote

Put the selection string you're working with, for example name == "O4'", onto your command line inside $'...', with a backslash in front of the single quote, like:

model-select -s $'name == "O4\'"' na-tester.pdb

Assuming na-tester.pdb is a nucleic acid, this should grab all of its O4' atoms. This works because bash's ANSI-C quoting ($'...') processes backslash escapes, so \' becomes a literal single quote.
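A quick way to check what a program will receive is to hand the same argument to printf instead. A small sketch, assuming bash (ANSI-C quoting is not available in every sh); the variable names are illustrative:

```shell
# $'...' processes backslash escapes, so \' becomes a plain single quote
sel=$'name == "O4\'"'
printf '%s\n' "$sel"

# The same mechanism covers double-primed names such as H5''
sel2=$'name == "H5\'\'"'
printf '%s\n' "$sel2"
```

Once the printed string looks right, the same quoted text can go after -s on the model-select command line.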

Using bash variable expansion the correct way

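It can be hard to get shell variables to expand correctly in selection strings. Chapin Cavender figured out how to get them to do what you want. This works because a single quote is an ordinary character inside double quotes, so the assignment stores O4' as-is; on the command line, bash then glues the two single-quoted pieces and the expanded variable into one argument: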

o4p="O4'"
model-select -s 'name == "'$o4p'"' foo.pdb
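The same pattern extends to looping over several atom names. This is a sketch rather than anything LOOS-specific: the atom list and variable names are illustrative, and the expansion is double-quoted here so that values containing spaces or wildcards would also survive intact.

```shell
# Single quotes are literal inside double quotes, so each name is passed as-is
for atom in "O4'" "C1'" "O2'"; do
    # single-quoted piece + double-quoted expansion + single-quoted piece
    sel='name == "'"$atom"'"'
    printf '%s\n' "$sel"
    # model-select -s "$sel" foo.pdb    # uncomment to run against a real model
done
```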

Using a squadillion quotes

Going off the previous example, you could also write your selection string like:

model-select -s "name == "'"'"O4'"'"' foo.pdb

I nearly hurled when Tod sent me this, but he thinks, "It’s reasonably straight-forward if you think about how bash composes strings…and remembering that strings get concatenated…so you’re just changing the quote delimiter around…" K.
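Taking Tod's hint literally, the argument above is four quoted pieces glued together. The sketch below prints them one at a time and then compares the result with another common idiom, '\'' (close the single-quoted string, add an escaped quote, reopen). The variable names and the comparison are for illustration only.

```shell
# The four pieces: "name == "   '"'   "O4'"   '"'
printf '[%s]' "name == " '"' "O4'" '"'; printf '\n'

# Tod's version, and the close-escape-reopen idiom
many="name == "'"'"O4'"'"'
idiom='name == "O4'\''"'

printf '%s\n' "$many" "$idiom"
[ "$many" = "$idiom" ] && echo "same string"
```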

Editing your model file

This is not necessarily advisable; it could subtly goof up your model file. But it could be worth a try if you're happy with whatever string-substitution strategy you're deploying to deal with this. Note that the command below replaces every ' in the file, not just those in atom names. For example, with sed:

sed "s/'/P/g" < foo.pdb > sane.pdb
model-select -s 'name == "O4P"' sane.pdb

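This particular substitution looks like it could conflict with phosphorus-bound oxygens. In the one sample PDB I've been playing with, those atoms are named with the 'P' before the number (OP1, OP2), so there is no apparent conflict. Files that use the older O1P/O2P naming are another story: there, an RNA O2' would become O2P and collide with a phosphate oxygen, so check your atom names first.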
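A more conservative variant, sketched here with awk, touches only the atom-name field (columns 13-16 in the PDB format) of ATOM and HETATM records, leaving remarks and everything else alone. The function name prime_to_P is made up for this example, and it does nothing about a possible clash with phosphate oxygen names.

```shell
# Replace ' with P only in the atom-name columns (13-16) of coordinate records
prime_to_P() {
    awk -v q="'" '
        /^(ATOM  |HETATM)/ {
            name = substr($0, 13, 4)            # atom-name field
            gsub(q, "P", name)                  # swap every prime in it
            $0 = substr($0, 1, 12) name substr($0, 17)
        }
        { print }
    '
}

# Usage: prime_to_P < foo.pdb > sane.pdb
```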