-
Notifications
You must be signed in to change notification settings - Fork 25
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* added noWeights subclass to Weights to provide correct behavior for uniform weighting of frames. * fixed operator[] decs for noWeights. * fixed issue with transition from noWeights to uniform weights vector * I was working on these on the master. Whoops. This is to replace mods to the weights class with a derived class that does the things we need. * Added trajectory accepting weights c oding, for weights applications that need current_frame defined in the no weights use-case. * Brought Weights code up from the past since we are now pursuing the full subclass strategy. * Fixed some issues with delimiters and definitions. Introduced _frameWeight ILO the pervasive 1.0s. * UniformWeight builds. Now to add it to OptionsFramework. * Trying to achieve polymorphism the wrong way. Need to do the virtual functions thing, then use a pointer in weights_options. Perhaps the function-pointer approach would have been better? * vtable problem was coming from missing lines in sconscript. Code now builds. Next, see if we can add consts back in. * Changed several functions back to returing const, as they had before I made them virtual inside weights. * Weights vector swap added. Now we have a setter, a getter, and a replacer. * added error msg reporting differences in lengths for replacement weights vector. * drafting bootstrap code. * bootsy collins ref too obscure. Wrote header-only template class for making vector of observations. * Added ignore for pycharm configs. * edited rdf code to take advantage of weights new default behavior. * updated rdf to compute weights using weights default. * reordered things in uniform weight class to get rid of compiler warnings. * changed weights opts back to holding a reference to weights, rather than pointer, for consistency with master. * Use unique pointer so weights becomes an interface within weightsOptions. * added arrows to rdf and xyrdf. They now build. * May have erroneously switched an error msg line but nothing else. * needed to include correct template type in upcast in postConditions. * Adding zeroth pass changes. Switching to loos.hpp functions next. * Reorienting things as classes. * Tool remains untested, and classes need to be put in hierarchy. * regoofing classes for computing OCF in various cases. * Added uvdot to Coord.hpp, unit vector dot product. Working on class hierarchy for two OCF classes. * Abandoning class hierarchy idea. * Repetitive boilerplate version. Uncertain if it compiles. * Fixed some small typos, and booboos. updated AG to use 'uvdot' * Added testing config. Issue with weights default. Need to go finish that branch. * Adding files to record weights from file, so weights can be base. * copied and name-changed Weights over to WeightsFromFile. * Updated UniformWeight to no longer use add_traj, since that's no longer a base-class function. * Updated Weights to have only protected and public members. Updated UniformWeight and WeightsFromFile to inherit from protected content of Weights. * Updating to test building modified Weights and Weights-options * Trying to figure out why pointer to base is not working. * This now builds. Changed name of weights to pWeights in opts framework. Made add_traj virtual method of base. Amended all weights classes to respect this. * Trying fix object slicing via make unique as input to unique_ptr constructor. * Was experiencing object slicing due to make_unique's template parameter. * I appear to have been using the variadic argument of make_unique incorrectly. Modded code looks much better. * Added a comment reflecting my new lived experience. * Reordered contents of weights to get rid of -wreorder * Got rid of unneeded virtual calls * Removed extraneous files in prep for push * Apparently the public methods of Weights all need to be virtual. * fixed failure to accumulate OCFs. Fixed interaction with weights. * Moved ocf.cpp to Tools, changed sconscripts accordingly. * Fixed eroneous weights issue * trying to improve the code and also to accurately perform the correlation length sum. * added compute function that includes repetitive elements of if-elsehiemers. * Finished making compute function to reduce repetitive code. * Rewrote IO, and mostly ready to test now. Adding one flag option to do every possible offset... * Builds now. Output is modified. * Got rid of integer sign warnings. * Have something that appears to build, compute everything I want. Now onto the fullhelp. * Added fullhelp msg. * Undid small accidental changes and inadvertent includes. * Used function pointers to move t-loop outside of if-clauses. Tested on polypeptide system found no obvious issues. * Added the requested comment to lines 255-256 Co-authored-by: Louis Smith <[email protected]>
- Loading branch information
Showing
6 changed files
with
362 additions
and
1 deletion.
There are no files selected for viewing
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,324 @@ | ||
#include <iostream> | ||
#include <loos.hpp> | ||
|
||
using namespace std; | ||
using namespace loos; | ||
|
||
namespace opts = loos::OptionsFramework; | ||
namespace po = loos::OptionsFramework::po; | ||
|
||
const string fullHelpMessage = | ||
// clang-format off | ||
"SYNOPSIS \n" | ||
" \n" | ||
"This tool is designed to compute the Orientational Correlation Function, in the\n" | ||
" style of its use in polymer chemistry contexts such as in Ullner, M. & \n" | ||
"Woodward, C. E. Orientational Correlation Function and Persistence Lengths of \n" | ||
"Flexible Polyelectrolytes. Macromolecules 35, 1437–1445 (2002) and more \n" | ||
"specifically as it was used in Plumridge, A., Andresen, K. & Pollack, L. \n" | ||
"Visualizing Disordered Single-Stranded RNA: Connecting Sequence, Structure, and\n" | ||
" Electrostatics. J. Am. Chem. Soc. 142, 109–119 (2020). The user may specify \n" | ||
"abstracted 'bond vectors' between links in the polymer chain using single atom \n" | ||
"selectors or group selectors. \n" | ||
" \n" | ||
"DESCRIPTION \n" | ||
" \n" | ||
"This tool uses the definition of the orientational correlation function from \n" | ||
"Ullner & Woodward to estimate how correlated links in a polymer chain are. This\n" | ||
" is done by looking at the normalized projection of the i bond-vector in the \n" | ||
"chain onto the i+o bond-vector in the chain, for all offsets o between 1 and a \n" | ||
"max offset specified by the user (default is -1, or all possible). Each bond \n" | ||
"vector is defined as being a link between a point on a certain residue and a \n" | ||
"point on a neighboring residue. These could literally be a bond vector, if the \n" | ||
"points are atoms bonded to one another, or it could be a 'coarse grained' \n" | ||
"linkage between two monomers in the chain. For example, in the CA default for \n" | ||
"proteins, each residue is being treated as a link, with the link position at \n" | ||
"the alpha carbon; in such a coarsening of the polypeptide chain the vector \n" | ||
"between CAs becomes the chain bond vector. \n" | ||
" \n" | ||
"Thus, in the CA example, it would be the projection of the vector between CA_i \n" | ||
"and CA_i+1, v_i, onto the vector between CA_{i+o} and CA_{i+o+1}. These \n" | ||
"projections are averaged across all possible i for each o requested, then \n" | ||
"reported as a list of correlations as a function of offset. It is also possible\n" | ||
" for anticorrelations to be exhibited by this function--for example, a pretty \n" | ||
"solid antiparallel beta sheet would likely produce vectors that are pointed in \n" | ||
"opposite directions but are nearly coplanar. \n" | ||
" \n" | ||
"The notion here is that stiffer chains have a persistence of orientation, which\n" | ||
" is quantified by the projection of these vectors. Thus, a 'length' is also \n" | ||
"defined; it is the average length of the bond-vectors, multiplied by the \n" | ||
"average correlation between bond vectors summed over all pairs. \n" | ||
" \n" | ||
"The tool writes the results of the requested analysis to stdout as JSON. The \n" | ||
"JSON has the following tags: \"mean ocfs\", \"variance of means\", \"mean \n" | ||
"variances\", \"mean projections summed\", and \"mean bondlength\". The first three \n" | ||
"are all arrays with lengths equal to the number of offsets analyzed. The \"mean \n" | ||
"ocfs\" are the normalized projection vectors averaged over all pairs of bond \n" | ||
"vectors with a given offset, and then across each frame analyzed. The \"variance\n" | ||
" of means\" is the variance in each such mean across the whole trajectory. The \n" | ||
"\"mean variances\" are the variances at each offset, averaged across all analyzed\n" | ||
" frames. Finally, the \"mean projections summed\" and \"mean bondlength\" when \n" | ||
"multiplied together should correspond to l_OCF as given in Plumridge et al. \n" | ||
" \n" | ||
"The idea behind reporting the mean variances and the variances of the mean is \n" | ||
"that the one reports on the variability of each chain projection across the \n" | ||
"trajectory, whereas the other reports on how variable the projections that are \n" | ||
"associated with each offset are within a given frame, on average. If the \n" | ||
"variability within an offset is high on average, but the variability of that \n" | ||
"offset is low across the trajectory, it implies a strong conformational \n" | ||
"preference, or static disorder/glassy behavior. This may mean the chain is not \n" | ||
"really sampling different conformations, even if it is exhibiting correlation \n" | ||
"die-off as a function of offset. \n" | ||
" \n" | ||
"EXAMPLES \n" | ||
" \n" | ||
"ocf model traj > ocf_traj.json \n" | ||
" \n" | ||
"This will look for either alpha carbons or phosphorus atoms within the entire \n" | ||
"model, then use their ordering in the model to draw vectors between each such \n" | ||
"atom and the next one in the chain. It will compute the ocf on each frame in \n" | ||
"the trajectory. It will do this for all possible offsets. \n" | ||
" \n" | ||
"ocf --bond-atom-selection 'name ~= \"C*\'\"' --center-of-mass --residue-centroids\n" | ||
" \ \n" | ||
"model traj > ocf_sugar_carbons_com.json \n" | ||
" \n" | ||
"This will use the centers of mass of atoms matching the regex 'C*\'' (any \n" | ||
"carbon with a single quote at the end, which hopefully amounts to sugar \n" | ||
"carbons) as the points between which to draw bond-vectors for each link in the \n" | ||
"chain. It will then proceed to compute the OCF as normal for these. To use the \n" | ||
"centroids instead of the COM, elide the --center-of-mass flag. \n" | ||
" \n" | ||
"ocf --selection 'resid < 31' --bond-atom-selection 'name ~= \"C*\'\"' --center-\n" | ||
"of-mass --residue-centroids \ \n" | ||
"model traj > ocf_sugar_carbons_com.json \n" | ||
" \n" | ||
"Like the above, but only look for the bond atom selection within the first 30 \n" | ||
"residues in the model. \n" | ||
" \n" | ||
"POTENTIAL COMPLICATIONS \n" | ||
" \n" | ||
"Be careful with selection strings; results that are only subtly wrong could \n" | ||
"emerge from a string that grabs atoms or groups you're not expecting. While \n" | ||
"this is always a good thing to be careful about when analyzing trajectories, \n" | ||
"the peril here (because the selections are being split internally across either\n" | ||
" residues or contiguous sections within the bond atom selection) seems great \n" | ||
"indeed. \n" | ||
" \n" | ||
"The '--group-centroids' flag shouldn't be used unless you're after treating a \n" | ||
"collection of atoms that is trans-residue according to how your model defines \n" | ||
"residues. If you do need this functionality, make sure your model has \n" | ||
"connectivity, or find a way to add it. Using the '--infer-connectivity' flag to\n" | ||
" do this is applying a simple distance cutoff to decide where the chemical \n" | ||
"bonds are in your system from the first frame, which should be treated with \n" | ||
"caution.\n" | ||
; | ||
// clang-format on | ||
|
||
class ToolOptions : public opts::OptionsPackage { | ||
public: | ||
ToolOptions() {} | ||
// clang-format off | ||
void addGeneric(po::options_description& o) { | ||
o.add_options() | ||
("bond-atom-selection,B", po::value<string>(&bond_atom_selection)-> | ||
default_value("name == 'CA' || name == 'P'"), | ||
"Selection of atoms to compute the OCF across") | ||
("max-offset,M", po::value<int>(&max_offset)->default_value(-1), | ||
"Consider all |i - j| up to this value. -1 considers all possible offsets.") | ||
("group-centroids", po::bool_switch(&group_centroids)->default_value(false), | ||
"If thrown, split bond-atom-selection by molecule and compute BVs between centroids.") | ||
("residue-centroids", po::bool_switch(&residue_centroids)->default_value(false), | ||
"Split bond-atom-selection by residue, then track centroids for bond-vectors.") | ||
("center-of-mass,c", po::bool_switch(&com)->default_value(false), | ||
"Instead of using centroids, use centers of mass for groups/residues.") | ||
("infer-connectivity", po::value<float>(&bondlength)->default_value(-1), | ||
"Infer connectivity using provided distance for models lacking this. ALERT: " | ||
"uses hard distance cutoff on first frame of traj to infer connectivity. " | ||
"Only does this for values greater than zero.") | ||
; | ||
} | ||
// clang-format on | ||
string print() const { | ||
ostringstream oss; | ||
oss << boost::format("bond_atom_selection=%s,max_offset=%d,group_centroids=" | ||
"%b,bondlength=%d,residue_centroids%b,com=%b") % | ||
bond_atom_selection % max_offset % group_centroids % bondlength % | ||
residue_centroids % com; | ||
return (oss.str()); | ||
} | ||
|
||
bool postConditions(po::variables_map &map) { | ||
if (group_centroids && residue_centroids) { | ||
cerr << "ERROR: --group-centroids and --residue-centroids flags are " | ||
"mutually exclusive.\n"; | ||
return (false); | ||
} else if (com && !(group_centroids || residue_centroids)) { | ||
cerr << "ERROR: --center-of-mass must be used with --group-centroids or" | ||
"--residue-centroids.\n"; | ||
return (false); | ||
} else | ||
return (true); | ||
} | ||
string bond_atom_selection; | ||
int max_offset; | ||
bool group_centroids; | ||
bool residue_centroids; | ||
bool com; | ||
float bondlength; | ||
}; | ||
const string indent = " "; | ||
|
||
inline void ag_bond_vectors(AtomicGroup &chain, vector<GCoord> &bond_vectors) { | ||
for (uint i = 0; i < chain.size() - 1; i++) | ||
bond_vectors[i] = chain[i]->coords() - chain[i + 1]->coords(); | ||
} | ||
|
||
inline void centroid_bond_vectors(vector<AtomicGroup> &chain, | ||
vector<GCoord> &bond_vectors) { | ||
for (uint i = 0; i < chain.size() - 1; i++) | ||
bond_vectors[i] = chain[i].centroid() - chain[i + 1].centroid(); | ||
} | ||
|
||
inline void com_bond_vectors(vector<AtomicGroup> &chain, | ||
vector<GCoord> &bond_vectors) { | ||
for (uint i = 0; i < chain.size() - 1; i++) | ||
bond_vectors[i] = chain[i].centerOfMass() - chain[i + 1].centerOfMass(); | ||
} | ||
// this is the work to be done inside the traj loop, that is, per-frame. | ||
inline void compute_ocf_bondlength(uint max_offset, | ||
vector<GCoord> &bond_vectors, | ||
greal &accum_ocfs, vector<greal> &mean_ocfs, | ||
vector<greal> &var_ocfs, | ||
vector<greal> &accum_sq_mean, greal weight, | ||
greal &bl_accum) { | ||
for (uint offset_idx = 0; offset_idx < max_offset; offset_idx++) { | ||
uint offset = offset_idx + 1; | ||
greal accumulated_bvproj = 0; | ||
greal accumulated_square = 0; | ||
greal bvproj = 0; | ||
for (uint i = 0; i < bond_vectors.size() - offset; i++) { | ||
bvproj = bond_vectors[i].uvdot(bond_vectors[i + offset]); | ||
accumulated_bvproj += bvproj; | ||
accumulated_square += bvproj * bvproj; | ||
} | ||
accum_ocfs += accumulated_bvproj * weight; | ||
greal mean_ocf_atoffset = | ||
accumulated_bvproj / (bond_vectors.size() - offset) * weight; | ||
mean_ocfs[offset_idx] += mean_ocf_atoffset; | ||
var_ocfs[offset_idx] += | ||
accumulated_square * weight / (bond_vectors.size() - offset) - | ||
mean_ocf_atoffset * mean_ocf_atoffset; | ||
accum_sq_mean[offset_idx] += mean_ocf_atoffset * mean_ocf_atoffset; | ||
} | ||
for (auto bond : bond_vectors) | ||
bl_accum += bond.length() * weight; | ||
} | ||
|
||
int main(int argc, char *argv[]) { | ||
|
||
// parse the command line options | ||
string hdr = invocationHeader(argc, argv); | ||
opts::BasicOptions *bopts = new opts::BasicOptions(fullHelpMessage); | ||
opts::BasicSelection *sopts = new opts::BasicSelection("all"); | ||
opts::MultiTrajOptions *mtopts = new opts::MultiTrajOptions; | ||
opts::WeightsOptions *wopts = new opts::WeightsOptions; | ||
ToolOptions *topts = new ToolOptions; | ||
|
||
opts::AggregateOptions options; | ||
options.add(bopts).add(sopts).add(mtopts).add(wopts).add(topts); | ||
if (!options.parse(argc, argv)) | ||
exit(-1); | ||
|
||
cout << "# " << hdr << "\n"; | ||
// establish system, and subsystems | ||
AtomicGroup model = mtopts->model; | ||
if (model.hasBonds()) { | ||
} else if (topts->bondlength > 0) | ||
model.findBonds(topts->bondlength); | ||
else | ||
throw(LOOSError( | ||
"Model does not appear to have chemical connectivity, and " | ||
"infer-connectivity has not been set to a positive value.\n")); | ||
if (topts->max_offset == 0) | ||
throw( | ||
LOOSError("You asked for an offset of zero, which is not possible.\n")); | ||
AtomicGroup scope = selectAtoms(model, sopts->selection); | ||
pTraj traj = mtopts->trajectory; | ||
// move unique ptr to Weights into main function ownership for ease of use. | ||
auto weights = move(wopts->pWeights); | ||
// Attach trajectory to weights | ||
weights->addTraj(traj); | ||
// initialize max offset at top level, define either with user input | ||
// or as a function of chain, below. | ||
vector<AtomicGroup> chain; | ||
// Define pointer to the function that will update the bond sites -- will be | ||
// either com_bond_vectors or centroid_bond_vectors, depending on user input | ||
void (*bv_getter)(vector<AtomicGroup> &, vector<GCoord> &); | ||
// determine what points to use for the centers of each link in the chain | ||
// based on user input | ||
if (topts->com) { | ||
bv_getter = &com_bond_vectors; | ||
if (topts->group_centroids) { | ||
chain = scope.splitByMolecule(topts->bond_atom_selection); | ||
} else if (topts->residue_centroids) { | ||
chain = selectAtoms(scope, topts->bond_atom_selection).splitByResidue(); | ||
} | ||
} else { | ||
bv_getter = ¢roid_bond_vectors; | ||
if (topts->group_centroids) { | ||
chain = scope.splitByMolecule(topts->bond_atom_selection); | ||
} else if (topts->residue_centroids) { | ||
chain = selectAtoms(scope, topts->bond_atom_selection).splitByResidue(); | ||
} else { | ||
chain = selectAtoms(scope, topts->bond_atom_selection).splitByMolecule(); | ||
} | ||
} | ||
// Now figure out how many bond vectors and offsets we're tracking | ||
vector<GCoord> bond_vectors(chain.size() - 1, 0); | ||
vector<greal> mean_ocfs(chain.size() - 2, 0); | ||
vector<greal> var_ocfs(chain.size() - 2, 0); | ||
vector<greal> accum_sq_mean(chain.size() - 2, 0); | ||
greal accum_ocf = 0; | ||
greal bondlength = 0; | ||
uint max_offset; | ||
if (topts->max_offset > 0) | ||
max_offset = topts->max_offset; | ||
else if (topts->max_offset < 0) | ||
max_offset = bond_vectors.size() - 1; | ||
// loop over trajectory | ||
for (auto frame_index : mtopts->frameList()) { | ||
traj->readFrame(frame_index); | ||
traj->updateGroupCoords(scope); | ||
// get frame weights; defaults to zero | ||
const double weight = weights->get(); | ||
weights->accumulate(); | ||
bv_getter(chain, bond_vectors); | ||
compute_ocf_bondlength(max_offset, bond_vectors, accum_ocf, mean_ocfs, | ||
var_ocfs, accum_sq_mean, weight, bondlength); | ||
} | ||
bondlength /= bond_vectors.size(); | ||
// create the JSON report, written to stdout. | ||
cout << "{\n" + indent + "\"mean ocfs\": [\n"; | ||
for (auto i : mean_ocfs) | ||
cout << indent + indent << i / weights->totalWeight() << ",\n"; | ||
cout << indent + "],\n"; | ||
cout << indent + "\"variance of means\": [\n"; | ||
greal mean_ocf; | ||
for (uint i = 0; i < mean_ocfs.size(); i++) { | ||
mean_ocf = mean_ocfs[i] / weights->totalWeight(); | ||
cout << indent + indent | ||
<< accum_sq_mean[i] / weights->totalWeight() - mean_ocf * mean_ocf | ||
<< ",\n"; | ||
} | ||
cout << indent + "],\n"; | ||
cout << indent + "\"mean variances\": [\n"; | ||
for (auto i : var_ocfs) | ||
cout << indent + indent << i / weights->totalWeight() << ",\n"; | ||
cout << indent + "],\n"; | ||
cout << indent + "\"mean projections summed\": " | ||
<< accum_ocf / weights->totalWeight() << ",\n"; | ||
cout << indent + "\"mean bondlength\": " | ||
<< bondlength / weights->totalWeight() << "\n"; | ||
cout << "}\n"; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters