[WIP] adding function that attempts to do some sort of clustering #21
base: master
Conversation
Looks really good so far! Seems like the main things to finish are 1) add a call to cluster_snapshots from analyze_run, and 2) get a list of existing clones and gens for random selection (see my comment). EDIT: sorry, missed your checklist above!
clone = random.randint(0, 99)
gen = random.randint(0, 2)
if i == 0:
    # TODO safeguard against trying to load output that doesn't exist --- choose n_snapshots randomly from those that exist on disk
Is this to be called from analyze_run? (as we do for save_representative_snapshots here?) If so, we'll have access to the Works extracted from the globals.csv files.
Assuming that the existence of a globals.csv implies that a trajectory was also written, you should be able to get the existing trajectories by passing a works: List[Work] argument down to this function and using something like
clones = [work.path.clone for work in works]
gens = [work.path.gen for work in works]
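and then the random pick in the diff above could draw only from pairs that actually exist on disk, e.g. (just a sketch, reusing the lists above and assuming those Work path attributes):

import random

existing = list(zip(clones, gens))
clone, gen = random.choice(existing)  # only clone/gen pairs with output on disk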
Yes! It should (eventually) be in the place of save_snapshots, as it's just a slightly more comprehensive way to do the same thing. Do you think works is the better kwarg, or a list of clones and gens?
I think it would probably be better to take the clones/gens. If we want to do it at random, we can pass random values into the function, but we may also want to cluster ALL of GEN0, so maybe it makes more sense to have the clone/gen-choosing logic outside the function?
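For example (a sketch only, with placeholder CLONE/GEN ranges and an assumed clones/gens keyword interface for cluster_snapshots):

import random

n_snapshots = 10

# caller chooses at random...
clones = [random.randint(0, 99) for _ in range(n_snapshots)]
gens = [random.randint(0, 2) for _ in range(n_snapshots)]
cluster_snapshots(clones=clones, gens=gens)

# ...or deliberately clusters all of GEN0
cluster_snapshots(clones=list(range(100)), gens=[0] * 100)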
Passing just a list of clones/gens sounds reasonable to me (especially if we're not actually going to use the works here). The list_results function in lib.py gets a listing of clones and gens given a project path and run.
Actually, it might make sense to change things around a little bit to call list_path just once in analyze_run, and pass the result to both extract_works and to this function. (We can always clean this up later, too.)
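Roughly something like this, i.e. list the clone/gen results once and reuse them (a sketch only: the signatures of analyze_run, list_results, extract_works, and cluster_snapshots here, and the shape of the returned results, are assumptions based on this thread):

def analyze_run(project_path, run):
    # list the available clone/gen results once (list_results, or list_path)...
    results = list_results(project_path, run)
    # ...and reuse that same listing for both work extraction and clustering
    works = extract_works(results)
    cluster_snapshots(
        clones=[r.clone for r in results],
        gens=[r.gen for r in results],
    )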
Addressing comments from Matt's review!
This PR adds the function cluster_snapshots, which loads up N snapshots, does clustering based on the OLD LIGAND position, and then finds the index of the snapshot that is closest to the mean of the largest cluster.
This still needs work:
* the ligands.sdf file that is printed out by extract_snapshot: save it to disk with a different filename before we replace it
* this is the ligand RMSD to itself, between the random snapshots, and not the RMSD of the scaffold-core to the crystallographic positions (which would also be interesting)
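For reference, the cluster-and-select step described above might look roughly like the following. This is a sketch only: it illustrates the idea with k-means on flattened ligand coordinates via scikit-learn/NumPy, the function name is hypothetical, and the actual cluster_snapshots in this PR may cluster differently (e.g. on pairwise ligand RMSDs).

import numpy as np
from sklearn.cluster import KMeans

def representative_snapshot_index(ligand_coords: np.ndarray, n_clusters: int = 3) -> int:
    # ligand_coords: (n_snapshots, n_features) array, e.g. flattened old-ligand coordinates
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(ligand_coords)
    largest = np.argmax(np.bincount(labels))    # most populated cluster
    members = np.where(labels == largest)[0]    # snapshot indices in that cluster
    centroid = ligand_coords[members].mean(axis=0)
    # index (into all snapshots) of the member closest to the cluster mean
    distances = np.linalg.norm(ligand_coords[members] - centroid, axis=1)
    return int(members[np.argmin(distances)])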