[WIP] adding function that attempts to do some sort of clustering #21

hannahbrucemacdonald · 2020-08-21T14:35:48Z

This PR adds the function cluster_snapshots, which loads N snapshots, clusters them based on the old ligand position, and then finds the index of the snapshot closest to the mean of the largest cluster.
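For readers following along, here is a minimal sketch of the idea, not the code in this PR. The PR description does not say which clustering algorithm is used, so this assumes a simple single-linkage clustering with a distance cutoff (0.5, matching the parameter mentioned in the list of open items) and an unaligned RMSD between ligand coordinates. The names `rmsd`, `cluster_by_cutoff` and `representative_index` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One ligand conformation: a list of (x, y, z) atom positions (assumed layout).
Coords = List[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two equally sized coordinate sets, without alignment."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def cluster_by_cutoff(frames: Sequence[Coords], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering: frames closer than `cutoff` share a cluster."""
    parent = list(range(len(frames)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            if rmsd(frames[i], frames[j]) < cutoff:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(frames)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def representative_index(frames: Sequence[Coords], cutoff: float = 0.5) -> int:
    """Index of the frame closest to the mean structure of the largest cluster."""
    largest = max(cluster_by_cutoff(frames, cutoff), key=len)
    n_atoms = len(frames[0])
    mean = [
        tuple(sum(frames[i][atom][axis] for i in largest) / len(largest) for axis in range(3))
        for atom in range(n_atoms)
    ]
    return min(largest, key=lambda i: rmsd(frames[i], mean))


# Toy data: three similar two-atom poses and one outlier.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
]
print(representative_index(frames))  # prints 1, the middle pose of the large cluster
```

In a real pipeline the coordinates would come from the loaded trajectories, and a library clustering routine would likely replace the hand-rolled one.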

This still needs work:

- Pick frames at random based on what is on disk. The clone and gen ranges are currently hard-coded, which isn't safe if it tries to open a clone/gen that hasn't run, or if the number of clones/gens changes in future iterations.
- It only runs on the old ligand; in future we may want to do this for the new ligand too.
- Optimise the clustering parameters. I chose 0.5 as it looked OK for one example; something else might be better.
- Add the ligand RMSD* and protein RMSD to the oemol, or have it as an entry in the ligands.sdf file that is printed out.
- Integrate into the main analysis pipeline. For now, let's use this as well as extract_snapshot, saving to disk with a different filename before we replace it.

* this is the ligand RMSD to itself, between the random snapshots, and not the RMSD of the scaffold-core to the crystallographic positions (which would also be interesting)

mcwitt

Looks really good so far! ~~Seems like the main things to finish are 1) add a call to cluster_snapshots from analyze_run, and 2) get a list of existing clones and gens for random selection (see my comment)~~. EDIT: sorry, missed your checklist above!

mcwitt · 2020-08-21T16:13:15Z

covid_moonshot/analysis/structures.py

+        clone = random.randint(0,99)
+        gen = random.randint(0,2)
+        if i == 0:
+            # TODO safeguard against trying to load output that doesn't exist --- chose n_snapshots randomly from those that exists on disk


Is this to be called from analyze_run (as we do for save_representative_snapshots here)? If so, we'll have access to the Works extracted from the globals.csv files.

Assuming that the existence of a globals.csv implies that a trajectory was also written, you should be able to get the existing trajectories by passing a works: List[Work] argument down to this function and using something like

    clones = [work.path.clone for work in works]
    gens = [work.path.gen for work in works]

Yes! It should (eventually) take the place of save_snapshots, as it's just a slightly more comprehensive way to do the same thing. Do you think works is a better kwarg, or a list of clones and gens?

I think maybe it would be better to take the clones/gens. If we want to do it at random, we can pass random values into the function, but we may also want to cluster ALL of GEN0, so maybe it makes more sense to keep the clone/gen selection logic outside the function?

Passing just a list of clones/gens sounds reasonable to me (especially if we're not actually going to use the works here). The list_results function in lib.py gets a listing of clones and gens given a project path and run.

Actually, it might make sense to change things around a little bit to call list_path just once in analyze_run, and pass the result to both extract_works and to this function. (We can always clean this up later, too)
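As a rough illustration of the approach discussed in this thread (selection logic kept outside the clustering function, drawing only from clones/gens that exist on disk), here is a small sketch. The helper name `choose_snapshots` and its parameters are hypothetical, and `available` stands in for whatever listing the project code returns; the actual return type of the listing function in lib.py is not shown in this thread.

```python
import random
from typing import List, Optional, Tuple

CloneGen = Tuple[int, int]  # (clone, gen) pair; assumed representation


def choose_snapshots(
    available: List[CloneGen],
    n_snapshots: Optional[int] = None,
    gen: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[CloneGen]:
    """Pick (clone, gen) pairs only from those known to exist on disk.

    With `gen` set, restrict to that generation (e.g. all of GEN0).
    With `n_snapshots` set, draw that many pairs at random without replacement.
    """
    candidates = [cg for cg in available if gen is None or cg[1] == gen]
    if n_snapshots is None or n_snapshots >= len(candidates):
        return candidates
    return random.Random(seed).sample(candidates, n_snapshots)


# Hypothetical listing of what has actually been written to disk.
available = [(0, 0), (0, 1), (1, 0), (2, 0)]

all_gen0 = choose_snapshots(available, gen=0)          # every clone of GEN0
random_two = choose_snapshots(available, n_snapshots=2, seed=1)  # 2 random existing pairs
```

The resulting list could then be passed down to cluster_snapshots, which would no longer need hard-coded clone and gen ranges.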

adding function to attempt to cluster based on ligand position - WIP

04f1d2b

hannahbrucemacdonald requested a review from mcwitt August 21, 2020 14:35

mcwitt reviewed Aug 21, 2020

Base automatically changed from compile-output to master August 22, 2020 03:48

hannahbrucemacdonald added 2 commits August 22, 2020 19:30

Small fixes to code format

90764a3

Addressing comments from Matt's review!

moving imports into correct function

07337e0

mcwitt Aug 21, 2020

hannahbrucemacdonald Aug 22, 2020

hannahbrucemacdonald Aug 22, 2020

mcwitt Aug 24, 2020
