-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize conformer comparison algorithm #765
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #765 +/- ##
==========================================
+ Coverage 73.95% 73.98% +0.02%
==========================================
Files 101 101
Lines 27777 27804 +27
Branches 5817 5825 +8
==========================================
+ Hits 20542 20570 +28
+ Misses 5774 5773 -1
Partials 1461 1461
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I added my initial review.
arc/species/converter.py
Outdated
fl_distance1 = np.linalg.norm(np.array(xyz1['coords'][0]) - np.array(xyz1['coords'][-1])) | ||
if fl_distance2 is None: | ||
fl_distance2 = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1])) | ||
if abs(fl_distance1-fl_distance2)/fl_distance1>rtol: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a space before and after operators
arc/species/converter.py
Outdated
fl_distance2 = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1])) | ||
if abs(fl_distance1-fl_distance2)/fl_distance1>rtol: | ||
return fl_distance1, fl_distance2, dmat1, dmat2 | ||
if dmat1 is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point we know for certain that dmat1 is None
arc/species/conformers.py
Outdated
exists = True | ||
dmat1, fl_distance1 = None, None | ||
for conf in new_conformers + newest_conformer_list: | ||
conf['dmat'] = conf.get('dmat') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think otherwise we may get into a case where conf
dictionary doesn't have dmat
at the beginning.
arc/species/conformers.py
Outdated
fl_distance2=conf['fl_distance']) | ||
conf['fl_distance'] = fl_distance2 if conf['fl_distance'] is None else conf['fl_distance'] | ||
conf['dmat'] = dmat2 if conf['dmat'] is None else conf['dmat'] | ||
if dmat1 is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that if dmat1 is None, this implies that the fl_distance was too large. But this code is not self-explanatory, therefore hard to maintain and understand.
I recommend to pass the two conformer dictionaries to compare_confs_fl
, then have compare_confs_fl
return three things: whether the conformers are similar or not and the two updated conformer dictionaries. The we update our confs with the returned dictionaries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I agree. I modified compare_confs_fl
with your advice. However, it's a bit inconvenient to pass both conf dicts under conformers_combinations_by_lowest_conformer
, since we are looping xyz
, as shown here.
However, if you would like me to modify it further, please let me know.
8a7b5b0
to
89ce5f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Please see some comments
arc/species/converter.py
Outdated
fl_distance1 = np.linalg.norm(np.array(xyz1['coords'][0]) - np.array(xyz1['coords'][-1])) | ||
if conf2['fl_distance'] is None: | ||
conf2['fl_distance'] = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1])) | ||
if abs(fl_distance1-conf2['fl_distance']) / fl_distance1 > rtol: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use np.isclose
here
if abs(fl_distance1-conf2['fl_distance']) / fl_distance1 > rtol: | ||
return fl_distance1, dmat1, conf2, simliar | ||
simliar = True | ||
dmat1 = xyz_to_dmat(xyz1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we compute dmat1 in this function?
This function also doesn't use conf2['dmat']
, maybe we can simplify it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this function compare_confs_fl
, is as following:
- To compare two Cartesian coordinates representing conformers by examining the distances between the first and last atoms.
- Distance matrices are computed and returned only if these distances are identical.
Considering potential modifications to the second purpose of this process, here are three options:
- Rename the function to
compare_confs_pre
to indicate that it performs a preliminary, quicker comparison before the more resource-intensive compare_confs calculation. - Create a new function to do this calculation.
- Merge this part to
compare_confs
, though feasible, I do not recommend this due to the complexity it would add, especially in returning the variableconf2
, which could complicate the primary function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
arc/species/conformers.py
Outdated
if xyz is not None and energy is not None: | ||
conformer = {'index': len_conformers + len(new_conformers) + len(newest_conformer_list), | ||
'xyz': xyz, | ||
'FF energy': round(energy, 3), | ||
'source': f'Changing dihedrals on most stable conformer, iteration {i}', | ||
'torsion': tor, | ||
'dihedral': round(dihedral, 2)} | ||
'dihedral': round(dihedral, 2), | ||
'dmat': dmat1 if dmat1 is not None else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dmat1
either has a value or can be None
, and you want its value or else None
, then considering just keeping 'dmat': dmat1
instead of the entire line. Same for the line below
arc/species/conformers.py
Outdated
if index > 0 and not any([converter.compare_confs(lowest_conf['xyz'], conformer_list[index]['xyz']) | ||
for lowest_conf in lowest_confs]): | ||
lowest_confs.append(conformer_list[index]) | ||
if index>0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a space before and after operators
4f2177a
to
847440f
Compare
Compare two Cartesian coordinates representing conformers using first and last atom distances. If the `fl_distances` are the similar, the distance matrices are computed and returned.
If we don't need the conversion, meaning `dmat1` and `dmat2` are both given, we can start the subsequent steps directly.
847440f
to
b5ad9e0
Compare
Save `fl_distance` and `dmat` to each of the `conformer` for efficient processing.
b5ad9e0
to
693b2f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Thanks!
The previous conformer comparison algorithm required a distance matrix (d-matrix) for operations. In this update, we introduce a new variable,
fl_distance
, which represents the distance between the first and last elements. This method provides a less costly preliminary comparison before invoking the d-matrix comparison. Now, the d-matrix comparison will only be executed when necessary, optimizing the overall efficiency of the algorithm.