Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disambiguate assembly chains by appending assembly id #700

Merged
merged 8 commits into from
Dec 20, 2024

Conversation

alex-hh
Copy link
Contributor

@alex-hh alex-hh commented Nov 10, 2024

Biotite doesn't currently distinguish between the chains produced by symmetry operations on a given asym unit in an assembly.

This can make certain selection operations slightly inconvenient (e.g. getting residue starts of a specific chain etc.) and is inconsistent with the chain naming convention employed by the PDB in assembly .cif files.

This PR follows the PDB convention of appending a '-<oper_id>' to the chain ids.

@alex-hh alex-hh marked this pull request as draft November 10, 2024 13:17
@alex-hh alex-hh marked this pull request as ready for review November 10, 2024 14:06
@alex-hh alex-hh force-pushed the disambiguate-assembly-chains branch from af068cd to 3918b0f Compare November 10, 2024 14:06
@padix-key
Copy link
Member

Hi. First I thought that this is the same as #389, but while separating chains is also possible without changing the AtomArray annotations, this is not true if one wants to have all chains for a specific symmetry operation. So I see definitely the use case for this feature.

However, I am leaning towards keeping the original chain_id annotation here, as it represents the asym_id from the atom_site category. I think in the assembly CIF files they go this route, because they have no other way of representing this while keeping the file compatible with the PDBx dictionary.

Instead I propose to add an additional optional annotation array (what do you think about sym_id?) that tracks this information instead. Similar to the appended number in your draft it would go from 0 to N. If a user still wants to have this information in the chain_id, they could concatenate chain_id with sym_id via numpy.strings.add().

As a pointer, have a look at set_annotation() in the AtomArray class, for how to add a custom annotation.

@Croydon-Brixton
Copy link
Contributor

I'd also be in favour of a separate sym_id annotation.

The one caveat I'd have is that some symmetry operations are actually 'slicing' type operations rather than copying operations, so if we wanted to add a sym_id to the asym unit we would need to decide whether we assign sym_id's based on the operation id or whether we want to assign them based on actual chemical similarity. (See 5ocm for an example). Personally I'd lean towards just using the oper_id as you suggested, @alex-hh

@alex-hh
Copy link
Contributor Author

alex-hh commented Nov 11, 2024

Hi - Thanks for the pointer to the previous issue, this also helps with some of the use cases I was considering.

Making sym_id an optional annotation makes sense to me.

I've updated the PR with one possible solution for the sym_id (concatenation of the oper ids applied to generate the sym unit), but agree it's not obvious what the value should be. I was trying to follow the convention from the assembly.cif files, but I'm not sure what they do in complex cases there.

@padix-key
Copy link
Member

Having sym_id concatenate the different operation IDs would also be OK for me. I suspect there are no strctures where a high number of these operations are combined

Copy link
Member

@padix-key padix-key left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I would recommend a few small changes, but overall it looks good!

src/biotite/structure/io/pdbx/convert.py Outdated Show resolved Hide resolved
src/biotite/structure/io/pdbx/convert.py Outdated Show resolved Hide resolved
src/biotite/structure/io/pdbx/convert.py Outdated Show resolved Hide resolved
@alex-hh alex-hh requested a review from padix-key November 18, 2024 21:29
@alex-hh
Copy link
Contributor Author

alex-hh commented Nov 18, 2024

updated!

Copy link

codspeed-hq bot commented Nov 22, 2024

CodSpeed Performance Report

Merging #700 will not alter performance

Comparing alex-hh:disambiguate-assembly-chains (92f6834) with main (0acb143)

Summary

✅ 47 untouched benchmarks

@padix-key
Copy link
Member

I took the freedom to add some tests and documentation about the new annotation. Have a look of you like. From my side this would be ready to merge, when the discussion above is resolved.

@padix-key padix-key force-pushed the disambiguate-assembly-chains branch from 93bb63f to 92f6834 Compare December 20, 2024 09:55
@padix-key
Copy link
Member

I'll merge the PR now. Thanks again @alex-hh for implementing it and to @Croydon-Brixton and @nscorley for the feedback!

@padix-key padix-key merged commit 97ea5be into biotite-dev:main Dec 20, 2024
28 checks passed
@Croydon-Brixton
Copy link
Contributor

Always a pleasure! Thank you for all the great work on biotite!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants