poly.type not recognized by the wwPDB servers #19

Open
daniel-s-d-larsson opened this issue Sep 27, 2024 · 10 comments

@daniel-s-d-larsson

The wwPDB deposition server and the wwPDB validation server both misidentify my polymer chains as polyribonucleotide instead of polypeptide(L) when I upload the refined.mmcif file from refine_spa_norefmac. The poly.type is set correctly in the header, but the wwPDB staff say it is wrong on their side.

Example:

loop_
_entity_poly.entity_id
_entity_poly.type
_entity_poly.pdbx_strand_id
_entity_poly.pdbx_seq_one_letter_code
A polyribonucleotide A ?
B polypeptide(L)     B ?
C polypeptide(L)     C ?
D polypeptide(L)     D ?
E polypeptide(L)     E ?
...

This is a ribosome structure with both ribonucleic acids and many proteins. Strangely, only protein chains up to a specific point are identified as polyribonucleotide, which indicates to me that there is some corruption in the file. Could the lack of TER records cause this problem? In an earlier deposition, which used an older version of refine_spa, the polymer type records were not there.

@wojdyr
Contributor

wojdyr commented Sep 27, 2024

If you mean the lack of TER in mmCIF, it's fine, only PDB files have TER records.

The software used on the PDB servers discards and regenerates some information; it may easily happen that correct annotation is replaced with incorrect one. To investigate it, we'd need an example file that demonstrates the problem.

@daniel-s-d-larsson
Author

How can I send you the file if I don't want to post it here? Maybe I could remove the coordinate columns...

@wojdyr
Contributor

wojdyr commented Sep 29, 2024

You could send it by email ([email protected]).
Editing the file requires some work on your side, but would also be fine – perhaps it'd suffice to include only a few residues in each chain.

@keitaroyam
Owner

Please also send it to me, or let Marcin share it with me. Did the PDB staff explain what was wrong?

@daniel-s-d-larsson
Author

Ok, I will send you the problematic file. I will also ask the wwPDB staff to describe the problem in detail.

@daniel-s-d-larsson
Author

I have now tried uploading several modified versions of the mmCIF file to the deposition and validation servers, including running the files through the PDB extract server, and I cannot figure out exactly what is causing the problem. For the time being, I cannot waste more time on this issue, but everything points to the PDB servers either reading the poly.type records incorrectly or mapping them to the chains incorrectly. My workaround is to delete the poly.type section entirely before uploading to a fresh deposition session.

@wojdyr
Contributor

wojdyr commented Sep 30, 2024

Wouldn't it be easier to just send us the file?
(Never mind, Keitaro reproduced it using 7k00)

@wojdyr
Contributor

wojdyr commented Sep 30, 2024

It seems to be a bug in maxit. I compiled v11.200 and reduced 7k00 to a ~500-line file, input.mmcif.gz, to reproduce it.

The input file has:

loop_
_entity.id
_entity.type
A     polymer
B     polymer

loop_
_entity_poly.entity_id
_entity_poly.type
A polyribonucleotide
B polypeptide(L)

Running:

maxit -input input.mmcif -output output.cif -o 8

produces output.cif with:

loop_
_entity_poly.entity_id 
_entity_poly.type 
_entity_poly.nstd_linkage 
_entity_poly.nstd_monomer 
_entity_poly.pdbx_seq_one_letter_code 
_entity_poly.pdbx_seq_one_letter_code_can 
_entity_poly.pdbx_strand_id 
_entity_poly.pdbx_target_identifier 
1 polyribonucleotide no no AAUUGAAGA   AAUUGAAGA   A ? 
2 polyribonucleotide no no VSMRDMLKAGV VSMRDMLKAGV B ? 
# 
loop_
_entity_poly_seq.entity_id 
_entity_poly_seq.num 
_entity_poly_seq.mon_id 
_entity_poly_seq.hetero 
1 1  A   n 
1 2  A   n 
1 3  U   n 
1 4  U   n 
1 5  G   n 
1 6  A   n 
1 7  A   n 
1 8  G   n 
1 9  A   n 
2 1  VAL n 
2 2  SER n 
2 3  MET n 
2 4  ARG n 
2 5  ASP n 
2 6  MET n 
2 7  LEU n 
2 8  LYS n 
2 9  ALA n 
2 10 GLY n 
2 11 VAL n 
# 
loop_
_entity.id 
_entity.type 
_entity.src_method 
_entity.pdbx_description 
_entity.formula_weight 
_entity.pdbx_number_of_molecules 
_entity.pdbx_ec 
_entity.pdbx_mutation 
_entity.pdbx_fragment 
_entity.details 
1 polymer man 
;RNA (5'-R(P*AP*AP*UP*UP*GP*AP*AP*GP*A)-3')
;
2903.815 1 ? ? ? ? 
2 polymer man VAL-SER-MET-ARG-ASP-MET-LEU-LYS-ALA-GLY-VAL  1208.495 1 ? ? ? ? 
# 

All looks fine apart from the type of entity 2, which should be polypeptide(L):

1 polyribonucleotide no no AAUUGAAGA   AAUUGAAGA   A ? 
2 polyribonucleotide no no VSMRDMLKAGV VSMRDMLKAGV B ?

If I change the order of lines in the input to:

loop_
_entity_poly.entity_id
_entity_poly.type
B polypeptide(L)
A polyribonucleotide

then in the output I get:

1 "polypeptide(L)" no no AAUUGAAGA   AAUUGAAGA   A ? 
2 "polypeptide(L)" no no VSMRDMLKAGV VSMRDMLKAGV B ?

If _entity_poly.type is absent in the input file, it's correct in the output.
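
The mismatch above can be spotted automatically by comparing each declared `_entity_poly.type` with the monomers listed in `_entity_poly_seq`. The following is only a rough sketch, not part of maxit or of any wwPDB tool: the function names are hypothetical, the values are copied by hand from the output above rather than parsed from the file, and only the 20 standard amino acids and the four standard RNA bases are recognized (no DNA, D-peptides, or modified residues).

```python
# Hypothetical sanity check: does the declared polymer type agree with the monomers?
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
RNA_BASES = {"A", "C", "G", "U"}


def guess_polymer_type(mon_ids):
    """Guess the polymer type from monomer names by simple majority vote."""
    n_aa = sum(m in AMINO_ACIDS for m in mon_ids)
    n_rna = sum(m in RNA_BASES for m in mon_ids)
    if n_aa > n_rna:
        return "polypeptide(L)"
    if n_rna > n_aa:
        return "polyribonucleotide"
    return None  # undecided: empty sequence or unrecognized monomers


def find_mismatches(declared, sequences):
    """Return (entity_id, declared_type, guessed_type) for each disagreement."""
    mismatches = []
    for entity_id, declared_type in declared.items():
        guess = guess_polymer_type(sequences.get(entity_id, []))
        if guess is not None and guess != declared_type:
            mismatches.append((entity_id, declared_type, guess))
    return mismatches


# Values taken from the output.cif excerpt above.
declared = {"1": "polyribonucleotide", "2": "polyribonucleotide"}
sequences = {
    "1": ["A", "A", "U", "U", "G", "A", "A", "G", "A"],
    "2": ["VAL", "SER", "MET", "ARG", "ASP", "MET",
          "LEU", "LYS", "ALA", "GLY", "VAL"],
}
print(find_mismatches(declared, sequences))
# [('2', 'polyribonucleotide', 'polypeptide(L)')]
```

For a real file, the two dictionaries would have to be filled by a proper mmCIF parser; the check itself stays the same.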

@daniel-s-d-larsson
Author

Good that you found the culprit. For the time being, I will just delete the poly.type section before I upload files to wwPDB.
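
For anyone who wants to script this workaround, here is a minimal sketch in plain Python. It is an illustration under strong assumptions, not a general mmCIF editor: it assumes the category is written as a `loop_` (as in the examples in this thread), that its rows are single-line and contain no multi-line `;` text fields, and that the loop ends at a blank line, `#`, the next `loop_`, a tag, or a `data_` line. The function names are made up for this example. A real CIF parser would be the safer choice for anything beyond a quick fix.

```python
def _ends_loop(line: str) -> bool:
    """True if this line can no longer be a data row of the current loop."""
    s = line.strip()
    return s in ("", "#", "loop_") or s.startswith("_") or s.startswith("data_")


def strip_entity_poly(cif_text: str) -> str:
    """Return cif_text without the _entity_poly loop.

    The _entity_poly_seq loop is kept, because its tags do not start
    with "_entity_poly." (note the trailing dot).
    """
    lines = cif_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == "loop_":
            # Collect the tag lines that directly follow "loop_".
            j = i + 1
            while j < len(lines) and lines[j].lstrip().startswith("_"):
                j += 1
            tags = [l.strip() for l in lines[i + 1:j]]
            if tags and all(t.startswith("_entity_poly.") for t in tags):
                # Skip the data rows of this loop as well.
                while j < len(lines) and not _ends_loop(lines[j]):
                    j += 1
                i = j
                continue
        out.append(lines[i])
        i += 1
    return "\n".join(out) + "\n"
```

Usage would be along the lines of reading refined.mmcif into a string, passing it through `strip_entity_poly`, and writing the result to a new file for upload. As noted earlier in the thread, maxit assigns the types correctly when `_entity_poly.type` is absent from the input.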

@keitaroyam
Owner

I just noticed that maxit v11.300 has been released (https://sw-tools.rcsb.org/apps/MAXIT/source.html), and it handles this case properly. I also tested https://validate-rcsb-1.wwpdb.org/, which still seems to use an older maxit version.
