uniq usage causes pLDDT row misalignment #226

Open
jscgh opened this issue Nov 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments


jscgh commented Nov 29, 2024

Description of the bug

I might be missing something, but in the following lines:

    awk '{print \$6"\\t"\$11}' ranked_0.pdb | uniq > ranked_0_plddt.tsv
    for i in 1 2 3 4
        do awk '{print \$6"\\t"\$11}' ranked_\$i.pdb | uniq | awk '{print \$2}' > ranked_"\$i"_plddt.tsv
    done

uniq is applied only to the pLDDT scores, and it removes consecutive duplicate lines. The outputs can therefore differ in length when the ranked structures have different runs of consecutive identical pLDDT scores, which leads to row misalignment and incorrect downstream visualisation (as shown below):

Image
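The failure mode can be reproduced in a small Python sketch. The per-atom values and the helper names `uniq` and `per_residue` are hypothetical and are not taken from the pipeline; the sketch assumes that duplicates are collapsed on the score column alone, and then shows one possible alternative that keys each score on its residue number instead.

```python
from itertools import groupby


def uniq(items):
    """Collapse runs of consecutive equal items, like the shell `uniq`."""
    return [key for key, _ in groupby(items)]


def per_residue(atoms):
    """Map residue number -> pLDDT, one entry per residue whatever the score."""
    scores = {}
    for resnum, plddt in atoms:
        scores.setdefault(resnum, plddt)
    return scores


# Hypothetical per-atom (residue number, pLDDT) pairs, two atoms per residue.
ranked_0 = [(1, 90.1), (1, 90.1), (2, 90.1), (2, 90.1), (3, 85.0), (3, 85.0)]
ranked_1 = [(1, 88.0), (1, 88.0), (2, 87.0), (2, 87.0), (3, 86.0), (3, 86.0)]

# Deduplicating on the score alone merges residues 1 and 2 of ranked_0,
# so the two columns end up with different numbers of rows.
naive_0 = uniq([plddt for _, plddt in ranked_0])
naive_1 = uniq([plddt for _, plddt in ranked_1])

# Keying on the residue number keeps one row per residue in every model.
models = [per_residue(ranked_0), per_residue(ranked_1)]
residues = sorted(set().union(*models))
table = [[res] + [m.get(res) for m in models] for res in residues]

print(len(naive_0), len(naive_1))
for row in table:
    print(*row, sep="\t")
```

With this input the naive columns have two and three rows respectively, while the keyed table has one row for each of the three residues.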

Command used and terminal output

Relevant files

No response

System information

No response

jscgh added the bug label Nov 29, 2024
@keiran-rowell-unsw

I've started on a MultiQC implementation that uses Biopython to parse the B-factors instead.

from Bio import PDB
parser = PDB.PDBParser(QUIET=True)

It also supports the .cif output of AlphaFold3:

elif samplename.endswith(".cif"):
   parser = PDB.MMCIFParser(QUIET=True)

I'll upload it once it is more feature-complete, but I can provide code snippets in the meantime if a more robust way to parse pLDDT from structures is desired.
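As a rough illustration of the Biopython approach described in this comment, the sketch below picks a parser from the file extension and returns one pLDDT row per residue. It is not the MultiQC implementation mentioned above: the function names `parser_name` and `residue_plddt` are hypothetical, only the first model is read, and averaging the B-factors over the atoms of a residue is an assumption made so that files with per-atom values still give exactly one row per residue.

```python
from pathlib import Path


def parser_name(filename):
    """Pick the Bio.PDB parser class name from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdb":
        return "PDBParser"
    if suffix == ".cif":
        return "MMCIFParser"
    raise ValueError(f"unsupported structure file: {filename}")


def residue_plddt(filename):
    """Return [(chain id, residue number, mean pLDDT)], one row per residue."""
    # Imported lazily so that parser_name can be used without Biopython.
    from Bio import PDB

    parser = getattr(PDB, parser_name(filename))(QUIET=True)
    structure = parser.get_structure("model", filename)
    rows = []
    for residue in structure[0].get_residues():  # first model only
        # pLDDT is stored in the B-factor field; average it over the atoms.
        mean_b = sum(atom.get_bfactor() for atom in residue) / len(residue)
        rows.append((residue.get_parent().id, residue.id[1], mean_b))
    return rows
```

Because every row carries its chain ID and residue number, the per-model outputs can be joined on those keys rather than on line position, which avoids the row misalignment described in the issue.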
