`uniq` usage causes pLDDT row misalignment #226

jscgh · 2024-11-29T04:38:13Z

Description of the bug

I might be missing something, but in the following lines:

    awk '{print \$6"\\t"\$11}' ranked_0.pdb | uniq > ranked_0_plddt.tsv
    for i in 1 2 3 4
        do awk '{print \$6"\\t"\$11}' ranked_\$i.pdb | uniq | awk '{print \$2}' > ranked_"\$i"_plddt.tsv
    done

`uniq` is applied only to the pLDDT scores, so it removes consecutive duplicate lines. When the ranked structures contain different numbers of consecutive identical pLDDT scores, the output files end up with different numbers of rows, which leads to row misalignment and incorrect downstream visualisation.
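To illustrate the effect described above, here is a minimal sketch that mimics `uniq` in Python. The pLDDT values and the names `model_a` and `model_b` are made up for illustration and are not taken from any real `ranked_*.pdb` file.

```python
from itertools import groupby


def uniq(lines):
    """Mimic Unix `uniq`: collapse each run of identical consecutive lines into one."""
    return [key for key, _ in groupby(lines)]


# Hypothetical per-residue pLDDT values for two ranked models of the same
# 5-residue protein (illustrative numbers only).
model_a = ["90.1", "90.1", "85.3", "85.3", "70.2"]
model_b = ["88.0", "87.5", "86.9", "86.9", "71.4"]

# Both models have 5 residues, but the collapsed outputs differ in length,
# so the rows no longer line up when the columns are placed side by side.
print(len(uniq(model_a)))  # 3
print(len(uniq(model_b)))  # 4
```

Because the number of rows removed depends on the values themselves, each model can lose a different number of rows, and row N in one file no longer refers to the same residue as row N in another.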

Command used and terminal output

Relevant files

No response

System information

No response


keiran-rowell-unsw · 2024-12-17T23:01:24Z

I've started on a MultiQC implementation that uses Biopython to parse the B-factors instead.

    from Bio import PDB
    parser = PDB.PDBParser(QUIET=True)

It also supports the .cif output of AlphaFold3:

    elif samplename.endswith(".cif"):
        parser = PDB.MMCIFParser(QUIET=True)

I'll upload it once it is more feature-complete, but I can provide code snippets in the meantime if a more robust way to parse pLDDT from structures is desired.
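For readers who want a starting point before that code is shared, here is a rough sketch of per-residue pLDDT extraction with Biopython. This is not the MultiQC implementation mentioned above. The function name `per_residue_plddt` is hypothetical, and averaging the B-factors of each residue's atoms is an assumption made here so that one row is produced per residue whether pLDDT is stored per residue or per atom.

```python
from pathlib import Path

from Bio import PDB


def per_residue_plddt(path):
    """Return one (chain_id, residue_number, mean_b_factor) row per residue.

    Hypothetical helper: pLDDT is read from the B-factor field, and the
    mean over a residue's atoms is used (an assumption, see lead-in).
    """
    path = Path(path)
    # Choose the parser from the file extension.
    if path.suffix == ".cif":
        parser = PDB.MMCIFParser(QUIET=True)
    else:
        parser = PDB.PDBParser(QUIET=True)

    structure = parser.get_structure(path.stem, str(path))
    model = next(iter(structure))  # use the first model only

    rows = []
    for chain in model:
        for residue in chain:
            bfactors = [atom.get_bfactor() for atom in residue]
            # residue.id is (hetero flag, sequence number, insertion code)
            rows.append((chain.id, residue.id[1], sum(bfactors) / len(bfactors)))
    return rows
```

Since rows are keyed on chain and residue number rather than on the score values, every ranked model of the same sequence yields the same number of rows, which sidesteps the `uniq` problem entirely.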

jscgh added the `bug` label on Nov 29, 2024
