Question on custom paired msa #99

Open
jvenderley-lilly opened this issue Dec 10, 2024 · 1 comment
Comments

jvenderley-lilly commented Dec 10, 2024

Thanks for all the great work on this! Would it be possible to add an example of using a precomputed, custom paired MSA? I know that v0.3.0 added a custom MSA pairing format based on CSV, but can you confirm whether the model accepts a paired .a3m file like the one produced by MMseqs2 via ColabFold's colabfold_search.sh? I can't tell whether some protein-protein folding cases are failing because the MSA is not being passed in correctly or because of the model itself. I suspect the problem is on the input side, since ColabFold produces the correct outputs. Thanks!

yktsnd commented Dec 18, 2024

Thank you for raising this question! I encountered a similar issue and found a solution that might help.

The .a3m files generated by colabfold_search can be parsed using the following script:
https://github.com/cddlab/alphafold3_tools/blob/main/alphafold3tools/msatojson.py

Please note that this script was designed for a naive AlphaFold3 implementation by Yoshitaka Moriwaki, so you may need to make minor adjustments. Based on the compute_msa function in the boltz implementation by the boltz-1 developers, I was able to successfully process the MSA by following these steps:

  1. Store paired MSA: Save the rows of the paired MSA with sequential integer keys, so that each paired row carries the same integer in every chain's file.
  2. Handle single-sequence MSAs: Use -1 as the key for MSA rows derived from single sequences, i.e. rows that are not paired with any other chain.
  3. Create chain-specific CSV files: For each chain, create a CSV file and specify its path in the header of the .fasta file, in the format >A|protein|your_path
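
As a rough sketch of the three steps above: the snippet below assumes the paired and unpaired sequences for each chain have already been extracted from the .a3m files (for example with the msatojson.py script linked earlier). The CSV column names key and sequence, the convention that an all-gap row means "this chain has no sequence in that paired row", and the helper names build_rows, write_chain_csv and fasta_record are all assumptions made for illustration, so please check them against the boltz documentation and compute_msa before relying on this.

```python
import csv


def build_rows(paired, unpaired):
    """Return (key, sequence) rows for one chain.

    paired:   this chain's rows of the paired MSA, in pairing order. The row
              index is used as the key, so row i gets key i in every chain.
    unpaired: this chain's unpaired (single-chain) MSA rows, which get key -1.
    """
    rows = []
    for idx, seq in enumerate(paired):
        # Assumption: an empty or all-gap row means the chain is absent from
        # this paired row. Skip it, but keep idx so keys stay aligned.
        if set(seq) <= {"-"}:
            continue
        rows.append((idx, seq))
    rows.extend((-1, seq) for seq in unpaired)
    return rows


def write_chain_csv(path, rows):
    """Write one chain's rows to a CSV file with an assumed key,sequence header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["key", "sequence"])
        writer.writerows(rows)


def fasta_record(chain_id, sequence, msa_path):
    """Format a FASTA record whose header points at the chain's MSA file."""
    return f">{chain_id}|protein|{msa_path}\n{sequence}\n"
```

Calling build_rows once per chain with the same paired-row ordering, then write_chain_csv and fasta_record for each chain, yields one CSV per chain plus a FASTA file that references them. Keeping the row index as the key, even when a row is skipped, is what keeps the pairing consistent across chains.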

After implementing these steps, the input seemed to work correctly for me.

I’d love to hear about your progress or any insights you might have while testing this! Let me know if this resolves the issue or if further clarification is needed.
