feat(ena-submission): Create ena assembly #2332

Closed
wants to merge 14 commits into from

Conversation

anna-parker
Contributor

@anna-parker anna-parker commented Jul 24, 2024

resolves #2397

preview URL: https://create-ena-assembly.loculus.org/

Summary

Uses the same principle as create_ena_projects to keep submission state in the DB - please review that PR first :-)

This adds the following rule to the ena-submission snakemake file:

  • create_assembly rule. This function will continuously (in a loop) scan for new sequences where an assembly needs to be created and trigger their creation. It will also update both the submission_table and the assembly_table.
  • In contrast to create_project and create_sample, we need to use the webin-cli.jar for assembly submission. This (and its dependencies) had to be added to the docker image.
  • In contrast to create_project and create_sample, we also need to wait for assemblies to be approved; only then will we receive the assembly accessions: https://ena-docs.readthedocs.io/en/latest/submit/assembly/genome.html#assigned-accession-numbers. This means there is one additional state in the assembly_table: WAITING.
  • It also adds some basic unit tests for sub-functions used by create_assembly.
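
As an illustration of the webin-cli.jar step, here is a minimal sketch of how a submission command could be assembled in Python. It is not the code from this PR: the function name, the jar path, and the credentials handling are assumptions made for the example. The flags used (-context, -manifest, -userName, -password, -submit, -test) are standard Webin-CLI options.

```python
from pathlib import Path


def build_webin_cli_command(
    manifest: Path,
    username: str,
    password: str,
    jar: Path = Path("webin-cli.jar"),  # hypothetical location inside the image
    test: bool = True,
) -> list[str]:
    """Assemble the argument list for a genome assembly submission."""
    cmd = [
        "java", "-jar", str(jar),
        "-context", "genome",        # genome assemblies use the "genome" context
        "-manifest", str(manifest),
        "-userName", username,
        "-password", password,
        "-submit",
    ]
    if test:
        # -test sends the submission to ENA's test service instead of production.
        cmd.append("-test")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True); keeping command construction separate from execution makes it easy to unit test without contacting ENA.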

High level overview of assembly_creation:

In a loop:

  1. Get sequences in submission_table in state SUBMITTED_SAMPLE
  • if (there exists an entry in the assembly_table for the corresponding (accession, version)):
    -- if (entry is in status SUBMITTED): update submission_table to SUBMITTED_ASSEMBLY.
    -- else: update submission_table to SUBMITTING_ASSEMBLY.
  • else: create assembly entry in assembly_table for (accession, version).
  2. Get sequences in submission_table in state SUBMITTING_ASSEMBLY
  • if (corresponding assembly_table entry is in state SUBMITTED): update entries to state SUBMITTED_ASSEMBLY.
  3. Get sequences in assembly_table in state READY, prepare files (we need chromosome_list, fasta files and a manifest file), set status to SUBMITTING and submit
  • if (submission succeeds): set status to WAITING and fill in results: ena-internal "erz_accession"
  • else: set status to HAS_ERRORS and fill in errors
  4. Get sequences in assembly_table in state WAITING. Every 5 minutes (to not overload ENA), check if ENA has processed the assemblies and assigned them a "gca_accession". If so, update the table to status SUBMITTED and fill in results
  5. Get sequences in assembly_table in state HAS_ERRORS for over 15 minutes, in status SUBMITTING for over 15 minutes, or in state WAITING for over 48 hours: TODO handle failure (ena-submission: Recover from failed project/sample/assembly submission #2311), currently just throw an error
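
The loop above can be sketched as a small state machine. This is a simplified illustration, not the PR's implementation: the tables are plain in-memory dicts rather than DB tables, and AssemblyEntry, sync_submission_table, submit_ready, poll_waiting and find_stuck are hypothetical names. The intermediate SUBMITTING status is assumed from the timeout step, and the submit and lookup_gca callables stand in for the webin-cli call and the ENA query.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional

Key = tuple[str, int]  # (accession, version)


@dataclass
class AssemblyEntry:
    status: str = "READY"
    started_at: datetime = field(default_factory=datetime.now)
    result: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)


def sync_submission_table(submission: dict[Key, str], assembly: dict[Key, AssemblyEntry]) -> None:
    """Steps 1 and 2: create missing assembly entries, propagate finished ones."""
    for key, state in submission.items():
        entry = assembly.get(key)
        if state == "SUBMITTED_SAMPLE":
            if entry is None:
                assembly[key] = AssemblyEntry()
            elif entry.status == "SUBMITTED":
                submission[key] = "SUBMITTED_ASSEMBLY"
            else:
                submission[key] = "SUBMITTING_ASSEMBLY"
        elif state == "SUBMITTING_ASSEMBLY" and entry is not None:
            if entry.status == "SUBMITTED":
                submission[key] = "SUBMITTED_ASSEMBLY"


def submit_ready(assembly: dict[Key, AssemblyEntry], submit: Callable) -> None:
    """Step 3: submit READY entries; `submit` returns (ok, erz_accession_or_error)."""
    for entry in assembly.values():
        if entry.status != "READY":
            continue
        entry.status, entry.started_at = "SUBMITTING", datetime.now()
        ok, payload = submit(entry)
        if ok:
            entry.status, entry.result = "WAITING", {"erz_accession": payload}
        else:
            entry.status, entry.errors = "HAS_ERRORS", [payload]


def poll_waiting(assembly: dict[Key, AssemblyEntry], lookup_gca: Callable[[str], Optional[str]]) -> None:
    """Step 4: mark WAITING entries SUBMITTED once a gca_accession is available."""
    for entry in assembly.values():
        if entry.status == "WAITING":
            gca = lookup_gca(entry.result["erz_accession"])
            if gca is not None:
                entry.status = "SUBMITTED"
                entry.result["gca_accession"] = gca


def find_stuck(assembly: dict[Key, AssemblyEntry], now: datetime) -> list[Key]:
    """Step 5: entries that exceeded the time limit for their status."""
    limits = {
        "HAS_ERRORS": timedelta(minutes=15),
        "SUBMITTING": timedelta(minutes=15),
        "WAITING": timedelta(hours=48),
    }
    return [k for k, e in assembly.items()
            if e.status in limits and now - e.started_at > limits[e.status]]
```

In a real loop, poll_waiting would only be called when at least 5 minutes have passed since the previous call, for example by comparing the current time against a stored timestamp before each iteration.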

ENA Assembly

As we are submitting chromosome assemblies (https://ena-docs.readthedocs.io/en/latest/submit/assembly/genome.html#chromosome-assembly), we need to submit 3 files for each (multi-segmented) sequence:

  1. Manifest files: https://ena-docs.readthedocs.io/en/latest/submit/assembly/genome.html#manifest-files. There are some mandatory metadata fields which we must set. I map the fields as follows:
     ASSEMBLY_TYPE: default=ISOLATE (mandatory)
     PROGRAM: loculus_field=sequencing_instrument (mandatory, default=Unknown)
     PLATFORM: loculus_field=sequencing_protocol (mandatory, default=Unknown)
     COVERAGE: loculus_field=depth_of_coverage (mandatory AND non-negative float, default=1)
     MOLECULETYPE: MoleculeType(loculus_field=molecule_type), default=None
  2. Chromosome List Files: https://ena-docs.readthedocs.io/en/latest/submit/fileprep/assembly.html#chromosome-list-file
  3. Fasta Files
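
To make the field mapping concrete, here is a minimal sketch of how the manifest fields and a chromosome list could be derived from Loculus metadata. The function names and the "{accession}_{segment}" object naming are assumptions for illustration only, and a real manifest needs further fields (such as the study, sample, and file paths) that are not part of the mapping listed here. The chromosome list columns follow the ENA format of OBJECT_NAME, CHROMOSOME_NAME, CHROMOSOME_TYPE.

```python
import math


def manifest_fields(metadata: dict) -> dict:
    """Map Loculus metadata to manifest fields, applying the listed defaults."""
    try:
        coverage = float(metadata.get("depth_of_coverage"))
        if math.isnan(coverage) or coverage < 0:
            raise ValueError("coverage must be a non-negative float")
    except (TypeError, ValueError):
        coverage = 1.0  # default when the value is missing or invalid

    fields = {
        "ASSEMBLY_TYPE": "ISOLATE",
        "PROGRAM": metadata.get("sequencing_instrument") or "Unknown",
        "PLATFORM": metadata.get("sequencing_protocol") or "Unknown",
        "COVERAGE": coverage,
    }
    molecule_type = metadata.get("molecule_type")
    if molecule_type:  # optional: omitted when not provided
        fields["MOLECULETYPE"] = molecule_type
    return fields


def render_manifest(fields: dict) -> str:
    """Render one 'FIELD<TAB>value' line per manifest field."""
    return "".join(f"{name}\t{value}\n" for name, value in fields.items())


def render_chromosome_list(accession: str, segments: list[str]) -> str:
    """One tab-separated line per segment; object names are hypothetical."""
    return "".join(f"{accession}_{seg}\t{seg}\tsegmented\n" for seg in segments)
```

For example, manifest_fields({}) falls back to the defaults for every mandatory field, and a negative or non-numeric depth_of_coverage is replaced by 1.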

PR Checklist

  • Test submission locally on ENA dev instance
  • Are able to query ENA for state of assembly submission: Automate querying ENA for state of assembly submission #2408
  • Test submission on preview on ENA dev instance - as seen in create_ena_samples, the ENA dev instance is still down, preventing sample submission. I can, however, verify that Slack notifications work.

I can also trigger assembly creation by manually setting the sra_run_accession and the biosample_accession to those of an already existing sample: "sra_run_accession": "ERS20170050", "biosample_accession": "SAMEA115670243". This leads the sample submission to succeed, and the assembly submission then also succeeds and goes into state WAITING.

Here I can again set the erz_accession in the table to that of an already existing sample to verify that check_ena works for getting the gca_accessions after processing.

This then puts the submission_table in status SUBMITTED_ALL.

@anna-parker anna-parker force-pushed the create_ena_assembly branch from 7e0adae to 1c378aa Compare August 9, 2024 14:01
@anna-parker anna-parker marked this pull request as ready for review August 11, 2024 11:34
@anna-parker anna-parker changed the title Create ena assembly feat(ena-submission): Create ena assembly Aug 12, 2024
@anna-parker anna-parker force-pushed the create_ena_assembly branch 3 times, most recently from 750fc67 to 8968dfa Compare August 14, 2024 07:42
@anna-parker anna-parker added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Aug 14, 2024
@corneliusroemer corneliusroemer added the deposition related to ENA/INSDC deposition label Aug 29, 2024
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Aug 29, 2024
@anna-parker anna-parker removed the preview Triggers a deployment to argocd label Aug 29, 2024
@corneliusroemer corneliusroemer added preview Triggers a deployment to argocd and removed preview Triggers a deployment to argocd labels Sep 16, 2024
@anna-parker
Contributor Author

Closing as this was part of #2417

Labels
deposition related to ENA/INSDC deposition
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENA Submission: Create project
2 participants