Add comment when reingest causes grouping to change and old sequences to be revoked
anna-parker committed Aug 19, 2024
1 parent a9fa920 commit a8cca4b
Showing 3 changed files with 46 additions and 17 deletions.
20 changes: 17 additions & 3 deletions ingest/README.md
@@ -54,16 +54,17 @@ We group segments by adding a `joint_accession` field to the metadata which cons

 Before uploading new sequences, the pipeline queries the Loculus backend for the status and hash of all previously submitted sequences. This is done to avoid uploading sequences that have already been submitted and have not changed. Furthermore, only accessions whose highest version is in status `APPROVED_FOR_RELEASE` can be updated through revision. Entries in other states cannot currently be updated (TODO: Potentially use `/submit-edited-data` endpoint to allow updating entries in more states).

-Hashes and statuses are used to triage sequences into 4 categories which determine the action to be taken:
+Hashes and statuses are used to triage sequences into 5 categories which determine the action to be taken:

 - `submit`: Sequences that have not been submitted before
 - `revise`: Sequences that have been submitted before and have changed
 - `no_change`: Sequences that have been submitted before and have not changed
 - `blocked`: Sequences that have been submitted before but are not in a state that allows updating
+- `revoke`: Only for multi-segmented viruses, these are sequences that were submitted before but have now changed their segment grouping. This means the previously submitted segment-grouping needs to be revoked and the new grouping submitted.

 ### Uploading sequences to Loculus

-Depending on the triage category, sequences are either submitted as new entries or revised.
+Depending on the triage category, sequences are either submitted as new entries or revised. Furthermore, for multi-segmented organisms where reingest has found grouping changes, maintainers can [trigger](#approve-revocations) the `regroup_and_revoke` rule which revokes the sequences with incorrectly grouped segments and submits sequences with the new segment grouping. We currently do not automate sequence revocation.

 ### Approving sequences in status `WAITING_FOR_APPROVAL`
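
The five triage categories listed in the README hunk above can be illustrated with a minimal sketch. The pipeline's actual triage code is not part of this diff, so the `Submitted` record, the `grouping` field, and the order of the checks are all assumptions made for illustration; only the category names and the `APPROVED_FOR_RELEASE` status come from the README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Submitted:
    """Hypothetical summary of the highest version already stored in Loculus."""

    status: str          # e.g. "APPROVED_FOR_RELEASE"
    hash: str            # hash recorded with the previous submission
    grouping: frozenset  # accessions grouped into this entry (assumed shape)


def triage(new_hash: str, new_grouping: frozenset, previous: Optional[Submitted]) -> str:
    """Return one of the five triage categories (the check order is an assumption)."""
    if previous is None:
        return "submit"       # never submitted before
    if previous.grouping != new_grouping:
        return "revoke"       # segment grouping changed: revoke old, submit new
    if previous.hash == new_hash:
        return "no_change"    # already submitted and unchanged
    if previous.status != "APPROVED_FOR_RELEASE":
        return "blocked"      # changed, but not in a state that allows revision
    return "revise"           # changed and revisable


# Example with hypothetical accessions and hashes.
previous = Submitted("APPROVED_FOR_RELEASE", "h1", frozenset({"A.1", "B.1"}))
category = triage("h2", frozenset({"A.1", "B.1"}), previous)
```

In this example `category` is `revise`, because the grouping is unchanged, the hash differs, and the previous version is approved for release. Whether an unchanged entry in a non-revisable state counts as `no_change` or `blocked` is not stated in the README; the sketch picks `no_change`.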

@@ -101,6 +102,19 @@ We use the Snakemake workflow management system which also uses different config

 TLDR: The `Snakefile` contains workflows defined as rules with required input and expected output files. By default Snakemake takes the first rule as the target one and then constructs a graph of dependencies (a DAG) required to produce the expected output of the first rule. The target rule can be specified using `snakemake {rule}`

+## Approve Revocations
+
+You might be notified that the ingest pipeline would like to regroup segments of multi-segmented organisms, making the previous grouping obsolete. In this case the old segment-grouping needs to be revoked and the new one added. We do not automate this process yet in case of reingest issues leading to mass revocation of sequences. However, if you approve with the proposed revocation you can use the snakemake rule `regroup_and_revoke` to perform this operation.
+
+In this case you need to find the pod that sent you the notification, it will always be for one organism. Then you need to stop the snakemake processes in that pod and run the `regroup_and_revoke` rule:
+
+```
+kubectl exec -it $POD -c ingest-{organism} -- sh -c "touch /path/to/your/working/directory/.snakemake/.stopme"
+kubectl exec -it $POD -c ingest-{organism} -- /opt/conda/bin/snakemake results/revoked results/approved
+```
+
+You don't need to restart the previous rules as the ingest cronjob will rerun these jobs again.
+
 ## Local Development

 Install micromamba, if you are on a mac:
@@ -132,7 +146,7 @@ cp ../temp/ingest-config.{organism}.yaml config/config.yaml

 Then run snakemake using `snakemake` or `snakemake {rule}`.

-Note that by default the pipeline will submit sequences to main. If you want to change this to another branch (that has a preview tag) you can modify the `backend_url` and `keycloak_token_url` arguments in the `config.yaml` file. They are of the form `https://backend-{branch_name}.loculus.org/` and `https://authentication-{branch_name}.loculus.org`. Alternatively, if you are running the backend locally you can also specify the local backend port: `http://localhost:8079` and the local keycloak port: `http://localhost:8083`.
+Note that by default the pipeline will submit sequences to main. If you want to change this to another branch (that has a preview tag) you can modify the `backend_url` and `keycloak_token_url` arguments in the `config.yaml` file. They are of the form `https://backend-{branch_name}.loculus.org/` and `https://authentication-{branch_name}.loculus.org`. Alternatively, if you are running the backend locally you can also specify the local backend port: `http://localhost:8079` and the local keyclock port: `http://localhost:8083`.

 The ingest pipeline requires config files, found in the directory `config`. The `default.yaml` contains default values which will be overridden by the `config.yaml`. To produce the `config.yaml` used in production you can run `../generate_local_test_config.sh` and then copy the configs from the pathogen to the `config.yaml`.

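The README context above describes two things that a short sketch may make concrete: values in `config.yaml` override those in `default.yaml`, and the two URL settings follow a fixed pattern per preview branch. The sketch below uses plain dictionaries instead of reading YAML. The helper names `merge_config` and `preview_urls`, the shallow merge, and the placeholder default values are assumptions for illustration, not the pipeline's actual loading code.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Shallow merge (assumed): any key present in overrides replaces the default."""
    return {**defaults, **overrides}


def preview_urls(branch_name: str) -> dict:
    """Build the two URL overrides for a preview branch, following the README's pattern."""
    return {
        "backend_url": f"https://backend-{branch_name}.loculus.org/",
        "keycloak_token_url": f"https://authentication-{branch_name}.loculus.org",
    }


# Placeholder defaults, hypothetical values standing in for default.yaml.
defaults = {
    "backend_url": "<default backend url>",
    "keycloak_token_url": "<default keycloak url>",
    "segmented": False,
}

# Point the pipeline at a preview branch (branch name is made up).
preview_config = merge_config(defaults, preview_urls("my-branch"))

# Or point it at a locally running backend and keycloak, using the ports from the README.
local_config = merge_config(
    defaults,
    {"backend_url": "http://localhost:8079", "keycloak_token_url": "http://localhost:8083"},
)
```

Keys that are not overridden, such as `segmented` here, keep their default value.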
37 changes: 26 additions & 11 deletions ingest/scripts/call_loculus.py
@@ -214,21 +214,36 @@ def regroup_and_revoke(metadata, sequences, map, config: Config, group_id):
     Submit segments in new sequence groups and revoke segments in old (incorrect) groups in Loculus.
     """
     response = submit_or_revise(metadata, sequences, config, group_id, mode="submit")
-    new_accessions = response[0]["accession"]  # Will be later added as version comment
-
-    url = f"{organism_url(config)}/revoke"
+    new_accessions = {}  # Map from submissionId to new loculus accession
+    for item in response:
+        new_accessions[item["submissionId"]] = item["accession"]

     to_revoke = json.load(open(map, encoding="utf-8"))

-    loc_values = {loc for seq in to_revoke.values() for loc in seq.keys()}
-    loculus_accessions = set(loc_values)
+    old_loculus_keys: dict[
+        str, list[str]
+    ] = {}  # Map from old loculus accession to corresponding new accession(s)
+    for key, value in to_revoke.items():
+        for loc_accession in value:
+            all = old_loculus_keys.get(loc_accession, [])
+            all.append(new_accessions[key])
+            old_loculus_keys[loc_accession] = all

-    accessions = {"accessions": list(loculus_accessions)}
-
-    response = make_request(HTTPMethod.POST, url, config, json_body=accessions)
-    logger.debug(f"revocation response: {response.json()}")
+    url = f"{organism_url(config)}/revoke"
+    responses = []
+    for old_loc_accession, new_loc_accession in old_loculus_keys.items():
+        logger.debug(f"revoking: {old_loc_accession}")
+        comment = (
+            "INSDC re-ingest found metadata changes, these changes lead the segments in this "
+            "sequence to be grouped differently, the newly grouped sequences can be found "
+            f"here: ${new_loc_accession}."
+        )
+        body = {"accessions": [old_loc_accession], "comment": comment}
+        response = make_request(HTTPMethod.POST, url, config, json_body=body)
+        logger.debug(f"revocation response: {response.json()}")
+        responses.append(response.json())

-    return response.json()
+    return responses


 def approve(config: Config):
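
The core of the `regroup_and_revoke` change is an inversion: the revocation map is keyed by submission id, while the revocation comment needs, for each old Loculus accession, the new accession(s) that replace it. Below is a minimal standalone sketch of that inversion with made-up data. The helper name `map_old_to_new`, the accession strings, and the inner placeholder values are hypothetical; the dictionary shapes follow the `dict[str, dict[str, str]]` annotation in `prepare_files.py` and the loop in the hunk above.

```python
def map_old_to_new(to_revoke: dict, new_accessions: dict) -> dict:
    """Invert {submissionId: {old_accession: ...}} into {old_accession: [new_accession, ...]}."""
    old_to_new: dict = {}
    for submission_id, old_entries in to_revoke.items():
        for old_accession in old_entries:
            old_to_new.setdefault(old_accession, []).append(new_accessions[submission_id])
    return old_to_new


# Hypothetical data: two new groups, one old entry split across both.
to_revoke = {
    "sub1": {"LOC_OLD1": "placeholder"},
    "sub2": {"LOC_OLD1": "placeholder", "LOC_OLD2": "placeholder"},
}
new_accessions = {"sub1": "LOC_NEW1", "sub2": "LOC_NEW2"}

old_to_new = map_old_to_new(to_revoke, new_accessions)

# The committed f-string contains a literal "$" before the placeholder and
# formats the list with its repr; joining the list gives plainer text.
as_committed = f"here: ${old_to_new['LOC_OLD2']}."
joined = f"here: {', '.join(old_to_new['LOC_OLD2'])}."
```

With this data, `LOC_OLD1` maps to both new accessions and `LOC_OLD2` to one. `as_committed` evaluates to `here: $['LOC_NEW2'].`, since `$` has no special meaning in a Python f-string, while `joined` evaluates to `here: LOC_NEW2.`. Whether the committed rendering is intended is not stated in the commit.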
@@ -335,6 +350,7 @@ def get_submitted(config: Config):
         hash_value = original_metadata.get("hash", "")
         if config.segmented:
             insdc_accessions = [original_metadata[key] for key in insdc_key]
+            insdc_accessions = [accession for accession in insdc_accessions if accession]
             joint_accession = "/".join(
                 [
                     f"{original_metadata[key]}.{segment}"
@@ -368,7 +384,6 @@ def get_submitted(config: Config):
                 "version": loculus_version,
                 "hash": hash_value,
                 "status": statuses[loculus_accession][loculus_version],
-                "joint_accession": joint_accession,
             }
         )

6 changes: 3 additions & 3 deletions ingest/scripts/prepare_files.py
@@ -51,9 +51,9 @@ def revocation_notification(config: Config, to_revoke: dict[str, dict[str, str]]
     """Send slack notification with revocation details"""
     text = (
         f"{config.backend_url}: Ingest pipeline wants to add the following sequences"
-        f" which will lead to revocations: {to_revoke}. "
-        "If you agree with this run the regroup_and_revoke rule in the ingest pod:"
-        " `kubectl exec -it INGEST_POD_NAME -- snakemake regroup_and_revoke`."
+        f" which will lead to revocations: {to_revoke}. If you agree with this run the "
+        "regroup_and_revoke rule in the ingest pod: following the instructions in "
+        "https://github.com/loculus-project/loculus/blob/main/ingest/README.md#approve-revocations."
     )
     notify(config, text)
