nextclade run with optional dataset tag input #1521

jrotieno · 2024-09-11T09:29:03Z

nextclade run --dataset-name xx is great and uses the latest dataset tag by default. However, as mentioned in the documentation, "If a new version of the dataset is released between two runs, they will use different versions of the dataset and may produce different results."

It would be nice to have an optional --dataset-tag input to specify a particular tag for the given dataset name when the default (latest) tag is not wanted.

ivan-aksamentov · 2024-09-11T09:45:37Z

Hi @jrotieno,

Makes sense! I cannot give any promises that this will be implemented though, due to lack of time. (Contributions are always welcome!)

For the time being, if you need to use the same version repeatedly, you could use the 2-stage flow: nextclade dataset get, followed by one or more nextclade run invocations with --input-dataset.
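A minimal sketch of that 2-stage flow, for readers who want to script it. It assumes the --name, --tag and --output-dir options of nextclade dataset get and the --output-all option of nextclade run; the function names, paths and the tag value are hypothetical placeholders, not anything prescribed in this thread.

```python
import shlex
import subprocess
from typing import List


def pinned_run_commands(name: str, tag: str, dataset_dir: str,
                        fasta: str, out_dir: str) -> List[List[str]]:
    """Build argv lists for the 2-stage flow: download a pinned tag, then run.

    Assumption: `dataset get` accepts --name/--tag/--output-dir and
    `run` accepts --input-dataset/--output-all.
    """
    get_cmd = [
        "nextclade", "dataset", "get",
        "--name", name,
        "--tag", tag,
        "--output-dir", dataset_dir,
    ]
    run_cmd = [
        "nextclade", "run",
        "--input-dataset", dataset_dir,
        "--output-all", out_dir,
        fasta,
    ]
    return [get_cmd, run_cmd]


def run_pinned(name: str, tag: str, dataset_dir: str,
               fasta: str, out_dir: str) -> None:
    """Execute both stages; raises CalledProcessError if either fails."""
    for cmd in pinned_run_commands(name, tag, dataset_dir, fasta, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; substitute a real dataset name and tag.
    for cmd in pinned_run_commands(
        "nextstrain/sars-cov-2/wuhan-hu-1/orfs",
        "YOUR-TAG-HERE",
        "data/dataset",
        "sequences.fasta",
        "output",
    ):
        print(shlex.join(cmd))
```

The download only needs to happen once; later runs can reuse the same dataset directory, so every run sees the same dataset version.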

jrotieno · 2024-09-11T09:54:39Z

Many thanks @ivan-aksamentov for the swift response!

We have been using the 2-stage flow. However, I wanted to also have the option of running nextclade sort that would predict the dataset name, and do nextclade run with the predicted dataset name and an optional dataset tag.

But thinking about it again, it may not be very sensible to predict a name and then use a fixed tag, as tags are not universal across all datasets.

Nonetheless, it will be nice to see a tag option for run in the future!

jrotieno · 2024-09-11T09:57:23Z

While here, is there a way to output the dataset tag that a nextclade run command used when it was given only a dataset name?

ivan-aksamentov · 2024-09-11T10:07:47Z

Hmm... I don't think so. Not currently.

You already know the name when downloading or running, so you'd have to write it down somewhere, e.g. in a txt or json file.

If you've downloaded the dataset, you can find the tag in the pathogen.json file in the dataset directory, at the property .version.tag. Or if you downloaded a particular tag, you can also write it down.
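As a rough illustration of the pathogen.json lookup described above, the sketch below reads .version.tag from a downloaded dataset directory. The function name read_dataset_tag is hypothetical, and it assumes the dataset was downloaded as an unpacked directory (not a zip). It returns None when the file or the property is missing, which can happen for third-party datasets.

```python
import json
from pathlib import Path
from typing import Optional


def read_dataset_tag(dataset_dir: str) -> Optional[str]:
    """Return the .version.tag value from <dataset_dir>/pathogen.json, if any."""
    pathogen_json = Path(dataset_dir) / "pathogen.json"
    if not pathogen_json.is_file():
        return None
    with pathogen_json.open(encoding="utf-8") as f:
        data = json.load(f)
    # Only official datasets are expected to carry a version block,
    # so treat every level as optional.
    version = data.get("version") if isinstance(data, dict) else None
    if not isinstance(version, dict):
        return None
    tag = version.get("tag")
    return tag if isinstance(tag, str) else None
```

A pipeline could call this right after the download step and store the result next to its outputs.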

There were previously feature requests to add dataset name, tag and other things (like time of the run) to the outputs. However, it's a bit tricky, because you can also run nextclade without any dataset or with a third-party dataset, and in this case there might be no name or tag at all. Names and tags are really only ensured for our official datasets and our official dataset server, where we enforce this particular versioning scheme.

I agree that it might be useful, and we need to think about these things a little more at some point.

jrotieno · 2024-09-11T11:00:58Z

For anyone following, my tentative solution is to run nextclade dataset list --name $dataset_name --json for the predicted dataset name, then parse the output for the default version under the "version" block. Note, however, that the manual says:

--json
    Print output in JSON format.

    This is useful for automated processing. However, at this time, we cannot guarantee stability of the format. Use at own risk.
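A hedged sketch of that parsing step follows. Since the JSON format is not guaranteed to be stable, the layout here is an assumption: a JSON array of dataset entries, each with a "path" string and a "version" object containing a "tag". The function name default_tag_from_list_json is hypothetical; check the actual output of your nextclade version before relying on it.

```python
import json
from typing import Optional


def default_tag_from_list_json(raw: str, dataset_name: str) -> Optional[str]:
    """Extract the default version tag for dataset_name from `dataset list --json` output.

    Assumed (unverified) layout: [{"path": "...", "version": {"tag": "..."}}, ...]
    """
    entries = json.loads(raw)
    if isinstance(entries, dict):
        # Tolerate a single object instead of an array.
        entries = [entries]
    if not isinstance(entries, list):
        return None
    for entry in entries:
        if not isinstance(entry, dict) or entry.get("path") != dataset_name:
            continue
        version = entry.get("version")
        if isinstance(version, dict) and isinstance(version.get("tag"), str):
            return version["tag"]
    return None
```

The raw string could come from something like subprocess.run(["nextclade", "dataset", "list", "--name", name, "--json"], capture_output=True, text=True, check=True).stdout. Returning None rather than raising keeps the caller in control when the format changes.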

jrotieno added the good first issue, help wanted, needs triage, and t:feat labels Sep 11, 2024

ivan-aksamentov removed the needs triage label Sep 11, 2024
