Differences between genomics readme and automatically generated tsv #13

Open
somathias opened this issue Dec 18, 2024 · 5 comments

@somathias
Collaborator

somathias commented Dec 18, 2024

The genomics readme (based on interactive submissions to ENA via the Webin portal) and the automatically generated .tsv and .json files (based on information pulled from ENA's .xsd files) currently differ in the following ways:

  • name vs alias: different labels for the same field?
  • VIB's scripts pull a dedicated title field in addition to name/alias. Needed?
  • file_format vs file_type: different labels for the same field?
  • the fields are listed in a different order
  • different description texts

How to harmonize?
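
One way to keep track of these differences while harmonizing is to compare the two header rows programmatically. The following is a minimal sketch, not part of the existing scripts: the function name `compare_headers` and both header lists are hypothetical, and which label belongs to which template is an assumption made for illustration only.

```python
def compare_headers(interactive, generated):
    """Report fields unique to each header and whether shared fields share an order."""
    only_interactive = [f for f in interactive if f not in generated]
    only_generated = [f for f in generated if f not in interactive]
    # Compare the relative order of the fields that both headers contain.
    shared_interactive = [f for f in interactive if f in generated]
    shared_generated = [f for f in generated if f in interactive]
    return {
        "only_interactive": only_interactive,
        "only_generated": only_generated,
        "same_order": shared_interactive == shared_generated,
    }


# Illustrative headers only; not copied from the readme or the generated .tsv.
interactive_header = ["name", "instrument_model", "library_name", "file_format"]
generated_header = ["alias", "title", "library_name", "instrument_model", "file_type"]

report = compare_headers(interactive_header, generated_header)
for key, value in report.items():
    print(key, value)
```

Description texts could be compared the same way once both sources are loaded into dictionaries keyed by field name.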

@YvonneKallberg
Collaborator

The automatically generated .tsv is the sustainable option, since it lets us keep the metadata up to date in the long run. However, from an end-user/researcher point of view, it might be important that names and descriptions match what the user would see in an interactive submission made without the help of our templates.

@YvonneKallberg
Collaborator

name is not the same as alias:

  • The name will be shown when data is public, while the alias is a unique identifier that can be used internally.
  • The alias is used, for example, in programmatic submission (where experiment and run metadata are located in two .xml files) to link a specific run with a specific experiment.
  • An alias is thus not needed for interactive submission via the browser (using tsv files), although I think ENA creates an alias for you when you submit.

However, I'm almost 100% sure that name and title are the same (without having checked the description text in the browser against the .xsd sources).
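
The alias-based link between run and experiment can be sketched as follows. This is an illustration under stated assumptions: the element and attribute names (`EXPERIMENT`, `RUN`, `EXPERIMENT_REF`, `refname`, `TITLE`) follow the ENA submission schemas as far as I know them, the XML is heavily trimmed and would not pass ENA validation as is, and the helper names and alias values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_experiment(alias, title):
    """Build a trimmed EXPERIMENT element carrying an internal alias and a public title."""
    experiment = ET.Element("EXPERIMENT", alias=alias)
    ET.SubElement(experiment, "TITLE").text = title
    return experiment


def build_run(alias, experiment_alias):
    """Build a trimmed RUN element that points at its experiment by alias."""
    run = ET.Element("RUN", alias=alias)
    ET.SubElement(run, "EXPERIMENT_REF", refname=experiment_alias)
    return run


def unresolved_runs(experiments, runs):
    """Return the aliases of runs whose EXPERIMENT_REF matches no experiment alias."""
    known = {experiment.get("alias") for experiment in experiments}
    return [
        run.get("alias")
        for run in runs
        if run.find("EXPERIMENT_REF").get("refname") not in known
    ]


# Hypothetical aliases, for illustration only.
experiments = [build_experiment("exp_1", "Example experiment")]
runs = [build_run("run_1", "exp_1"), build_run("run_2", "exp_missing")]

print(ET.tostring(runs[0], encoding="unicode"))
print(unresolved_runs(experiments, runs))
```

In a single interactive tsv, experiment and run metadata sit on the same row, so no such cross-file reference is required.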

@somathias
Collaborator Author

> name is not the same as alias:
>
>   • The name will be shown when data is public, while the alias is a unique identifier that can be used internally.
>   • The alias is used, for example, in programmatic submission (where experiment and run metadata are located in two .xml files) to link a specific run with a specific experiment.
>   • An alias is thus not needed for interactive submission via the browser (using tsv files), although I think ENA creates an alias for you when you submit.
>
> However, I'm almost 100% sure that name and title are the same (without having checked the description text in the browser against the .xsd sources).

Thanks for the clarification, @YvonneKallberg. Does this mean that the tsv we generate should include the name/title field instead of the alias field? If so, should we use name or title?

In SRA.experiment.xsd, title has the description "Short text that can be used to call out experiment records in searches or in displays. This element is technically optional but should be used for all new records." Source: https://github.com/enasequence/webin-xml/blob/8e764aab17b0557a786d72eb18cbe07a4121142f/src/main/resources/uk/ac/ebi/ena/sra/schema/SRA.experiment.xsd#L719
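
For reference, description texts like this one can be read from an xsd with the Python standard library. The sketch below is only an illustration: `XSD_SNIPPET` is a hand-trimmed stand-in rather than the real SRA.experiment.xsd, and `element_descriptions` is a hypothetical helper, not the function the existing scripts use.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hand-trimmed stand-in for the schema; the real file is much larger.
XSD_SNIPPET = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="TITLE" type="xs:string">
    <xs:annotation>
      <xs:documentation>
        Short text that can be used to call out experiment records in searches or in displays.
        This element is technically optional but should be used for all new records.
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>"""


def element_descriptions(xsd_text):
    """Map lower-cased element names to their whitespace-normalized documentation text."""
    root = ET.fromstring(xsd_text)
    descriptions = {}
    for element in root.iter(f"{XS}element"):
        name = element.get("name")
        doc = element.find(f"{XS}annotation/{XS}documentation")
        if name and doc is not None and doc.text:
            # Collapse the indentation and line breaks used inside the schema.
            descriptions[name.lower()] = " ".join(doc.text.split())
    return descriptions


descriptions = element_descriptions(XSD_SNIPPET)
print(descriptions["title"])
```

Normalizing whitespace this way also makes it easier to compare the xsd text with the description shown in the Webin portal.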

@YvonneKallberg
Collaborator

I would say yes to including name/title, and I would call it title for two reasons:

  1. The tsv templates I have downloaded from ENA don't have this column. (I think I got the name field, both the field itself and the label 'name', from documentation, and it has stuck with me throughout submissions.)
  2. 'title' is what we get automatically, which makes life easier.

@YvonneKallberg
Collaborator

Another difference I've noticed concerns the study and sample fields:

  • study_alias in the automated version but study in the interactive one (the latter can be populated with either an accession number or an alias)
  • sample_alias in the automated version but sample in the interactive one (the latter can be populated with either an accession number or an alias)
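
If the interactive labels are the ones to keep, the automated header could be renamed in a post-processing step. This is a minimal sketch assuming that direction of harmonization, which has not been decided in this thread; the `RENAMES` mapping, the `harmonize_tsv` helper, and the example values are all hypothetical.

```python
import csv
import io

# Assumed mapping from automated labels to interactive labels.
RENAMES = {"study_alias": "study", "sample_alias": "sample"}


def harmonize_tsv(tsv_text, renames=RENAMES):
    """Rename header columns of a tab-separated text, leaving data rows unchanged."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ""
    rows[0] = [renames.get(column, column) for column in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative input; the values are made-up aliases, not real records.
generated = "study_alias\tsample_alias\ttitle\nmy_study\tmy_sample\tExample run\n"
harmonized = harmonize_tsv(generated)
print(harmonized)
```

Because the interactive columns accept either an accession number or an alias, the values themselves need no conversion in this sketch; only the labels change.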
