Export Phyx view as CSV #238

gaurav · 2022-02-16T05:57:00Z

This PR adds an "Export as CSV" button to the Phyloref summary in the Phyx view so that the table can be exported as a CSV file (see attached screenshot). The generated CSV file expands some of the fields in the visible table (see example: Alligatoridae_Alligatorinae_and_4_others.csv from the screenshot below). To implement this, I moved the download filename generation code into the store so that it can be used in multiple places in this application.
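For readers unfamiliar with the shared filename logic, here is a minimal sketch of how a name such as Alligatoridae_Alligatorinae_and_4_others.csv could be derived from the phyloreference labels. The function name `downloadFilename`, the two-label cutoff, and the fallback name are assumptions inferred from the example filename, not the actual store code in this PR.

```typescript
// Hypothetical helper: the real store getter in this PR may differ.
function downloadFilename(labels: string[], extension: string): string {
  // Replace runs of characters that are not letters, digits or hyphens with "_".
  const clean = (label: string): string =>
    label.trim().replace(/[^A-Za-z0-9-]+/g, "_");

  // Assumption: the first two labels are spelled out, the rest are counted.
  const named = labels.slice(0, 2).map(clean);
  const remaining = labels.length - named.length;

  let base: string;
  if (named.length === 0) {
    base = "download"; // assumed fallback when there are no phyloreferences
  } else if (remaining === 0) {
    base = named.join("_and_");
  } else {
    const noun = remaining === 1 ? "other" : "others";
    base = `${named.join("_")}_and_${remaining}_${noun}`;
  }
  return `${base}.${extension}`;
}
```

Keeping this in one place means the CSV export and any other download button would produce consistent filenames.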

If the phyloreference resolves to an unlabelled node, this will appear in the exported CSV as "(unlabelled)" (see example CSV).

This PR also includes an unrelated fix for a bug that I noticed (getDefaultNomenCodeURI() referred to NAME_IN_UNKNOWN_CODE instead of UNKNOWN_CODE, as the constant is currently named).

Closes #225.

Screenshot of the new button:

hlapp

Nothing that looks obviously questionable here, but I don't see a screenshot of how this would appear, and I don't see an example csv that this would produce. So kind of hard to confidently give 👍🏻.

gaurav · 2022-04-27T01:31:14Z

> Nothing that looks obviously questionable here, but I don't see a screenshot of how this would appear, and I don't see an example csv that this would produce. So kind of hard to confidently give 👍🏻.

Good catch! I've updated the PR description with a screenshot and a link to two example CSVs.

hlapp

UI appearance looks fine. As for the CSV, my one gripe with it is that the first column pretends that phyloreferences are identified by label, which of course they aren't. My suggestion would be to have two columns for Phyloreference ID and label, respectively, or, if there is only one column, to at the very least make clear in the column label that the value is a label.

gaurav · 2022-06-15T04:11:01Z

Fixed in d3f7582. The file now includes a "Phyloreference ID" field -- if an @id is set, it will be used; otherwise, each phyloreference is referred to as #phyloref0, #phyloref1, and so on. See example CSV output in Alligatoridae_Alligatorinae_and_4_others.csv.
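The fallback described above can be sketched roughly as follows. The `Phyloref` interface and the function name `phylorefId` are hypothetical and only illustrate the rule (use `@id` if set, otherwise `#phyloref` plus the zero-based position); they are not the code from d3f7582.

```typescript
// Hypothetical minimal shape of a phyloreference for this sketch.
interface Phyloref {
  "@id"?: string;
  label?: string;
}

// Returns the explicit @id if one is set, otherwise a positional identifier.
function phylorefId(phyloref: Phyloref, index: number): string {
  const id = phyloref["@id"];
  return id !== undefined && id !== "" ? id : `#phyloref${index}`;
}
```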

hlapp

Looks good with the ID present now.

Two remaining quibbles:

1. We're not including the type of specifier, so are leaving it up to the reader of the CSV to either assume it's a taxon name, or to apply some guessing logic. However, we do know the type, don't we? Same for expected and actual. Or maybe all we're saying is that these are node labels?
2. It seems that the number of internal and external specifier columns is not fixed, but dynamic depending on however many the phyloreference with the most such specifiers happens to use. I suppose that's unavoidable when flattening out to CSV. But wouldn't it be better to have the columns whose number is dynamic appear last in the column order, rather than in the middle?

I think both of these are minor quibbles; rather than holding up this PR, they should probably be turned into issues, which is why I'm approving.

gaurav · 2022-08-21T04:40:27Z

> We're not including the type of specifier, so are leaving it up to the reader of the CSV to either assume it's a taxon name, or to apply some guessing logic. However, we do know the type, don't we? Same for expected and actual. Or maybe all we're saying is that these are node labels?

Ah, good catch! That's a bug on my part -- I wrote TaxonConceptWrapper instead of TaxonomicUnitWrapper, which has the logic for displaying specimens (with the phrase Specimen [occurrence ID]) or an external reference (in the format <url>). Fixed in 41dd4cc. I can't really test this since our only current test file only uses taxon names, but I've made a note on #232 to check the CSV export once we have more complicated phyloreferences in here.
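To illustrate the type-dependent labelling described here, the sketch below dispatches on the kind of specifier. The `Specifier` type and `specifierLabel` function are hypothetical stand-ins and do not reproduce the TaxonomicUnitWrapper API; the output formats follow the description above, reading "[occurrence ID]" as a placeholder and the angle brackets around the URL as literal.

```typescript
// Hypothetical stand-in for the kinds of specifier mentioned in this thread.
type Specifier =
  | { kind: "taxon"; name: string }
  | { kind: "specimen"; occurrenceID: string }
  | { kind: "external"; url: string };

// Picks the label format based on the specifier type.
function specifierLabel(specifier: Specifier): string {
  switch (specifier.kind) {
    case "taxon":
      return specifier.name;
    case "specimen":
      return `Specimen ${specifier.occurrenceID}`;
    case "external":
      return `<${specifier.url}>`;
  }
}
```

A wrapper that only understands taxon names would have nothing sensible to print for the last two cases, which is the bug described above.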

> It seems that the number of internal and external specifier columns is not fixed, but dynamic depending on however many the phyloreference with the most such specifiers happens to use. I suppose that's unavoidable when flattening out to CSV. But wouldn't it be better to have the columns whose number is dynamic appear last in the column order, rather than in the middle?

In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file. If there were multiple phylogenies in Alligatoridae_Alligatorinae_and_4_others.csv, each would be listed after all the other columns with the columns [phylogeny label] expected and [phylogeny label] actual. So even if I were to move them to before the specifier columns, they would still require some processing to be interpreted correctly.
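As a rough illustration of why both column groups are dynamic, the sketch below builds a header row from the data. Only "Phyloreference ID" and the "[phylogeny label] expected" / "[phylogeny label] actual" pattern come from this thread; the other column names, the `Row` interface, and `csvHeader` are assumptions made for the example.

```typescript
// Hypothetical row shape: the specifier labels for one phyloreference.
interface Row {
  internal: string[];
  external: string[];
}

function csvHeader(rows: Row[], phylogenyLabels: string[]): string[] {
  // The specifier column count follows the phyloreference with the most specifiers.
  let maxInternal = 0;
  let maxExternal = 0;
  for (const row of rows) {
    maxInternal = Math.max(maxInternal, row.internal.length);
    maxExternal = Math.max(maxExternal, row.external.length);
  }

  const header: string[] = ["Phyloreference ID", "Label"]; // "Label" is assumed
  for (let i = 1; i <= maxInternal; i++) header.push(`Internal specifier ${i}`);
  for (let i = 1; i <= maxExternal; i++) header.push(`External specifier ${i}`);

  // One expected/actual pair per phylogeny, listed after all other columns.
  for (const label of phylogenyLabels) {
    header.push(`${label} expected`, `${label} actual`);
  }
  return header;
}
```

Either way, a consumer has to inspect the header to know how many specifier and phylogeny columns a given file contains.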

I'll go ahead and merge this PR, but please do open an issue if you think we should output the phylogeny resolutions in CSV in another format.

hlapp · 2022-08-21T20:03:08Z

> In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file.

Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

gaurav · 2022-08-22T03:18:18Z

> Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

Sounds good! I'm not sure if LinkML supports dynamic column names in this way, but we should definitely document this somewhere. Filed as #257.

gaurav added 5 commits February 16, 2022 00:21

First stab at a Phyx view export (#225).

8473dd1

Added saveAs() functionality to CSV export.

bd5564d

Moved download filename into store for sharing.

ed34bfd

Tweaked code to meet code style guidelines.

66b642b

Fixed bug in which UTF-8 BOM is added to output.

87ad3ac

gaurav marked this pull request as ready for review March 28, 2022 02:33

gaurav requested a review from hlapp March 28, 2022 02:33

hlapp reviewed Apr 8, 2022

gaurav requested a review from hlapp April 27, 2022 01:31

hlapp requested changes Apr 27, 2022

gaurav added 2 commits June 15, 2022 00:07

Added Phyloreference ID to CSV output.

d3f7582

Added IDs to brochu_2003.json so we don't just export #phyloref0, etc.

6dc77a6

gaurav requested a review from hlapp June 15, 2022 04:27

hlapp approved these changes Jun 15, 2022

Replaced TaxonConceptWrapper with TaxonomicUnitWrapper.

41dd4cc

TaxonomicUnitWrapper has the ability to check the type of the specifier, to wrap it appropriately depending on the type and then to get the label from the correct wrapper.

gaurav merged commit 9a938ab into master Aug 21, 2022

gaurav deleted the export-phyx-view-as-csv branch August 21, 2022 04:40
