Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export Phyx view as CSV #238

Merged
merged 8 commits into from
Aug 21, 2022
Merged

Export Phyx view as CSV #238

merged 8 commits into from
Aug 21, 2022

Conversation

gaurav
Copy link
Member

@gaurav gaurav commented Feb 16, 2022

This PR adds an "Export as CSV" button to the Phyloref summary in the Phyx view so that that table can be exported as a CSV file (see attached screenshot). The generated CSV file expands some of the fields in the visible table (see example:
Alligatoridae_Alligatorinae_and_4_others.csv from the screenshot below). To implement this, I moved the download filename generation code into the store so that it could be used in multiple places in this application.

If the phyloreference resolves to an unlabelled node, this will appear in the exported CSV as "(unlabelled)" (see example CSV).

This PR also includes an unrelated fix that I noticed (getDefaultNomenCodeURI() referred to NAME_IN_UNKNOWN_CODE instead of UNKNOWN_CODE as the constant is currently named).

Closes #225.

Screenshot of the new button:
Screen Shot 2022-04-26 at 9 22 47 PM

@gaurav gaurav marked this pull request as ready for review March 28, 2022 02:33
@gaurav gaurav requested a review from hlapp March 28, 2022 02:33
Copy link
Member

@hlapp hlapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing that looks obviously questionable here, but I don't see a screenshot of how this would appear, and I don't see an example csv that this would produce. So kind of hard to confidently give 👍🏻.

@gaurav
Copy link
Member Author

gaurav commented Apr 27, 2022

Nothing that looks obviously questionable here, but I don't see a screenshot of how this would appear, and I don't see an example csv that this would produce. So kind of hard to confidently give 👍🏻.

Good catch! I've updated the PR description with a screenshot and a link to two example CSVs.

@gaurav gaurav requested a review from hlapp April 27, 2022 01:31
Copy link
Member

@hlapp hlapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UI appearance looks fine. As for the CSV, my one gripe with it is that the first column pretends that phyloreferences are identified by label, which of course they aren't. My suggestion would be to have two columns for Phyloreference ID and label, respectively, or if only one column at the very least be clear in the column label that the value is a label.

@gaurav
Copy link
Member Author

gaurav commented Jun 15, 2022

Fixed in d3f7582. The file now includes a "Phyloreference ID" field -- if an @id is set, it will be used; otherwise, each phyloreference is referred to as #phyloref0, #phyloref1, and so on. See example CSV output in Alligatoridae_Alligatorinae_and_4_others.csv.

@gaurav gaurav requested a review from hlapp June 15, 2022 04:27
Copy link
Member

@hlapp hlapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with the ID present now.

Two remaining quibbles:

  1. We're not including the type of specifier, so are leaving it up to the reader of the CSV to either assume it's a taxon name, or to apply some guessing logic. However, we do know the type, don't we? Same for expected and actual. Or maybe all we're saying is that these are node labels?
  2. It seems that the number of internal and external specifier columns is not fixed, but dynamic depending on however many the phyloreference with the most such specifiers happens to use. I suppose that's unavoidable when flattening out to CSV. But wouldn't it be better to have the columns whose number is dynamic appear last in the column order, rather than in the middle?

I think both of these are minor quibbles, and rather than holding up this PR should probably rather be turned into issues, which is why I'm approving.

TaxonomicUnitWrapper has the ability to check the type of the specifier,
to wrap it appropriately depending on the type and then to get the label
from the correct wrapper.
@gaurav
Copy link
Member Author

gaurav commented Aug 21, 2022

We're not including the type of specifier, so are leaving it up to the reader of the CSV to either assume it's a taxon name, or to apply some guessing logic. However, we do know the type, don't we? Same for expected and actual. Or maybe all we're saying is that these are node labels?

Ah, good catch! That's a bug on my part -- I wrote TaxonConceptWrapper instead of TaxonomicUnitWrapper, which has the logic for displaying specimens (with the phrase Specimen [occurrence ID]) or an external reference (in the format <url>). Fixed in 41dd4cc. I can't really test this since our only current test file only uses taxon names, but I've made a note on #232 to check the CSV export once we have more complicated phyloreferences in here.

It seems that the number of internal and external specifier columns is not fixed, but dynamic depending on however many the phyloreference with the most such specifiers happens to use. I suppose that's unavoidable when flattening out to CSV. But wouldn't it be better to have the columns whose number is dynamic appear last in the column order, rather than in the middle?

In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file. If there were multiple phylogenies in Alligatoridae_Alligatorinae_and_4_others.csv, each would be listed after all the other columns with the columns [phylogeny label] expected and [phylogeny label] actual. So even if I were to move them to before the specifier columns, they would still require some processing to be interpreted correctly.

I'll go ahead and merge this PR, but please do open an issues if you think we should output the phylogeny resolutions in CSV in another format.

@gaurav gaurav merged commit 9a938ab into master Aug 21, 2022
@gaurav gaurav deleted the export-phyx-view-as-csv branch August 21, 2022 04:40
@hlapp
Copy link
Member

hlapp commented Aug 21, 2022

In theory, the phylogeny columns are also dynamic, as you are allowed to have multiple phylogenies in a single Phyx file.

Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

@gaurav
Copy link
Member Author

gaurav commented Aug 22, 2022

Good point. So I think the main thing left is to document the CSV format, either in plain text, or for example in LinkML, so that downstream consumers know what to expect? Should that become an issue in the tracker so we don't forget?

Sounds good! I'm not sure if LinkML supports dynamic column names in this way, but we should definitely document this somewhere. Filed as #257.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to export the Phyx View as a CSV
2 participants