Variable metadata of some ingested tabular files is inaccessible, can't be downloaded or used for explore tools #5887

Closed
jggautier opened this issue May 28, 2019 · 6 comments

Comments

@jggautier
Contributor

jggautier commented May 28, 2019

A user reported that Data Explorer couldn't open this tabular data file on Harvard Dataverse (v4.14): https://doi.org/10.7910/DVN/GDF6Z0/JPMOZZ. I opened an issue about this in the Data Explorer GitHub repo (scholarsportal/Dataverse-Data-Explorer#15) and @stevenworthington wrote that the file's variable metadata is inaccessible.

The metadata for this tabular data file (https://doi.org/10.7910/DVN/N4TMYQ/OTIAGM) is also inaccessible and can't be explored.

This bug makes the download counts of at least these two files misleading, since Dataverse is counting the number of times Explore is pressed, even though the user never gets to explore the file. I don't know how many other files have this problem.

@jggautier
Contributor Author

@sbarbosadataverse and I thought at first that the problem was related to the tabular file's large number of observations or variables or large file size, but there doesn't seem to be a problem with other tabular files with even more observations/variables and larger file sizes. The original content type also doesn't seem to be a factor.

@jggautier jggautier changed the title from "Variable metadata of ingested tabular files is inaccessible, can't be downloaded or used for explore tools" to "Variable metadata of some ingested tabular files is inaccessible, can't be downloaded or used for explore tools" on May 28, 2019
@jggautier
Contributor Author

jggautier commented Jan 27, 2022

Some more info about this issue after bumping into it today:
I'm still not able to get the variable-level metadata of those two files by going to the file page, clicking the "Access File" menu, and choosing the "Variable Metadata" option.

But both files' variable-level metadata is in their datasets' DDI metadata exports. For the file at https://doi.org/10.7910/DVN/GDF6Z0/JPMOZZ, see its dataset's DDI XML export at https://dataverse.harvard.edu/api/datasets/export?exporter=ddi&persistentId=doi:10.7910/DVN/GDF6Z0 or the more human-friendly DDI HTML export at https://dataverse.harvard.edu/api/datasets/export?exporter=html&persistentId=doi:10.7910/DVN/GDF6Z0.
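
For anyone who wants to pull the variable names out of that export programmatically, here is a minimal Python sketch. It assumes the export is DDI Codebook XML in which each variable is a `var` element carrying a `name` attribute. The helper names `ddi_export_url` and `variable_names` are made up for illustration, and the network fetch is left to the caller.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://dataverse.harvard.edu"  # installation mentioned in this issue


def ddi_export_url(persistent_id, base=BASE, exporter="ddi"):
    """Build the dataset export URL in the same form as the links above."""
    # Keep ':' and '/' unescaped so the DOI reads like it does in the links.
    query = urlencode(
        {"exporter": exporter, "persistentId": persistent_id}, safe=":/"
    )
    return f"{base}/api/datasets/export?{query}"


def local_name(tag):
    """Strip an XML namespace prefix such as '{ddi:codebook:2_5}'."""
    return tag.rsplit("}", 1)[-1]


def variable_names(ddi_xml):
    """Return the name attribute of every <var> element (assumed layout)."""
    root = ET.fromstring(ddi_xml)
    return [
        el.get("name")
        for el in root.iter()
        if local_name(el.tag) == "var" and el.get("name")
    ]


# Example use (needs network access, so it is left commented out):
# import urllib.request
# with urllib.request.urlopen(ddi_export_url("doi:10.7910/DVN/GDF6Z0")) as resp:
#     print(variable_names(resp.read()))
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since I haven't checked which DDI namespace version the export declares.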

(I was reminded of this issue when talking with folks who are working on importing certain dataset metadata from Harvard Dataverse Repository into another system.)

@qqmyers
Member

qqmyers commented Jan 27, 2022

FWIW: It's getting a 504 from the load balancer, so one issue is just that it is timing out. Whether it would complete and download OK with more time or not is another question.
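
One rough way to tell a gateway timeout apart from other failures is to request the URL with a generous client-side timeout and look at the status code. The Python sketch below is only an illustration: `probe` is a hypothetical helper, the URL is whatever the "Variable Metadata" option requests (it isn't given in this thread, so it is a parameter), and a longer client timeout won't help if the load balancer itself gives up first.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=300, opener=urllib.request.urlopen):
    """Request url and classify the outcome as a (kind, status) pair."""
    try:
        with opener(url, timeout=timeout) as resp:
            return ("ok", resp.status)
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, which is its parent class.
        kind = "gateway-timeout" if err.code == 504 else "http-error"
        return (kind, err.code)
    except (urllib.error.URLError, socket.timeout):
        # No HTTP response at all: DNS failure, refused connection,
        # or the client-side timeout expired.
        return ("no-response", None)
```

A `("gateway-timeout", 504)` result would match what I'm seeing. Anything else would suggest a different failure.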

@jggautier
Contributor Author

This is just an update that the GitHub issue at scholarsportal/Dataverse-Data-Explorer#15 was closed and a new one opened at scholarsportal/dataverse-data-explorer-v2#8, in the GitHub repo for version 2 of Data Explorer.

That issue is about making sure that when the metadata is unavailable, Data Explorer displays an error message.

@jggautier
Contributor Author

I've revisited this GitHub issue as part of an effort to review and prioritize work proposed in GitHub issues in the IQSS/Dataverse repo that have been open for years (IQSS/dataverse-pm#114).

When I opened this GitHub issue and tried to help troubleshoot, I wrote that "there doesn't seem to be a problem with other tabular files with even more observations/variables and larger file sizes", but I didn't include links to those other tabular files. Wish I had.

I think it would be helpful to see if the variable metadata of ingested tabular files is inaccessible because of how big the file is. One way I'd do this is to try to find ingested tabular files that are bigger (byte size, number of rows/columns) than the ingested tabular files I mentioned earlier in this GitHub issue, and check whether their variable metadata can be downloaded.

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024