Upgrade Data Explorer to v2 #225

Closed
landreev opened this issue Jun 12, 2023 · 9 comments
Assignees
Labels
Size: 10 A percentage of a sprint.

Comments

@landreev
Collaborator

Will need to be tested carefully to make sure all the new features are working.

@cmbz cmbz moved this to Harvard Dataverse Instance (Sonia) in IQSS Dataverse Project Jun 26, 2023
@cmbz cmbz moved this from Harvard Dataverse Instance (Sonia) to SPRINT- NEEDS SIZING in IQSS Dataverse Project Jul 31, 2023
@cmbz cmbz added the Size: 3 A percentage of a sprint. label Aug 9, 2023
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Aug 9, 2023
@cmbz cmbz moved this from SPRINT READY to Clear of the Backlog in IQSS Dataverse Project Oct 2, 2023
@landreev
Collaborator Author

landreev commented Oct 18, 2023

It's in prod. now. Scholars Portal requests that the tool be hosted locally (with the old version, v1, we were just using the copy hosted at https://scholarsportal.github.io/Dataverse-Data-Explorer/), so it is installed statically under Apache on both server nodes.
I saved the time it would take to build the tool from sources by using a snapshot of the installation at UNC that @donsizemore kindly shared.
There are some questions already about how categories are displayed (for discrete String vars, for example, when categories are not explicitly defined in the metadata). So I'll keep this issue open for a while longer, to collect feedback and see if anything needs to be fixed.

@scolapasta scolapasta added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Oct 19, 2023
@landreev
Collaborator Author

Hmm. So the problem with categories appears to be real and specific to our installation.
Here's the screenshot of the "View Categories" popup from the UNC installation for the file https://holodeck.irss.unc.edu/dataverse-data-explorer-v2/index.html?fileId=7504327&fileMetadataId=3782249&dvLocale=en&siteUrl=https://dataverse.unc.edu, variable zipcode:
[Screenshot: "View Categories" popup for the zipcode variable on the UNC installation]
Here's the DDI metadata record for the variable:

<var ID="v39982342" name="zipcode" intrvl="discrete">
  <location fileid="f7504327"/>
  <labl level="variable">
    D.11 What is your five-digit ZIP code at your home address? [IF ZIP GIVEN IS INV
  </labl>
  <varFormat type="character"/>
  <notes subject="Universal Numeric Fingerprint" level="variable" type="VDC:UNF">UNF:6:a2XEqpOGWuHGFTuQMUEAGQ==</notes>
</var>

i.e., the categories are not pre-defined in the metadata - this is a "simple" character variable. The categories as shown in the screenshot above must be calculated on the fly.
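
As a rough illustration of what "calculated on the fly" could mean here: a minimal sketch that counts the unique values of one column in tab-delimited text. This is not the Data Explorer's actual code; the function name `category_counts` and the sample data are made up for the example.

```python
import csv
import io
from collections import Counter


def category_counts(tab_text: str, variable: str) -> list[tuple[str, int]]:
    """Count the unique values of one column in tab-delimited text.

    Hypothetical helper for illustration only; not taken from Data Explorer.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    counts = Counter(row[variable] for row in reader)
    # Most frequent value first; ties are broken by value so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data: a header row followed by three observations.
sample = "zipcode\tage\n27514\t34\n27599\t51\n27514\t28\n"
print(category_counts(sample, "zipcode"))
```

Since the variable has no categories in its DDI record, a list like this can only come from the data column itself, which is why a failed download of that column would leave the popup empty.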

Here's a similar variable in Phil's dataset:

<var ID="v19500785" name="language" intrvl="discrete">
  <location fileid="f3371438"/>
  <labl level="variable">language</labl>
  <varFormat type="character"/>
  <notes subject="Universal Numeric Fingerprint" level="variable" type="VDC:UNF">UNF:6:2UG0lAfsl9idD6tBDK4E9A==</notes>
</var>

... but, attempting to get a view of the categories in our instance of Data Explorer results in an empty box:
[Screenshot: empty "View Categories" box for the language variable in our instance]

Whatever it is, it must be happening in the browser/javascript - according to the access log, the tool successfully downloaded the data column for the variable from the tab file, so it got everything it needs to generate the list of unique values etc. ...
It does appear that this is not unique to Explorer v2, that the same thing is observed in v1 (still installed in parallel in prod.)

So what do we do with this issue?

@pdurbin
Member

pdurbin commented Oct 27, 2023

It's weird, if I manually hack the tool URL and add my API token like this...

https://dataverse.harvard.edu/dataverse-data-explorer-v2/?fileId=6867331&fileMetadataId=6747643&dvLocale=en&siteUrl=https://dataverse.harvard.edu&key=REDACTED

I get a nice plot when I click "view categories" on the "language" variable:

[Screenshot: category plot for the "language" variable]

... but this is non-restricted public data so it shouldn't need any API token at all. 🤔

@landreev
Collaborator Author

landreev commented Oct 27, 2023

OK, so the problem appears to be that sometime between 5.9 and 6.0 the API auth started rejecting calls with invalid tokens (such as key=null), even if the file in question is public. I.e., this no longer works:
https://dataverse.harvard.edu/api/access/datafile/6867331?key=null

So, sounds like closing this issue and opening a simple main repo issue should be the proper course.
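
On the client side, one way to avoid sending a junk key would be to leave the `key` parameter out whenever there is no real token. The sketch below is only an illustration of that idea, not the Explorer's code: the function name `access_url` is hypothetical, and the only parts taken from this thread are the `/api/access/datafile/<id>` path and the `key` parameter.

```python
from typing import Optional
from urllib.parse import urlencode


def access_url(site_url: str, file_id: int, api_token: Optional[str] = None) -> str:
    """Build a data file access URL, adding ?key=... only for a real token.

    Hypothetical helper for illustration only.
    """
    base = f"{site_url.rstrip('/')}/api/access/datafile/{file_id}"
    # Treat a missing token, and the literal strings a JavaScript client
    # may produce for one ("null", "undefined"), as "no token".
    if api_token is None or api_token.strip() in ("", "null", "undefined"):
        return base
    return f"{base}?{urlencode({'key': api_token})}"


print(access_url("https://dataverse.harvard.edu", 6867331, "null"))
```

With a check like this, a request for a public file would carry no `key` at all, so it would not depend on how the server treats invalid tokens.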

@pdurbin
Member

pdurbin commented Oct 27, 2023

Some URLs we've been playing with:

Note that the 6.0 URL works now because of a workaround we (ok @landreev ) put into Apache to strip out key=null.

@landreev
Collaborator Author

I will create an issue for the "junk key when no auth is required" problem on Monday.
In the meantime, I worked around the issue in IQSS prod. with an Apache rewrite rule (see the comment above) and removed v1 of the Explorer.

@landreev landreev added Size: 10 A percentage of a sprint. and removed Status: Needs Input Applied to issues in need of input from someone currently unavailable Size: 3 A percentage of a sprint. labels Oct 27, 2023
@landreev landreev self-assigned this Oct 28, 2023
@landreev
Collaborator Author

Closing the issue, now that the Explorer is working.
The rewrite rule that was added in prod. to strip "key=null" from incoming requests:

RewriteCond %{QUERY_STRING} ^(.+?&|)key=null(?:&(.*)|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1%2 [PT]

@stevenferey

For information, we have adapted the rule to handle a case that was generating an error: downloading "Tab-delimited file" from Data Explorer. The generated URL is &key=null

RewriteCond %{QUERY_STRING} ^(.*?&|)key=null(?:&(.*)|)$ [NC]
RewriteRule ^ %{REQUEST_URI}?%1%2 [N,R]
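
To see what the change from `.+?` to `.*?` buys, the two conditions can be tried out with Python's `re` module, which handles these particular constructs the same way as the PCRE engine Apache uses. This is a sketch under assumptions: it only emulates the `RewriteCond` match and the `%1%2` substitution (assuming unset backreferences expand to an empty string), not the `[PT]` or `[N,R]` flags, and the helper name `strip_null_key` is made up.

```python
import re

# The RewriteCond patterns from the two comments above; [NC] becomes IGNORECASE.
ORIGINAL = re.compile(r"^(.+?&|)key=null(?:&(.*)|)$", re.IGNORECASE)
ADAPTED = re.compile(r"^(.*?&|)key=null(?:&(.*)|)$", re.IGNORECASE)


def strip_null_key(pattern, query):
    """Return the rewritten query string (%1%2), or None if the condition fails.

    Hypothetical helper that mimics the rewrite for illustration only.
    """
    match = pattern.match(query)
    if match is None:
        return None
    # A group that did not take part in the match is None in Python.
    return (match.group(1) or "") + (match.group(2) or "")


for query in ("key=null", "fileId=1&key=null", "key=null&format=original", "&key=null"):
    print(repr(query), strip_null_key(ORIGINAL, query), strip_null_key(ADAPTED, query))
```

Both patterns behave the same on the first three query strings. For a query string that begins with `&key=null`, the original condition does not match, because `.+?&` needs at least one character before the `&`, while the adapted one matches and leaves just `&`.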
