
Extending Dataverse functionalities #5525

Closed
pkiraly opened this issue Feb 11, 2019 · 9 comments

Comments

@pkiraly
Member

pkiraly commented Feb 11, 2019

Is there a standard way to extend Dataverse functionalities?

In Göttingen we would like to implement the following workflow: for Humanities data, the user would like to see a button on the dataset page (somewhere near the "Publish" button) which publishes their data into another, domain-specific repository (the Dariah Repository -- https://repository.de.dariah.eu/publikator/). This would require a change in the UI, plus some extra functionality in the background that uses the Dariah Repository API and the Dataverse API.

This function would be specific to our use case, and could be implemented by manipulating the HTML with jQuery and setting up a separate service for the functionality (run by an Apache HTTP server or a Java servlet container). However, that would not be integrated into the Dataverse ecosystem, so maintenance would be somewhat difficult.

In different content management systems (such as Drupal) there is a built-in system for extensions, based on the observer design pattern. The idea is that the core application (in this case Dataverse) makes some calls transparent and lets others extend or modify the result of the call. It usually has two steps: 1) the extension subscribes to the core's service; 2) when it is time to call the specific functionality, the core notifies its subscribers, and they can react to it. There are different models and subpatterns of this pattern.
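
For illustration, a minimal Java sketch of those two steps, assuming a hypothetical extension point in the core (none of these names exist in Dataverse: PublishHook, HookRegistry, DariahExporter):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical extension point the core would expose.
interface PublishHook {
    void afterPublish(String datasetPersistentId);
}

final class HookRegistry {
    private static final List<PublishHook> HOOKS = new CopyOnWriteArrayList<>();

    // Step 1: an extension subscribes to the core's service.
    static void register(PublishHook hook) {
        HOOKS.add(hook);
    }

    // Step 2: when the event happens, the core notifies its subscribers.
    static void firePublished(String pid) {
        HOOKS.forEach(h -> h.afterPublish(pid));
    }
}

// An extension module could then register a hook that pushes the dataset
// to a domain-specific repository.
class DariahExporter implements PublishHook {
    @Override
    public void afterPublish(String pid) {
        // call the Dariah Repository API here
    }
}
```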

In the end it would work the following way: the extensions or modules would be deployed as individual jar or war files (I don't know which is the preferred way in Glassfish), and during the server initialization phase, or via an API, they would subscribe to Dataverse.

I am not sure whether this has already been discussed somewhere, but I haven't found anything.

@pdurbin
Member

pdurbin commented Feb 11, 2019

@pkiraly this is a great question! I would suggest talking to @qqmyers about his pull request #5049, the "DuraCloud/Chronopolis Integration" he added. A high-level description of the feature sounds somewhat similar to your goal: "Dataverse can be configured to submit a copy of published Datasets, packaged as Research Data Alliance conformant zipped BagIt bags to the Chronopolis via DuraCloud."

Originally, he added a button to the Dataverse web interface but my understanding is that it was switched to an API call instead.

We do have a concept of "external tools" where an arbitrary button can be added to an "Explore" or "Configure" dropdown. See "type" at http://guides.dataverse.org/en/4.10.1/installation/external-tools.html. However, these tools only work at the file level at the moment.
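
In case it helps, an external tool is registered with a small JSON manifest loaded through the admin API (/api/admin/externalTools). The example below is adapted from the guide linked above; double-check the field names against the guide for the version you are running:

```json
{
  "displayName": "Fabulous File Tool",
  "description": "Fabulous Fun for Files!",
  "type": "explore",
  "toolUrl": "https://fabulousfiletool.com",
  "toolParameters": {
    "queryParameters": [
      { "fileid": "{fileId}" },
      { "key": "{apiToken}" }
    ]
  }
}
```

The "{fileId}" and "{apiToken}" tokens are placeholders that Dataverse fills in when the user clicks the button, which is also part of why these tools are currently file-level only.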

I hear you on wanting modularity in general. @michbarsinai and @matthew-a-dunlap gave a talk at JavaOne 2017 about modularity (CON3449) but I can't seem to find the slides. If memory serves, there was talk of dropping jars into the app, like you're saying, perhaps using the Service Provider Interface (SPI): https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
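
To make the SPI idea concrete: with java.util.ServiceLoader the core defines an interface, and an extension jar dropped onto the classpath provides an implementation, declared in a META-INF/services file. A minimal sketch (the listener interface is made up, not a real Dataverse type):

```java
import java.util.ServiceLoader;

// Hypothetical extension-point interface the core application would ship.
interface DataversePublishListener {
    void onPublish(String datasetPersistentId);
}

// The core discovers every implementation on the classpath at runtime.
// An extension jar only needs its implementing class plus a text file at
// META-INF/services/DataversePublishListener naming that class.
class PublishNotifier {
    void firePublished(String pid) {
        for (DataversePublishListener listener
                : ServiceLoader.load(DataversePublishListener.class)) {
            listener.onPublish(pid); // each discovered extension reacts
        }
    }
}
```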

I hope this helps a little. This sounds like a great topic for a community call: https://dataverse.org/community-calls

@pameyer
Contributor

pameyer commented Feb 11, 2019

If the requirement for a visible button that the user presses were removed, this sounds like something the workflow APIs would be able to handle.
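
To sketch that route: a workflow is a JSON document registered via the admin API, and an http/sr (send/receive) step can call an external service around publication. The details below (step parameters, the bridge URL) are assumptions to be verified against the Workflows section of the developer guide:

```json
{
  "name": "Push humanities datasets to the Dariah Repository",
  "steps": [
    {
      "provider": ":internal",
      "stepType": "http/sr",
      "parameters": {
        "url": "https://example.org/dariah-bridge",
        "method": "POST",
        "contentType": "text/plain"
      }
    }
  ]
}
```

Such a document would be registered through the /api/admin/workflows endpoint and attached to publication as the default PrePublishDataset or PostPublishDataset workflow; the exact calls are in the guide.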

@qqmyers
Member

qqmyers commented Feb 11, 2019

@pkiraly - The workflow mechanism is a way to catch the 'publish event' and run either before or after Dataverse's core publication process. As @pdurbin mentions, I've used that to let QDR send an archival copy of a dataset version being published off to an external repository (Duraspace). There are several generic parts to this: the workflow mechanism was extended to allow you to specify standard/custom Dataverse settings to be sent to your workflow; there is support for creating an OAI-ORE JSON-LD map file with the metadata and a BagIt bag that puts all data/metadata into one zip file; and there is a core Archiver workflow step class that can be extended to call another repository's API. Not as generic/general as Drupal, but one could nominally create a new class extending the archiver, drop it in the classpath, point to it with a setting, and have it run in a publication workflow.
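
Roughly, extending the archiver looks like the sketch below. The base class name and method signature here are approximations of the real ones in the Dataverse source, so check the code before copying anything:

```java
import java.util.Map;

// Stand-in for the core archiver workflow step class described above.
abstract class AbstractArchiver {
    abstract boolean performArchiveSubmission(String datasetVersionId,
                                              Map<String, String> settings);
}

class DariahArchiver extends AbstractArchiver {
    @Override
    boolean performArchiveSubmission(String datasetVersionId,
                                     Map<String, String> settings) {
        // The core step has already produced the OAI-ORE map and the BagIt
        // zip; this subclass only pushes them to the target repository.
        String endpoint = settings.get(":DariahRepositoryUrl"); // hypothetical setting
        // ... call the Dariah Repository deposit API with the bag here ...
        return true; // report success back to the workflow
    }
}
```

The subclass is then named in a database setting (:ArchiverClassName in current releases, but verify against the installation guide) so that it runs during the publication workflow or via the admin API call described below.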

In addition to the workflow, we set this up as an admin API call as well (again, generic enough to call whatever archiver class you use). For QDR only, we've also extended the GUI to add an admin-only column to the versions table that lets you archive a given version and see whether it has already been archived. We decided to focus on versions and add to the GUI in the versions table (rather than by the publish button) since we expected to want to archive older versions of datasets as well. I can share the xhtml changes for that - the GUI just calls the API underneath, so there is no additional Java code for the GUI itself.

A more general event mechanism would be very interesting - I've worked on projects where events help automate creating previews, automate metadata extraction, trigger standard computations, etc. One of them (Clowder) sends the events to a message bus, which makes it easier to work with distributed processes than something like Drupal's internal hooks does.
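
To illustrate the message-bus variant (Clowder uses RabbitMQ): the application publishes an event to an exchange, and any number of decoupled consumers bind queues to it. This sketch uses the standard com.rabbitmq:amqp-client library; the exchange and routing-key names are invented:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class EventPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // a local RabbitMQ broker is assumed
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.exchangeDeclare("dataverse.events", "topic", true);
            String payload = "{\"event\":\"dataset.published\","
                    + "\"pid\":\"doi:10.5072/FK2/ABC123\"}";
            // Consumers (previewers, metadata extractors, archivers, ...)
            // react independently, without any change to the publisher.
            channel.basicPublish("dataverse.events", "dataset.published",
                    null, payload.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```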

@michbarsinai
Member

michbarsinai commented Feb 11, 2019 via email

@djbrooke
Contributor

Hi @pkiraly - it looks like there's some good information in this thread and there's nothing that needs immediate action. Once you start working in this area again on something specific, please create another issue and we can discuss further!

@pdurbin
Member

pdurbin commented Jan 24, 2020

@pkiraly @qqmyers and @scolapasta are all in the same room today. Hopefully they'll have a chance to talk this out. 😄 @poikilotherm is in the room too and might have some thoughts.

@pkiraly
Member Author

pkiraly commented Jan 24, 2020

@pdurbin Yes, I had a chance to talk to @qqmyers, @scolapasta and @poikilotherm. My impression so far is that the service provider interface is what could provide a general solution for this problem. I have checked the documents you and Gustavo sent me; I still need to digest them, because the concepts are quite different from what I expected. Another impression from the workshop is that some other developers share the general idea that Dataverse should be more modular. My opinion is that for this purpose more descriptions/guides on how Dataverse interacts with its environment are needed. So helping with this goes onto my TODO list, but first I should clearly understand the existing concepts/techniques within Dataverse.

@pdurbin
Member

pdurbin commented Jan 24, 2020

@pkiraly sure, I can definitely imagine better docs about this. Maybe in the dev guide. A page on modularity would be nice. I don't think we've documented anything yet. Oh! And I forgot to mention @4tikhonov ! Can you please ask him how the custom PID provider works? I'm talking about #4106. I think it's an SPI but I'm not sure. I'd love to see some docs on this.

@pdurbin
Member

pdurbin commented Jul 8, 2020

The modularity conversation has moved to this issue:
