
Extending Dataverse functionalities #5525

Closed
pkiraly opened this issue Feb 11, 2019 · 9 comments

Comments

@pkiraly
Member

pkiraly commented Feb 11, 2019

Is there a standard way to extend Dataverse functionalities?

In Göttingen we would like to implement the following workflow: for Humanities data, the user would like to see a button on the dataset page (somewhere near the "Publish" button) which publishes their data into another, domain-specific repository (the Dariah Repository -- https://repository.de.dariah.eu/publikator/). This would require a change in the UI, plus some extra functionality in the background that uses the Dariah Repository API and the Dataverse API.

This function would be specific to our use case, and could be implemented by manipulating the HTML with jQuery and setting up a separate service for the functionality (run by an Apache HTTP server or a Java servlet container). However, that would not be integrated into the Dataverse ecosystem, so maintenance would be somewhat difficult.

In different content management systems (such as Drupal) there is a built-in system for extensions, based on the observer design pattern. The idea is that the core application (in this case Dataverse) makes some calls transparent and lets others extend or modify the result of the call. It usually has two steps: 1) the extension subscribes to the core's service; 2) when it is time to call the specific functionality, the core notifies its subscribers, and they can react to it. There are different models and subpatterns of this pattern.
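
For illustration, a minimal Java sketch of those two steps, assuming a hypothetical extension point in the core (none of these names exist in Dataverse: PublishHook, HookRegistry, DariahExporter):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical extension point the core would expose.
interface PublishHook {
    void afterPublish(String datasetPersistentId);
}

final class HookRegistry {
    private static final List<PublishHook> HOOKS = new CopyOnWriteArrayList<>();

    // Step 1: an extension subscribes to the core's service.
    static void register(PublishHook hook) {
        HOOKS.add(hook);
    }

    // Step 2: when the event happens, the core notifies its subscribers.
    static void firePublished(String pid) {
        HOOKS.forEach(h -> h.afterPublish(pid));
    }
}

// An extension module could then register a hook that pushes the dataset
// to a domain-specific repository.
class DariahExporter implements PublishHook {
    @Override
    public void afterPublish(String pid) {
        // call the Dariah Repository API here
    }
}
```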

In the end it would work the following way: the extensions or modules would be deployed as individual jar or war files (I don't know which is the preferred way in Glassfish), and during the server initialization phase, or via an API, they would subscribe to Dataverse.

I am not sure whether this has already been discussed somewhere, but I haven't found anything.

@pdurbin
Member

pdurbin commented Feb 11, 2019

@pkiraly this is a great question! I would suggest talking to @qqmyers about his pull request #5049, the "DuraCloud/Chronopolis Integration" he added. A high-level description of the feature sounds somewhat similar to your goal: "Dataverse can be configured to submit a copy of published Datasets, packaged as Research Data Alliance conformant zipped BagIt bags to the Chronopolis via DuraCloud."

Originally, he added a button to the Dataverse web interface but my understanding is that it was switched to an API call instead.

We do have a concept of "external tools" where an arbitrary button can be added to an "Explore" or "Configure" dropdown. See "type" at http://guides.dataverse.org/en/4.10.1/installation/external-tools.html. However, these tools only work at the file level at the moment.
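
In case it helps, an external tool is registered with a small JSON manifest loaded through the admin API (/api/admin/externalTools). The example below is adapted from the guide linked above; double-check the field names against the guide for the version you are running:

```json
{
  "displayName": "Fabulous File Tool",
  "description": "Fabulous Fun for Files!",
  "type": "explore",
  "toolUrl": "https://fabulousfiletool.com",
  "toolParameters": {
    "queryParameters": [
      { "fileid": "{fileId}" },
      { "key": "{apiToken}" }
    ]
  }
}
```

The "{fileId}" and "{apiToken}" tokens are placeholders that Dataverse fills in when the user clicks the button, which is also part of why these tools are currently file-level only.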

I hear you on wanting modularity in general. @michbarsinai and @matthew-a-dunlap gave a talk at JavaOne 2017 about modularity (CON3449) but I can't seem to find the slides. If memory serves, there was talk of dropping jars into the app, like you're saying, perhaps using the Service Provider Interface (SPI): https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
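
To make the SPI idea concrete: with java.util.ServiceLoader the core defines an interface, and an extension jar dropped onto the classpath provides an implementation, declared in a META-INF/services file. A minimal sketch (the listener interface is made up, not a real Dataverse type):

```java
import java.util.ServiceLoader;

// Hypothetical extension-point interface the core application would ship.
interface DataversePublishListener {
    void onPublish(String datasetPersistentId);
}

// The core discovers every implementation on the classpath at runtime.
// An extension jar only needs its implementing class plus a text file at
// META-INF/services/DataversePublishListener naming that class.
class PublishNotifier {
    void firePublished(String pid) {
        for (DataversePublishListener listener
                : ServiceLoader.load(DataversePublishListener.class)) {
            listener.onPublish(pid); // each discovered extension reacts
        }
    }
}
```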

I hope this helps a little. This sounds like a great topic for a community call: https://dataverse.org/community-calls

@pameyer
Contributor

pameyer commented Feb 11, 2019

If the requirement for a visible button that the user presses were removed, this sounds like something the workflow APIs would be able to handle.
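
To sketch that route: a workflow is a JSON document registered via the admin API, and an http/sr (send/receive) step can call an external service around publication. The details below (step parameters, the bridge URL) are assumptions to be verified against the Workflows section of the developer guide:

```json
{
  "name": "Push humanities datasets to the Dariah Repository",
  "steps": [
    {
      "provider": ":internal",
      "stepType": "http/sr",
      "parameters": {
        "url": "https://example.org/dariah-bridge",
        "method": "POST",
        "contentType": "text/plain"
      }
    }
  ]
}
```

Such a document would be registered through the /api/admin/workflows endpoint and attached to publication as the default PrePublishDataset or PostPublishDataset workflow; the exact calls are in the guide.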

@qqmyers
Member

qqmyers commented Feb 11, 2019

@pkiraly - The workflow mechanism is a way to catch the 'publish event' and run either before or after Dataverse's core publication process. As @pdurbin mentions, I've used that to let QDR send an archival copy of a dataset version being published off to an external repository (Duraspace). There are several generic parts to this: the workflow mechanism was extended to allow you to specify standard/custom Dataverse settings to be sent to your workflow; there is support for creating an OAI-ORE JSON-LD map file with the metadata and a BagIt bag that puts all data/metadata into one zip file; and there is a core Archiver workflow step class that can be extended to call another repository's API. Not as generic/general as Drupal, but one could nominally create a new class extending the archiver, drop it in the classpath, point to it with a setting, and have it run in a publication workflow.
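
Roughly, extending the archiver looks like the sketch below. The base class name and method signature here are approximations of the real ones in the Dataverse source, so check the code before copying anything:

```java
import java.util.Map;

// Stand-in for the core archiver workflow step class described above.
abstract class AbstractArchiver {
    abstract boolean performArchiveSubmission(String datasetVersionId,
                                              Map<String, String> settings);
}

class DariahArchiver extends AbstractArchiver {
    @Override
    boolean performArchiveSubmission(String datasetVersionId,
                                     Map<String, String> settings) {
        // The core step has already produced the OAI-ORE map and the BagIt
        // zip; this subclass only pushes them to the target repository.
        String endpoint = settings.get(":DariahRepositoryUrl"); // hypothetical setting
        // ... call the Dariah Repository deposit API with the bag here ...
        return true; // report success back to the workflow
    }
}
```

The subclass is then named in a database setting (:ArchiverClassName in current releases, but verify against the installation guide) so that it runs during the publication workflow or via the admin API call described below.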

In addition to the workflow, we set this up as an admin API call as well (again, generic enough to call whatever archiver class you use). For QDR only, we've also extended the GUI to add an admin-only column to the versions table that lets you archive a given version and see whether it has already been archived. We decided to focus on versions and add to the GUI in the versions table (rather than by the publish button) since we expected to want to archive older versions of datasets as well. I can share the xhtml changes for that - the GUI just calls the API underneath, so there is no additional Java code for the GUI itself.

A more general event mechanism would be very interesting - I've worked on projects where events help automate creating previews, automate metadata extraction, trigger standard computations, etc. One of them (Clowder) sends the events to a message bus, which makes it easier to work with distributed processes than something like Drupal's internal hooks does.
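
To illustrate the message-bus variant (Clowder uses RabbitMQ): the application publishes an event to an exchange, and any number of decoupled consumers bind queues to it. This sketch uses the standard com.rabbitmq:amqp-client library; the exchange and routing-key names are invented:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class EventPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // a local RabbitMQ broker is assumed
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.exchangeDeclare("dataverse.events", "topic", true);
            String payload = "{\"event\":\"dataset.published\","
                    + "\"pid\":\"doi:10.5072/FK2/ABC123\"}";
            // Consumers (previewers, metadata extractors, archivers, ...)
            // react independently, without any change to the publisher.
            channel.basicPublish("dataverse.events", "dataset.published",
                    null, payload.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```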

@michbarsinai
Member

michbarsinai commented Feb 11, 2019 via email

@djbrooke
Contributor

Hi @pkiraly - it looks like there's some good information in this thread and there's nothing that needs immediate action. Once you start working in this area again on something specific, please create another issue and we can discuss further!

@pdurbin
Member

pdurbin commented Jan 24, 2020

@pkiraly @qqmyers and @scolapasta are all in the same room today. Hopefully they'll have a chance to talk this out. 😄 @poikilotherm is in the room too and might have some thoughts.

@pkiraly
Member Author

pkiraly commented Jan 24, 2020

@pdurbin Yes, I had a chance to talk to @qqmyers, @scolapasta and @poikilotherm. My impression so far is that the service provider interface is what could provide a general solution for this problem. I have checked the documents you and Gustavo sent me; I still need to digest them, because the concepts are quite different from what I expected. Another impression from the workshop is that some other developers share the general idea that Dataverse should be more modular. My opinion is that for this purpose more descriptions/guides on how Dataverse interacts with its environment are needed. So helping with this goes onto my TODO list, but first I should clearly understand the existing concepts/techniques within Dataverse.

@pdurbin
Member

pdurbin commented Jan 24, 2020

@pkiraly sure, I can definitely imagine better docs about this. Maybe in the dev guide. A page on modularity would be nice. I don't think we've documented anything yet. Oh! And I forgot to mention @4tikhonov ! Can you please ask him how the custom PID provider works? I'm talking about #4106. I think it's an SPI but I'm not sure. I'd love to see some docs on this.

@pdurbin
Member

pdurbin commented Jul 8, 2020

The modularity conversation has moved to this issue:
