Publishing dataset in an external repository #8001
Hi @pkiraly, it's a well-known use case. We already developed such an (external) webservice in 2017 to archive datasets in our Trusted Digital Repository (DANS EASY). However, our workflow is a bit different: first we publish the dataset in Dataverse, then use its metadata and files to create a BagIt package, and archive it afterwards. Please take a look at the slides here: https://www.slideshare.net/vty/cessda-persistent-identifiers Regarding your possible implementation, I'm pretty sure that developing webservices is the way to go. At the moment Dataverse looks too monolithic, and we have to prepare it for the future using modern technologies and concepts.
(I typed this response this morning and got sidetracked, apologies :)) I think we'd want to use the workflows system (https://guides.dataverse.org/en/latest/developers/workflows.html) to trigger an event that publishes into the other system, and I don't think we'd want to add a flow in the Dataverse UI for this. I'd be concerned about communicating failure cases and about scalability.
This might be a good chance to revive the discussion in #7050. You could already extend Dataverse with a workflow, but IIRC this is not tied to the UI. A way to inject UI components for workflows from plugins would be great IMHO. Fewer forks, more extensibility.
Dear @djbrooke, @4tikhonov and @poikilotherm, thanks a lot for your feedback and suggestions! I totally agree that Dataverse should not be extended directly but should work with plugins wherever possible. I checked the suggested workflow documentation and the example scripts. To use the workflow for this requirement, one improvement would be needed: workflow steps should support conditions that are evaluated against dataset and user properties before the step runs.
An example of such a conditional step configuration follows.

Example 1: direct entry of conditions, i.e. archive the dataset only if the subject is "Arts and Humanities", the user is affiliated with a Humanities organisation, and it is a new major version:

```
{
"provider":":internal",
"stepType":"http/sr",
"parameters": {
...
"conditions": [
"${dataset.subject}=[Arts and Humanities]",
"${user.affiliation}=[DARRIAH, Department of Humanities]",
"${minorVersion}=0"
]
}
}
```

Example 2: the workflow should retrieve and evaluate the user's conditions, which have been set on the user's page or via API:

```
{
"provider":":internal",
"stepType":"http/sr",
"parameters": {
...
"conditions": ["${user.externalArchivingConditions}"]
}
}
```

A question: are you aware of any existing open source plugin for Dataverse I can check?
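To show what I have in mind for evaluating such conditions, here is a minimal sketch (the `ConditionEvaluator` class, the property map, and the exact `${...}=[...]` syntax are my assumptions, not existing Dataverse code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: evaluates conditions of the form "${property}=[v1, v2]"
 * or "${property}=value" against a map of resolved properties.
 */
public class ConditionEvaluator {

    /** True only if every condition matches the resolved properties. */
    public static boolean matches(List<String> conditions, Map<String, String> properties) {
        return conditions.stream().allMatch(c -> matchesOne(c, properties));
    }

    private static boolean matchesOne(String condition, Map<String, String> properties) {
        int end = condition.indexOf("}=");
        if (!condition.startsWith("${") || end == -1) {
            return false; // malformed condition: fail closed
        }
        String value = properties.get(condition.substring(2, end));
        String expected = condition.substring(end + 2);
        if (expected.startsWith("[") && expected.endsWith("]")) {
            // Bracketed list: the property must match one of the listed values.
            List<String> allowed = Arrays.asList(
                    expected.substring(1, expected.length() - 1).split("\\s*,\\s*"));
            return allowed.contains(value);
        }
        return expected.equals(value);
    }

    public static void main(String[] args) {
        Map<String, String> properties = Map.of(
                "dataset.subject", "Arts and Humanities",
                "minorVersion", "0");
        List<String> conditions = List.of(
                "${dataset.subject}=[Arts and Humanities]",
                "${minorVersion}=0");
        System.out.println(matches(conditions, properties)); // true
    }
}
```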
@pkiraly maybe there's a better video or screenshots @qqmyers can point us to, but there's now some UI for curators to see the status of publishing/archiving to another repository. The screenshot below is from "Final Demo - Full Final demo of automatic ingests of Dataverse exports into DRS, including successful, failed, and message error scenarios" at https://github.com/harvard-lts/awesome-lts#2022-06-29-final-demo via this pull request that was merged into 5.12 (just released). It seems highly related at least! I think it might use a command instead of a workflow, though. (No, I can't think of any plugins you can check.)
FWIW: automation is via workflow (i.e., configured to run post-publish), but the workflow step calls an archiving command. Those are dynamically loaded, so dropping a new one in the exploded war should work. (We haven't dealt with a separate class loader yet.)
We have a specific feature request which I think would be worth solving with a general solution.
The original request: if a user creates an Arts and Humanities dataset, s/he should be able to publish it to an external repository called the DARIAH Repository as well.
As we know from the slogan "lots of copies keep your stuff safe", I believe creating copies of the dataset in external repositories is a valid and supportable use case.
Here is a suggestion for the user interface:
The backend and the workflow would look something like this:

- `getName()`: returns the name of the repository
- `getUrl()`: returns the URL of the repository's starting page
- `publish(DatasetVersion datasetVersion)`: the main method, which publishes the dataset in the repository
- `isActive()`: returns whether the repository is turned on in the current Dataverse instance (by default all are turned off; the site admin can activate them via configuration)

Here are some code snippets, to get more details:
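As a first snippet, a minimal sketch of the repository interface described above (the interface name `ExternalRepository` is my placeholder; `DatasetVersion` is an existing Dataverse class):

```java
import edu.harvard.iq.dataverse.DatasetVersion;

/**
 * Sketch of a pluggable external repository, following the four methods
 * described above. The name and signatures are illustrative, not
 * existing Dataverse code.
 */
public interface ExternalRepository {

    /** The human readable name of the repository (e.g. "DARIAH Repository"). */
    String getName();

    /** The URL of the repository's starting page. */
    String getUrl();

    /** The main method: publishes the given dataset version in the repository. */
    void publish(DatasetVersion datasetVersion);

    /**
     * Whether the repository is turned on in the current Dataverse instance.
     * By default all repositories are off; the site admin can activate them
     * via configuration.
     */
    boolean isActive();
}
```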
Mapping of subjects and repositories:
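A sketch of one possible shape for this mapping, hard coded here for illustration (in a real implementation the mapping would come from configuration; the repository names reuse examples from this thread):

```java
import java.util.List;
import java.util.Map;

public class SubjectRepositoryMapping {

    // Sketch: which external repositories are offered for which dataset subject.
    // Hard coded for illustration only; a real implementation would load this
    // from the instance configuration.
    static final Map<String, List<String>> SUBJECT_TO_REPOSITORIES = Map.of(
            "Arts and Humanities", List.of("DARIAH Repository"),
            "Social Sciences", List.of("DANS EASY")); // illustrative entries only

    public static void main(String[] args) {
        // Repositories offered for a given dataset's subject (empty list if none).
        List<String> offered =
                SUBJECT_TO_REPOSITORIES.getOrDefault("Arts and Humanities", List.of());
        System.out.println(offered); // [DARIAH Repository]
    }
}
```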
Get the list of active repositories:
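Again only a sketch, reusing the hypothetical `ExternalRepository` interface from above (the list of installed repositories is passed in, since how they are discovered and registered is an open question):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ExternalRepositoryRegistry {

    /**
     * Sketch: filters the installed repositories down to the ones the site
     * admin has activated via configuration (see isActive() above).
     */
    public static List<ExternalRepository> getActiveRepositories(
            List<ExternalRepository> installedRepositories) {
        return installedRepositories.stream()
                .filter(ExternalRepository::isActive)
                .collect(Collectors.toList());
    }
}
```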
@pdurbin @qqmyers @poikilotherm @djbrooke @4tikhonov I am interested in your opinion. I have some initial code to prove the concept to myself, but for a PR it needs lots of work. I would invest the time only if this idea meets the community's approval. Otherwise I will create an independent webservice specific to the DARIAH repository.