diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000000..4204a1fc85e --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,76 @@ +# Contributor Covenant Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to making participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies both within project spaces and in public spaces +when an individual is representing the project or its community. Examples of +representing a project or community include using an official project e-mail +address, posting via an official social media account, or acting as an appointed +representative at an online or offline event. Representation of a project may be +further defined and clarified by project maintainers. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at support at dataverse dot org. All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. 
+ +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq diff --git a/README.md b/README.md index 2bdc0e8edde..f52a6e20f83 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ Dataverse is a trademark of President and Fellows of Harvard College and is regi [![Dataverse Project logo](src/main/webapp/resources/images/dataverseproject_logo.jpg?raw=true "Dataverse Project")](http://dataverse.org) [![API Test Status](https://jenkins.dataverse.org/buildStatus/icon?job=IQSS-dataverse-develop&subject=API%20Test%20Status)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/) +[![API Test Coverage](https://img.shields.io/jenkins/coverage/jacoco?jobUrl=https%3A%2F%2Fjenkins.dataverse.org%2Fjob%2FIQSS-dataverse-develop&label=API%20Test%20Coverage)](https://jenkins.dataverse.org/job/IQSS-dataverse-develop/) [![Unit Test Status](https://img.shields.io/travis/IQSS/dataverse?label=Unit%20Test%20Status)](https://travis-ci.org/IQSS/dataverse) [![Unit Test Coverage](https://img.shields.io/coveralls/github/IQSS/dataverse?label=Unit%20Test%20Coverage)](https://coveralls.io/github/IQSS/dataverse?branch=develop) diff --git a/conf/docker-aio/0prep_deps.sh b/conf/docker-aio/0prep_deps.sh index 649a190af24..9059439948c 100755 --- a/conf/docker-aio/0prep_deps.sh +++ b/conf/docker-aio/0prep_deps.sh @@ -17,12 +17,12 @@ if [ ! -e dv/deps/glassfish4dv.tgz ]; then # assuming that folks usually have /tmp auto-clean as needed fi -if [ ! -e dv/deps/solr-7.3.1dv.tgz ]; then +if [ ! -e dv/deps/solr-7.7.2dv.tgz ]; then echo "solr dependency prep" # schema changes *should* be the only ones... 
cd dv/deps/ #wget https://archive.apache.org/dist/lucene/solr/7.3.0/solr-7.3.0.tgz -O solr-7.3.0dv.tgz - wget https://archive.apache.org/dist/lucene/solr/7.3.1/solr-7.3.1.tgz -O solr-7.3.1dv.tgz + wget https://archive.apache.org/dist/lucene/solr/7.7.2/solr-7.7.2.tgz -O solr-7.7.2dv.tgz cd ../../ fi diff --git a/conf/docker-aio/1prep.sh b/conf/docker-aio/1prep.sh index 1dc95f8d45c..a2f2956532a 100755 --- a/conf/docker-aio/1prep.sh +++ b/conf/docker-aio/1prep.sh @@ -4,9 +4,9 @@ # this was based off the phoenix deployment; and is likely uglier and bulkier than necessary in a perfect world mkdir -p testdata/doc/sphinx-guides/source/_static/util/ -cp ../solr/7.3.1/schema*.xml testdata/ -cp ../solr/7.3.1/solrconfig.xml testdata/ -cp ../solr/7.3.1/updateSchemaMDB.sh testdata/ +cp ../solr/7.7.2/schema*.xml testdata/ +cp ../solr/7.7.2/solrconfig.xml testdata/ +cp ../solr/7.7.2/updateSchemaMDB.sh testdata/ cp ../jhove/jhove.conf testdata/ cp ../jhove/jhoveConfig.xsd testdata/ cd ../../ diff --git a/conf/docker-aio/c7.dockerfile b/conf/docker-aio/c7.dockerfile index 7436b73664c..c5663daa3ec 100644 --- a/conf/docker-aio/c7.dockerfile +++ b/conf/docker-aio/c7.dockerfile @@ -17,7 +17,7 @@ COPY testdata/sushi_sample_logs.json /tmp/ COPY disableipv6.conf /etc/sysctl.d/ RUN rm /etc/httpd/conf/* COPY httpd.conf /etc/httpd/conf -RUN cd /opt ; tar zxf /tmp/dv/deps/solr-7.3.1dv.tgz +RUN cd /opt ; tar zxf /tmp/dv/deps/solr-7.7.2dv.tgz RUN cd /opt ; tar zxf /tmp/dv/deps/glassfish4dv.tgz # this copy of domain.xml is the result of running `asadmin set server.monitoring-service.module-monitoring-levels.jvm=LOW` on a default glassfish installation (aka - enable the glassfish REST monitir endpoint for the jvm` @@ -28,9 +28,9 @@ RUN sudo -u postgres /usr/pgsql-9.6/bin/initdb -D /var/lib/pgsql/data # copy configuration related files RUN cp /tmp/dv/pg_hba.conf /var/lib/pgsql/data/ -RUN cp -r /opt/solr-7.3.1/server/solr/configsets/_default /opt/solr-7.3.1/server/solr/collection1 -RUN cp /tmp/dv/schema*.xml /opt/solr-7.3.1/server/solr/collection1/conf/ -RUN cp /tmp/dv/solrconfig.xml /opt/solr-7.3.1/server/solr/collection1/conf/solrconfig.xml +RUN cp -r /opt/solr-7.7.2/server/solr/configsets/_default /opt/solr-7.7.2/server/solr/collection1 +RUN cp /tmp/dv/schema*.xml /opt/solr-7.7.2/server/solr/collection1/conf/ +RUN cp /tmp/dv/solrconfig.xml /opt/solr-7.7.2/server/solr/collection1/conf/solrconfig.xml # skipping glassfish user and solr user (run both as root) diff --git a/conf/docker-aio/entrypoint.bash b/conf/docker-aio/entrypoint.bash index da01ee56153..60f99cf2259 100755 --- a/conf/docker-aio/entrypoint.bash +++ b/conf/docker-aio/entrypoint.bash @@ -2,7 +2,7 @@ export LANG=en_US.UTF-8 #sudo -u postgres /usr/bin/postgres -D /var/lib/pgsql/data & sudo -u postgres /usr/pgsql-9.6/bin/postgres -D /var/lib/pgsql/data & -cd /opt/solr-7.3.1/ +cd /opt/solr-7.7.2/ # TODO: Run Solr as non-root and remove "-force". 
bin/solr start -force bin/solr create_core -c collection1 -d server/solr/collection1/conf -force diff --git a/conf/docker-aio/testscripts/install b/conf/docker-aio/testscripts/install index a994fe2920d..b886ea8e4ad 100755 --- a/conf/docker-aio/testscripts/install +++ b/conf/docker-aio/testscripts/install @@ -15,7 +15,7 @@ export SMTP_SERVER=localhost export MEM_HEAP_SIZE=2048 export GLASSFISH_DOMAIN=domain1 cd scripts/installer -cp pgdriver/postgresql-42.2.2.jar $GLASSFISH_ROOT/glassfish/lib +cp pgdriver/postgresql-42.2.9.jar $GLASSFISH_ROOT/glassfish/lib #cp ../../conf/jhove/jhove.conf $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhove.conf cp /opt/dv/testdata/jhove.conf $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhove.conf cp /opt/dv/testdata/jhoveConfig.xsd $GLASSFISH_ROOT/glassfish/domains/$GLASSFISH_DOMAIN/config/jhoveConfig.xsd diff --git a/conf/docker/dataverse-glassfish/Dockerfile b/conf/docker/dataverse-glassfish/Dockerfile index 367a9ca127c..57284d3f58b 100644 --- a/conf/docker/dataverse-glassfish/Dockerfile +++ b/conf/docker/dataverse-glassfish/Dockerfile @@ -70,7 +70,7 @@ RUN /tmp/dvinstall/glassfish-setup.sh ###glassfish-setup will handle everything in Dockerbuild ##install jdbc driver -RUN cp /tmp/dvinstall/pgdriver/postgresql-42.2.2.jar /usr/local/glassfish4/glassfish/domains/domain1/lib +RUN cp /tmp/dvinstall/pgdriver/postgresql-42.2.9.jar /usr/local/glassfish4/glassfish/domains/domain1/lib # Customized persistence xml to avoid database recreation #RUN mkdir -p /tmp/WEB-INF/classes/META-INF/ diff --git a/conf/solr/7.3.1/readme.md b/conf/solr/7.7.2/readme.md similarity index 100% rename from conf/solr/7.3.1/readme.md rename to conf/solr/7.7.2/readme.md diff --git a/conf/solr/7.3.1/schema.xml b/conf/solr/7.7.2/schema.xml similarity index 98% rename from conf/solr/7.3.1/schema.xml rename to conf/solr/7.7.2/schema.xml index fd307a32f07..da40a8e99fa 100644 --- a/conf/solr/7.3.1/schema.xml +++ b/conf/solr/7.7.2/schema.xml @@ -171,6 +171,12 @@ + + + + + + @@ -229,6 +235,12 @@ + + + + + + @@ -281,7 +293,7 @@ - + diff --git a/conf/solr/7.3.1/schema_dv_mdb_copies.xml b/conf/solr/7.7.2/schema_dv_mdb_copies.xml similarity index 100% rename from conf/solr/7.3.1/schema_dv_mdb_copies.xml rename to conf/solr/7.7.2/schema_dv_mdb_copies.xml diff --git a/conf/solr/7.3.1/schema_dv_mdb_fields.xml b/conf/solr/7.7.2/schema_dv_mdb_fields.xml similarity index 100% rename from conf/solr/7.3.1/schema_dv_mdb_fields.xml rename to conf/solr/7.7.2/schema_dv_mdb_fields.xml diff --git a/conf/solr/7.3.1/solrconfig.xml b/conf/solr/7.7.2/solrconfig.xml similarity index 100% rename from conf/solr/7.3.1/solrconfig.xml rename to conf/solr/7.7.2/solrconfig.xml diff --git a/conf/solr/7.3.1/updateSchemaMDB.sh b/conf/solr/7.7.2/updateSchemaMDB.sh similarity index 100% rename from conf/solr/7.3.1/updateSchemaMDB.sh rename to conf/solr/7.7.2/updateSchemaMDB.sh diff --git a/doc/release-notes/4.16-release-notes.md b/doc/release-notes/4.16-release-notes.md index 66241a42777..8feb263d2ab 100644 --- a/doc/release-notes/4.16-release-notes.md +++ b/doc/release-notes/4.16-release-notes.md @@ -91,6 +91,7 @@ If this is a new installation, please see our 4.19 milestone in Github. +For the complete list of code changes in this release, see the 4.19 milestone in Github. For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. 
diff --git a/doc/release-notes/4.20-release-notes b/doc/release-notes/4.20-release-notes new file mode 100644 index 00000000000..e29953db101 --- /dev/null +++ b/doc/release-notes/4.20-release-notes @@ -0,0 +1,224 @@ +# Dataverse 4.20 + +This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +### Multiple Store Support + +Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores). + +General information about this capability can be found below and in the Configuration Guide - File Storage section. + +### S3 Direct Upload support + +S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With Minio, the theoretical limit should be 5 TB, and 50+ GB file uploads have been tested successfully. (In practice, other factors such as network timeouts may prevent a successful upload of a multi-TB file, and Minio instances may be configured with a single-HTTP-call limit below 5 TB.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit. + +General information about this capability can be found in the Big Data Support Guide, with specific information about how to enable it in the Configuration Guide - File Storage section. + +To support large data uploads, installations can now configure direct upload to S3, bypassing the application server. This will allow for larger uploads over a more resilient transfer method. + +General information about this capability can be found below and in the Configuration Guide. + +### Integration Test Coverage Reporting + +The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository. + +### New APIs + +New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed. + +More information can be found in the API Guide. + +## Major Use Cases + +Newly-supported use cases in this release include: + +- Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262) +- Users will be able to upload large files directly to S3. (Issue #6489, PR #6490) +- Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628) +- Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods. (Issue #6485, PR #6488) +- Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622) +- Administrators and integrators will be able to determine a dataset's size.
(Issue #6524, PR #6609) +- Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623) + +## Notes for Dataverse Installation Administrators + +### Potential Data Integrity Issue + +We recently discovered a *potential* data integrity issue in Dataverse databases. It can manifest in two ways: as duplicate DataFile objects created for the same uploaded file (https://github.com/IQSS/dataverse/issues/6522), or as duplicate DataTable (tabular metadata) objects linked to the same DataFile (https://github.com/IQSS/dataverse/issues/6510). This issue impacted approximately 0.03% of datasets in Harvard's Dataverse. + +To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here: + +https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/check_datafiles_6522_6510.sh + +The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration. + +If neither of the two issues is present in your database, you will see the messages "... no duplicate DataFile objects in your database" and "no tabular files affected by this issue in your database". + +If either or both kinds of duplicates are detected, the script will provide further instructions. We will need you to send us the produced output. We will then assist you in resolving the issues in your database. + +### Multiple Store Support Changes + +**Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.** + +Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get the id 'file' and an 's3' store would get the id 's3'. + +With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files. + +The following set of Glassfish JVM option changes will adapt an existing file or s3 store for this upgrade: +For a file store: + + ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=" + +For an s3 store: + + ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3" + ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3" + ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=" + ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=" + +Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured. + +Once these options are set, restarting the Glassfish service is all that is needed to complete the change.
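As an illustration of the new multi-store capability, an additional store can be defined with the same pattern of JVM options. The sketch below is illustrative only: the store id `s3archive`, its label, and the bucket name are hypothetical examples, not values from this release note; see the Configuration Guide - File Storage section for the full set of options.

    # hypothetical second S3 store with id "s3archive" (example values only)
    ./asadmin create-jvm-options "\-Ddataverse.files.s3archive.type=s3"
    ./asadmin create-jvm-options "\-Ddataverse.files.s3archive.label=s3archive"
    ./asadmin create-jvm-options "\-Ddataverse.files.s3archive.bucket-name=my-archive-bucket"

As with the options above, restart Glassfish after adding a new store's options so they take effect.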
+ +Note that the "\-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above. + +Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores. + +### Direct S3 Upload Changes + +Direct upload to S3 is enabled per store by one new jvm option: + + ./asadmin create-jvm-options "\-Ddataverse.files..upload-redirect=true" + +The existing :MaxFileUploadSizeInBytes property and ```dataverse.files..url-expiration-minutes``` jvm option for the same store also apply to direct upload. + +Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files..ingestsizelimit jvm option. + +API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use. + +### Solr Update + +With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) +followed by an "index all". + +Before you start the "index all", Dataverse will appear to be empty because +the search results come from Solr. As indexing progresses, results will appear +until indexing is complete. + +### Dataverse Linking Fix + +The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these counts. Going forward, this will happen automatically whenever a dataverse is linked. + +### Google Analytics Download Tracking Bug + +The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120. + +### Run ReExportall + +We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below. + +### New JVM Options and Database Settings + +## New JVM Options for file storage drivers + +- The JVM option dataverse.files.file.directory= controls where temporary files are stored (in the /temp subdir of the defined directory), independent of the location of any 'file' store defined above. +- The JVM option dataverse.files..upload-redirect enables direct upload of files added to a dataset to the S3 bucket.
(S3 stores only!) +- The JVM option dataverse.files..MaxFileUploadSizeInBytes controls the maximum size of file uploads allowed for the given file store. +- The JVM option dataverse.files..ingestsizelimit controls the maximum size of files for which ingest will be attempted, for the given file store. + +## New Database Settings for Shibboleth + +- The database setting :ShibAffiliationAttribute can now be set to prevent affiliations for Shibboleth users from being reset upon each log in. + +## Notes for Tool Developers and Integrators + +### Integration Test Coverage Reporting + +API-based integration tests are run every time a branch is merged to develop, and the percentage of code covered by these integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository. + +### Guestbook Column Changes + +Users of downloaded guestbooks should note that two new columns have been added: + +- Dataset PID +- File PID + +If you are expecting columns in the CSV file to be in a particular order, you will need to make adjustments. + +Old columns: Guestbook, Dataset, Date, Type, File Name, File Id, User Name, Email, Institution, Position, Custom Questions + +New columns: Guestbook, Dataset, Dataset PID, Date, Type, File Name, File Id, File PID, User Name, Email, Institution, Position, Custom Questions + +### API Changes + +As reported in #6570, the affiliation for dataset contacts had been wrapped in parentheses in the JSON output from the Search API. These parentheses have now been removed. This is a backward-incompatible change, but it's expected that this will not cause issues for integrators. + +### Role Name Change + +The role alias provided in API responses has changed, so if anything was hard-coded to "editor" instead of "contributor" it will need to be updated. + +## Complete List of Changes + +For the complete list of code changes in this release, see the 4.20 milestone in Github. + +For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org. + +## Installation + +If this is a new installation, please see our Installation Guide. + +## Upgrade + +1. Undeploy the previous version. + +- <glassfish install path>/glassfish4/bin/asadmin list-applications +- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse + +2. Stop Glassfish, remove the generated directory, then start Glassfish again. + +- service glassfish stop +- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated +- service glassfish start + +3. Install and configure Solr v7.7.2 + +See http://guides.dataverse.org/en/4.20/installation/prerequisites.html#installing-solr + +4. Deploy this version. + +- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.20.war + +5.
The following set of Glassfish JVM option changes will adapt an existing file or s3 store for this upgrade: +For a file store: + + ./asadmin create-jvm-options "\-Ddataverse.files.file.type=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.label=file" + ./asadmin create-jvm-options "\-Ddataverse.files.file.directory=" + +For an s3 store: + + ./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3" + ./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3" + ./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=" + ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=" + +Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured. + +6. Restart Glassfish. + +7. Update Citation Metadata Block + +- `wget https://github.com/IQSS/dataverse/releases/download/4.20/citation.tsv` +- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"` + +8. Kick off full reindex + +http://guides.dataverse.org/en/4.20/admin/solr-search-index.html + +9. (Recommended) Run ReExportall to update JSON Exports + + diff --git a/doc/release-notes/6545-solr-var-meta.md b/doc/release-notes/6545-solr-var-meta.md new file mode 100644 index 00000000000..5e4a0c417c1 --- /dev/null +++ b/doc/release-notes/6545-solr-var-meta.md @@ -0,0 +1,2 @@ +The schema.xml file for Solr search was changed. New fields such as literalQuestion, interviewInstruction, postQuestion, variableUniverse, and variableNotes were added. +Full reindexing is needed if you want to search and see updates to variable-level metadata made before this change. Otherwise there is no need to reindex; new updates made with DCT will be automatically indexed. diff --git a/doc/release-notes/6650-export-import-mismatch b/doc/release-notes/6650-export-import-mismatch new file mode 100644 index 00000000000..0ab2999a603 --- /dev/null +++ b/doc/release-notes/6650-export-import-mismatch @@ -0,0 +1,3 @@ +Run ReExportall to update JSON Exports + +http://guides.dataverse.org/en/4.19/admin/metadataexport.html?highlight=export#batch-exports-through-the-api \ No newline at end of file diff --git a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv index 5de06df2bd3..556a17ef0eb 100644 --- a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv +++ b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv @@ -1,5 +1,5 @@ TwoRavens explore file A system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and the :doc:`/installation/r-rapache-tworavens` section of the Installation Guide. Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis.
See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Explorer for the instructions on adding Data Explorer to your Dataverse; and the :doc:`/installation/prerequisites` section of the Installation Guide for the instructions on how to set up **basic R configuration required** (specifically, Dataverse uses R to generate .prep metadata files that are needed to run Data Explorer). Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide `_. -File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is `_ annotations, images, PDF, text, video - allowing them to be viewed without downloading. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. https://github.com/QualitativeDataRepository/dataverse-previewers +File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is `_ annotations, images, PDF, text, video, tabular data, and spreadsheets - allowing them to be viewed without downloading. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreadsheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers Data Curation Tool configure file A GUI for curating data by adding labels, groups, weights and other details to assist with informed reuse. See the README.md file at https://github.com/scholarsportal/Dataverse-Data-Curation-Tool for the installation instructions.
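The File Previewers entry above notes that the only required step is registering the previewers you want via the Dataverse API. As a minimal sketch of what that registration looks like, previewers are registered as external tools through the admin API described in the External Tools section of the Admin Guide; the manifest filename and localhost URL below are placeholders, and the actual manifests come from the previewers' README.

    # register one previewer, described by a JSON manifest from the previewers repo,
    # as an external tool (example filename and server URL)
    curl -X POST -H 'Content-type: application/json' \
         --upload-file zipPreviewer.json \
         http://localhost:8080/api/admin/externalTools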
diff --git a/doc/sphinx-guides/source/_static/api/file-provenance.json b/doc/sphinx-guides/source/_static/api/file-provenance.json new file mode 100644 index 00000000000..6c823cdb5f3 --- /dev/null +++ b/doc/sphinx-guides/source/_static/api/file-provenance.json @@ -0,0 +1 @@ +{"prefix": {"pre_0": "http://www.w3.org/2001/XMLSchema", "s-prov": "http://s-prov/ns/#", "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#", "vargen": "http://openprovenance.org/vargen#", "foaf": "http://xmlns.com/foaf/0.1/", "dcterms": "http://purl.org/dc/terms/", "tmpl": "http://openprovenance.org/tmpl#", "var": "http://openprovenance.org/var#", "vcard": "http://www.w3.org/2006/vcard/ns#", "swirrl": "http://project-dare.eu/ns#"}, "bundle": {"vargen:SessionSnapshot": {"prefix": {"s-prov": "http://s-prov/ns/#", "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#", "vargen": "http://openprovenance.org/vargen#", "tmpl": "http://openprovenance.org/tmpl#", "var": "http://openprovenance.org/var#", "vcard": "http://www.w3.org/2006/vcard/ns#", "swirrl": "http://project-dare.eu/ns#"}, "entity": {"vargen:inData": {"swirrl:volumeId": {"$": "var:rawVolumeId", "type": "prov:QUALIFIED_NAME"}, "prov:type": {"$": "provone:Data", "type": "prov:QUALIFIED_NAME"}}, "vargen:inFile": {"prov:atLocation": {"$": "var:atLocation", "type": "prov:QUALIFIED_NAME"}, "s-prov:format": {"$": "var:format", "type": "prov:QUALIFIED_NAME"}, "s-prov:checksum": {"$": "var:checksum", "type": "prov:QUALIFIED_NAME"}}, "vargen:WorkData": {"swirrl:volumeId": {"$": "var:workVolumeId", "type": "prov:QUALIFIED_NAME"}, "prov:type": {"$": "provone:Data", "type": "prov:QUALIFIED_NAME"}}, "var:JupSnapshot": {"prov:generatedAt": {"$": "var:generatedAt", "type": "prov:QUALIFIED_NAME"}, "prov:atLocation": {"$": "var:repoUrl", "type": "prov:QUALIFIED_NAME"}, "s-prov:description": {"$": "var:description", "type": "prov:QUALIFIED_NAME"}, "prov:type": {"$": "swirrl:NotebookSnapshot", "type": "prov:QUALIFIED_NAME"}, "swirrl:sessionId": {"$": "var:sessionId", "type": "prov:QUALIFIED_NAME"}}}, "used": {"_:id1": {"prov:activity": "vargen:snapshot", "prov:entity": "var:Jupyter"}, "_:id2": {"prov:activity": "vargen:snapshot", "prov:entity": "vargen:WorkData"}, "_:id3": {"prov:activity": "vargen:snapshot", "prov:entity": "vargen:inData"}}, "wasDerivedFrom": {"_:id4": {"prov:usedEntity": "var:Jupyter", "prov:generatedEntity": "var:JupSnapshot"}}, "wasAssociatedWith": {"_:id5": {"prov:activity": "vargen:snapshot", "prov:agent": "var:snapAgent"}}, "actedOnBehalfOf": {"_:id6": {"prov:delegate": "var:snapAgent", "prov:responsible": "var:user"}}, "activity": {"vargen:snapshot": {"prov:atLocation": {"$": "var:method_path", "type": "prov:QUALIFIED_NAME"}, "tmpl:startTime": {"$": "var:startTime", "type": "prov:QUALIFIED_NAME"}, "tmpl:endTime": {"$": "var:endTime", "type": "prov:QUALIFIED_NAME"}}}, "wasGeneratedBy": {"_:id7": {"prov:activity": "vargen:snapshot", "prov:entity": "var:JupSnapshot"}}, "agent": {"var:user": {"vcard:uid": {"$": "var:name", "type": "prov:QUALIFIED_NAME"}, "swirrl:authMode": {"$": "var:authmode", "type": "prov:QUALIFIED_NAME"}, "swirrl:group": {"$": "var:group", "type": "prov:QUALIFIED_NAME"}, "prov:type": {"$": "prov:Person", "type": "prov:QUALIFIED_NAME"}}, "var:snapAgent": {"vcard:uid": {"$": "var:name_api", "type": "prov:QUALIFIED_NAME"}, "prov:type": {"$": "prov:SoftwareAgent", "type": "prov:QUALIFIED_NAME"}}}, "hadMember": {"_:id8": {"prov:collection": "vargen:inData", "prov:entity": "vargen:inFile"}}}}} \ No newline at end of file 
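The new file-provenance.json fixture above is a PROV-JSON bundle for the API Guide. A rough sketch of how such a document is attached to a file, assuming the prov-json endpoint described in the Native API section of the API Guide; $SERVER, $FILE_ID, and $ENTITY_NAME are placeholders:

    # attach a PROV-JSON document to a file (sketch; check the Native API guide for the exact parameters)
    curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
         -H "Content-type:application/json" \
         --upload-file file-provenance.json \
         "http://$SERVER/api/files/$FILE_ID/prov-json?entityName=$ENTITY_NAME"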
diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr index 6c89c27f5f7..d351c544a65 100755 --- a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr +++ b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr @@ -5,7 +5,7 @@ # chkconfig: 35 92 08 # description: Starts and stops Apache Solr -SOLR_DIR="/usr/local/solr/solr-7.3.1" +SOLR_DIR="/usr/local/solr/solr-7.7.2" SOLR_COMMAND="bin/solr" SOLR_ARGS="-m 1g -j jetty.host=127.0.0.1" SOLR_USER=solr diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service b/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service index 84f1d22c517..06eacc68ca2 100644 --- a/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service +++ b/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service @@ -5,9 +5,9 @@ After = syslog.target network.target remote-fs.target nss-lookup.target [Service] User = solr Type = forking -WorkingDirectory = /usr/local/solr/solr-7.3.1 -ExecStart = /usr/local/solr/solr-7.3.1/bin/solr start -m 1g -j "jetty.host=127.0.0.1" -ExecStop = /usr/local/solr/solr-7.3.1/bin/solr stop +WorkingDirectory = /usr/local/solr/solr-7.7.2 +ExecStart = /usr/local/solr/solr-7.7.2/bin/solr start -m 1g -j "jetty.host=127.0.0.1" +ExecStop = /usr/local/solr/solr-7.7.2/bin/solr stop LimitNOFILE=65000 LimitNPROC=65000 Restart=on-failure diff --git a/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html index 4e6a01f2d5d..ca703dddf11 100644 --- a/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html +++ b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html @@ -117,7 +117,7 @@ var row = target.parents('tr')[0]; if(row != null) { //finds the file id/DOI in the Dataset page - label = $(row).find('td.col-file-metadata > a').attr('href'); + label = $(row).find('div.file-metadata-block > a').attr('href'); } else { //finds the file id/DOI in the file page label = $('#fileForm').attr('action'); diff --git a/doc/sphinx-guides/source/admin/dataverses-datasets.rst b/doc/sphinx-guides/source/admin/dataverses-datasets.rst index 7b5c5fbd4a0..a4bea9f53e7 100644 --- a/doc/sphinx-guides/source/admin/dataverses-datasets.rst +++ b/doc/sphinx-guides/source/admin/dataverses-datasets.rst @@ -22,7 +22,7 @@ Moves a dataverse whose id is passed to a new dataverse whose id is passed. The Link a Dataverse ^^^^^^^^^^^^^^^^ -Creates a link between a dataverse and another dataverse (see the Linked Dataverses + Linked Datasets section of the :doc:`/user/dataverse-management` guide for more information). Only accessible to superusers. :: +Creates a link between a dataverse and another dataverse (see the :ref:`dataverse-linking` section of the User Guide for more information). Only accessible to superusers. 
:: curl -H "X-Dataverse-key: $API_TOKEN" -X PUT http://$SERVER/api/dataverses/$linked-dataverse-alias/link/$linking-dataverse-alias @@ -38,7 +38,27 @@ Add Dataverse RoleAssignments to Child Dataverses Recursively assigns the users and groups having a role(s),that are in the set configured to be inheritable via the :InheritParentRoleAssignments setting, on a specified dataverse to have the same role assignments on all of the dataverses that have been created within it. The response indicates success or failure and lists the individuals/groups and dataverses involved in the update. Only accessible to superusers. :: - curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias//addRoleAssignmentsToChildren + curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/addRoleAssignmentsToChildren + +Configure a Dataverse to store all new files in a specific file store +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To direct new files (uploaded when datasets are created or edited) for all datasets in a given dataverse, the store can be specified via the API as shown below, or by editing the 'General Information' for a Dataverse on the Dataverse page. Only accessible to superusers. :: + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d $storageDriverLabel http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver + +The current driver can be seen using: + + curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver + +and can be reset to the default store with: + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver + +The available drivers can be listed with: + + curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/storageDrivers + Datasets -------- @@ -55,7 +75,7 @@ Moves a dataset whose id is passed to a dataverse whose alias is passed. If the Link a Dataset ^^^^^^^^^^^^^^ -Creates a link between a dataset and a dataverse (see the Linked Dataverses + Linked Datasets section of the :doc:`/user/dataverse-management` guide for more information). :: +Creates a link between a dataset and a dataverse (see the :ref:`dataset-linking` section of the User Guide for more information). :: curl -H "X-Dataverse-key: $API_TOKEN" -X PUT http://$SERVER/api/datasets/$linked-dataset-id/link/$linking-dataverse-alias diff --git a/doc/sphinx-guides/source/admin/harvestserver.rst b/doc/sphinx-guides/source/admin/harvestserver.rst index 333e139972f..f1436926ea2 100644 --- a/doc/sphinx-guides/source/admin/harvestserver.rst +++ b/doc/sphinx-guides/source/admin/harvestserver.rst @@ -54,7 +54,7 @@ be used to create an OAI set. Sets can overlap local dataverses, and can include as few or as many of your local datasets as you wish. A good way to master the Dataverse search query language is to experiment with the Advanced Search page. We also recommend that you -consult the Search API section of the Dataverse User Guide. +consult the :doc:`/api/search` section of the API Guide. Once you have entered the search query and clicked *Next*, the number of search results found will be shown on the next screen. This way, if @@ -138,7 +138,7 @@ runs every night (at 2AM, by default). This export timer is created and activated automatically every time the application is deployed or restarted. Once again, this is new in Dataverse 4, and unlike DVN v3, where export jobs had to be scheduled and activated by the admin -user. 
See the "Export" section of the Admin guide, for more information on the automated metadata exports. +user. See the :doc:`/admin/metadataexport` section of the Admin Guide, for more information on the automated metadata exports. It is still possible however to make changes like this be immediately reflected in the OAI server, by going to the *Harvesting Server* page diff --git a/doc/sphinx-guides/source/admin/index.rst b/doc/sphinx-guides/source/admin/index.rst index 39b4f5748d3..6ff611cb55f 100755 --- a/doc/sphinx-guides/source/admin/index.rst +++ b/doc/sphinx-guides/source/admin/index.rst @@ -27,7 +27,7 @@ This guide documents the functionality only available to superusers (such as "da solr-search-index ip-groups monitoring - reporting-tools + reporting-tools-and-queries maintenance backups troubleshooting diff --git a/doc/sphinx-guides/source/admin/integrations.rst b/doc/sphinx-guides/source/admin/integrations.rst index abadaea8891..527ec6fe563 100644 --- a/doc/sphinx-guides/source/admin/integrations.rst +++ b/doc/sphinx-guides/source/admin/integrations.rst @@ -19,9 +19,9 @@ If your researchers have data on Dropbox, you can make it easier for them to get Open Science Framework (OSF) ++++++++++++++++++++++++++++ -The Center for Open Science's Open Science Framework (OSF) is an open source software project that facilitates open collaboration in science research across the lifespan of a scientific project. +The Center for Open Science's Open Science Framework (OSF) is an open source software project that facilitates open collaboration in science research across the lifespan of a scientific project. -For instructions on depositing data from OSF to your installation of Dataverse, your researchers can visit http://help.osf.io/m/addons/l/863978-connect-dataverse-to-a-project +For instructions on depositing data from OSF to your installation of Dataverse, your researchers can visit http://help.osf.io/m/addons/l/863978-connect-dataverse-to-a-project RSpace ++++++ @@ -41,6 +41,22 @@ As of this writing only OJS 2.x is supported and instructions for getting starte If you are interested in OJS 3.x supporting deposit from Dataverse, please leave a comment on https://github.com/pkp/pkp-lib/issues/1822 +Renku ++++++ + +Renku is a platform that enables collaborative, reproducible and reusable +(data)science. It allows researchers to automatically record the provenance of +their research results and retain links to imported and exported data. Users +can organize their data in "Datasets", which can be exported to Dataverse via +the command-line interface (CLI). + +Renku dataset documentation: https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset + +Flagship deployment of the Renku platform: https://renkulab.io + +Renku discourse: https://renku.discourse.group/ + + Embedding Data on Websites -------------------------- @@ -58,7 +74,7 @@ Analysis and Computation Data Explorer +++++++++++++ -Data Explorer is a GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. +Data Explorer is a GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. For installation instructions, see the :doc:`external-tools` section. @@ -95,6 +111,20 @@ Researchers can launch Jupyter Notebooks, RStudio, and other computational envir Institutions can self host BinderHub. Dataverse is one of the supported `repository providers `_. 
+Renku ++++++ + +Researchers can import Dataverse datasets into their Renku projects via the +command-line interface (CLI) by using the Dataverse DOI. See the `renku Dataset +documentation +`_ +for details. Currently Dataverse ``>=4.8.x`` is required for the import to work. If you need +support for an earlier version of Dataverse, please get in touch with the Renku team at +`Discourse `_ or `GitHub `_. +The UI implementation of the import is in progress and will be +completed in Q1 2020. + + Discoverability --------------- @@ -116,7 +146,7 @@ Research Data Preservation Archivematica +++++++++++++ -`Archivematica `_ is an integrated suite of open-source tools for processing digital objects for long-term preservation, developed and maintained by Artefactual Systems Inc. Its configurable workflow is designed to produce system-independent, standards-based Archival Information Packages (AIPs) suitable for long-term storage and management. +`Archivematica `_ is an integrated suite of open-source tools for processing digital objects for long-term preservation, developed and maintained by Artefactual Systems Inc. Its configurable workflow is designed to produce system-independent, standards-based Archival Information Packages (AIPs) suitable for long-term storage and management. Sponsored by the `Ontario Council of University Libraries (OCUL) `_, this technical integration enables users of Archivematica to select datasets from connected Dataverse instances and process them for long-term access and digital preservation. For more information and list of known issues, please refer to Artefactual's `release notes `_, `integration documentation `_, and the `project wiki `_. diff --git a/doc/sphinx-guides/source/admin/make-data-count.rst b/doc/sphinx-guides/source/admin/make-data-count.rst index 157b71d3e20..d6e9828a872 100644 --- a/doc/sphinx-guides/source/admin/make-data-count.rst +++ b/doc/sphinx-guides/source/admin/make-data-count.rst @@ -61,9 +61,9 @@ If you haven't already, follow the steps for installing Counter Processor in the Enable Logging for Make Data Count ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To make Dataverse log dataset usage (views and downloads) for Make Data Count, you must set the ``:MDCLogPath`` database setting. See :ref:`MDCLogPath` for details. +To make Dataverse log dataset usage (views and downloads) for Make Data Count, you must set the ``:MDCLogPath`` database setting. See :ref:`:MDCLogPath` for details. -If you wish to start logging in advance of setting up other components, or wish to log without display MDC metrics for any other reason, you can set the optional ``:DisplayMDCMetrics`` database setting to false. See :ref:`DisplayMDCMetrics` for details. +If you wish to start logging in advance of setting up other components, or wish to log without displaying MDC metrics for any other reason, you can set the optional ``:DisplayMDCMetrics`` database setting to false. See :ref:`:DisplayMDCMetrics` for details. After you have your first day of logs, you can process them the next day.
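Both ``:MDCLogPath`` and ``:DisplayMDCMetrics`` are ordinary database settings, so they can be set through the admin settings API. A minimal sketch, where the log directory path is only an example and not a required location:

    # enable MDC logging by pointing :MDCLogPath at a writable directory (example path)
    curl -X PUT -d '/usr/local/glassfish4/glassfish/domains/domain1/logs/mdc' \
         http://localhost:8080/api/admin/settings/:MDCLogPath
    # optionally keep logging while hiding MDC metrics in the UI
    curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:DisplayMDCMetrics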
diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst index 1a41d329b3b..bf89805007b 100644 --- a/doc/sphinx-guides/source/admin/metadatacustomization.rst +++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst @@ -9,18 +9,17 @@ Dataverse has a flexible data-driven metadata system powered by "metadata blocks Introduction ------------ -Before you embark on customizing metadata in Dataverse you should make sure you are aware of the modest amount of customization that is available with the Dataverse web interface. It's possible to hide fields and make field required by clicking "Edit" at the dataverse level, clicking "General Information" and making adjustments under "Metadata Fields" as described in the context of dataset templates in the :doc:`/user/dataverse-management` section of the User Guide. +Before you embark on customizing metadata in Dataverse you should make sure you are aware of the modest amount of customization that is available with the Dataverse web interface. It's possible to hide fields and make field required by clicking "Edit" at the dataverse level, clicking "General Information" and making adjustments under "Metadata Fields" as described in the :ref:`create-dataverse` section of the Dataverse Management page in the User Guide. Much more customization of metadata is possible, but this is an advanced topic so feedback on what is written below is very welcome. The possibilities for customization include: - Editing and adding metadata fields -- Editing and adding instructional text (field label tooltips and text - box watermarks) +- Editing and adding instructional text (field label tooltips and text box watermarks) - Editing and adding controlled vocabularies -- Changing which fields depositors must use in order to save datasets (see also "dataset templates" in the :doc:`/user/dataverse-management` section of the User Guide.) +- Changing which fields depositors must use in order to save datasets (see also :ref:`dataset-templates` section of the User Guide.) - Changing how saved metadata values are displayed in the UI @@ -38,10 +37,8 @@ tab-separated value (TSV). [1]_\ :sup:`,`\ [2]_ While it is technically possible to define more than one metadata block in a TSV file, it is good organizational practice to define only one in each file. -The metadata block TSVs shipped with Dataverse are in `this folder in -the Dataverse github -repo `__ and the corresponding ResourceBundle property files are `here `__. -Human-readable copies are available in `this Google Sheets +The metadata block TSVs shipped with Dataverse are in `/tree/develop/scripts/api/data/metadatablocks +`__ and the corresponding ResourceBundle property files `/tree/develop/src/main/java `__ of the Dataverse GitHub repo. Human-readable copies are available in `this Google Sheets document `__ but they tend to get out of sync with the TSV files, which should be considered authoritative. The Dataverse installation process operates on the TSVs, not the Google spreadsheet. About the metadata block TSV @@ -120,6 +117,17 @@ Each of the three main sections own sets of properties: | | | cause display | | | | problems. | +-----------------------+-----------------------+-----------------------+ +| blockURI | Associates the | The citation | +| | properties in a block | #metadataBlock has | +| | with an external URI. 
| the blockURI | +| | Properties will be | https://dataverse.org | +| | assigned the global | /schema/citation/ | +| | identifier | which assigns a | +| | blockURI in the | global URI to terms | +| | OAI_ORE metadata | such as 'https:// | +| | and archival Bags | dataverse.org/schema/ | +| | | citation/subtitle' | ++-----------------------+-----------------------+-----------------------+ #datasetField (field) properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -316,6 +324,19 @@ Each of the three main sections own sets of properties: | | | existing metadata | | | | block.) | +-----------------------+-----------------------+------------------------+ +| termURI | Specify a global URI | For example, the | +| | identifying this term | existing citation | +| | in an external | #metadataBlock | +| | community vocabulary. | defines the property | +| | | names 'title' | +| | This value overrides | as http://purl.org/dc/ | +| | the default created | terms/title - i.e. | +| | by appending the | indicating that it can | +| | property name to the | be interpreted as the | +| | blockURI defined | Dublin Core term | +| | for the | 'title' | +| | #metadataBlock | | ++-----------------------+-----------------------+------------------------+ #controlledVocabulary (enumerated) properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -391,12 +412,10 @@ FieldType definitions | | newlines. While any HTML is | | | permitted, only a subset of HTML | | | tags will be rendered in the UI. | -| | A `list of supported tags is | -| | included in the Dataverse User | -| | Guide `__ | -| | . | +| | See the | +| | :ref:`supported-html-fields` | +| | section of the Dataset + File | +| | Management page in the User Guide.| +-----------------------------------+-----------------------------------+ | url | If not empty, field must contain | | | a valid URL. | @@ -504,10 +523,10 @@ Setting Up a Dev Environment for Testing You have several options for setting up a dev environment for testing metadata block changes: -- Vagrant: See the :doc:`/developers/tools` section of the Dev Guide. +- Vagrant: See the :doc:`/developers/tools` section of the Developer Guide. - docker-aio: See https://github.com/IQSS/dataverse/tree/develop/conf/docker-aio -- AWS deployment: See the :doc:`/developers/deployment` section of the Dev Guide. -- Full dev environment: See the :doc:`/developers/dev-environment` section of the Dev Guide. +- AWS deployment: See the :doc:`/developers/deployment` section of the Developer Guide. +- Full dev environment: See the :doc:`/developers/dev-environment` section of the Developer Guide. To get a clean environment in Vagrant, you'll be running ``vagrant destroy``. In Docker, you'll use ``docker rm``. For a full dev environment or AWS installation, you might find ``rebuild`` and related scripts at ``scripts/deploy/phoenix.dataverse.org`` useful. @@ -586,7 +605,7 @@ controlledvocabulary.language.marathi_(marathi)=Marathi (Mar\u0101\u1E6Dh\u012B) Enabling a Metadata Block ~~~~~~~~~~~~~~~~~~~~~~~~~ -Running a curl command like "load" example above should make the new custom metadata block available within the system but in order to start using the fields you must either enable it from the GUI (see "General Information" in the :doc:`/user/dataverse-management` section of the User Guide) or by running a curl command like the one below using a superuser API token. 
In the example below we are enabling the "journal" and "geospatial" metadata blocks for the root dataverse: +Running a curl command like "load" example above should make the new custom metadata block available within the system but in order to start using the fields you must either enable it from the UI (see :ref:`general-information` section of Dataverse Management in the User Guide) or by running a curl command like the one below using a superuser API token. In the example below we are enabling the "journal" and "geospatial" metadata blocks for the root dataverse: ``curl -H "X-Dataverse-key:$API_TOKEN" -X POST -H "Content-type:application/json" -d "[\"journal\",\"geospatial\"]" http://localhost:8080/api/dataverses/:root/metadatablocks`` @@ -601,7 +620,7 @@ configuration, including any enabled metadata schemas: ``curl http://localhost:8080/api/admin/index/solr/schema`` -For convenience and automation you can download and consider running :download:`updateSchemaMDB.sh <../../../../conf/solr/7.3.1/updateSchemaMDB.sh>`. It uses the API endpoint above and writes schema files to the filesystem (so be sure to run it on the Solr server itself as the Unix user who owns the Solr files) and then triggers a Solr reload. +For convenience and automation you can download and consider running :download:`updateSchemaMDB.sh <../../../../conf/solr/7.7.2/updateSchemaMDB.sh>`. It uses the API endpoint above and writes schema files to the filesystem (so be sure to run it on the Solr server itself as the Unix user who owns the Solr files) and then triggers a Solr reload. By default, it will download from Dataverse at `http://localhost:8080` and reload Solr at `http://localhost:8983`. You may use the following environment variables with this script or mix'n'match with options: @@ -614,13 +633,13 @@ Environment variable Option Description E `UNBLOCK_KEY` `-u` If your installation has a blocked admin API *xyz* or */secrets/unblock.key* endpoint, you can provide either the key itself or a path to a keyfile -`TARGET` `-t` Provide the config directory of your Solr core */usr/local/solr/solr-7.3.1/server/solr/collection1/conf* +`TARGET` `-t` Provide the config directory of your Solr core */usr/local/solr/solr-7.7.2/server/solr/collection1/conf* "collection1" ==================== ====== =============================================== ========================================================= See the :doc:`/installation/prerequisites/` section of the Installation Guide for a suggested location on disk for the Solr schema file. -Please note that if you are going to make a pull request updating ``conf/solr/7.3.1/schema.xml`` with fields you have added, you should first load all the custom metadata blocks in ``scripts/api/data/metadatablocks`` (including ones you don't care about) to create a complete list of fields. +Please note that if you are going to make a pull request updating ``conf/solr/7.7.2/schema.xml`` with fields you have added, you should first load all the custom metadata blocks in ``scripts/api/data/metadatablocks`` (including ones you don't care about) to create a complete list of fields. Reloading a Metadata Block -------------------------- @@ -631,7 +650,7 @@ As mentioned above, changes to metadata blocks that ship with Dataverse will be Great care must be taken when reloading a metadata block. Matching is done on field names (or identifiers and then names in the case of controlled vocabulary values) so it's easy to accidentally create duplicate fields. 
-The ability to reload metadata blocks means that SQL update scripts don't need to be written for these changes. See also the :doc:`/developers/sql-upgrade-scripts` section of the Dev Guide. +The ability to reload metadata blocks means that SQL update scripts don't need to be written for these changes. See also the :doc:`/developers/sql-upgrade-scripts` section of the Developer Guide. Tips from the Dataverse Community --------------------------------- diff --git a/doc/sphinx-guides/source/admin/metadataexport.rst b/doc/sphinx-guides/source/admin/metadataexport.rst index 1d1deb37a2f..b9036363cac 100644 --- a/doc/sphinx-guides/source/admin/metadataexport.rst +++ b/doc/sphinx-guides/source/admin/metadataexport.rst @@ -9,7 +9,7 @@ Automatic Exports Publishing a dataset automatically starts a metadata export job, that will run in the background, asynchronously. Once completed, it will make the dataset metadata exported and cached in all the supported formats listed under :ref:`Supported Metadata Export Formats ` in the :doc:`/user/dataset-management` section of the User Guide. -A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the "Application Timers" section of this guide for more information) +A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the :doc:`timers` section of this Admin Guide for more information.) Batch exports through the API ----------------------------- diff --git a/doc/sphinx-guides/source/admin/monitoring.rst b/doc/sphinx-guides/source/admin/monitoring.rst index 84d6f31e6d7..a901a357907 100644 --- a/doc/sphinx-guides/source/admin/monitoring.rst +++ b/doc/sphinx-guides/source/admin/monitoring.rst @@ -103,6 +103,8 @@ actionlogrecord There is a database table called ``actionlogrecord`` that captures events that may be of interest. See https://github.com/IQSS/dataverse/issues/2729 for more discussion around this table. +.. _edit-draft-versions-logging: + Edit Draft Versions Logging --------------------------- diff --git a/doc/sphinx-guides/source/admin/reporting-tools.rst b/doc/sphinx-guides/source/admin/reporting-tools-and-queries.rst similarity index 57% rename from doc/sphinx-guides/source/admin/reporting-tools.rst rename to doc/sphinx-guides/source/admin/reporting-tools-and-queries.rst index c309744be63..197339d767d 100644 --- a/doc/sphinx-guides/source/admin/reporting-tools.rst +++ b/doc/sphinx-guides/source/admin/reporting-tools-and-queries.rst @@ -1,18 +1,19 @@ .. role:: fixedwidthplain -Reporting Tools -=============== +Reporting Tools and Common Queries +================================== -Reporting tools created by members of the Dataverse community. +Reporting tools and queries created by members of the Dataverse community. .. contents:: Contents: :local: * Matrix (): Collaboration Matrix is a visualization showing the connectedness and collaboration between authors and their affiliations. Visit https://rin.lipi.go.id/matrix/ to play with a production installation. 
- * Dataverse Web Report (): Creates interactive charts showing data extracted from the Dataverse Excel Report * Dataverse Reports for Texas Digital Library (): A python3-based tool to generate and email statistical reports from Dataverse (https://dataverse.org/) using the native API and database queries. * dataverse-metrics (): Aggregates and visualizes metrics for installations of Dataverse around the world or a single Dataverse installation. + +* Useful queries from the Dataverse Community (): A community-generated and maintained document of postgresql queries for getting information about users and dataverses/datasets/files in your Dataverse installation. If you are trying to find out some information from Dataverse, chances are that someone else has had the same questions and it's now listed in this document. If it's not listed, please feel free to add it to the document. \ No newline at end of file diff --git a/doc/sphinx-guides/source/admin/timers.rst b/doc/sphinx-guides/source/admin/timers.rst index 3c1ff40f935..733dd7fbc1c 100644 --- a/doc/sphinx-guides/source/admin/timers.rst +++ b/doc/sphinx-guides/source/admin/timers.rst @@ -24,7 +24,7 @@ The following JVM option instructs the application to act as the dedicated timer **IMPORTANT:** Note that this option is automatically set by the Dataverse installer script. That means that when **configuring a multi-server cluster**, it will be the responsibility of the installer to remove the option from the :fixedwidthplain:`domain.xml` of every node except the one intended to be the timer server. We also recommend that the following entry in the :fixedwidthplain:`domain.xml`: ```` is changed back to ```` on all the non-timer server nodes. Similarly, this option is automatically set by the installer script. Changing it back to the default setting on a server that doesn't need to run the timer will prevent a potential race condition, where multiple servers try to get a lock on the timer database. -**Note** that for the timer to work, the version of the PostgreSQL JDBC driver your instance is using must match the version of your PostgreSQL database. See the 'Timer not working' section of the :doc:`/admin/troubleshooting` guide. +**Note** that for the timer to work, the version of the PostgreSQL JDBC driver your instance is using must match the version of your PostgreSQL database. See the :ref:`timer-not-working` section of Troubleshooting in the Admin Guide. Harvesting Timers ----------------- diff --git a/doc/sphinx-guides/source/admin/troubleshooting.rst b/doc/sphinx-guides/source/admin/troubleshooting.rst index eb7872bac20..bf0ffb508a6 100644 --- a/doc/sphinx-guides/source/admin/troubleshooting.rst +++ b/doc/sphinx-guides/source/admin/troubleshooting.rst @@ -78,7 +78,9 @@ Note that it may or may not work on your system, so it is provided as an example .. literalinclude:: ../_static/util/clear_timer.sh -Timer not working +.. _timer-not-working: + +Timer Not Working ----------------- Dataverse relies on EJB timers to perform scheduled tasks: harvesting from remote servers, updating the local OAI sets and running metadata exports. (See :doc:`timers` for details.) If these scheduled jobs are not running on your server, this may be the result of the incompatibility between the version of PostgreSQL database you are using, and PostgreSQL JDBC driver in use by your instance of Glassfish. 
The symptoms: diff --git a/doc/sphinx-guides/source/admin/user-administration.rst b/doc/sphinx-guides/source/admin/user-administration.rst index 764de6977ab..d9907a94f43 100644 --- a/doc/sphinx-guides/source/admin/user-administration.rst +++ b/doc/sphinx-guides/source/admin/user-administration.rst @@ -53,7 +53,7 @@ The app will send a standard welcome email with a URL the user can click, which, Should users' URL token expire, they will see a "Verify Email" button on the account information page to send another URL. -Sysadmins can determine which users have verified their email addresses by looking for the presence of the value ``emailLastConfirmed`` in the JSON output from listing users (see the "Admin" section of the :doc:`/api/native-api`). As mentioned in the :doc:`/user/account` section of the User Guide, the email addresses for Shibboleth users are re-confirmed on every login. +Sysadmins can determine which users have verified their email addresses by looking for the presence of the value ``emailLastConfirmed`` in the JSON output from listing users (see :ref:`admin` section of Native API in the API Guide). As mentioned in the :doc:`/user/account` section of the User Guide, the email addresses for Shibboleth users are re-confirmed on every login. Deleting an API Token --------------------- diff --git a/doc/sphinx-guides/source/api/apps.rst b/doc/sphinx-guides/source/api/apps.rst index 44ffb1ead71..6fca5891202 100755 --- a/doc/sphinx-guides/source/api/apps.rst +++ b/doc/sphinx-guides/source/api/apps.rst @@ -30,7 +30,7 @@ File Previewers File Previewers are tools that display the content of files - including audio, html, Hypothes.is annotations, images, PDF, text, video - allowing them to be viewed without downloading. -https://github.com/QualitativeDataRepository/dataverse-previewers +https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers TwoRavens ~~~~~~~~~ diff --git a/doc/sphinx-guides/source/api/dataaccess.rst b/doc/sphinx-guides/source/api/dataaccess.rst index eca43ba1c5e..0e2e338404d 100755 --- a/doc/sphinx-guides/source/api/dataaccess.rst +++ b/doc/sphinx-guides/source/api/dataaccess.rst @@ -112,6 +112,7 @@ Value Description ID Exports file with specific file metadata ``ID``. ============== =========== +.. _data-variable-metadata-access: Data Variable Metadata Access ----------------------------- diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index d226781da30..67b4317e1de 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -53,9 +53,9 @@ Next you need to figure out the alias or database id of the "parent" dataverse i .. code-block:: bash export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export PARENT=root export SERVER_URL=https://demo.dataverse.org - + export PARENT=root + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$PARENT --upload-file dataverse-complete.json The fully expanded example above (without environment variables) looks like this: @@ -64,42 +64,83 @@ The fully expanded example above (without environment variables) looks like this curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root --upload-file dataverse-complete.json -You should expect a 201 ("CREATED") response and JSON indicating the database id that has been assigned to your newly created dataverse. 
+You should expect an HTTP 200 response and JSON beginning with "status":"OK" followed by a representation of the newly-created dataverse. .. _view-dataverse: View a Dataverse ~~~~~~~~~~~~~~~~ -|CORS| View data about the dataverse identified by ``$id``. ``$id`` can be the id number of the dataverse, its identifier (a.k.a. alias), or the special value ``:root`` for the root dataverse. +|CORS| View a JSON representation of the dataverse identified by ``$id``. ``$id`` can be the database ID of the dataverse, its alias, or the special value ``:root`` for the root dataverse. + +To view a published dataverse: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl $SERVER_URL/api/dataverses/$ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl https://demo.dataverse.org/api/dataverses/root + +To view an unpublished dataverse: -``curl $SERVER_URL/api/dataverses/$id`` +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root Delete a Dataverse ~~~~~~~~~~~~~~~~~~ -In order to delete a dataverse you must first delete or move all of its contents elsewhere. +Before you may delete a dataverse you must first delete or move all of its contents elsewhere. + +Deletes the dataverse whose database ID or alias is given: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X DELETE $SERVER_URL/api/dataverses/$ID + +The fully expanded example above (without environment variables) looks like this: -Deletes the dataverse whose ID is given: +.. code-block:: bash -``curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE $SERVER_URL/api/dataverses/$id`` + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X DELETE https://demo.dataverse.org/api/dataverses/root .. _show-contents-of-a-dataverse-api: Show Contents of a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| Lists all the dataverses and datasets directly under a dataverse (direct children only). You must specify the "alias" of a dataverse or its database id. If you specify your API token and have access, unpublished dataverses and datasets will be included in the listing. +|CORS| Lists all the dataverses and datasets directly under a dataverse (direct children only, not recursive) specified by database id or alias. If you pass your API token and have access, unpublished dataverses and datasets will be included in the response. .. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below. .. 
code-block:: bash export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export ALIAS=root export SERVER_URL=https://demo.dataverse.org - - curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ALIAS/contents + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/contents The fully expanded example above (without environment variables) looks like this: @@ -110,45 +151,104 @@ The fully expanded example above (without environment variables) looks like this Report the data (file) size of a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Shows the combined size in bytes of all the files uploaded into the dataverse ``id``. :: +Shows the combined size in bytes of all the files uploaded into the dataverse ``id``: -``curl -H "X-Dataverse-key:$API_TOKEN" http://$SERVER_URL/api/dataverses/$id/storagesize`` +.. code-block:: bash -Both published and unpublished files will be counted, in the dataverse specified, and in all its sub-dataverses, recursively. -By default, only the archival files are counted - i.e., the files uploaded by users (plus the tab-delimited versions generated for tabular data files on ingest). If the optional argument ``includeCached=true`` is specified, the API will also add the sizes of all the extra files generated and cached by Dataverse - the resized thumbnail versions for image files, the metadata exports for published datasets, etc. + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/storagesize + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/storagesize +The size of published and unpublished files will be summed both in the dataverse specified and beneath all its sub-dataverses, recursively. +By default, only the archival files are counted - i.e., the files uploaded by users (plus the tab-delimited versions generated for tabular data files on ingest). If the optional argument ``includeCached=true`` is specified, the API will also add the sizes of all the extra files generated and cached by Dataverse - the resized thumbnail versions for image files, the metadata exports for published datasets, etc. List Roles Defined in a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -All the roles defined directly in the dataverse identified by ``id``:: +All the roles defined directly in the dataverse identified by ``id``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/roles + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - GET http://$SERVER/api/dataverses/$id/roles?key=$apiKey + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/roles List Facets Configured for a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| List all the facets for a given dataverse ``id``. :: +|CORS| List all the facets for a given dataverse ``id``: + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/facets + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - GET http://$SERVER/api/dataverses/$id/facets?key=$apiKey + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/facets Set Facets for a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Assign search facets for a given dataverse with alias ``$alias`` +Assign search facets for a given dataverse identified by ``id``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$ID/facets --upload-file facets.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash -``curl -H "X-Dataverse-key: $apiKey" -X POST http://$server/api/dataverses/$alias/facets --upload-file facets.json`` + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/facets --upload-file facets.json Where ``facets.json`` contains a JSON encoded list of metadata keys (e.g. ``["authorName","authorAffiliation"]``). Create a New Role in a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Creates a new role under dataverse ``id``. Needs a JSON file with the role description: - POST http://$SERVER/api/dataverses/$id/roles?key=$apiKey - -POSTed JSON example:: +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$ID/roles --upload-file roles.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/roles --upload-file roles.json + +Where ``roles.json`` looks like this:: { "alias": "sys1", @@ -164,29 +264,66 @@ POSTed JSON example:: List Role Assignments in a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -List all the role assignments at the given dataverse:: +List all the role assignments at the given dataverse: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/assignments + +The fully expanded example above (without environment variables) looks like this: + +.. 
code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/assignments - GET http://$SERVER/api/dataverses/$id/assignments?key=$apiKey - Assign Default Role to User Creating a Dataset in a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Assign a default role to a user creating a dataset in a dataverse ``id`` where ``roleAlias`` is the database alias of the role to be assigned:: +Assign a default role to a user creating a dataset in a dataverse ``id`` where ``roleAlias`` is the database alias of the role to be assigned: - PUT http://$SERVER/api/dataverses/$id/defaultContributorRole/$roleAlias?key=$apiKey - -Note: You may use "none" as the ``roleAlias``. This will prevent a user who creates a dataset from having any role on that dataset. It is not recommended for dataverses with human contributors. +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + export ROLE_ALIAS=curator + + curl -H X-Dataverse-key:$API_TOKEN -X PUT $SERVER_URL/api/dataverses/$ID/defaultContributorRole/$ROLE_ALIAS + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X PUT https://demo.dataverse.org/api/dataverses/root/defaultContributorRole/curator + +Note: You may use "none" as the ``ROLE_ALIAS``. This will prevent a user who creates a dataset from having any role on that dataset. It is not recommended for dataverses with human contributors. .. _assign-role-on-a-dataverse-api: Assign a New Role on a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Assigns a new role, based on the POSTed JSON. :: +Assigns a new role, based on the POSTed JSON: - POST http://$SERVER/api/dataverses/$id/assignments?key=$apiKey +.. code-block:: bash -POSTed JSON example:: + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X POST -H "Content-Type: application/json" $SERVER_URL/api/dataverses/$ID/assignments --upload-file role.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST -H "Content-Type: application/json" https://demo.dataverse.org/api/dataverses/root/assignments --upload-file role.json + +POSTed JSON example (the content of ``role.json`` file):: { "assignee": "@uma", @@ -198,14 +335,27 @@ POSTed JSON example:: Delete Role Assignment from a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Delete the assignment whose id is ``$id``:: +Delete the assignment whose id is ``$id``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + export ASSIGNMENT_ID=6 + + curl -H X-Dataverse-key:$API_TOKEN -X DELETE $SERVER_URL/api/dataverses/$ID/assignments/$ASSIGNMENT_ID + +The fully expanded example above (without environment variables) looks like this: + +.. 
code-block:: bash - DELETE http://$SERVER/api/dataverses/$id/assignments/$id?key=$apiKey + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X DELETE https://demo.dataverse.org/api/dataverses/root/assignments/6 List Metadata Blocks Defined on a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| Get the metadata blocks defined on a dataverse which determine which field are available to authors when they create and edit datasets within that dataverse. This feature is described under "General Information" in the :doc:`/user/dataverse-management` section of the User Guide. +|CORS| Get the metadata blocks defined on a dataverse which determine which field are available to authors when they create and edit datasets within that dataverse. This feature is described in :ref:`general-information` section of Dataverse Management of the User Guide. Please note that an API token is only required if the dataverse has not been published. @@ -214,10 +364,10 @@ Please note that an API token is only required if the dataverse has not been pub .. code-block:: bash export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export ALIAS=root export SERVER_URL=https://demo.dataverse.org + export ID=root - curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ALIAS/metadatablocks + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/metadatablocks The fully expanded example above (without environment variables) looks like this: @@ -239,10 +389,10 @@ The metadata blocks that are available with a default installation of Dataverse .. code-block:: bash export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export ALIAS=root export SERVER_URL=https://demo.dataverse.org - - curl -H X-Dataverse-key:$API_TOKEN -X POST -H \"Content-type:application/json\" --upload-file define-metadatablocks.json $SERVER_URL/api/dataverses/$ALIAS/metadatablocks + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$ID/metadatablocks -H \"Content-type:application/json\" --upload-file define-metadatablocks.json The fully expanded example above (without environment variables) looks like this: @@ -253,19 +403,43 @@ The fully expanded example above (without environment variables) looks like this Determine if a Dataverse Inherits Its Metadata Blocks from Its Parent ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Get whether the dataverse is a metadata block root, or does it uses its parent blocks:: +Get whether the dataverse is a metadata block root, or does it uses its parent blocks: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/dataverses/$ID/metadatablocks/isRoot + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - GET http://$SERVER/api/dataverses/$id/metadatablocks/isRoot?key=$apiKey + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/dataverses/root/metadatablocks/isRoot Configure a Dataverse to Inherit Its Metadata Blocks from Its Parent ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Set whether the dataverse is a metadata block root, or does it uses its parent blocks. Possible -values are ``true`` and ``false`` (both are valid JSON expressions). :: +values are ``true`` and ``false`` (both are valid JSON expressions): + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=root + + curl -H X-Dataverse-key:$API_TOKEN -X PUT $SERVER_URL/api/dataverses/$ID/metadatablocks/isRoot + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - PUT http://$SERVER/api/dataverses/$id/metadatablocks/isRoot?key=$apiKey + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X PUT https://demo.dataverse.org/api/dataverses/root/metadatablocks/isRoot -.. note:: Previous endpoints ``GET http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` and ``POST http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported. +.. note:: Previous endpoints ``$SERVER/api/dataverses/$id/metadatablocks/:isRoot`` and ``POST http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported. .. _create-dataset-command: @@ -297,24 +471,37 @@ Next you need to figure out the alias or database id of the "parent" dataverse i export PARENT=root export SERVER_URL=https://demo.dataverse.org - curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$PARENT/datasets --upload-file dataset-finch1.json + curl -H X-Dataverse-key:$API_TOKEN -X POST "$SERVER_URL/api/dataverses/$PARENT/datasets" --upload-file dataset-finch1.json The fully expanded example above (without the environment variables) looks like this: .. code-block:: bash - curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/datasets --upload-file dataset-finch1.json + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/dataverses/root/datasets" --upload-file "dataset-finch1.json" -You should expect a 201 ("CREATED") response and JSON indicating the database ID and Persistent ID (PID such as DOI or Handle) that has been assigned to your newly created dataset. +You should expect an HTTP 200 ("OK") response and JSON indicating the database ID and Persistent ID (PID such as DOI or Handle) that has been assigned to your newly created dataset. Import a Dataset into a Dataverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. note:: This action requires a Dataverse account with super-user permissions. -To import a dataset with an existing persistent identifier (PID), the dataset's metadata should be prepared in Dataverse's native JSON format. The PID is provided as a parameter at the URL. The following line imports a dataset with the PID ``PERSISTENT_IDENTIFIER`` to Dataverse, and then releases it:: +To import a dataset with an existing persistent identifier (PID), the dataset's metadata should be prepared in Dataverse's native JSON format. The PID is provided as a parameter at the URL. The following line imports a dataset with the PID ``PERSISTENT_IDENTIFIER`` to Dataverse, and then releases it: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export DATAVERSE_ID=root + export PERSISTENT_IDENTIFIER=doi:ZZ7/MOSEISLEYDB94 + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:import?pid=$PERSISTENT_IDENTIFIER&release=yes --upload-file dataset.json + +The fully expanded example above (without environment variables) looks like this: + +.. 
code-block:: bash - curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/dataverses/$DV_ALIAS/datasets/:import?pid=$PERSISTENT_IDENTIFIER&release=yes --upload-file dataset.json + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/datasets/:import?pid=doi:ZZ7/MOSEISLEYDB94&release=yes --upload-file dataset.json The ``pid`` parameter holds a persistent identifier (such as a DOI or Handle). The import will fail if no PID is provided, or if the provided PID fails validation. @@ -340,9 +527,22 @@ Import a Dataset into a Dataverse with a DDI file .. note:: This action requires a Dataverse account with super-user permissions. -To import a dataset with an existing persistent identifier (PID), you have to provide the PID as a parameter at the URL. The following line imports a dataset with the PID ``PERSISTENT_IDENTIFIER`` to Dataverse, and then releases it:: +To import a dataset with an existing persistent identifier (PID), you have to provide the PID as a parameter at the URL. The following line imports a dataset with the PID ``PERSISTENT_IDENTIFIER`` to Dataverse, and then releases it: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export DATAVERSE_ID=root + export PERSISTENT_IDENTIFIER=doi:ZZ7/MOSEISLEYDB94 + + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:importddi?pid=$PERSISTENT_IDENTIFIER&release=yes --upload-file ddi_dataset.xml + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/dataverses/$DV_ALIAS/datasets/:importddi?pid=$PERSISTENT_IDENTIFIER&release=yes --upload-file ddi_dataset.xml + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/datasets/:importddi?pid=doi:ZZ7/MOSEISLEYDB94&release=yes --upload-file ddi_dataset.xml The optional ``pid`` parameter holds a persistent identifier (such as a DOI or Handle). The import will fail if the provided PID fails validation. @@ -367,10 +567,10 @@ In order to publish a dataverse, you must know either its "alias" (which the GUI .. code-block:: bash export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export ALIAS=root export SERVER_URL=https://demo.dataverse.org + export ID=root - curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$ALIAS/actions/:publish + curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$ID/actions/:publish The fully expanded example above (without environment variables) looks like this: @@ -398,49 +598,159 @@ Get JSON Representation of a Dataset .. note:: Datasets can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the dataset is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. - Example: Getting the dataset whose DOI is *10.5072/FK2/J8SJZB* :: +Example: Getting the dataset whose DOI is *10.5072/FK2/J8SJZB*: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB + + curl $SERVER_URL/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER + +The fully expanded example above (without environment variables) looks like this: + +.. 
code-block:: bash + + curl https://demo.dataverse.org/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB + +Getting its draft version: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB + + curl $SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_IDENTIFIER + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - curl http://$SERVER/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB + curl https://demo.dataverse.org/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB - fully expanded:: - curl http://localhost:8080/api/datasets/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB +|CORS| Show the dataset whose id is passed: - Getting its draft version:: +.. code-block:: bash - curl http://$SERVER/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB + export SERVER_URL=https://demo.dataverse.org + export ID=408730 - fully expanded:: + curl $SERVER_URL/api/datasets/$ID - curl http://localhost:8080/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB +The fully expanded example above (without environment variables) looks like this: +.. code-block:: bash -|CORS| Show the dataset whose id is passed:: + curl https://demo.dataverse.org/api/datasets/408730 - GET http://$SERVER/api/datasets/$id?key=$apiKey +The dataset id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``). List Versions of a Dataset ~~~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| List versions of the dataset:: +|CORS| List versions of the dataset: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl $SERVER_URL/api/datasets/$ID/versions + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl https://demo.dataverse.org/api/datasets/24/versions + +It returns a list of versions with their metadata and file lists: + +.. code-block:: bash + + { + "status": "OK", + "data": [ + { + "id": 7, + "datasetId": 24, + "datasetPersistentId": "doi:10.5072/FK2/U6AEZM", + "storageIdentifier": "file://10.5072/FK2/U6AEZM", + "versionNumber": 2, + "versionMinorNumber": 0, + "versionState": "RELEASED", + "lastUpdateTime": "2015-04-20T09:58:35Z", + "releaseTime": "2015-04-20T09:58:35Z", + "createTime": "2015-04-20T09:57:32Z", + "license": "CC0", + "termsOfUse": "CC0 Waiver", + "termsOfAccess": "You need to request for access.", + "fileAccessRequest": true, + "metadataBlocks": {...}, + "files": [...] + }, + { + "id": 6, + "datasetId": 24, + "datasetPersistentId": "doi:10.5072/FK2/U6AEZM", + "storageIdentifier": "file://10.5072/FK2/U6AEZM", + "versionNumber": 1, + "versionMinorNumber": 0, + "versionState": "RELEASED", + "UNF": "UNF:6:y4dtFxWhBaPM9K/jlPPuqg==", + "lastUpdateTime": "2015-04-20T09:56:34Z", + "releaseTime": "2015-04-20T09:56:34Z", + "createTime": "2015-04-20T09:43:45Z", + "license": "CC0", + "termsOfUse": "CC0 Waiver", + "termsOfAccess": "You need to request for access.", + "fileAccessRequest": true, + "metadataBlocks": {...}, + "files": [...] + } + ] + } - GET http://$SERVER/api/datasets/$id/versions?key=$apiKey Get Version of a Dataset ~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| Show a version of the dataset. 
The output includes any metadata blocks the dataset might have:: +|CORS| Show a version of the dataset. The output includes any metadata blocks the dataset might have: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export VERSION=1.0 + + curl $SERVER_URL/api/datasets/$ID/versions/$VERSION + +The fully expanded example above (without environment variables) looks like this: - GET http://$SERVER/api/datasets/$id/versions/$versionNumber?key=$apiKey +.. code-block:: bash + + curl https://demo.dataverse.org/api/datasets/24/versions/1.0 .. _export-dataset-metadata-api: Export Metadata of a Dataset in Various Formats ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| Export the metadata of the current published version of a dataset in various formats see Note below:: +|CORS| Export the metadata of the current published version of a dataset in various formats see Note below: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB + export METADATA_FORMAT=ddi + + curl $SERVER_URL/api/datasets/export?exporter=$METADATA_FORMAT&persistentId=PERSISTENT_IDENTIFIER + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId + curl https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi:10.5072/FK2/J8SJZB .. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite``, ``oai_datacite`` and ``dataverse_json``. Descriptive names can be found under :ref:`metadata-export-formats` in the User Guide. @@ -458,38 +768,99 @@ Both forms are valid according to Google's Structured Data Testing Tool at https List Files in a Dataset ~~~~~~~~~~~~~~~~~~~~~~~ -|CORS| Lists all the file metadata, for the given dataset and version:: +|CORS| Lists all the file metadata, for the given dataset and version: - GET http://$SERVER/api/datasets/$id/versions/$versionId/files?key=$apiKey +.. code-block:: bash -List All Metadata Blocks for a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export VERSION=1.0 -|CORS| Lists all the metadata blocks and their content, for the given dataset and version:: + curl $SERVER_URL/api/datasets/$ID/versions/$VERSION/files - GET http://$SERVER/api/datasets/$id/versions/$versionId/metadata?key=$apiKey +The fully expanded example above (without environment variables) looks like this: -List Single Metadata Block for a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: bash -|CORS| Lists the metadata block block named `blockname`, for the given dataset and version:: + curl https://demo.dataverse.org/api/datasets/24/versions/1.0/files - GET http://$SERVER/api/datasets/$id/versions/$versionId/metadata/$blockname?key=$apiKey +List All Metadata Blocks for a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Update Metadata For a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +|CORS| Lists all the metadata blocks and their content, for the given dataset and version: -Updates the metadata for a dataset. If a draft of the dataset already exists, the metadata of that draft is overwritten; otherwise, a new draft is created with this metadata. +.. code-block:: bash -You must download a JSON representation of the dataset, edit the JSON you download, and then send the updated JSON to the Dataverse server. 
+ export SERVER_URL=https://demo.dataverse.org + export ID=24 + export VERSION=1.0 -For example, after making your edits, your JSON file might look like :download:`dataset-update-metadata.json <../_static/api/dataset-update-metadata.json>` which you would send to Dataverse like this:: + curl $SERVER_URL/api/datasets/$ID/versions/$VERSION/metadata - curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PID --upload-file dataset-update-metadata.json +The fully expanded example above (without environment variables) looks like this: -Note that in the example JSON file above, there is a single JSON object with ``metadataBlocks`` as a key. When you download a representation of your dataset in JSON format, the ``metadataBlocks`` object you need is nested inside another object called ``json``. To extract just the ``metadataBlocks`` key when downloading a JSON representation, you can use a tool such as ``jq`` like this:: +.. code-block:: bash + + curl https://demo.dataverse.org/api/datasets/24/versions/1.0/metadata + +List Single Metadata Block for a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +|CORS| Lists the metadata block named `METADATA_BLOCK`, for the given dataset and version: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export VERSION=1.0 + export METADATA_BLOCK=citation + + curl $SERVER_URL/api/datasets/$ID/versions/$VERSION/metadata/$METADATA_BLOCK + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl https://demo.dataverse.org/api/datasets/24/versions/1.0/metadata/citation + +Update Metadata For a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Updates the metadata for a dataset. If a draft of the dataset already exists, the metadata of that draft is overwritten; otherwise, a new draft is created with this metadata. + +You must download a JSON representation of the dataset, edit the JSON you download, and then send the updated JSON to the Dataverse server. + +For example, after making your edits, your JSON file might look like :download:`dataset-update-metadata.json <../_static/api/dataset-update-metadata.json>` which you would send to Dataverse like this: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_IDENTIFIER --upload-file dataset-update-metadata.json + +The fully expanded example above (without environment variables) looks like this: - curl -H "X-Dataverse-key: $API_TOKEN" $SERVER_URL/api/datasets/:persistentId/versions/:latest?persistentId=$PID | jq '.data | {metadataBlocks: .metadataBlocks}' > dataset-update-metadata.json +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/BCCP9Z --upload-file dataset-update-metadata.json + +Note that in the example JSON file above, there is a single JSON object with ``metadataBlocks`` as a key. When you download a representation of your dataset in JSON format, the ``metadataBlocks`` object you need is nested inside another object called ``json``. To extract just the ``metadataBlocks`` key when downloading a JSON representation, you can use a tool such as ``jq`` like this: + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z + + curl -H "X-Dataverse-key: $API_TOKEN" $SERVER_URL/api/datasets/:persistentId/versions/:latest?persistentId=$PERSISTENT_IDENTIFIER | jq '.data | {metadataBlocks: .metadataBlocks}' > dataset-update-metadata.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/datasets/:persistentId/versions/:latest?persistentId=doi:10.5072/FK2/BCCP9Z | jq '.data | {metadataBlocks: .metadataBlocks}' > dataset-update-metadata.json Now that the resulting JSON file only contains the ``metadataBlocks`` key, you can edit the JSON such as with ``vi`` in the example below:: @@ -500,25 +871,60 @@ Now that you've made edits to the metadata in your JSON file, you can send it to Edit Dataset Metadata ~~~~~~~~~~~~~~~~~~~~~ -Alternatively to replacing an entire dataset version with its JSON representation you may add data to dataset fields that are blank or accept multiple values with the following :: +Alternatively to replacing an entire dataset version with its JSON representation you may add data to dataset fields that are blank or accept multiple values with the following: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/editMetadata/?persistentId=$PERSISTENT_IDENTIFIER --upload-file dataset-add-metadata.json - curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/editMetadata/?persistentId=$PID --upload-file dataset-add-metadata.json +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash -You may also replace existing metadata in dataset fields with the following (adding the parameter replace=true) :: + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/editMetadata/?persistentId=doi:10.5072/FK2/BCCP9Z --upload-file dataset-add-metadata.json + +You may also replace existing metadata in dataset fields with the following (adding the parameter replace=true): + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/editMetadata?persistentId=$PERSISTENT_IDENTIFIER&replace=true --upload-file dataset-update-metadata.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/editMetadata/?persistentId=doi:10.5072/FK2/BCCP9Z&replace=true --upload-file dataset-update-metadata.json - curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/editMetadata?persistentId=$PID&replace=true --upload-file dataset-update-metadata.json - For these edits your JSON file need only include those dataset fields which you would like to edit. 
A sample JSON file may be downloaded here: :download:`dataset-edit-metadata-sample.json <../_static/api/dataset-edit-metadata-sample.json>` Delete Dataset Metadata ~~~~~~~~~~~~~~~~~~~~~~~ -You may delete some of the metadata of a dataset version by supplying a file with a JSON representation of dataset fields that you would like to delete with the following :: +You may delete some of the metadata of a dataset version by supplying a file with a JSON representation of dataset fields that you would like to delete with the following: - curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/deleteMetadata/?persistentId=$PID --upload-file dataset-delete-author-metadata.json - -For these deletes your JSON file must include an exact match of those dataset fields which you would like to delete. A sample JSON file may be downloaded here: :download:`dataset-delete-author-metadata.json <../_static/api/dataset-delete-author-metadata.json>` +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/:persistentId/deleteMetadata/?persistentId=$PERSISTENT_IDENTIFIER --upload-file dataset-delete-author-metadata.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/deleteMetadata/?persistentId=doi:10.5072/FK2/BCCP9Z --upload-file dataset-delete-author-metadata.json +For these deletes your JSON file must include an exact match of those dataset fields which you would like to delete. A sample JSON file may be downloaded here: :download:`dataset-delete-author-metadata.json <../_static/api/dataset-delete-author-metadata.json>` .. _publish-dataset-api: @@ -538,13 +944,13 @@ If this is the first version of the dataset, its version number will be set to ` export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB export MAJOR_OR_MINOR=major - curl -H X-Dataverse-key:$API_TOKEN -X POST \""$SERVER_URL/api/datasets/:persistentId/actions/:publish?persistentId=$PERSISTENT_ID&type=$MAJOR_OR_MINOR"\" + curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/actions/:publish?persistentId=$PERSISTENT_ID&type=$MAJOR_OR_MINOR" The fully expanded example above (without environment variables) looks like this: .. code-block:: bash - curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST "https://demo.dataverse.org/api/datasets/:persistentId/actions/:publish?persistentId=doi:10.5072/FK2/J8SJZB&type=major" + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/actions/:publish?persistentId=doi:10.5072/FK2/J8SJZB&type=major" The quotes around the URL are required because there is more than one query parameter separated by an ampersand (``&``), which has special meaning to Unix shells such as Bash. Putting the ``&`` in quotes ensures that "type" is interpreted as one of the query parameters. @@ -555,53 +961,190 @@ You should expect JSON output and a 200 ("OK") response in most cases. If you re Delete Dataset Draft ~~~~~~~~~~~~~~~~~~~~ -Deletes the draft version of dataset ``$id``. Only the draft version can be deleted:: +Deletes the draft version of dataset ``$ID``. 
Only the draft version can be deleted: - DELETE http://$SERVER/api/datasets/$id/versions/:draft?key=$apiKey +.. code-block:: bash -Set Citation Date Field for a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 -Sets the dataset field type to be used as the citation date for the given dataset (if the dataset does not include the dataset field type, the default logic is used). The name of the dataset field type should be sent in the body of the request. -To revert to the default logic, use ``:publicationDate`` as the ``$datasetFieldTypeName``. -Note that the dataset field used has to be a date field:: + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/versions/:draft - PUT http://$SERVER/api/datasets/$id/citationdate?key=$apiKey --data "$datasetFieldTypeName" +The fully expanded example above (without environment variables) looks like this: -Revert Citation Date Field to Default for Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: bash -Restores the default logic of the field type to be used as the citation date. Same as ``PUT`` with ``:publicationDate`` body:: + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/versions/:draft - DELETE http://$SERVER/api/datasets/$id/citationdate?key=$apiKey +Set Citation Date Field Type for a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -List Role Assignments for a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Sets the dataset citation date field type for a given dataset. ``:publicationDate`` is the default. +Note that the dataset citation date field type must be a date field. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export DATASET_FIELD_TYPE_NAME=:dateOfDeposit + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/$ID/citationdate --data "$DATASET_FIELD_TYPE_NAME" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/24/citationdate --data ":dateOfDeposit" + +Revert Citation Date Field Type to Default for Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Restores the default citation date field type, ``:publicationDate``, for a given dataset. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/citationdate + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/citationdate + +.. _list-roles-on-a-dataset-api: + +List Role Assignments in a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Lists all role assignments on a given dataset: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=2347 + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/datasets/$ID/assignments + +The fully expanded example above (without environment variables) looks like this: + +.. 
code-block:: bash -List all the role assignments at the given dataset:: + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/datasets/2347/assignments + +.. _assign-role-on-a-dataset-api: + +Assign a New Role on a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Assigns a new role, based on the POSTed JSON: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=2347 + + curl -H X-Dataverse-key:$API_TOKEN -X POST -H "Content-Type: application/json" $SERVER_URL/api/datasets/$ID/assignments --upload-file role.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST -H "Content-Type: application/json" https://demo.dataverse.org/api/datasets/2347/assignments --upload-file role.json + +POSTed JSON example (the content of ``role.json`` file):: + + { + "assignee": "@uma", + "role": "curator" + } + +.. _revoke-role-on-a-dataset-api: + +Delete Role Assignment from a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Delete the assignment whose id is ``$id``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=2347 + export ASSIGNMENT_ID=6 + + curl -H X-Dataverse-key:$API_TOKEN -X DELETE $SERVER_URL/api/datasets/$ID/assignments/$ASSIGNMENT_ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X DELETE https://demo.dataverse.org/api/datasets/2347/assignments/6 - GET http://$SERVER/api/datasets/$id/assignments?key=$apiKey Create a Private URL for a Dataset ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Create a Private URL (must be able to manage dataset permissions):: +Create a Private URL (must be able to manage dataset permissions): + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/datasets/$ID/privateUrl + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - POST http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/datasets/24/privateUrl Get the Private URL for a Dataset ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Get a Private URL from a dataset (if available):: +Get a Private URL from a dataset (if available): + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" $SERVER_URL/api/datasets/$ID/privateUrl + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - GET http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/datasets/24/privateUrl Delete the Private URL from a Dataset ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Delete a Private URL from a dataset (if it exists):: +Delete a Private URL from a dataset (if it exists): + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/privateUrl + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash - DELETE http://$SERVER/api/datasets/$id/privateUrl?key=$apiKey + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/privateUrl .. _add-file-api: @@ -666,373 +1209,1060 @@ In practice, you only need one the ``dataset_id`` or the ``persistentId``. The e file_content = 'content: %s' % datetime.now() files = {'file': ('sample_file.txt', file_content)} - # -------------------------------------------------- - # Using a "jsonData" parameter, add optional description + file tags - # -------------------------------------------------- - params = dict(description='Blue skies!', - categories=['Lily', 'Rosemary', 'Jack of Hearts']) + # -------------------------------------------------- + # Using a "jsonData" parameter, add optional description + file tags + # -------------------------------------------------- + params = dict(description='Blue skies!', + categories=['Lily', 'Rosemary', 'Jack of Hearts']) + + params_as_json_string = json.dumps(params) + + payload = dict(jsonData=params_as_json_string) + + # -------------------------------------------------- + # Add file using the Dataset's id + # -------------------------------------------------- + url_dataset_id = '%s/api/datasets/%s/add?key=%s' % (dataverse_server, dataset_id, api_key) + + # ------------------- + # Make the request + # ------------------- + print '-' * 40 + print 'making request: %s' % url_dataset_id + r = requests.post(url_dataset_id, data=payload, files=files) + + # ------------------- + # Print the response + # ------------------- + print '-' * 40 + print r.json() + print r.status_code + + # -------------------------------------------------- + # Add file using the Dataset's persistentId (e.g. doi, hdl, etc) + # -------------------------------------------------- + url_persistent_id = '%s/api/datasets/:persistentId/add?persistentId=%s&key=%s' % (dataverse_server, persistentId, api_key) + + # ------------------- + # Update the file content to avoid a duplicate file error + # ------------------- + file_content = 'content2: %s' % datetime.now() + files = {'file': ('sample_file2.txt', file_content)} + + + # ------------------- + # Make the request + # ------------------- + print '-' * 40 + print 'making request: %s' % url_persistent_id + r = requests.post(url_persistent_id, data=payload, files=files) + + # ------------------- + # Print the response + # ------------------- + print '-' * 40 + print r.json() + print r.status_code + +Report the data (file) size of a Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Shows the combined size in bytes of all the files uploaded into the dataset ``id``. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/datasets/$ID/storagesize + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/datasets/24/storagesize + +The size of published and unpublished files will be summed in the dataset specified. 
+By default, only the archival files are counted - i.e., the files uploaded by users (plus the tab-delimited versions generated for tabular data files on ingest). If the optional argument ``includeCached=true`` is specified, the API will also add the sizes of all the extra files generated and cached by Dataverse - the resized thumbnail versions for image files, the metadata exports for published datasets, etc. Because this deals with unpublished files, the token supplied must have permission to view unpublished drafts. + + +Get the size of Downloading all the files of a Dataset Version +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Shows the combined size in bytes of all the files available for download from version ``versionId`` of dataset ``id``. + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export VERSIONID=1.0 + + curl -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/datasets/$ID/versions/$VERSIONID/downloadsize + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize + +The size of all files available for download will be returned. +If ``:draft`` is passed as ``versionId``, the token supplied must have permission to view unpublished drafts. A token is not required for published datasets. Also, restricted files will be included in this total regardless of whether the user has access to download the restricted file(s). + +Submit a Dataset for Review +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When dataset authors do not have permission to publish directly, they can click the "Submit for Review" button in the web interface (see :doc:`/user/dataset-management`), or perform the equivalent operation via API: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/submitForReview?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/submitForReview?persistentId=doi:10.5072/FK2/J8SJZB" + +The people who need to review the dataset (often curators or journal editors) can check their notifications periodically via API to see if any new datasets have been submitted for review and need their attention. See the :ref:`Notifications` section for details. Alternatively, these curators can simply check their email or notifications to know when datasets have been submitted (or resubmitted) for review. + +Return a Dataset to Author +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +After the curators or journal editors have reviewed a dataset that has been submitted for review (see "Submit for Review", above), they can either choose to publish the dataset (see the ``:publish`` "action" above) or return the dataset to its authors. In the web interface there is a "Return to Author" button (see :doc:`/user/dataset-management`), but the interface does not provide a way to explain **why** the dataset is being returned. There is a way to do this outside of this interface, however.
Instead of clicking the "Return to Author" button in the UI, a curator can write a "reason for return" into the database via API. + +Here's how curators can send a "reason for return" to the dataset authors. First, the curator creates a JSON file that contains the reason for return: + +.. literalinclude:: ../_static/api/reason-for-return.json + +In the example below, the curator has saved the JSON file as :download:`reason-for-return.json <../_static/api/reason-for-return.json>` in their current working directory. Then, the curator sends this JSON file to the ``returnToAuthor`` API endpoint like this: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/returnToAuthor?persistentId=$PERSISTENT_ID" -H "Content-type: application/json" -d @reason-for-return.json + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/returnToAuthor?persistentId=doi:10.5072/FK2/J8SJZB" -H "Content-type: application/json" -d @reason-for-return.json + +The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is persisted into the database, stored at the dataset version level. + +Link a Dataset +~~~~~~~~~~~~~~ + +Creates a link between a dataset and a dataverse (see :ref:`dataset-linking` section of Dataverse Management in the User Guide for more information): + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export DATASET_ID=24 + export DATAVERSE_ID=test + + curl -H "X-Dataverse-key: $API_TOKEN" -X PUT $SERVER_URL/api/datasets/$DATASET_ID/link/$DATAVERSE_ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/24/link/test + +Dataset Locks +~~~~~~~~~~~~~ + +To check if a dataset is locked: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl $SERVER_URL/api/datasets/$ID/locks + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl https://demo.dataverse.org/api/datasets/24/locks + +Optionally, you can check if there's a lock of a specific type on the dataset: + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export LOCK_TYPE=Ingest + + curl "$SERVER_URL/api/datasets/$ID/locks?type=$LOCK_TYPE" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/24/locks?type=Ingest" + +Currently implemented lock types are ``Ingest``, ``Workflow``, ``InReview``, ``DcmUpload``, ``pidRegister``, and ``EditInProgress``. 
+ +The API will output the list of locks, for example:: + + {"status":"OK","data": + [ + { + "lockType":"Ingest", + "date":"Fri Aug 17 15:05:51 EDT 2018", + "user":"dataverseAdmin" + }, + { + "lockType":"Workflow", + "date":"Fri Aug 17 15:02:00 EDT 2018", + "user":"dataverseAdmin" + } + ] + } + +If the dataset is not locked (or if there is no lock of the requested type), the API will return an empty list. + +The following API endpoint will lock a dataset with a lock of the specified type: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export LOCK_TYPE=Ingest + + curl -H "X-Dataverse-key: $API_TOKEN" -X POST $SERVER_URL/api/datasets/$ID/lock/$LOCK_TYPE + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/datasets/24/lock/Ingest + +Use the following API to unlock the dataset by deleting all the locks currently on the dataset: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/locks + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/locks + +Or, to delete only a lock of the specified type: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export LOCK_TYPE=pidRegister + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/locks?type=$LOCK_TYPE + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/locks?type=pidRegister + +If the dataset is not locked (or if there is no lock of the specified type), the API will exit with a warning message. + +(Note that the API calls above all support both the database id and persistent identifier notation for referencing the dataset.) + +.. _dataset-metrics-api: + +Dataset Metrics +~~~~~~~~~~~~~~~ + +Please note that these dataset-level metrics are only available if support for Make Data Count has been enabled in your installation of Dataverse. See the :ref:`Dataset Metrics ` in the :doc:`/user/dataset-management` section of the User Guide and the :doc:`/admin/make-data-count` section of the Admin Guide for details. + +.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below. + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + +To confirm that the environment variable was set properly, you can use ``echo`` like this: + +.. code-block:: bash + + echo $SERVER_URL + +Please note that for each of these endpoints except the "citations" endpoint, you can optionally pass the query parameter "country" with a two-letter code (e.g. "country=us") and you can specify a particular month by adding it in yyyy-mm format after the requested metric (e.g. "viewsTotal/2019-02").
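+
+For example, a single request that combines both options described above might look like the following (a sketch based on the description; the month and country values are just placeholders):
+
+.. code-block:: bash
+
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB
+
+  curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/viewsTotal/2019-02?country=us&persistentId=$PERSISTENT_ID"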
+ +Retrieving Total Views for a Dataset +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Please note that "viewsTotal" is a combination of "viewsTotalRegular" and "viewsTotalMachine", which can be requested separately. + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/viewsTotal?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/viewsTotal?persistentId=doi:10.5072/FK2/J8SJZB" + +Retrieving Unique Views for a Dataset +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Please note that "viewsUnique" is a combination of "viewsUniqueRegular" and "viewsUniqueMachine", which can be requested separately. + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/viewsUnique?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/viewsUnique?persistentId=doi:10.5072/FK2/J8SJZB" + +Retrieving Total Downloads for a Dataset +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Please note that "downloadsTotal" is a combination of "downloadsTotalRegular" and "downloadsTotalMachine", which can be requested separately. + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/downloadsTotal?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/downloadsTotal?persistentId=doi:10.5072/FK2/J8SJZB" + +Retrieving Unique Downloads for a Dataset +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Please note that "downloadsUnique" is a combination of "downloadsUniqueRegular" and "downloadsUniqueMachine", which can be requested separately. + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/downloadsUnique?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/downloadsUnique?persistentId=doi:10.5072/FK2/AAA000" + +Retrieving Citations for a Dataset +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: bash + + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/datasets/:persistentId/makeDataCount/citations?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/datasets/:persistentId/makeDataCount/citations?persistentId=doi:10.5072/FK2/J8SJZB" + +Delete Unpublished Dataset +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Delete the dataset whose id is passed: + +..
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24 + +Delete Published Dataset +~~~~~~~~~~~~~~~~~~~~~~~~ + +Normally published datasets should not be deleted, but there exists a "destroy" API endpoint for superusers which will act on a dataset given a persistent ID or dataset database ID: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/:persistentId/destroy/?persistentId=$PERSISTENT_ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/:persistentId/destroy/?persistentId=doi:10.5072/FK2/AAA000 + +Delete with dataset identifier: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/destroy + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/destroy + +Calling the destroy endpoint is permanent and irreversible. It will remove the dataset and its datafiles, then re-index the parent dataverse in Solr. This endpoint requires the API token of a superuser. + +Files +----- + +Adding Files +~~~~~~~~~~~~ + +.. Note:: Files can be added via the native API but the operation is performed on the parent object, which is a dataset. Please see the Datasets_ endpoint above for more information. + +Accessing (downloading) files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. Note:: Access API has its own section in the Guide: :doc:`/api/dataaccess` + +**Note** Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. + +Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB + + curl "$SERVER_URL/api/access/datafile/:persistentId/?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl "https://demo.dataverse.org/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB" + +Note: you can use the combination of cURL's ``-J`` (``--remote-header-name``) and ``-O`` (``--remote-name``) options to save the file in its original file name, such as + +.. 
code-block:: bash + + curl -J -O "https://demo.dataverse.org/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB" + +Restrict Files +~~~~~~~~~~~~~~ + +Restrict or unrestrict an existing file where ``id`` is the database id of the file or ``pid`` is the persistent id (DOI or Handle) of the file to restrict. Note that some Dataverse installations do not allow the ability to restrict files. + +A curl example using an ``id`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true $SERVER_URL/api/files/$ID/restrict + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT -d true https://demo.dataverse.org/api/files/24/restrict + +A curl example using a ``pid`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true $SERVER_URL/api/files/:persistentId/restrict?persistentId=$PERSISTENT_ID + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT -d true "https://demo.dataverse.org/api/files/:persistentId/restrict?persistentId=doi:10.5072/FK2/AAA000" + +Uningest a File +~~~~~~~~~~~~~~~ + +Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process. Note that this requires "superuser" credentials. + +A curl example using an ``ID``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$ID/uningest + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/files/24/uningest + +A curl example using a ``PERSISTENT_ID``: + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/uningest?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/uningest?persistentId=doi:10.5072/FK2/AAA000" + +Reingest a File +~~~~~~~~~~~~~~~ + +Attempt to ingest an existing datafile as tabular data. This API can be used on a file that was not ingested as tabular back when it was uploaded. For example, a Stata v.14 file that was uploaded before ingest support for Stata 14 was added (in Dataverse v.4.9). It can also be used on a file that failed to ingest due to a bug in the ingest plugin that has since been fixed (hence the name "reingest"). + +Note that this requires "superuser" credentials. + +A curl example using an ``ID`` + +.. 
code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$ID/reingest + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/files/24/reingest + +A curl example using a ``PERSISTENT_ID`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/reingest?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/reingest?persistentId=doi:10.5072/FK2/AAA000" + +Note: at present, the API cannot be used on a file that's already successfully ingested as tabular. + +.. _redetect-file-type: + +Redetect File Type +~~~~~~~~~~~~~~~~~~ + +Dataverse uses a variety of methods for determining file types (MIME types or content types) and these methods (listed below) are updated periodically. If you have files that have an unknown file type, you can have Dataverse attempt to redetect the file type. + +When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change. + +A curl example using an ``id`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=true" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/redetect?dryRun=true" + +A curl example using a ``pid`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/redetect?persistentId=$PERSISTENT_ID&dryRun=true" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/redetect?persistentId=doi:10.5072/FK2/AAA000&dryRun=true" + +Currently, the following methods are used to detect file types: + +- The file type detected by the browser (or sent via API). +- JHOVE: http://jhove.openpreservation.org +- As a last resort, the file extension (e.g. ".ipynb") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. + +Replacing Files +~~~~~~~~~~~~~~~ + +Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata.
Note that metadata such as description, directoryLabel (File Path), and tags are not carried over from the file being replaced. + +A curl example using an ``ID`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' $SERVER_URL/api/files/$ID/replace + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \ + -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \ + https://demo.dataverse.org/api/files/24/replace + +A curl example using a ``PERSISTENT_ID`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + + curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' \ + "$SERVER_URL/api/files/:persistentId/replace?persistentId=$PERSISTENT_ID" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \ + -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \ + "https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000" + +Getting File Metadata +~~~~~~~~~~~~~~~~~~~~~ + +Provides a json representation of the file metadata for an existing file where ``ID`` is the database id of the file to get metadata from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. - params_as_json_string = json.dumps(params) +A curl example using an ``ID`` - payload = dict(jsonData=params_as_json_string) +.. code-block:: bash - # -------------------------------------------------- - # Add file using the Dataset's id - # -------------------------------------------------- - url_dataset_id = '%s/api/datasets/%s/add?key=%s' % (dataverse_server, dataset_id, api_key) + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 - # ------------------- - # Make the request - # ------------------- - print '-' * 40 - print 'making request: %s' % url_dataset_id - r = requests.post(url_dataset_id, data=payload, files=files) + curl $SERVER_URL/api/files/$ID/metadata - # ------------------- - # Print the response - # ------------------- - print '-' * 40 - print r.json() - print r.status_code +The fully expanded example above (without environment variables) looks like this: - # -------------------------------------------------- - # Add file using the Dataset's persistentId (e.g. doi, hdl, etc) - # -------------------------------------------------- - url_persistent_id = '%s/api/datasets/:persistentId/add?persistentId=%s&key=%s' % (dataverse_server, persistentId, api_key) +..
code-block:: bash - # ------------------- - # Update the file content to avoid a duplicate file error - # ------------------- - file_content = 'content2: %s' % datetime.now() - files = {'file': ('sample_file2.txt', file_content)} + curl https://demo.dataverse.org/api/files/24/metadata +A curl example using a ``PERSISTENT_ID`` - # ------------------- - # Make the request - # ------------------- - print '-' * 40 - print 'making request: %s' % url_persistent_id - r = requests.post(url_persistent_id, data=payload, files=files) +.. code-block:: bash - # ------------------- - # Print the response - # ------------------- - print '-' * 40 - print r.json() - print r.status_code + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 -Submit a Dataset for Review -~~~~~~~~~~~~~~~~~~~~~~~~~~~ + curl "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID" -When dataset authors do not have permission to publish directly, they can click the "Submit for Review" button in the web interface (see :doc:`/user/dataset-management`), or perform the equivalent operation via API:: +The fully expanded example above (without environment variables) looks like this: - curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/submitForReview?persistentId=$DOI_OR_HANDLE_OF_DATASET" +.. code-block:: bash -The people who need to review the dataset (often curators or journal editors) can check their notifications periodically via API to see if any new datasets have been submitted for review and need their attention. See the :ref:`Notifications` section for details. Alternatively, these curators can simply check their email or notifications to know when datasets have been submitted (or resubmitted) for review. + curl "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000" -Return a Dataset to Author -~~~~~~~~~~~~~~~~~~~~~~~~~~ +The current draft can also be viewed if you have permissions and pass your API token -After the curators or journal editors have reviewed a dataset that has been submitted for review (see "Submit for Review", above) they can either choose to publish the dataset (see the ``:publish`` "action" above) or return the dataset to its authors. In the web interface there is a "Return to Author" button (see :doc:`/user/dataset-management`), but the interface does not provide a way to explain **why** the dataset is being returned. There is a way to do this outside of this interface, however. Instead of clicking the "Return to Author" button in the UI, a curator can write a "reason for return" into the database via API. +A curl example using an ``ID`` -Here's how curators can send a "reason for return" to the dataset authors. First, the curator creates a JSON file that contains the reason for return: +.. code-block:: bash -.. literalinclude:: ../_static/api/reason-for-return.json + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 -In the example below, the curator has saved the JSON file as :download:`reason-for-return.json <../_static/api/reason-for-return.json>` in their current working directory. 
Then, the curator sends this JSON file to the ``returnToAuthor`` API endpoint like this:: + curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/metadata/draft - curl -H "Content-type:application/json" -d @reason-for-return.json -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/returnToAuthor?persistentId=$DOI_OR_HANDLE_OF_DATASET" +The fully expanded example above (without environment variables) looks like this: -The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is persisted into the database, stored at the dataset version level. +.. code-block:: bash -Link a Dataset -~~~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/metadata/draft -Creates a link between a dataset and a dataverse (see the Linked Dataverses + Linked Datasets section of the :doc:`/user/dataverse-management` guide for more information). :: +A curl example using a ``PERSISTENT_ID`` - curl -H "X-Dataverse-key: $API_TOKEN" -X PUT http://$SERVER/api/datasets/$linked-dataset-id/link/$linking-dataverse-alias +.. code-block:: bash -Dataset Locks -~~~~~~~~~~~~~ + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 -To check if a dataset is locked:: + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/metadata/draft?persistentId=$PERSISTENT_ID" - curl "$SERVER_URL/api/datasets/{database_id}/locks +The fully expanded example above (without environment variables) looks like this: -Optionally, you can check if there's a lock of a specific type on the dataset:: +.. code-block:: bash - curl "$SERVER_URL/api/datasets/{database_id}/locks?type={lock_type} + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/metadata/draft?persistentId=doi:10.5072/FK2/AAA000" -Currently implemented lock types are ``Ingest, Workflow, InReview, DcmUpload, pidRegister, and EditInProgress``. +Note: The ``id`` returned in the json response is the id of the file metadata version. -The API will output the list of locks, for example:: +Updating File Metadata +~~~~~~~~~~~~~~~~~~~~~~ - {"status":"OK","data": - [ - { - "lockType":"Ingest", - "date":"Fri Aug 17 15:05:51 EDT 2018", - "user":"dataverseAdmin" - }, - { - "lockType":"Workflow", - "date":"Fri Aug 17 15:02:00 EDT 2018", - "user":"dataverseAdmin" - } - ] - } +Updates the file metadata for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want. -If the dataset is not locked (or if there is no lock of the requested type), the API will return an empty list. +A curl example using an ``ID`` -The following API end point will lock a Dataset with a lock of specified type:: +.. 
code-block:: bash - POST /api/datasets/{database_id}/lock/{lock_type} + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 -For example:: + curl -H "X-Dataverse-key:$API_TOKEN" -X POST \ + -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \ + $SERVER_URL/api/files/$ID/metadata - curl -X POST "$SERVER_URL/api/datasets/1234/lock/Ingest?key=$ADMIN_API_TOKEN" - or - curl -X POST -H "X-Dataverse-key: $ADMIN_API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/lock/Ingest?persistentId=$DOI_OR_HANDLE_OF_DATASET" +The fully expanded example above (without environment variables) looks like this: -Use the following API to unlock the dataset, by deleting all the locks currently on the dataset:: +.. code-block:: bash - DELETE /api/datasets/{database_id}/locks + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \ + -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \ + http://demo.dataverse.org/api/files/24/metadata -Or, to delete a lock of the type specified only:: +A curl example using a ``PERSISTENT_ID`` - DELETE /api/datasets/{database_id}/locks?type={lock_type} +.. code-block:: bash -For example:: + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 - curl -X DELETE -H "X-Dataverse-key: $ADMIN_API_TOKEN" "$SERVER_URL/api/datasets/1234/locks?type=pidRegister" + curl -H "X-Dataverse-key:$API_TOKEN" -X POST \ + -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \ + "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID" -If the dataset is not locked (or if there is no lock of the specified type), the API will exit with a warning message. +The fully expanded example above (without environment variables) looks like this: -(Note that the API calls above all support both the database id and persistent identifier notation for referencing the dataset) +.. code-block:: bash -.. _dataset-metrics-api: + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \ + -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \ + "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000" -Dataset Metrics -~~~~~~~~~~~~~~~ +Also note that dataFileTags are not versioned and changes to these will update the published version of the file. -Please note that these dataset level metrics are only available if support for Make Data Count has been enabled in your installation of Dataverse. See the :ref:`Dataset Metrics ` in the :doc:`/user/dataset-management` section of the User Guide and the :doc:`/admin/make-data-count` section of the Admin Guide for details. +.. _EditingVariableMetadata: -.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below. +Editing Variable Level Metadata +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``export DV_BASE_URL=https://demo.dataverse.org`` +Updates variable level metadata using ddi xml ``FILE``, where ``ID`` is file id. -To confirm that the environment variable was set properly, you can use ``echo`` like this: +A curl example using an ``ID`` -``echo $DV_BASE_URL`` +.. 
code-block:: bash -Please note that for each of these endpoints except the "citations" endpoint, you can optionally pass the query parameter "country" with a two letter code (e.g. "country=us") and you can specify a particular month by adding it in yyyy-mm format after the requested metric (e.g. "viewsTotal/2019-02"). + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export FILE=dct.xml -Retrieving Total Views for a Dataset -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + curl -H "X-Dataverse-key:$API_TOKEN" -X PUT $SERVER_URL/api/edit/$ID --upload-file $FILE -Please note that "viewsTotal" is a combination of "viewsTotalRegular" and "viewsTotalMachine" which can be requested separately. +The fully expanded example above (without environment variables) looks like this: -``curl "$DV_BASE_URL/api/datasets/:persistentId/makeDataCount/viewsTotal?persistentId=$DOI"`` +.. code-block:: bash -Retrieving Unique Views for a Dataset -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/edit/24 --upload-file dct.xml -Please note that "viewsUnique" is a combination of "viewsUniqueRegular" and "viewsUniqueMachine" which can be requested separately. +You can download :download:`dct.xml <../../../../src/test/resources/xml/dct.xml>` from the example above to see what the XML looks like. -``curl "$DV_BASE_URL/api/datasets/:persistentId/makeDataCount/viewsUnique?persistentId=$DOI"`` +Provenance +~~~~~~~~~~ -Retrieving Total Downloads for a Dataset +Get Provenance JSON for an uploaded file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Please note that "downloadsTotal" is a combination of "downloadsTotalRegular" and "downloadsTotalMachine" which can be requested separately. - -``curl "$DV_BASE_URL/api/datasets/:persistentId/makeDataCount/downloadsTotal?persistentId=$DOI"`` +A curl example using an ``ID`` -Retrieving Unique Downloads for a Dataset -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. code-block:: bash -Please note that "downloadsUnique" is a combination of "downloadsUniqueRegular" and "downloadsUniqueMachine" which can be requested separately. + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 -``curl "$DV_BASE_URL/api/datasets/:persistentId/makeDataCount/downloadsUnique?persistentId=$DOI"`` + curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/prov-json -Retrieving Citations for a Dataset -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The fully expanded example above (without environment variables) looks like this: -``curl "$DV_BASE_URL/api/datasets/:persistentId/makeDataCount/citations?persistentId=$DOI"`` +.. code-block:: bash -Delete Unpublished Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/prov-json -Delete the dataset whose id is passed: +A curl example using a ``PERSISTENT_ID`` -``curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/$id`` +.. 
code-block:: bash -Delete Published Dataset -~~~~~~~~~~~~~~~~~~~~~~~~ + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 -Normally published datasets should not be deleted, but there exists a "destroy" API endpoint for superusers which will act on a dataset given a persistent ID or dataset database ID: + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID" -``curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/:persistentId/destroy/?persistentId=doi:10.5072/FK2/AAA000`` - -``curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/999/destroy`` - -Calling the destroy endpoint is permanent and irreversible. It will remove the dataset and its datafiles, then re-index the parent dataverse in Solr. This endpoint requires the API token of a superuser. +The fully expanded example above (without environment variables) looks like this: -Files ------ +.. code-block:: bash -Adding Files -~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000" -.. Note:: Files can be added via the native API but the operation is performed on the parent object, which is a dataset. Please see the Datasets_ endpoint above for more information. +Get Provenance Description for an uploaded file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Accessing (downloading) files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A curl example using an ``ID`` -.. Note:: Access API has its own section in the Guide: :doc:`/api/dataaccess` +.. code-block:: bash -**Note** Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 - Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* :: + curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/$ID/prov-freeform - GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB +The fully expanded example above (without environment variables) looks like this: +.. code-block:: bash -Restrict Files -~~~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/24/prov-freeform -Restrict or unrestrict an existing file where ``id`` is the database id of the file or ``pid`` is the persistent id (DOI or Handle) of the file to restrict. Note that some Dataverse installations do not allow the ability to restrict files. +A curl example using a ``PERSISTENT_ID`` -A curl example using an ``id``:: +.. 
code-block:: bash - curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true http://$SERVER/api/files/{id}/restrict + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 -A curl example using a ``pid``:: + curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/prov-freeform?persistentId=$PERSISTENT_ID" - curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true http://$SERVER/api/files/:persistentId/restrict?persistentId={pid} - -Uningest a File -~~~~~~~~~~~~~~~ +The fully expanded example above (without environment variables) looks like this: -Reverse the tabular data ingest process performed on a file where ``{id}`` is the database id of the file to process. Note that this requires "superuser" credentials:: +.. code-block:: bash - POST http://$SERVER/api/files/{id}/uningest?key={apiKey} + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/prov-freeform?persistentId=doi:10.5072/FK2/AAA000" -Reingest a File -~~~~~~~~~~~~~~~ +Create/Update Provenance JSON and provide related entity name for an uploaded file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Attempt to ingest an existing datafile as tabular data. This API can be used on a file that was not ingested as tabular back when it was uploaded. For example, a Stata v.14 file that was uploaded before ingest support for Stata 14 was added (in Dataverse v.4.9). It can also be used on a file that failed to ingest due to a bug in the ingest plugin that has since been fixed (hence the name "reingest"). +A curl example using an ``ID`` -Note that this requires "superuser" credentials:: +.. code-block:: bash - POST http://$SERVER/api/files/{id}/reingest?key={apiKey} + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export ENTITY_NAME="..." + export FILE_PATH=provenance.json -(``{id}`` is the database id of the file to process) + curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$ID/prov-json?entityName=$ENTITY_NAME -H "Content-type:application/json" --upload-file $FILE_PATH -Note: at present, the API cannot be used on a file that's already successfully ingested as tabular. +The fully expanded example above (without environment variables) looks like this: -.. _redetect-file-type: +.. code-block:: bash -Redetect File Type -~~~~~~~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/prov-json?entityName=..." -H "Content-type:application/json" --upload-file provenance.json -Dataverse uses a variety of methods for determining file types (MIME types or content types) and these methods (listed below) are updated periodically. If you have files that have an unknown file type, you can have Dataverse attempt to redetect the file type. +A curl example using a ``PERSISTENT_ID`` -When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change. In the example below, the file is identified by database id "42". +.. code-block:: bash -``export FILE_ID=42`` + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + export ENTITY_NAME="..." 
+ export FILE_PATH=provenance.json -``curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$FILE_ID/redetect?dryRun=true`` + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID&entityName=$ENTITY_NAME" -H "Content-type:application/json" --upload-file $FILE_PATH -Currently the following methods are used to detect file types: +The fully expanded example above (without environment variables) looks like this: -- The file type detected by the browser (or sent via API). -- JHOVE: http://jhove.openpreservation.org -- As a last resort the file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. +.. code-block:: bash -Replacing Files -~~~~~~~~~~~~~~~ + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000&entityName=..." -H "Content-type:application/json" --upload-file provenance.json -Replace an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced:: +Create/Update Provenance Description for an uploaded file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - POST -F 'file=@file.extension' -F 'jsonData={json}' http://$SERVER/api/files/{id}/metadata?key={apiKey} +Requires a JSON file with the description connected to a key named "text" -Example:: +A curl example using an ``ID`` - curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@data.tsv' \ - -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}'\ - "https://demo.dataverse.org/api/files/$FILE_ID/replace" +.. code-block:: bash -Getting File Metadata -~~~~~~~~~~~~~~~~~~~~~ + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 + export FILE_PATH=provenance.json -Provides a json representation of the file metadata for an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file:: + curl -H "X-Dataverse-key:$API_TOKEN" -X POST $SERVER_URL/api/files/$ID/prov-freeform -H "Content-type:application/json" --upload-file $FILE_PATH - GET http://$SERVER/api/files/{id}/metadata +The fully expanded example above (without environment variables) looks like this: -The current draft can also be viewed if you have permissions and pass your ``apiKey``:: +.. code-block:: bash - GET http://$SERVER/api/files/{id}/metadata/draft?key={apiKey} + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/files/24/prov-freeform -H "Content-type:application/json" --upload-file provenance.json -Note: The ``id`` returned in the json response is the id of the file metadata version. +A curl example using a ``PERSISTENT_ID`` -Updating File Metadata -~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: bash -Updates the file metadata for an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. 
No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want:: + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 + export FILE_PATH=provenance.json - POST -F 'jsonData={json}' http://$SERVER/api/files/{id}/metadata?key={apiKey} + curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/prov-freeform?persistentId=$PERSISTENT_ID" -H "Content-type:application/json" --upload-file $FILE_PATH -Example:: +The fully expanded example above (without environment variables) looks like this: - curl -H "X-Dataverse-key:{apiKey}" -X POST -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' 'http://localhost:8080/api/files/264/metadata' +.. code-block:: bash -Also note that dataFileTags are not versioned and changes to these will update the published version of the file. + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/prov-freeform?persistentId=doi:10.5072/FK2/AAA000" -H "Content-type:application/json" --upload-file provenance.json -.. _EditingVariableMetadata: +See a sample JSON file :download:`file-provenance.json <../_static/api/file-provenance.json>` from http://openprovenance.org (c.f. Huynh, Trung Dong and Moreau, Luc (2014) ProvStore: a public provenance repository. At 5th International Provenance and Annotation Workshop (IPAW'14), Cologne, Germany, 09-13 Jun 2014. pp. 275-277). -Editing Variable Level Metadata -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Delete Provenance JSON for an uploaded file +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Updates variable level metadata using ddi xml ``$file``, where ``$id`` is file id:: +A curl example using an ``ID`` - PUT https://$SERVER/api/edit/$id --upload-file $file +.. code-block:: bash -Example: ``curl -H "X-Dataverse-key:$API_TOKEN" -X PUT http://localhost:8080/api/edit/95 --upload-file dct.xml`` + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export ID=24 -You can download :download:`dct.xml <../../../../src/test/resources/xml/dct.xml>` from the example above to see what the XML looks like. + curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE $SERVER_URL/api/files/$ID/prov-json -Provenance -~~~~~~~~~~ -Get Provenance JSON for an uploaded file:: +The fully expanded example above (without environment variables) looks like this: - GET http://$SERVER/api/files/{id}/prov-json?key=$apiKey +.. code-block:: bash -Get Provenance Description for an uploaded file:: + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/files/24/prov-json - GET http://$SERVER/api/files/{id}/prov-freeform?key=$apiKey +A curl example using a ``PERSISTENT_ID`` -Create/Update Provenance JSON and provide related entity name for an uploaded file:: +.. code-block:: bash - POST http://$SERVER/api/files/{id}/prov-json?key=$apiKey&entityName=$entity -H "Content-type:application/json" --upload-file $filePath + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_ID=doi:10.5072/FK2/AAA000 -Create/Update Provenance Description for an uploaded file. 
Requires a JSON file with the description connected to a key named "text":: + curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId/prov-json?persistentId=$PERSISTENT_ID" - POST http://$SERVER/api/files/{id}/prov-freeform?key=$apiKey -H "Content-type:application/json" --upload-file $filePath +The fully expanded example above (without environment variables) looks like this: -Delete Provenance JSON for an uploaded file:: +.. code-block:: bash - DELETE http://$SERVER/api/files/{id}/prov-json?key=$apiKey + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId/prov-json?persistentId=doi:10.5072/FK2/AAA000" Datafile Integrity ~~~~~~~~~~~~~~~~~~ -Starting the release 4.10 the size of the saved original file (for an ingested tabular datafile) is stored in the database. The following API will retrieve and permanently store the sizes for any already existing saved originals:: +Starting with release 4.10, the size of the saved original file (for an ingested tabular datafile) is stored in the database. The following API will retrieve and permanently store the sizes for any already existing saved originals: + +.. code-block:: bash + + export SERVER_URL=https://localhost + + curl $SERVER_URL/api/admin/datafiles/integrity/fixmissingoriginalsizes + +with limit parameter: + +.. code-block:: bash + + export SERVER_URL=https://localhost + export LIMIT=10 + + curl "$SERVER_URL/api/admin/datafiles/integrity/fixmissingoriginalsizes?limit=$LIMIT" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl https://localhost/api/admin/datafiles/integrity/fixmissingoriginalsizes + +with limit parameter: + +.. code-block:: bash - GET http://$SERVER/api/admin/datafiles/integrity/fixmissingoriginalsizes{?limit=N} + curl "https://localhost/api/admin/datafiles/integrity/fixmissingoriginalsizes?limit=10" Note the optional "limit" parameter. Without it, the API will attempt to populate the sizes for all the saved originals that don't have them in the database yet. Otherwise it will do so for the first N such datafiles. +By default, the admin API calls are blocked and can only be called from localhost. See more details in :ref:`:BlockedApiEndpoints <:BlockedApiEndpoints>` and :ref:`:BlockedApiPolicy <:BlockedApiPolicy>` settings in :doc:`/installation/config`. + Users Token Management ---------------------- @@ -1064,12 +2294,12 @@ In order to delete a token use:: Builtin Users ------------- -Builtin users are known as "Username/Email and Password" users in the :doc:`/user/account` of the User Guide. Dataverse stores a password (encrypted, of course) for these users, which differs from "remote" users such as Shibboleth or OAuth users where the password is stored elsewhere. See also "Auth Modes: Local vs. Remote vs. Both" in the :doc:`/installation/config` section of the Installation Guide. It's a valid configuration of Dataverse to not use builtin users at all. +Builtin users are known as "Username/Email and Password" users in the :doc:`/user/account` of the User Guide. Dataverse stores a password (encrypted, of course) for these users, which differs from "remote" users such as Shibboleth or OAuth users where the password is stored elsewhere. See also the :ref:`auth-modes` section of Configuration in the Installation Guide. It's a valid configuration of Dataverse to not use builtin users at all. 
Create a Builtin User ~~~~~~~~~~~~~~~~~~~~~ -For security reasons, builtin users cannot be created via API unless the team who runs the Dataverse installation has populated a database setting called ``BuiltinUsers.KEY``, which is described under "Securing Your Installation" and "Database Settings" in the :doc:`/installation/config` section of the Installation Guide. You will need to know the value of ``BuiltinUsers.KEY`` before you can proceed. +For security reasons, builtin users cannot be created via API unless the team who runs the Dataverse installation has populated a database setting called ``BuiltinUsers.KEY``, which is described under :ref:`securing-your-installation` and :ref:`database-settings` sections of Configuration in the Installation Guide. You will need to know the value of ``BuiltinUsers.KEY`` before you can proceed. To create a builtin user via API, you must first construct a JSON document. You can download :download:`user-add.json <../_static/api/user-add.json>` or copy the text below as a starting point and edit as necessary. @@ -1177,6 +2407,8 @@ Shibboleth Groups Management of Shibboleth groups via API is documented in the :doc:`/installation/shibboleth` section of the Installation Guide. +.. _info: + Info ---- @@ -1221,7 +2453,7 @@ The fully expanded example above (without environment variables) looks like this Show Custom Popup Text for Publishing Datasets ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -For now, only the value for the ``:DatasetPublishPopupCustomText`` setting from the :doc:`/installation/config` section of the Installation Guide is exposed: +For now, only the value for the :ref:`:DatasetPublishPopupCustomText` setting from the Configuration section of the Installation Guide is exposed: .. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below. @@ -1285,10 +2517,12 @@ Each user can get a dump of their notifications by passing in their API token:: curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/notifications/all +.. _admin: + Admin ----- -This is the administrative part of the API. For security reasons, it is absolutely essential that you block it before allowing public access to a Dataverse installation. Blocking can be done using settings. See the ``post-install-api-block.sh`` script in the ``scripts/api`` folder for details. See also "Blocking API Endpoints" under "Securing Your Installation" in the :doc:`/installation/config` section of the Installation Guide. +This is the administrative part of the API. For security reasons, it is absolutely essential that you block it before allowing public access to a Dataverse installation. Blocking can be done using settings. See the ``post-install-api-block.sh`` script in the ``scripts/api`` folder for details. See :ref:`blocking-api-endpoints` in Securing Your Installation section of the Configuration page of the Installation Guide. List All Database Settings ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1572,6 +2806,21 @@ Make User a SuperUser Toggles superuser mode on the ``AuthenticatedUser`` whose ``identifier`` (without the ``@`` sign) is passed. :: POST http://$SERVER/api/admin/superuser/$identifier + +Delete a User +~~~~~~~~~~~~~ + +Deletes an ``AuthenticatedUser`` whose ``identifier`` (without the ``@`` sign) is passed. :: + + DELETE http://$SERVER/api/admin/authenticatedUsers/$identifier + +Deletes an ``AuthenticatedUser`` whose ``id`` is passed. 
:: + + DELETE http://$SERVER/api/admin/authenticatedUsers/id/$id + +Note: If the user has performed certain actions such as creating or contributing to a Dataset or downloading a file they cannot be deleted. + + List Role Assignments of a Role Assignee ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/sphinx-guides/source/api/search.rst b/doc/sphinx-guides/source/api/search.rst index a35a544596e..25bad2b8091 100755 --- a/doc/sphinx-guides/source/api/search.rst +++ b/doc/sphinx-guides/source/api/search.rst @@ -116,6 +116,7 @@ https://demo.dataverse.org/api/search?q=trees "Astronomy and Astrophysics", "Other" ], + "fileCount":3, "versionId":1260, "versionState":"RELEASED", "majorVersion":3, @@ -146,8 +147,8 @@ https://demo.dataverse.org/api/search?q=trees .. _advancedsearch-example: -Advanced Search Example ------------------------ +Advanced Search Examples +------------------------ https://demo.dataverse.org/api/search?q=finch&show_relevance=true&show_facets=true&fq=publicationDate:2016&subtree=birds @@ -261,6 +262,91 @@ In this example, ``show_relevance=true`` matches per field are shown. Available } } +https://demo.dataverse.org/api/search?q=finch&fq=publicationStatus:Published&type=dataset + +The above example ``fq=publicationStatus:Published`` retrieves only "RELEASED" versions of datasets. The same could be done to retrieve "DRAFT" versions, ``fq=publicationStatus:Draft`` + +.. code-block:: json + + { + "status": "OK", + "data": { + "q": "finch", + "total_count": 2, + "start": 0, + "spelling_alternatives": {}, + "items": [ + { + "name": "Darwin's Finches", + "type": "dataset", + "url": "https://doi.org/10.70122/FK2/GUAS41", + "global_id": "doi:10.70122/FK2/GUAS41", + "description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.", + "published_at": "2019-12-24T08:05:02Z", + "publisher": "mdmizanur rahman Dataverse", + "citationHtml": "Finch, Fiona, 2019, \"Darwin's Finches\", https://doi.org/10.70122/FK2/GUAS41, Demo Dataverse, V1", + "identifier_of_dataverse": "rahman", + "name_of_dataverse": "mdmizanur rahman Dataverse", + "citation": "Finch, Fiona, 2019, \"Darwin's Finches\", https://doi.org/10.70122/FK2/GUAS41, Demo Dataverse, V1", + "storageIdentifier": "file://10.70122/FK2/GUAS41", + "subjects": [ + "Medicine, Health and Life Sciences" + ], + "fileCount":6, + "versionId": 53001, + "versionState": "RELEASED", + "majorVersion": 1, + "minorVersion": 0, + "createdAt": "2019-12-05T09:18:30Z", + "updatedAt": "2019-12-24T08:38:00Z", + "contacts": [ + { + "name": "Finch, Fiona", + "affiliation": "" + } + ], + "authors": [ + "Finch, Fiona" + ] + }, + { + "name": "Darwin's Finches", + "type": "dataset", + "url": "https://doi.org/10.70122/FK2/7ZXYRH", + "global_id": "doi:10.70122/FK2/7ZXYRH", + "description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.", + "published_at": "2020-01-22T21:47:34Z", + "publisher": "Demo Dataverse", + "citationHtml": "Finch, Fiona, 2020, \"Darwin's Finches\", https://doi.org/10.70122/FK2/7ZXYRH, Demo Dataverse, V1", + "identifier_of_dataverse": "demo", + "name_of_dataverse": "Demo Dataverse", + "citation": "Finch, Fiona, 2020, \"Darwin's Finches\", https://doi.org/10.70122/FK2/7ZXYRH, Demo Dataverse, V1", + "storageIdentifier": "file://10.70122/FK2/7ZXYRH", + "subjects": [ + "Medicine, Health and Life Sciences" + ], + "fileCount":9, + "versionId": 53444, + "versionState": "RELEASED", + "majorVersion": 1, + "minorVersion": 
0, + "createdAt": "2020-01-22T21:23:43Z", + "updatedAt": "2020-01-22T21:47:34Z", + "contacts": [ + { + "name": "Finch, Fiona", + "affiliation": "" + } + ], + "authors": [ + "Finch, Fiona" + ] + } + ], + "count_in_response": 2 + } + } + .. _search-date-range: Date Range Search Example diff --git a/doc/sphinx-guides/source/conf.py b/doc/sphinx-guides/source/conf.py index bb27f7610d6..6ca7856c063 100755 --- a/doc/sphinx-guides/source/conf.py +++ b/doc/sphinx-guides/source/conf.py @@ -65,9 +65,9 @@ # built documents. # # The short X.Y version. -version = '4.19' +version = '4.20' # The full version, including alpha/beta/rc tags. -release = '4.19' +release = '4.20' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index 37a794e804e..c1c2969a60a 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -6,7 +6,52 @@ Big data support is highly experimental. Eventually this content will move to th .. contents:: |toctitle| :local: -Various components need to be installed and configured for big data support. +Various components need to be installed and/or configured for big data support. + +S3 Direct Upload and Download +----------------------------- + +A lightweight option for supporting file sizes beyond a few gigabytes - a size that can cause performance issues when uploaded through the Dataverse server itself - is to configure an S3 store to provide direct upload and download via 'pre-signed URLs'. When these options are configured, file uploads and downloads are made directly to and from a configured S3 store using secure (https) connections that enforce Dataverse's access controls. (The upload and download URLs are signed with a unique key that only allows access for a short time period and Dataverse will only generate such a URL if the user has permission to upload/download the specific file in question.) + +This option can handle files >40GB and could be appropriate for files up to a TB. Other options can scale farther, but this option has the advantages that it is simple to configure and does not require any user training - uploads and downloads are done via the same interface as normal uploads to Dataverse. + +To configure these options, an administrator must set two JVM options for the Dataverse server using the same process as for other configuration options: + +``./asadmin create-jvm-options "-Ddataverse.files..download-redirect=true"`` +``./asadmin create-jvm-options "-Ddataverse.files..upload-redirect=true"`` + + +With multiple stores configured, it is possible to configure one S3 store with direct upload and/or download to support large files (in general or for specific dataverses) while configuring only direct download, or no direct access for another store. + +It is also possible to set file upload size limits per store. See the :MaxFileUploadSizeInBytes setting described in the :doc:`/installation/config` guide. + +At present, one potential drawback for direct-upload is that files are only partially 'ingested', tabular and FITS files are processed, but zip files are not unzipped, and the file contents are not inspected to evaluate their mimetype. 
This could be appropriate for large files, or it may be useful to completely turn off ingest processing for performance reasons (ingest processing requires a copy of the file to be retrieved by Dataverse from the S3 store). A store using direct upload can be configured to disable all ingest processing for files above a given size limit: + +``./asadmin create-jvm-options "-Ddataverse.files.<id>.ingestsizelimit=<size in bytes>"`` + + +**IMPORTANT:** One additional step that is required to enable direct download to work with previewers is to allow cross site (CORS) requests on your S3 store. +The example below shows how to enable the minimum needed CORS rules on a bucket using the AWS CLI command line tool. Note that you may need to add more methods and/or locations, if you also need to support certain previewers and external tools. + +``aws s3api put-bucket-cors --bucket <bucket-name> --cors-configuration file://cors.json`` + +with the contents of the file cors.json as follows: + +.. code-block:: json + + { + "CORSRules": [ + { + "AllowedOrigins": ["https://<dataverse-server-name>"], + "AllowedHeaders": ["*"], + "AllowedMethods": ["PUT", "GET"] + } + ] + } + +Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above. + +Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-status":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in Dataverse. Note that not all S3 implementations support Tags: Minio does not. With such stores, direct upload works, but Tags are not used. Data Capture Module (DCM) ------------------------- @@ -18,7 +63,7 @@ Install a DCM Installation instructions can be found at https://github.com/sbgrid/data-capture-module/blob/master/doc/installation.md. Note that shared storage (posix or AWS S3) between Dataverse and your DCM is required. You cannot use a DCM with Swift at this point in time. -.. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3-bucket-name``. +.. FIXME: Explain what ``dataverse.files.dcm-s3-bucket-name`` is for and what it has to do with ``dataverse.files.s3.bucket-name``. Once you have installed a DCM, you will need to configure two database settings on the Dataverse side. These settings are documented in the :doc:`/installation/config` section of the Installation Guide: @@ -100,6 +145,7 @@ Optional steps for setting up the S3 Docker DCM Variant ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Before: the default bucket for DCM to hold files in S3 is named test-dcm. It is coded into `post_upload_s3.bash` (line 30). Change to a different bucket if needed. +- Also Note: With the new support for multiple file stores in Dataverse, DCM requires a store with id="s3" and DCM will only work with this store. 
- Add AWS bucket info to dcmsrv - Add AWS credentials to ``~/.aws/credentials`` @@ -115,6 +161,9 @@ Optional steps for setting up the S3 Docker DCM Variant - ``cd /opt/glassfish4/bin/`` - ``./asadmin delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"`` - ``./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"`` + - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"`` + - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"`` + - Add AWS bucket info to Dataverse - Add AWS credentials to ``~/.aws/credentials`` @@ -132,7 +181,7 @@ Optional steps for setting up the S3 Docker DCM Variant - S3 bucket for Dataverse - - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=iqsstestdcmbucket"`` + - ``/usr/local/glassfish4/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=iqsstestdcmbucket"`` - S3 bucket for DCM (as Dataverse needs to do the copy over) diff --git a/doc/sphinx-guides/source/developers/deployment.rst b/doc/sphinx-guides/source/developers/deployment.rst index 9532e7c769f..5e830bfde5b 100755 --- a/doc/sphinx-guides/source/developers/deployment.rst +++ b/doc/sphinx-guides/source/developers/deployment.rst @@ -82,23 +82,26 @@ Download and Run the "Create Instance" Script Once you have done the configuration above, you are ready to try running the "ec2-create-instance.sh" script to spin up Dataverse in AWS. -Download :download:`ec2-create-instance.sh <../../../../scripts/installer/ec2-create-instance.sh>` and put it somewhere reasonable. For the purpose of these instructions we'll assume it's in the "Downloads" directory in your home directory. +Download :download:`ec2-create-instance.sh` and put it somewhere reasonable. For the purpose of these instructions we'll assume it's in the "Downloads" directory in your home directory. -ec2-create-instance accepts a number few command-line switches: +To run it with default values you just need the script, but you may also want a current copy of the ansible :download:`group vars`_ file. + +ec2-create-instance accepts a number of command-line switches, including: * -r: GitHub Repository URL (defaults to https://github.com/IQSS/dataverse.git) * -b: branch to build (defaults to develop) * -p: pemfile directory (defaults to $HOME) * -g: Ansible GroupVars file (if you wish to override role defaults) +* -h: help (displays usage for each available option) ``bash ~/Downloads/ec2-create-instance.sh -b develop -r https://github.com/scholarsportal/dataverse.git -g main.yml`` -Now you will need to wait around 15 minutes until the deployment is finished. Eventually, the output should tell you how to access the installation of Dataverse in a web browser or via ssh. It will also provide instructions on how to delete the instance when you are finished with it. Please be aware that AWS charges per minute for a running instance. You can also delete your instance from https://console.aws.amazon.com/console/home?region=us-east-1 . +You will need to wait for 15 minutes or so until the deployment is finished, longer if you've enabled sample data and/or the API test suite. Eventually, the output should tell you how to access the installation of Dataverse in a web browser or via SSH. It will also provide instructions on how to delete the instance when you are finished with it. Please be aware that AWS charges per minute for a running instance. You may also delete your instance from https://console.aws.amazon.com/console/home?region=us-east-1 . 
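As a quick reference, the switches listed above can be combined in a single invocation. The sketch below is illustrative only; the fork URL, branch name, and pem directory are placeholders rather than values prescribed by this guide.

.. code-block:: bash

    # Spin up a test instance from a fork and branch, read the pem file from
    # ~/.ssh, and override role defaults with a local main.yml.
    bash ~/Downloads/ec2-create-instance.sh \
      -r https://github.com/YOUR_NAME/dataverse.git \
      -b 123-COOL-FEATURE \
      -p ~/.ssh \
      -g main.yml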
-Caveats -~~~~~~~ +Caveat Recipiens +~~~~~~~~~~~~~~~~ -Please note that while the script should work fine on newish branches, older branches that have different dependencies such as an older version of Solr may not produce a working Dataverse installation. Your mileage may vary. +Please note that while the script should work well on new-ish branches, older branches that have different dependencies such as an older version of Solr may not produce a working Dataverse installation. Your mileage may vary. ---- diff --git a/doc/sphinx-guides/source/developers/dev-environment.rst b/doc/sphinx-guides/source/developers/dev-environment.rst index 5ef6eba04ae..e7babf75b34 100755 --- a/doc/sphinx-guides/source/developers/dev-environment.rst +++ b/doc/sphinx-guides/source/developers/dev-environment.rst @@ -131,7 +131,7 @@ On Linux, you should just install PostgreSQL from your package manager without w Install Solr ~~~~~~~~~~~~ -`Solr `_ 7.3.1 is required. +`Solr `_ 7.7.2 is required. To install Solr, execute the following commands: @@ -141,27 +141,27 @@ To install Solr, execute the following commands: ``cd /usr/local/solr`` -``curl -O http://archive.apache.org/dist/lucene/solr/7.3.1/solr-7.3.1.tgz`` +``curl -O http://archive.apache.org/dist/lucene/solr/7.7.2/solr-7.7.2.tgz`` -``tar xvfz solr-7.3.1.tgz`` +``tar xvfz solr-7.7.2.tgz`` -``cd solr-7.3.1/server/solr`` +``cd solr-7.7.2/server/solr`` ``cp -r configsets/_default collection1`` -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema.xml`` +``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.7.2/schema.xml`` -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema_dv_mdb_fields.xml`` +``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.7.2/schema_dv_mdb_fields.xml`` -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/schema_dv_mdb_copies.xml`` +``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.7.2/schema_dv_mdb_copies.xml`` ``mv schema*.xml collection1/conf`` -``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.3.1/solrconfig.xml`` +``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/7.7.2/solrconfig.xml`` ``mv solrconfig.xml collection1/conf/solrconfig.xml`` -``cd /usr/local/solr/solr-7.3.1`` +``cd /usr/local/solr/solr-7.7.2`` (Please note that the extra jetty argument below is a security measure to limit connections to Solr to only your computer. For extra security, run a firewall.) diff --git a/doc/sphinx-guides/source/developers/geospatial.rst b/doc/sphinx-guides/source/developers/geospatial.rst index 2857f7df9bf..8a19a0b11f2 100644 --- a/doc/sphinx-guides/source/developers/geospatial.rst +++ b/doc/sphinx-guides/source/developers/geospatial.rst @@ -10,7 +10,7 @@ Geoconnect Geoconnect works as a middle layer, allowing geospatial data files in Dataverse to be visualized with Harvard WorldMap. To set up a Geoconnect development environment, you can follow the steps outlined in the `local_setup.md `_ guide. You will need Python and a few other prerequisites. -As mentioned under "Architecture and Components" in the :doc:`/installation/prep` section of the Installation Guide, Geoconnect is an optional component of Dataverse, so this section is only necessary to follow it you are working on an issue related to this feature. 
+As mentioned under the :ref:`architecture` section of Preparation in the Installation Guide, Geoconnect is an optional component of Dataverse, so this section is only necessary to follow if you are working on an issue related to this feature. How Dataverse Ingests Shapefiles -------------------------------- diff --git a/doc/sphinx-guides/source/developers/intro.rst b/doc/sphinx-guides/source/developers/intro.rst index ea8e924b4ef..3ebfecd4a35 100755 --- a/doc/sphinx-guides/source/developers/intro.rst +++ b/doc/sphinx-guides/source/developers/intro.rst @@ -60,7 +60,7 @@ As a developer, you also may be interested in these projects related to Datavers - DVUploader - a stand-alone command-line Java application that uses the Dataverse API to support upload of files from local disk to a Dataset: https://github.com/IQSS/dataverse-uploader - dataverse-sample-data - populate your Dataverse installation with sample data: https://github.com/IQSS/dataverse-sample-data - dataverse-metrics - aggregate and visualize metrics for installations of Dataverse around the world: https://github.com/IQSS/dataverse-metrics -- Configuration management scripts - Ansible, Puppet, etc.: See "Advanced Installation" in the :doc:`/installation/prep` section of the Installation Guide. +- Configuration management scripts - Ansible, Puppet, etc.: See the :ref:`advanced` section in the Installation Guide. - :doc:`/developers/unf/index` (Java) - a Universal Numerical Fingerprint: https://github.com/IQSS/UNF - GeoConnect (Python) - create a map by uploading files to Dataverse: https://github.com/IQSS/geoconnect - `DataTags `_ (Java and Scala) - tag datasets with privacy levels: https://github.com/IQSS/DataTags diff --git a/doc/sphinx-guides/source/developers/make-data-count.rst b/doc/sphinx-guides/source/developers/make-data-count.rst index 0bb7e9e0ffd..253304d78ea 100644 --- a/doc/sphinx-guides/source/developers/make-data-count.rst +++ b/doc/sphinx-guides/source/developers/make-data-count.rst @@ -71,7 +71,7 @@ If all this is working and you want to send data to the test instance of the Dat ``curl --header "Content-Type: application/json; Accept: application/json" -H "Authorization: Bearer $JSON_WEB_TOKEN" -X POST https://api.test.datacite.org/reports/ -d @sushi_report.json`` -For how to put citations into your dev database and how to get them out again, see "Configuring Dataverse for Make Data Count Citations" in the :doc:`/admin/make-data-count` section of the Admin Guide. +For how to put citations into your dev database and how to get them out again, see the :ref:`MDC-updateCitationsForDataset` section in Make Data Count of the Admin Guide. Testing Make Data Count and Dataverse ------------------------------------- diff --git a/doc/sphinx-guides/source/developers/remote-users.rst b/doc/sphinx-guides/source/developers/remote-users.rst index 4a517c1beb2..66af0c71eda 100755 --- a/doc/sphinx-guides/source/developers/remote-users.rst +++ b/doc/sphinx-guides/source/developers/remote-users.rst @@ -8,7 +8,7 @@ Shibboleth and OAuth Shibboleth and OAuth -------------------- -If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. For some background on user accounts in Dataverse, see "Auth Modes: Local vs. Remote vs. Both" in the :doc:`/installation/config` section of the Installation Guide. +If you are working on anything related to users, please keep in mind that your changes will likely affect Shibboleth and OAuth users. 
For some background on user accounts in Dataverse, see :ref:`auth-modes` section of Configuration in the Installation Guide. Rather than setting up Shibboleth on your laptop, developers are advised to simply add a value to their database to enable Shibboleth "dev mode" like this: diff --git a/doc/sphinx-guides/source/developers/testing.rst b/doc/sphinx-guides/source/developers/testing.rst index 88cf415ec31..2894c457d85 100755 --- a/doc/sphinx-guides/source/developers/testing.rst +++ b/doc/sphinx-guides/source/developers/testing.rst @@ -108,22 +108,33 @@ Unfortunately, the term "integration tests" can mean different things to differe Running the Full API Test Suite Using EC2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To run the API test suite on EC2 you should first follow the steps in the :doc:`deployment` section to get set up for AWS in general and EC2 in particular. +To run the API test suite in an EC2 instance you should first follow the steps in the :doc:`deployment` section to get set up for AWS in general and EC2 in particular. -Then read the instructions in https://github.com/IQSS/dataverse-sample-data for EC2 but be sure to make the adjustments below. +You may always retrieve a current copy of the ec2-create-instance.sh script and accompanying group_var.yml file from the `dataverse-ansible repo`_: -Edit ``ec2config.yaml`` to change ``test_suite`` to ``true``. +- `ec2-create-instance.sh`_ +- `main.yml`_ -Pass in the repo and branch you are testing. You should also specify a local directory where server.log and other useful information will be written so you can start debugging any failures. +Edit ``main.yml`` to set the desired GitHub repo, branch, and to ensure that the API test suite is enabled: + +- ``dataverse_repo: https://github.com/IQSS/dataverse.git`` +- ``dataverse_branch: develop`` +- ``dataverse.api.test_suite: true`` +- ``dataverse.sampledata.enabled: true`` + +If you wish, you may pass the local path of a logging directory, which will tell ec2-create-instance.sh to `grab glassfish, maven and other logs`_ for your review. + +Finally, run the script: .. code-block:: bash - export REPO=https://github.com/IQSS/dataverse.git - export BRANCH=123-my-branch - export LOGS=/tmp/123 + $ ./ec2-create-instance.sh -g main.yml -l log_dir + +Near the beginning and at the end of the ec2-create-instance.sh output you will see instructions for connecting to the instance via SSH. If you are actively working on a branch and want to refresh the warfile after each commit, you may wish to call a `redeploy.sh`_ script placed by the Ansible role, which will do a "git pull" against your branch, build the warfile, deploy the warfile, then restart glassfish. By default this script is written to /tmp/dataverse/redeploy.sh. You may invoke the script by appending it to the SSH command in ec2-create's output: + +.. code-block:: bash - mkdir $LOGS - ./ec2-create-instance.sh -g ec2config.yaml -r $REPO -b $BRANCH -l $LOGS + $ ssh -i your_pem.pem user@ec2-host.aws.com /tmp/dataverse/redeploy.sh Running the full API test suite using Docker ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -165,7 +176,7 @@ The root dataverse must be published for some of the REST Assured tests to run. dataverse.siteUrl ^^^^^^^^^^^^^^^^^ -When run locally (as opposed to a remote server), some of the REST Assured tests require the ``dataverse.siteUrl`` JVM option to be set to ``http://localhost:8080``. See "JVM Options" under the :doc:`/installation/config` section of the Installation Guide for advice changing JVM options. 
First you should check to check your JVM options with: +When run locally (as opposed to a remote server), some of the REST Assured tests require the ``dataverse.siteUrl`` JVM option to be set to ``http://localhost:8080``. See the :ref:`jvm-options` section in the Installation Guide for advice on changing JVM options. First you should check your JVM options with: ``./asadmin list-jvm-options | egrep 'dataverse|doi'`` diff --git a/doc/sphinx-guides/source/developers/tools.rst b/doc/sphinx-guides/source/developers/tools.rst index 767a4a91694..236dee2e3c9 100755 --- a/doc/sphinx-guides/source/developers/tools.rst +++ b/doc/sphinx-guides/source/developers/tools.rst @@ -25,6 +25,8 @@ Maven With Maven installed you can run ``mvn package`` and ``mvn test`` from the command line. It can be downloaded from https://maven.apache.org +.. _vagrant: + Vagrant +++++++ diff --git a/doc/sphinx-guides/source/developers/troubleshooting.rst b/doc/sphinx-guides/source/developers/troubleshooting.rst index ec49b442016..2182c9768ad 100755 --- a/doc/sphinx-guides/source/developers/troubleshooting.rst +++ b/doc/sphinx-guides/source/developers/troubleshooting.rst @@ -84,6 +84,8 @@ As another example, here is how to create a Mail Host via command line for Amazo - Delete: ``./asadmin delete-javamail-resource mail/MyMailSession`` - Create (remove brackets and replace the variables inside): ``./asadmin create-javamail-resource --mailhost email-smtp.us-east-1.amazonaws.com --mailuser [test\@test\.com] --fromaddress [test\@test\.com] --transprotocol aws --transprotocolclass com.amazonaws.services.simpleemail.AWSJavaMailTransport --property mail.smtp.auth=true:mail.smtp.user=[aws_access_key]:mail.smtp.password=[aws_secret_key]:mail.transport.protocol=smtp:mail.smtp.port=587:mail.smtp.starttls.enable=true mail/notifyMailSession`` +.. _rebuilding-dev-environment: + Rebuilding Your Dev Environment ------------------------------- @@ -96,7 +98,7 @@ If you have an old copy of the database and old Solr data and want to start fres - confirm http://localhost:8080 is up - If you want to set some dataset-specific facets, go to the root dataverse (or any dataverse; the selections can be inherited) and click "General Information" and make choices under "Select Facets". There is a ticket to automate this: https://github.com/IQSS/dataverse/issues/619 -You may also find https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy and related scripts interesting because they demonstrate how we have at least partially automated the process of tearing down a Dataverse installation and having it rise again, hence the name "phoenix." See also "Fresh Reinstall" in the :doc:`/installation/installation-main` section of the Installation Guide. 
DataCite -------- diff --git a/doc/sphinx-guides/source/developers/version-control.rst b/doc/sphinx-guides/source/developers/version-control.rst index 618468392b7..eaaf0fa1911 100644 --- a/doc/sphinx-guides/source/developers/version-control.rst +++ b/doc/sphinx-guides/source/developers/version-control.rst @@ -93,6 +93,65 @@ Now that you've made your pull request, your goal is to make sure it appears in Look at https://github.com/IQSS/dataverse/blob/master/CONTRIBUTING.md for various ways to reach out to developers who have enough access to the GitHub repo to move your issue and pull request to the "Code Review" column. +Summary of Git commands +~~~~~~~~~~~~~~~~~~~~~~~ + +This section provides sequences of Git commands for two scenarios: + +* preparing the first request, when the IQSS Dataverse repository and the forked repository are identical +* creating an additional request after some time, when the IQSS Dataverse repository is ahead of the forked repository + +In the examples we use 123-COOL-FEATURE as the name of the feature branch, and https://github.com/YOUR_NAME/dataverse.git as your forked repository's URL. In practice modify both accordingly. + +**1st scenario: preparing the first pull request** + +.. code-block:: bash + + # clone Dataverse at Github.com ... then + + git clone https://github.com/YOUR_NAME/dataverse.git dataverse_fork + cd dataverse_fork + + # create a new branch locally for the pull request + git checkout -b 123-COOL-FEATURE + + # working on the branch ... then commit changes + git commit -am "#123 explanation of changes" + + # upload the new branch to https://github.com/YOUR_NAME/dataverse + git push -u origin 123-COOL-FEATURE + + # ... then create pull request at github.com/YOUR_NAME/dataverse + + +**2nd scenario: preparing another pull request some month later** + +.. code-block:: bash + + # register IQSS Dataverse repo + git remote add upstream https://github.com/IQSS/dataverse.git + + git checkout develop + + # update local develop banch from https://github.com/IQSS/dataverse + git fetch upstream develop + git rebase upstream/develop + + # update remote develop branch at https://github.com/YOUR_NAME/dataverse + git push + + # create a new branch locally for the pull request + git checkout -b 123-COOL-FEATURE + + # work on the branch and commit changes + git commit -am "#123 explanation of changes" + + # upload the new branch to https://github.com/YOUR_NAME/dataverse + git push -u origin 123-COOL-FEATURE + + # ... then create pull request at github.com/YOUR_NAME/dataverse + + How to Resolve Conflicts in Your Pull Request --------------------------------------------- diff --git a/doc/sphinx-guides/source/installation/advanced.rst b/doc/sphinx-guides/source/installation/advanced.rst index a1f559af57d..a60d7dbc23f 100644 --- a/doc/sphinx-guides/source/installation/advanced.rst +++ b/doc/sphinx-guides/source/installation/advanced.rst @@ -15,8 +15,8 @@ You should be conscious of the following when running multiple Glassfish servers - Only one Glassfish server can be the dedicated timer server, as explained in the :doc:`/admin/timers` section of the Admin Guide. - When users upload a logo or footer for their dataverse using the "theme" feature described in the :doc:`/user/dataverse-management` section of the User Guide, these logos are stored only on the Glassfish server the user happend to be on when uploading the logo. By default these logos and footers are written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/logos``. 
- When a sitemp is created by a Glassfish server it is written to the filesystem of just that Glassfish server. By default the sitemap is written to the directory ``/usr/local/glassfish4/glassfish/domains/domain1/docroot/sitemap``. -- If Make Data Count is used, its raw logs must be copied from each Glassfish server to single instance of Counter Processor. See also the ``:MDCLogPath`` database setting in the :doc:`config` section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide. -- Dataset draft version logging occurs separately on each Glassfish server. See "Edit Draft Versions Logging" in the :doc:`/admin/monitoring` section of the Admin Guide for details. +- If Make Data Count is used, its raw logs must be copied from each Glassfish server to single instance of Counter Processor. See also :ref:`:MDCLogPath` section in the Configuration section of this guide and the :doc:`/admin/make-data-count` section of the Admin Guide. +- Dataset draft version logging occurs separately on each Glassfish server. See :ref:`edit-draft-versions-logging` section in Monitoring of the Admin Guide for details. - Password aliases (``db_password_alias``, etc.) are stored per Glassfish server. Detecting Which Glassfish Server a User Is On @@ -34,4 +34,4 @@ If you have successfully installed multiple Glassfish servers behind a load bala You would repeat the steps above for all of your Glassfish servers. If users seem to be having a problem with a particular server, you can ask them to visit https://dataverse.example.edu/host.txt and let you know what they see there (e.g. "server1.example.edu") to help you know which server to troubleshoot. -Please note that "Network Ports" under the :doc:`config` section has more information on fronting Glassfish with Apache. The :doc:`shibboleth` section talks about the use of ``ProxyPassMatch``. +Please note that :ref:`network-ports` under the Configuration section has more information on fronting Glassfish with Apache. The :doc:`shibboleth` section talks about the use of ``ProxyPassMatch``. diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index 1922eaecac1..fe559776500 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -11,6 +11,8 @@ Once you have finished securing and configuring your Dataverse installation, you .. contents:: |toctitle| :local: +.. _securing-your-installation: + Securing Your Installation -------------------------- @@ -49,17 +51,18 @@ Out of the box, Dataverse will list email addresses of the contacts for datasets Additional Recommendations ++++++++++++++++++++++++++ + Run Glassfish as a User Other Than Root -+++++++++++++++++++++++++++++++++++++++ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -See the Glassfish section of :doc:`prerequisites` for details and init scripts for running Glassfish as non-root. +See the :ref:`glassfish` section of :doc:`prerequisites` for details and init scripts for running Glassfish as non-root. Related to this is that you should remove ``/root/.glassfish/pass`` to ensure that Glassfish isn't ever accidentally started as root. Without the password, Glassfish won't be able to start as root, which is a good thing. Enforce Strong Passwords for User Accounts -++++++++++++++++++++++++++++++++++++++++++ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Dataverse only stores passwords (as salted hash, and using a strong hashing algorithm) for "builtin" users. 
You can increase the password complexity rules to meet your security needs. If you have configured your Dataverse installation to allow login from remote authentication providers such as Shibboleth, ORCID, GitHub or Google, you do not have any control over those remote providers' password complexity rules. See the "Auth Modes: Local vs. Remote vs. Both" section below for more on login options. +Dataverse only stores passwords (as salted hash, and using a strong hashing algorithm) for "builtin" users. You can increase the password complexity rules to meet your security needs. If you have configured your Dataverse installation to allow login from remote authentication providers such as Shibboleth, ORCID, GitHub or Google, you do not have any control over those remote providers' password complexity rules. See the :ref:`auth-modes` section below for more on login options. Even if you are satisfied with the out-of-the-box password complexity rules Dataverse ships with, for the "dataverseAdmin" account you should use a strong password so the hash cannot easily be cracked through dictionary attacks. @@ -74,6 +77,8 @@ Password complexity rules for "builtin" accounts can be adjusted with a variety - :ref:`:PVGoodStrength` - :ref:`:PVCustomPasswordResetAlertMessage` +.. _network-ports: + Network Ports ------------- @@ -179,6 +184,8 @@ Here are the configuration options for handles: Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_. +.. _auth-modes: + Auth Modes: Local vs. Remote vs. Both ------------------------------------- @@ -208,10 +215,48 @@ As for the "Remote only" authentication mode, it means that: - ``:DefaultAuthProvider`` has been set to use the desired authentication provider - The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin". Then the user initiates a conversion from "builtin" to remote. Note that longer term, the plan is to permit multiple login options to the same Dataverse account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot) but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider such that no conversion between remote authentication providers will be necessary. -File Storage: Local Filesystem vs. Swift vs. S3 ------------------------------------------------ +File Storage: Using a Local Filesystem and/or Swift and/or S3 object stores +--------------------------------------------------------------------------- + +By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/glassfish4/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.directory`` JVM option described below. 
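As an illustration (not a required step), repointing that JVM option might look like the sketch below; the target path is an example only, and if the option is already set the old value would need to be removed with ``delete-jvm-options`` first. A Glassfish restart is typically needed for JVM option changes to take effect.

.. code-block:: bash

    # Store uploaded files on a dedicated mount instead of the default
    # location under the Glassfish domain directory (example path).
    ./asadmin create-jvm-options "-Ddataverse.files.directory=/data/dataverse/files"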
+ +Dataverse can alternatively store files in a Swift or S3-compatible object store, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-dataverse basis. + +The following sections describe how to set up various types of stores and how to configure for multiple stores. + +Multi-store Basics +++++++++++++++++++ + +To support multiple stores, Dataverse now requires an id, type, and label for each store (even for a single store configuration). These are configured by defining two required JVM options: + +.. code-block:: none + + ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.type=<type>" + ./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.<id>.label=