Commit

Merge pull request #68 from IQSS/develop
Update from develop
lubitchv authored Jan 6, 2023
2 parents ec31196 + ac1454b commit c5f44d6
Showing 37 changed files with 783 additions and 147 deletions.
7 changes: 7 additions & 0 deletions .github/SECURITY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Security

To report a security vulnerability please email [email protected] as explained at https://guides.dataverse.org/en/latest/installation/config.html#reporting-security-issues

Advice on securing your installation can be found at https://guides.dataverse.org/en/latest/installation/config.html#securing-your-installation

Security practices and procedures used by the Dataverse team are described at https://guides.dataverse.org/en/latest/developers/security.html
14 changes: 14 additions & 0 deletions doc/release-notes/7844-codemeta.md
@@ -0,0 +1,14 @@
# Experimental CodeMeta Schema Support

With this release, we are adding "experimental" (see note below) support for research software metadata deposits.

By adding a metadata block for [CodeMeta](https://codemeta.github.io), we take another step in extending the scope
of Dataverse beyond a research data repository towards first-class support of diverse FAIR objects, currently
focusing on research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready". We hope
for feedback from installations on the new metadata block so we can optimize it and lift it out of the experimental stage.

**Note:** Like the metadata block for computational workflows before it, this schema is flagged as "experimental".
"Experimental" means it is brand new, opt-in, and might need future tweaking based on usage experience in the field.
These blocks are listed here: https://guides.dataverse.org/en/latest/user/appendix.html#experimental-metadata
1 change: 1 addition & 0 deletions doc/release-notes/8840-improved-download-estimate.md
@@ -0,0 +1 @@
To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate.
1 change: 1 addition & 0 deletions doc/release-notes/9096-folder-upload.md
@@ -0,0 +1 @@
If the installation is configured for S3 direct upload, Dataverse now supports uploading an entire folder tree of files, retaining the files' relative paths as directory path metadata for the uploaded files.
12 changes: 11 additions & 1 deletion doc/sphinx-guides/source/developers/big-data-support.rst
@@ -36,10 +36,20 @@ At present, one potential drawback for direct-upload is that files are only part

``./asadmin create-jvm-options "-Ddataverse.files.<id>.ingestsizelimit=<size in bytes>"``

.. _cors-s3-bucket:

**IMPORTANT:** One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers is to allow cross site (CORS) requests on your S3 store.
Allow CORS for S3 Buckets
~~~~~~~~~~~~~~~~~~~~~~~~~

**IMPORTANT:** One additional step is required for direct upload via a Dataverse installation, for direct download to work with previewers, and for direct upload to work with dvwebloader (:ref:`folder-upload`): you must allow cross-site (CORS) requests on your S3 store.
The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to further limit AllowedOrigins and/or AllowedHeaders. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has additional information about doing this.

If you'd like to check the CORS configuration on your bucket before making changes:

``aws s3api get-bucket-cors --bucket <BUCKET_NAME>``

To proceed with making changes:

``aws s3api put-bucket-cors --bucket <BUCKET_NAME> --cors-configuration file://cors.json``

with the contents of the file cors.json as follows:
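The actual file contents are truncated in this diff view. As a hedged illustration only (the origin and rule values below are placeholders, not the exact rules recommended by the guide), a cors.json allowing direct upload and download might look like this:

```shell
# Write an illustrative cors.json (placeholder origin; tighten
# AllowedOrigins/AllowedHeaders for your installation).
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://dataverse.example.edu"],
      "AllowedMethods": ["GET", "PUT"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"]
    }
  ]
}
EOF

# Then apply it (requires AWS CLI credentials for the bucket):
# aws s3api put-bucket-cors --bucket <BUCKET_NAME> --cors-configuration file://cors.json
```

Consult the guide's own cors.json and the previewers wiki linked above for the exact rules to use.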
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -19,6 +19,7 @@ Developer Guide
sql-upgrade-scripts
testing
documentation
security
dependencies
debugging
coding-style
34 changes: 34 additions & 0 deletions doc/sphinx-guides/source/developers/security.rst
@@ -0,0 +1,34 @@
========
Security
========

This section describes security practices and procedures for the Dataverse team.

.. contents:: |toctitle|
:local:

Intake of Security Issues
-------------------------

As described under :ref:`reporting-security-issues`, we encourage the community to email [email protected] if they have any security concerns. These emails go into our private ticket tracker (RT_).

.. _RT: https://help.hmdc.harvard.edu

We use a private GitHub issue tracker at https://github.com/IQSS/dataverse-security/issues for security issues.

Sending Security Notices
------------------------

When drafting the security notice, it might be helpful to look at `previous examples`_.

.. _previous examples: https://drive.google.com/drive/folders/0B_qMYwdHFZghaDZIU2hWQnBDZVE?resourcekey=0-SYjuhCohAIM7_pmysVc3Xg&usp=sharing

Gather email addresses from the following sources (these are also described under :ref:`ongoing-security` in the Installation Guide):

- "contact_email" in the `public installation spreadsheet`_
- "Other Security Contacts" in the `private installation spreadsheet`_

Once you have the email addresses, include them as BCC recipients.

.. _public installation spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
.. _private installation spreadsheet: https://docs.google.com/spreadsheets/d/1EWDwsj6eptQ7nEr-loLvdU7I6Tm2ljAplfNSVWR42i0/edit?usp=sharing
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/index.rst
@@ -70,7 +70,7 @@ The support email address is `[email protected] <mailto:[email protected]
Report bugs and add feature requests in `GitHub Issues <https://github.com/IQSS/dataverse/issues>`__
or use `GitHub pull requests <http://guides.dataverse.org/en/latest/developers/version-control.html#how-to-make-a-pull-request>`__,
if you have some code, scripts or documentation that you'd like to share.
If you have a **security issue** to report, please email `[email protected] <mailto:[email protected]>`__.
If you have a **security issue** to report, please email `[email protected] <mailto:[email protected]>`__. See also :ref:`reporting-security-issues`.


Indices and Tables
148 changes: 143 additions & 5 deletions doc/sphinx-guides/source/installation/config.rst
@@ -101,6 +101,31 @@ Password complexity rules for "builtin" accounts can be adjusted with a variety
- :ref:`:PVGoodStrength`
- :ref:`:PVCustomPasswordResetAlertMessage`

.. _ongoing-security:

Ongoing Security of Your Installation
+++++++++++++++++++++++++++++++++++++

As with any application, you should keep up to date with patches to both the Dataverse software and the platform (usually Linux) it runs on. Dataverse releases are announced on the dataverse-community_ mailing list, the Dataverse blog_, and in chat.dataverse.org_.

.. _dataverse-community: https://groups.google.com/g/dataverse-community
.. _blog: https://dataverse.org/blog
.. _chat.dataverse.org: https://chat.dataverse.org

In addition to these public channels, you can subscribe to receive security notices via email from the Dataverse team. These notices are sent to the ``contact_email`` in the installation spreadsheet_ and you can open an issue in the dataverse-installations_ repo to add or change the contact email. Security notices are also sent to people and organizations that prefer to remain anonymous. To be added to this private list, please email [email protected].

.. _spreadsheet: https://docs.google.com/spreadsheets/d/1bfsw7gnHlHerLXuk7YprUT68liHfcaMxs1rFciA-mEo/edit#gid=0
.. _dataverse-installations: https://github.com/IQSS/dataverse-installations

For additional details about security practices by the Dataverse team, see the :doc:`/developers/security` section of the Developer Guide.

.. _reporting-security-issues:

Reporting Security Issues
+++++++++++++++++++++++++

If you have a security issue to report, please email it to [email protected].

.. _network-ports:

Network Ports
@@ -1570,30 +1595,118 @@

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_DB_PORT``.

.. _dataverse.solr.host:

dataverse.solr.host
+++++++++++++++++++

The hostname of a Solr server to connect to. Remember to restart / redeploy Dataverse after changing the setting
(as with :ref:`:SolrHostColonPort`).

Defaults to ``localhost``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_HOST``.
Defaults to ``solr`` when used with ``mp.config.profile=ct`` (:ref:`see below <:ApplicationServerSettings>`).

dataverse.solr.port
+++++++++++++++++++

The Solr server port to connect to. Remember to restart / redeploy Dataverse after changing the setting
(as with :ref:`:SolrHostColonPort`).

Defaults to ``8983``, the default Solr port.

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_PORT``.

dataverse.solr.core
+++++++++++++++++++

The name of the Solr core to use for this Dataverse installation. Might be used to switch to a different core quickly.
Remember to restart / redeploy Dataverse after changing the setting (as with :ref:`:SolrHostColonPort`).

Defaults to ``collection1``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_CORE``.

dataverse.solr.protocol
+++++++++++++++++++++++

The Solr server URL protocol for the connection. Remember to restart / redeploy Dataverse after changing the setting
(as with :ref:`:SolrHostColonPort`).

Defaults to ``http``, but may be set to ``https`` for Solr installations secured with TLS.

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_PROTOCOL``.

dataverse.solr.path
+++++++++++++++++++

The path part of the Solr endpoint URL (e.g. ``/solr/collection1`` in ``http://localhost:8983/solr/collection1``).
Might be used to target a Solr API at non-default places. Remember to restart / redeploy Dataverse after changing the
setting (as with :ref:`:SolrHostColonPort`).

Defaults to ``/solr/${dataverse.solr.core}``, interpolating the core name. If you override this path, make sure to
include the variable so your configured core name is still interpolated!

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SOLR_PATH``.
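Taken together, the five ``dataverse.solr.*`` settings determine the Solr base URL Dataverse connects to. A quick sanity-check sketch using the MicroProfile environment variable names listed above (the values shown are simply the documented defaults):

```shell
# Compose the effective Solr endpoint from the documented settings.
DATAVERSE_SOLR_PROTOCOL="http"
DATAVERSE_SOLR_HOST="localhost"
DATAVERSE_SOLR_PORT="8983"
DATAVERSE_SOLR_CORE="collection1"
# The default path interpolates the core name: /solr/${dataverse.solr.core}
DATAVERSE_SOLR_PATH="/solr/${DATAVERSE_SOLR_CORE}"

SOLR_URL="${DATAVERSE_SOLR_PROTOCOL}://${DATAVERSE_SOLR_HOST}:${DATAVERSE_SOLR_PORT}${DATAVERSE_SOLR_PATH}"
echo "$SOLR_URL"   # http://localhost:8983/solr/collection1
```

Changing any one of the variables (e.g. ``DATAVERSE_SOLR_CORE``) changes the composed endpoint accordingly.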

dataverse.rserve.host
+++++++++++++++++++++

Host name for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
Host name for Rserve, used for tasks that require use of R (to ingest RData
files and to save tabular data as RData frames).

Defaults to ``localhost``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment
variable ``DATAVERSE_RSERVE_HOST``.

dataverse.rserve.port
+++++++++++++++++++++

Port number for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
Port number for Rserve, used for tasks that require use of R (to ingest RData
files and to save tabular data as RData frames).

Defaults to ``6311`` when not configured or when the configured value is not a valid integer.

Can also be set via *MicroProfile Config API* sources, e.g. the environment
variable ``DATAVERSE_RSERVE_PORT``.

dataverse.rserve.user
+++++++++++++++++++++

Username for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
Username for Rserve, used for tasks that require use of R (to ingest RData
files and to save tabular data as RData frames).

Defaults to ``rserve``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment
variable ``DATAVERSE_RSERVE_USER``.

dataverse.rserve.password
+++++++++++++++++++++++++

Password for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
Password for Rserve, used for tasks that require use of R (to ingest RData
files and to save tabular data as RData frames).

Defaults to ``rserve``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment
variable ``DATAVERSE_RSERVE_PASSWORD``.

dataverse.rserve.tempdir
++++++++++++++++++++++++

Temporary directory used by Rserve (defaults to /tmp/Rserv). Note that this location is local to the host on which Rserv is running (specified in ``dataverse.rserve.host`` above). When talking to Rserve, Dataverse needs to know this location in order to generate absolute path names of the files on the other end.
Temporary directory used by Rserve (defaults to ``/tmp/Rserv``). Note that this
location is local to the host on which Rserv is running (specified in
``dataverse.rserve.host`` above). When talking to Rserve, Dataverse needs to
know this location in order to generate absolute path names of the files on the
other end.

Defaults to ``/tmp/Rserv``.

Can also be set via *MicroProfile Config API* sources, e.g. the environment
variable ``DATAVERSE_RSERVE_TEMPDIR``.
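For a container-style deployment, all of the Rserve settings above can be supplied through their MicroProfile environment variable mappings. A sketch with the documented defaults (the hostname is a hypothetical example; change the password in production):

```shell
# Configure the Rserve connection via the documented env var mappings.
export DATAVERSE_RSERVE_HOST="rserve.internal"   # hypothetical hostname
export DATAVERSE_RSERVE_PORT="6311"              # default Rserve port
export DATAVERSE_RSERVE_USER="rserve"
export DATAVERSE_RSERVE_PASSWORD="rserve"        # change in production!
export DATAVERSE_RSERVE_TEMPDIR="/tmp/Rserv"     # local to the Rserve host
```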

.. _dataverse.dropbox.key:

@@ -1814,6 +1927,21 @@ To facilitate large file upload and download, the Dataverse Software installer b

and restart Payara to apply your change.

mp.config.profile
+++++++++++++++++

MicroProfile Config 2.0 defines the `concept of "profiles" <https://download.eclipse.org/microprofile/microprofile-config-2.0/microprofile-config-spec-2.0.html#configprofile>`_.
They can be used to change configuration values by context. Dataverse uses this to change some configuration
defaults when running in a container context rather than in a classic installation.

As per the spec, you will need to set the configuration value ``mp.config.profile`` to ``ct`` as early as possible.
This is best done with a system property:

``./asadmin create-system-properties 'mp.config.profile=ct'``

You might also create and use your own profiles; please refer to the upstream documentation linked above.
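Because ``mp.config.profile`` is an ordinary MicroProfile Config property, it can also come from other config sources. In a container, the environment variable form is common; a hedged sketch (the exact source precedence depends on your Payara setup):

```shell
# MicroProfile Config maps mp.config.profile to this env var form
# (dots become underscores, uppercased).
export MP_CONFIG_PROFILE="ct"
[ "$MP_CONFIG_PROFILE" = "ct" ] && echo "container profile active"
```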


.. _database-settings:

Database Settings
@@ -2301,13 +2429,17 @@ Limit the number of files in a zip that your Dataverse installation will accept.

``curl -X PUT -d 2048 http://localhost:8080/api/admin/settings/:ZipUploadFilesLimit``

.. _:SolrHostColonPort:

:SolrHostColonPort
++++++++++++++++++

By default your Dataverse installation will attempt to connect to Solr on port 8983 on localhost. Use this setting to change the hostname or port. You must restart Payara after making this change.

``curl -X PUT -d localhost:8983 http://localhost:8080/api/admin/settings/:SolrHostColonPort``

**Note:** instead of this database setting, you can alternatively use JVM settings such as :ref:`dataverse.solr.host`.

:SolrFullTextIndexing
+++++++++++++++++++++

@@ -2697,6 +2829,7 @@ The URL for your Repository Storage Abstraction Layer (RSAL) installation. This
This setting controls which upload methods are available to users of your Dataverse installation. The following upload methods are available:

- ``native/http``: Corresponds to "Upload with HTTP via your browser" and APIs that use HTTP (SWORD and native).
- ``dvwebloader``: Corresponds to :ref:`folder-upload`. Note that ``dataverse.files.<id>.upload-redirect`` must be set to "true" on an S3 store for this method to show up in the UI. In addition, :ref:`:WebloaderUrl` must be set and CORS must be allowed on the S3 bucket. See :ref:`cors-s3-bucket`.
- ``dcm/rsync+ssh``: Corresponds to "Upload with rsync+ssh via Data Capture Module (DCM)". A lot of setup is required, as explained in the :doc:`/developers/big-data-support` section of the Developer Guide.

Out of the box only ``native/http`` is enabled and will work without further configuration. To add multiple upload methods, separate them with a comma like this:
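The guide's own comma-separated example is truncated in this view; as a hedged illustration (hypothetical host, assuming the S3 prerequisites above are met for ``dvwebloader``), the setting follows the same curl pattern as the other database settings:

```shell
# Illustrative: enable browser upload plus the webloader
# (adjust host and methods for your installation).
METHODS="native/http,dvwebloader"
# curl -X PUT -d "$METHODS" http://localhost:8080/api/admin/settings/:UploadMethods
```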
@@ -3198,6 +3331,11 @@ The interval in seconds between Dataverse calls to Globus to check on upload pro

A true/false option to add a Globus transfer option to the file download menu which is not yet fully supported in the dataverse-globus app. See :ref:`globus-support` for details.

.. _:WebloaderUrl:

:WebloaderUrl
+++++++++++++

The URL of the main HTML file of https://github.com/gdcc/dvwebloader when that app is deployed. See also :ref:`:UploadMethods` for another required setting.

.. _supported MicroProfile Config API source: https://docs.payara.fish/community/docs/Technical%20Documentation/MicroProfile/Config/Overview.html
5 changes: 3 additions & 2 deletions doc/sphinx-guides/source/user/appendix.rst
Expand Up @@ -26,8 +26,8 @@ Detailed below are what metadata schemas we support for Citation and Domain Spec
- `Geospatial Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=4>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/geospatial.tsv>`__): compliant with DDI Lite, DDI 2.5 Codebook, DataCite, and Dublin Core. Country / Nation field uses `ISO 3166-1 <http://en.wikipedia.org/wiki/ISO_3166-1>`_ controlled vocabulary.
- `Social Science & Humanities Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=1>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/social_science.tsv>`__): compliant with DDI Lite, DDI 2.5 Codebook, and Dublin Core.
- `Astronomy and Astrophysics Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=3>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/astrophysics.tsv>`__): These metadata elements can be mapped/exported to the International Virtual Observatory Alliance’s (IVOA)
`VOResource Schema format <http://www.ivoa.net/documents/latest/RM.html>`__ and is based on
`Virtual Observatory (VO) Discovery and Provenance Metadata <http://perma.cc/H5ZJ-4KKY>`__.
`VOResource Schema format <http://www.ivoa.net/documents/latest/RM.html>`__ and is based on
`Virtual Observatory (VO) Discovery and Provenance Metadata <http://perma.cc/H5ZJ-4KKY>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/astrophysics.tsv>`__).
- `Life Sciences Metadata <https://docs.google.com/spreadsheet/ccc?key=0AjeLxEN77UZodHFEWGpoa19ia3pldEFyVFR0aFVGa0E#gid=2>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/biomedical.tsv>`__): based on `ISA-Tab Specification <https://isa-specs.readthedocs.io/en/latest/isamodel.html>`__, along with controlled vocabulary from subsets of the `OBI Ontology <http://bioportal.bioontology.org/ontologies/OBI>`__ and the `NCBI Taxonomy for Organisms <http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/>`__.
- `Journal Metadata <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=8>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/journals.tsv>`__): based on the `Journal Archiving and Interchange Tag Set, version 1.2 <https://jats.nlm.nih.gov/archiving/tag-library/1.2/chapter/how-to-read.html>`__.

@@ -36,6 +36,7 @@ Experimental Metadata

Unlike supported metadata, experimental metadata is not enabled by default in a new Dataverse installation. Feedback via any `channel <https://dataverse.org/contact>`_ is welcome!

- `CodeMeta Software Metadata <https://docs.google.com/spreadsheets/d/e/2PACX-1vTE-aSW0J7UQ0prYq8rP_P_AWVtqhyv46aJu9uPszpa9_UuOWRsyFjbWFDnCd7us7PSIpW7Qg2KwZ8v/pub>`__: based on the `CodeMeta Software Metadata Schema, version 2.0 <https://codemeta.github.io/terms/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/codemeta.tsv>`__)
- `Computational Workflow Metadata <https://docs.google.com/spreadsheets/d/13HP-jI_cwLDHBetn9UKTREPJ_F4iHdAvhjmlvmYdSSw/edit#gid=447508596>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/computationalworkflow.tsv>`__): adapted from `Bioschemas Computational Workflow Profile, version 1.0 <https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE>`__ and `Codemeta <https://codemeta.github.io/terms/>`__.

See Also