Commit 7910fec

Merge branch 'develop' into 11212_postgres_16 IQSS#11212
2 parents 51aefd0 + 66269e2 commit 7910fec

132 files changed, +3369 -538 lines changed

.env

+2 -2
@@ -1,5 +1,5 @@
 APP_IMAGE=gdcc/dataverse:unstable
 POSTGRES_VERSION=17
 DATAVERSE_DB_USER=dataverse
-SOLR_VERSION=9.3.0
-SKIP_DEPLOY=0
+SOLR_VERSION=9.8.0
+SKIP_DEPLOY=0

.github/workflows/copy_labels.yml

+15
@@ -0,0 +1,15 @@
+name: Copy labels from issue to pull request
+
+on:
+  pull_request:
+    types: [opened]
+
+jobs:
+  copy-labels:
+    runs-on: ubuntu-latest
+    name: Copy labels from linked issues
+    steps:
+      - name: copy-labels
+        uses: michalvankodev/[email protected]
+        with:
+          repo-token: ${{ secrets.GITHUB_TOKEN }}

.github/workflows/deploy_beta_testing.yml

+1 -1
@@ -68,7 +68,7 @@ jobs:
           overwrite: true

       - name: Execute payara war deployment remotely
-        uses: appleboy/[email protected].0
+        uses: appleboy/[email protected].1
         env:
           INPUT_WAR_FILE: ${{ env.war_file }}
         with:

conf/solr/schema.xml

+26 -25
@@ -38,36 +38,37 @@
 catchall "text" field, and use that for searching.
 -->

-<schema name="default-config" version="1.6">
+<schema name="default-config" version="1.7">
 <!-- attribute "name" is the name of this schema and is only used for display purposes.
-version="x.y" is Solr's version number for the schema syntax and
+version="x.y" is Solr's version number for the schema syntax and
 semantics. It should not normally be changed by applications.

-1.0: multiValued attribute did not exist, all fields are multiValued
+1.0: multiValued attribute did not exist, all fields are multiValued
 by nature
-1.1: multiValued attribute introduced, false by default
-1.2: omitTermFreqAndPositions attribute introduced, true by default
+1.1: multiValued attribute introduced, false by default
+1.2: omitTermFreqAndPositions attribute introduced, true by default
 except for text fields.
 1.3: removed optional field compress feature
 1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
-behavior when a single string produces multiple tokens. Defaults
+behavior when a single string produces multiple tokens. Defaults
 to off for version >= 1.4
-1.5: omitNorms defaults to true for primitive field types
+1.5: omitNorms defaults to true for primitive field types
 (int, float, boolean, string...)
 1.6: useDocValuesAsStored defaults to true.
+1.7: docValues defaults to true, uninvertible defaults to false.
 -->

 <!-- Valid attributes for fields:
 name: mandatory - the name for the field
-type: mandatory - the name of a field type from the
+type: mandatory - the name of a field type from the
 fieldTypes section
 indexed: true if this field should be indexed (searchable or sortable)
 stored: true if this field should be retrievable
 docValues: true if this field should have doc values. Doc Values is
 recommended (required, if you are using *Point fields) for faceting,
 grouping, sorting and function queries. Doc Values will make the index
-faster to load, more NRT-friendly and more memory-efficient.
-They are currently only supported by StrField, UUIDField, all
+faster to load, more NRT-friendly and more memory-efficient.
+They are currently only supported by StrField, UUIDField, all
 *PointFields, and depending on the field type, they might require
 the field to be single-valued, be required or have a default value
 (check the documentation of the field type you're interested in for
@@ -82,9 +83,9 @@
 given field.
 When using MoreLikeThis, fields used for similarity should be
 stored for best performance.
-termPositions: Store position information with the term vector.
+termPositions: Store position information with the term vector.
 This will increase storage costs.
-termOffsets: Store offset information with the term vector. This
+termOffsets: Store offset information with the term vector. This
 will increase storage costs.
 required: The field is required. It will throw an error if the
 value does not exist
@@ -102,10 +103,10 @@
 <!-- In this _default configset, only four fields are pre-declared:
 id, _version_, and _text_ and _root_. All other fields will be type guessed and added via the
 "add-unknown-fields-to-the-schema" update request processor chain declared in solrconfig.xml.
-
-Note that many dynamic fields are also defined - you can use them to specify a
+
+Note that many dynamic fields are also defined - you can use them to specify a
 field's type via field naming conventions - see below.
-
+
 WARNING: The _text_ catch-all field will significantly increase your index size.
 If you don't need it, consider removing it and the corresponding copyField directive."
 -->
@@ -115,12 +116,12 @@
 <field name="_version_" type="plong" indexed="false" stored="false"/>
 <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

-
-
-
-
-<!-- Start: Dataverse-specific -->
-
+
+
+
+
+<!-- Start: Dataverse-specific -->
+
 <!-- catchall field, containing all other searchable text fields (implemented
 via copyField further on in this schema -->
 <!-- Dataverse solr 7.3.0: for some reason the old text wasn't working so switched to _text_ for copyfields -->
@@ -216,7 +217,7 @@
 <!-- https://redmine.hmdc.harvard.edu/issues/3482 -->
 <!-- 'Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)' http://wiki.apache.org/solr/CommonQueryParameters#sort -->
 <!-- http://stackoverflow.com/questions/13360706/solr-4-0-alphabetical-sorting-trouble/13361226#13361226 -->
-<field name="nameSort" type="alphaOnlySort" indexed="true" stored="true"/>
+<field name="nameSort" type="string" indexed="true" stored="true"/>

 <field name="dateSort" type="pdate" indexed="true" stored="true"/>

@@ -785,7 +786,7 @@
 <filter class="solr.TrimFilterFactory" />
 <!-- The PatternReplaceFilter gives you the flexibility to use
 Java Regular expression to replace any sequence of characters
-matching a pattern with an arbitrary replacement string,
+matching a pattern with an arbitrary replacement string,
 which may include back references to portions of the original
 string matched by the pattern.

@@ -798,8 +799,8 @@
 <!-- https://redmine.hmdc.harvard.edu/issues/3482#note-11 -->
 <!-- <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" /> -->
 </analyzer>
-</fieldType>
-
+</fieldType>
+
 <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
 <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />

conf/solr/solrconfig.xml

+21 -71
@@ -35,52 +35,7 @@
 that you fully re-index after changing this setting as it can
 affect both how text is indexed and queried.
 -->
-<luceneMatchVersion>9.7</luceneMatchVersion>
-
-<!-- <lib/> directives can be used to instruct Solr to load any Jars
-identified and use them to resolve any "plugins" specified in
-your solrconfig.xml or schema.xml (ie: Analyzers, Request
-Handlers, etc...).
-
-All directories and paths are resolved relative to the
-instanceDir.
-
-Please note that <lib/> directives are processed in the order
-that they appear in your solrconfig.xml file, and are "stacked"
-on top of each other when building a ClassLoader - so if you have
-plugin jars with dependencies on other jars, the "lower level"
-dependency jars should be loaded first.
-
-If a "./lib" directory exists in your instanceDir, all files
-found in it are included as if you had used the following
-syntax...
-
-<lib dir="./lib" />
--->
-
-<!-- A 'dir' option by itself adds any files found in the directory
-to the classpath, this is useful for including all jars in a
-directory.
-
-When a 'regex' is specified in addition to a 'dir', only the
-files in that directory which completely match the regex
-(anchored on both ends) will be included.
-
-If a 'dir' option (with or without a regex) is used and nothing
-is found that matches, a warning will be logged.
-
-The example below can be used to load a Solr Module along
-with their external dependencies.
--->
-<!-- <lib dir="${solr.install.dir:../../../..}/modules/ltr/lib" regex=".*\.jar" /> -->
-
-<!-- an exact 'path' can be used instead of a 'dir' to specify a
-specific jar file. This will cause a serious error to be logged
-if it can't be loaded.
--->
-<!--
-<lib path="../a-jar-that-does-not-exist.jar" />
--->
+<luceneMatchVersion>9.11</luceneMatchVersion>

 <!-- Data Directory

@@ -256,16 +211,9 @@
 is recommended (see below).
 "dir" - the target directory for transaction logs, defaults to the
 solr data directory.
-"numVersionBuckets" - sets the number of buckets used to keep
-track of max version values when checking for re-ordered
-updates; increase this value to reduce the cost of
-synchronizing access to version buckets during high-volume
-indexing, this requires 8 bytes (long) * numVersionBuckets
-of heap space per Solr core.
 -->
 <updateLog>
 <str name="dir">${solr.ulog.dir:}</str>
-<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
 </updateLog>

 <!-- AutoCommit
@@ -360,6 +308,21 @@
 -->
 <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>

+<!-- Minimum acceptable prefix-size for prefix-based queries.
+
+Prefix-based queries consume memory in proportion to the number of terms in the index
+that start with that prefix. Short prefixes tend to match many many more indexed-terms
+and consume more memory as a result, sometimes causing stability issues on the node.
+
+This setting allows administrators to require that prefixes meet or exceed a specified
+minimum length requirement. Prefix queries that don't meet this requirement return an
+error to users. The limit may be overridden on a per-query basis by specifying a
+'minPrefixQueryTermLength' local-param value.
+
+The flag value of '-1' can be used to disable enforcement of this limit.
+-->
+<minPrefixQueryTermLength>${solr.query.minPrefixLength:-1}</minPrefixQueryTermLength>
+
 <!-- Solr Internal Query Caches
 Starting with Solr 9.0 the default cache implementation used is CaffeineCache.
 -->
@@ -494,23 +457,6 @@
 -->
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

-<!-- Use Filter For Sorted Query
-
-A possible optimization that attempts to use a filter to
-satisfy a search. If the requested sort does not include
-score, then the filterCache will be checked for a filter
-matching the query. If found, the filter will be used as the
-source of document ids, and then the sort will be applied to
-that.
-
-For most situations, this will not be useful unless you
-frequently get the same search repeatedly with different sort
-options, and none of them ever use "score"
--->
-<!--
-<useFilterForSortedQuery>true</useFilterForSortedQuery>
--->
-
 <!-- Query Related Event Listeners

 Various IndexSearcher related events can trigger Listeners to
@@ -1015,6 +961,10 @@
 <str name="pattern">[^\w-\.]</str>
 <str name="replacement">_</str>
 </updateProcessor>
+<updateProcessor class="solr.NumFieldLimitingUpdateRequestProcessorFactory" name="max-fields">
+<int name="maxFields">1000</int>
+<bool name="warnOnly">true</bool>
+</updateProcessor>
 <updateProcessor class="solr.ParseBooleanFieldUpdateProcessorFactory" name="parse-boolean"/>
 <updateProcessor class="solr.ParseLongFieldUpdateProcessorFactory" name="parse-long"/>
 <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
@@ -1061,7 +1011,7 @@

 <!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
 <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
-processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
+processor="uuid,remove-blank,field-name-mutating,max-fields,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
 <processor class="solr.LogUpdateProcessorFactory"/>
 <processor class="solr.DistributedUpdateProcessorFactory"/>
 <processor class="solr.RunUpdateProcessorFactory"/>

@@ -0,0 +1,3 @@
+## Cookie Consent Popup (GDPR)
+
+For compliance with GDPR and other privacy regulations, advice on adding a cookie consent popup has been added to the guides. See the new [cookie consent](https://dataverse-guide--10320.org.readthedocs.build/en/10320/installation/config.html#adding-cookie-consent-for-gdpr-etc) section and #10320.

@@ -0,0 +1,6 @@
+New feature: Collection administrators can now configure which metadata fields appear during dataset creation through the `displayOnCreate` property, even when fields are not required. This provides greater control over metadata visibility and can help improve metadata completeness.
+
+- The feature is currently available through the API endpoint `/api/dataverses/{alias}/inputLevels`
+- UI implementation will be available in a future release [#11221](https://github.com/IQSS/dataverse/issues/11221)
+
+For more information, see the [API Guide](https://guides.dataverse.org/en/latest/api/native-api.html#update-collection-input-levels) and issues [#10476](https://github.com/IQSS/dataverse/issues/10476) and [#11224](https://github.com/IQSS/dataverse/pull/11224).
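
Reviewer sketch (not part of the commit): to illustrate the `displayOnCreate` release note above, the Python below builds, without sending, a PUT request to `/api/dataverses/{alias}/inputLevels`. The endpoint path and the `displayOnCreate` property name come from the note. The other JSON keys (`datasetFieldTypeName`, `required`, `include`), the `X-Dataverse-key` header, the `subtitle` field name, and the server URL, alias and token values are assumptions for illustration; check the linked API Guide for the authoritative payload shape.

```python
import json
import urllib.request


def build_input_levels_request(server_url, alias, api_token, levels):
    """Build (but do not send) a PUT request for a collection's input levels."""
    url = f"{server_url.rstrip('/')}/api/dataverses/{alias}/inputLevels"
    body = json.dumps(levels).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,  # assumed auth header
        },
    )


# Hypothetical payload: show an optional field on the dataset creation form.
levels = [
    {
        "datasetFieldTypeName": "subtitle",  # assumed field name
        "required": False,
        "include": True,
        "displayOnCreate": True,  # assumed JSON key, named after the property
    }
]

request = build_input_levels_request(
    "http://localhost:8080", "root", "xxxxxxxx-xxxx", levels
)
# urllib.request.urlopen(request) would send it to a running installation.
```

Building the request separately from sending it keeps the sketch inspectable without a running Dataverse instance.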

+12
@@ -0,0 +1,12 @@
+## Dataset Types can be linked to Metadata Blocks
+
+Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.
+
+This will have the following effects for the APIs used by the new Dataverse UI ( https://github.com/IQSS/dataverse-frontend ):
+
+- The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
+- The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
+
+Mostly in order to write automated tests for the above, a [displayOnCreate](https://dataverse-guide--11001.org.readthedocs.build/en/11001/api/native-api.html#set-displayoncreate-for-a-dataset-field) API endpoint has been added.
+
+For more information, see the guides ([overview](https://dataverse-guide--11001.org.readthedocs.build/en/11001/user/dataset-management.html#dataset-types), [new APIs](https://dataverse-guide--11001.org.readthedocs.build/en/11001/api/native-api.html#link-dataset-type-with-metadata-blocks)), #10519 and #11001.

@@ -0,0 +1 @@
+The [tutorial](https://dataverse-guide--11201.org.readthedocs.build/en/11201/container/running/demo.html#root-collection-customization-alias-name-etc) on running Dataverse in Docker has been updated to explain how to configure the root collection using a JSON file. See also #10541 and #11201.

+11
@@ -0,0 +1,11 @@
+# Signposting Output Now Contains Links to All Dataset Metadata Export Formats
+
+When Signposting was added in Dataverse 5.14 (#8981), it only provided links for the `schema.org` metadata export format.
+
+The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats (including any external exporters, such as Croissant, that have been enabled).
+
+This provides a lightweight machine-readable way to first retrieve a list of links (via a HTTP HEAD request, for example) to each available metadata export format and then follow up with a request for the export format of interest.
+
+In addition, the content type for the `schema.org` dataset metadata export format has been corrected. It was `application/json` and now it is `application/ld+json`.
+
+See also [the docs](https://preview.guides.gdcc.io/en/develop/api/native-api.html#retrieve-signposting-information) and #10542.
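
Reviewer sketch (not part of the commit): the two-step flow described in the Signposting note above (fetch the list of links, then request one export format) depends on parsing an HTTP `Link` header. The Python below is a minimal parser under stated assumptions: parameters are double-quoted, metadata exports use `rel="describedby"`, and the sample header, host and DOI are made up for illustration. Unquoted parameters, which the `Link` header syntax also permits, are ignored by this sketch.

```python
import re

# One link-value: <target> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_header(value):
    """Return one dict per link, holding its target ('href') and parameters."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = dict(PARAM_RE.findall(params))
        attrs["href"] = target
        links.append(attrs)
    return links


def metadata_export_links(value):
    """Return (content type, URL) pairs for links with rel="describedby"."""
    return [
        (link.get("type"), link["href"])
        for link in parse_link_header(value)
        if link.get("rel") == "describedby"
    ]


# Illustrative header only; a real one would come from a HEAD request
# against a dataset landing page.
sample = (
    '<https://doi.org/10.5072/FK2/EXAMPLE>;rel="cite-as", '
    '<https://demo.example.org/export?exporter=schema.org>'
    ';rel="describedby";type="application/ld+json", '
    '<https://demo.example.org/export?exporter=ddi>'
    ';rel="describedby";type="application/xml"'
)

exports = metadata_export_links(sample)
```

A client could pick the entry whose content type it understands (for example `application/ld+json` for the `schema.org` export) and follow that URL with a second request.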

+2
@@ -0,0 +1,2 @@
+Release Highlights:
+An experimental "Archival" metadata block has been added, [downloadable](https://dataverse-guide--10626.org.readthedocs.build/en/10626/user/appendix.html) from the User Guide. The purpose of the metadata block is to enable repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that being your own institutional archive or an external archive, i.e. a historical archive. See also #10626.

@@ -0,0 +1,9 @@
+Solr 9.8.0 is now the version recommended in our installation guides and used with automated testing. Other libraries Dataverse uses have been updated as well.
+
+For the upgrade instructions section:
+
+[note that 6.6 may contain other solr-related changes, so the instructions may need to contain information merged from multiple release notes!]
+
+If you are upgrading Solr:
+- Install solr-9.8.0 following the instructions from the Installation guide.
+- Run a full reindex to populate the search catalog.

@@ -0,0 +1,16 @@
+### Improvements to PID formatting in exports and citations
+
+Multiple small issues with the formatting of PIDs in the
+DDI exporters, and EndNote and BibTeX citation formats have
+been addressed. These should improve the ability to import
+Dataverse citations into reference managers and fix potential
+issues harvesting datasets using PermaLinks.
+
+Backward Incompatibility
+
+Changes to PID formatting occur in the DDI/DDI Html export formats
+and the EndNote and BibTex citation formats. These changes correct
+errors and improve conformance with best practices but could break
+parsing of these formats.
+
+For more information, see #10790.

@@ -0,0 +1 @@
+The OAI-ORE exporter can now export metadata containing nested compound fields (i.e. compound fields within compound fields). See #10809 and #11190.

@@ -0,0 +1,3 @@
+A bug that caused replacing files via API when file PIDs were enabled has been fixed.
+
+For testing purposes, the FAKE PID provider can now be used with file PIDs enabled. (The FAKE provider is not recommended for any production use.)

@@ -0,0 +1 @@
+Minor styling fixes for the Related Publication Field and fields using ORCID or ROR have been made (see #11053, #10964, #11106)

doc/release-notes/11095-fix-extcvoc-indexing.md

+1 -1
@@ -3,5 +3,5 @@ in indexing failure for the dataset (e.g. when the script tried to index both th
 Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema.
 Configuring the Solr schema and the update-fields.sh script as usually recommended when using custom metadata blocks will resolve the issue.

-The overall release notes should include a Solr update (which hopefully is required by an update to 9.7.0 anyway) and our standard instructions
+The overall release notes should include a Solr update (which hopefully is required by an update to 9.8.0 anyway) and our standard instructions
 should change to recommending use of the update-fields.sh script when using custom metadatablocks *and/or external vocabulary scripts*.
