Commit f39c7f2

Merge branch 'main' of https://github.com/apache/lucene into deprecated_usage
2 parents 50f2d96 + 8d4f7a6 commit f39c7f2

171 files changed (+4984 -1666 lines changed)

.gitattributes

+2-2
@@ -1,6 +1,6 @@
 # Ignore all differences in line endings for the lock file.
-versions.lock text eol=lf
-versions.props text eol=lf
+versions.lock text eol=lf
+versions.toml text eol=lf

 # Gradle files are always in LF.
 *.gradle text eol=lf

README.md

+1
@@ -23,6 +23,7 @@ Apache Lucene is a high-performance, full-featured text search engine library
 written in Java.

 [![Build Status](https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/badge/icon?subject=Lucene)](https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/)
+[![Revved up by Develocity](https://img.shields.io/badge/Revved%20up%20by-Develocity-06A0CE?logo=Gradle&labelColor=02303A)](https://ge.apache.org/scans?search.buildToolType=gradle&search.rootProjectNames=lucene-root)

 ## Online Documentation

dev-tools/doap/lucene.rdf

+7
@@ -67,6 +67,13 @@
     </maintainer>

     <!-- NOTE: please insert releases in numeric order, NOT chronologically. -->
+    <release>
+      <Version>
+        <name>lucene-9.11.1</name>
+        <created>2024-06-27</created>
+        <revision>9.11.1</revision>
+      </Version>
+    </release>
     <release>
       <Version>
         <name>lucene-9.11.0</name>
+78
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import re
+import subprocess
+import sys
+import tempfile
+import urllib.request
+
+'''
+A simple tool to see diffs between main's version of CHANGES.txt entries for
+a given release vs the stable branch's version. It's best to keep these 1)
+identical and 2) matching what changes were actually backported to be honest
+to users and avoid future annoying conflicts on backport.
+'''
+
+# e.g. python3 -u diff_lucene_changes.py branch_9_9 main 9.9.0
+
+#
+
+def get_changes_url(branch_name):
+  if os.path.isdir(branch_name):
+    url = f'file://{branch_name}/lucene/CHANGES.txt'
+  else:
+    url = f'https://raw.githubusercontent.com/apache/lucene/{branch_name}/lucene/CHANGES.txt'
+  print(f'NOTE: resolving {branch_name} --> {url}')
+  return url
+
+def extract_release_section(changes_txt, release_name):
+  return re.search(f'=======+ Lucene {re.escape(release_name)} =======+(.*?)=======+ Lucene .*? =======+$',
+                   changes_txt.decode('utf-8'), re.MULTILINE | re.DOTALL).group(1).encode('utf-8')
+
+def main():
+  if len(sys.argv) < 3 or len(sys.argv) > 5:
+    print('\nUsage: python3 -u dev-tools/scripts/diff_lucene_changes.py <branch1-or-local-clone> <branch2-or-local-clone> <release-name> [diff-commandline-extras]\n')
+    print(' e.g.: python3 -u dev-tools/scripts/diff_lucene_changes.py branch_9_9 /l/trunk 9.9.0 "-w"\n')
+    sys.exit(1)
+
+  branch1 = sys.argv[1]
+  branch2 = sys.argv[2]
+  release_name = sys.argv[3]
+
+  if len(sys.argv) > 4:
+    diff_cl_extras = [sys.argv[4]]
+  else:
+    diff_cl_extras = []
+
+  branch1_changes = extract_release_section(urllib.request.urlopen(get_changes_url(branch1)).read(),
+                                            release_name)
+  branch2_changes = extract_release_section(urllib.request.urlopen(get_changes_url(branch2)).read(),
+                                            release_name)
+
+  with tempfile.NamedTemporaryFile() as f1, tempfile.NamedTemporaryFile() as f2:
+    f1.write(branch1_changes)
+    f2.write(branch2_changes)
+
+    command = ['diff'] + diff_cl_extras + [f1.name, f2.name]
+
+    # diff returns non-zero exit status when there are diffs, so don't pass check=True
+    print(subprocess.run(command, check=False, capture_output=True).stdout.decode('utf-8'))
+
+if __name__ == '__main__':
+  main()
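
The release-section extraction in the script above can be exercised on its own. What follows is a minimal sketch, not the committed code: it assumes an in-memory sample instead of a downloaded CHANGES.txt, works on `str` rather than `bytes`, and the helper name `extract_release_section_or_none` and the sample entries are hypothetical. Unlike the committed function, it returns `None` when no matching section is found instead of raising.

```python
import re

# Hypothetical sample that mimics the CHANGES.txt release-header layout.
# The entries are placeholders, not real release notes.
SAMPLE_CHANGES = """\
======================== Lucene 9.11.1 =======================

Bug Fixes
---------------------
* Example entry A

======================== Lucene 9.11.0 =======================

API Changes
---------------------
* Example entry B

======================== Lucene 9.10.0 =======================
"""


def extract_release_section_or_none(changes_text, release_name):
    """Return the text between the header for release_name and the next
    release header, or None if no such pair of headers exists."""
    pattern = (
        rf'=======+ Lucene {re.escape(release_name)} =======+'
        r'(.*?)=======+ Lucene .*? =======+$'
    )
    # DOTALL lets the lazy group span lines; MULTILINE lets $ match at the
    # end of the following header line rather than only at end of input.
    match = re.search(pattern, changes_text, re.MULTILINE | re.DOTALL)
    return match.group(1) if match else None


section = extract_release_section_or_none(SAMPLE_CHANGES, '9.11.1')
print(section)
```

Because the pattern requires a following release header, the last section in a file (here 9.10.0) is not matched; a caller would need to treat that case separately.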

lucene/CHANGES.txt

+81-8
@@ -80,10 +80,6 @@ API Changes
 * GITHUB#12875: Ensure token position is always increased in PathHierarchyTokenizer and ReversePathHierarchyTokenizer
   and resulting tokens do not overlap. (Michael Froh, Lukáš Vlček)

-* GITHUB#12624, GITHUB#12831: Allow FSTCompiler to stream to any DataOutput while building, and
-  make compile() only return the FSTMetadata. For on-heap (default) use case, please use
-  FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader()) to create the FST. (Anh Dung Bui)
-
 * GITHUB#13146, GITHUB#13148: Remove ByteBufferIndexInput and only use MemorySegment APIs
   for MMapDirectory. (Uwe Schindler)

@@ -133,6 +129,11 @@ New Features
   DocValuesSkipper abstraction. A new flag is added to FieldType.java that configures whether
   to create a "skip index" for doc values. (Ignacio Vera)

+* GITHUB#13563: Add levels to doc values skip index. (Ignacio Vera)
+
+* GITHUB#13597: Align doc value skipper interval boundaries when an interval contains a constant
+  value. (Ignacio Vera)
+
 Improvements
 ---------------------

@@ -256,11 +257,23 @@ API Changes

 New Features
 ---------------------
-(No changes)
+
+* GITHUB#13430: Allow configuring the search concurrency via
+  TieredMergePolicy#setTargetSearchConcurrency. This in turn instructs the
+  merge policy to try to have at least this number of segments on the highest
+  tier. (Adrien Grand, Carlos Delgado)
+
+* GITHUB#13517: Allow configuring the search concurrency on LogDocMergePolicy
+  and LogByteSizeMergePolicy via a new #setTargetConcurrency setter.
+  (Adrien Grand)

 Improvements
 ---------------------
-(No changes)
+
+* GITHUB#13548: Refactor and javadoc update for KNN vector writer classes. (Patrick Zhai)
+
+* GITHUB#13562: Add Intervals.regexp and Intervals.range methods to produce IntervalsSource
+  for regexp and range queries. (Mayya Sharipova)

 Optimizations
 ---------------------
@@ -279,14 +292,65 @@ Optimizations

 * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (Stefan Vodita)

+* GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes. (Jakub Slowinski)
+
+* GITHUB#13538: Slightly reduce heap usage for HNSW and scalar quantized vector writers. (Ben Trent)
+
+* GITHUB#12100: WordBreakSpellChecker.suggestWordBreaks now does a breadth first search, allowing it to return
+  better matches with fewer evaluations. (hossman)
+
+* GITHUB#13582: Stop requiring MaxScoreBulkScorer's outer window to have at
+  least INNER_WINDOW_SIZE docs. (Adrien Grand)
+
+* GITHUB#13570, GITHUB#13574, GITHUB#13535: Avoid performance degradation with closing shared Arenas.
+  Closing many individual index files can potentially lead to a degradation in execution performance.
+  Index files are mmapped one-to-one with the JDK's foreign shared Arena. The JVM deoptimizes the top
+  few frames of all threads when closing a shared Arena (see JDK-8335480). We mitigate this situation
+  by 1) using a confined Arena where appropriate, and 2) grouping files from the same segment to a
+  single shared Arena. (Chris Hegarty, Michael Gibney, Uwe Schindler)
+
+Changes in runtime behavior
+---------------------
+
+* GITHUB#13472: When an executor is provided to the IndexSearcher constructor, the searcher now executes tasks on the
+  thread that invoked a search as well as its configured executor. Users should reduce the executor's thread-count by 1
+  to retain the previous level of parallelism. Moreover, it is now possible to start searches from the same executor
+  that is configured in the IndexSearcher without risk of deadlocking. A separate executor for starting searches is no
+  longer required. (Armin Braun)
+
 Bug Fixes
 ---------------------
-(No changes)
+
+* GITHUB#13384: Fix highlighter to use longer passages instead of shorter individual terms. (Zack Kendall)
+
+* GITHUB#13463: Address bug in MultiLeafKnnCollector causing #minCompetitiveSimilarity to stay artificially low in
+  some corner cases. (Greg Miller)
+
+* GITHUB#13553: Correct RamUsageEstimate for scalar quantized knn vector formats so that raw vectors are correctly
+  accounted for. (Ben Trent)

 Other
 --------------------
 (No changes)

+======================== Lucene 9.11.1 =======================
+
+Bug Fixes
+---------------------
+
+* GITHUB#13498: Avoid performance regression by lazily constructing the PointTree in NumericComparator. (Ignacio Vera)
+
+* GITHUB#13501, GITHUB#13478: Remove intra-merge parallelism for everything except HNSW graph merges. (Ben Trent)
+
+* GITHUB#13498, GITHUB#13340: Allow adding a parent field to an index with no fields. (Michael Sokolov)
+
+* GITHUB#12431: Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter
+  by unordered matches. (Stephane Campinas)
+
+* GITHUB#13493: StringValueFacetCounts stops throwing NPE when faceting over an empty match-set. (Grebennikov Roman,
+  Stefan Vodita)
+
+
 ======================== Lucene 9.11.0 =======================

 API Changes
@@ -494,13 +558,23 @@ API Changes

 * GITHUB#12854: Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated. (Greg Miller)

+* GITHUB#12624, GITHUB#12831: Allow FSTCompiler to stream to any DataOutput while building, and
+  make compile() only return the FSTMetadata. For on-heap (default) use case, please use
+  FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader()) to create the FST. (Anh Dung Bui)
+
 New Features
 ---------------------
 * GITHUB#12679: Add support for similarity-based vector searches using [Byte|Float]VectorSimilarityQuery. Uses a new
   VectorSimilarityCollector to find all vectors scoring above a `resultSimilarity` while traversing the HNSW graph till
   better-scoring nodes are available, or the best candidate is below a score of `traversalSimilarity` in the lowest
   level. (Aditya Prakash, Kaival Parikh)

+* GITHUB#12829: For indices newly created as of 9.10.0 onwards, IndexWriter preserves document blocks indexed via
+  IndexWriter#addDocuments or IndexWriter#updateDocuments also when index sorting is configured. Document blocks are
+  maintained alongside their parent documents during sort and merge. IndexWriterConfig accepts a parent field that is used
+  to maintain block orders if index sorting is used. Note, this is fully optional in Lucene 9.x but will be mandatory for
+  indices that use document blocks together with index sorting as of 10.0.0. (Simon Willnauer)
+
 * GITHUB#12336: Index additional data per facet label in the taxonomy. (Shai Erera, Egor Potemkin, Mike McCandless,
   Stefan Vodita)

@@ -592,7 +666,6 @@ Build

 Other
 ---------------------
-
 * GITHUB#11023: Removing some dead code in CheckIndex. (Jakub Slowinski)

 * GITHUB#11023: Removing @lucene.experimental tags in testXXX methods in CheckIndex. (Jakub Slowinski)

lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene40/blocktree/FieldReader.java

+8-12
@@ -88,21 +88,17 @@ public final class FieldReader extends Terms {
             (new ByteArrayDataInput(rootCode.bytes, rootCode.offset, rootCode.length)).readVLong()
                 >>> Lucene40BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS;
     // Initialize FST always off-heap.
-    final IndexInput clone = indexIn.clone();
-    clone.seek(indexStartFP);
+    final FST.FSTMetadata<BytesRef> fstMetadata;
     if (metaIn == indexIn) { // Only true before Lucene 8.6
-      index =
-          new FST<>(
-              readMetadata(clone, ByteSequenceOutputs.getSingleton()),
-              clone,
-              new OffHeapFSTStore());
+      final IndexInput clone = indexIn.clone();
+      clone.seek(indexStartFP);
+      fstMetadata = readMetadata(clone, ByteSequenceOutputs.getSingleton());
+      // FST bytes actually only start after the metadata.
+      indexStartFP = clone.getFilePointer();
     } else {
-      index =
-          new FST<>(
-              readMetadata(metaIn, ByteSequenceOutputs.getSingleton()),
-              clone,
-              new OffHeapFSTStore());
+      fstMetadata = readMetadata(metaIn, ByteSequenceOutputs.getSingleton());
     }
+    index = FST.fromFSTReader(fstMetadata, new OffHeapFSTStore(indexIn, indexStartFP, fstMetadata));
     /*
     if (false) {
       final String dotFileName = segment + "_" + fieldInfo.name + ".dot";
