HSEARCH-4487 + HSEARCH-4980 Two fixes for Jakarta Batch Mass Indexing job: MySQL with jdbcFetchSize=Integer.MIN_VALUE + embedded ids #3754

Merged
merged 8 commits into from
Oct 2, 2023
1 change: 0 additions & 1 deletion Jenkinsfile
@@ -532,7 +532,6 @@ stage('Non-default environments') {
String mavenBuildAdditionalArgs = ''' \
-pl !documentation \
-pl !integrationtest/mapper/orm-spring \
-pl !integrationtest/mapper/orm-jakarta-batch \
-pl !integrationtest/v5migrationhelper/orm \
-pl !integrationtest/java/modules/orm-lucene \
-pl !integrationtest/java/modules/orm-elasticsearch \
32 changes: 27 additions & 5 deletions documentation/src/main/asciidoc/migration/index.adoc
@@ -288,26 +288,48 @@ alter table hsearch_agent

The configuration properties are backward-compatible with Hibernate Search {hibernateSearchPreviousStableVersionShort}.

`hibernate.search.coordination.entity.mapping.outboxevent.uuid_type` and `hibernate.search.coordination.entity.mapping.agent.uuid_type`
However, some configuration values are deprecated:

* `hibernate.search.coordination.entity.mapping.outboxevent.uuid_type` and `hibernate.search.coordination.entity.mapping.agent.uuid_type`
now accept names of SQL type codes from `org.hibernate.type.SqlTypes` or their corresponding int values.
The value `default` is still valid. `uuid-binary` and `uuid-char` are accepted and converted to their corresponding `org.hibernate.type.SqlTypes` alternatives, but they are deprecated and will not be accepted in the future versions of Hibernate Search.

[[api]]
== API changes

The complement operator (`~`) used for link:{hibernateSearchDocUrl}#search-dsl-predicate-regexp-flags[matching regular expression patterns with flags]
The https://hibernate.org/community/compatibility-policy/#code-categorization[API]
is for the most part backward-compatible with Hibernate Search {hibernateSearchPreviousStableVersionShort}.

However, some APIs changed:

* The complement operator (`~`) used for link:{hibernateSearchDocUrl}#search-dsl-predicate-regexp-flags[matching regular expression patterns with flags]
is now removed with no alternative to replace it.
* The Hibernate Search job for Jakarta Batch no longer accepts a `customQueryHQL` / `.restrictedBy(String)` parameter.
Use `.reindexOnly(String hql, Map parameters)` instead.
* The Hibernate Search job for Jakarta Batch no longer accepts a `sessionClearInterval` / `.sessionClearInterval(int)` parameter.
Use `entityFetchSize`/`.entityFetchSize(int)` instead.

[[spi]]
== SPI changes

The https://hibernate.org/community/compatibility-policy/#code-categorization[SPI]
are backward-compatible with Hibernate Search {hibernateSearchPreviousStableVersionShort}.
are for the most part backward-compatible with Hibernate Search {hibernateSearchPreviousStableVersionShort}.

[[behavior]]
== Behavior changes

The default value for `hibernate.search.backend.query.shard_failure.ignore` is changed from `true` to `false` which means
The behavior of Hibernate Search {hibernateSearchVersion}
is for the most part backward-compatible with Hibernate Search {hibernateSearchPreviousStableVersionShort}.

However, parts of Hibernate Search now behave differently:

* The default value for `hibernate.search.backend.query.shard_failure.ignore` is changed from `true` to `false` which means
that now Hibernate Search will throw an exception if at least one shard failed during a search operation.
To get the previous behavior set this configuration property explicitly to `true`.
Note, this setting must be set for each elasticsearch backend, if multiple are defined.
Note, this setting must be set for each elasticsearch backend, if multiple are defined.
* The Hibernate Search job for Jakarta Batch will now list identifiers in one session (with one DB connection),
while loading entities in another (with another DB connection).
This is to sidestep limitations of scrolling in some JDBC drivers.
* For entities whose document ID is based on a different property than the entity ID,
the Hibernate Search job for Jakarta Batch will now build the partition plan using that property
instead of using the entity ID indiscriminately.
Original file line number Diff line number Diff line change
@@ -197,9 +197,9 @@ include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/Hibe
----
<1> <<entrypoints-search-session,Retrieve the `SearchSession`>>.
<2> Create a `MassIndexer` targeting every indexed entity type.
<3> Reindex only the books published before year 2100.
<3> Reindex only the books published before year 1950.
<4> Reindex only the authors born prior to a given local date.
<5> In this example the date is passed as a query parameter.
<5> In this example the cutoff date is passed as a query parameter.
<6> Start the mass indexing process and return when it is over.
====
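
The cutoff date in this example is built with `java.time`. As a small standalone sketch (the class name `CutoffDateExample` and the method `cutoff` are hypothetical and not part of Hibernate Search), this shows what `Year.of( 1950 ).atDay( 1 )` evaluates to, i.e. the value that would be bound to the `:cutoff` parameter:

```java
import java.time.LocalDate;
import java.time.Year;

/** Hypothetical illustration only; not part of Hibernate Search. */
final class CutoffDateExample {

    private CutoffDateExample() {
    }

    /** Returns the first day of the given year, e.g. 1950 -> 1950-01-01. */
    static LocalDate cutoff(int year) {
        return Year.of( year ).atDay( 1 );
    }

    public static void main(String[] args) {
        // A condition such as "birthDate < :cutoff" would then select
        // only dates strictly before this value.
        System.out.println( cutoff( 1950 ) ); // prints 1950-01-01
    }
}
```

Because the comparison is strict (`<`), an author born exactly on the cutoff date would not be selected.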

Original file line number Diff line number Diff line change
@@ -100,10 +100,9 @@ accept special values, for example MySQL might benefit from using `Integer#MIN_V
will attempt to preload everything in memory.

|`entityFetchSize` / `.entityFetchSize(int)`
|The value of `sessionClearInterval`
|Specifies the fetch size to be used when loading entities from database. Some databases
accept special values, for example MySQL might benefit from using `Integer#MIN_VALUE`, otherwise it
will attempt to preload everything in memory.
|200, or the value of `checkpointInterval` if it is smaller
|Specifies the fetch size to be used when loading entities from database. The value defined must be greater
than 0, and equal to or less than the value of `checkpointInterval`.

|`customQueryHQL` / `.restrictedBy(String)`
|-
@@ -132,11 +131,6 @@ request maximum.
|The number of entities to process before triggering a checkpoint. The value defined must be greater
than 0, and equal to or less than the value of `rowsPerPartition`.

|`sessionClearInterval` / `.sessionClearInterval(int)`
|200, or the value of `checkpointInterval` if it is smaller
|The number of entities to process before clearing the session. The value defined must be greater
than 0, and equal to or less than the value of `checkpointInterval`.

|`entityManagerFactoryReference` / `.entityManagerFactoryReference(String)`
|-
|**This parameter is required** when there is more than one persistence unit.
@@ -149,58 +143,37 @@ The string that will identify the `EntityManagerFactory`.
|See <<mapper-orm-indexing-jakarta-batch-emf,Selecting the persistence unit (EntityManagerFactory)>>
|===
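
The size constraints listed in the table can be sketched in plain Java. This is an illustrative sketch only, not the validation code Hibernate Search actually runs: the class `BatchSizeRules` and its methods are hypothetical. Only the rules themselves come from the table: `entityFetchSize` defaults to 200 or to `checkpointInterval` if that is smaller, and `0 < entityFetchSize <= checkpointInterval <= rowsPerPartition`.

```java
/** Hypothetical helper mirroring the documented constraints; not part of Hibernate Search. */
final class BatchSizeRules {

    /** Documented default for entityFetchSize, before capping by checkpointInterval. */
    static final int DEFAULT_ENTITY_FETCH_SIZE = 200;

    private BatchSizeRules() {
    }

    /** 200, or the value of checkpointInterval if it is smaller. */
    static int defaultEntityFetchSize(int checkpointInterval) {
        return Math.min( DEFAULT_ENTITY_FETCH_SIZE, checkpointInterval );
    }

    /** Throws IllegalArgumentException if the documented ordering of sizes is violated. */
    static void validate(int entityFetchSize, int checkpointInterval, int rowsPerPartition) {
        if ( checkpointInterval <= 0 || checkpointInterval > rowsPerPartition ) {
            throw new IllegalArgumentException(
                    "checkpointInterval must be > 0 and <= rowsPerPartition" );
        }
        if ( entityFetchSize <= 0 || entityFetchSize > checkpointInterval ) {
            throw new IllegalArgumentException(
                    "entityFetchSize must be > 0 and <= checkpointInterval" );
        }
    }

    public static void main(String[] args) {
        System.out.println( defaultEntityFetchSize( 1000 ) ); // prints 200
        System.out.println( defaultEntityFetchSize( 50 ) ); // prints 50
        validate( 200, 1000, 5000 ); // accepted: 200 <= 1000 <= 5000
    }
}
```

For instance, `validate( 200, 100, 1000 )` would be rejected in this sketch, because the fetch size exceeds the checkpoint interval.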

[[mapper-orm-indexing-jakarta-batch-indexing-mode]]
== [[jsr-352-indexing-mode]] Indexing mode
[[mapper-orm-indexing-jakarta-batch-conditional]]
== [[mapper-orm-indexing-jakarta-batch-indexing-mode]] [[jsr-352-indexing-mode]] Conditional indexing

You can select a subset of target entities to be indexed
by passing a condition as string to the mass indexing job.
The condition will be applied when querying the database for entities to index.

The mass indexing job allows you to define your own entities to be indexed -- you can start a full
indexing or a partial indexing through 2 different methods: selecting the desired entity types,
or using HQL.
The condition string is expected to follow the link:{hibernateDocUrl}#query-language[Hibernate Query Language (HQL)] syntax.
Accessible entity properties are those of the entity being reindexed (and nothing more).

.Conditional reindexing using a `restrictedBy` HQL parameter
.Conditional indexing using a `reindexOnly` HQL parameter
====
[source, JAVA, indent=0]
----
include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmJakartaBatchIT.java[tags=hql]
include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmJakartaBatchIT.java[tags=reindexOnly]
----
<1> Start building parameters for a mass-indexing job.
<2> Define the entity type to be indexed.
<3> Restrict the scope of the job using an HQL restriction.
<4> Get `JobOperator` form the framework.
<5> Start the job.
<3> Reindex only the authors born prior to a given local date.
<4> In this example the cutoff date is passed as a query parameter.
<5> Get `JobOperator` from the framework.
<6> Start the job.
====

While the full indexing is useful when you perform the very first indexing, or
after extensive changes to your whole database, it may also be time-consuming.
If your want to reindex only part of your data, you need to add restrictions using HQL:
they help you to define a customized selection, and only the entities inside that selection will be indexed. A typical
use-case is to index the new entities appeared since yesterday.

Note that, as detailed below, some features may not be supported depending on the indexing mode.

.Comparison of each indexing mode
|===
| Indexing mode | Scope | Parallel Indexing

| Full Indexation
| All entities
| Supported

| HQL
| Some entities
| Not supported
|===

[WARNING]
====
When using the HQL mode, there isn't any query validation before the job's start.
If the query is invalid, the job will start and fail.

Also, parallel indexing is disabled in HQL mode,
because our current parallelism implementations relies on selection order,
which might not be provided by the HQL given by user.
Even if the reindexing is applied on a subset of entities, by default *all entities* will be purged at the start.
The purge <<mapper-orm-indexing-jakarta-batch-parameters,can be disabled completely>>,
but when enabled there is no way to filter the entities that will be purged.

Because of those limitations, we suggest you use this approach only for indexing small numbers of entities,
and only if you know that no entities matching the query will be created during indexing.
See https://hibernate.atlassian.net/browse/HSEARCH-3304[HSEARCH-3304] for more information.
====

[[mapper-orm-indexing-jakarta-batch-parallel-indexing]]
@@ -315,7 +288,7 @@ But the size of a chunk is not only about saving progress, it is also about perf
* a new Hibernate session is opened for each chunk;
* a new transaction is started for each chunk;
* inside a chunk, the session is cleared periodically
according to the `sessionClearInterval` parameter,
according to the `entityFetchSize` parameter,
which must thereby be smaller than (or equal to) the chunk size;
* documents are flushed to the index at the end of each chunk.
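
To make the relationship between the chunk size and `entityFetchSize` concrete, here is a rough model in plain Java. It assumes that entities are loaded in batches of `entityFetchSize` and that the session is cleared once per batch. That assumption is a simplification for illustration and is not taken from the Hibernate Search implementation; the class `ChunkModel` and its method are hypothetical.

```java
/** Hypothetical model of one chunk; not Hibernate Search code. */
final class ChunkModel {

    private ChunkModel() {
    }

    /**
     * Number of fetch batches needed to cover one chunk (ceiling division),
     * assuming one session clear per batch.
     */
    static int batchesPerChunk(int chunkSize, int entityFetchSize) {
        if ( entityFetchSize <= 0 || entityFetchSize > chunkSize ) {
            throw new IllegalArgumentException(
                    "entityFetchSize must be > 0 and <= chunk size" );
        }
        return ( chunkSize + entityFetchSize - 1 ) / entityFetchSize;
    }

    public static void main(String[] args) {
        System.out.println( batchesPerChunk( 1000, 200 ) ); // prints 5
        System.out.println( batchesPerChunk( 1000, 300 ) ); // prints 4 (the last batch is partial)
    }
}
```

Under this model, a smaller `entityFetchSize` means more frequent clears and a smaller number of entities held in the session at once, at the cost of more round trips per chunk.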

Original file line number Diff line number Diff line change
@@ -86,7 +86,7 @@ private Book newBook(int id) {
return book;
}

protected Author newAuthor(int id) {
private Author newAuthor(int id) {
Author author = new Author();
author.setId( id );
author.setFirstName( "John" + id );
Original file line number Diff line number Diff line change
@@ -10,6 +10,8 @@
import static org.hibernate.search.util.impl.integrationtest.mapper.orm.OrmUtils.with;
import static org.junit.Assume.assumeTrue;

import java.time.Year;
import java.util.Map;
import java.util.Properties;

import jakarta.batch.operations.JobOperator;
@@ -61,36 +63,27 @@ public void simple() throws Exception {
}

@Test
public void hql() throws Exception {
// tag::hql[]
public void reindexOnly() throws Exception {
// tag::reindexOnly[]
Properties jobProps = MassIndexingJob.parameters() // <1>
.forEntities( Author.class ) // <2>
.restrictedBy( "from Author a where a.lastName = 'Smith1'" ) // <3>
.reindexOnly( "birthDate < :cutoff", // <3>
Map.of( "cutoff", Year.of( 1950 ).atDay( 1 ) ) ) // <4>
.build();

JobOperator jobOperator = BatchRuntime.getJobOperator(); // <4>
long executionId = jobOperator.start( MassIndexingJob.NAME, jobProps ); // <5>
// end::hql[]
JobOperator jobOperator = BatchRuntime.getJobOperator(); // <5>
long executionId = jobOperator.start( MassIndexingJob.NAME, jobProps ); // <6>
// end::reindexOnly[]

JobExecution jobExecution = jobOperator.getJobExecution( executionId );
jobExecution = waitForTermination( jobOperator, jobExecution, JOB_TIMEOUT_MS );
assertThat( jobExecution.getBatchStatus() ).isEqualTo( BatchStatus.COMPLETED );

with( entityManagerFactory ).runNoTransaction( entityManager -> {
assertBookAndAuthorCount( entityManager, 0, NUMBER_OF_BOOKS / 2 );
assertBookAndAuthorCount( entityManager, 0, 500 );
} );
}

@Override
protected Author newAuthor(int id) {
Author author = new Author();
author.setId( id );
author.setFirstName( "John" + id );
// use the id % 2
author.setLastName( "Smith" + ( id % 2 ) );
return author;
}

void assertBookAndAuthorCount(EntityManager entityManager, int expectedBookCount, int expectedAuthorCount) {
setupHelper.assertions().searchAfterIndexChanges(
entityManager.getEntityManagerFactory().unwrap( SessionFactory.class ),
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@
import static org.hibernate.search.util.impl.test.FutureAssert.assertThatFuture;

import java.lang.invoke.MethodHandles;
import java.time.LocalDate;
import java.time.Year;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Future;

@@ -64,9 +64,9 @@ public void reindexOnly() {
Search.session( entityManager );
// tag::reindexOnly[]
MassIndexer massIndexer = searchSession.massIndexer(); // <2>
massIndexer.type( Book.class ).reindexOnly( "e.publicationYear <= 2100" ); // <3>
massIndexer.type( Author.class ).reindexOnly( "e.birthDate < :birthDate" ) // <4>
.param( "birthDate", LocalDate.ofYearDay( 2100, 77 ) ); // <5>
massIndexer.type( Book.class ).reindexOnly( "publicationYear < 1950" ); // <3>
massIndexer.type( Author.class ).reindexOnly( "birthDate < :cutoff" ) // <4>
.param( "cutoff", Year.of( 1950 ).atDay( 1 ) ); // <5>
// end::reindexOnly[]
if ( !BackendConfigurations.simple().supportsExplicitPurge() ) {
massIndexer.purgeAllOnStart( false );
@@ -78,7 +78,7 @@ public void reindexOnly() {
catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
assertBookAndAuthorCount( entityManager, 651, 651 );
assertBookAndAuthorCount( entityManager, 500, 500 );
} );
}
