From ab1f6e72af4f39b2dfce1a243fd87f8e60735ff1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yoann=20Rodi=C3=A8re?= Date: Mon, 2 Oct 2023 14:12:51 +0200 Subject: [PATCH] HSEARCH-4923 Tidy up documentation of indexing Co-Authored-By: marko-bekhta --- .../public/reference/_architecture.adoc | 4 +- .../public/reference/_indexing-basics.adoc | 43 +++ .../public/reference/_indexing-explicit.adoc | 292 ++++++++++++++++++ ..._indexing-listener-triggered.asciidoc.adoc | 151 ++------- .../reference/_indexing-massindexer.adoc | 2 +- .../public/reference/_indexing-plan.adoc | 253 +++++++++------ .../asciidoc/public/reference/_indexing.adoc | 10 +- .../public/reference/_limitations.adoc | 10 +- .../_mapper-orm-indexing-jakarta-batch.adoc | 12 +- .../_mapper-orm-indexing-manual.adoc | 157 ---------- .../public/reference/_mapping-reindexing.adoc | 5 +- 11 files changed, 548 insertions(+), 391 deletions(-) create mode 100644 documentation/src/main/asciidoc/public/reference/_indexing-basics.adoc create mode 100644 documentation/src/main/asciidoc/public/reference/_indexing-explicit.adoc delete mode 100644 documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-manual.adoc diff --git a/documentation/src/main/asciidoc/public/reference/_architecture.adoc b/documentation/src/main/asciidoc/public/reference/_architecture.adoc index f8c2fe3e1e8..4697f5131a1 100644 --- a/documentation/src/main/asciidoc/public/reference/_architecture.adoc +++ b/documentation/src/main/asciidoc/public/reference/_architecture.adoc @@ -104,8 +104,8 @@ See <> for details. 
2+|Elasticsearch cluster |Guarantee of index updates -2+|<> -|<> +2+|<> +|<> |Visibility of index updates |<> diff --git a/documentation/src/main/asciidoc/public/reference/_indexing-basics.adoc b/documentation/src/main/asciidoc/public/reference/_indexing-basics.adoc new file mode 100644 index 00000000000..4b36c43d3bb --- /dev/null +++ b/documentation/src/main/asciidoc/public/reference/_indexing-basics.adoc @@ -0,0 +1,43 @@ +[[indexing-basics]] += Basics + +There are multiple ways to index entities in Hibernate Search. + +If you want to get to know the most popular ones, +head directly to the following section: + +* To keep indexes synchronized transparently as entities change in a Hibernate ORM `Session`, +see <>. +* To index a large amount of data -- +for example the whole database, when adding Hibernate Search to an existing application -- +see the <>. + +Otherwise, the following table may help you figure out what's best for your use case. + +[cols="h,3*^",options="header"] +.Comparison of indexing methods +|=== +|Name and link +|Use case +|API +|Mapper + +|<> +|Handle incremental changes in application transactions +|None: works implicitly without API calls +|<> only + +|<> +.2+|Reindex large volumes of data in batches +|Specific to Hibernate Search +|<> or <> + +|<> +|Jakarta EE standard +|<> only + +|<> +|Anything else +|Specific to Hibernate Search +|<> or <> +|=== diff --git a/documentation/src/main/asciidoc/public/reference/_indexing-explicit.adoc b/documentation/src/main/asciidoc/public/reference/_indexing-explicit.adoc new file mode 100644 index 00000000000..ae47b56605b --- /dev/null +++ b/documentation/src/main/asciidoc/public/reference/_indexing-explicit.adoc @@ -0,0 +1,292 @@ +[[indexing-explicit]] += [[mapper-orm-indexing-manual]] [[manual-index-changes]] Explicit indexing + +[[indexing-explicit-basics]] +== [[mapper-orm-indexing-manual-basics]] [[search-batchindex]] Basics + +While <> and +the <> +or <> +should take care of most needs, +it is 
sometimes necessary to control indexing manually. + +The need arises in particular when <> is <> +or simply not supported (e.g. <>), +or when listener-triggered indexing cannot detect entity changes -- +<>. + +To address these use cases, Hibernate Search exposes several APIs +explained in the following sections. + +[[listener-triggered-indexing-synchronization]] +== Configuration + +As explicit indexing uses <> under the hood, +several configuration options affecting indexing plans will affect explicit indexing as well: + +* The <>. +* The <>. + +[[indexing-explicit-plan]] +== [[mapper-orm-indexing-manual-indexingplan-writes]] [[_deleting_instances_from_the_index]] [[_adding_instances_to_the_index]] Using a `SearchIndexingPlan` manually + +Explicit access to the <> is done in the context of a <> +using the `SearchIndexingPlan` interface. +This interface represents the (mutable) set of changes +that are planned in the context of a session, +and will be applied to indexes upon transaction commit (for the <>) +or upon closing the `SearchSession` (for the <>). + +Here is how explicit indexing based on an <> works at a high level: + +1. When the application wants an index change, +it calls one of the `add`/`addOrUpdate`/`delete` methods on the indexing plan of the current <>. ++ +For the <> the current `SearchSession` is <>, +while for the <> the `SearchSession` is <>. +2. Eventually, the application decides changes are complete, +and the plan processes change events added so far, +either inferring which entities need to be reindexed and building the corresponding documents (<>) +or building events to be sent to the outbox (<>). ++ +The application may trigger this explicitly using the indexing plan's `process` method, +but it is generally not necessary as it happens automatically: +for the <> this happens when the Hibernate ORM `Session` gets flushed +(explicitly or as part of a transaction commit), +while for the <> this happens when the `SearchSession` is closed. +3. 
Finally, the plan gets executed, triggering indexing, potentially asynchronously. ++ +The application may trigger this explicitly using the indexing plan's `execute` method, +but it is generally not necessary as it happens automatically: +for the <> this happens on transaction commit, +while for the <> this happens when the `SearchSession` is closed. + +The `SearchIndexingPlan` interface offers the following methods: + +`add(Object entity)`:: +(Available with the <> only.) ++ +Add a document to the index if the entity type is mapped to an index (`@Indexed`). ++ +WARNING: This may create duplicates in the index if the document already exists. +Prefer `addOrUpdate` unless you are really sure of yourself and need a (slight) performance boost. +`addOrUpdate(Object entity)`:: +Add or update a document in the index if the entity type is mapped to an index (`@Indexed`), +and re-index documents that embed this entity (through `@IndexedEmbedded` for example). +`delete(Object entity)`:: +Delete a document from the index if the entity type is mapped to an index (`@Indexed`), +and re-index documents that embed this entity (through `@IndexedEmbedded` for example). +`purge(Class entityType, Object id)`:: +Delete the entity from the index, +but do not try to re-index documents that embed this entity. ++ +Compared to `delete`, this is mainly useful if the entity has already been deleted from the database +and is not available, even in a detached state, in the session. +In that case, reindexing associated entities will be the user's responsibility, +since Hibernate Search cannot know which entities are associated with an entity that no longer exists. +`purge(String entityName, Object id)`:: +Same as `purge(Class entityType, Object id)`, +but the entity type is referenced by its name (see `@javax.persistence.Entity#name`). +`process()`:: +(Available with the <> only.) 
++ +Process change events added so far, +either inferring which entities need to be reindexed and building the corresponding documents (<>) +or building events to be sent to the outbox (<>). ++ +This method is generally executed automatically (see the high-level description near the top of this section), +so calling it explicitly is only useful for batching when processing a large number of items, +as explained in <>. +`execute()`:: +(Available with the <> only.) ++ +Execute the indexing plan, triggering indexing, potentially asynchronously. ++ +This method is generally executed automatically (see the high-level description near the top of this section), +so calling it explicitly is only useful in very rare cases, +for batching when processing a large number of items **and transactions are not an option**, +as explained in <>. + +Below are examples of using `addOrUpdate` and `delete`. + +.Explicitly adding or updating an entity in the index using `SearchIndexingPlan` +==== +[source, JAVA, indent=0] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=indexing-plan-addOrUpdate] +---- +<1> <>. +<2> Get the search session's indexing plan. +<3> Fetch from the database the `Book` we want to index; +this could be replaced with any other way of loading an entity when using the <>. +<4> Submit the `Book` to the indexing plan for an add-or-update operation. +The operation won't be executed immediately, +but will be delayed until the transaction is committed (<>) +or until the `SearchSession` is closed (<>). +==== + +.Explicitly deleting an entity from the index using `SearchIndexingPlan` +==== +[source, JAVA, indent=0] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=indexing-plan-delete] +---- +<1> <>. +<2> Get the search session's indexing plan. 
+<3> Fetch from the database the `Book` we want to un-index; +this could be replaced with any other way of loading an entity when using the <>. +<4> Submit the `Book` to the indexing plan for a delete operation. +The operation won't be executed immediately, +but will be delayed until the transaction is committed (<>) +or until the `SearchSession` is closed (<>). +==== + +[TIP] +==== +Multiple operations can be performed in a single indexing plan. +The same entity can even be changed multiple times, +for example added and then removed: +Hibernate Search will simplify the operation as expected. + +This will work fine for any reasonable number of entities, +but changing or simply loading large numbers of entities in a single session +requires special care with Hibernate ORM, +and then some extra care with Hibernate Search. +See <> for more information. +==== + +[[mapper-orm-indexing-manual-indexingplan-process-execute]] +== [[search-batchindex-flushtoindexes]] Hibernate ORM and the periodic "flush-clear" pattern with `SearchIndexingPlan` + +include::../components/_mapper-orm-only-note.adoc[] + +A fairly common use case when manipulating large datasets with JPA +is the link:{hibernateDocUrl}#batch-session-batch-insert[periodic "flush-clear" pattern], +where a loop reads or writes entities for every iteration +and flushes then clears the session every `n` iterations. +This pattern allows processing a large number of entities +while keeping the memory footprint reasonably low. + +Below is an example of this pattern to persist a large number of entities +when not using Hibernate Search. + +.A batch process with JPA +==== +[source, JAVA, indent=0] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-periodic-flush-clear] +---- +<1> Execute a loop for a large number of elements, inside a transaction. +<2> For every iteration of the loop, instantiate a new entity and persist it. 
+<3> Every `BATCH_SIZE` iterations of the loop, `flush` the entity manager to send the changes to the database-side buffer. +<4> After a `flush`, `clear` the ORM session to release some memory. +==== + +With Hibernate Search 6 (contrary to Hibernate Search 5 and earlier), +this pattern will work as expected: + +* <> (the default), +documents will be built on flushes, and sent to the index upon transaction commit. +* <>, +entity change events will be persisted on flushes, and committed along with the rest of the changes upon transaction commit. + +However, each `flush` call will potentially add data to an internal buffer, +which for large volumes of data may lead to an `OutOfMemoryError`, +depending on the JVM heap size, +the <> +and the complexity and number of documents. + +If you run into memory issues, +the first solution is to break down the batch process +into multiple transactions, each handling a smaller number of elements: +the internal document buffer will be cleared after each transaction. + +See below for an example. + +[IMPORTANT] +==== +With this pattern, if one transaction fails, +part of the data will already be in the database and in indexes, +with no way to roll back the changes. + +However, the indexes will be consistent with the database, +and it will be possible to (manually) restart the process +from the last transaction that failed. +==== + +.A batch process with Hibernate Search using multiple transactions +==== +[source, JAVA, indent=0] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-multiple-transactions] +---- +<1> Add an outer loop that creates one transaction per iteration. +<2> Begin the transaction at the beginning of each iteration of the outer loop. +<3> Only handle a limited number of elements per transaction. +<4> For every iteration of the loop, instantiate a new entity and persist it. 
+Note we're relying on listener-triggered indexing to index the entity, +but this would work just as well if listener-triggered indexing was disabled, +only requiring an extra call to index the entity. +See <>. +<5> Commit the transaction at the end of each iteration of the outer loop. +The entities will be flushed and indexed automatically. +==== + +[NOTE] +==== +The multi-transaction solution +and the original `flush()`/`clear()` loop pattern can be combined, +breaking down the process into multiple medium-sized transactions, +and periodically calling `flush`/`clear` inside each transaction. + +This combined solution is the most flexible, +hence the most suitable if you want to fine-tune your batch process. +==== + +If breaking down the batch process into multiple transactions is not an option, +a second solution is to just write to indexes +after the call to `session.flush()`/`session.clear()`, +without waiting for the database transaction to be committed: +the internal document buffer will be cleared after each write to indexes. + +This is done by calling the `execute()` method on the indexing plan, +as shown in the example below. + +[IMPORTANT] +==== +With this pattern, if an exception is thrown, +part of the data will already be in the index, with no way to roll back the changes, +while the database changes will have been rolled back. +The index will thus be inconsistent with the database. + +To recover from that situation, you will have to either +manually re-execute the exact same database changes that failed +(to get the database back in sync with the index), +or <> affected by the transaction manually +(to get the index back in sync with the database). + +Of course, if you can afford to take the indexes offline for a longer period of time, +a simpler solution would be to wipe the indexes clean +and <>. 
+==== + +.A batch process with Hibernate Search using `execute()` +==== +[source, JAVA, indent=0] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-periodic-flush-execute-clear] +---- +<1> Get the `SearchSession`. +<2> Get the search session's indexing plan. +<3> For every iteration of the loop, instantiate a new entity and persist it. +Note we're relying on listener-triggered indexing to index the entity, +but this would work just as well if listener-triggered indexing was disabled, +only requiring an extra call to index the entity. +See <>. +<4> After a `flush()`/`clear()`, call `indexingPlan.execute()`. +The entities will be processed and *the changes will be sent to the indexes immediately*. +Hibernate Search will wait for index changes to be "completed" +as required by the configured <>. +<5> After the loop, commit the transaction. +The remaining entities that were not flushed/cleared will be flushed and indexed automatically. 
+==== diff --git a/documentation/src/main/asciidoc/public/reference/_indexing-listener-triggered.asciidoc.adoc b/documentation/src/main/asciidoc/public/reference/_indexing-listener-triggered.asciidoc.adoc index ef9cd30e521..675b93ef5b9 100644 --- a/documentation/src/main/asciidoc/public/reference/_indexing-listener-triggered.asciidoc.adoc +++ b/documentation/src/main/asciidoc/public/reference/_indexing-listener-triggered.asciidoc.adoc @@ -1,20 +1,30 @@ [[listener-triggered-indexing]] -= [[indexing-automatic]] [[mapper-orm-indexing-automatic]] [[_automatic_indexing]] Listener-triggered indexing += [[indexing-automatic]] [[mapper-orm-indexing-automatic]] [[_automatic_indexing]] Implicit, listener-triggered indexing + +[[listener-triggered-indexing-concepts]] +== [[indexing-automatic-concepts]][[mapper-orm-indexing-automatic-concepts]] Basics include::../components/_mapper-orm-only-note.adoc[] By default, every time an entity is changed through a Hibernate ORM Session, if that entity is <>, -Hibernate Search updates the relevant index. +Hibernate Search updates the relevant index transparently. -Exactly how and when the index update happens depends on the <>; -see <> for more information. +Here is how listener-triggered indexing works at a high level: -[[listener-triggered-indexing-concepts]] -== [[indexing-automatic-concepts]][[mapper-orm-indexing-automatic-concepts]] Overview +1. When the Hibernate ORM `Session` gets flushed (explicitly or as part of a transaction commit), +Hibernate ORM determines what changed exactly (entity created, updated, deleted), +and forwards the information to Hibernate Search. +2. Hibernate Search adds this information to a (session-scoped) <> +and the plan processes change events added so far, +either inferring which entities need to be reindexed and building the corresponding documents (<>) +or building events to be sent to the outbox (<>). +3. 
On database transaction commit, the plan gets executed, +either sending the document indexing/deletion request to the backend (<>) +or sending the events to the database (<>). -Below is a summary of how listener-triggered indexing works depending -on the configured <>. +Below is a summary of key characteristics of listener-triggered indexing +and how they vary depending on the configured <>. Follow the links for more details. @@ -38,8 +48,8 @@ Follow the links for more details. 2+|<> |Guarantee of indexes updates -|<> -|<> +|<> +|<> |Visibility of index updates |<> @@ -59,17 +69,25 @@ Follow the links for more details. Listener-triggered indexing may be unnecessary if your index is read-only or if you update it regularly by reindexing, -either using the <> -or <>. +either using the <>, +using the <>, +or <>. + You can disable listener-triggered indexing by setting the configuration property `hibernate.search.indexing.listeners.enabled` to `false`. +As listener-triggered indexing uses <> under the hood, +several configuration options affecting indexing plans will affect listener-triggered indexing as well: + +* The <>. +* The <>. + [[indexing-automatic-concepts-changes-in-session]] == [[mapper-orm-indexing-automatic-concepts-changes-in-session]] In-session entity change detection and limitations Hibernate Search uses internal events of Hibernate ORM in order to detect changes. These events will be triggered if you actually manipulate managed entity objects in your code: -calls o `session.persist(...)`, `session.delete(...)`, to entity setters, etc. +calls to `session.persist(...)`, `session.delete(...)`, to entity setters, etc. This works great for most applications, but you need to consider some limitations: @@ -82,110 +100,3 @@ Hibernate Search is aware of the entity properties that are accessed when buildi When processing Hibernate ORM entity change events, it is also aware of which properties actually changed. 
Thanks to that knowledge, it is able to detect which entity changes are actually relevant to indexing, and to skip reindexing when a property is modified, but does not affect the indexed document. - -[[indexing-automatic-synchronization]] -== Synchronization with the indexes - -Listener-triggered indexing is affected by the synchronization strategy in use in the `SearchSession`. - -See <> for more information. - -[[indexing-plan-filter]] -== Indexing plan filter - -include::../components/_incubating-warning.adoc[] - -In some scenarios, it might be helpful to pause the <> programmatically, for example, -when importing larger amounts of data. Hibernate Search allows configuring application-wide -and session-level filters to manage which types are tracked for changes and indexed. - -.Configuring an application-wide filter -==== -[source, JAVA, indent=0, subs="+callouts"] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=application-filter] ----- -Configuring an application-wide filter requires an instance of the `SearchMapping`. - -<1> <>. -<2> Start the declaration of the indexing plan filter. -<3> Configure included/excluded types through the `SearchIndexingPlanFilter` -==== - -.Configuring a session-level filter -==== -[source, JAVA, indent=0, subs="+callouts"] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter] ----- -Configuring a session level filter is available through an instance of the `SearchSession`. - -<1> <> -<2> Configure included/excluded types through the `SearchIndexingPlanFilter` -==== - -Filter can be defined by providing indexed and contained types as well as their supertypes. -Interfaces are not allowed and passing an interface class to any of the filter definition methods will result in an exception. 
-If dynamic types represented by a `Map` are used then their names must be used to configure the filter. -Filter rules are: - -* If the type `A` is explicitly included by the filter, then a change to an object that is exactly of a type `A` is processed. -* If the type `A` is explicitly excluded by the filter, then a change to an object that is exactly of a type `A` is ignored. -* If the type `A` is explicitly included by the filter, then a change to an object that is exactly of a type `B`, -which is a subtype of the type `A`, is processed unless the filter explicitly excludes a more specific supertype of a type `B`. -* If the type `A` is excluded by the filter explicitly, then a change to an object that is exactly of a type `B`, -which is a subtype of type the `A`, is ignored unless the filter explicitly includes a more specific supertype of a type `B`. - -A session-level filter takes precedence over an application-wide one. If the session-level filter configuration does not -either explicitly or through inheritance include/exclude the exact type of an entity, then the decision will be made by -the application-wide filter. If an application-wide filter also has no explicit configuration for a type, then this type -is considered to be included. - -In some cases we might need to disable the indexing entirely. Listing all entities one by one might be cumbersome, -but since filter configuration is implicitly applied to subtypes, `.exclude(Object.class)` can be used to exclude all types. -Conversely, `.include(Object.class)` can be used to enable indexing within a session filter when -the application-wide filter disables indexing completely. 
- -.Disable all indexing within a session -==== -[source, JAVA, indent=0, subs="+callouts"] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-all] ----- -Configuring a session level filter is available through an instance of the `SearchSession`. - -<1> <> -<2> Excluding `Object.class` will lead to excluding all its subtypes which means nothing will be included. -==== - -.Enable indexing in the session while application-wide indexing is paused -==== -[source, JAVA, indent=0, subs="+callouts"] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-include-all-application] ----- ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-include-all-session] ----- - -<1> <>. -<2> An application-wide filter disables any indexing -<3> <> -<4> A session level filter re-enables indexing *for changes happening in current session only* -==== - -[NOTE] -==== -Trying to configure the same type as both included and excluded at the same time by the same filter -will lead to an exception being thrown. -==== - -[NOTE] -==== -Only an application-wide filter is safe to use when using the <>. -When this coordination strategy is in use, entities are loaded and indexed in a different session from -the one where they were changed. It might lead to unexpected results as the session where events are processed will not -apply the filter configured by the session in which entities were modified. -An exception will be thrown if such a filter is configured unless this filter excludes all the types to prevent any -unexpected consequences of configuring session-level filters with this coordination strategy. 
-==== diff --git a/documentation/src/main/asciidoc/public/reference/_indexing-massindexer.adoc b/documentation/src/main/asciidoc/public/reference/_indexing-massindexer.adoc index ae08da239bb..60102bde840 100644 --- a/documentation/src/main/asciidoc/public/reference/_indexing-massindexer.adoc +++ b/documentation/src/main/asciidoc/public/reference/_indexing-massindexer.adoc @@ -1,5 +1,5 @@ [[indexing-massindexer]] -= [[mapper-orm-indexing-massindexer]] [[search-batchindex-massindexer]] Reindexing large volumes of data with the `MassIndexer` += [[mapper-orm-indexing-massindexer]] [[search-batchindex-massindexer]] Indexing a large amount of data with the `MassIndexer` [[indexing-massindexer-basics]] == [[mapper-orm-indexing-massindexer-basics]] Basics diff --git a/documentation/src/main/asciidoc/public/reference/_indexing-plan.adoc b/documentation/src/main/asciidoc/public/reference/_indexing-plan.adoc index 048c7ea57f1..e45d11a80ba 100644 --- a/documentation/src/main/asciidoc/public/reference/_indexing-plan.adoc +++ b/documentation/src/main/asciidoc/public/reference/_indexing-plan.adoc @@ -1,110 +1,69 @@ [[indexing-plan]] -= [[mapper-orm-indexing-manual-indexingplan-writes]] [[_deleting_instances_from_the_index]] [[_adding_instances_to_the_index]] Explicitly indexing on entity change events - -When <> is <> -or simply not supported (e.g. <>), -the indexes will start empty and stay that way -until explicit indexing commands are sent to Hibernate Search. - -Explicitly indexing is done in the context of a <> -using the `SearchIndexingPlan` interface. -This interface represents the (mutable) set of changes -that are planned in the context of a session, -and will be applied to indexes upon transaction commit (for the <>) -or upon closing the `SearchSession` (for the <>). - -When indexing explicitly, -the indexing plan should be used whenever a change event (add, update, delete) occurs on an entity. 
-The indexing plan will automatically determine whether the changed entity needs to be reindexed. -It will even infer which other entities need to be <> -because their indexed document embeds the changed entity -(e.g. through <>). - -The `SearchIndexingPlan` interface offers the following methods: - -`add(Object entity)`:: -(Available with the <> only.) -+ -Add a document to the index if the entity type is mapped to an index (`@Indexed`). -+ -WARNING: This may create duplicates in the index if the document already exists. -Prefer `addOrUpdate` unless you are really sure of yourself and need a (slight) performance boost. -`addOrUpdate(Object entity)`:: -Add or update a document in the index if the entity type is mapped to an index (`@Indexed`), -and re-index documents that embed this entity (through `@IndexedEmbedded` for example). -`delete(Object entity)`:: -Delete a document from the index if the entity type is mapped to an index (`@Indexed`), -and re-index documents that embed this entity (through `@IndexedEmbedded` for example). -`purge(Class entityType, Object id)`:: -Delete the entity from the index, -but do not try to re-index documents that embed this entity. += Indexing plans + +[[indexing-plan-basics]] +== Basics + +For <> as well +as <>, +Hibernate Search relies on an "indexing plan" to aggregate "entity change" events +and infer the resulting indexing operations to execute. + +NOTE: Indexing plans are not used for the <> +or the <>: +those assume all entities they process need to be indexed +and don't need the more subtle mechanisms of indexing plans. + +Here is how indexing plans work at a high level: + +1. While the application performs entity changes, +entity change events (entity created, updated, deleted) are added to the plan. + -Compared to `delete`, this is mainly useful if the entity has already been deleted from the database -and is not available, even in a detached state, in the session. 
-In that case, reindexing associated entities will be the user's responsibility, -since Hibernate Search cannot know which entities are associated to an entity that no longer exists. -`purge(String entityName, Object id)`:: -Same as `purge(Class entityType, Object id)`, -but the entity type is referenced by its name (see `@javax.persistence.Entity#name`). -`process()` and `execute()`:: -(Available with the <> only.) +For <> (<> only) +this happens implicitly as changes are performed, +but it can also be done <>. +2. Eventually, the application decides changes are complete, +and the plan processes change events added so far, +either inferring which entities need to be reindexed and building the corresponding documents (<>) +or building events to be sent to the outbox (<>). + -Respectively, process the changes and apply them to indexes. +For the <> this happens when the Hibernate ORM `Session` gets flushed +(explicitly or as part of a transaction commit), +while for the <> this happens when the `SearchSession` is closed. +3. Finally the plan gets executed, triggering indexing, potentially asynchronously. + -These methods will be executed automatically on commit, -so they are only useful when processing large number of items, -as explained in <>. +For the <> this happens on transaction commit, +while for the <> this happens when the `SearchSession` is closed. -Below are examples of using `addOrUpdate` and `delete`. +Below is a summary of key characteristics of indexing plans +and how they vary depending on the configured <>. -.Explicitly adding or updating an entity in the index using `SearchIndexingPlan` -==== -[source, JAVA, indent=0] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=indexing-plan-addOrUpdate] ----- -<1> <>. -<2> Get the search session's indexing plan. 
-<3> Fetch from the database the `Book` we want to index; -this could be replaced with any other way of loading an entity when using the <>. -<4> Submit the `Book` to the indexing plan for an add-or-update operation. -The operation won't be executed immediately, -but will be delayed until the transaction is committed (<>) -or until the `SearchSession` is closed (<>). -==== +[cols="h,2*^",options="header"] +.Comparison of indexing plans depending on the coordination strategy +|=== +|Coordination strategy +|<> (default) +|<> (<> only) -.Explicitly deleting an entity from the index using `SearchIndexingPlan` -==== -[source, JAVA, indent=0] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=indexing-plan-delete] ----- -<1> <>. -<2> Get the search session's indexing plan. -<3> Fetch from the database the `Book` we want to un-index; -this could be replaced with any other way of loading an entity when using the <>. -<4> Submit the `Book` to the indexing plan for a delete operation. -The operation won't be executed immediately, -but will be delayed until the transaction is committed (<>) -or until the `SearchSession` is closed (<>). -==== +|Guarantee of indexes updates +|<> +|<> -[TIP] -==== -Multiple operations can be performed in a single indexing plan. -The same entity can even be changed multiple times, -for example added and then removed: -Hibernate Search will simplify the operation as expected. +|Visibility of index updates +|<> +|<> -This will work fine for any reasonable number of entities, -but changing or simply loading large numbers of entities in a single session -requires special care with Hibernate ORM, -and then some extra care with Hibernate Search. -See <> for more information. 
-==== +|Overhead for application threads +|<> +|<> + +|Overhead for the database (<> only) +|<> +|<> +|=== [[indexing-plan-synchronization]] -== [[mapper-orm-indexing-automatic-synchronization]] Synchronization with the indexes +== [[indexing-automatic-synchronization]] [[mapper-orm-indexing-automatic-synchronization]] Synchronization with the indexes [[indexing-plan-synchronization-basics]] === [[mapper-orm-indexing-automatic-synchronization-basics]] Basics @@ -124,7 +83,7 @@ and doing so will lead to an exception on startup. When a transaction is committed (<>) or the `SearchSession` is closed (<>), <>, -the execution of the indexing plan (<> or otherwise) +the execution of the indexing plan (<> or <>) can block the application thread until indexing reaches a certain level of completion. @@ -233,3 +192,103 @@ to a <> pointing to the cus for example `class:com.mycompany.MySynchronizationStrategy`. * at the session level by passing an instance of the custom implementation to `SearchSession#indexingPlanSynchronizationStrategy(...)`. + +[[indexing-plan-filter]] +== Indexing plan filter + +include::../components/_incubating-warning.adoc[] + +In some scenarios, it might be helpful to pause the <> programmatically, for example, +when importing larger amounts of data. Hibernate Search allows configuring application-wide +and session-level filters to manage which types are tracked for changes and indexed. + +.Configuring an application-wide filter +==== +[source, JAVA, indent=0, subs="+callouts"] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=application-filter] +---- +Configuring an application-wide filter requires an instance of the `SearchMapping`. + +<1> <>. +<2> Start the declaration of the indexing plan filter. 
+<3> Configure included/excluded types through the `SearchIndexingPlanFilter`. +==== + +.Configuring a session-level filter +==== +[source, JAVA, indent=0, subs="+callouts"] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter] +---- +Configuring a session-level filter is available through an instance of the `SearchSession`. + +<1> <>. +<2> Configure included/excluded types through the `SearchIndexingPlanFilter`. +==== + +A filter can be defined by providing indexed and contained types as well as their supertypes. +Interfaces are not allowed: passing an interface class to any of the filter definition methods will result in an exception. +If dynamic types represented by a `Map` are used, then their names must be used to configure the filter. +The filter rules are: + +* If the type `A` is explicitly included by the filter, then a change to an object that is exactly of type `A` is processed. +* If the type `A` is explicitly excluded by the filter, then a change to an object that is exactly of type `A` is ignored. +* If the type `A` is explicitly included by the filter, then a change to an object that is exactly of type `B`, +which is a subtype of the type `A`, is processed unless the filter explicitly excludes a more specific supertype of the type `B`. +* If the type `A` is explicitly excluded by the filter, then a change to an object that is exactly of type `B`, +which is a subtype of the type `A`, is ignored unless the filter explicitly includes a more specific supertype of the type `B`. + +A session-level filter takes precedence over an application-wide one. If the session-level filter configuration does not +include or exclude the exact type of an entity, either explicitly or through inheritance, then the decision will be made by +the application-wide filter.
If an application-wide filter also has no explicit configuration for a type, then this type +is considered to be included. + +In some cases, it may be necessary to disable indexing entirely. Listing all entities one by one might be cumbersome, +but since filter configuration is implicitly applied to subtypes, `.exclude(Object.class)` can be used to exclude all types. +Conversely, `.include(Object.class)` can be used to enable indexing within a session filter when +the application-wide filter disables indexing completely. + +.Disable all indexing within a session +==== +[source, JAVA, indent=0, subs="+callouts"] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-all] +---- +Configuring a session-level filter is available through an instance of the `SearchSession`. + +<1> <>. +<2> Excluding `Object.class` will lead to excluding all its subtypes, which means nothing will be included. +==== + +.Enable indexing in the session while application-wide indexing is paused +==== +[source, JAVA, indent=0, subs="+callouts"] +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-include-all-application] +---- +---- +include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/automaticindexing/HibernateOrmIndexingPlanFilterIT.java[tags=session-filter-exclude-include-all-session] +---- + +<1> <>. +<2> An application-wide filter disables any indexing. +<3> <>. +<4> A session-level filter re-enables indexing *for changes happening in the current session only*. +==== + +[NOTE] +==== +Trying to configure the same type as both included and excluded at the same time by the same filter +will lead to an exception being thrown. +==== + +[NOTE] +==== +Only an application-wide filter is safe to use when using the <>.
+When this coordination strategy is in use, entities are loaded and indexed in a different session from +the one where they were changed. It might lead to unexpected results as the session where events are processed will not +apply the filter configured by the session in which entities were modified. +An exception will be thrown if such a filter is configured unless this filter excludes all the types to prevent any +unexpected consequences of configuring session-level filters with this coordination strategy. +==== diff --git a/documentation/src/main/asciidoc/public/reference/_indexing.adoc b/documentation/src/main/asciidoc/public/reference/_indexing.adoc index 675094590aa..1238551e9cb 100644 --- a/documentation/src/main/asciidoc/public/reference/_indexing.adoc +++ b/documentation/src/main/asciidoc/public/reference/_indexing.adoc @@ -3,16 +3,18 @@ :leveloffset: +1 -include::_indexing-listener-triggered.asciidoc.adoc[] +include::_indexing-basics.adoc[] include::_indexing-plan.adoc[] +include::_indexing-listener-triggered.asciidoc.adoc[] + include::_indexing-massindexer.adoc[] -include::_indexing-workspace.adoc[] +include::_mapper-orm-indexing-jakarta-batch.adoc[] -include::_mapper-orm-indexing-manual.adoc[] +include::_indexing-explicit.adoc[] -include::_mapper-orm-indexing-jakarta-batch.adoc[] +include::_indexing-workspace.adoc[] :leveloffset: -1 diff --git a/documentation/src/main/asciidoc/public/reference/_limitations.adoc b/documentation/src/main/asciidoc/public/reference/_limitations.adoc index d05b518aa56..e9178ed5cf2 100644 --- a/documentation/src/main/asciidoc/public/reference/_limitations.adoc +++ b/documentation/src/main/asciidoc/public/reference/_limitations.adoc @@ -119,8 +119,9 @@ without Hibernate ORM or Search having any knowledge of which entities are actua === Solutions and workarounds One workaround is to reindex explicitly after you run JPQL/SQL queries, -either using the <> -or <>. +either using the <>, +using the <>, +or <>. 
[[limitations-changes-in-session-roadmap]] === Roadmap @@ -174,8 +175,9 @@ The following solutions can help circumvent this limitation: always update the other side consistently. 2. When the above is not possible, reindex affected entities explicitly after the association update, -either using the <> -or <>. +either using the <>, +using the <>, +or <>. [[limitations-changes-asymmetric-association-updates-roadmap]] === Roadmap diff --git a/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-jakarta-batch.adoc b/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-jakarta-batch.adoc index 57893af8798..d8c1455aa3c 100644 --- a/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-jakarta-batch.adoc +++ b/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-jakarta-batch.adoc @@ -1,5 +1,8 @@ [[mapper-orm-indexing-jakarta-batch]] -= [[mapper-orm-indexing-jsr352]] [[jsr352-integration]] Reindexing large volumes of data with the Jakarta Batch integration += [[mapper-orm-indexing-jsr352]] [[jsr352-integration]] Indexing a large amount of data with the Jakarta Batch integration + +[[mapper-orm-indexing-jakarta-batch-basics]] +== Basics include::../components/_mapper-orm-only-note.adoc[] @@ -9,9 +12,10 @@ features of Jakarta Batch, such as failure recovery using checkpoints, chunk oriented processing, and parallel execution. This batch job accepts different entity type(s) as input, loads the relevant entities from the database, then rebuilds the full-text index from these. -However, it requires a batch runtime for the execution. Please notice that we -don't provide any batch runtime, you are free to choose one that fits you needs, e.g. the default -batch runtime embedded in your Jakarta EE container. We provide full integration to the JBeret +Executing this job requires a batch runtime that is not provided by Hibernate Search. +You are free to choose one that fits your needs, e.g. 
the default +batch runtime embedded in your Jakarta EE container. +Hibernate Search provides full integration to the JBeret implementation (see <>). As for other implementations, they can also be used, but will require <>. diff --git a/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-manual.adoc b/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-manual.adoc deleted file mode 100644 index f184358ceb9..00000000000 --- a/documentation/src/main/asciidoc/public/reference/_mapper-orm-indexing-manual.adoc +++ /dev/null @@ -1,157 +0,0 @@ -[[mapper-orm-indexing-manual]] -= [[manual-index-changes]] Reindexing large amounts of data manually with Hibernate ORM - -include::../components/_mapper-orm-only-note.adoc[] - -[[mapper-orm-indexing-manual-basics]] -== [[search-batchindex]] Basics - -While <> and -the <> -or <> -should take care of most needs, -it is sometimes necessary to control indexing manually, -for example to reindex just a few entity instances -that were affected by changes to the database that listener-triggered indexing cannot detect, -such as JPQL/SQL `insert`, `update` or `delete` queries. - -To address these use cases, Hibernate Search exposes several APIs -explained if the following sections. - -As with everything in Hibernate Search, -these APIs only affect the Hibernate Search indexes: -they do not write anything to the database. - -[[mapper-orm-indexing-manual-indexingplan-process-execute]] -== [[search-batchindex-flushtoindexes]] Controlling entity reads and index writes with `SearchIndexingPlan` - -A fairly common use case when manipulating large datasets with JPA -is the link:{hibernateDocUrl}#batch-session-batch-insert[periodic "flush-clear" pattern], -where a loop reads or writes entities for every iteration -and flushes then clears the session every `n` iterations. -This pattern allows processing a large number of entities -while keeping the memory footprint reasonably low. 
- -Below is an example of this pattern to persist a large number of entities -when not using Hibernate Search. - -.A batch process with JPA -==== -[source, JAVA, indent=0] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-periodic-flush-clear] ----- -<1> Execute a loop for a large number of elements, inside a transaction. -<2> For every iteration of the loop, instantiate a new entity and persist it. -<3> Every `BATCH_SIZE` iterations of the loop, `flush` the entity manager to send the changes to the database-side buffer. -<4> After a `flush`, `clear` the ORM session to release some memory. -==== - -With Hibernate Search 6 (on contrary to Hibernate Search 5 and earlier), -this pattern will work as expected: - -* <> (the default), -documents will be built on flushes, and sent to the index upon transaction commit. -* <>, -entity change events will be persisted on flushes, and committed along with the rest of the changes upon transaction commit. - -However, each `flush` call will potentially add data to an internal buffer, -which for large volumes of data may lead to an `OutOfMemoryException`, -depending on the JVM heap size, -the <> -and the complexity and number of documents. - -If you run into memory issues, -the first solution is to break down the batch process -into multiple transactions, each handling a smaller number of elements: -the internal document buffer will be cleared after each transaction. - -See below for an example. - -[IMPORTANT] -==== -With this pattern, if one transaction fails, -part of the data will already be in the database and in indexes, -with no way to roll back the changes. - -However, the indexes will be consistent with the database, -and it will be possible to (manually) restart the process -from the last transaction that failed. 
-==== - -.A batch process with Hibernate Search using multiple transactions -==== -[source, JAVA, indent=0] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-multiple-transactions] ----- -<1> Add an outer loop that creates one transaction per iteration. -<2> Begin the transaction at the beginning of each iteration of the outer loop. -<3> Only handle a limited number of elements per transaction. -<4> For every iteration of the loop, instantiate a new entity and persist it. -Note we're relying on listener-triggered indexing to index the entity, -but this would work just as well if listener-triggered indexing was disabled, -only requiring an extra call to index the entity. -See <>. -<5> Commit the transaction at the end of each iteration of the outer loop. -The entities will be flushed and indexed automatically. -==== - -[NOTE] -==== -The multi-transaction solution -and the original `flush()`/`clear()` loop pattern can be combined, -breaking down the process in multiple medium-sized transactions, -and periodically calling `flush`/`clear` inside each transaction. - -This combined solution is the most flexible, -hence the most suitable if you want to fine-tune your batch process. -==== - -If breaking down the batch process into multiple transactions is not an option, -a second solution is to just write to indexes -after the call to `session.flush()`/`session.clear()`, -without waiting for the database transaction to be committed: -the internal document buffer will be cleared after each write to indexes. - -This is done by calling the `execute()` method on the indexing plan, -as shown in the example below. - -[IMPORTANT] -==== -With this pattern, if an exception is thrown, -part of the data will already be in the index, with no way to roll back the changes, -while the database changes will have been rolled back. -The index will thus be inconsistent with the database. 
- -To recover from that situation, you will have to either -execute the exact same database changes that failed manually -(to get the database back in sync with the index), -or <> affected by the transaction manually -(to get the index back in sync with the database). - -Of course, if you can afford to take the indexes offline for a longer period of time, -a simpler solution would be to wipe the indexes clean -and <>. -==== - -.A batch process with Hibernate Search using `execute()` -==== -[source, JAVA, indent=0] ----- -include::{sourcedir}/org/hibernate/search/documentation/mapper/orm/indexing/HibernateOrmManualIndexingIT.java[tags=persist-automatic-indexing-periodic-flush-execute-clear] ----- -<1> Get the `SearchSession`. -<2> Get the search session's indexing plan. -<3> For every iteration of the loop, instantiate a new entity and persist it. -Note we're relying on listener-triggered indexing to index the entity, -but this would work just as well if listener-triggered indexing was disabled, -only requiring an extra call to index the entity. -See <>. -<4> After a `flush()`/`clear()`, call `indexingPlan.execute()`. -The entities will be processed and *the changes will be sent to the indexes immediately*. -Hibernate Search will wait for index changes to be "completed" -as required by the configured <>. -<5> After the loop, commit the transaction. -The remaining entities that were not flushed/cleared will be flushed and indexed automatically. 
-==== diff --git a/documentation/src/main/asciidoc/public/reference/_mapping-reindexing.adoc b/documentation/src/main/asciidoc/public/reference/_mapping-reindexing.adoc index 2bde1d8e658..71bea449786 100644 --- a/documentation/src/main/asciidoc/public/reference/_mapping-reindexing.adoc +++ b/documentation/src/main/asciidoc/public/reference/_mapping-reindexing.adoc @@ -236,8 +236,9 @@ reindexing thousands of sensors every few milliseconds probably won't perform we In this scenario, however, search on sensor value is not considered critical and indexes don't need to be as fresh. We can accept indexes to lag behind a few minutes when it comes to a sensor value. We can consider setting up a batch process that runs every few seconds -to reindex all sensors, either through a <> -or <>. +to reindex all sensors, either through a <>, +using the <>, +or <>. So we would really not mind if Hibernate Search just ignored changes to sensor values... That's what `@IndexingDependency(reindexOnUpdate = ReindexOnUpdate.NO)` is for: