Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HSEARCH-2945 Configure default mass indexing logging monitor #4314

Conversation

marko-bekhta
Copy link
Member

https://hibernate.atlassian.net/browse/HSEARCH-2945

I've tried a few approaches, and it seems that having a "dedicated" method to configure the default monitor looks better than others...
Having more options on the mass indexer itself may work for a config like "don't do counts at all" but not for the "count the entities before indexing starts.

Let me know if you have any other ideas we could try. Opening as a draft for now as some more tests are needed and the impl of the monitor should be cleaned up a bit.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.


@yrodiere
Copy link
Member

It's a shame that this will only work with the default monitor, no? 🤔

Having more options on the mass indexer itself may work for a config like "don't do counts at all"

Ok.

but not for the "count the entities before indexing starts.

Why?


Regarding the solution... In order to make it work with the default monitor, but also custom ones, we could imagine allowing the monitor itself to influence the behavior? I.e.:

class MyMonitor1 implements MassIndexingMonitor {

    void initialize(MassIndexingMonitorContext context) {
        context.countOnStart(this::onCount); // Callback called immediately
    }

    void onCount(EntityTypeReference type, long count) {
         // ...
    }
}
class MyMonitor2 implements MassIndexingMonitor {

    void initialize(MassIndexingMonitorContext context) {
        context.countBeforeTypeStart(this::onCount); // Callback called later
    }

    void onCount(EntityTypeReference type, long count) {
         // ...
    }
}
class MyMonitor2 implements MassIndexingMonitor {

    void initialize(MassIndexingMonitorContext context) {
        context.countOnStartAndBeforeTypeStart(this::onCount, this::onCountAdjustment); // First callback called now, secont callback called later
    }

    void onCount(EntityTypeReference type, long count) {
         // ...
    }
    void onCountAdjustment(EntityTypeReference type, long initialCount, long updatedCount) {
         // ...
    }
}

I'm not suggesting this exact API -- names are bad, in particular -- but you get the idea.

Then making it work with the default monitor would be a matter of exposing that monitor -- e.g.:

massIndexer.monitor(DefaultMassIndexingMonitor.builder().countOnStart(true));
// OR 
massIndexer.monitor(DefaultMassIndexingMonitor.builder().countOnBeforeType(true));
// OR ...

@marko-bekhta
Copy link
Member Author

Ok.

but not for the "count the entities before indexing starts.

Why?

well ... the call to get the total count is currently made by mass indexer itself, and the value is passed to the monitor (no matter what the impl of that monitor is, right). So if we config this at the mass indexer level it makes sense to allow disable the count call for both the built-in one and custom ones.
Now this "new thing" to get the counts before anything even starts and have an approximation that may get adjusted later is something that would be specific to our own monitor implementation. I didn't think that we'd want to call it from the mass indxer and pass the values around to custom monitor implementations... That was the reason why I was thinking that exposing this other property (to do the count-all-before-things-start) won't work ....

what you wrote made me think ... we have this in the type group monitor now:

void indexingStarted(OptionalLong totalCount);
/**
* Notify the monitor that indexing of the type group is completed.
*/
void indexingCompleted();

what if instead of the total count or nothing, we pass the context that exposes some access to the loading strategy ( that can do the counting) through a context class and have something like:

public interface MassIndexingTypeGroupMonitor {
	void documentsAdded(long increment);

	// gets called when we create a workspace for the type group
	// so essentially can be used by us to do the counts before the indexing starts
	void typeGroupInitialized(SomeContext context);
	
	// gets called when the identifier loading runnable starts:
	void indexingStarted(SomeContext context);

	// gets called when all indexing runnables for the group are completed.
	void indexingCompleted(SomeContext context);
}

then in the main mass indexing monitor we drop deprecate the addToTotalCount method with MassIndexingTypeGroupMonitor#indexingStarted being it's replacement ?

@yrodiere
Copy link
Member

Your suggestion makes sense to me.

This is starting to get complex though, so I'd suggest:

  1. Being careful about naming, to make things extra clear. documentsAdded => documentsIndexed would make sense I think, and typeGroupInitialized => initialize() (since it's about initializing the monitor itself, more than anything else) or indexingPlanned(), if you want to keep it event-like? IDK.
  2. Not forgetting about providing ways to configure the default monitor -- see my suggestion above.
  3. Making it obvious that counting may, in some cases, not return anything -- because the loading strategy may not support it. Using Optional in APIs would make sense I think.
  4. Deprecating all methods in the main monitor that duplicate what you're doing -- documentsBuilt, documentsIndexed, etc.
  5. Making sure that existing implementations (implementing deprecated methods) still work -- we'll need tests for that.

@marko-bekhta marko-bekhta force-pushed the feat/HSEARCH-2945-Pass-the-number-of-entities-to-index-to-the-monitor-right-from-the-start branch from 452f7ee to 3eaf77f Compare September 12, 2024 13:31
@marko-bekhta marko-bekhta force-pushed the feat/HSEARCH-2945-Pass-the-number-of-entities-to-index-to-the-monitor-right-from-the-start branch from 3eaf77f to 12b8628 Compare September 13, 2024 09:06
@marko-bekhta marko-bekhta marked this pull request as ready for review September 13, 2024 09:06
@marko-bekhta
Copy link
Member Author

@yrodiere let me know if you'd want to have a look at this one 😉

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@yrodiere
Copy link
Member

Seems you still need to add something to the migration guide?

@marko-bekhta marko-bekhta force-pushed the feat/HSEARCH-2945-Pass-the-number-of-entities-to-index-to-the-monitor-right-from-the-start branch from 12b8628 to 11f5813 Compare September 24, 2024 17:01
@marko-bekhta marko-bekhta force-pushed the feat/HSEARCH-2945-Pass-the-number-of-entities-to-index-to-the-monitor-right-from-the-start branch from 11f5813 to a4bfc9b Compare October 3, 2024 10:43
Copy link

sonarcloud bot commented Oct 3, 2024

@marko-bekhta marko-bekhta merged commit 6000159 into hibernate:main Oct 3, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants