Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HSEARCH-5133 Support basic metrics aggregations #4144

Merged
merged 10 commits into from
Sep 5, 2024

Conversation

fax4ever
Copy link
Contributor

@fax4ever fax4ever commented May 2, 2024

https://hibernate.atlassian.net/browse/HSEARCH-5133

1. we need to implement count, count distinct and avg done!
2. test filtering on nested done!
3. test explicit ValueConvert.YES/NO done!
4. support and test aggregations for temporal types done!
5. we need to implement the Lucene backend done!
6. write good tests covering all the fields types done from my side (check with the Hibernate team)
7. write some doc done!

@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 3 times, most recently from 58ebd31 to a23d9b9 Compare May 3, 2024 15:03
Copy link
Member

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress 😃,

I've added some inline comments here and there. As for the ElasticsearchFieldCodec#convertFromDouble(Double value):

  • it is probably fine for now, while we are iterating on the implementation.
  • let's figure out what it means for non-numeric types
  • if we don't go with doubles for the Lucene backend and it will stay only as an implementation detail for the Elasticsearch backend, we probably wouldn't need to expose it for any user customization so it can stay in the codec. otherwise maybe it'll make more sense to have something in SearchIndexValueFieldTypeContext

@fax4ever
Copy link
Contributor Author

@marko-bekhta I just rebased the PR on top of the current main branch. Now I'm resuming to doing the changes...

@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 2 times, most recently from 5a5619a to f3a45de Compare June 19, 2024 09:30
@fax4ever
Copy link
Contributor Author

I just added two last commits to address the last @yrodiere's suggestions. Once we're fine with them, I'll reorganize the commits.

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Since you asked for my opinion, here's some -- I'll let @marko-bekhta take over though.

@fax4ever
Copy link
Contributor Author

fax4ever commented Jun 26, 2024

Thanks @yrodiere I applied all the great suggestions here.
My only doubt is a about the last change that I added in a different commit: a807e2c.

In my opinion returning a double would be better, but I have an idea...
we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.
What you think about it @marko-bekhta?

@yrodiere
Copy link
Member

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

@fax4ever
Copy link
Contributor Author

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

@yrodiere
Copy link
Member

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41

And in fact there's already no default: if they get an Integer average, it can only be because they asked for it.

Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would be double I guess).

@fax4ever
Copy link
Contributor Author

we could expose two avg operations, one returning always a double and one returning the field type to support the exotic cases such as avg of a date. I don't know what you think.

IMO we should just force users to pass a type. They ask for a date, we check if that's possible, and if so we return a date. They ask for an integer, same thing. They as for a double, ditto.

IMO if you give the user the flexibility of requiring any type, it would be very complex (in term of API) and also error prone. I think that the solution of exposing a double for numeric avg and a field avg for all the types, it would safer and easier to use.

I don't get this, they already have the flexibility to require any type: https://github.com/hibernate/hibernate-search/pull/4144/files#diff-6b1cb9769f69a0a38bbf7c3d35a53612d9d88e45f8180ba2cbe9eea36c1baea0R27-R41

And in fact there's already no default: if they get an Integer average, it can only be because they asked for it.

Also, please note I'm not saying we should comply with every request. IMO there are only two types that make sense: the field type (ValueConvert.YES) and the underlying, unconverted type (ValueConvert.NO, would be double I guess).

If I try to ask for a double a get this error in my test:

org.hibernate.search.util.common.SearchException: HSEARCH000584: Invalid type for returned values: 'java.lang.Double'. Expected 'java.lang.Integer' or a supertype.
Context: field 'integer'

	at org.hibernate.search.engine.backend.types.converter.spi.ProjectionConverter.withConvertedType(ProjectionConverter.java:75)
	at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:89)
	at org.hibernate.search.backend.elasticsearch.search.aggregation.impl.ElasticsearchMetricFieldAggregation$TypeSelector.type(ElasticsearchMetricFieldAggregation.java:71)
	at org.hibernate.search.engine.search.aggregation.dsl.impl.AvgAggregationFieldStepImpl.field(AvgAggregationFieldStepImpl.java:29)
	at org.hibernate.search.engine.search.aggregation.dsl.AvgAggregationFieldStep.field(AvgAggregationFieldStep.java:28)
	at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.lambda$test_filteringResults$11(ElasticsearchMetricAggregationsIT.java:65)
	at org.hibernate.search.engine.search.query.dsl.spi.AbstractSearchQueryOptionsStep.aggregation(AbstractSearchQueryOptionsStep.java:179)
	at org.hibernate.search.integrationtest.backend.elasticsearch.tmp.ElasticsearchMetricAggregationsIT.test_filteringResults(ElasticsearchMetricAggregationsIT.java:65)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

@fax4ever
Copy link
Contributor Author

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

@yrodiere
Copy link
Member

If I try to ask for a double a get this error in my test:

With ValueConvert.NO?

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.

Can you point me to the exact failing test?

@fax4ever
Copy link
Contributor Author

If I try to ask for a double a get this error in my test:

With ValueConvert.NO?

unconverted type (ValueConvert.NO, would be double I guess).

OK so I can revert the last commit?

From what I've seen that commit is fine and the DSL changes at least should stay, but I might be mistaken.

Can you point me to the exact failing test?

Sure! Here is the commit I which I tried to ask for a double avg => 6abad63. Thanks

@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 3 times, most recently from 28aebae to 5bf82db Compare June 28, 2024 05:57
@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 2 times, most recently from 49cb72c to 031f709 Compare July 8, 2024 08:32
@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 4 times, most recently from b83e97f to 2373398 Compare July 17, 2024 08:54
@fax4ever
Copy link
Contributor Author

fax4ever commented Aug 1, 2024

Hi @marko-bekhta, the commit HSEARCH-5133 Test all types with metric aggregations is in progress and I still need to address the other unresolved conversation.
I will continue to work on it when I come back from PTO in two weeks. Thanks for the support

@fax4ever
Copy link
Contributor Author

With this last push I think I have only one comment to address, the one about introducing abstract List<? extends CollectorFactory> requiredCollectors()...

@fax4ever
Copy link
Contributor Author

With the last push I should have addressed all the comments.
If so, is there anything to improve here?
Thanks @marko-bekhta and @yrodiere

Copy link
Member

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Fabio!
I think we are getting close 🤞🏻😃. I've left some comments inline, but the main thing to solve is to address the ValueModel.RAW case; right now, it can fail with an exception 🙈

@fax4ever fax4ever force-pushed the HSEARCH-5133 branch 2 times, most recently from 9c15aa2 to 444f315 Compare August 28, 2024 08:25
@fax4ever
Copy link
Contributor Author

With last push... again just one comment left!

@fax4ever
Copy link
Contributor Author

Hi @marko-bekhta. Comment addressed (I hope :P)

To fully support the RAW model type, I opened HSEARCH-5230 Support RAW model type with metric aggregations.
Let me know if we can improve something else.
Thanks!

Copy link
Member

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Fabio!
This looks nice 🎉 . Let's make sure we marked everything as incubating so that we will have some freedom to adjust if needed, and then we'll continue in the follow-up PRs as needed.

@hibernate-github-bot
Copy link

hibernate-github-bot bot commented Sep 4, 2024

Thanks for your pull request!

This pull request appears to follow the contribution rules.

› This message was automatically generated.

@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 4, 2024

I integrated the corrections for the doc and incubating rebasing the contribution on top of the current main branch. Thanks

Copy link

sonarcloud bot commented Sep 4, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
10.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

Copy link
Member

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fax4ever ! 🎉

I'll merge this one and we can do the followups to address the RAW ValueModel and tests for other value models.

@marko-bekhta marko-bekhta merged commit 8efc315 into hibernate:main Sep 5, 2024
7 of 8 checks passed
@fax4ever fax4ever deleted the HSEARCH-5133 branch September 5, 2024 09:57
@fax4ever
Copy link
Contributor Author

fax4ever commented Sep 5, 2024

Thank you Marko and Yoann

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants