-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28808: Avoid reading event messages in DB-Notification-Cleaner #5688
base: master
Are you sure you want to change the base?
Conversation
Improves DB-Notification-Cleaner thread to just fetch the eventId/txnId and eventTime of the old events, then delete them based on the id range. This avoids the thread hitting OOM and silently die in cleaning huge events that have a long message body. Set datanucleus.query.jdoql.allowAll=true to allow using DELETE in JDOQL.
|
@@ -2064,6 +2068,7 @@ public String toString() { | |||
ConfVars.DATANUCLEUS_TRANSACTION_ISOLATION, | |||
ConfVars.DATANUCLEUS_USE_LEGACY_VALUE_STRATEGY, | |||
ConfVars.DATANUCLEUS_QUERY_SQL_ALLOWALL, | |||
ConfVars.DATANUCLEUS_QUERY_JDOQL_ALLOWALL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the DATANUCLEUS_QUERY_SQL_ALLOWALL
cover the DATANUCLEUS_QUERY_JDOQL_ALLOWALL
? what if the datanucleus.query.jdoql.allowAll
is set to false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they are different. Without setting datanucleus.query.jdoql.allowAll
to true, we will hit this exception:
2025-03-10T09:48:00,444 ERROR [DB-Notification-Cleaner] metastore.ObjectStore: Unable to delete batch of NotificationLog events
javax.jdo.JDOUserException: JDOQL Single-String query should always start with SELECT
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:635) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1309) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1247) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:12231) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.cleanOlderEvents(ObjectStore.java:12175) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:12161) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_432]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_432]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_432]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at com.sun.proxy.$Proxy33.cleanNotificationEvents(Unknown Source) ~[?:?]
at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:1354) ~[hive-hcatalog-server-extensions-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
Caused by: org.datanucleus.exceptions.NucleusUserException: JDOQL Single-String query should always start with SELECT
at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compileSelect(JDOQLSingleStringParser.java:148) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compile(JDOQLSingleStringParser.java:119) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.query.JDOQLSingleStringParser$Compiler.access$000(JDOQLSingleStringParser.java:108) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.query.JDOQLSingleStringParser.parse(JDOQLSingleStringParser.java:100) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.store.query.AbstractJDOQLQuery.<init>(AbstractJDOQLQuery.java:122) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.store.rdbms.query.JDOQLQuery.<init>(JDOQLQuery.java:159) ~[datanucleus-rdbms-5.2.10.jar:?]
at org.datanucleus.store.rdbms.RDBMSStoreManager.newQuery(RDBMSStoreManager.java:1192) ~[datanucleus-rdbms-5.2.10.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1300) ~[datanucleus-api-jdo-5.2.8.jar:?]
... 11 more
The code also indicates only datanucleus.query.jdoql.allowAll
is checked:
https://github.com/datanucleus/datanucleus-core/blob/c48c526ae7c154302904e906bcb809993c2a04ab/src/main/java/org/datanucleus/store/query/AbstractJDOQLQuery.java#L112-L116
tx.begin(); | ||
|
||
try (Query query = pm.newQuery(tableClass, "eventTime <= tooOld")) { | ||
try (Query query = pm.newQuery(String.format( | ||
"select %s, eventTime from %s where eventTime <= tooOld order by %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we try select min(%s), max(%s), min(eventTime), max(eventTime) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a direct sql implementation, can we move the optimization to MetaStoreDirectSql
? e.g, https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L606-L623
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we try select min(%s), max(%s), min(eventTime), max(eventTime) ...
We want to delete events using the given batch size. This query with query.setRange(0, batchSize.get())
at L11576 is used to get the n-th event id (i.e. maxId) so we can delete events in range [minId, maxId].
Using min(), max() functions, we might need a more complex query like
select min(eventId), max(eventId) from (
select eventId, eventTime from tableName
where eventTime <= tooOld order by eventId limit 10000);
However, it seems JDOQL just supports a simple syntax and doesn't allow this.
https://www.datanucleus.org/products/accessplatform_5_2/jdo/query.html#jdoql_string
Got an exception while trying the above query:
javax.jdo.JDOUserException: Query contains a JDOQL keyword ("from") that is out of order. Keywords can only be used in a defined order.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:635) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1309) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1247) ~[datanucleus-api-jdo-5.2.8.jar:?]
at org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:12210) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.cleanOlderEvents(ObjectStore.java:12176) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:12162) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_432]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_432]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_432]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
at com.sun.proxy.$Proxy33.cleanNotificationEvents(Unknown Source) ~[?:?]
at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:1354) ~[hive-hcatalog-server-extensions-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
Caused by: org.datanucleus.exceptions.NucleusUserException: Query contains a JDOQL keyword ("from") that is out of order. Keywords can only be used in a defined order.
at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compile(JDOQLSingleStringParser.java:124) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.query.JDOQLSingleStringParser$Compiler.access$000(JDOQLSingleStringParser.java:108) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.query.JDOQLSingleStringParser.parse(JDOQLSingleStringParser.java:100) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.store.query.AbstractJDOQLQuery.<init>(AbstractJDOQLQuery.java:122) ~[datanucleus-core-5.2.10.jar:?]
at org.datanucleus.store.rdbms.query.JDOQLQuery.<init>(JDOQLQuery.java:159) ~[datanucleus-rdbms-5.2.10.jar:?]
at org.datanucleus.store.rdbms.RDBMSStoreManager.newQuery(RDBMSStoreManager.java:1192) ~[datanucleus-rdbms-5.2.10.jar:?]
at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1300) ~[datanucleus-api-jdo-5.2.8.jar:?]
... 11 more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a direct sql implementation, can we move the optimization to MetaStoreDirectSql?
For directSql, I think using JDOQL is simpler since we don't need to deal with escapes in different kinds of databases, e.g. PostgreSQL quotes all the column names. Though we need to retrive 10000 (by default) ids and timestamps using JDOQL, the memory footprint is acceptable and it might not worth adding the complexity of using directSql.
TBO, I'm new to the directSql codes. I might overestimate the complexity. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it might not as complex as you think, the directSql has already supported the delete here, without the new DATANUCLEUS_QUERY_JDOQL_ALLOWALL
solely for this change,
What changes were proposed in this pull request?
Improves DB-Notification-Cleaner thread to just fetch the eventId/txnId and eventTime of the old events, then delete them based on the id range. This avoids the thread hitting OOM and silently die in cleaning huge events that have a long message body.
Set datanucleus.query.jdoql.allowAll=true to allow using DELETE in JDOQL.
Why are the changes needed?
Fix the DB-Notification-Cleaner hitting OOM and die silently.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Tested in a downstream build of CDP Hive 3.1.3000.7.3.1.0-160 in an env that have 1134925 old events. Some a pretty big that have a long message body. The DB is PostgreSQL.
Before this fix, HMS is not able to clean the events using the default
hive.metastore.event.db.clean.maxevents
(10000).After this fix, it works as expected: