Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28808: Avoid reading event messages in DB-Notification-Cleaner #5688

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

stiga-huang
Copy link
Contributor

What changes were proposed in this pull request?

Improves DB-Notification-Cleaner thread to just fetch the eventId/txnId and eventTime of the old events, then delete them based on the id range. This avoids the thread hitting OOM and silently die in cleaning huge events that have a long message body.

Set datanucleus.query.jdoql.allowAll=true to allow using DELETE in JDOQL.

Why are the changes needed?

Fix the DB-Notification-Cleaner hitting OOM and die silently.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Tested in a downstream build of CDP Hive 3.1.3000.7.3.1.0-160 in an env that have 1134925 old events. Some a pretty big that have a long message body. The DB is PostgreSQL.
Before this fix, HMS is not able to clean the events using the default hive.metastore.event.db.clean.maxevents (10000).
After this fix, it works as expected:

2025-03-10T10:43:59,651  INFO [DB-Notification-Cleaner] metastore.ObjectStore: Deleted 1134893 NotificationLog events older than epoch:1741488200 in 38667ms
2025-03-10T10:43:59,740  INFO [DB-Notification-Cleaner] metastore.ObjectStore: WriteNotification Cleaned 1072 events with eventTime < 1741488239 in 1 iteration, minimum txnId 187032 (with eventTime 1674098659) and maximum txnId 194134 (with eventTime 1734521765)

Improves DB-Notification-Cleaner thread to just fetch the eventId/txnId
and eventTime of the old events, then delete them based on the id range.
This avoids the thread hitting OOM and silently die in cleaning huge
events that have a long message body.

Set datanucleus.query.jdoql.allowAll=true to allow using DELETE in
JDOQL.
@@ -2064,6 +2068,7 @@ public String toString() {
ConfVars.DATANUCLEUS_TRANSACTION_ISOLATION,
ConfVars.DATANUCLEUS_USE_LEGACY_VALUE_STRATEGY,
ConfVars.DATANUCLEUS_QUERY_SQL_ALLOWALL,
ConfVars.DATANUCLEUS_QUERY_JDOQL_ALLOWALL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the DATANUCLEUS_QUERY_SQL_ALLOWALL cover the DATANUCLEUS_QUERY_JDOQL_ALLOWALL ? what if the datanucleus.query.jdoql.allowAll is set to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are different. Without setting datanucleus.query.jdoql.allowAll to true, we will hit this exception:

2025-03-10T09:48:00,444 ERROR [DB-Notification-Cleaner] metastore.ObjectStore: Unable to delete batch of NotificationLog events
javax.jdo.JDOUserException: JDOQL Single-String query should always start with SELECT
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:635) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1309) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1247) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:12231) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.ObjectStore.cleanOlderEvents(ObjectStore.java:12175) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:12161) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_432]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_432]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_432]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at com.sun.proxy.$Proxy33.cleanNotificationEvents(Unknown Source) ~[?:?]
        at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:1354) ~[hive-hcatalog-server-extensions-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
Caused by: org.datanucleus.exceptions.NucleusUserException: JDOQL Single-String query should always start with SELECT
        at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compileSelect(JDOQLSingleStringParser.java:148) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compile(JDOQLSingleStringParser.java:119) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.query.JDOQLSingleStringParser$Compiler.access$000(JDOQLSingleStringParser.java:108) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.query.JDOQLSingleStringParser.parse(JDOQLSingleStringParser.java:100) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.store.query.AbstractJDOQLQuery.<init>(AbstractJDOQLQuery.java:122) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.store.rdbms.query.JDOQLQuery.<init>(JDOQLQuery.java:159) ~[datanucleus-rdbms-5.2.10.jar:?]
        at org.datanucleus.store.rdbms.RDBMSStoreManager.newQuery(RDBMSStoreManager.java:1192) ~[datanucleus-rdbms-5.2.10.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1300) ~[datanucleus-api-jdo-5.2.8.jar:?]
        ... 11 more

The code also indicates only datanucleus.query.jdoql.allowAll is checked:
https://github.com/datanucleus/datanucleus-core/blob/c48c526ae7c154302904e906bcb809993c2a04ab/src/main/java/org/datanucleus/store/query/AbstractJDOQLQuery.java#L112-L116

tx.begin();

try (Query query = pm.newQuery(tableClass, "eventTime <= tooOld")) {
try (Query query = pm.newQuery(String.format(
"select %s, eventTime from %s where eventTime <= tooOld order by %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try select min(%s), max(%s), min(eventTime), max(eventTime) ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try select min(%s), max(%s), min(eventTime), max(eventTime) ...

We want to delete events using the given batch size. This query with query.setRange(0, batchSize.get()) at L11576 is used to get the n-th event id (i.e. maxId) so we can delete events in range [minId, maxId].

Using min(), max() functions, we might need a more complex query like

select min(eventId), max(eventId) from (
  select eventId, eventTime from tableName
  where eventTime <= tooOld order by eventId limit 10000);

However, it seems JDOQL just supports a simple syntax and doesn't allow this.
https://www.datanucleus.org/products/accessplatform_5_2/jdo/query.html#jdoql_string

Got an exception while trying the above query:

javax.jdo.JDOUserException: Query contains a JDOQL keyword ("from") that is out of order. Keywords can only be used in a defined order.
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:635) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1309) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1247) ~[datanucleus-api-jdo-5.2.8.jar:?]
        at org.apache.hadoop.hive.metastore.ObjectStore.doCleanNotificationEvents(ObjectStore.java:12210) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.ObjectStore.cleanOlderEvents(ObjectStore.java:12176) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:12162) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_432]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_432]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_432]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-SNAPSHOT]
        at com.sun.proxy.$Proxy33.cleanNotificationEvents(Unknown Source) ~[?:?]
        at org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:1354) ~[hive-hcatalog-server-extensions-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
Caused by: org.datanucleus.exceptions.NucleusUserException: Query contains a JDOQL keyword ("from") that is out of order. Keywords can only be used in a defined order.
        at org.datanucleus.query.JDOQLSingleStringParser$Compiler.compile(JDOQLSingleStringParser.java:124) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.query.JDOQLSingleStringParser$Compiler.access$000(JDOQLSingleStringParser.java:108) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.query.JDOQLSingleStringParser.parse(JDOQLSingleStringParser.java:100) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.store.query.AbstractJDOQLQuery.<init>(AbstractJDOQLQuery.java:122) ~[datanucleus-core-5.2.10.jar:?]
        at org.datanucleus.store.rdbms.query.JDOQLQuery.<init>(JDOQLQuery.java:159) ~[datanucleus-rdbms-5.2.10.jar:?]
        at org.datanucleus.store.rdbms.RDBMSStoreManager.newQuery(RDBMSStoreManager.java:1192) ~[datanucleus-rdbms-5.2.10.jar:?]
        at org.datanucleus.api.jdo.JDOPersistenceManager.newQuery(JDOPersistenceManager.java:1300) ~[datanucleus-api-jdo-5.2.8.jar:?]
        ... 11 more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a direct sql implementation, can we move the optimization to MetaStoreDirectSql?

For directSql, I think using JDOQL is simpler since we don't need to deal with escapes in different kinds of databases, e.g. PostgreSQL quotes all the column names. Though we need to retrive 10000 (by default) ids and timestamps using JDOQL, the memory footprint is acceptable and it might not worth adding the complexity of using directSql.

TBO, I'm new to the directSql codes. I might overestimate the complexity. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it might not as complex as you think, the directSql has already supported the delete here, without the new DATANUCLEUS_QUERY_JDOQL_ALLOWALL solely for this change,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants