
[HUDI-8550] Make Hudi 1.x write timeline to a dedicated timeline folder under .hoodie #12288

Open

bvaradar wants to merge 18 commits into master from HUDI-8550-new_timeline_folder
Conversation

bvaradar
Contributor

Change Logs

Hudi 1.x writes its timeline to a dedicated timeline folder under .hoodie.

Impact

The timeline moves to a dedicated timeline folder under .hoodie.
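
For context, a rough sketch of the layout change (illustrative only; instant file names are omitted, and the placement of the archived/LSM timeline under the new folder is discussed in the review below):

    Before (pre-1.x layout):
      .hoodie/                  <-- active timeline instant files live directly here
      .hoodie/archived/         <-- archived timeline

    After (this PR):
      .hoodie/timeline/         <-- active timeline instant files move here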

Risk level (write none, low, medium or high below)

none

Documentation Update

none

  • The config description must be updated if new configs are added or the default value of the configs is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instructions to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 18, 2024
if (needsUpgradeOrDowngrade(metaClient)) {
  // unclear what instant to use, since upgrade does have a given instant.
  executeUsingTxnManager(Option.empty(), () -> tryUpgrade(metaClient, Option.empty()));
  updatedMetaClient = createMetaClient(true);
Contributor

We had better update the options within tryUpgrade, like what we already do there with reloadTableConfig and reloadActiveTimeline.

Contributor Author

The metaClient is locally passed to this method, which in turn needs to be passed to startCommit. On upgrade, the metaClient is stale (wrong layoutVersion) and needs to be updated. That is the reason for the above change.

@danny0405 danny0405 Nov 21, 2024

Can we just refresh the timelineLayoutVersion and timelineLayout from the table config right after the table config is reloaded, so that the refresh sequence becomes:

  1. refresh table config
  2. refresh timeline layout
  3. refresh timeline

Or we could just remove the reload in tryUpgrade, which is much simpler, but then we need to take care of all the callers of tryUpgrade.
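
A minimal sketch of the refresh order proposed above, assuming it sits inside tryUpgrade right after the upgrade runs; reloadTableConfig and reloadActiveTimeline are the existing reload calls mentioned earlier, while the condition flag and the timeline-layout field/getter in step 2 are illustrative and may not match the final code:

// Sketch only, not the merged implementation.
if (upgradedOrDowngraded) {  // placeholder for the "upgrade/downgrade actually ran" condition
  // 1. refresh table config so the upgraded table properties are visible
  metaClient.reloadTableConfig();
  // 2. refresh the cached timeline layout from the reloaded table config
  //    (field and getter names are illustrative)
  this.timelineLayoutVersion = metaClient.getTableConfig().getTimelineLayoutVersion();
  // 3. refresh the active timeline using the updated layout
  metaClient.reloadActiveTimeline();
}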

@bvaradar
Contributor Author

@danny0405: Apart from the two comments, are you OK with the other changes?

@danny0405
Contributor

@danny0405: Apart from the two comments, are you OK with the other changes?

Yes, overall it looks good to me. I reviewed EightToSevenDowngradeHandler in detail and it should be good; the other changes are minor besides that meta client refresh change.

@vinothchandar
Member

@bvaradar I have not looked very closely, but can we also make sure the LSM timeline moves along with the new folder? Call it history instead of archived?

so

.hoodie
   |______ timeline           <-- active timeline lives here
              |________ history   <-- LSM (archived) timeline lives here

Over 1.1 or 1.2 I want to fully converge it to a single timeline.

@codope codope force-pushed the HUDI-8550-new_timeline_folder branch from 5a9c25f to a90e9ea on November 22, 2024 06:38
@@ -711,7 +711,7 @@ class TestSpark3DDL extends HoodieSparkSqlTestBase {

test("Test schema auto evolution") {
withTempDir { tmp =>
Seq("COPY_ON_WRITE", "MERGE_ON_READ").foreach { tableType =>
Seq("COPY_ON_WRITE").foreach { tableType =>
Member

Need to revisit this after fixing all other tests. For MOR, this test fails due to

Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file file:/private/var/folders/s5/pqxf5ndx12qg6h0zgl2d9zxh0000gn/T/spark-83fab01d-7af8-4c22-9dee-8d840aa02e90/h1/americas/brazil/sao_paulo/c7c9ab23-56f7-45f4-bdbe-d7a8de9671bf-0_0-22-35_20241122094757341.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:264)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:210)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowIndexUtil$RecordReaderWithRowIndexes.nextKeyValue(ParquetRowIndexUtil.scala:89)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)
	at org.apache.hudi.util.CloseableInternalRowIterator.hasNext(CloseableInternalRowIterator.scala:50)
	at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.doHasNext(HoodieKeyBasedFileGroupRecordBuffer.java:134)
	at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.hasNext(HoodieBaseFileGroupRecordBuffer.java:149)
	at org.apache.hudi.common.table.read.HoodieFileGroupReader.hasNext(HoodieFileGroupReader.java:235)
	at org.apache.hudi.common.table.read.HoodieFileGroupReader$HoodieFileGroupReaderIterator.hasNext(HoodieFileGroupReader.java:289)
	at org.apache.spark.sql.execution.datasources.parquet.HoodieFileGroupReaderBasedParquetFileFormat$$anon$1.hasNext(HoodieFileGroupReaderBasedParquetFileFormat.scala:273)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:129)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:283)
	... 22 more
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableDouble
	at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.setDouble(SpecificInternalRow.scala:284)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RowUpdater.setDouble(ParquetRowConverter.scala:185)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter.addDouble(ParquetRowConverter.scala:96)
	at org.apache.parquet.column.impl.ColumnReaderBase$2$2.writeValue(ColumnReaderBase.java:269)
	at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:440)
	at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30)
	at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:234)
	... 34 more

@hudi-bot

CI report:

@hudi-bot supports the following bot commands:
  • @hudi-bot run azure: re-runs the last Azure build

Labels
migration size:L PR with lines of changes in (300, 1000] version-compatibility

6 participants