[Bug] This is a bug! The partition org.apache.paimon.data.BinaryRow@9c67b85d and bucket 200 is filtered! #4542

Closed
19922577513 opened this issue Nov 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@19922577513

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

paimon-flink-1.17-0.8.jar
paimon-flink-action-0.8.0.jar
paimon-hive-connector-2.3-0.8.0.jar

Compute Engine

Flink -> 1.17.2
Hadoop -> 3.3.5

Minimal reproduce step

The CDC sync_table synchronization task had to be restarted with increased task memory because the available memory was insufficient. After the restart, an exception occurred.
This is the task run command:
sudo ./bin/flink run /home/q/module/flink/flink-1.17.2/lib/paimon-flink-action-0.8.0.jar mysql_sync_table \
    --warehouse viewfs://qunarcluster/user/tujiadev/home/data/paimon/warehouse \
    --type_mapping tinyint1-not-bool,to-nullable,char-to-string,bigint-unsigned-to-bigint \
    --database tujia_ods \
    --table ods_tns_baseinfo_log_house_log_paimon \
    --metadata_column database_name,table_name \
    --primary_keys id,house_id,operate_type,operate_id,operate_user,account_type,operate_platform,operate_content,comment,create_time \
    --mysql_conf hostname=10.88.134.114 \
    --mysql_conf username=bi_rep_user \
    --mysql_conf password=V3.blpazqc8gjsf \
    --mysql_conf database-name='tns_baseinfo_log' \
    --mysql_conf port=3316 \
    --mysql_conf server-id=6000-6002 \
    --mysql_conf table-name='house_log_.+' \
    --mysql_conf scan.incremental.snapshot.chunk.size=81920 \
    --mysql_conf scan.startup.mode=timestamp \
    --mysql_conf scan.startup.timestamp-millis=1731513600000 \
    --table_conf bucket=320 \
    --table_conf sink.parallelism=1 \
    --table_conf file.format=parquet \
    --table_conf file.compression=snappy \
    --table_conf scan.infer-parallelism=false \
    --table_conf scan.parallelism=100 \
    --table_conf snapshot.time-retained=4h \
    --catalog_conf metastore=hive \
    --catalog_conf uri='thrift://l-hiveserver2.data.cn2:9083,thrift://l-hiveserver5.data.cn2:9083,thrift://l-hiveserver6.data.cn2:9083,thrift://l-hiveserver7.data.cn2:9083,thrift://l-hiveserver8.data.cn2:9083'

What doesn't meet your expectations?

After running, the exception is as follows:
java.io.IOException: java.lang.IllegalArgumentException: This is a bug! The partition org.apache.paimon.data.BinaryRow@9c67b85d and bucket 200 is filtered!
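
For context, this message reads like an internal sanity check firing: a writer task received a record for a (partition, bucket) pair that its state says should never reach it. Below is a heavily simplified, hypothetical illustration of such a guard — not Paimon's actual source; the class name and the assignedBuckets field are invented for the sketch:

import java.util.Set;

// Hypothetical sketch of the kind of consistency check behind the message;
// Paimon's real implementation differs. "assignedBuckets" is an invented name.
class WriterGuardSketch {
    private final Set<Integer> assignedBuckets;

    WriterGuardSketch(Set<Integer> assignedBuckets) {
        this.assignedBuckets = assignedBuckets;
    }

    void checkBucket(Object partition, int bucket) {
        if (!assignedBuckets.contains(bucket)) {
            // The record was routed to a writer whose filter excludes
            // this (partition, bucket) pair.
            throw new IllegalArgumentException(
                    String.format(
                            "This is a bug! The partition %s and bucket %s is filtered!",
                            partition, bucket));
        }
    }
}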

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@19922577513 added the bug label on Nov 18, 2024
@ningoy commented on Nov 20, 2024

Thank you for providing detailed information about your issue.

I appreciate that you searched through the issues before posting. Based on the details you've shared, it seems like you are encountering a bug related to partition filtering after restarting the CDC sync task due to insufficient memory.

The error message you provided (java.io.IOException: java.lang.IllegalArgumentException: This is a bug! The partition org.apache.paimon.data.BinaryRow@9c67b85d and bucket 200 is filtered!) suggests that there may be an issue with how partitions and buckets are being handled in your configuration.
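
As background, in Paimon's fixed-bucket mode a record's bucket is derived from a hash of its bucket key and the configured bucket count, and each sink subtask handles a fixed subset of buckets. The sketch below is a simplification under stated assumptions — not Paimon's exact code — showing why a mismatch in bucket count or bucket assignment between the old and restarted job can route a record to a subtask that filters it out:

import java.util.Objects;

// Simplified sketch of fixed-bucket routing; not Paimon's exact code.
public class BucketRoutingSketch {

    // bucket = |hash| % numBuckets, the usual fixed-bucket scheme
    static int bucket(int keyHash, int numBuckets) {
        return Math.abs(keyHash % numBuckets);
    }

    // One plausible mapping from bucket to sink subtask (assumption).
    static int subtask(int bucket, int sinkParallelism) {
        return bucket % sinkParallelism;
    }

    public static void main(String[] args) {
        int keyHash = Objects.hash(42L, 7, "UPDATE"); // stand-in for the primary-key hash
        int b320 = bucket(keyHash, 320);  // with --table_conf bucket=320
        int b200 = bucket(keyHash, 200);  // with a hypothetical different bucket count
        System.out.printf("bucket under 320 buckets: %d, under 200 buckets: %d%n", b320, b200);
        // If the bucket count, or a subtask's notion of which buckets it owns,
        // differs between the old and restarted job, a record can arrive at a
        // writer whose filter excludes that (partition, bucket) -- the exact
        // condition the "is filtered!" check guards against.
    }
}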

To better assist you, could you please provide the following additional information?

  • The complete stack trace of the error message.
  • Any specific logs or outputs from Flink that might provide more context.
  • The configuration settings for the Flink job, especially any related to partitioning or bucketing.
Additionally, if you have any insights into the specific conditions under which the error occurs or if it happens consistently, that would be very helpful.

Regarding your willingness to submit a PR, that's great to hear! Once we gather more information, we can discuss the next steps for addressing this issue.

Looking forward to your response!

@JingsongLi (Contributor) commented

This should be fixed already.
