-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KYUUBI #6912][LINEAGE] Properly handle empty attribute set on mergeRelationColumnLineage #6911
[KYUUBI #6912][LINEAGE] Properly handle empty attribute set on mergeRelationColumnLineage #6911
Conversation
(acc + (attr -> x.head._2), x.tail) | ||
val head_2 = if (x.isEmpty) AttributeSet.empty else x.head._2 | ||
val tail = if (x.isEmpty) x.empty else x.tail | ||
(acc + (attr -> head_2), tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe can be written in
relationOutput.foldLeft((ListMap[Attribute, AttributeSet](), relationColumnLineage)) {
case ((acc, x), attr) if x.isEmpty =>
(acc + (attr -> AttributeSet.empty), ListMap.empty)
case ((acc, x), attr) =>
(acc + (attr -> x.head._2), x.tail)
}._1
TBH, the code is not much readable ...
s""" | ||
|insert overwrite table test_db.test_table_from_dir | ||
|SELECT | ||
| `a`, | ||
| `b` | ||
|FROM temp_view | ||
|""".stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use UPPER CASE for SQL keywords, and format the SQL in a short readable way
s""" | |
|insert overwrite table test_db.test_table_from_dir | |
|SELECT | |
| `a`, | |
| `b` | |
|FROM temp_view | |
|""".stripMargin) | |
s""" | |
|INSERT OVERWRITE TABLE test_db.test_table_from_dir | |
|SELECT `a`, `b` FROM temp_view | |
|""".stripMargin) |
s""" | ||
| CREATE OR REPLACE TEMPORARY VIEW temp_view | ||
|( | ||
| `a` STRING COMMENT '', | ||
| `b` STRING COMMENT '' | ||
|) | ||
|USING csv OPTIONS( | ||
| sep='\t', | ||
| path='${sourceFile.path}' | ||
|); | ||
|""".stripMargin).collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s""" | |
| CREATE OR REPLACE TEMPORARY VIEW temp_view | |
|( | |
| `a` STRING COMMENT '', | |
| `b` STRING COMMENT '' | |
|) | |
|USING csv OPTIONS( | |
| sep='\t', | |
| path='${sourceFile.path}' | |
|); | |
|""".stripMargin).collect() | |
s""" | |
|CREATE OR REPLACE TEMPORARY VIEW temp_view ( | |
| `a` STRING COMMENT '', | |
| `b` STRING COMMENT '' | |
|) USING csv OPTIONS( | |
| sep='\t', | |
| path='${sourceFile.path}' | |
|) | |
|""".stripMargin).collect() |
thanks for fixing the issue, leave some nits, BTW, you can run |
thanks for reviewing. I will optimize the code by your advice. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6911 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 688 688
Lines 42590 42592 +2
Branches 5805 5806 +1
======================================
- Misses 42590 42592 +2 ☔ View full report in Codecov by Sentry. |
...rc/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
Outdated
Show resolved
Hide resolved
…e/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
...rc/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
Outdated
Show resolved
Hide resolved
…e/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
…elationColumnLineage # Why are the changes needed? ## Issue reference: #6912 ## How to reproduce the issue? The changes in this PR will avoid a wrong result when generating the instance of org.apache.kyuubi.plugin.lineage.Lineage, in the certain case as follows: step 1: create a temporary view from a file step 2: insert into a table by selecting from the temporary view in step 1 step 3: generate the lineage when executing the insert statement in step 2 In detail, please see the UT code submission in this patch. ## The issue analysis Let's see the current code when getting the Lineage object by resolving a LogicalPlan object: <img width="694" alt="image" src="https://github.com/user-attachments/assets/65256a0d-320d-4271-968f-59eafb74de9f" /> According to the above logic, a None org.apache.kyuubi.plugin.lineage.Lineage object will be generated due to "try-catch" self-protection, in this certain case. This None object will lead to problems in the following 2 scenes: ### Unit Test Environment In Unit Test, when the code runs here a "None.get" exception will be raised: <img width="682" alt="image" src="https://github.com/user-attachments/assets/102dc9bd-294f-4b1e-b1c6-01b6fee50fed" /> Here's the runtime exception stack: ``` None.get java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485) at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465) ``` ### Production Environment This Lineage object cannot be used in the production environment because it has a None value which lacks some necessary lineage information. The right content of the Lineage instance in the above case should be: ``` inputTables(List()) outputTables(List(spark_catalog.test_db.test_table_from_dir)) columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set()))) ``` a newly added test case(test directory to table) passed after this issue is fixed. # How to fix the issue? Add a "Empty judgment" logic. In detail, please see the code submission in this patch. # How was this patch tested? 1. by adding a new test case in UT code and make sure it passes 2. by submitting a Spark application including the SQL of this case in the production environment, and make sure a right Lineage instance is generated, instead of a None object # Was this patch authored or co-authored using generative AI tooling? No Closes #6911 from xglv1985/fix_spark_lineage_runtime_exception. Closes #6912 13a7107 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala 4e89b95 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala 59b350b [xglv1985] fix a runtime exception when generate column lineage tuple--more readable code 52bc028 [xglv1985] fix a runtime exception when generate column lineage tuple--spotless sytle fea6bbc [xglv1985] fix a runtime exception when generate column lineage tuple--remove tab from UT code 9018790 [xglv1985] fix a runtime exception when generate column lineage tuple--unit test fbb4df8 [xglv1985] fix a runtime exception when generate column lineage tuple Lead-authored-by: xglv1985 <[email protected]> Co-authored-by: Cheng Pan <[email protected]> Signed-off-by: Cheng Pan <[email protected]> (cherry picked from commit 7c110b6) Signed-off-by: Cheng Pan <[email protected]>
…elationColumnLineage # Why are the changes needed? ## Issue reference: #6912 ## How to reproduce the issue? The changes in this PR will avoid a wrong result when generating the instance of org.apache.kyuubi.plugin.lineage.Lineage, in the certain case as follows: step 1: create a temporary view from a file step 2: insert into a table by selecting from the temporary view in step 1 step 3: generate the lineage when executing the insert statement in step 2 In detail, please see the UT code submission in this patch. ## The issue analysis Let's see the current code when getting the Lineage object by resolving a LogicalPlan object: <img width="694" alt="image" src="https://github.com/user-attachments/assets/65256a0d-320d-4271-968f-59eafb74de9f" /> According to the above logic, a None org.apache.kyuubi.plugin.lineage.Lineage object will be generated due to "try-catch" self-protection, in this certain case. This None object will lead to problems in the following 2 scenes: ### Unit Test Environment In Unit Test, when the code runs here a "None.get" exception will be raised: <img width="682" alt="image" src="https://github.com/user-attachments/assets/102dc9bd-294f-4b1e-b1c6-01b6fee50fed" /> Here's the runtime exception stack: ``` None.get java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485) at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465) ``` ### Production Environment This Lineage object cannot be used in the production environment because it has a None value which lacks some necessary lineage information. The right content of the Lineage instance in the above case should be: ``` inputTables(List()) outputTables(List(spark_catalog.test_db.test_table_from_dir)) columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set()))) ``` a newly added test case(test directory to table) passed after this issue is fixed. # How to fix the issue? Add a "Empty judgment" logic. In detail, please see the code submission in this patch. # How was this patch tested? 1. by adding a new test case in UT code and make sure it passes 2. by submitting a Spark application including the SQL of this case in the production environment, and make sure a right Lineage instance is generated, instead of a None object # Was this patch authored or co-authored using generative AI tooling? No Closes #6911 from xglv1985/fix_spark_lineage_runtime_exception. Closes #6912 13a7107 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala 4e89b95 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala 59b350b [xglv1985] fix a runtime exception when generate column lineage tuple--more readable code 52bc028 [xglv1985] fix a runtime exception when generate column lineage tuple--spotless sytle fea6bbc [xglv1985] fix a runtime exception when generate column lineage tuple--remove tab from UT code 9018790 [xglv1985] fix a runtime exception when generate column lineage tuple--unit test fbb4df8 [xglv1985] fix a runtime exception when generate column lineage tuple Lead-authored-by: xglv1985 <[email protected]> Co-authored-by: Cheng Pan <[email protected]> Signed-off-by: Cheng Pan <[email protected]> (cherry picked from commit 7c110b6) Signed-off-by: Cheng Pan <[email protected]>
Thanks, merged to master/1.10.2/1.9.4 |
Why are the changes needed?
Issue reference:
#6912
How to reproduce the issue?
The changes in this PR will avoid a wrong result when generating the instance of org.apache.kyuubi.plugin.lineage.Lineage, in the certain case as follows:
step 1: create a temporary view from a file
step 2: insert into a table by selecting from the temporary view in step 1
step 3: generate the lineage when executing the insert statement in step 2
In detail, please see the UT code submission in this patch.
The issue analysis
Let's see the current code when getting the Lineage object by resolving a LogicalPlan object:

According to the above logic, a None org.apache.kyuubi.plugin.lineage.Lineage object will be generated due to "try-catch" self-protection, in this certain case. This None object will lead to problems in the following 2 scenes:
Unit Test Environment
In Unit Test, when the code runs here a "None.get" exception will be raised:

Here's the runtime exception stack:
Production Environment
This Lineage object cannot be used in the production environment because it has a None value which lacks some necessary lineage information. The right content of the Lineage instance in the above case should be:
a newly added test case(test directory to table) passed after this issue is fixed.
How to fix the issue?
Add a "Empty judgment" logic. In detail, please see the code submission in this patch.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No