[Bug] Lacking "The judgment to Empty" introduces a "wrong value of a Spark lineage object" issue #6912
Closed
pan3793 added a commit that referenced this issue on Feb 14, 2025

…elationColumnLineage

# Why are the changes needed?

## Issue reference: #6912

## How to reproduce the issue?

The changes in this PR will avoid a wrong result when generating the instance of org.apache.kyuubi.plugin.lineage.Lineage, in the certain case as follows:

step 1: create a temporary view from a file
step 2: insert into a table by selecting from the temporary view in step 1
step 3: generate the lineage when executing the insert statement in step 2

In detail, please see the UT code submission in this patch.

## The issue analysis

Let's see the current code when getting the Lineage object by resolving a LogicalPlan object:

![image](https://github.com/user-attachments/assets/65256a0d-320d-4271-968f-59eafb74de9f)

According to the above logic, a None org.apache.kyuubi.plugin.lineage.Lineage object will be generated due to "try-catch" self-protection, in this certain case. This None object will lead to problems in the following 2 scenes:

### Unit Test Environment

In Unit Test, when the code runs here a "None.get" exception will be raised:

![image](https://github.com/user-attachments/assets/102dc9bd-294f-4b1e-b1c6-01b6fee50fed)

Here's the runtime exception stack:

```
None.get
java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:529)
    at scala.None$.get(Option.scala:527)
    at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485)
    at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465)
```

### Production Environment

This Lineage object cannot be used in the production environment because it has a None value which lacks some necessary lineage information.

The right content of the Lineage instance in the above case should be:

```
inputTables(List())
outputTables(List(spark_catalog.test_db.test_table_from_dir))
columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set())))
```

a newly added test case(test directory to table) passed after this issue is fixed.

# How to fix the issue?

Add a "Empty judgment" logic. In detail, please see the code submission in this patch.

# How was this patch tested?

1. by adding a new test case in UT code and make sure it passes
2. by submitting a Spark application including the SQL of this case in the production environment, and make sure a right Lineage instance is generated, instead of a None object

# Was this patch authored or co-authored using generative AI tooling?

No

Closes #6911 from xglv1985/fix_spark_lineage_runtime_exception.

Closes #6912

13a7107 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
4e89b95 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
59b350b [xglv1985] fix a runtime exception when generate column lineage tuple--more readable code
52bc028 [xglv1985] fix a runtime exception when generate column lineage tuple--spotless sytle
fea6bbc [xglv1985] fix a runtime exception when generate column lineage tuple--remove tab from UT code
9018790 [xglv1985] fix a runtime exception when generate column lineage tuple--unit test
fbb4df8 [xglv1985] fix a runtime exception when generate column lineage tuple

Lead-authored-by: xglv1985 <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 7c110b6)
Signed-off-by: Cheng Pan <[email protected]>
Describe the bug
How to reproduce the issue?
The proposed PR avoids a wrong result when generating an instance of org.apache.kyuubi.plugin.lineage.Lineage in the following case:
step 1: create a temporary view from a file
CREATE OR REPLACE TEMPORARY VIEW temp_view (
  a STRING COMMENT '',
  b STRING COMMENT ''
) USING csv OPTIONS (sep='\t', path='${sourceFile.path}');
step 2: insert into a table by selecting from the temporary view in step 1
INSERT OVERWRITE TABLE test_db.test_table_from_dir
SELECT
  a,
  b
FROM temp_view
step 3: generate the lineage when executing the insert statement in step 2
The result is None instead of an org.apache.kyuubi.plugin.lineage.Lineage instance. The correct value should be:
inputTables(List())
outputTables(List(spark_catalog.test_db.test_table_from_dir))
columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set())))
How is the issue introduced?
The current code obtains the Lineage object by resolving a LogicalPlan object (the original issue shows this code as a screenshot).

In this case, the "try-recover" self-protection in that code produces None instead of an org.apache.kyuubi.plugin.lineage.Lineage object.
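Since the screenshot is not reproduced here, the following is only a minimal sketch of how a "try-recover" wrapper can turn an empty input into None, and how an emptiness check avoids it. It is not the plugin's actual code: `ColumnLineage` and `Lineage` are simplified stand-ins, and `mergeUnsafe`, `mergeSafe` and `build` are hypothetical names chosen for illustration. The assumption is that some step fails on an empty collection, which is modeled here with `reduce`.

```scala
import scala.util.Try

// Simplified stand-ins for illustration; not the plugin's real classes.
final case class ColumnLineage(column: String, originalColumns: Set[String])
final case class Lineage(
    inputTables: List[String],
    outputTables: List[String],
    columnLineage: List[ColumnLineage])

object EmptyJudgmentSketch {
  // Hypothetical merge step: `reduce` throws UnsupportedOperationException on an empty Seq.
  def mergeUnsafe(upstream: Seq[Set[String]]): Set[String] =
    upstream.reduce(_ ++ _)

  // Same step with an emptiness check: an empty input yields an empty set.
  def mergeSafe(upstream: Seq[Set[String]]): Set[String] =
    if (upstream.isEmpty) Set.empty[String] else upstream.reduce(_ ++ _)

  // Self-protecting builder: any exception inside the block is recovered to None.
  def build(
      outputTable: String,
      outputColumns: List[String],
      upstream: Seq[Set[String]],
      merge: Seq[Set[String]] => Set[String]): Option[Lineage] =
    Try[Option[Lineage]] {
      val origin = merge(upstream) // fully qualified upstream column names
      // Derive table names by dropping the trailing column segment.
      val inputTables = origin.map(_.split('.').dropRight(1).mkString(".")).toList.sorted
      Some(Lineage(
        inputTables,
        List(outputTable),
        outputColumns.map(c => ColumnLineage(s"$outputTable.$c", origin))))
    }.recover { case _: Exception => None }.get

  def main(args: Array[String]): Unit = {
    val table = "spark_catalog.test_db.test_table_from_dir"
    // A file-backed view contributes no upstream table columns, so the input is empty.
    println(build(table, List("a0", "b0"), Seq.empty, mergeUnsafe))
    println(build(table, List("a0", "b0"), Seq.empty, mergeSafe))
  }
}
```

With the unchecked merge the failure is swallowed and the caller only sees None. With the checked merge the builder returns a Lineage whose input tables and per-column origins are empty, which has the same shape as the expected value listed under "How to reproduce the issue?".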
The consequence of this bug
Unit Test Environment
In unit tests, a "None.get" exception is raised in SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting:

Here's the runtime exception stack:
None.get
java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:529)
    at scala.None$.get(Option.scala:527)
    at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485)
    at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465)
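The top two frames of this trace come from calling `get` on an empty `Option`. As a small illustration of that standard-library behaviour only (the names `unsafeGet` and `describe` are hypothetical and unrelated to the Kyuubi test suite), the sketch below contrasts `get` with handling the empty case explicitly:

```scala
import scala.util.Try

object NoneGetSketch {
  // Option.get throws java.util.NoSuchElementException when the Option is None.
  def unsafeGet(result: Option[String]): Try[String] =
    Try(result.get)

  // Handling both cases explicitly avoids the exception.
  def describe(result: Option[String]): String =
    result.fold("no lineage extracted")(l => s"lineage: $l")

  def main(args: Array[String]): Unit = {
    println(unsafeGet(None).isFailure)
    println(describe(None))
    println(describe(Some("outputTables(List(test_db.t))")))
  }
}
```

This only makes the failure mode visible. It does not replace the fix, which is to make the parser return a populated Lineage for this case.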
Production Environment
In production the result cannot be used, because it is None and therefore carries none of the necessary lineage information.
Affects Version(s)
all versions
Kyuubi Server Log Output
Kyuubi Engine Log Output
Kyuubi Server Configurations
unrelated
Kyuubi Engine Configurations
unrelated
Additional context
I have already opened a PR for this issue: #6911
Are you willing to submit PR?