[Bug] Lacking "The judgment to Empty" introduces a "wrong value of a Spark lineage object" issue #6912

Closed
3 of 4 tasks
xglv1985 opened this issue Feb 10, 2025 · 1 comment
Labels
kind:bug This is a clearly a bug priority:major


xglv1985 commented Feb 10, 2025

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

How to reproduce the issue?

The changes in PR #6911 avoid a wrong result when generating an instance of org.apache.kyuubi.plugin.lineage.Lineage in the following case:

step 1: create a temporary view from a file:
```
CREATE OR REPLACE TEMPORARY VIEW temp_view ( a STRING COMMENT '', b STRING COMMENT '' ) USING csv OPTIONS( sep='\t', path='${sourceFile.path}' );
```

step 2: insert into a table by selecting from the temporary view created in step 1:
```
insert overwrite table test_db.test_table_from_dir SELECT a, b FROM temp_view
```

step 3: generate the lineage when executing the insert statement in step 2.

Then the generated org.apache.kyuubi.plugin.lineage.Lineage object is None. However, its correct value should be:
```
inputTables(List())
outputTables(List(spark_catalog.test_db.test_table_from_dir))
columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set())))
```
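For reference, the expected value can be written out as plain data. This is a minimal sketch using illustrative stand-in case classes: the field names `inputTables`, `outputTables` and `columnLineage` mirror the printed output above, but `originalColumns` is a hypothetical name, and the plugin's real `Lineage` and `ColumnLineage` classes may be defined differently.

```scala
object ExpectedLineageSketch {
  // Illustrative stand-ins only, not the plugin's real classes.
  final case class ColumnLineage(column: String, originalColumns: Set[String])

  final case class Lineage(
      inputTables: List[String],
      outputTables: List[String],
      columnLineage: List[ColumnLineage])

  val target = "spark_catalog.test_db.test_table_from_dir"

  // Expected value for the reproduction case: no input tables, one output
  // table, and two target columns whose source-column sets are empty.
  val expected: Lineage = Lineage(
    inputTables = List(),
    outputTables = List(target),
    columnLineage = List(
      ColumnLineage(s"$target.a0", Set()),
      ColumnLineage(s"$target.b0", Set())))

  def main(args: Array[String]): Unit = {
    // The reproduction case should yield Some(expected), not None.
    val result: Option[Lineage] = Some(expected)
    println(result.map(_.outputTables))
  }
}
```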

How is the issue introduced?

Let's look at the current code that builds the Lineage object by resolving a LogicalPlan object:
[Screenshot: current code that resolves a LogicalPlan into a Lineage object]

With the above logic, in this case the runtime exception raised while generating the column lineage is swallowed by the "try-recover" self-protection, so a None org.apache.kyuubi.plugin.lineage.Lineage object is generated.
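The screenshot is not reproduced here, so the following is only a minimal sketch of how a "try-recover" guard can turn an exception into `None`, assuming the parser is wrapped in `scala.util.Try` and converted with `toOption`. `parsePlan` and `transformToLineage` are hypothetical stand-ins, not Kyuubi's actual code, and a plain `String` stands in for the Lineage object.

```scala
import scala.util.Try

object TryRecoverSketch {
  // Hypothetical stand-in for the lineage parser: List.head throws
  // NoSuchElementException when there is no lineage entry at all.
  def parsePlan(columnLineage: List[String]): String =
    s"lineage(${columnLineage.head})"

  // Hypothetical "try-recover" wrapper: recover logs and rethrows, so the Try
  // stays a Failure, and toOption then turns that Failure into None.
  def transformToLineage(columnLineage: List[String]): Option[String] =
    Try(parsePlan(columnLineage)).recover {
      case e: Exception =>
        println(s"Extract columns lineage failed: $e")
        throw e
    }.toOption

  def main(args: Array[String]): Unit = {
    println(transformToLineage(List("a"))) // Some(lineage(a))
    println(transformToLineage(Nil))       // None, printed after the warning line
  }
}
```

The caller never sees the original exception; it only receives `None`, which is why the failure surfaces later and far from its cause.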

The consequence of this bug

Unit Test Environment

In the unit test, a "None.get" exception is raised when the code reaches this point:
[Screenshot: the unit test code that raises None.get]

Here's the runtime exception stack:
```
None.get
java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485)
	at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465)
```
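The trace shows `extractLineageWithoutExecuting` calling `None.get`, meaning the test unwraps the returned `Option` with `.get`. This minimal sketch illustrates that behavior and a more descriptive alternative. `unwrapWithGet` and `unwrapOrFail` are hypothetical helpers, not part of the test suite, and a plain `String` stands in for the Lineage object.

```scala
object OptionGetSketch {
  // Option.get on None throws java.util.NoSuchElementException("None.get"),
  // the same exception shown in the stack trace.
  def unwrapWithGet(lineage: Option[String]): String = lineage.get

  // Hypothetical alternative: fail with a message naming the SQL statement
  // instead of the bare "None.get".
  def unwrapOrFail(lineage: Option[String], sql: String): String =
    lineage.getOrElse(throw new AssertionError(s"No lineage was extracted for: $sql"))

  def main(args: Array[String]): Unit = {
    try unwrapWithGet(None)
    catch { case e: NoSuchElementException => println(e) }
    println(unwrapOrFail(Some("lineage"), "SELECT a, b FROM temp_view"))
  }
}
```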

Production Environment

This None Lineage object cannot be used in the production environment because it lacks the necessary lineage information.

Affects Version(s)

all versions

Kyuubi Server Log Output

unrelated

Kyuubi Engine Log Output

unrelated

Kyuubi Server Configurations

unrelated

Kyuubi Engine Configurations

unrelated

Additional context

I have already opened a PR for this issue:
#6911

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
xglv1985 added the kind:bug and priority:major labels Feb 10, 2025

Hello @xglv1985,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

pan3793 added a commit that referenced this issue Feb 14, 2025
…elationColumnLineage

# Why are the changes needed?
## Issue reference:
#6912

## How to reproduce the issue?
The changes in this PR avoid a wrong result when generating an instance of org.apache.kyuubi.plugin.lineage.Lineage in the following case:
step 1: create a temporary view from a file
step 2: insert into a table by selecting from the temporary view in step 1
step 3: generate the lineage when executing the insert statement in step 2
For details, please see the unit test added in this patch.

## The issue analysis
Let's look at the current code that builds the Lineage object by resolving a LogicalPlan object:
<img width="694" alt="image" src="https://github.com/user-attachments/assets/65256a0d-320d-4271-968f-59eafb74de9f" />

With the above logic, in this case the runtime exception is swallowed by the "try-catch" self-protection, so a None org.apache.kyuubi.plugin.lineage.Lineage object is generated. This None object causes problems in the following 2 scenarios:
### Unit Test Environment
In the unit test, a "None.get" exception is raised when the code reaches this point:
<img width="682" alt="image" src="https://github.com/user-attachments/assets/102dc9bd-294f-4b1e-b1c6-01b6fee50fed" />

Here's the runtime exception stack:
```
None.get
java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.extractLineageWithoutExecuting(SparkSQLLineageParserHelperSuite.scala:1485)
	at org.apache.kyuubi.plugin.lineage.helper.SparkSQLLineageParserHelperSuite.$anonfun$new$83(SparkSQLLineageParserHelperSuite.scala:1465)
```
### Production Environment
This Lineage object cannot be used in the production environment because it is None and lacks the necessary lineage information. The correct content of the Lineage instance in the above case should be:
```
inputTables(List())
outputTables(List(spark_catalog.test_db.test_table_from_dir))
columnLineage(List(ColumnLineage(spark_catalog.test_db.test_table_from_dir.a0,Set()), ColumnLineage(spark_catalog.test_db.test_table_from_dir.b0,Set())))
```

A newly added test case (test directory to table) passes after this issue is fixed.

# How to fix the issue?
Add an empty check (the "Empty judgment") when generating the column lineage tuple. For details, please see the code changes in this patch.
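A minimal sketch of what such an empty check can look like, assuming that the runtime exception named in the commit titles comes from reading the head of an empty collection while pairing output columns with their source columns. `mergeColumnLineage` and the plain string types are hypothetical simplifications, not the code in this patch. When there is no upstream lineage, each target column gets an empty source set, which matches the `Set()` entries in the expected output above.

```scala
object EmptyCheckSketch {
  // Hypothetical simplification: pair each output column, in order, with the
  // source-column set taken from the relation's lineage.
  def mergeColumnLineage(
      outputColumns: Seq[String],
      relationLineage: Seq[Set[String]]): Seq[(String, Set[String])] = {
    outputColumns.foldLeft((Vector.empty[(String, Set[String])], relationLineage)) {
      case ((acc, remaining), column) =>
        if (remaining.isEmpty) {
          // Empty check: nothing left to pair with, so record the column with
          // an empty source set instead of calling head on an empty Seq.
          (acc :+ (column -> Set.empty[String]), remaining)
        } else {
          (acc :+ (column -> remaining.head), remaining.tail)
        }
    }._1
  }

  def main(args: Array[String]): Unit = {
    println(mergeColumnLineage(Seq("a0", "b0"), Seq.empty))
    println(mergeColumnLineage(Seq("a0", "b0"), Seq(Set("t.a"), Set("t.b"))))
  }
}
```

Without the `isEmpty` branch, the first call would throw `NoSuchElementException` from `head`, and a `Try`-based guard would turn the whole result into `None`.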

# How was this patch tested?
1. Added a new test case to the unit tests and made sure it passes.
2. Submitted a Spark application containing the SQL of this case in the production environment and made sure a correct Lineage instance is generated instead of a None object.

# Was this patch authored or co-authored using generative AI tooling?
No

Closes #6911 from xglv1985/fix_spark_lineage_runtime_exception.

Closes #6912

13a7107 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
4e89b95 [Cheng Pan] Update extensions/spark/kyuubi-spark-lineage/src/test/scala/org/apache/kyuubi/plugin/lineage/helper/SparkSQLLineageParserHelperSuite.scala
59b350b [xglv1985] fix a runtime exception when generate column lineage tuple--more readable code
52bc028 [xglv1985] fix a runtime exception when generate column lineage tuple--spotless sytle
fea6bbc [xglv1985] fix a runtime exception when generate column lineage tuple--remove tab from UT code
9018790 [xglv1985] fix a runtime exception when generate column lineage tuple--unit test
fbb4df8 [xglv1985] fix a runtime exception when generate column lineage tuple

Lead-authored-by: xglv1985
Co-authored-by: Cheng Pan
Signed-off-by: Cheng Pan
(cherry picked from commit 7c110b6)
Signed-off-by: Cheng Pan