[SPARK-49275][SQL] Fix return type nullness of the xpath expression
### What changes were proposed in this pull request?

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('<a></a>', id), 'a')[0], '') from range(1, 2)`.
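
As a rough illustration of where such nulls can come from, the sketch below uses only the JDK's XPath and DOM APIs, not Spark. It assumes the array elements are taken from each matched node's DOM node value, which this description does not state explicitly; the object name `XPathNullDemo` and the helper `nodeValues` are hypothetical. Under the DOM specification, an element node has a null node value, while a text node carries its text.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

// Hypothetical helper, not part of Spark: collects the DOM node value of
// every node matched by an XPath expression.
object XPathNullDemo {
  def nodeValues(xml: String, path: String): Seq[String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val nodes = XPathFactory.newInstance().newXPath()
      .evaluate(path, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    // getNodeValue is null for element nodes and the text for text nodes.
    (0 until nodes.getLength).map(i => nodes.item(i).getNodeValue)
  }

  def main(args: Array[String]): Unit = {
    val xml = "<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>"
    // Text nodes: every entry is a non-null string.
    println(nodeValues(xml, "a/b/text()"))
    // Element nodes: every entry is null.
    println(nodeValues(xml, "a/b"))
  }
}
```

If the assumption holds, a path such as `a` against `<a></a>` yields a one-element array whose only entry is null, which is the shape the query above indexes into.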

### Why are the changes needed?

It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47796 from chenhao-db/fix_xpath_nullness.

Authored-by: Chenhao Li <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
chenhao-db authored and MaxGekk committed Sep 2, 2024
1 parent c274c5a commit ec7570e
Showing 2 changed files with 8 additions and 1 deletion.
@@ -242,13 +242,15 @@ case class XPathString(xml: Expression, path: Expression) extends XPathExtract {
Examples:
> SELECT _FUNC_('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
["b1","b2","b3"]
+ > SELECT _FUNC_('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b');
+ [null,null,null]
""",
since = "2.0.0",
group = "xml_funcs")
// scalastyle:on line.size.limit
case class XPathList(xml: Expression, path: Expression) extends XPathExtract {
override def prettyName: String = "xpath"
- override def dataType: DataType = ArrayType(SQLConf.get.defaultStringType, containsNull = false)
+ override def dataType: DataType = ArrayType(SQLConf.get.defaultStringType)

override def nullSafeEval(xml: Any, path: Any): Any = {
val nodeList = xpathUtil.evalNodeList(xml.asInstanceOf[UTF8String].toString, pathString)
@@ -185,6 +185,11 @@ class XPathExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
testExpr("<a><b class='bb'>b1</b><b>b2</b><b>b3</b><c class='bb'>c1</c><c>c2</c></a>",
"a/*[@class='bb']/text()", Seq("b1", "c1"))

+ checkEvaluation(
+ Coalesce(Seq(
+ GetArrayItem(XPathList(Literal("<a></a>"), Literal("a")), Literal(0)),
+ Literal("nul"))), "nul")

testNullAndErrorBehavior(testExpr)
}
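
The one-line change to `dataType` works because `ArrayType`'s single-argument constructor defaults `containsNull` to `true`. The sketch below shows that default in isolation. It assumes a Spark SQL dependency is on the classpath and substitutes plain `StringType` for `SQLConf.get.defaultStringType` to stay independent of session configuration; the object name `ArrayNullabilityDemo` is hypothetical.

```scala
import org.apache.spark.sql.types.{ArrayType, StringType}

// Hypothetical demo object; requires Spark SQL on the classpath.
object ArrayNullabilityDemo {
  // Shape of the old declaration: elements are promised to be non-null.
  val before: ArrayType = ArrayType(StringType, containsNull = false)
  // Shape of the new declaration: containsNull falls back to its default.
  val after: ArrayType = ArrayType(StringType)

  def main(args: Array[String]): Unit = {
    println(s"before.containsNull = ${before.containsNull}")
    println(s"after.containsNull = ${after.containsNull}")
  }
}
```

With the nullable element type declared, consumers of the array, including generated code, are expected to check elements for null before reading them.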
