Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 1222 #1225

Closed
wants to merge 35 commits into from
Closed

Fix issue 1222 #1225

wants to merge 35 commits into from

Conversation

ericvergnaud
Copy link
Contributor

@ericvergnaud ericvergnaud commented Nov 20, 2024

Progresses #1222 by implementing DealiasLCAs, a transformation rule that replaces LCAs by the underlying alias expression in project filters (where clause of a select statement)

@ericvergnaud ericvergnaud requested a review from a team as a code owner November 20, 2024 13:20
Copy link

github-actions bot commented Nov 20, 2024

Coverage tests results

462 tests  +6   424 ✅ +5   4s ⏱️ ±0s
  7 suites +1    38 💤 +1 
  7 files   +1     0 ❌ ±0 

Results for commit c247036. ± Comparison against base commit 9dcc986.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@jimidle jimidle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find a way to not use asInstanceOf

Comment on lines 20 to 22
.filter(col => col.isInstanceOf[Alias])
.map(col => col.asInstanceOf[Alias])
.filter(alias => alias.child.isInstanceOf[Id]) // TODO do we need to support more than that ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid isInstanceOf in favor of match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree if we cared about more than 1 type, but in this case it would make the code much more complex without a clear benefit ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is not using asInstanceOf ;)

Copy link
Contributor Author

@ericvergnaud ericvergnaud Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is that a clear benefit ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( the style guide doesn't say anything on this topic..)

Comment on lines 20 to 22
.collect { case a: Alias => a }
.filter(alias => !alias.child.isInstanceOf[Literal])
.map(alias => alias.name.id -> alias.child)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.collect { case a: Alias => a }
.filter(alias => !alias.child.isInstanceOf[Literal])
.map(alias => alias.name.id -> alias.child)
.collect { case Alias(e, name) if !e.isInstanceOf[Literal] => name.id -> e }

further distillation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 81 to 84
if (alias.isEmpty) {
item
} else {
val replacement = alias.get._2
item transform {
case name: Name => name.makeCopy(Array(replacement))
case id: Id => id.makeCopy(Array(replacement.asInstanceOf[AnyRef], id.caseSensitive.asInstanceOf[AnyRef]))
}
alias.get._2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole function could become:

Option(item).collect{
  case Name(name) => name
  case Id(id, _) => id
}
.flatMap(aliases.get)
.getOrElse(item)

or

val key = item match {
  case Name(name) => name
  case Id(id, _) => name
  case _ => null
}
aliases.getOrElse(key, item)

The former being slightly preferable, even though it is unlikely that aliases would have a entry with a null key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated

Copy link
Contributor

@vil1 vil1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pattern of calling transform and then checking the result with eq against the original value before calling makeCopy if there was a change is a premature optimization that unnecessarily hurts readability. Even more so if we consider that transform already performs a similar optimization on the descendants of the transformed tree.

As an example, dealiasProject would be more readable as:

private def dealiasProject(project: Project, filter: Filter, aliases: Map[String, Expression]): Project = project.copy(input = dealiasFilter(filter, aliases))

at the expense of a single object allocation.

Moreover, as it relies on reflexion under the hood, makeCopy is prone to silently break if the order/types of constructor parameters gets changed in the future (silently = at runtime). If we insist on using makeCopy here, we should make ensure 100% test coverage to prevent such silent breaking from happening.

@vil1
Copy link
Contributor

vil1 commented Nov 21, 2024

It wouldn't hurt to add a test where aliases appear deeply nested in an arbitrarily contrived expression involving boolean operators (including NOT), function calls, etc.

@ericvergnaud
Copy link
Contributor Author

I think the pattern of calling transform and then checking the result with eq against the original value before calling makeCopy if there was a change is a premature optimization that unnecessarily hurts readability. Even more so if we consider that transform already performs a similar optimization on the descendants of the transformed tree.

As an example, dealiasProject would be more readable as:

private def dealiasProject(project: Project, filter: Filter, aliases: Map[String, Expression]): Project = project.copy(input = dealiasFilter(filter, aliases))

at the expense of a single object allocation.

Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

@vil1
Copy link
Contributor

vil1 commented Nov 22, 2024

Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

If you look into transform implementation, you'll see that it already implements such "copy avoiding" mechanism, so wrapping the result of transform in a new node (ie, calling copy in our Rules) is a marginal cost (that doesn't justify hurting readability/future refactoring IMO).

@ericvergnaud
Copy link
Contributor Author

Moreover, as it relies on reflexion under the hood, makeCopy is prone to silently break if the order/types of constructor parameters gets changed in the future (silently = at runtime). If we insist on using makeCopy here, we should make ensure 100% test coverage to prevent such silent breaking from happening.

I prefer makeCopy for now because as we've seen, the constructor doesn't convey origin, comments etc... so I fear information would be lost

@ericvergnaud
Copy link
Contributor Author

Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

If you look into transform implementation, you'll see that it already implements such "copy avoiding" mechanism, so wrapping the result of transform in a new node (ie, calling copy in our Rules) is a marginal cost (that doesn't justify hurting readability/future refactoring IMO).

It would be interesting to measure that cost

@ericvergnaud ericvergnaud mentioned this pull request Nov 23, 2024
@ericvergnaud ericvergnaud self-assigned this Nov 25, 2024
extends Unary(expr)
extends Unary(expr) {

override def makeCopy(newArgs: Array[AnyRef]): Expression = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we need overriding makeCopy here, is there a test somewhere showcasing the problem/solution ?

Copy link
Contributor Author

@ericvergnaud ericvergnaud Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TreeNode.makeCopy docs say:

Must be overridden by child classes that have
   * constructor arguments that are not present in the productIterator.

I don't know exactly why only 1 arg of the Cast ctor is seen by the productIterator (maybe because they are not vals ?), but it's what happens.

This is tested via existing test cases in SnowflakeToDatabricksTranspilerTest, which crash without this bug fix.

Comment on lines +11 to +24
SELECT
ca_zip
FROM
SELECT
SUBSTR(ca_zip,1,5) AS ca_zip,
TRIM(name) AS name,
COUNT(*) OVER (
PARTITION BY
SUBSTR(ca_zip,1,5)
)
FROM
customer_address
WHERE
SUBSTR(ca_zip,1,5) IN ('89436', '30868');
Copy link
Contributor

@vil1 vil1 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be syntactically incorrect on Databricks (probably missing parenthesis around the subquery).
Also, the aliased name should be different from the columns they alias for, for the test to be really significant. (ca_zip in the WHERE clause could refer to customer_address.ca_zip, which isn't what we want)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parenthesis issue is fixed in PR #1232.
This test is about homonyms, not subqueries, so I'm afraid I disagree with your comment re names.

@ericvergnaud
Copy link
Contributor Author

ericvergnaud commented Nov 26, 2024

Superseded by #1242

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants