Fix issue 1222 #1225
Conversation
Find a way to not use asInstanceOf
```scala
.filter(col => col.isInstanceOf[Alias])
.map(col => col.asInstanceOf[Alias])
.filter(alias => alias.child.isInstanceOf[Id]) // TODO do we need to support more than that?
```
We should avoid isInstanceOf in favor of match
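For illustration, here is how a `collect` with a pattern match folds the `filter`/`map`/`filter` chain into a single pass with no casts. This is a minimal sketch using simplified stand-ins for the real IR classes (the actual `Alias`/`Id` definitions live in the remorph IR and differ in shape):

```scala
// Simplified stand-ins for the real IR nodes, for illustration only.
sealed trait Expression
case class Id(id: String) extends Expression
case class Alias(child: Expression, name: String) extends Expression
case class Literal(value: Int) extends Expression

object CollectVsCasts {
  val cols: Seq[Expression] = Seq(Alias(Id("a"), "x"), Literal(1), Alias(Literal(2), "y"))

  // isInstanceOf/asInstanceOf style: three traversals and an unchecked cast.
  val viaCasts: Seq[Alias] = cols
    .filter(_.isInstanceOf[Alias])
    .map(_.asInstanceOf[Alias])
    .filter(_.child.isInstanceOf[Id])

  // Equivalent single pass: the pattern both tests and extracts.
  val viaCollect: Seq[Alias] = cols.collect { case a @ Alias(_: Id, _) => a }
}
```

Both produce the same result; the `collect` version cannot throw a `ClassCastException` because the compiler derives the narrowing from the pattern.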
I'd agree if we cared about more than one type, but in this case it would make the code much more complex without a clear benefit?
The benefit is not using asInstanceOf ;)
How is that a clear benefit?
(The style guide doesn't say anything on this topic.)
core/src/main/scala/com/databricks/labs/remorph/parsers/snowflake/rules/DealiasLCAFilter.scala
```scala
.collect { case a: Alias => a }
.filter(alias => !alias.child.isInstanceOf[Literal])
.map(alias => alias.name.id -> alias.child)
```
Suggested change:

```scala
.collect { case Alias(e, name) if !e.isInstanceOf[Literal] => name.id -> e }
```
further distillation
done
```scala
if (alias.isEmpty) {
  item
} else {
  val replacement = alias.get._2
  item transform {
    case name: Name => name.makeCopy(Array(replacement))
    case id: Id => id.makeCopy(Array(replacement.asInstanceOf[AnyRef], id.caseSensitive.asInstanceOf[AnyRef]))
  }
  alias.get._2
}
```
The whole function could become:

```scala
Option(item)
  .collect {
    case Name(name) => name
    case Id(id, _)  => id
  }
  .flatMap(aliases.get)
  .getOrElse(item)
```

or

```scala
val key = item match {
  case Name(name) => name
  case Id(id, _)  => id
  case _ => null
}
aliases.getOrElse(key, item)
```

The former is slightly preferable, even though it is unlikely that `aliases` would have an entry with a `null` key.
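The first alternative can be exercised in isolation. Below is a self-contained sketch with simplified `Name`/`Id` stand-ins (assumptions for illustration; the real IR classes differ), showing that the lookup falls back to the original item whenever there is no alias or the item is not a `Name`/`Id`:

```scala
// Simplified stand-ins for the real IR nodes, for illustration only.
sealed trait Expression
case class Name(name: String) extends Expression
case class Id(id: String, caseSensitive: Boolean) extends Expression
case class Literal(value: Int) extends Expression

object Dealias {
  // Option.collect narrows to a lookup key; flatMap consults the alias map;
  // getOrElse restores the original item when no replacement applies.
  def dealias(item: Expression, aliases: Map[String, Expression]): Expression =
    Option(item)
      .collect {
        case Name(name) => name
        case Id(id, _)  => id
      }
      .flatMap(aliases.get)
      .getOrElse(item)
}
```

Because `collect` yields `None` for anything that is not a `Name` or `Id`, the `flatMap`/`getOrElse` chain never consults the map with an irrelevant key.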
outdated
I think the pattern of calling `transform` and then checking the result with `eq` against the original value before calling `makeCopy` if there was a change is a premature optimization that unnecessarily hurts readability. Even more so if we consider that `transform` already performs a similar optimization on the descendants of the transformed tree.

As an example, `dealiasProject` would be more readable as:

```scala
private def dealiasProject(project: Project, filter: Filter, aliases: Map[String, Expression]): Project =
  project.copy(input = dealiasFilter(filter, aliases))
```

at the expense of a single object allocation.

Moreover, as it relies on reflection under the hood, `makeCopy` is prone to silently break if the order/types of constructor parameters get changed in the future (silently = at runtime). If we insist on using `makeCopy` here, we should ensure 100% test coverage to prevent such silent breakage from happening.
It wouldn't hurt to add a test where aliases appear deeply nested in an arbitrarily contrived expression involving boolean operators (including …)
Open for discussion, but the expense is much higher than a single object allocation, since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).
If you look into …
I prefer …
It would be interesting to measure that cost.
```scala
extends Unary(expr) {

  override def makeCopy(newArgs: Array[AnyRef]): Expression = {
```
I don't understand why we need to override `makeCopy` here. Is there a test somewhere showcasing the problem/solution?
The `TreeNode.makeCopy` docs say:

> Must be overridden by child classes that have constructor arguments that are not present in the productIterator.

I don't know exactly why only one arg of the `Cast` ctor is seen by the `productIterator` (maybe because they are not `val`s?), but it's what happens.

This is tested via existing test cases in `SnowflakeToDatabricksTranspilerTest`, which crash without this bug fix.
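One plausible explanation for some `Cast` constructor arguments being invisible: in Scala, a case class's `productIterator` only covers the first parameter list, so any parameters declared in a second (curried) list are not part of the product. The sketch below uses a hypothetical `Cast2` (not the real remorph `Cast`) to show the effect:

```scala
// Hypothetical case class for illustration; the real remorph Cast differs.
// Only the FIRST parameter list participates in productArity/productIterator;
// the second list's parameter needs an explicit `val` just to be accessible.
case class Cast2(expr: String)(val dataType: String)

object ProductDemo {
  val c = Cast2("col1")("int")

  val arity: Int = c.productArity                 // sees only `expr`
  val elems: List[Any] = c.productIterator.toList // List("col1")
  val dt: String = c.dataType                     // exists, but not in the product
}
```

Any reflection-based copy that sizes its argument array from `productIterator` would therefore drop `dataType`, which matches the symptom `makeCopy` must be overridden to avoid.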
```sql
SELECT
  ca_zip
FROM
  SELECT
    SUBSTR(ca_zip,1,5) AS ca_zip,
    TRIM(name) AS name,
    COUNT(*) OVER (
      PARTITION BY
        SUBSTR(ca_zip,1,5)
    )
  FROM
    customer_address
  WHERE
    SUBSTR(ca_zip,1,5) IN ('89436', '30868');
```
This seems to be syntactically incorrect on Databricks (probably missing parentheses around the subquery).

Also, the aliased names should be different from the columns they alias for the test to be really significant (`ca_zip` in the `WHERE` clause could refer to `customer_address.ca_zip`, which isn't what we want).
The parenthesis issue is fixed in PR #1232.

This test is about homonyms, not subqueries, so I'm afraid I disagree with your comment regarding the names.
Superseded by #1242
Progresses #1222 by implementing `DealiasLCAs`, a transformation rule that replaces LCAs (lateral column aliases) with the underlying aliased expression in project filters (the `WHERE` clause of a `SELECT` statement).