Fix issue 1222 #1225

ericvergnaud · 2024-11-20T13:20:20Z

Progresses #1222 by implementing DealiasLCAs, a transformation rule that replaces LCAs with the underlying aliased expression in project filters (the WHERE clause of a SELECT statement).

github-actions · 2024-11-20T13:24:53Z

Coverage tests results

462 tests (+6): 424 ✅ (+5), 38 💤 (+1), 0 ❌ (±0)
7 suites (+1), 7 files (+1), duration 4s ⏱️ (±0s)

Results for commit c247036. ± Comparison against base commit 9dcc986.

♻️ This comment has been updated with latest results.

jimidle

Find a way to not use asInstanceOf

jimidle · 2024-11-20T14:26:24Z

core/src/main/scala/com/databricks/labs/remorph/parsers/snowflake/rules/DealiasLCAFilter.scala

+      .filter(col => col.isInstanceOf[Alias])
+      .map(col => col.asInstanceOf[Alias])
+      .filter(alias => alias.child.isInstanceOf[Id]) // TODO do we need to support more than that ?


We should avoid isInstanceOf in favor of match

I'd agree if we cared about more than 1 type, but in this case it would make the code much more complex without a clear benefit ?

The benefit is not using asInstanceOf ;)

How is that a clear benefit?

(The style guide doesn't say anything on this topic...)
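
For readers following the thread, here is a minimal sketch of the two styles being debated. The `Expression`, `Id`, `Literal` and `Alias` types below are simplified, hypothetical stand-ins and not the actual remorph classes; the point is only that `collect` with a pattern can filter and narrow the type in one step, without `isInstanceOf` or `asInstanceOf`.

```scala
object CollectExample {
  // Hypothetical stand-ins for the IR types; not the real remorph classes.
  sealed trait Expression
  final case class Id(id: String) extends Expression
  final case class Literal(value: Int) extends Expression
  final case class Alias(child: Expression, name: Id) extends Expression

  val columns: Seq[Expression] = Seq(
    Alias(Id("ca_zip"), Id("zip")),
    Id("name"),
    Alias(Literal(1), Id("one")))

  // Style under review: test the runtime type, then cast.
  val viaCasts: Seq[Alias] = columns
    .filter(col => col.isInstanceOf[Alias])
    .map(col => col.asInstanceOf[Alias])
    .filter(alias => alias.child.isInstanceOf[Id])

  // Pattern-matching alternative: the partial function keeps only aliases
  // whose child is an Id, and the result is already typed as Alias.
  val viaCollect: Seq[Alias] = columns.collect { case a @ Alias(_: Id, _) => a }
}
```

Both values contain the same single alias; whether the second form counts as simpler is the judgment call discussed above.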

vil1 · 2024-11-21T08:09:51Z

core/src/main/scala/com/databricks/labs/remorph/parsers/snowflake/rules/DealiasLCAFilter.scala

+      .collect { case a: Alias => a }
+      .filter(alias => !alias.child.isInstanceOf[Literal])
+      .map(alias => alias.name.id -> alias.child)


Suggested change

-      .collect { case a: Alias => a }
-      .filter(alias => !alias.child.isInstanceOf[Literal])
-      .map(alias => alias.name.id -> alias.child)
+      .collect { case Alias(e, name) if !e.isInstanceOf[Literal] => name.id -> e }

further distillation

vil1 · 2024-11-21T08:35:56Z

core/src/main/scala/com/databricks/labs/remorph/parsers/snowflake/rules/DealiasLCAFilter.scala

      if (alias.isEmpty) {
        item
      } else {
-        val replacement = alias.get._2
-        item transform {
-          case name: Name => name.makeCopy(Array(replacement))
-          case id: Id => id.makeCopy(Array(replacement.asInstanceOf[AnyRef], id.caseSensitive.asInstanceOf[AnyRef]))
-        }
+        alias.get._2


The whole function could become:

Option(item)
  .collect {
    case Name(name) => name
    case Id(id, _) => id
  }
  .flatMap(aliases.get)
  .getOrElse(item)

or

val key = item match {
  case Name(name) => name
  case Id(id, _) => id
  case _ => null
}
aliases.getOrElse(key, item)

The former being slightly preferable, even though it is unlikely that aliases would have an entry with a null key.
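
A self-contained sketch of the first variant, for illustration only. The `Name`, `Id` and `Literal` classes and the `dealias` function name are hypothetical simplifications, assumed here so the snippet compiles on its own; the real IR classes carry more fields.

```scala
object DealiasLookup {
  // Hypothetical, simplified IR types (assumptions, not the remorph classes).
  sealed trait Expression
  final case class Name(name: String) extends Expression
  final case class Id(id: String, caseSensitive: Boolean = false) extends Expression
  final case class Literal(value: Int) extends Expression

  // Extract a lookup key from Name or Id nodes, resolve it against the alias
  // map, and fall back to the original item when there is no key or no match.
  def dealias(item: Expression, aliases: Map[String, Expression]): Expression =
    Option(item)
      .collect {
        case Name(name) => name
        case Id(id, _) => id
      }
      .flatMap(aliases.get)
      .getOrElse(item)
}
```

With `aliases = Map("zip" -> Name("ca_zip"))`, calling `dealias(Id("zip"), aliases)` yields the aliased expression, while a node that is neither a `Name` nor an `Id` is returned unchanged and never touches the map with a null key.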

vil1

I think the pattern of calling transform and then checking the result with eq against the original value before calling makeCopy if there was a change is a premature optimization that unnecessarily hurts readability. Even more so if we consider that transform already performs a similar optimization on the descendants of the transformed tree.

As an example, dealiasProject would be more readable as:

private def dealiasProject(project: Project, filter: Filter, aliases: Map[String, Expression]): Project =
  project.copy(input = dealiasFilter(filter, aliases))

at the expense of a single object allocation.

Moreover, as it relies on reflection under the hood, makeCopy is prone to silently break if the order/types of constructor parameters get changed in the future (silently = at runtime). If we insist on using makeCopy here, we should ensure 100% test coverage to prevent such silent breakage from happening.
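
The compile-time versus runtime distinction can be sketched as follows. `reflectiveCopy` is a toy helper written for this illustration with plain Java reflection; it is an assumption about the general shape of a reflection-based copy, not the actual `makeCopy` implementation, and the `Project` class here is a hypothetical two-field stand-in.

```scala
object CopyVsReflection {
  // Hypothetical node type, for illustration only.
  final case class Project(input: String, columns: Seq[String])

  // Toy reflection-based copy: arguments are untyped, so a wrong order or
  // type is only detected when the constructor is invoked at runtime.
  def reflectiveCopy[T <: AnyRef](node: T, newArgs: Array[AnyRef]): T = {
    val ctor = node.getClass.getConstructors.head
    ctor.newInstance(newArgs: _*).asInstanceOf[T]
  }

  val project: Project = Project("t", Seq("a"))

  // Checked by the compiler: a misspelled field or wrong type does not compile.
  val viaCopy: Project = project.copy(input = "filtered")

  // Works here only because the argument order and types happen to match.
  val viaReflection: Project =
    reflectiveCopy(project, Array[AnyRef]("filtered", Seq("a")))

  // Swapped arguments compile fine and fail only when this line runs.
  val swapped: scala.util.Try[Project] =
    scala.util.Try(reflectiveCopy(project, Array[AnyRef](Seq("a"), "filtered")))
}
```

In this sketch `swapped` is a failure wrapping an `IllegalArgumentException`, which is the kind of breakage that only a test exercising that code path would catch.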

vil1 · 2024-11-21T10:26:09Z

It wouldn't hurt to add a test where aliases appear deeply nested in an arbitrarily contrived expression involving boolean operators (including NOT), function calls, etc.

ericvergnaud · 2024-11-22T09:46:39Z

> I think the pattern of calling transform and then checking the result with eq against the original value before calling makeCopy if there was a change is a premature optimization that unnecessarily hurts readability. Even more so if we consider that transform already performs a similar optimization on the descendants of the transformed tree.
>
> As an example, dealiasProject would be more readable as:
> private def dealiasProject(project: Project, filter: Filter, aliases: Map[String, Expression]): Project = project.copy(input = dealiasFilter(filter, aliases))
> at the expense of a single object allocation.

Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

vil1 · 2024-11-22T09:52:18Z

> Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

If you look into the transform implementation, you'll see that it already implements such a "copy avoiding" mechanism, so wrapping the result of transform in a new node (i.e., calling copy in our Rules) is a marginal cost (that doesn't justify hurting readability/future refactoring IMO).
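
To make the "copy avoiding" idea concrete, here is a toy tree with a transform that returns the same instance when nothing changed. It is a simplified sketch written for this discussion, with hypothetical `Expr`, `Id`, `And` and `Not` types and a hypothetical `withChildren` method; it is not the TreeNode code used by the project.

```scala
object CopyAvoidingTransform {
  sealed trait Expr {
    def children: Seq[Expr]
    def withChildren(newChildren: Seq[Expr]): Expr

    // Bottom-up transform. If no child changed (reference equality), the node
    // is reused as-is, so untouched subtrees are shared rather than copied.
    def transform(rule: PartialFunction[Expr, Expr]): Expr = {
      val newChildren = children.map(_.transform(rule))
      val unchanged = children.zip(newChildren).forall { case (a, b) => a eq b }
      val self = if (unchanged) this else withChildren(newChildren)
      if (rule.isDefinedAt(self)) rule(self) else self
    }
  }

  final case class Id(name: String) extends Expr {
    def children: Seq[Expr] = Nil
    def withChildren(newChildren: Seq[Expr]): Expr = this
  }
  final case class Not(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
    def withChildren(newChildren: Seq[Expr]): Expr = Not(newChildren.head)
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withChildren(newChildren: Seq[Expr]): Expr = And(newChildren(0), newChildren(1))
  }
}
```

Under this scheme, a rule that matches nothing hands back the original root, and a rule that rewrites one leaf allocates new nodes only along the path to that leaf. Wrapping the result in one extra `copy` adds a single allocation on top of that, although the traversal itself still visits every node, which is the cost the next comment proposes to measure.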

ericvergnaud · 2024-11-22T10:13:11Z

> Moreover, as it relies on reflection under the hood, makeCopy is prone to silently break if the order/types of constructor parameters get changed in the future (silently = at runtime). If we insist on using makeCopy here, we should ensure 100% test coverage to prevent such silent breakage from happening.

I prefer makeCopy for now because as we've seen, the constructor doesn't convey origin, comments etc... so I fear information would be lost

ericvergnaud · 2024-11-22T10:28:23Z

> > Open for discussion, but the expense is much higher than a single object allocation since you'd scan and transform a lot of stuff before discovering that you haven't changed anything (and might have lost information along the way).

> If you look into transform implementation, you'll see that it already implements such "copy avoiding" mechanism, so wrapping the result of transform in a new node (ie, calling copy in our Rules) is a marginal cost (that doesn't justify hurting readability/future refactoring IMO).

It would be interesting to measure that cost

vil1 · 2024-11-25T10:49:31Z

core/src/main/scala/com/databricks/labs/remorph/intermediate/expressions.scala

-    extends Unary(expr)
+    extends Unary(expr) {
+
+  override def makeCopy(newArgs: Array[AnyRef]): Expression = {


I don't understand why we need overriding makeCopy here, is there a test somewhere showcasing the problem/solution ?

TreeNode.makeCopy docs say:

"Must be overridden by child classes that have constructor arguments that are not present in the productIterator."

I don't know exactly why only 1 arg of the Cast ctor is seen by the productIterator (maybe because they are not vals ?), but it's what happens.

This is tested via existing test cases in SnowflakeToDatabricksTranspilerTest, which crash without this bug fix.
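
As general background on the quoted doc sentence, the sketch below shows one way a Scala case class can have constructor arguments that `productIterator` does not report: parameters in a second parameter list. This is only an illustration of the language behavior with a hypothetical `Tagged` class; the thread does not establish that this is what happens with `Cast`.

```scala
object ProductIteratorExample {
  // Hypothetical class. Only the first parameter list is part of the
  // case-class product; `tag` is a constructor argument outside of it.
  final case class Tagged(value: Int)(val tag: String)

  val node: Tagged = Tagged(1)("origin")

  // Contains `value` only, although the constructor needs two arguments.
  val productArgs: List[Any] = node.productIterator.toList
}
```

A copy mechanism that rebuilds a node from its product elements would have too few arguments for such a class, which is the situation where an explicit override supplying the missing ones becomes necessary.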

vil1 · 2024-11-25T14:15:40Z

tests/resources/functional/snowflake/core_engine/lca/lca_homonym.sql

+SELECT
+  ca_zip
+FROM
+SELECT
+  SUBSTR(ca_zip,1,5) AS ca_zip,
+  TRIM(name) AS name,
+  COUNT(*) OVER (
+    PARTITION BY
+      SUBSTR(ca_zip,1,5)
+  )
+FROM
+  customer_address
+WHERE
+  SUBSTR(ca_zip,1,5) IN ('89436', '30868');


This seems to be syntactically incorrect on Databricks (probably missing parentheses around the subquery).
Also, the alias names should differ from the columns they alias for the test to be really significant (ca_zip in the WHERE clause could refer to customer_address.ca_zip, which isn't what we want).

The parenthesis issue is fixed in PR #1232.
This test is about homonyms, not subqueries, so I'm afraid I disagree with your comment re names.

ericvergnaud · 2024-11-26T11:10:08Z

Superseded by #1242

ericvergnaud added 2 commits November 20, 2024 14:14

add test

9cc752c

create and add transform rule

b54874f

ericvergnaud requested a review from a team as a code owner November 20, 2024 13:20

formatting

c38538e

simplify

3cf8a27

ericvergnaud requested a review from vil1 November 20, 2024 13:26

fix failing tests

c212701

jimidle requested changes Nov 20, 2024

vil1 reviewed Nov 20, 2024

ericvergnaud added 16 commits November 20, 2024 15:55

address comment

6110676

formatting

4c739a8

add test

eebc66b

more tests

466cc07

more tests

aef09b2

more tests

19b5eed

fix failing tests

4d3b5ad

dealias partitions

e1f96f6

add support for SUBSTR

21c422e

add test

509d5a9

Merge branch 'support-snowflake-substr' into fix-issue-1222

f5e0793

formatting

a45be2a

rename for clarity

18daff5

more tests

e7e2129

formatting

464785e

Merge branch 'support-snowflake-substr' into fix-issue-1222

c51c6a3

vil1 reviewed Nov 21, 2024

vil1 requested changes Nov 21, 2024

address comments

7b3ba60

ericvergnaud added 2 commits November 22, 2024 11:17

remove equality checks

1d168a2

address comments

4a27801

ericvergnaud added 11 commits November 22, 2024 16:09

rename for clarity

a555a91

Add ability to debug a single acceptance test

08bbaf5

fix infinite recursion issue with homonyms

c79d414

explain

535d5e0

fix failing test by only comparing arity

c0b908c

Merge branch 'main' into support-snowflake-substr

93cdfd7

fix merge issue

cd45efb

Merge branch 'support-snowflake-substr' into fix-issue-1222

72395fc

formatting

ebb3465

fix failing tests

a484df8

formatting

c247036

ericvergnaud mentioned this pull request Nov 23, 2024

Fix issue 976 #1223

Closed

ericvergnaud self-assigned this Nov 25, 2024

vil1 reviewed Nov 25, 2024

ericvergnaud requested review from vil1 and jimidle November 25, 2024 15:08

This was referenced Nov 26, 2024

Add support for left column alias (LCA) to SnowFlake parser #1239

Closed

Add support for SnowFlake LCAs #1241

Closed

ericvergnaud closed this Nov 26, 2024

ericvergnaud mentioned this pull request Nov 26, 2024

Add support for SnowFlake LCAs #1242

Open

asnare mentioned this pull request Nov 29, 2024

Update TSQL grammar so that INTERSECT precedence is handled there #1259

Draft
