Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQL] use inheritance to support different dialects #3046

Merged
merged 281 commits into from
Nov 22, 2024

Conversation

keith-hall
Copy link
Collaborator

@keith-hall keith-hall commented Sep 16, 2021

Initial work on supporting different SQL dialects using inheritance. Defaults to MySQL like before, but users can easily override the SQL.sublime-syntax to point to their preferred dialect. Idea originally proposed by SublimeHQ on the forum.

Fixes #1742, #1764, #2899, #3072, #3072, closes #2631

Note: breaking change:

  • the scope keyword.other.DML has been lowercased to keyword.other.dml, so color schemes may need a small change if they target this scope specifically

@jrappen
Copy link
Contributor

jrappen commented Sep 18, 2021

Thanks for working on this. I noticed a few things:

  • The indentation for context: types: match: ... is inconsistent with the rest of this PR.
  • You use
    captures:
      0: ...
      1: ...
    while some others use
    scope: ...
    captures:
      1: ...
    which naturally both works but I thought you might want to use one over the other for readability. But you knew that.
  • While requiring word borders via regex for correctly being able to scope words I believe the word scope should be only applied to the word without the borders.
  • AFAIK *.sublime-syntax does not define a comment key, although PackageDev marks them as such. Maybe use standard YAML comments?

@michaelblyons
Copy link
Collaborator

  • RE captures: 0:: Someone (not me) is very determined to convert all of these to scope:.

  • RE comment: mapping: I am pretty sure this is a product of the <comment> tag in tmLanguage that people used to use. To avoid dropping the comments, the sublime-syntax converter brought them over, and PD was kind enough to mark them as comments.

    I think we may as well convert to #-comments in the future as I mentioned in [SQL] Add support for build-in MariaDB/MySQL/PostgreSQL functions #2631, but the comment:-style ones were already there in the original.

Thanks for working on this.

Big +1.

@michaelblyons
Copy link
Collaborator

michaelblyons commented Sep 19, 2021

INSERT INTO my_table (foo, bar)
VALUES (1, 'two')

The table is scoped as a function.


I'm finding that the biggest thing I miss from my janky T-SQL definition is the scoping for identifiers within (eg) SELECT statements. It's not at all perfect, and tailored to conventions I use (vs. the more-complete, but less specifically-scoped identifier context here). But it has been very nice to have table-name/alias marked a different color than column-name.

@keith-hall
Copy link
Collaborator Author

keith-hall commented Sep 19, 2021

INSERT INTO my_table (foo, bar)
VALUES (1, 'two')

The table is scoped as a function.

I'm finding that the biggest thing I miss from my janky T-SQL definition is the scoping for identifiers within (eg) SELECT statements. It's not at all perfect, and tailored to conventions I use (vs. the more-complete, but less specifically-scoped identifier context here). But it has been very nice to have table-name/alias marked a different color than column-name.

Indeed, I am aware of this (though thanks for pointing it out) and am trying to decide what approach is best to use - I'm thinking of specific, small contexts which will allow me to match column names where column names are expected, table names where table names are expected etc, while allowing fully/partially qualified names. I think the alternative is to just match anything left which is not a keyword as an identifier - it could still support qualified names, but without being context aware, it won't know if it is a table or column name when unqualified (especially if the schema is not dbo), which is why I prefer the former approach, despite it being more complicated and potentially brittle if I don't think of/cover all cases/syntax... So I had the idea to work on syntax test coverage first, to make that easier to implement in terms of knowing when it is correct.

@keith-hall
Copy link
Collaborator Author

* While requiring word borders via regex for correctly being able to scope words I believe the word scope should be only applied to the word without the borders.

can you clarify what you mean here with an example please @jrappen ?

I believe I addressed your other comments :)

@jrappen
Copy link
Contributor

jrappen commented Sep 19, 2021

Will do when I'm back home.

@michaelblyons
Copy link
Collaborator

Regression: Having a function in the SELECT breaks downstream highlighting.

SELECT  foo, COUNT(*) AS tally
FROM    bar
WHERE   1 = 1
GROUP BY foo

@keith-hall
Copy link
Collaborator Author

Regression: Having a function in the SELECT breaks downstream highlighting.

Sorry about that, it seems every change I make breaks something 😅 but at least each time we get some more syntax test assertions to prevent it in future :D so thanks for the testing and reporting of issues! It is fixed now :)

@keith-hall keith-hall force-pushed the sql branch 2 times, most recently from 1809d84 to 2541790 Compare September 21, 2021 20:19
@jrappen
Copy link
Contributor

jrappen commented Sep 22, 2021

I did some "comment"s research.

The built-in converter had originally brought over the comment tags from tmLanguage, when I did a PR to convert all files to sublime-syntax. I haven't tested it, but I assume the built-in converter works unchanged.

The docs do not specify a comment key, though. Whether by ommission or design, only Jon or one of the others could say.


With regards to word borders, I meant something like this random example from your code:

  expressions:
    - meta_append: true
    - match: (?i)\b(CONCATENATE|CONVERT|LOWER|SUBSTRING|TRANSLATE|TRIM|UPPER)\b
      scope: support.function.string.sql

should've probably been this, yes? :

  expressions:
    - meta_append: true
    - match: (?i)\b(CONCATENATE|CONVERT|LOWER|SUBSTRING|TRANSLATE|TRIM|UPPER)\b
      captures:
        1: support.function.string.sql

There still some pop: true left. These should be pop: 1 when using v2 syntax definitions.

@deathaxe
Copy link
Collaborator

RE captures: 0:: Someone (not me) is very determined to convert all of these to scope:.

Note: captures: 0 may be somewhat faster than pairing captures with scope.

Copy link
Collaborator

@michaelblyons michaelblyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scope-everything-as-a-constant for now is a little wild, but I can edit-color-scheme my way out of it.

Two notes on T-SQL types:

  • You're missing text (intentional?)
  • You can wrap them in [] in some places, including the somewhat odd [VARCHAR](1) style.

SQL/SQL (basic).sublime-syntax Outdated Show resolved Hide resolved
SQL/TSQL.sublime-syntax Outdated Show resolved Hide resolved
SQL/SQL (basic).sublime-syntax Outdated Show resolved Hide resolved
@michaelblyons
Copy link
Collaborator

michaelblyons commented Sep 24, 2021

Looking very good. I will probably now run this at work, rather than my own def.

The current weird stuff:

  • I can't seem to UPDATE after a CTE.
  • FOR XML RAW 🙄 Not a big priority. The def just pretends each word is a column and moves on.
  • UNPIVOT
  • INSERT rather than INSERT INTO: I'm not exactly sure what this does, but the highlighting doesn't recognize it.
  • I'm not sure the intention of constant.language.with. The places I'm seeing it are not constants. They're naming columns in a CTE. Is that what you're trying to do with it?

@keith-hall
Copy link
Collaborator Author

keith-hall commented Sep 25, 2021

  • I'm not sure the intention of constant.language.with. The places I'm seeing it are not constants. They're naming columns in a CTE. Is that what you're trying to do with it?

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

-- ^^^^^^^^^^^^^ meta.column-name

The WITH keyword is too overloaded in TSQL imo - we may need to get smarter about which scenarios we handle it in different ways. It can be used after RAISERROR "WITH NOWAIT", it can be used with parens after a FROM tablename or a JOIN tablename (or subquery?) (and before or after a table alias with an optional AS) to give table-level hints like (NOLOCK) - maybe I should take these known hints from your syntax def instead of scoping all words inside parens as constant.language.with (and perhaps changing the scope to mention "table hint") and finally at the beginning of a statement to declare a CTE. (there may be more cases I'm missing).

I am thinking that because semi-colons to end statements are mostly optional, it is going to be too brittle to try to accurately detect if it is the beginning of a (top level?) statement or not. But I may be able to improve it some more. Do you have any concrete examples you can share where it doesn't work please?

I will work on the other items, thanks for notifying about them. I may just add them into the bottom of the syntax test file without assertions to start with (like I've done for merge) to remind myself what still needs to be done. I also need to cover creating sql logins etc. and more types like image and co-ordinate related ones.

@keith-hall
Copy link
Collaborator Author

I found a case where a CTE wasn't scoped/identified correctly and fixed it. And I found another overload of WITH and added syntax tests for it. I believe I fixed FOR XML RAW and INSERT without INTO. PIVOT and UNPIVOT is still a WIP, ran out of time for now. I found a few other things which need fixing too, and added comments for them in the syntax tests file.

@michaelblyons
Copy link
Collaborator

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo FROM bar) AS t2(id)
--                                              ^^ constant.language.with
                    ON t2.ID = t1.ID

which I think is equivalent to

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo AS id FROM bar) AS t2
                    ON t2.ID = t1.ID

@keith-hall
Copy link
Collaborator Author

Hmm, I thought I had covered named CTE columns with syntax tests and scoped them as meta.column-name.

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo FROM bar) AS t2(id)
--                                              ^^ constant.language.with
                    ON t2.ID = t1.ID

which I think is equivalent to

SELECT  *
FROM    table_name AS t1
        INNER JOIN  (SELECT foo AS id FROM bar) AS t2
                    ON t2.ID = t1.ID

Thanks, I hadn't heard of that before. I will try to implement everything from https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql?view=sql-server-ver15, which indeed includes this syntax.

@michaelblyons
Copy link
Collaborator

Thanks, I hadn't heard of that before.

I've never used it, either. 😅 I'm just clicking around in our SQL files looking for stuff that seems odd.

I will try to implement everything from [T-SQL v15]

You're doing great. If you need to take a break, or if you want me to stop finding bugs just tell me to stop for a bit. 😁

@keith-hall
Copy link
Collaborator Author

You're doing great. If you need to take a break, or if you want me to stop finding bugs just tell me to stop for a bit. 😁

Thanks! Pray, please continue - though I probably should also try to concentrate on other dialects (like MySql) to be sure the base syntax is as useful/correct as it should be, rather than just concentrating on T-SQL 😅
but I find T-SQL most relatable and personally useful for me, and still lots to do/fix... so, if someone else wants to contribute a SQL dialect they are familiar with, it'd be most welcome.

deathaxe added a commit to deathaxe/sublime-packages that referenced this pull request Mar 15, 2024
This commit
1. renames CSS.sublime-syntax to CSS (Plain).sublime-syntax
2. creates an inherit CSS from CSS (Plain)

Goal
----

Support different CSS dialects (Basic, PostCSS, Tailwind CSS, ...) in
embedded code scenarios including templating support.

Motivation
----------

More and more syntax packages inherit CSS.sublime-syntax to add support
for interpolation or embed template tags (e.g.: ASP, JSP, PHP, Liquid,
Vue, ...).

Other packages might provide certain dialects of CSS, such as PostCSS
or Tailwind CSS.

It is currently not easily possible to embed Tailwind CSS in PHP or
Liquid without rewriting each of those syntaxes, because all of them
rely on default CSS.sublime-syntax due to inheritance depending on
file names rather than scopes.

The motivation is to enable exactly this (again), to benefit from both
inheritance and flexibility from 3rd-party packages.

Strategy
--------

The idea is to turn CSS.sublime-syntax into an interface, which is
inherit from CSS (Plain) and can be extended by and embedded into
syntax definitions such as PHP, while being able to easily create an
override with `extends` value replaced by desired syntax dialect
(e.g.: Tailwind CSS).

       CSS (PHP)
          |
         CSS
          |  \
          |  Tailwind CSS
          |   |
          |  PostCSS
          |  /
         CSS (Plain)

Note: It follows the idea being discussed in SQL (sublimehq#3046).
@michaelblyons
Copy link
Collaborator

In terms of number 2, it's a sad fact that almost everything in SQL is a keyword that doesn't fit into any other existing subscope.

This may not be the place for it, but I think it's worth considering a new second-level scope on keyword, especially since languages (like C#) have SQL-like expressions that could use it. I don't know what the level should be called (keyword.relational? keyword.query?), but I think it'd be more useful than keword.other.dml.

Also, I've been using this branch for years. It'd be nice to have its improvements available for everyone without hunting for it.

@deathaxe
Copy link
Collaborator

Besides finalizing support for various statements and dialects (e.g. Postgre) scope naming in general is still one of the major todos. I agree with keyword.other.[dml|ddl|...] not really fitting well into the general scope naming scheme.

Copy link
Collaborator

@deathaxe deathaxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL specs are too heavy to get everything done perfectly well.

This branch/PR may not be perfect, but is still lightyears ahead of master.

I'd say: Just go with it and improve based on user feedback.

Copy link
Collaborator

@michaelblyons michaelblyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not my first time approving this. 😜

@deathaxe deathaxe merged commit 15ccade into sublimehq:master Nov 22, 2024
1 check passed
@deathaxe
Copy link
Collaborator

Summary

All supported SQL syntax dialects share a common SQL (basic).sublime-syntax which provides common patterns/contexts and acts as a library.

Supported dialects are: MySQL, PostgreSQL, T-SQL, Cassandra

Basically all syntaxes are completely rewritten, following SQL specs.

Primarily most important DML/DDL parts of each language are covered. As SQL requires dedicated syntax rules for each statement/feature, it is however hard to cover everything. So there are still unsupported statements in each dialect.

All sorts of identifiers, such as tables, index, columns, users, constraints, ... are scoped to enable goto definition or goto reference as well as symbol lists. Not all of those have been added to the ladder though.

Note: Postgre is still in an early WIP state and has not seen much work. As the primarily supported syntax of the old SQL package however was MySQL, this PR doesn't step backwards.

@jtojnar
Copy link
Contributor

jtojnar commented Nov 23, 2024

Just to make sure, is overriding supposed to be done by copying https://github.com/sublimehq/Packages/blob/master/SQL/SQL.sublime-syntax to ~/.config/sublime-text/Packages/SQL/SQL.sublime-syntax and replacing the extends line?

@michaelblyons
Copy link
Collaborator

Yes. Or install DefaultSyntaxChooser from this DeathAxe comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SQL] MySQL and Goto Symbol