Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ES|QL] Use custom AST for suggestion, validation, etc... #166185

Merged
merged 64 commits into from
Oct 26, 2023

Conversation

dej611
Copy link
Contributor

@dej611 dej611 commented Sep 11, 2023

Summary

Fixes #166242
This is a PR to use an higher level AST instead of the raw ParserTree to compute all Monaco features for ES|QL.

Grammar

This PR drops the existing grammar in favour of the ES one, which a tiny tweak in order to support any case for the query.
I think we could fully adopt the ES grammar without tweaks once this issue is resolved in the antlr4ts dependency.

The adoption of the ES grammar helped also to resolve #166173 .

Screenshot 2023-09-21 at 12 16 45 Screenshot 2023-09-21 at 12 16 40 Screenshot 2023-09-21 at 12 16 29 Screenshot 2023-09-21 at 12 16 23 Screenshot 2023-09-21 at 12 16 17

AST generator

The AST is a subset of the ESQL grammar parser tree, which is easier to navigate.
Tests will be provided to the tool in order to test that the tool can translate even the craziest query string.
Note that this is not preventing the query to be submitted to ES: even if a validation case fails here or the AST fails to translate part of the expression ES will always have the last word about error and warnings, which will be promptly notified to the user.

Commands coverage:

  • ROW
    • ROW a=1
    • ROW a=1, b=1
  • FROM
    • FROM a
    • FROM a, b
    • FROM a [METADATA ...]
    • FROM a, b [METADATA ...]
      • METADATA won't have indentifiers yet
  • SHOW
    • INFO
    • FUNCTIONS
  • EVAL
    • EVAL a = b + 1
    • EVAL a = fnA( ... ) + fnB( ... )
    • EVAL b + 1
    • EVAL fnA( ... ) + fnB( ... )
  • STATS
    • STATS a = fn(b) + fn(c)
    • STATS a = fn(b) + fn(c) BY group
    • STATS a = fn(b) + fn(c) BY groupA, groupB
    • STATS fn(b) + fn(c)
    • STATS fn(b) + fn(c) BY groupA
    • STATS fn(b) + fn(c) BY groupA, groupB
  • WHERE
  • LIMIT
  • KEEP
  • PROJECT
  • DROP
  • RENAME.
    • RENAME a as b
    • RENAME a + 1 / 5asmyField`
  • DISSECT
  • GROK
  • ENRICH
  • MV_EXPAND
  • SORT

Collect all user defined variables

  • Basic support for assigned variables (i.e. eval a = ...)
  • Field type shadowing with assignment
  • Advanced support for non-assigned cases? (i.e. eval a + 1)

Validation

With this new approach it would be possible to catch most of warnings and errors while typing, preventing users to submit invalid queries to ES.
Note that here the validation function can detect multiple errors and warnings at once, while ES only returns one error at the time - improving the current sub-optimal editing experience provided.

esql_clientside_validation

Tasks

  • Create definitions
    • Define function definitions
    • Define aggs definitions
    • Define command definitions
    • Define command options definitions
  • Create basic validation walker function
    • Support both errors and warnings
    • Validate commands
    • Validate command options
    • Validate functions
    • Validate source/columns (I phase - just formally)
    • Validate policies (I phase - just formally)
    • Validate literals
    • Validate duration/math syntax
    • Validate variables
  • Improve validation work with high level checks
    • Validate source/columns (II phase - leverage ES metadata)
    • Validate policies (II phase - leverage ES metadata)
    • Optimize queries
      • phase I: just make queries strictly when required
  • Write tests
    • Source commands
      • Options
    • Processing commands
      • Options
    • Sorting commands
      • All options, combinations
    • Enrich commands
      • Options
    • Warnings cases
  • Integrate all of it as Monaco diagnostics module
  • Solve issue with long queries and single line mode
  • Provide better sync between client and server side error management
    • Show server side warnings only on absence of errors
    • Clear up server side errors on code change

Autocomplete

Using the same provider here to provide better contextual suggestions.
First example giving better suggestions (filtered field based on types and functions based on returned type):

better_suggestions_types

  • Commands
    • Show all commands on empty string
    • Show all but source commands after first pipe
    • Suggest option at the right moment

Hover

Basic hover information about used functions:
Screenshot 2023-10-16 at 16 22 16
Screenshot 2023-10-16 at 16 21 54

Things not covered in this PR

  • Load function definitions from SHOW FUNCTIONS at language bootstrap
  • Make validate function an independent module?
    • Provide a "formal" validation option (i.e. mute Unknown <entity> [...] messages)
  • Have autocomplete on par with the current state in main

Validation cases not covered yet

  • ... | eval var = 1 year => ❌ it should report an error
  • ... | eval 1 + [1, 2, 3] => ⚠️ it should report a null warning
    • same applies for -, *, /, %
  • any location aware field/variable check
    • ... | stats a = avg(field) | keep b => ❌ b should be reported as unknown
    • ... | keep a | eval b => ❌ b should be reported as unknown
    • ... | rename a as b, d as c, c as b => ❌ c should be reported as unknown
    • ... | enrich policy on fieldFromEnrich => ❌ fieldFromEnrich is not reported as unknown
    • ... | eval a = round(b), b = round(a) => ❌ b should be reported an unknown
  • from ... [metadata <list of meta fields> => ❌ meta fields won't be validated as there's currently no available API to get them

Hover

Advanced usage of hover feature.

Signature

TBD

@dej611 dej611 added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:ES|QL ES|QL related features in Kibana labels Sep 11, 2023
@dej611 dej611 added the v8.12.0 label Oct 18, 2023
Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from me (I haven't checked the code yet)

  • For some reason I get a 0 warning icon
image

This works ok in main ( I should see 1 warning and also the warning on the editor footer but I don't)

  • I see the error for unknown index but is wrong (I have multiple logstash indices and I want to use the wildcard)
image
  • I think this is important, I want to see the list of aggregations here.
image
  • Here also
image

If we want to work on autocomplete on a follow up PR can we merge this in a feature branch instead and not main. I feel that the regressions in autocomplete are important and we should not merge them in main.

  • Seems the validation hasn't run properly here
image
  • Here I expect var1 to be in the list of suggestions as it is on stats
image

@dej611
Copy link
Contributor Author

dej611 commented Oct 18, 2023

Thanks for your quick feedback @stratoula .
I'd prefer to split validation or hover issue from autocomplete as the latter deserve a more focused approach and there's already a lot of content here. I'm ok to have a feature branch to merge in before main and work on a separate autocomplete PR as follow up.

Will reply inline here about validation and UI issues, will defer the autocomplete ones for a later PR:

  • For some reason I get a 0 warning icon
image This works ok in main ( I should see 1 warning and also the warning on the editor footer but I don't)

Can reproduce as well, fixed with 9acd436

  • I see the error for unknown index but is wrong (I have multiple logstash indices and I want to use the wildcard)
image

This is a bug as I was not validating fuzzy strings for indices. Added the check and a unit test for it with 8ec2226

  • Seems the validation hasn't run properly here
image

Can reproduce as well, fixed with 9acd436

@dej611
Copy link
Contributor Author

dej611 commented Oct 18, 2023

It seems that this PR is actually reducing the bundlesize of the kbn-monaco bundle rather than increase it.

Enabling detailed stats on the kbn-monaco webpack optimizer I get this numbers:

[ibazel]     Hash: 8f992c2e72951cec4d98
[ibazel]     Time: 1797ms
[ibazel]     Built at: 10/18/2023 12:43:36 PM
[ibazel]                         Asset      Size  Chunks                   Chunk Names
[ibazel]         esql.editor.worker.js  1.75 MiB    main  [emitted]        main
[ibazel]     esql.editor.worker.js.map  1.85 MiB    main  [emitted] [dev]  main
[ibazel]     Entrypoint main = esql.editor.worker.js esql.editor.worker.js.map
[ibazel]     chunk {main} esql.editor.worker.js, esql.editor.worker.js.map (main) 1.56 MiB [entry] [rendered]

While on main I get this:

[ibazel]     Hash: 2b4a5e316705efa60531
[ibazel]     Time: 2294ms
[ibazel]     Built at: 10/18/2023 12:45:28 PM
[ibazel]                         Asset      Size  Chunks                   Chunk Names
[ibazel]         esql.editor.worker.js  2.04 MiB    main  [emitted]        main
[ibazel]     esql.editor.worker.js.map  2.15 MiB    main  [emitted] [dev]  main
[ibazel]     Entrypoint main = esql.editor.worker.js esql.editor.worker.js.map
[ibazel]     chunk {main} esql.editor.worker.js, esql.editor.worker.js.map (main) 1.79 MiB [entry] [rendered]

so kbn-monaco get ~200kb lighter from the stats.

After looking at the shared-bundle-src detailed view maybe I get what's going on here:

[ibazel]                                 Asset       Size                  Chunks                          Chunk Names
[ibazel]  a1cef3d530e1adb0f52b2f62994a2aca.ttf  69.3 KiB                          [emitted]       
- [ibazel]     kbn-ui-shared-deps-src.chunk.0.js   1.37 MiB                       0  [emitted]        [big]  
+ [ibazel]     kbn-ui-shared-deps-src.chunk.0.js   466 KiB                       0  [emitted]        [big]  
- [ibazel] kbn-ui-shared-deps-src.chunk.0.js.map   1.42 MiB                       0  [emitted] [dev]   
+ [ibazel] kbn-ui-shared-deps-src.chunk.0.js.map   438 KiB                       0  [emitted] [dev]              
- [ibazel]     kbn-ui-shared-deps-src.chunk.1.js    466 KiB                       1  [emitted]        [big]     
+ [ibazel]     kbn-ui-shared-deps-src.chunk.1.js  14.6 KiB                       1  [emitted]             
- [ibazel] kbn-ui-shared-deps-src.chunk.1.js.map    438 KiB                       1  [emitted] [dev]     
+ [ibazel] kbn-ui-shared-deps-src.chunk.1.js.map  14.2 KiB                       1  [emitted] [dev]         
- [ibazel]     kbn-ui-shared-deps-src.chunk.2.js   14.6 KiB                       2  [emitted]            
+ [ibazel]     kbn-ui-shared-deps-src.chunk.2.js  4.94 KiB                       2  [emitted]                 
- [ibazel] kbn-ui-shared-deps-src.chunk.2.js.map   14.2 KiB                       2  [emitted] [dev]     
+ [ibazel] kbn-ui-shared-deps-src.chunk.2.js.map  4.35 KiB                       2  [emitted] [dev]       
- [ibazel]     kbn-ui-shared-deps-src.chunk.3.js   1.33 KiB                       3  [emitted]                    
+ [ibazel]     kbn-ui-shared-deps-src.chunk.3.js  7.61 KiB                       3  [emitted]               
- [ibazel] kbn-ui-shared-deps-src.chunk.3.js.map  124 bytes                       3  [emitted] [dev]       
+ [ibazel] kbn-ui-shared-deps-src.chunk.3.js.map  7.87 KiB                       3  [emitted] [dev]        
- [ibazel]     kbn-ui-shared-deps-src.chunk.4.js   4.94 KiB                       4  [emitted]               
- [ibazel] kbn-ui-shared-deps-src.chunk.4.js.map   4.35 KiB                       4  [emitted] [dev]         
- [ibazel]     kbn-ui-shared-deps-src.chunk.5.js   7.61 KiB                       5  [emitted]               
- [ibazel] kbn-ui-shared-deps-src.chunk.5.js.map   7.87 KiB                       5  [emitted] [dev]         
[ibazel]            kbn-ui-shared-deps-src.css   92.2 KiB  kbn-ui-shared-deps-src  [emitted]               kbn-ui-shared-deps-src
[ibazel]        kbn-ui-shared-deps-src.css.map    120 KiB  kbn-ui-shared-deps-src  [emitted] [dev]         kbn-ui-shared-deps-src
- [ibazel]             kbn-ui-shared-deps-src.js   7.47 MiB  kbn-ui-shared-deps-src  [emitted]        [big]  kbn-ui-shared-deps-src
+ [ibazel]             kbn-ui-shared-deps-src.js  8.97 MiB  kbn-ui-shared-deps-src  [emitted]        [big]  kbn-ui-shared-deps-src
- [ibazel]         kbn-ui-shared-deps-src.js.map    6.5 MiB  kbn-ui-shared-deps-src  [emitted] [dev]         kbn-ui-shared-deps-src
+ [ibazel]         kbn-ui-shared-deps-src.js.map  8.01 MiB  kbn-ui-shared-deps-src  [emitted] [dev]         kbn-ui-shared-deps-src

Apparently part of the chunk.0 and chunk.3 in this PR gets incorporated into the kbn-ui-shared-deps-src.js initial bundle. I'll investigate whether it's possible to move it back to an async bundle the kbn-monaco package, but I wonder if its actual shrink may have played some role in here.

The good news is that overall this PR adds a whopping 12.8 Kb overall to the shared bundle.
The bad news is that this increase seems to be shifted on the wrong side.

@dej611
Copy link
Contributor Author

dej611 commented Oct 19, 2023

Managed to have a view of what is going on here and why the bundle changed.

This PR moved both esql and antlr4ts into the main bundle:

Screenshot 2023-10-19 at 09 24 11

While in main both modules are in the async chunk (not tracked by the github metrics):

Screenshot 2023-10-19 at 09 24 02

I'll try to move back both modules back to the async realm.

@dej611
Copy link
Contributor Author

dej611 commented Oct 19, 2023

Managed to solve the bundle issue and will report here some things I found during the investigation which might be useful if somebody is facing the same problem with the shared bundle:

  • Mind your index.ts "barrel files"

    • this is not something strictly a performance problem as described in this article, but rather a dependency graph problem. The tree-shaking power of a bundler is limited and adding more depth in the analysis between two modules with middle men like "barrel files" might make it harder to optimized (for the bundler) and debug (for the developer) this type of problems.
  • if you can't see it, it's hard to debug

    • it's really hard to debug something you do not see, especially when you have to track down dependencies and where a package is to determine the final bundle size. Right now it is not easy to analyse bundle size of something that is not a plugin, as well documented in the Kibana documentation. So for this specific use case it was required to manual tweak the Bazel + Webpack configuration in order to emit a stats.json file that can be visualized.
    • Tools to visualize here are fundamental to understand the relationships of various packages, and the combination of several representation is key to find the problem: Webpack bundle analyzer was useful to understand in which chunk the package kbn-monaco, antlr4ts and more were stored, while the Webpack analyse > Modules app was super useful to track the type of imports (async vs sync) between files.
  • Mind static and class-based properties. This code got me some headache to refactor in a way that was treeshaking properly:

const symbolsLookup: Record<number, string> = Object.entries(esql_parser)
  .filter(([k, v]) => typeof v === 'number' && !/RULE_/.test(k) && k.toUpperCase() === k)
  .reduce((memo, [k, v]: [string, number]) => {
    memo[v] = k;
    return memo;
  }, {} as Record<number, string>);

The problem here was that the compiler didn't know what to keep and what to drop as the code was iterating thru everything there. The code was needed to map rule numbers to their string equivalent, but luckily there's a static property to use which makes the tree-shake work performantly:

esql_parser.VOCABULARY.getSymbolicName(<tokenId>);
  • if you have a big dependency, moving the logic who depends on that on a Web Worker moves the big dependency on an async chunk.
    • in this case generating the simplified AST is not much code (ast_factory.ts + ast_walker.ts) per se, but it requires the esql_parser full dependency, which in turns requires the full antlr4ts library, which is worth 1 Mb. Moving the AST generation on the web worker makes the architecture and the flow a little bit more convoluted (see the new packages/kbn-monaco/src/esql/lib/monaco/esql_ast_provider.ts adapter) but it ensures that the 1 Mb dependency is required only within the web worker as async chunk.

@dej611 dej611 changed the base branch from main to feature/esql-validation October 23, 2023 07:28
@dej611
Copy link
Contributor Author

dej611 commented Oct 23, 2023

As discussed offline with @stratoula this PR now targets the new feature branch feature/esql-validation where next iterations with autocomplete and fetching optimizations will land.

Once this PR will get reviewed (validation & hover) and merged into the new feature branch the following follow ups will be pursued:

  • fix autocomplete
    • in particular put stats and eval on par with current main state
  • fix number of requests for validation
    • not allow it to make requests actively and only enable field validation when the cache gets populated by autocomplete
    • same for hover
    • restore muted tests and add new ones

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/monaco 76 94 +18

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
textBasedLanguages 148.8KB 149.3KB +570.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
@kbn/monaco 3 2 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
kbnUiSharedDeps-srcJs 2.2MB 2.3MB +47.5KB
Unknown metric groups

API count

id before after diff
@kbn/monaco 78 96 +18

ESLint disabled line counts

id before after diff
@kbn/text-based-editor 2 3 +1

Total ESLint disabled count

id before after diff
@kbn/text-based-editor 2 3 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dej611 dej611 marked this pull request as ready for review October 23, 2023 08:51
@dej611 dej611 requested review from a team as code owners October 23, 2023 08:51
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-visualizations (Team:Visualizations)

@@ -223,7 +223,8 @@ export default function ({ getService, getPageObjects }: FtrProviderContext) {
});
});

describe('ES|QL mode', () => {
// @TODO: fix this in a follow up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize this was going into a feature branch at first and was surprised to see these tests skipped, but assuming they'll be addressed and unskipped before making it to main, this LGTM 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Davis correct, we found some performance problems so this is not going to be merged. Also autocomplete doesnt work as expected. But as this is a big PR already we want to move this into a feature branch and then continue with the investigation and the development

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with Marco this is not ready to be merged and this is why we decided to create a feature branch instead. Specifically:

  • autocomplete needs some refinement
  • We should see what we are going to about making the client side validation performant. Currently we are hitting ES every 250ms and also we add another api call to Discover initialization

As a first step the code looks ok to me so I am approving the merge to the feature branch

Copy link
Contributor

@Dosant Dosant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be great if you'd make your team the owner of kbn-monaco/src/esql :D

@dej611 dej611 merged commit e793907 into elastic:feature/esql-validation Oct 26, 2023
dej611 added a commit that referenced this pull request Dec 7, 2023
## Summary

Fixes #166242 , #166876, #166173
, part of #166092, #166084

List of tasks:

* [x] AST work ( #166185 )
* [x] Basic validate work ( #166185 )
* [x] Hover feature ( #166185 )
* [x] Initial autocomplete work with new AST ( #166185 )
* Complete validation feature for MVP
  * [x] wildcards support ( #170014 )
  * [x] remote index validation support ( #171996 )
  * [x] wildcard support as `count` argument  ( #172054 )
  * [x] Aggressive caching for field queries:
* cache as much as possible the `FROM` queries - possible clear the
cache every 10/15 minutes?
    * do not fire a query when code == submitted code
* cache as much as possible the custom `FROM` built from `ENRICH`
policies - same clear policy as above
  * [x] Add unsupported fields warning messages
  * [x] Notify usage of `project` command with deprecation `warning`
* Complete autocomplete work ( #171664 )
  * [x] `stats`
  * [x] `where`
  * [x] `eval`
    * `math syntax`
  * [x] Aggressive cache for fields queries? ( #171866 )
  * [x] Fix when cursor is not at the end position ( #172060 ) 
* [x] Revisit copy messages
  * Label Kibana-only messages
* [x] Extend hover feature ( #171940 )
* [x] Disable editor query highlight for warnings ( #171968 )
* Fix editor highlight with new grammar
  * [x] on multi-line ( #172080 )
  * [x] for functions ( #172287 )

## Release notes

Enhance ES|QL query editing experience with client side validation.
Enhance ES|QL suggestions experience with more in context suggestions
leveraging field and variable types.
Show meta informations on ES|QL query hover on policy names.
Show function signature on ES|QL query hover on function text.


### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Stratoula Kalafateli <[email protected]>
Co-authored-by: Abdon Pijpelink <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:build-webpack-bundle-analyzer Feature:ES|QL ES|QL related features in Kibana release_note:skip Skip the PR/issue when compiling release notes Team:Visualizations Visualization editors, elastic-charts and infrastructure v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ES|QL] Simplify grammar vs improve validation checks on top of it [ES|QL] Kibana parser issues
7 participants