Simplify class TokenCollector
to avoid two versions of maximal token logic
#500
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
(The description of this PR is somewhat lengthy -- a bit in the style of a small ADR -- to be able to remember why this PR was needed and how we got here.)
Background
The implementation of the semantic tokenizer consists of three parts:
TokenCollector
: conversion from parse trees to lists of semantic tokensTokenList
: data structure that encodes lists of semantic tokens as integer arrays (required by LSP);TokenTypes
: mapping from Rascal categories (= Rascal's legacy semantic token types, TextMate scopes, and LSP semantic token types) to LSP semantic token types;To compute the semantic tokens for a parse tree, an instance of
TokenCollector
recursively traverses the parse tree. Each time when a semantic token is identified during the traversal, an instance ofTokenList
is updated.Scope
This PR touches only the token collector. Tests were recently added in #494 to gain confidence this PR doesn't break anything.
Motivation
The semantic tokenizer should compute maximal tokens. For instance, if
foobar
is a string, then the semantic tokenizer should returnfoobar
as semantic token instead of separatelyfoo
andbar
.As of 2495f20, logic to compute maximal tokens is actually implemented twice. Roughly:
This PR removes point 1 from the token collector, because:
syntax
tree (with category) hassyntax
children #456, now seems to be a good opportunity to re-simplify the design and implementation of the token collector.Plan
Before this PR
The token collector has a number of fields that are used for bookkeeping. The fields are used throughout the recursive traversal. These are the relevant commits in which fields were added to support new features:
Initial implementation:
rascal-language-servers/rascal-lsp/src/main/java/org/rascalmpl/vscode/lsp/util/SemanticTokenizer.java
Lines 130 to 131 in 6a73d54
After adding support for subtrees without categories:
rascal-language-servers/rascal-lsp/src/main/java/org/rascalmpl/vscode/lsp/util/SemanticTokenizer.java
Lines 155 to 158 in cb733e1
After adding support for multiline subtrees:
rascal-language-servers/rascal-lsp/src/main/java/org/rascalmpl/vscode/lsp/util/SemanticTokenizer.java
Lines 363 to 367 in 9bef283
After adding support for subtrees with nested categories
rascal-language-servers/rascal-lsp/src/main/java/org/rascalmpl/vscode/lsp/util/SemanticTokenizer.java
Lines 391 to 396 in ee70838
After this PR
By removing the logic to compute maximal tokens from the token collector (keep it only in the token list), the state of the token collector is simplified to:
rascal-language-servers/rascal-lsp/src/main/java/org/rascalmpl/vscode/lsp/util/SemanticTokenizer.java
Lines 394 to 396 in 6c2ed04
The recursive traversal is much simplified as well.