-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support discarding the CST to preserve memory #1733
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mark,
I just did a quick 'scan' of the code and asked questions.
When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?
Daily reminder (and note to myself): It would be easier to have code formatting in a separate commit ;)
Thanks
Daniel
As soon as the document is changed (either on disk, or via the |
Sounds all good! Thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see my last review)
)]; | ||
if (goToLink && goToLink.target.$segments) { | ||
const name = this.nameProvider.getNameProperty(goToLink.target); | ||
if (name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use the whole composite node as target in case a name is not available?
// Whenever the user reopens the document, the CST will be rebuilt | ||
discardCst(document.parseResult.value); | ||
} | ||
// Discard the diagnostics for the closed document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make this configurable (without method overriding)? It could be simply a public property on the service class to turn this behavior on or off?
I imagine there are users/language implementors who would like to keep all problem markers even for closed documents.
@@ -61,6 +61,9 @@ export class DefaultDocumentValidator implements DocumentValidator { | |||
|
|||
async validateDocument(document: LangiumDocument, options: ValidationOptions = {}, cancelToken = CancellationToken.None): Promise<Diagnostic[]> { | |||
const parseResult = document.parseResult; | |||
if (!parseResult.value.$cstNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the early-exit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the validator requires the CST to identify where to place the validation markers. If I want to place a diagnostic on a keyword, the validator doesn't know where to place it, because all information about that keyword was lost when we discarded the CST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the lexer and parser errors don't need the CST, so we should at least report those.
readonly locale: string; | ||
} | ||
|
||
export interface Environment extends EnvironmentInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this as a separate service? AFAICT the information is only used by the WorkspaceManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could benefit other services as well and is well suited as a stand-alone service.
@spoenemann I agree, this is something for a v4.0 of Langium. This PR isn't finished anyway - the formatter needs a refactoring as the new way of doing comments in the CST broke the formatter and I still haven't gotten to writing any tests for this. |
5104b23
to
60b5a57
Compare
I fixed a few issues with this PR and everything should now work as expected. Ready for review :) |
60b5a57
to
dbb2956
Compare
dbb2956
to
6d91794
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change improves the memory usage for large workspaces. But it actually worsens the situation when you open a large file because the parser attaches both $cstNode
and $segments
to the AST.
Could we change that so the parser attaches either $cstNode
or $segments
, but not both? Then the discardCst
function can take care of gathering the necessary information from the CST to create $segments
. Of course that's more effort then, but I think it's acceptable given that it's done only when a document is closed by the user, so it can happen in the background without the user noticing it.
The LSP services would have to support both ways to obtain range information, but that might be extracted into utility functions to keep it simple.
@@ -198,6 +204,7 @@ export class LangiumParser extends AbstractLangiumParser { | |||
private lexerResult?: LexerResult; | |||
private stack: any[] = []; | |||
private assignmentMap = new Map<AbstractElement, AssignmentElement | undefined>(); | |||
private currentMode: CstParserMode = CstParserMode.Retain; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: name this currentCstMode
to be more specific
return obj; | ||
obj.$cstNode = cstNode; | ||
delete obj.$segments.comment; | ||
obj.$segments.comment = this.commentProvider.getComment(obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete
if it's immediately assigned?
@@ -39,7 +46,11 @@ export class DefaultNameProvider implements NameProvider { | |||
return undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return node[this.getNameProperty(node)]
?
const nameProperty = this.nameProvider.getNameProperty(node); | ||
const nameSegment: DocumentSegment | undefined = nameProperty | ||
? node.$segments?.properties.get(nameProperty)[0] | ||
: undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description providers are rather central services. I suggest to keep the CST-based segment resolution as fallback.
This (surprisingly small) change enables us to discard the CST on closed documents (in LSP use cases) to reduce our memory footprint. I've seen several crashes of Langium based language servers on large (> 100mb) workspaces, and this change should help alleviate those issues.
This change includes:
Environment
service that identifies whether we're currently running as a language server. Running in a CLI environment usually requires full access to the CST.CstParserMode
that allows to discard parsed CSTs. The default isRetain
, butDiscard
will be used when we encounter a closed document while running as a language server. This also goes in tandem with the newdiscardCst
function that removes CSTs from ASTs that were parsed inRetain
mode. Useful for documents that have been closed.$segments
property has been added for caching of CST data. In particular, it contains the full range of any AST node and the ranges for all properties. Ranges for keywords are not stored (yet).$segments
property.Specific tests for the discard behavior will follow soon. This PR just shows the feasibility and backwards compatibility of the changes at hand so far.