Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support discarding the CST to preserve memory #1733

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

msujew
Copy link
Member

@msujew msujew commented Oct 29, 2024

This (surprisingly small) change enables us to discard the CST on closed documents (in LSP use cases) to reduce our memory footprint. I've seen several crashes of Langium based language servers on large (> 100mb) workspaces, and this change should help alleviate those issues.

This change includes:

  1. A new Environment service that identifies whether we're currently running as a language server. Running in a CLI environment usually requires full access to the CST.
  2. A new CstParserMode that allows to discard parsed CSTs. The default is Retain, but Discard will be used when we encounter a closed document while running as a language server. This also goes in tandem with the new discardCst function that removes CSTs from ASTs that were parsed in Retain mode. Useful for documents that have been closed.
  3. A new $segments property has been added for caching of CST data. In particular, it contains the full range of any AST node and the ranges for all properties. Ranges for keywords are not stored (yet).
  4. This new property is used extensively in the LSP related services to find the correct positions of properties within closed files. Therefore, most services that attempt to find some specific position/range now use the $segments property.

Specific tests for the discard behavior will follow soon. This PR just shows the feasibility and backwards compatibility of the changes at hand so far.

@msujew msujew added ast AST structure related issue performance Issues related to the runtime performance of Langium labels Oct 29, 2024
Copy link
Contributor

@danieldietrich danieldietrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mark,
I just did a quick 'scan' of the code and asked questions.
When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?
Daily reminder (and note to myself): It would be easier to have code formatting in a separate commit ;)
Thanks
Daniel

@msujew
Copy link
Member Author

msujew commented Oct 30, 2024

When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?

As soon as the document is changed (either on disk, or via the textDocument/open notification), the language server will be notified by the language client and reparses the document. Reparsing the document automatically invalidates the cache, so we don't need to worry about desync in that regard.

@danieldietrich
Copy link
Contributor

When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?

As soon as the document is changed (either on disk, or via the textDocument/open notification), the language server will be notified by the language client and reparses the document. Reparsing the document automatically invalidates the cache, so we don't need to worry about desync in that regard.

Sounds all good! Thx

Copy link
Contributor

@danieldietrich danieldietrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see my last review)

)];
if (goToLink && goToLink.target.$segments) {
const name = this.nameProvider.getNameProperty(goToLink.target);
if (name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use the whole composite node as target in case a name is not available?

// Whenever the user reopens the document, the CST will be rebuilt
discardCst(document.parseResult.value);
}
// Discard the diagnostics for the closed document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we make this configurable (without method overriding)? It could be simply a public property on the service class to turn this behavior on or off?

I imagine there are users/language implementors who would like to keep all problem markers even for closed documents.

@@ -61,6 +61,9 @@ export class DefaultDocumentValidator implements DocumentValidator {

async validateDocument(document: LangiumDocument, options: ValidationOptions = {}, cancelToken = CancellationToken.None): Promise<Diagnostic[]> {
const parseResult = document.parseResult;
if (!parseResult.value.$cstNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the early-exit here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the validator requires the CST to identify where to place the validation markers. If I want to place a diagnostic on a keyword, the validator doesn't know where to place it, because all information about that keyword was lost when we discarded the CST.

Copy link
Contributor

@spoenemann spoenemann Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the lexer and parser errors don't need the CST, so we should at least report those.

readonly locale: string;
}

export interface Environment extends EnvironmentInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this as a separate service? AFAICT the information is only used by the WorkspaceManager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could benefit other services as well and is well suited as a stand-alone service.

@msujew
Copy link
Member Author

msujew commented Nov 7, 2024

@spoenemann I agree, this is something for a v4.0 of Langium. This PR isn't finished anyway - the formatter needs a refactoring as the new way of doing comments in the CST broke the formatter and I still haven't gotten to writing any tests for this.

@msujew msujew marked this pull request as draft November 7, 2024 11:45
@msujew msujew force-pushed the msujew/discard-cst branch from 5104b23 to 60b5a57 Compare February 13, 2025 21:55
@msujew msujew marked this pull request as ready for review February 13, 2025 21:55
@msujew
Copy link
Member Author

msujew commented Feb 13, 2025

I fixed a few issues with this PR and everything should now work as expected. Ready for review :)

@msujew msujew added this to the v4.0.0 milestone Feb 19, 2025
@msujew msujew force-pushed the msujew/discard-cst branch from 60b5a57 to dbb2956 Compare February 19, 2025 16:06
@msujew msujew force-pushed the msujew/discard-cst branch from dbb2956 to 6d91794 Compare February 26, 2025 13:07
Copy link
Contributor

@spoenemann spoenemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change improves the memory usage for large workspaces. But it actually worsens the situation when you open a large file because the parser attaches both $cstNode and $segments to the AST.

Could we change that so the parser attaches either $cstNode or $segments, but not both? Then the discardCst function can take care of gathering the necessary information from the CST to create $segments. Of course that's more effort then, but I think it's acceptable given that it's done only when a document is closed by the user, so it can happen in the background without the user noticing it.

The LSP services would have to support both ways to obtain range information, but that might be extracted into utility functions to keep it simple.

@@ -198,6 +204,7 @@ export class LangiumParser extends AbstractLangiumParser {
private lexerResult?: LexerResult;
private stack: any[] = [];
private assignmentMap = new Map<AbstractElement, AssignmentElement | undefined>();
private currentMode: CstParserMode = CstParserMode.Retain;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: name this currentCstMode to be more specific

return obj;
obj.$cstNode = cstNode;
delete obj.$segments.comment;
obj.$segments.comment = this.commentProvider.getComment(obj);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete if it's immediately assigned?

@@ -39,7 +46,11 @@ export class DefaultNameProvider implements NameProvider {
return undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return node[this.getNameProperty(node)]?

const nameProperty = this.nameProvider.getNameProperty(node);
const nameSegment: DocumentSegment | undefined = nameProperty
? node.$segments?.properties.get(nameProperty)[0]
: undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description providers are rather central services. I suggest to keep the CST-based segment resolution as fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ast AST structure related issue performance Issues related to the runtime performance of Langium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants