Motivation
Currently, the backend is modeled after a composition of stateless functions. Every Dependents operation is meant to be standalone and can handle all work needed for fulfilling its purpose. This is great for simplicity, but results in poor performance. Every subsequent operation has to regenerate its local state – which could mean re-processing (and generating ASTs for every file in) an entire directory.
AST generation is the bottleneck. On average, every file in the processed directory (root and/or styles_root) takes a few hundred ms to parse. Even with parallelization in place, those hundreds of ms stack up into a few seconds of wait time. In large codebases (like ESLint), the user can wait up to 19s for the results of a single command (e.g., Find Dependents).
Example
Assume that we have 200 files in a directory. Assume each file takes 100ms to parse (which would take some heroic and hopeful parser optimizations, but also depends on a user's machine specs). In a serial processing fashion, a command would take 100ms per file * 200 files = 20,000ms (20s) to generate the ASTs for every file (and then have to do the sniffs, like finding the dependents of a given file, based on the operation being requested).
If we split the work across the 4 cores of a typical user's laptop, each core is responsible for 50 files. Since each core processes its batch of files serially, each core will theoretically take 100ms per file * 50 files = 5,000ms (5s). Since all cores process their batches simultaneously, it takes roughly 5s to generate the ASTs for all files. This cuts the runtime to a quarter of the serial time, but it is still an O(n) operation; parallelizing only divides n by the (fixed) number of cores. As we add more files to the project/directory, that 5s wait keeps growing.
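The arithmetic above can be sketched directly. The numbers are the example's assumptions (200 files, 100ms per parse, 4 cores), not measurements:

```javascript
// Model of the example above: assumed numbers, not benchmarks.
const FILES = 200;
const MS_PER_PARSE = 100;
const CORES = 4;

// Serial: one core parses every file back to back.
const serialMs = FILES * MS_PER_PARSE;

// Parallel: files are split evenly across cores; each core parses its
// batch serially, and all batches run at the same time, so wall-clock
// time is one batch's worth of work.
const filesPerCore = Math.ceil(FILES / CORES);
const parallelMs = filesPerCore * MS_PER_PARSE;

console.log(`serial: ${serialMs}ms, parallel: ${parallelMs}ms`);
// → serial: 20000ms, parallel: 5000ms
```

Both quantities are linear in the file count, which is why the wait time still grows as the project does.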
We do the above work every time a user issues a command. Note: some commands, like Copy Path and Jump To Dependency, don't require AST generation.
A more efficient architecture involves doing the AST generation work once and reusing the ASTs on subsequent commands. Of course, this requires monitoring the filesystem for file changes (saving a file, adding new files, removing files, adding/removing directories) that may require regenerating a file's (or a directory of files') AST.
If the runtime of a command like Find Dependents is O(n + k) (where n is the cost of generating the AST for every file and k is the cost of sniffing the generated ASTs to find the dependents of a file), then we pay the full cost on the first instance of the command, but only O(k) on subsequent requests.
The gist
Client (on startup) instructs the backend to init
Create an http server that listens for GET requests containing data describing the operation request
Maybe support a configurable port (nice to have)
Construct a (long-lived) dependency tree instance for the current directory
Do root and styles_root even matter anymore?
We could join those paths together in a glob for all files with supported extensions
Once the server is ready and the dependency tree has been generated, we can accept operation requests
Client issues commands to the server
The server expects http requests
Every client would need to use their host language's http library to issue requests
It would be great to not have to change clients to issue http requests and have the backend cli translate cli commands into http requests
Can't have the cli talk to the server via an object instance since they're on different processes.
Things to consider:
Server idle shutdown
Debounced on every command request
Does the cli need to attempt to ping the server and start a new one if it gets no pong?
Filesystem updates?
Adding files
Removing files
Removing directories (empty or with files)
Adding directories (empty or with files)
What if the user modifies the deprc while the server is running?
Should we reparse the config on every request?
This would imply supporting every config value change (adding/removing directories) and modifying the dependency tree object appropriately.
If we have a dependency tree object for the entire project (minus the vendor excludes), there may not be a need to handle config changes