Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server-based architecture #78

Open
mrjoelkemp opened this issue Jul 17, 2016 · 0 comments
Open

Server-based architecture #78

mrjoelkemp opened this issue Jul 17, 2016 · 0 comments

Comments

@mrjoelkemp
Copy link
Collaborator

mrjoelkemp commented Jul 17, 2016

Motivation

Currently, the backend is modeled after a composition of stateless functions. Every Dependents operation is meant to be standalone and can handle all work needed for fulfilling its purpose. This is great for simplicity, but results in poor performance. Every subsequent operation has to regenerate its local state – which could mean re-processing (and generating ASTs for every file) an entire directory.

AST generation is the bottleneck. On average, every file in the processed directory (either root and/or styles_root) takes a few hundred ms to parse. Even with parallelization in place, the hundreds of ms stack up to cause a few seconds of wait time. In large codebases (like ESLint), this means the user waits up to 19s for the results of a single command (ex: Find Dependents).

Example

Assume that we have 200 files in a directory. Assume each file takes 100ms to parse (which would take some heroic and hopeful parser optimizations, but also depends on a user's machine specs). In a serial processing fashion, a command would take 100ms per file * 100 files = 10,000ms (10s) to generate the ASTs for every file (and then have to do the sniffs, like finding the dependents of a given file, based on the operation being requested).

If we split the work across the 4 cores of a typical user's laptop, that means each core is responsible for 50 files. Since each core processes its batch of files serially, each core will theoretically take 100ms per file * 50 files = 5000ms (5s). Since all cores process their batches simultaneously, it takes roughly 5s to generate the ASTs for all files. This cuts the runtime in half, but this is still an O(n) operation; when we parallelize the work, we're only multiplying n by a fraction. As we add more files to the project/directory, the wait time of 5s increases.

We do the above work every time a user issues a command. Note, some commands like Copy Path and Jump To Dependency don't require AST generation.

A more efficient architecture involves doing the AST generation work once and reusing the ASTs on subsequent commands. Of course, this requires monitoring the filesystem for file changes (saving a file, adding new files, removing files, adding/removing directories) that may require regenerating a file's (or a directory of files') AST.

If the runtime of a command like Find Dependents is O(nk) (where n is the cost of generating the ast for every file and k is the cost of sniffing the generated asts to find the dependents of a file), then we will pay that cost on the first instance of the command, but only pay O(k) on subsequent requests.

The gist

Client (on startup) instructs the backend to init

  • Create an http server that listens for GET requests containing data describing the operation request
    • Maybe support a configurable port (nice to have)
  • Construct a (long-lived) dependency tree instance for the current directory
    • Do root and styles_root even matter anymore?
    • We could join those paths together in a glob for all files containing support extensions
  • Once the server is ready and the dependency tree has been generated, we can accept operation requests

Client issues commands to the server

  • The server expects http requests
    • Every client would need to use their host language's http library to issue requests
  • It would be great to not have to change clients to issue http requests and have the backend cli translate cli commands into http requests
    • Can't have the cli talk to the server via an object instance since they're on different processes.

Things to consider:

  • Server idle shutdown
    • Debounced on every command request
    • Does the cli need to attempt to ping the server and start a new one if it gets no pong?
  • Filesystem updates?
    • Adding files
    • Removing files
    • Removing directories (empty or with files)
    • Adding directories (empty or with files)
  • What if the user modifies the deprc while the server is running?
    • Should we reparse the config on every request
    • This would imply supporting every config value change (adding/removing directories) and modifying the dependency tree object appropriately.
    • If we have a dependency tree object for the entire project (minus the vendor excludes), there may not be a need to handle config changes

References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant