Server-based architecture #78

mrjoelkemp · 2016-07-17T14:57:58Z

Motivation

Currently, the backend is modeled after a composition of stateless functions. Every Dependents operation is meant to be standalone and can handle all work needed for fulfilling its purpose. This is great for simplicity, but results in poor performance. Every subsequent operation has to regenerate its local state – which could mean re-processing (and generating ASTs for every file) an entire directory.

AST generation is the bottleneck. On average, every file in the processed directory (either root and/or styles_root) takes a few hundred ms to parse. Even with parallelization in place, the hundreds of ms stack up to cause a few seconds of wait time. In large codebases (like ESLint), this means the user waits up to 19s for the results of a single command (ex: Find Dependents).

Example

Assume that we have 200 files in a directory. Assume each file takes 100ms to parse (which would take some heroic and hopeful parser optimizations, but also depends on a user's machine specs). In a serial processing fashion, a command would take 100ms per file * 100 files = 10,000ms (10s) to generate the ASTs for every file (and then have to do the sniffs, like finding the dependents of a given file, based on the operation being requested).

If we split the work across the 4 cores of a typical user's laptop, that means each core is responsible for 50 files. Since each core processes its batch of files serially, each core will theoretically take 100ms per file * 50 files = 5000ms (5s). Since all cores process their batches simultaneously, it takes roughly 5s to generate the ASTs for all files. This cuts the runtime in half, but this is still an O(n) operation; when we parallelize the work, we're only multiplying n by a fraction. As we add more files to the project/directory, the wait time of 5s increases.

We do the above work every time a user issues a command. Note, some commands like Copy Path and Jump To Dependency don't require AST generation.

A more efficient architecture involves doing the AST generation work once and reusing the ASTs on subsequent commands. Of course, this requires monitoring the filesystem for file changes (saving a file, adding new files, removing files, adding/removing directories) that may require regenerating a file's (or a directory of files') AST.

If the runtime of a command like Find Dependents is O(nk) (where n is the cost of generating the ast for every file and k is the cost of sniffing the generated asts to find the dependents of a file), then we will pay that cost on the first instance of the command, but only pay O(k) on subsequent requests.

The gist

Client (on startup) instructs the backend to init

Create an http server that listens for GET requests containing data describing the operation request
- Maybe support a configurable port (nice to have)
Construct a (long-lived) dependency tree instance for the current directory
- Do root and styles_root even matter anymore?
- We could join those paths together in a glob for all files containing support extensions
Once the server is ready and the dependency tree has been generated, we can accept operation requests

Client issues commands to the server

The server expects http requests
- Every client would need to use their host language's http library to issue requests
It would be great to not have to change clients to issue http requests and have the backend cli translate cli commands into http requests
- Can't have the cli talk to the server via an object instance since they're on different processes.

Things to consider:

Server idle shutdown
- Debounced on every command request
- Does the cli need to attempt to ping the server and start a new one if it gets no pong?
Filesystem updates?
- Adding files
- Removing files
- Removing directories (empty or with files)
- Adding directories (empty or with files)
What if the user modifies the deprc while the server is running?
- Should we reparse the config on every request
- This would imply supporting every config value change (adding/removing directories) and modifying the dependency tree object appropriately.
- If we have a dependency tree object for the entire project (minus the vendor excludes), there may not be a need to handle config changes

References

Ternjs server startup via cli script: https://github.com/ternjs/tern/blob/95dfbf405802207b5529314b83b6104f9ea9b042/bin/tern
Atom-tern client invoking commands on the server: https://github.com/tststs/atom-ternjs/blob/6344a6b63d2ba33d7b56266952e7ed073dce45eb/lib/atom-ternjs-server-worker.js#L34
- Because it's a JS env, that client opts for issuing commands to the library-layer server instead of the server process.

The text was updated successfully, but these errors were encountered:

mrjoelkemp added the enhancement label Jul 17, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Server-based architecture #78

Server-based architecture #78

mrjoelkemp commented Jul 17, 2016 •

edited

Loading

Server-based architecture #78

Server-based architecture #78

Comments

mrjoelkemp commented Jul 17, 2016 • edited Loading

Motivation

Example

The gist

Things to consider:

References

mrjoelkemp commented Jul 17, 2016 •

edited

Loading