Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ThreadPoolExecutor to serve simultaneous sync requests #3416

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

quique0194
Copy link

@quique0194 quique0194 commented Aug 17, 2024

I know tornado is meant to be used as an async webserver. But perhaps due to hype or whatever reason, people has used it as a sync webserver. That's exactly my case, I have a project that runs on tornado, but uses mongoengine (async mongo ORM) to connect to the db. There is a lot of code to refactor so it is not really viable.

Now we are starting to feel the drawbacks, as our code is about 60% of the time idling waiting IO (in blocking sync). A good solution would have been to use wsgi to spawn some thread workers, but as tornado 6 it is no longer supported. So this fork allows to process several sync requests in OS-threads workers. It achieves that by passing a ThreadPoolExecutor to the Application object when created. It detects sync code and runs it with IOLoop.current().run_in_executor(). If you do not pass tornado_workers_executor to your Application object, it just works as usual.

# Build the application like this:
tornado.web.Application(
    ...
    tornado_workers_executor=ThreadPoolExecutor(4, 'tornado_workers_executor')
)


# The magic happens on these lines:
tornado_workers_executor = self.application.settings.get('tornado_workers_executor')
if iscoroutinefunction(method) or getattr(method, '__tornado_coroutine__', False):
    result = await method(*self.path_args, **self.path_kwargs)
elif tornado_workers_executor:
    result = await IOLoop.current().run_in_executor(
        tornado_workers_executor,
        functools.partial(method, *self.path_args, **self.path_kwargs))
else:
    result = method(*self.path_args, **self.path_kwargs)

I also think that this approach enforces a better separation of responsibility in the code, what makes it in general a better code.

The bigger change that comes with this is that render() and finish() no longer return futures, they just return None, but you can use them as usual.

The bigger change that comes with this is that you should NEVER call
self.finish() in your code. You should just set the following
to prevent further response modifications:

self._finished = True
* render() now returns None, if it returned a future, we would not
be able to put each method code inside a thread_pool_executor
But it no longer returns a Future, just None, and it does not flush
the results yet
@bdarnell
Copy link
Member

First, thank you for the thoughtful contribution. I'm not opposed to making it easier to work with synchronous code and threads from Tornado, but it's a very tricky area because of the way it interacts with everything else. (@run_on_executor was an attempt at this use case but it didn't go far enough)

That's exactly my case, I have a project that runs on tornado, but uses mongoengine (async mongo ORM) to connect to the db. There is a lot of code to refactor so it is not really viable.

Now we are starting to feel the drawbacks, as our code is about 60% of the time idling waiting IO (in blocking sync).

FWIW, this was exactly the scenario when Tornado was originally developed - async database drivers weren't generally available so we had blocking database (mysql) access in our otherwise async code. We mitigated this by tuning our queries and database so they were as fast and consistent as possible, and by running more processes (not threads) than we otherwise would to be able to use more of our CPUs. But now expectations around latency are less forgiving and we should try to do better than that.

The bigger change that comes with this is that render() and finish() no longer return futures, they just return None, but you can use them as usual.

Because this is a departure from the way that tornado is designed to be used, it should be held to a high standard for backwards compatibility. Those Futures are important for some applications and I don't want to get rid of them. It might be OK if you can't wait on those futures in another thread, but they should still be there for the single-threaded case.

Beyond backwards compatibility, I have two major concerns with this approach. The first is that it implicitly establishes a lot of new thread-safety guarantees - it must be safe to call many methods of RequestHandler (but not all - for example, flush() is unavailable) from another thread. To some extent this just works by a combination of the GIL and the fact that the main thread won't touch the handler until the worker thread returns, but this is not a recipe for a robust multi-threaded application.

Second, it's kind of limited as a solution in a way that leaves traps for future changes for your application. If you change a post method from a regular function to a coroutine (so that you can use AsyncHTTPClient, for example, or libraries that use it indirectly like tornado.auth), the whole thing moves from the worker thread to the main thread. If the handler also includes some synchronous database calls, those are now blocking the main thread again. I think any mechanism to run some handlers on worker threads should be designed to be transitional so that you can (aspirationally) refactor to fully-async over time.

If I were starting from scratch I think a nicer design would be to replace finish with something that uses the return value of the handler method. This would be appealing even for async handlers but it would be especially helpful for avoiding concerns about thread-safety and side effects with the existing finish method. Maybe there's a way to get there with a reasonable transition path, although I suppose that doesn't help you much if refactoring "is not really viable".

It seems like what you want is basically the behavior of the old WSGIAdapter, which had similar restrictions on what you could do in a handler. It might be better for you to fork the RequestHandler and WSGIAdapter classes from Tornado 5.0 and customize them to your needs than to try to fit restricted multi-threading back into Tornado with your no-refactoring constraint.

@bdarnell bdarnell added the web label Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants