Support ThreadPoolExecutor to serve simultaneous sync requests #3416

quique0194 · 2024-08-17T15:29:33Z

I know tornado is meant to be used as an async webserver. But perhaps due to hype or whatever reason, people has used it as a sync webserver. That's exactly my case, I have a project that runs on tornado, but uses mongoengine (async mongo ORM) to connect to the db. There is a lot of code to refactor so it is not really viable.

Now we are starting to feel the drawbacks, as our code is about 60% of the time idling waiting IO (in blocking sync). A good solution would have been to use wsgi to spawn some thread workers, but as tornado 6 it is no longer supported. So this fork allows to process several sync requests in OS-threads workers. It achieves that by passing a ThreadPoolExecutor to the Application object when created. It detects sync code and runs it with IOLoop.current().run_in_executor(). If you do not pass tornado_workers_executor to your Application object, it just works as usual.

# Build the application like this:
tornado.web.Application(
    ...
    tornado_workers_executor=ThreadPoolExecutor(4, 'tornado_workers_executor')
)


# The magic happens on these lines:
tornado_workers_executor = self.application.settings.get('tornado_workers_executor')
if iscoroutinefunction(method) or getattr(method, '__tornado_coroutine__', False):
    result = await method(*self.path_args, **self.path_kwargs)
elif tornado_workers_executor:
    result = await IOLoop.current().run_in_executor(
        tornado_workers_executor,
        functools.partial(method, *self.path_args, **self.path_kwargs))
else:
    result = method(*self.path_args, **self.path_kwargs)

I also think that this approach enforces a better separation of responsibility in the code, what makes it in general a better code.

The bigger change that comes with this is that render() and finish() no longer return futures, they just return None, but you can use them as usual.

The bigger change that comes with this is that you should NEVER call self.finish() in your code. You should just set the following to prevent further response modifications: self._finished = True

* render() now returns None, if it returned a future, we would not be able to put each method code inside a thread_pool_executor

But it no longer returns a Future, just None, and it does not flush the results yet

bdarnell · 2024-08-19T14:20:23Z

First, thank you for the thoughtful contribution. I'm not opposed to making it easier to work with synchronous code and threads from Tornado, but it's a very tricky area because of the way it interacts with everything else. (@run_on_executor was an attempt at this use case but it didn't go far enough)

That's exactly my case, I have a project that runs on tornado, but uses mongoengine (async mongo ORM) to connect to the db. There is a lot of code to refactor so it is not really viable.

Now we are starting to feel the drawbacks, as our code is about 60% of the time idling waiting IO (in blocking sync).

FWIW, this was exactly the scenario when Tornado was originally developed - async database drivers weren't generally available so we had blocking database (mysql) access in our otherwise async code. We mitigated this by tuning our queries and database so they were as fast and consistent as possible, and by running more processes (not threads) than we otherwise would to be able to use more of our CPUs. But now expectations around latency are less forgiving and we should try to do better than that.

The bigger change that comes with this is that render() and finish() no longer return futures, they just return None, but you can use them as usual.

Because this is a departure from the way that tornado is designed to be used, it should be held to a high standard for backwards compatibility. Those Futures are important for some applications and I don't want to get rid of them. It might be OK if you can't wait on those futures in another thread, but they should still be there for the single-threaded case.

Beyond backwards compatibility, I have two major concerns with this approach. The first is that it implicitly establishes a lot of new thread-safety guarantees - it must be safe to call many methods of RequestHandler (but not all - for example, flush() is unavailable) from another thread. To some extent this just works by a combination of the GIL and the fact that the main thread won't touch the handler until the worker thread returns, but this is not a recipe for a robust multi-threaded application.

Second, it's kind of limited as a solution in a way that leaves traps for future changes for your application. If you change a post method from a regular function to a coroutine (so that you can use AsyncHTTPClient, for example, or libraries that use it indirectly like tornado.auth), the whole thing moves from the worker thread to the main thread. If the handler also includes some synchronous database calls, those are now blocking the main thread again. I think any mechanism to run some handlers on worker threads should be designed to be transitional so that you can (aspirationally) refactor to fully-async over time.

If I were starting from scratch I think a nicer design would be to replace finish with something that uses the return value of the handler method. This would be appealing even for async handlers but it would be especially helpful for avoiding concerns about thread-safety and side effects with the existing finish method. Maybe there's a way to get there with a reasonable transition path, although I suppose that doesn't help you much if refactoring "is not really viable".

It seems like what you want is basically the behavior of the old WSGIAdapter, which had similar restrictions on what you could do in a handler. It might be better for you to fork the RequestHandler and WSGIAdapter classes from Tornado 5.0 and customize them to your needs than to try to fit restricted multi-threading back into Tornado with your no-refactoring constraint.

quique0194 added 4 commits August 16, 2024 12:35

Support ThreadPoolExecutor to serve simultaneous not-async requests

44f3258

The bigger change that comes with this is that you should NEVER call self.finish() in your code. You should just set the following to prevent further response modifications: self._finished = True

Fix tests

8cfd6d3

* render() now returns None, if it returned a future, we would not be able to put each method code inside a thread_pool_executor

Allow the usage of self.finish() as usual

6ecc169

But it no longer returns a Future, just None, and it does not flush the results yet

Await returned coroutines, but with a warning

ccdd751

bdarnell added the web label Sep 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support ThreadPoolExecutor to serve simultaneous sync requests #3416

Support ThreadPoolExecutor to serve simultaneous sync requests #3416

quique0194 commented Aug 17, 2024 •

edited

Loading

bdarnell commented Aug 19, 2024

Support ThreadPoolExecutor to serve simultaneous sync requests #3416

Are you sure you want to change the base?

Support ThreadPoolExecutor to serve simultaneous sync requests #3416

Conversation

quique0194 commented Aug 17, 2024 • edited Loading

bdarnell commented Aug 19, 2024

quique0194 commented Aug 17, 2024 •

edited

Loading