Fix Windows Threading Issues #385

Open
wants to merge 20 commits into
base: main
from

Conversation

FrsECM

@FrsECM FrsECM commented Dec 4, 2024

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)

⚠️ How does this PR impact the user? ⚠️
As a user, I need to serve a model with multiple workers per device on a Windows machine.

What does this PR do?

This PR proposes a fix for the bug reported in issue #384.

Uvicorn reference: Source

For example, with this simple server:

import litserve as ls
class SimpleLitAPI(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
    def decode_request(self, request):
        return request["input"] 
    def predict(self, x):
        squared = self.model1(x)
        output = squared
        return {"output": output}

    def encode_response(self, output):
        return {"output": output} 

# (STEP 2) - START THE SERVER
if __name__ == "__main__":
    # scale with advanced features (batching, GPUs, etc...)
    server = ls.LitServer(SimpleLitAPI(), accelerator="auto", max_batch_size=1, workers_per_device=2)
    server.run(port=8000)

Below is the server output before and after the PR:

Before the PR
(litserve_win) PS > python .\dummy_server.py
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\site-packages\litserve\server.py:475: UserWarning: Windows does not support forking. Using threads api_server_worker_type will be set to 'thread'
  warnings.warn(
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Swagger UI is available at http://0.0.0.0:8000/docs
INFO:     Started server process [35312]
INFO:     Started server process [35312]
INFO:     Waiting for application startup.
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
Accept failed on a socket
socket: <asyncio.TransportSocket fd=608, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 8000)>
Traceback (most recent call last):
  File "C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\asyncio\proactor_events.py", line 841, in loop
    f = self._proactor.accept(sock)
  File "C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\asyncio\windows_events.py", line 563, in accept
    self._register_with_iocp(listener)
  File "C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\asyncio\windows_events.py", line 732, in _register_with_iocp      
    _overlapped.CreateIoCompletionPort(obj.fileno(), self._iocp, 0, 0)
OSError: [WinError 87] The parameter is incorrect
Task exception was never retrieved
future: <Task finished name='Task-7' coro=<IocpProactor.accept.<locals>.accept_coro() done, defined at C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\asyncio\windows_events.py:577> exception=OSError(22, 'The I/O operation has been aborted because of either a thread exit or an application request', None, 995, None)>
Traceback (most recent call last):
  File "C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\asyncio\windows_events.py", line 580, in accept_coro
    await future
OSError: [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Setup complete for worker 1.
Setup complete for worker 0.
After the PR
(litserve_win) PS > python .\dummy_server.py
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
C:\Users\F296849\AppData\Local\miniforge3\envs\iris2_win\lib\site-packages\litserve\server.py:475: UserWarning: Windows does not support forking. Using threads api_server_worker_type will be set to 'thread'
  warnings.warn(
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Swagger UI is available at http://0.0.0.0:8000/docs
INFO:     Started server process [24816]
INFO:     Waiting for application startup.
INFO:     Started server process [24816]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Setup complete for worker 0.
uvloop is not installed. Falling back to the default asyncio event loop. Please install uvloop for better performance using `pip install uvloop`.
uvloop is not installed. Falling back to the default asyncio event loop.
Setup complete for worker 1.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 70.37037% with 8 lines in your changes missing coverage. Please review.

Project coverage is 91%. Comparing base (f28c816) to head (bde6fdb).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #385   +/-   ##
===================================
- Coverage    91%    91%   -1%     
===================================
  Files        25     25           
  Lines      1783   1800   +17     
===================================
+ Hits       1631   1635    +4     
- Misses      152    165   +13     

@aniketmaurya
Collaborator

hi @FrsECM, thank you so much for the PR! Could you also add additional information about why setting config.workers fixes this issue for reference?

@FrsECM
Author

FrsECM commented Dec 4, 2024

hi @FrsECM, thank you so much for the PR! Could you also add additional information about why setting config.workers fixes this issue for reference?

Sure.
Setting it makes sure the code goes through this portion of uvicorn:
encode/uvicorn@858f1c5

By default, the number of workers is set to 1, and the server crashes when there are multiple workers per device on Windows.
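
For reference, a minimal sketch of the idea behind explicit socket sharing on Windows: each worker gets its own handle to the same listening socket instead of reusing a single one. The helper name `clone_listener` is hypothetical and is not part of uvicorn or LitServe. `socket.share` and `socket.fromshare` are Windows-only standard-library calls; the `dup()` branch is an assumption added only so the sketch runs on other platforms.

```python
import os
import socket
import sys


def clone_listener(sock: socket.socket) -> socket.socket:
    """Return an independent handle to the same listening socket.

    Hypothetical helper for illustration only.
    """
    if sys.platform == "win32":
        # Windows: export the socket for this process, then re-import it.
        shared = sock.share(os.getpid())
        return socket.fromshare(shared)
    # Elsewhere, duplicating the file descriptor is enough.
    return sock.dup()


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(8)

    # One handle per worker, all bound to the same address.
    handles = [clone_listener(listener) for _ in range(2)]
    for handle in handles:
        print(handle.getsockname())
        handle.close()
    listener.close()
```

Whether the real code path is taken depends on the worker count in the server configuration, which is why the default of 1 matters in this discussion.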

@FrsECM
Author

FrsECM commented Dec 5, 2024

I need to investigate a little more on another issue that causes [WinError 10022].

It seems that in some cases (not fully reproducible), the socket is not yet listening when we start the uvicorn servers.
A workaround is to force the sockets to listen before setting up the uvicorn servers. I don't know why it behaves like that.
Let me know if you have any idea.
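
The workaround described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `ensure_listening` is a hypothetical helper and the backlog value of 128 is arbitrary. WinError 10022 corresponds to WSAEINVAL, which Winsock can report when `accept` is called on a socket that has not been put into the listening state.

```python
import socket
from typing import Iterable


def ensure_listening(sockets: Iterable[socket.socket], backlog: int = 128) -> None:
    """Put every bound socket into the listening state (hypothetical helper)."""
    for sock in sockets:
        # Calling listen() on a socket that is already listening is harmless.
        sock.listen(backlog)


if __name__ == "__main__":
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    ensure_listening([server_sock])

    # A client can now connect before any server loop has started.
    client = socket.create_connection(server_sock.getsockname(), timeout=5)
    conn, _ = server_sock.accept()
    conn.close()
    client.close()
    server_sock.close()
```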

@FrsECM
Author

FrsECM commented Dec 6, 2024

@aniketmaurya did you have a look at the PR?

It seems to fix my problem on Windows.

It doesn't fix issue #372, but it makes it possible to run multiple workers on Windows.

Collaborator

@aniketmaurya aniketmaurya left a comment

LGTM! thanks for creating the fix 🚀

@FrsECM
Author

FrsECM commented Dec 8, 2024

@aniketmaurya I have an idea for #372.

I discovered that on Windows, unlike on Linux, the inference workers are stopped first, before uvicorn's.

A fix is to join the inference workers on Windows and then cleanly signal uvicorn so that its threads end properly.
To do that, I need to keep the uvicorn servers as an attribute on the server object,
because currently only the workers are returned by the _start_server() method.

I also renamed some variables to be more explicit about their content.
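
The shutdown order described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `StubServer` and `ServerManager` are hypothetical names, and the `should_exit` flag is modelled on the attribute of the same name on `uvicorn.Server`.

```python
import threading
import time
from typing import List


class StubServer:
    """Stand-in for a server object exposing a `should_exit` flag."""

    def __init__(self) -> None:
        self.should_exit = False

    def run(self) -> None:
        # Poll the flag, as a serving loop would between requests.
        while not self.should_exit:
            time.sleep(0.01)


class ServerManager:
    """Keeps servers and their threads so shutdown can reach both."""

    def __init__(self, count: int) -> None:
        self.servers: List[StubServer] = [StubServer() for _ in range(count)]
        self.threads: List[threading.Thread] = []

    def start(self) -> None:
        for server in self.servers:
            thread = threading.Thread(target=server.run, daemon=True)
            thread.start()
            self.threads.append(thread)

    def stop(self, inference_workers: List[threading.Thread]) -> None:
        # 1. Wait for the inference workers to finish.
        for worker in inference_workers:
            worker.join()
        # 2. Ask each server to leave its loop, then join its thread.
        for server in self.servers:
            server.should_exit = True
        for thread in self.threads:
            thread.join(timeout=5)
```

Keeping the server objects, and not only the threads, is what makes step 2 possible: a thread cannot be stopped from outside, but the object it runs can be told to exit.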

@FrsECM FrsECM changed the title Fix bug on windows with uvicorn when multiple workers. Fix Windows Threading Issues Dec 8, 2024
FrsECM and others added 4 commits December 8, 2024 22:13
@aniketmaurya aniketmaurya self-requested a review December 8, 2024 23:39
@aniketmaurya
Collaborator

@FrsECM it seems like the Windows tests are stuck and have timed out.

@FrsECM
Author

FrsECM commented Dec 12, 2024 via email

@FrsECM
Author

FrsECM commented Dec 13, 2024

@aniketmaurya I made a first attempt and suspected httpx>=0.28.0, but I was not up to date.

On Python 3.10.15, I have no issues on my computer; all tests run.

(litserve) PS C:\BUSCODE\packages\LitServe> python -m pytest
C:\Users\F296849\AppData\Local\miniforge3\envs\litserve\lib\site-packages\pytest_asyncio\plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
==================================================================== test session starts ====================================================================
platform win32 -- Python 3.10.15, pytest-8.3.4, pluggy-1.5.0
rootdir: C:\BUSCODE\packages\LitServe
configfile: pytest.ini
plugins: anyio-4.6.2.post1, asyncio-0.25.0, cov-6.0.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 169 items

tests\e2e\test_e2e.py .............                                                                                                                    [  7%]
tests\test_auth.py ....                                                                                                                                [ 10%]
tests\test_batch.py ...........                                                                                                                        [ 16%]
tests\test_callbacks.py ....                                                                                                                           [ 18%]
tests\test_cli.py ..                                                                                                                                   [ 20%] 
tests\test_compression.py .                                                                                                                            [ 20%]
tests\test_connector.py s..ssssssss.                                                                                                                   [ 27%] 
tests\test_docker_builder.py ..                                                                                                                        [ 28%]
tests\test_examples.py ..........                                                                                                                      [ 34%]
tests\test_form.py ...                                                                                                                                 [ 36%]
tests\test_lit_server.py .........sssss..ss...........                                                                                                 [ 53%]
tests\test_litapi.py ..................                                                                                                                [ 64%]
tests\test_logger.py ........                                                                                                                          [ 69%]
tests\test_logging.py ...                                                                                                                              [ 71%] 
tests\test_loops.py .............                                                                                                                      [ 78%]
tests\test_middlewares.py ...                                                                                                                          [ 80%]
tests\test_pydantic.py .                                                                                                                               [ 81%]
tests\test_readme.py s                                                                                                                                 [ 81%] 
tests\test_schema.py ..                                                                                                                                [ 82%]
tests\test_simple.py .......                                                                                                                           [ 86%]
tests\test_specs.py ..................                                                                                                                 [ 97%]
tests\test_torch.py .s                                                                                                                                 [ 98%]
tests\test_utils.py ..                                                                                                                                 [100%]
================================================= 151 passed, 18 skipped, 50 warnings in 204.37s (0:03:24) ==================================================

On Python 3.11.11, I have no issues on my computer; all tests run.

(litserve3.11) PS C:\BUSCODE\packages\LitServe> python -m pytest
C:\Users\F296849\AppData\Local\miniforge3\envs\litserve3.11\Lib\site-packages\pytest_asyncio\plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
==================================================================== test session starts ====================================================================
platform win32 -- Python 3.11.11, pytest-8.3.4, pluggy-1.5.0
rootdir: C:\BUSCODE\packages\LitServe
configfile: pytest.ini
plugins: anyio-4.7.0, asyncio-0.25.0, cov-6.0.0, retry-1.6.3
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 169 items

tests\e2e\test_e2e.py .............                                                                                                                    [  7%]
tests\test_auth.py ....                                                                                                                                [ 10%]
tests\test_batch.py ...........                                                                                                                        [ 16%]
tests\test_callbacks.py ....                                                                                                                           [ 18%]
tests\test_cli.py ..                                                                                                                                   [ 20%]
tests\test_compression.py .                                                                                                                            [ 20%]
tests\test_connector.py s..ssssssss.                                                                                                                   [ 27%] 
tests\test_docker_builder.py ..                                                                                                                        [ 28%] 
tests\test_examples.py ..........                                                                                                                      [ 34%]
tests\test_form.py ...                                                                                                                                 [ 36%]
tests\test_lit_server.py .........sssss..ss...........                                                                                                 [ 53%]
tests\test_litapi.py ..................                                                                                                                [ 64%]
tests\test_logger.py ........                                                                                                                          [ 69%]
tests\test_logging.py ...                                                                                                                              [ 71%] 
tests\test_loops.py .............                                                                                                                      [ 78%]
tests\test_middlewares.py ...                                                                                                                          [ 80%]
tests\test_pydantic.py .                                                                                                                               [ 81%]
tests\test_readme.py s                                                                                                                                 [ 81%] 
tests\test_schema.py ..                                                                                                                                [ 82%]
tests\test_simple.py .......                                                                                                                           [ 86%]
tests\test_specs.py ..................                                                                                                                 [ 97%]
tests\test_torch.py .s                                                                                                                                 [ 98%]
tests\test_utils.py ..                                                                                                                                 [100%] 
================================================= 151 passed, 18 skipped, 48 warnings in 205.77s (0:03:25) ==================================================

@FrsECM
Author

FrsECM commented Dec 13, 2024

@aniketmaurya I tried increasing the timeout. Using threads instead of processes, together with the absence of uvloop on Windows, makes the tests slower. It may also depend on the runner.
But I need your approval to rerun the CI.

@FrsECM FrsECM requested a review from Borda December 13, 2024 09:48
@@ -48,7 +48,7 @@ jobs:
      pip list

  - name: Tests
-   timeout-minutes: 10
+   timeout-minutes: 30
Collaborator

@aniketmaurya aniketmaurya Dec 13, 2024

Let's switch it back after figuring out the reason CI is stuck since we don't want to run tests for 30 mins.

Suggested change
- timeout-minutes: 30
+ timeout-minutes: 10

Author

It seems CI has been stuck since commit 2bbed42.

Maybe it is due to Python 3.11, for a reason I don't know.

Collaborator

yes, since it's only specific to Python 3.11 on Windows latest, it probably means that something is not working as expected.

@aniketmaurya
Collaborator

hi @FrsECM, Happy New Year! How is it going here? Please let me know if you need any help.

@FrsECM
Author

FrsECM commented Jan 6, 2025 via email

logger.debug("Enable Windows explicit socket sharing...")
# We make sure sockets is listening...
# It prevents further [WinError 10022]
[sock.listen(config.backlog) for sock in sockets]
Member

Suggested change
- [sock.listen(config.backlog) for sock in sockets]
+ for sock in sockets:
+     sock.listen(config.backlog)

no need to create a list only to discard it afterwards :)

@@ -232,6 +235,7 @@ def __init__(
self.model_metadata = model_metadata
self._connector = _Connector(accelerator=accelerator, devices=devices)
self._callback_runner = CallbackRunner(callbacks)
self._uvicorn_servers = None
Member

let's init this immediately as an empty list? no need to first have None here and properly init later on. Also maybe type it?
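
A minimal sketch of that suggestion, assuming the attribute holds `uvicorn.Server` objects. The class name `ServerSketch` is hypothetical, and uvicorn is imported only for type checking so the snippet runs without it installed.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Imported for the annotation only; not needed at runtime.
    import uvicorn


class ServerSketch:
    def __init__(self) -> None:
        # Typed and initialised as an empty list right away, so callers
        # never have to handle a None placeholder.
        self._uvicorn_servers: List["uvicorn.Server"] = []
```

With this form, code that iterates over the servers during shutdown works even if no server was ever started.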

4 participants