Transport & Engine: `AsyncTransport` plugin #6626

khsrali · 2024-11-21T07:27:00Z

This PR, requested by @giovannipizzi, proposes a set of changes that make transport tasks asynchronous. This ensures that the daemon is not blocked by time-consuming tasks such as uploads, downloads, and similar operations.

Here’s a summary of the main updates:

- New Transport Plugin: Introduces AsyncSshTransport with the entry point core.ssh_async.
- Enhanced Authentication: AsyncSshTransport supports executing custom scripts before connecting, which is particularly useful for authentication. 🥇
- Engine Updates: Modifies the engine to consistently call asynchronous transport methods.
- Deprecated Methods: Deprecates transport.chdir() and transport.getcwd() (merged in #6594, "Transport & Engine: factor out getcwd() & chdir() for compatibility with upcoming async transport").
- Backward Compatibility: Provides synchronous counterparts for all asynchronous methods in AsyncSshTransport.
- Transport Class Overhaul: Deprecates the previous Transport class and introduces _BaseTransport, Transport, and AsyncTransport as replacements.
- Improved Documentation: Adds more docstrings and comments to guide plugin developers. Blocking plugins should inherit from Transport, while asynchronous ones should inherit from AsyncTransport.
- Updated Tests: Revises test_all_plugins.py to reflect these changes. Existing tests for transport plugins are still minimal and should be improved in a separate PR (TODO).
- New Path Type: Defines a TransportPath type and upgrades transport plugins to accept Union[str, Path, PurePosixPath].
- New Feature: Introduces copy_from_remote_to_remote_async, addressing a previous issue where such tasks blocked the entire daemon.
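A minimal sketch of what the TransportPath type could look like, assuming it is a plain `Union` alias as described above. `path_to_str` mirrors the `return str(path)` visible in the transport.py review snippet; `remote_join` is a hypothetical convenience for building remote paths and is not part of the PR.

```python
from pathlib import Path, PurePosixPath
from typing import Union

# Assumed alias mirroring the PR description: transport methods accept
# plain strings as well as pathlib objects.
TransportPath = Union[str, Path, PurePosixPath]


def path_to_str(path: TransportPath) -> str:
    """Hypothetical helper: normalize any TransportPath to a plain string,
    e.g. before handing it to a lower-level SSH/SFTP library."""
    return str(path)


def remote_join(base: TransportPath, *parts: str) -> str:
    """Hypothetical helper: join path components with POSIX separators
    (remote hosts are assumed to be POSIX systems)."""
    return str(PurePosixPath(path_to_str(base), *parts))
```

Accepting the union at the API boundary and converting once keeps plugin internals working with a single representation.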

Dependencies: This PR relies on aiidateam/plumpy#272.

Note: The initial commits by Chris were pulled from #6079 (closed).

Test Results: Performance Comparisons

When `core.ssh_async` Outperforms

In scenarios where the daemon is blocked by heavy transfer tasks (uploading/downloading/copying large files), core.ssh_async shows significant improvement.

For example, I submitted two WorkGraphs:

The first handles heavy transfers:
- Upload 10 MB
- Remote copy 1 GB
- Retrieve 1 GB
The second runs a simple shell command: `touch file`.

Time taken until the second submission is processed (with one daemon running):

core.ssh_async: Only 4 seconds! 🚀🚀🚀🚀 A major improvement!
core.ssh: 108 seconds (WorkGraph 1 fully completes before processing the second).

When `core.ssh_async` and `core.ssh` Are Comparable

For workloads involving many uploads and downloads (a common scenario), which transport performs better depends on the case.

Large Files (~1 GB):
- core.ssh_async performs better because uploads and downloads run simultaneously. On some networks this can almost double the effective bandwidth, as demonstrated in the accompanying bandwidth graph: my bandwidth is 11.8 MB/s, but it nearly doubled under favorable conditions.
- However, under heavy network load, bandwidth may drop back to its base level (here, 11.8 MB/s).
  Test Case: Two WorkGraphs: one uploads 1 GB, the other retrieves 1 GB using RemoteData.
  - core.ssh_async: 120 seconds
  - core.ssh: 204 seconds
Small Files (Many Small Transfers):
- Test Case: 25 WorkGraphs each transferring a few 1 MB files.
  - core.ssh_async: 105 seconds
  - core.ssh: 65 seconds
In this scenario, the overhead of asynchronous calls seems to outweigh the benefits. We need to discuss the trade-offs and explore possible optimizations. As @agoscinski mentioned, this may be expected given the known overheads of async code (see the discussion on async overheads).
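To put the reported timings side by side, the speedup of core.ssh_async over core.ssh is simply t_ssh / t_async. The numbers below are copied from the results above; the scenario labels are shorthand.

```python
# Wall-clock times in seconds, (core.ssh_async, core.ssh), as reported above.
timings = {
    "heavy transfers, then a 'touch' WorkGraph": (4, 108),
    "1 GB upload + 1 GB retrieve": (120, 204),
    "25 WorkGraphs with a few 1 MB files each": (105, 65),
}


def speedup(t_async: float, t_ssh: float) -> float:
    """Speedup of core.ssh_async over core.ssh (> 1 means async is faster)."""
    return t_ssh / t_async


for scenario, (t_async, t_ssh) in timings.items():
    print(f"{scenario}: {speedup(t_async, t_ssh):.2f}x")
```

This prints 27.00x, 1.70x and 0.62x: async wins clearly when it stops heavy transfers from blocking the daemon, and is about 1.6 times slower in the many-small-files case.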

codecov · 2024-11-21T08:00:17Z

Codecov Report

Attention: Patch coverage is 81.93277% with 172 lines in your changes missing coverage. Please review.

Project coverage is 77.97%. Comparing base (c532b34) to head (2ebf945).

Files with missing lines	Patch %	Lines
src/aiida/transports/plugins/ssh_async.py	76.28%	102 Missing ⚠️
src/aiida/transports/transport.py	87.50%	33 Missing ⚠️
src/aiida/engine/daemon/execmanager.py	75.56%	11 Missing ⚠️
src/aiida/transports/plugins/ssh.py	87.96%	10 Missing ⚠️
src/aiida/transports/plugins/local.py	92.21%	6 Missing ⚠️
src/aiida/engine/processes/calcjobs/monitors.py	66.67%	1 Missing ⚠️
src/aiida/engine/processes/calcjobs/tasks.py	75.00%	1 Missing ⚠️
src/aiida/engine/transports.py	66.67%	1 Missing ⚠️
src/aiida/orm/authinfos.py	66.67%	1 Missing ⚠️
src/aiida/orm/computers.py	66.67%	1 Missing ⚠️
... and 5 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6626      +/-   ##
==========================================
+ Coverage   77.92%   77.97%   +0.06%     
==========================================
  Files         563      564       +1     
  Lines       41671    42418     +747     
==========================================
+ Hits        32467    33072     +605     
- Misses       9204     9346     +142


agoscinski

Thanks! Looks good. Just to reiterate the most important comments:

Why don't you just use Transport instead of BlockingTransport, since you set one equal to the other? Now there is redundancy. I feel this API would be clearer:

_BaseTransport -> Transport -> SshTransport
_BaseTransport -> AsyncTransport -> AsyncSshTransport

Will you open a PR in plumpy so we can make a new release?

I will review the tests in the separate PR.

agoscinski · 2024-11-22T10:12:22Z

requirements/requirements-py-3.12.txt

@@ -119,7 +120,7 @@ pillow==10.1.0
 platformdirs==3.11.0
 plotly==5.17.0
 pluggy==1.3.0
-plumpy==0.22.3
+plumpy@git+https://github.com/khsrali/plumpy.git@allow-async-upload-download#egg=plumpy


Will you make a PR there so we can do a new release?

yes! Please review here: aiidateam/plumpy#272

agoscinski · 2024-11-22T10:17:23Z

utils/dependency_management.py

+                if (
+                    canonicalize_name(requirement_abstract.name) == canonicalize_name(requirement_concrete.name)
+                    and abstract_contains
+                ):


Do we remove this before merging? Otherwise, it would be good to add a comment explaining what the new if-else does; it's hard to understand without context.

I plan to keep it, as it's very useful for passing CI on PRs like this one, which are hooked (with @) to another PR or to a branch of another repo.

The problem is that @ is not listed as a valid specifier in the Specifier class.
This small change accepts @ as a valid specifier and checks whether a hooked dependency points to the same "version" across all files (requirements-xx, environment.yml, etc.).

This way, besides this consistency check, the dependency test still fails, but the main unit tests (test-presto, test-3.xx) are still triggered for such PRs (otherwise they would not be).
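The idea of the consistency check can be sketched with the standard library alone. This is not the code in utils/dependency_management.py (which builds on the `packaging` library's `canonicalize_name` and `Specifier`); `parse_requirement` and `check_consistent` are hypothetical names, and the input is assumed to be requirement lines already extracted from each file.

```python
import re

# Package name at the start of a requirement line, followed by the rest.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def canonicalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_", "." become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement(line: str) -> tuple[str, str]:
    """Split a requirement line into (canonical name, pin). The pin is either
    a version specifier (e.g. "==0.22.3") or an "@" direct reference."""
    match = _NAME_RE.match(line)
    if match is None:
        raise ValueError(f"cannot parse requirement: {line!r}")
    name, rest = match.groups()
    rest = rest.strip()
    if rest.startswith("@"):
        # Direct reference ("name @ url"): normalize spacing around "@".
        return canonicalize(name), "@" + rest[1:].strip()
    return canonicalize(name), rest


def check_consistent(files: dict[str, list[str]], package: str) -> bool:
    """True if `package` is pinned identically in every file that lists it."""
    target = canonicalize(package)
    pins = set()
    for lines in files.values():
        for line in lines:
            name, pin = parse_requirement(line)
            if name == target:
                pins.add(pin)
    return len(pins) <= 1
```

Treating the "@ url" part as the pin lets the same comparison cover both released versions and git-hooked dependencies.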

I added a few lines of comment to clarify this

This is nice; perhaps it would be better to split it into a standalone PR for visibility.

btw: I started looking into using a uv lockfile in #6640; it seems like a better strategy than wrangling 4 different requirements files. :-)

As we discussed, this feature is already covered by the new PR #6640.
So I'll keep the changes temporarily for this PR only, and will revert 'utils/dependency_management.py' before any merge.

agoscinski · 2024-11-22T10:57:32Z

src/aiida/transports/transport.py

+    return str(path)
+
+
+class _BaseTransport:


Isn't this part of the public API? Should I use it if I create a new transport plugin, or should I use Transport?

No, this is private. Nothing should inherit from it except 'AsyncTransport' and 'BlockingTransport'.
Only 'AsyncTransport' and 'BlockingTransport' are public, i.e. meant to be used when creating a new plugin.

agoscinski · 2024-11-22T11:02:02Z

src/aiida/transports/transport.py

+
+
+# This is here for backwards compatibility
+Transport = BlockingTransport


I'm not sure it makes sense to make blocking the default, especially if you expose both in the API. Shouldn't there be a public class for blocking and one for non-blocking transports that plugins inherit from?

This is just for backward compatibility: Giovanni suggested renaming the former (blocking) Transport to BlockingTransport.
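A minimal sketch of what the alias provides: because `Transport` is bound to the very same class object, plugins that subclass the old name, and `isinstance` checks against it, keep working unchanged. `LegacyPlugin` is hypothetical, and the class body is omitted.

```python
class BlockingTransport:
    """Stand-in for the renamed blocking base class (body omitted)."""


# This is here for backwards compatibility
Transport = BlockingTransport


class LegacyPlugin(Transport):
    """Hypothetical third-party plugin still written against the old name."""
```

Note that a plain alias emits no deprecation warning, so nothing nudges plugin authors toward the new name.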

agoscinski · 2024-11-22T11:04:26Z

tests/engine/daemon/test_execmanager.py

@@ -164,7 +167,8 @@ def test_upload_local_copy_list(
    calc_info.local_copy_list = [[folder.uuid] + local_copy_list]

    with node.computer.get_transport() as transport:
-        execmanager.upload_calculation(node, transport, calc_info, fixture_sandbox)
+        runner = get_manager().get_runner()
+        runner.loop.run_until_complete(execmanager.upload_calculation(node, transport, calc_info, fixture_sandbox))


Why is this needed now?

Because execmanager.upload_calculation is now an async function; this way we can call it in a sync test.
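A self-contained sketch of the pattern used in the test. A hypothetical `fake_upload` stands in for `execmanager.upload_calculation`, and a fresh event loop replaces AiiDA's runner loop. It also shows what "the old way" would do: calling an `async def` function without awaiting it only creates a coroutine object, so its body never runs.

```python
import asyncio


async def fake_upload(log: list[str]) -> str:
    """Hypothetical stand-in for an async upload; records that it ran."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    log.append("uploaded")
    return "done"


def run_sync(log: list[str]) -> str:
    # Same idea as runner.loop.run_until_complete(...) in the test:
    # block the synchronous caller until the coroutine has finished.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fake_upload(log))
    finally:
        loop.close()
```

`run_sync(log)` returns "done" and leaves `log == ["uploaded"]`, whereas `coro = fake_upload(log)` alone leaves `log` empty; closing such an unawaited coroutine with `coro.close()` avoids Python's "coroutine was never awaited" warning.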

What happens if you use the old way? Does the test just pass and continue before the command has finished?

I think it is very tricky to mix async programming with sync functions; in general, it is a very hard problem. It looks to me like runner.loop.run_until_complete blocks until the task completes, so there is no benefit from making these methods async. Is create_task the correct thing to use here?

Okay, I just asked Ali offline. This is only for tests, and only to check that the implementation is functionally correct. Testing how the async operations behave when four of them run together is not the purpose here.
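The point about `run_until_complete` can be illustrated with a toy example, assuming transfers that yield to the event loop between chunks (the `transfer` coroutine and its labels are hypothetical). Awaiting coroutines one after another gives no overlap; only when several are scheduled on the same loop, e.g. with `asyncio.gather`, do they interleave, which is where the daemon gains. A test that drives one coroutine to completion checks correctness, not this overlap.

```python
import asyncio


async def transfer(name: str, chunks: int, log: list[str]) -> None:
    """Hypothetical transfer that yields to the event loop after each chunk,
    as an awaited SSH/SFTP read or write would."""
    for i in range(chunks):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)


async def sequential(log: list[str]) -> None:
    # Awaiting one after the other: the second starts only when the first ends.
    await transfer("up", 2, log)
    await transfer("down", 2, log)


async def concurrent(log: list[str]) -> None:
    # Scheduling both on the same loop lets their chunks interleave.
    await asyncio.gather(transfer("up", 2, log), transfer("down", 2, log))
```

Running `asyncio.run(sequential(log))` records `["up0", "up1", "down0", "down1"]`, while `asyncio.run(concurrent(log))` records `["up0", "down0", "up1", "down1"]`.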

agoscinski · 2024-11-22T11:21:23Z

src/aiida/transports/util.py

@@ -86,3 +86,24 @@ def copy_from_remote_to_remote(transportsource, transportdestination, remotesour
    .. note:: it uses the method transportsource.copy_from_remote_to_remote
    """
    transportsource.copy_from_remote_to_remote(transportdestination, remotesource, remotedestination, **kwargs)
+
+
+async def copy_from_remote_to_remote_async(


Is this required in the utils? I can't find any usage.

Not sure how it's used, tbh; probably by external plugins? So far I just provide functionality similar to copy_from_remote_to_remote.

Okay, that's something that might be cleaned up in the future, but for this PR it doesn't make much sense.
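For illustration only, an async helper could mirror the synchronous one by delegating to the source transport and awaiting it, so other tasks can run while the copy is in progress. The body below is an assumption (the diff above is truncated), as is the `copy_from_remote_to_remote_async` method on the transport; `FakeTransport` is a hypothetical in-memory stand-in.

```python
import asyncio


async def copy_from_remote_to_remote_async(
    transportsource, transportdestination, remotesource, remotedestination, **kwargs
):
    """Sketch mirroring the synchronous helper: delegate to the source
    transport, but await it instead of blocking the caller."""
    await transportsource.copy_from_remote_to_remote_async(
        transportdestination, remotesource, remotedestination, **kwargs
    )


class FakeTransport:
    """Toy transport with an in-memory 'filesystem' (illustration only)."""

    def __init__(self):
        self.files: dict[str, bytes] = {}

    async def copy_from_remote_to_remote_async(self, destination, src, dst, **kwargs):
        await asyncio.sleep(0)  # stand-in for streaming data between hosts
        destination.files[dst] = self.files[src]
```

Keeping the helper a thin wrapper means the actual transfer logic stays in the transport plugin, just as in the synchronous version.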

unkcpz · 2024-11-24T12:57:21Z

I am about to finish #6627, which I think could benefit the tests here as well. Please hold off a bit for that. I'll do my best to get it merged by Wednesday.

khsrali · 2024-11-25T12:51:17Z

Why don't you just use Transport instead of BlockingTransport, since you set one equal to the other? Now there is redundancy. I feel this API would be clearer:
_BaseTransport -> Transport -> SshTransport
_BaseTransport -> AsyncTransport -> AsyncSshTransport

I just followed what @giovannipizzi suggested. But I agree this makes more sense, so I'm going to apply these changes.

Will you open a PR in plumpy so we can make a new release?

Will do once my performance tests are ready.

khsrali · 2024-11-25T13:49:31Z

Note to myself:
@danielhollas suggested applying the changes directly to core.ssh rather than creating a new plugin, core.ssh_async.
I should investigate this.

agoscinski

some minor changes

khsrali · 2024-12-05T16:31:24Z

Note:
tests are failing due to this issue aiidateam/plumpy#294

khsrali · 2024-12-05T17:09:27Z

Checklist:

Think about whether unifying core.ssh with core.ssh_async (and even core.ssh_auto) is possible, and if so, whether that should be done here or, preferably, in a separate PR.
Finalize and report the performance tests.
Merge ♻️ Make Process.run async plumpy#272 and release

unkcpz · 2024-12-05T17:19:39Z

tests are failing due to this issue aiidateam/plumpy#294

Hi @khsrali, I merged #6640, so it should work now, I guess. Can you resolve the conflict and try again? Thanks.

khsrali · 2024-12-05T17:51:57Z

Hi @khsrali, I merged #6640, so it should work now, I guess. Can you resolve the conflict and try again? Thanks.

Thanks @unkcpz, now I'm facing issues I never had before, lol:

error: The lockfile at `uv.lock` needs to be updated, but `--locked` was provided. To update the lockfile, run `uv lock`.

Actually, I even tried updating the file with 'uv lock', but it still won't pass.

agoscinski · 2024-12-05T19:47:15Z

Actually, I even tried updating the file with 'uv lock', but it still won't pass.

Sorry for the experience. We are now trying out uv for dependency management and installation. uv is a really useful tool, but it is still a bit unstable. For some reason uv lock fails; you can see this by running it in verbose mode (uv lock -v). I don't know why, and the full backtrace of the error is not informative either, but what worked for me was to manually add the two packages you changed:

uv add git+https://github.com/aiidateam/plumpy --branch async-run
uv add git+https://github.com/ronf/asyncssh --rev 033ef54302b2b09d496d68ccf39778b9e5fc89e2

I will push the fix now; I basically only ran these two commands:

uv add git+https://github.com/aiidateam/plumpy --branch async-run
uv add git+https://github.com/ronf/asyncssh --rev 033ef54302b2b09d496d68ccf39778b9e5fc89e2

khsrali · 2024-12-16T11:36:45Z

@agoscinski
I'd appreciate it if you could give this PR another round of review (I also asked @unkcpz in the office).

It would be nice to have it merged by the end of this week, because when I come back from holidays, I'll have lost half of my memory :-)))

chrisjsewell and others added 11 commits July 9, 2023 23:00

♻️ Allow for file uploads/downloads to be async

b59d999

Update test_execmanager.py

0c42841

Merge remote-tracking branch 'upstream/main' into async-run

6cb4d9e

update run methods

14cbd29

Merge branch 'main' into async-transport

e9361bb

async transport, the first implementation

6811098

asynchrounous counterparts are added to transport.py

5fdde51

Giovanni's review applied

ccc545e

adopted tests

f187fdc

docstring updated

565724d

added computer test for ssh_async

6e350e7

khsrali mentioned this pull request Nov 21, 2024

♻️ Allow for file uploads/downloads to be async #6079

Closed

khsrali marked this pull request as ready for review November 21, 2024 09:11

khsrali requested a review from agoscinski November 21, 2024 17:28

agoscinski requested changes Nov 22, 2024

unkcpz mentioned this pull request Nov 25, 2024

Refactoring: use tmp path fixture to mock remote and local for transport plugins #6627

Merged

review applied

03ccc30

agoscinski reviewed Nov 26, 2024

agoscinski requested changes Nov 26, 2024

unkcpz reviewed Nov 27, 2024

khsrali added 6 commits November 27, 2024 17:58

chnage from machine to machine_

178bf7b

review applied

cc0bc5c

copy-remote adopted with behaviour of asyncssh

3210c27

Merge branch 'main' into async-transport

76aaf53

remove str() use from test_all_plugins

65f0663

copy() are now aligned with fresh development on asyncssh

a809b98

khsrali added 2 commits December 5, 2024 14:27

fixed some stupid issues

38cfc24

plumpy hook pointing to async-run branch, now

799e0f8

khsrali added 2 commits December 5, 2024 18:42

Merge branch 'main' into async-transport

665a163

updated uv lock

0837193

agoscinski and others added 3 commits December 5, 2024 20:48

Fixing uv.lock file for the depedencies from a github repo

1b96110

uv add git+https://github.com/aiidateam/plumpy --branch async-run uv add git+https://github.com/ronf/asyncssh --rev 033ef54302b2b09d496d68ccf39778b9e5fc89e2

Merge branch 'main' into async-transport

3aa0031

fix conflicts

a68240c

unkcpz mentioned this pull request Dec 6, 2024

♻️ Make Process.run async aiidateam/plumpy#272

Merged

khsrali added 7 commits December 10, 2024 17:05

fixed afew self blocking calls in copy_async()

520e58e

Merge branch 'main' into async-transport

22eb929

fix rtd

90718f4

fix uv

343cf9c

escape for bash on command

482eeca

fixed many warnings of rtd

5e29e5b

Merge branch 'main' into async-transport

a5ff84d

khsrali requested review from unkcpz and agoscinski December 11, 2024 17:07

khsrali added 4 commits December 13, 2024 11:33

implement max_io_allowed

1761d94

Merge branch 'main' into async-transport

cf01ac0

update asyncssh dependency

6627b21

plumpy dependency pin the exact commit

2ebf945
