[Bug] Incorrect behavior when using ThreadPoolExecutor to download multiple files #282

Open
forrestfwilliams opened this issue Mar 21, 2024 · 1 comment

Comments

@forrestfwilliams
Contributor

Describe the bug
To download files under specific file names, you must use the download_url function. However, when download_url is used with concurrent.futures' ThreadPoolExecutor and a single shared session, the output paths get mixed up: depending on the order in which products become ready, each product may be written to any one of the specified filenames rather than the one it was paired with.

To Reproduce

from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from pathlib import Path

import asf_search


urls = [
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.xml',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.xml',
]
paths = [
    Path('./burst_20240313.tif'),
    Path('./burst_20240301.tif'),
    Path('./burst_20240313.xml'),
    Path('./burst_20240301.xml'),
]

session = asf_search.ASFSession()
with ThreadPoolExecutor() as executor:
    executor.map(
        asf_search.download_url,
        urls,
        [x.parent for x in paths],
        [x.name for x in paths],
        repeat(session, len(urls)),
    )

Expected behavior
The above should produce the same files as this sequential loop:

from pathlib import Path

import asf_search


urls = [
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.xml',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.xml',
]
paths = [
    Path('./burst_20240313.tif'),
    Path('./burst_20240301.tif'),
    Path('./burst_20240313.xml'),
    Path('./burst_20240301.xml'),
]

session = asf_search.ASFSession()
for url, path in zip(urls, paths):
    asf_search.download_url(url, path.parent, path.name, session)
@forrestfwilliams
Contributor Author

Notably, using a separate session for every thread resolves the issue. However, this workaround is less than ideal because creating the extra sessions adds unnecessary overhead.
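
For reference, here is a minimal sketch of that per-thread-session workaround, arranged so that each worker thread creates one session and reuses it for every download it handles, instead of creating one per file. The names download_all and get_session are hypothetical helpers made up for this illustration, not part of asf_search; the download callable is only assumed to accept (url, directory, filename, session) in the same positional order as the download_url calls above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence


def download_all(
    urls: Sequence[str],
    paths: Sequence[Path],
    download: Callable,
    session_factory: Callable,
    max_workers: int = 4,
) -> List[Path]:
    """Download each URL to its paired path, using one session per worker thread.

    `download` and `session_factory` are passed in by the caller; for the case
    in this issue they would presumably be asf_search.download_url and
    asf_search.ASFSession (an assumption, not verified here).
    """
    # Thread-local storage: every worker thread sees its own `session` attribute.
    local = threading.local()

    def get_session():
        # Create the session lazily, the first time this thread needs one.
        if not hasattr(local, 'session'):
            local.session = session_factory()
        return local.session

    def worker(url: str, path: Path) -> Path:
        download(url, path.parent, path.name, get_session())
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the results, so an exception raised in any worker
        # is re-raised here instead of being silently dropped.
        return list(executor.map(worker, urls, paths))
```

With the urls and paths lists from the report, this would be called as download_all(urls, paths, asf_search.download_url, asf_search.ASFSession). At most max_workers sessions get created, which limits the overhead but does not remove it, and it does not address why a shared session mixes up the filenames in the first place.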
