[Bug] Incorrect behavior when using `ThreadPoolExecutor` to download multiple files #282

forrestfwilliams · 2024-03-21T20:50:56Z

Describe the bug
To download files under specific file names, you must use the download_url function. However, when download_url is used together with ThreadPoolExecutor from concurrent.futures and a single shared session, the destination paths get mixed up: depending on the order in which products become ready, each product is written to an arbitrary one of the specified file names rather than the one paired with its URL.

To Reproduce

from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from pathlib import Path

import asf_search


urls = [
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.xml',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.xml',
]
paths = [
    Path('./burst_20240313.tif'),
    Path('./burst_20240301.tif'),
    Path('./burst_20240313.xml'),
    Path('./burst_20240301.xml'),
]

session = asf_search.ASFSession()
with ThreadPoolExecutor() as executor:
    executor.map(
        asf_search.download_url,
        urls,
        [x.parent for x in paths],
        [x.name for x in paths],
        repeat(session, len(urls)),
    )

Expected behavior
The above should produce the same files as this sequential version:

from pathlib import Path

import asf_search


urls = [
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.tiff',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240313T140832_20240313T140859_052964_06694F_90B4/IW1/VV/7.xml',
    'https://sentinel1-burst.asf.alaska.edu/S1A_IW_SLC__1SDV_20240301T140832_20240301T140859_052789_06635B_791A/IW1/VV/7.xml',
]
paths = [
    Path('./burst_20240313.tif'),
    Path('./burst_20240301.tif'),
    Path('./burst_20240313.xml'),
    Path('./burst_20240301.xml'),
]

session = asf_search.ASFSession()
for url, path in zip(urls, paths):
    asf_search.download_url(url, path.parent, path.name, session)


forrestfwilliams · 2024-03-25T19:17:01Z

Notably, using a separate session for each thread resolves the issue. However, this workaround is less than ideal because it creates a lot of unnecessary overhead.
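
For anyone who needs a stopgap, here is a minimal sketch of that workaround that creates one session per worker thread instead of one per download, which should keep the overhead bounded by the pool size. The helper names (get_thread_session, download_all) are hypothetical and not part of asf_search. The download callable is assumed to accept (url, path, filename, session) positionally, as download_url is called in the examples above. This does not address the underlying bug with a shared session.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Per-thread storage: each worker thread sees its own `session` attribute.
_thread_local = threading.local()


def get_thread_session(session_factory):
    """Return the calling thread's session, creating it on first use."""
    session = getattr(_thread_local, 'session', None)
    if session is None:
        session = session_factory()
        _thread_local.session = session
    return session


def download_all(download, session_factory, urls, paths, max_workers=4):
    """Call download(url, directory, filename, session) for each pair.

    `download` and `session_factory` are passed in so the sketch does not
    assume anything about the library beyond the call shape used above.
    """
    def worker(url, path: Path):
        session = get_thread_session(session_factory)
        return download(url, path.parent, path.name, session)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the results so worker exceptions are raised here.
        return list(executor.map(worker, urls, paths))
```

With the lists from the reproduction above, this would be called as download_all(asf_search.download_url, asf_search.ASFSession, urls, paths). At most max_workers sessions are created, regardless of how many URLs are downloaded.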
