Enhance download.py with Parallel Download Capability #1

Open
wants to merge 2 commits into
base: main


abilzerian

This pull request adds parallel downloads to the download.py script using ThreadPoolExecutor from concurrent.futures. The change is intended to speed up downloading files from the specified source.

Key Changes:

  • Integrated ThreadPoolExecutor from concurrent.futures for parallel file downloads.
  • Added error handling for HTTP errors and JSON decoding, improving the script's robustness in various network conditions.
  • Maintained the original script's functionality, ensuring compatibility with existing usage patterns.
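The key changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this pull request: the function names `fetch_json` and `fetch_all`, the use of `urllib.request` as the HTTP client, and the default of 8 workers are all assumptions made for the example.

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_json(url, timeout=30):
    """Hypothetical helper: fetch url and decode its body as JSON.

    Returns a (url, data, error) tuple so that one failed download
    does not abort the whole batch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return url, json.loads(response.read()), None
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before OSError, which it subclasses.
        return url, None, f"HTTP error {exc.code}"
    except OSError as exc:
        # Covers URLError and socket timeouts.
        return url, None, f"network error: {exc}"
    except ValueError as exc:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return url, None, f"invalid JSON: {exc}"


def fetch_all(urls, fetch=fetch_json, max_workers=8):
    """Run fetch over urls in a thread pool; map each url to (data, error)."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch, url) for url in urls]
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures):
            url, data, error = future.result()
            results[url] = (data, error)
    return results
```

Returning the error as a value instead of raising keeps the remaining downloads running; the caller can then report or retry the failed URLs after the pool has finished.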

Benefits:

  • Increased Efficiency: Downloading multiple files simultaneously reduces the overall completion time, which is especially beneficial when dealing with large numbers of files.
  • Improved Error Handling: The script now gracefully handles potential errors, providing clear messages for troubleshooting.
  • Scalability: The number of workers for ThreadPoolExecutor can be adjusted, allowing users to optimize performance based on their specific environment and network conditions.

This update aims to enhance user experience by reducing waiting times and improving the reliability of the download process. It's particularly useful for bulk downloading tasks in environments with robust network infrastructure.

Add parallel download functionality with error handling
@CristianArean

CristianArean commented Jun 23, 2024

I think there is a problem with the edge case where a thread has already downloaded a file:
 42%|██████████████████████████████████████████████████████████████▋ | 1682/4049 [00:48<01:08, 34.35it/s]
Traceback (most recent call last):
  File "/home/cristian/Documents/download.py", line 71, in <module>
    future.result()
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cristian/Documents/download.py", line 21, in download_file
    os.makedirs(parent_folder)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: '/home/...
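The traceback is consistent with two threads trying to create the same parent folder: one creates it first, and the other's `os.makedirs(parent_folder)` call then raises `FileExistsError`. A minimal sketch of one possible fix, assuming `download_file` only needs the folder to exist before it writes the file, is to pass `exist_ok=True`. The helper name `ensure_parent_folder` is hypothetical and is not taken from the script.

```python
import os


def ensure_parent_folder(file_path):
    """Hypothetical helper: create the parent folder of file_path if needed.

    Safe to call from several threads for paths that share a folder.
    """
    parent_folder = os.path.dirname(file_path)
    if parent_folder:
        # exist_ok=True makes this a no-op when the folder already exists,
        # including when another thread created it a moment earlier.
        os.makedirs(parent_folder, exist_ok=True)
    return parent_folder
```

With `exist_ok=True` no separate `os.path.exists` check is needed beforehand, which also removes the window between checking and creating in which another thread can create the folder.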
