Multiprocessing with process safe file writes. #8

Open
wants to merge 22 commits into base: main

@Greeley Greeley commented Mar 31, 2024

I lost my Humble library, so I re-downloaded this Humble Bundle downloader, thinking I could clone my whole library down again. It was incredibly slow: it was still running when I woke up. Python is faster than that...

So I forked the original and made it multiprocessed, with a file-writing queue. I also broke up a lot of the responsibilities and made some things a bit easier to read. I then found this repository, which had quite a few changes and, as someone else mentioned, "Seems to do a good job at consolidating the other forks." Based on that comment, I decided to rebase my changes onto the main of this repo.
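As a rough illustration of the file-writing queue idea (a minimal sketch, not the code in this PR): worker processes never open the shared file themselves; they put lines on a `multiprocessing.Queue`, and one writer drains it. The names `writer_loop`, `download_worker`, `run_demo`, and the `results.txt` file are all hypothetical.

```python
import multiprocessing as mp
import os
import tempfile

SENTINEL = None  # hypothetical marker: tells the writer no more lines are coming


def writer_loop(queue, path):
    """Append every queued line to path; this is the only code that touches the file."""
    written = 0
    with open(path, "a", encoding="utf-8") as handle:
        while True:
            line = queue.get()  # blocks until a worker (or the parent) sends something
            if line is SENTINEL:
                break
            handle.write(line + "\n")
            handle.flush()  # push each line out promptly in case the run is interrupted
            written += 1
    return written


def download_worker(item, queue):
    """Stand-in for a real download: it only reports what it finished."""
    queue.put(f"{item},done")


def run_demo():
    """Start one writer and three workers, then return the sorted lines written."""
    path = os.path.join(tempfile.mkdtemp(), "results.txt")
    queue = mp.Queue()
    writer = mp.Process(target=writer_loop, args=(queue, path))
    writer.start()
    workers = [
        mp.Process(target=download_worker, args=(name, queue))
        for name in ("a", "b", "c")
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    queue.put(SENTINEL)  # all workers are done, so the writer can stop
    writer.join()
    with open(path, encoding="utf-8") as handle:
        return sorted(handle.read().splitlines())
```

Calling `run_demo()` should happen under an `if __name__ == "__main__":` guard so that it also works with the `spawn` start method. Funnelling all writes through one process avoids interleaved or half-written lines without needing file locks.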

I changed the way caching works; using JSON for on-disk caching is kind of wild... I'm not entirely sure what the source of the cache file corruption was, but the new cache should cope with cancelling mid-run, interruptions, or incomplete downloads, especially with the changes made in this repo.
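One way an append-only CSV cache can tolerate interruptions, sketched under assumptions: each completed download appends one row, and the loader skips any row that does not have the expected number of fields. The three-column layout and the names `append_entry` and `load_entries` are hypothetical, not taken from this PR.

```python
import csv
import os

FIELDS = 3  # assumed row layout: key, filename, md5


def append_entry(path, key, filename, md5):
    """Append one finished download as a single CSV row."""
    with open(path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([key, filename, md5])


def load_entries(path):
    """Return all complete rows; rows with the wrong field count are ignored."""
    if not os.path.exists(path):
        return []
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle) if len(row) == FIELDS]
```

Unlike a single JSON document, a row cut off by a cancelled run only affects that row; earlier rows stay readable. A row truncated inside its last field would still pass this length check, so a real implementation may want a stronger validity test.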

I have not multi-processed trove yet, but I'll do that soon. Converting the old cache also needs to happen on first startup, but I haven't done that yet either.

Below is an itemized list of changes. It's probably not well written, but I'm available to answer any questions.

  • added a lambda to be pedantic about the type map expects.
  • added exception package with InvalidCookieException.py
  • added multiprocess package with exorcise_daemons.py
  • exorcise_daemons#ExorcistPool creates multiprocessing pools without creating daemons
  • mapped self._process_order_id to multiprocess pool of purchase_keys.
  • "Downloading:" now prints only the basename, to avoid so much console spam.
  • caching with json is pretty crazy, so I've switched it to a csv.
  • added _strtobool
  • added _strtonone, which will be useful when converting the old json cache
  • cache object now inherits from list.
  • file operations moved to file_ops.py
  • readability changes... sorry
  • changes for consistency across trove and non-trove cache
  • no tmp cache anymore
  • cache is a visible file.
  • intellij .idea to .gitignore

Greeley and others added 22 commits March 29, 2024 22:26
- lambda to be pedantic about the type map expects.
- added exception package with InvalidCookieException.py
- added multiprocess package with exorcise_daemons.py
- exorcise_daemons#ExorcistPool creates multiprocessing pools without creating daemons
- mapped self._process_order_id to multiprocess pool of purchase_keys.
- updated trove_base_url
- Downloading: basename now prints to avoid so much console spam.
- join the pool to finalize it properly.
- created CacheData class in cache.py
- read cache data in for every write.
- caching with json is pretty crazy, so I've switched it to a csv.
- added _strtobool
- added _strtonone will be useful when converting old json cache
- cache object inheriting list.
- file operations moved to file_ops.py
- readability changes... sorry
- changes for consistency across trove and non-trove cache
- don't cancel a rebase.
- multi-processing
- file safety
- no tmp cache anymore
- cache is a visible file.
- intellij .idea to .gitignore
…ytake

# Conflicts:
#	humblebundle_downloader/download_library.py