Multiprocessing with process safe file writes. #8

Greeley · 2024-03-31T10:37:36Z

I lost my Humble library, so I re-downloaded this Humble Bundle downloader, thinking I could clone my whole library down again. It was incredibly slow; it was still running when I woke up. Python is faster than that...

So I forked the original and made it multiprocessed with a file-writing queue. I also broke up a lot of the responsibilities and made some things a bit easier to read. I then found this repository, which had quite a few changes and, as someone else mentioned, "Seems to do a good job at consolidating the other forks." Based on that comment, I decided to rebase my changes on the main of this repo.
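For anyone unfamiliar with the pattern, here is a minimal sketch of a pool of workers feeding a single writer through a queue. It is not the code in this PR: `writer_loop`, `download_order`, `SENTINEL`, and the `cache.csv` file name are made-up names for illustration, and the download itself is stubbed out.

```python
import csv
import multiprocessing as mp
from pathlib import Path

SENTINEL = None  # hypothetical marker that tells the writer to stop


def writer_loop(queue, cache_path):
    """Drain rows from the queue and append them to one CSV file.

    Only this function touches the file, so rows coming from
    different workers cannot interleave.
    """
    written = 0
    with open(cache_path, "a", newline="") as handle:
        out = csv.writer(handle)
        while True:
            row = queue.get()
            if row is SENTINEL:
                break
            out.writerow(row)
            handle.flush()  # push each row to disk as soon as it arrives
            written += 1
    return written


def download_order(args):
    """Hypothetical worker: the real download is omitted here."""
    order_id, queue = args
    queue.put([order_id, "done"])  # report the result to the writer
    return order_id


if __name__ == "__main__":
    manager = mp.Manager()
    queue = manager.Queue()  # a manager queue can be passed to pool workers
    cache_path = Path("cache.csv")

    writer = mp.Process(target=writer_loop, args=(queue, cache_path))
    writer.start()

    with mp.Pool(processes=4) as pool:
        pool.map(download_order, [(key, queue) for key in ["a1", "b2", "c3"]])

    queue.put(SENTINEL)  # all workers are finished, so stop the writer
    writer.join()
```

The point of the design is that the workers never open the file themselves; they only hand rows to the one process that does.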

I changed the way caching works; using JSON for on-disk caching is kind of wild... I'm not entirely sure what the source of the cache file corruption was, but this should hold up when a run is cancelled midway, gets interrupted, or leaves incomplete downloads, especially with the changes made in this repo.
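To illustrate why a row-per-entry file tolerates interruptions better than one big JSON document, here is a rough sketch, not the actual `cache.py`. The column names in `FIELDS` and the helpers `append_entry` and `load_cache` are assumptions made for the example.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real cache may use different fields.
FIELDS = ("key", "url_last_modified", "trove")


def append_entry(cache_path, row):
    """Append one finished download; earlier rows are never rewritten."""
    with open(cache_path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)


def load_cache(cache_path):
    """Return all rows that have the expected number of fields.

    A row cut short by an interrupted run usually has too few fields,
    so it is skipped and the rest of the cache stays usable.
    """
    path = Path(cache_path)
    if not path.exists():
        return []
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle) if len(row) == len(FIELDS)]
```

With JSON, a write that stops halfway can leave the whole document unparseable; here the worst case is one damaged row at the end of the file.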

I have not multiprocessed trove yet, but I'll do that soon. Converting the old cache also needs to happen on first startup, but I haven't done that yet either.

Below is an itemized list of changes. It's probably not well written, but if there are any questions I'm available to answer.

- lambda to be pedantic about the type map expects.
- added exception package with InvalidCookieException.py
- added multiprocess package with exorcise_daemons.py
- exorcise_daemons#ExorcistPool creates multiprocessing pools without creating daemons
- mapped self._process_order_id to multiprocess pool of purchase_keys.
- Downloading: basename now prints to avoid so much console spam.
- caching with json is pretty crazy, so I've switched it to a csv.
- added _strtobool
- added _strtonone will be useful when converting old json cache
- cache object inheriting list.
- file operations moved to file_ops.py
- readability changes... sorry
- changes for consistency across trove and non-trove cache
- no tmp cache anymore
- cache is a visible file.
- intellij .idea to .gitignore


Greeley and others added 22 commits March 29, 2024 22:26

_get_trove_products

27886af

- updated trove_base_url

multiprocessing

7953b5b

- Downloading: basename now prints to avoid so much console spam. - join the pool to finalize it properly.

removed unnecessary import to base __init__.py that I pushed accidentally.

b157ed1

Process Safe File Writing

1bc7f52

- created CacheData class in cache.py - read cache data in for every write.

Merge pull request #1 from Greeley/feature/multi-process

f36e619

Feature/multi process

Merge pull request #2 from Greeley/feature/trove_update

d4129d9

_get_trove_products

Fixed Changes that got wiped

71525c9

- don't cancel a rebase.

multiprocessing

b768dbe

- Downloading: basename now prints to avoid so much console spam. - join the pool to finalize it properly.

removed unnecessary import to base __init__.py that I pushed accidentally.

2d9e17a

Process Safe File Writing

4a96a8d

- created CacheData class in cache.py - read cache data in for every write.

_get_trove_products

22b5d9a

- updated trove_base_url

Fixed Changes that got wiped

33860d9

- don't cancel a rebase.

Rebased on missytake/main

4724394

- multi-processing - file safety - no tmp cache anymore - cache is a visible file. - intellij .idea to .gitignore

Merge remote-tracking branch 'origin/merge_missytake' into merge_missytake

da6a91e

# Conflicts: # humblebundle_downloader/download_library.py

Rebased on missytake/main

6505bef

- multi-processing - file safety - no tmp cache anymore - cache is a visible file. - intellij .idea to .gitignore

changed uploaded_at to string again

2ace093

removed comment

fdbeb9b

changed file_type['md5'] to a get so we can get none instead

d30adb3
