This repository has been archived by the owner on Sep 5, 2024. It is now read-only.

Update download url template #41

Open
wants to merge 3 commits into
base: master
from


rokasramas

Common Crawl recently changed its base URL to "https://data.commoncrawl.org/" (see blog post), which broke page downloads (fixes #40).
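Here is a minimal sketch of what the URL change means for a single download. It assumes each index record carries the standard Common Crawl CDX fields filename, offset and length. The helper names are hypothetical, and this is not the PR's actual diff. The sketch also explains the error in #40: Python's gzip module raises "Not a gzipped file (b'<?')" when a response body starts with "<?", which suggests the old URL was returning an XML error document instead of a gzip stream.

```python
import gzip

# Base URL named in this PR; the helpers below are a hypothetical sketch.
BASE_URL = "https://data.commoncrawl.org/"


def build_download_url(filename: str) -> str:
    """Join the WARC path from an index record onto the base URL."""
    return BASE_URL + filename.lstrip("/")


def build_range_header(offset: int, length: int) -> dict:
    """Range header selecting exactly one record (the HTTP end byte is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def decompress_record(data: bytes) -> bytes:
    """Inflate one gzip-compressed WARC record fetched with a Range request."""
    if not data.startswith(b"\x1f\x8b"):
        # Not gzip data: most likely an error page, e.g. XML starting with b'<?'.
        raise ValueError(f"expected gzip data, got {data[:2]!r}")
    return gzip.decompress(data)


if __name__ == "__main__":
    # Hypothetical record; real values come from a CDX index query.
    record = {"filename": "crawl-data/EXAMPLE/example.warc.gz", "offset": "1000", "length": "500"}
    print(build_download_url(record["filename"]))
    print(build_range_header(int(record["offset"]), int(record["length"])))
```

With an HTTP client such as requests, you would pass `requests.get(url, headers=header).content` to `decompress_record`. Checking the magic bytes up front turns a confusing gzip error into a message that points at the server response.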

It also appears that the 'charset' attribute was renamed to 'encoding' at some point.
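The comment does not say which object carries this attribute. If it is a key in a dict-like record (an assumption here), a tolerant lookup keeps working with data from both before and after the rename. The names `pick_encoding` and `decode_html` are hypothetical and do not come from comcrawl.

```python
from typing import Mapping


def pick_encoding(record: Mapping[str, str], default: str = "utf-8") -> str:
    # Prefer the newer 'encoding' key named in this PR, fall back to the old
    # 'charset' key, then to a default when neither is present or non-empty.
    return record.get("encoding") or record.get("charset") or default


def decode_html(raw: bytes, record: Mapping[str, str]) -> str:
    try:
        # errors="replace" keeps a mislabelled page from aborting the download.
        return raw.decode(pick_encoding(record), errors="replace")
    except LookupError:
        # The label is not a codec Python recognises; fall back to UTF-8.
        return raw.decode("utf-8", errors="replace")
```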

@georgegach

The patch works great. +1 from me. Thanks!

@GMalueg

GMalueg commented Jun 27, 2022

How do I access this fix? I recently installed the package and the error is still occurring.

@rokasramas
Author

rokasramas commented Jul 5, 2022

@GMalueg, you can install from my fork using:

    pip install git+https://github.com/rokasramas/comcrawl.git#egg=comcrawl
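If the error persists after installing, the PyPI release may still be the one Python imports. A sketch of one way to check, assuming pip recorded a PEP 610 `direct_url.json` file for the Git install (recent pip versions do this for VCS installs). The helper names are hypothetical.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def vcs_source(direct_url_json: Optional[str]) -> Optional[str]:
    """Return 'url@commit' from a PEP 610 direct_url.json payload, or None."""
    if not direct_url_json:
        return None
    data = json.loads(direct_url_json)
    vcs_info = data.get("vcs_info")
    if vcs_info is None:
        return None  # e.g. a local directory or archive install, not a VCS checkout
    return f"{data['url']}@{vcs_info.get('commit_id', '?')}"


def installed_vcs_source(dist_name: str) -> Optional[str]:
    """Report where an installed distribution came from, if it was a VCS install."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    # read_text returns None when the file is absent, e.g. a plain PyPI install.
    return vcs_source(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print(installed_vcs_source("comcrawl"))
```

A result of None for "comcrawl" suggests that the fork is not the installed copy.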

Development

Successfully merging this pull request may close these issues.

client.download() not working. Gives error (Not a gzipped file (b'<?')).
3 participants