Issue getting write_html past generate_html #23

Open
5000thinmints opened this issue Jul 18, 2020 · 4 comments

Comments

@5000thinmints

Here's the error on Windows 10, having just installed the latest Python and snudown. My data folder is about a gig and a half. I assumed min score, min comments, and hide deleted comments are all set to something by default.

E:\Myfolder\reddittohtml>write_html.py
Traceback (most recent call last):
  File "E:\Myfolder\reddittohtml\write_html.py", line 774, in <module>
    generate_html(args.min_score, args.min_comments, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 114, in generate_html
    raw_links = load_links(d, sub, True)
  File "E:\Myfolder\reddittohtml\write_html.py", line 625, in load_links
    comments_file_path = daily_path + '/' + link_row['id'] + '.csv'
KeyError: 'id'

Here's what happens when I do set min score, min comments, etc.:

E:\Myfolder\reddittohtml>write_html.py --min-score -4 --min-comments 2 --hide-deleted-comments
Traceback (most recent call last):
  File "E:\Myfolder\reddittohtml\write_html.py", line 774, in <module>
    generate_html(args.min_score, args.min_comments, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 114, in generate_html
    raw_links = load_links(d, sub, True)
  File "E:\Myfolder\reddittohtml\write_html.py", line 625, in load_links
    comments_file_path = daily_path + '/' + link_row['id'] + '.csv'
KeyError: 'id'
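
For anyone else hitting `KeyError: 'id'` here: the traceback says the row passed around in `load_links` has no `id` key. One possible cause (an assumption, not confirmed in this thread) is a links CSV whose header row is missing or lacks an `id` column. A minimal sketch for checking a file's header, assuming the data is read with `csv.DictReader`; the helper name `missing_columns` and the sample data are hypothetical and not part of write_html.py:

```python
import csv
import io

def missing_columns(csv_text, required=("id",)):
    """Return (fieldnames, missing) for CSV content given as text.

    Hypothetical helper: 'missing' lists the required column names
    that do not appear in the header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []  # None for an empty file
    missing = [name for name in required if name not in fieldnames]
    return fieldnames, missing

# Made-up sample data: one file with an 'id' column, one without.
good = "id,score,title\nabc123,5,hello\n"
bad = "score,title\n5,hello\n"

print(missing_columns(good))
print(missing_columns(bad))
```

Running something like this over the files in the data folder would show which ones, if any, would trip the `link_row['id']` lookup.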

@libertysoft3
Owner

I pushed a fix, will you try again?

@5000thinmints
Author

So it took longer to print out an error this time. If it helps, some of the subreddits I'm pulling from frequently use non-English characters (e.g. Hindi, Korean).

E:\Myfolder\reddittohtml>write_html.py
Traceback (most recent call last):
  File "E:\Myfolder\reddittohtml\write_html.py", line 774, in <module>
    generate_html(args.min_score, args.min_comments, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 119, in generate_html
    write_link_page(subs, l, sub, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 288, in write_link_page
    '###BODY###': snudown.markdown(c['body'].replace('&gt;','>')),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 382: invalid continuation byte
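
A note on reading this error: 0xc2 is the lead byte of a two-byte UTF-8 sequence, so "invalid continuation byte" means the byte that follows it is not a valid second byte. That pattern often shows up when text was written or altered in some other encoding, though that is only a guess for this case. A minimal sketch for locating the offending bytes in some raw data; the helper name `describe_decode_error` and the sample bytes are hypothetical:

```python
def describe_decode_error(data: bytes, context: int = 10):
    """Return (position, nearby bytes) for the first invalid UTF-8
    sequence in data, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        start = max(err.start - context, 0)
        return err.start, data[start:err.end + context]
    return None

# Made-up sample: 0xc2 followed by a space, which is not a continuation byte.
sample = b"caf\xc2 au lait"
print(describe_decode_error(sample))
```

Seeing the bytes around the reported position can help tell a damaged comment body apart from a wrong encoding assumption.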

@libertysoft3
Owner

Cool, forward progress. I'm not too hip with Windows, but can you try running the 2 commands listed here under "Windows users may need to run"? https://github.com/libertysoft3/reddit-html-archiver/blob/master/README.md#install

@5000thinmints
Author

Running the commands in cmd as admin still gives the same error:

E:\Myfolder\reddittohtml>chcp 65001
Active code page: 65001

E:\Myfolder\reddittohtml>set PYTHONIOENCODING=utf-8

E:\Myfolder\reddittohtml>write_html.py
Traceback (most recent call last):
  File "E:\Myfolder\reddittohtml\write_html.py", line 774, in <module>
    generate_html(args.min_score, args.min_comments, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 119, in generate_html
    write_link_page(subs, l, sub, hide_deleted_comments)
  File "E:\Myfolder\reddittohtml\write_html.py", line 288, in write_link_page
    '###BODY###': snudown.markdown(c['body'].replace('&gt;','>')),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 382: invalid continuation byte
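
One thing that may be worth ruling out: `chcp 65001` and `PYTHONIOENCODING` only change the console code page and Python's standard streams. They do not change the default encoding `open()` uses for files, which on Windows is usually a legacy code page unless `encoding` is passed explicitly. Whether write_html.py or the fetch script already passes an encoding is not known from this thread, and the traceback points into `snudown.markdown` rather than a file read, so this may not be the cause. A minimal sketch of reading and writing a CSV with an explicit encoding; the helper name `read_rows_utf8` and the sample row are hypothetical:

```python
import csv
import os
import tempfile

def read_rows_utf8(path):
    """Read a CSV as UTF-8 regardless of console code page or locale."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    # Write with the same explicit encoding so both sides agree.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "body"])
        writer.writeheader()
        writer.writerow({"id": "abc123", "body": "नमस्ते 안녕하세요"})
    rows = read_rows_utf8(path)

print(rows[0]["id"])
```

If the data files were written without an explicit encoding, re-fetching them after such a change would likely be needed for the read side to match.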
