Hello!
Thanks for the useful idea!
I tried to use your program on a big archive under a UTF-8 locale, and it crashed with the following stack trace:
Traceback (most recent call last):
  File "tarindexer.py", line 123, in <module>
    main()
  File "tarindexer.py", line 118, in main
    indextar(dbtarfile,indexfile)
  File "tarindexer.py", line 66, in indextar
    outfile.write(rec)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 40-47: surrogates not allowed
The file name that most likely triggered the crash is
\317\360\356\341\353\345\354\373\ \341\345\347\356\357\340\361\355\356\361\362\350\ \342\ \310\322.pdf
(as output by ls -b), which indeed does not look like valid UTF-8.
Unfortunately I cannot send you the archive, mostly because the file and the surrounding files are rather big.
While having this file in the archive is my fault, I think the program should avoid the crash, perhaps by printing ls -b-style output instead.
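To illustrate what I think is happening, here is a minimal sketch, not a patch against tarindexer.py. It assumes the name reaches the program as a string decoded with the surrogateescape error handler (which, as far as I know, is the default for Python's tarfile module), so the invalid bytes turn into lone surrogates that a strict UTF-8 write rejects. RAW_NAME is just the first four bytes of my file name, and encode_record and octal_escape are hypothetical helpers, not functions from the program.

```python
# Hypothetical sample: the first four bytes of the file name from this report.
RAW_NAME = b"\317\360\356\341.pdf"

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) under surrogateescape.
NAME = RAW_NAME.decode("utf-8", errors="surrogateescape")


def encode_record(text, errors="strict"):
    """Encode an index record to UTF-8 with the given error handler."""
    return text.encode("utf-8", errors=errors)


def octal_escape(text):
    """Render a name roughly like ls -b: printable ASCII kept, other bytes as octal."""
    raw = text.encode("utf-8", errors="surrogateescape")
    return "".join(
        # Keep printable ASCII except space and backslash; escape everything else.
        chr(b) if 0x20 < b < 0x7F and b != 0x5C else "\\%03o" % b
        for b in raw
    )


if __name__ == "__main__":
    try:
        encode_record(NAME)  # strict encoding, as in the failing write
    except UnicodeEncodeError as exc:
        print("strict encoding fails:", exc.reason)
    # surrogateescape restores the original bytes unchanged.
    print(encode_record(NAME, "surrogateescape") == RAW_NAME)
    # An escaped, always-writable rendering of the name.
    print(octal_escape(NAME))
```

Either opening the index file with errors="surrogateescape" (which would write the original bytes back) or escaping the name before writing should avoid the crash. Note that this sketch escapes a space as \040 rather than the "\ " that ls -b prints, and it would also escape valid non-ASCII characters, so it is only an approximation.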