decoding to utf-8 issues #20

Open
noah opened this issue Jan 3, 2017 · 3 comments

@noah

noah commented Jan 3, 2017

I am using your excellent library to extract EXIF from a largish repository of images (100k+). I've encountered an encoding-related issue. Basically exiftool returns a garbage tag value and it breaks the call to decode('utf-8') in execute_json().

If I'm reading it correctly, your code assumes that whatever it reads from exiftool can be decoded as UTF-8 (and is valid JSON). But this does not always seem to be the case:

% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG
SerialNumber                    : #ທ.L.9.-.<.#K%
% exiftool -s -SerialNumber -charset UTF8 P3090087.JPG > file
% cat -v test.json 
Serial Number                   : M-O;#M-`M-:M-^W.M--M-OM-ILM-i}.9.-M-..M-vM-^PM-=M-#<.M-^QM-dG#M-%K%
% exiftool -j -SerialNumber P3090087.JPG     
[{
  "SourceFile": "P3090087.JPG",
  "SerialNumber": "?;#ທ\u0008???L?}\u001F9\u000B-?\u001E<\u0014??G#?K%"
}]

Per the exiftool author, the fix for this seems to be to add the -b (binary output) flag to the call to Popen. This way base64-encoded strings are returned, which cannot trigger a unicode decoding error. Overall encoding is pretty tricky so I thought I'd post and see if you think this is a bug. If nothing else perhaps this will be useful to someone else with a similar problem. Let me know if you'd like further diagnostics.
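For anyone hitting the same error, here is a minimal sketch of the two workarounds on the Python side. It is not the library's code: `parse_exiftool_json` and `decode_base64_values` are hypothetical helper names, and the `base64:` prefix on binary values is an assumption about how `exiftool -j -b` marks them, so check it against your ExifTool version.

```python
import base64
import json


def parse_exiftool_json(raw: bytes) -> list:
    """Parse the bytes read from `exiftool -j` without raising on bad UTF-8.

    Hypothetical helper: undecodable bytes become U+FFFD instead of
    triggering UnicodeDecodeError, so one garbage tag cannot abort a batch.
    """
    text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


def decode_base64_values(record: dict) -> dict:
    """Turn "base64:..." strings (assumed -b output format) back into bytes."""
    prefix = "base64:"
    decoded = {}
    for key, value in record.items():
        if isinstance(value, str) and value.startswith(prefix):
            decoded[key] = base64.b64decode(value[len(prefix):])
        else:
            decoded[key] = value
    return decoded


# Simulated exiftool output containing one byte that is not valid UTF-8.
raw_output = b'[{"SourceFile": "a.jpg", "SerialNumber": "\xcf;#"}]'
records = parse_exiftool_json(raw_output)

# Simulated record as it might look when -b is passed.
binary_record = decode_base64_values(
    {"SourceFile": "a.jpg", "SerialNumber": "base64:zzsj"}
)
```

The lenient decode loses the original bytes, while the `-b` route keeps them intact, which matters if the raw tag value is needed later.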

@smarnach
Owner

smarnach commented Jan 8, 2017

@noah Thanks a lot for the report. This library hasn't gotten the attention it deserves for years now, but I hope to get back to it very soon. At first sight, exiftool -j yielding invalid JSON seems like a bug in ExifTool to me, but I'll have to take a closer look to be sure.

@rusq

rusq commented Feb 12, 2017

@smarnach I know you know but there are 5 pull requests waiting for your attention.

@CTimmerman

Here's a test PNG featuring a topless AI girl that works in exiftool but breaks Python wrappers that use text mode: breaks_exif_wrapper.zip
