wrong encoding on exif strings (bug) #11

Open
acidicX opened this issue Dec 20, 2015 · 11 comments
Comments

@acidicX

acidicX commented Dec 20, 2015

Hey there,

First things first: thanks for the module!

I'm having encoding problems with certain EXIF strings, e.g. the ImageDescription tag.

This is what other EXIF readers tell me:
Fahrt über den Ozean
but I am getting this with exif-parser:
Fahrt C<ber den Ozean

Seems to me like they are not read with the proper encoding.

I'd fix it myself, but I have no idea how the tags are encoded in the first place :/

Cheers,
acidicX

@langpavel

langpavel commented Jul 11, 2016

This could be hard. I assume this library expects UTF-8 as the encoding, but your example looks like some 8-bit encoding was used instead?
@bwindels Can you briefly describe how this works?
Thanks!

@acidicX
Author

acidicX commented Jul 15, 2016

I'm not sure if UTF-8 is in the EXIF specs. The image descriptions were edited by Adobe Lightroom, but Adobe sometimes hates standards (check the SVG export of Illustrator 👎 ).

@langpavel

Hi, I figured this out. Using the ArrayBuffer, DataView and TextDecoder APIs (with a polyfill where needed) you can read UTF-8 strings. And yes, UTF-8 is the best choice: it is backward compatible with ASCII, and I confirmed it works for Czech.
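For readers who want to try the approach described above, here is a minimal sketch of reading a UTF-8 string out of an `ArrayBuffer` with `DataView` and `TextDecoder`. The helper name `readUtf8String` and the sample bytes are assumptions made for illustration; this is not code from exif-parser or from the comment author.

```javascript
// Hypothetical helper (not part of exif-parser): decodes `length` bytes
// starting at `offset` in `buffer` as UTF-8.
function readUtf8String(buffer, offset, length) {
  const view = new DataView(buffer, offset, length);
  const text = new TextDecoder('utf-8').decode(view);
  // EXIF ASCII values are NUL-terminated, so cut at the first NUL if present.
  const nul = text.indexOf('\0');
  return nul === -1 ? text : text.slice(0, nul);
}

// Sample data: two filler bytes followed by a NUL-terminated UTF-8 string.
const bytes = new TextEncoder().encode('xxFahrt über den Ozean\0');
const description = readUtf8String(
  bytes.buffer,
  bytes.byteOffset + 2, // skip the two filler bytes
  bytes.length - 2
);

console.log(description);
```

In environments without a native `TextDecoder`, a polyfill would have to be loaded before this runs.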

@acidicX
Author

acidicX commented Nov 29, 2016

@langpavel I just found that the bug still exists. Did you fork the lib to resolve it? It seems that PRs are not actively worked on anymore... Or did you find a better lib?

@langpavel

Hi, I have no time to work on this, sorry..

@bwindels
Owner

bwindels commented Jul 9, 2017

EXIF assumes ASCII and doesn't have a field to specify an encoding, so without using an encoding detection library, this will be hard to do. Since this library needs to work in the browser as well as in node.js, I'd be hesitant to add a big thing like encoding detection to it.

@bwindels bwindels closed this as completed Jul 9, 2017
@bwindels
Owner

bwindels commented Jul 9, 2017

I did notice that on node.js, the library forcefully decodes using ASCII, while in the browser it uses UTF-16 (compatible with ASCII). Ideally, it should decode using UTF-8 on both platforms, since that's what's most widely used and it is compatible with ASCII as well. In fact, your example text might well be encoded as UTF-8. Browser support for UTF-8 decoding is not ubiquitous, so this might be hard to do cross-platform. I'll have a look.
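The garbled text from the original report is consistent with this explanation. As a small Node.js check (an illustration only, not the library's actual code path), decoding the UTF-8 bytes of the description with the `'ascii'` encoding reproduces the reported string exactly:

```javascript
// Bytes as they would sit in the file if the description was written as UTF-8.
const utf8Bytes = Buffer.from('Fahrt über den Ozean', 'utf8');

// Node's 'ascii' decoding clears the high bit of every byte, so the two
// bytes of "ü" (0xC3 0xBC) come out as 0x43 0x3C, i.e. "C<".
const asAscii = utf8Bytes.toString('ascii');

// Decoding the same bytes as UTF-8 restores the original text.
const asUtf8 = utf8Bytes.toString('utf8');

console.log(asAscii); // Fahrt C<ber den Ozean
console.log(asUtf8);  // Fahrt über den Ozean
```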

@bwindels bwindels reopened this Jul 9, 2017
@bwindels
Owner

bwindels commented Jul 9, 2017

Released 0.1.11, which uses UTF-8 on node.js. If you want, you could test whether the description in your image now decodes properly on node.js. For the browser, we'd have to use TextDecoder if supported, and fall back to fromCodePoint and fromCharCode if not. I don't have time to do this right now, but you're welcome to make a PR.
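One possible shape for the browser-side fallback described above is sketched below: use the native `TextDecoder` (the decoding counterpart of `TextEncoder`) when it exists, otherwise decode by hand with `String.fromCodePoint`. The function names `decodeUtf8` and `decodeUtf8Manually` are hypothetical and not part of exif-parser. The manual path is deliberately simplified: it does not reject overlong sequences or surrogate code points the way a spec-compliant decoder would.

```javascript
// Hypothetical helpers, not part of exif-parser.
function decodeUtf8(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  return decodeUtf8Manually(bytes);
}

// Simplified UTF-8 decoder for environments without TextDecoder.
// `bytes` is a Uint8Array (or any array-like of byte values).
function decodeUtf8Manually(bytes) {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i++];
    let codePoint;
    let extra; // number of continuation bytes expected
    if (b < 0x80) { codePoint = b; extra = 0; }
    else if ((b & 0xe0) === 0xc0) { codePoint = b & 0x1f; extra = 1; }
    else if ((b & 0xf0) === 0xe0) { codePoint = b & 0x0f; extra = 2; }
    else if ((b & 0xf8) === 0xf0) { codePoint = b & 0x07; extra = 3; }
    else { out += '\ufffd'; continue; } // stray continuation or invalid lead byte

    let valid = true;
    for (let k = 0; k < extra; k++) {
      const c = bytes[i];
      if (c === undefined || (c & 0xc0) !== 0x80) { valid = false; break; }
      codePoint = (codePoint << 6) | (c & 0x3f);
      i++;
    }
    // Emit U+FFFD for truncated sequences or out-of-range values.
    out += valid && codePoint <= 0x10ffff
      ? String.fromCodePoint(codePoint)
      : '\ufffd';
  }
  return out;
}

// "Fü€👎" encoded as UTF-8 (1-, 2-, 3- and 4-byte sequences).
const sample = new Uint8Array([0x46, 0xc3, 0xbc, 0xe2, 0x82, 0xac, 0xf0, 0x9f, 0x91, 0x8e]);
const viaNative = decodeUtf8(sample);
const viaFallback = decodeUtf8Manually(sample);
```

On well-formed input both paths should return the same string. They can differ on malformed input, where the native decoder follows the Encoding Standard's replacement rules.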

@SergioCrisostomo

@acidicX if this issue is still a problem for you, could you share the image that triggers it so others can look at it and try to fix it or suggest changes?

@bwindels
Owner

TextDecoder seems reasonably supported nowadays. Worth a look at some point.


4 participants