Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode string termination issues #54

Open
daleonov opened this issue Feb 6, 2024 · 1 comment
Open

Unicode string termination issues #54

daleonov opened this issue Feb 6, 2024 · 1 comment

Comments

@daleonov
Copy link

daleonov commented Feb 6, 2024

When I read the tags that have unicode text in them, in most cases there's some garbage before string termination character.

Example:

Byte sequence I get:
ff fe 32 00 30 00 31 00 35 00 31 0c f7 7f 00 00

"Corrected" byte order before I decode it from UTF-16:
fe ff 00 32 00 30 00 31 00 35 0c 31 7f f7 00 00

What I get:
2015ఱ翷

So, "fe ff" is a BOM, "00 32 00 30 00 31 00 35" is "2015". Then there are four garbage bytes "0c 31 7f f7" followed by string termination. Those garbage bytes are different every time I run the app, so looks like RAM garbage rather than something existing in the mp3 file. Also, those files' metadata displays correctly in other apps.

Here's what different "size" values say.

frame->header->size: 17
textFrame->data->size: 16
ID3v2_strlen: 14
ID3v2_strlent: 16

Correct length, if we count BOM and termination bytes should be 8 + 2 for BOM + 2 for terminations, equals 12. So all those values above are off.

Here's a bit of code I'm doing:

// I'm iterating through all the tags.
// At this point of code I'm sure I'm looking at Text frame, so I just cast it below.

juce::String value;  // I'm using this type externally, don't worry too much about it
ID3v2_TextFrame* textFrame = (ID3v2_TextFrame*)frame;
const char encoding = textFrame->data->encoding;

/*
I also try frame->header->size and textFrame->size.
All three of them are different values, but the actual readable section of the text is shorter that them.
*/
const int textLengthBytes = ID3v2_strlent (textFrame->data->text);

if (encoding == ID3v2_ENCODING_ISO)
{
    // ISO encoding, keeping it as is
    value = juce::String (textFrame->data->text, textLengthBytes);
}
else if (encoding == ID3v2_ENCODING_UNICODE)
{
    // UNICODE - this is where the trouble is

    DBG ("frame->header->size: " << frame->header->size);
    DBG ("textFrame->size: " << textFrame->size);

    // I'm printing those bytes here to see what's under the hood
    DBG ("====");
    for (int i = 0; i < textLengthBytes; ++i)
        DBG (static_cast<unsigned char> (textFrame->data->text[i]));
    DBG ("====");

    // This but makes sure bytes are in correct order and interprets them as UTF16
    // I've used other means of converting, same result, so I don't think the issue is here
    const auto* textAsUnicode = reinterpret_cast<juce::CharPointer_UTF16::CharType*> (textFrame->data->text);
    juce::CharPointer_UTF16 textAsUnicodeCharPointer (textAsUnicode);
    value = juce::String (textAsUnicodeCharPointer);
}

Also worth mentioning that I build the project using VS v143, but I've compiled your library using LLVM (clang) since it has Variable length arrays that aren't supported by VS. Either there's some incompatibility between binaries, or I'm missing something else there.

If you think my compiler setup is to blame, I can volunteer to fix #48 and replace those VLA's by malloc's, so I can build everything via VS toolset.

daleonov added a commit to daleonov/id3v2lib that referenced this issue Feb 6, 2024
Kept all "size" values in frames metadata untouched, but now any strlen()-like functions work correctly.
Fixes issue larsbs#54
@daleonov
Copy link
Author

daleonov commented Feb 6, 2024

Update: apparently VLA's weren't the problem. Fixed both issues in my recent PR. Tested via reading a bunch of mp3 files with unicode tags. Haven't tested anything that does writing/removing though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant