Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StlLCMapStringA.cpp: Does __crtLCMapStringA() have bogus error paths? #4984

Open
StephanTLavavej opened this issue Sep 26, 2024 · 3 comments
Labels
question Further information is requested

Comments

@StephanTLavavej
Copy link
Member

While fixing a CodeQL warning in __crtLCMapStringA(), I'm seeing code that makes no sense:

// Exit:
// Success: number of chars written to lpDestStr (including null terminator)
// Failure: 0
extern "C" _CRTIMP2 int __cdecl __crtLCMapStringA(_In_opt_z_ LPCWSTR LocaleName, _In_ DWORD dwMapFlags,
_In_reads_(cchSrc) LPCSTR lpSrcStr, _In_ int cchSrc, _Out_writes_opt_(cchDest) char* lpDestStr, _In_ int cchDest,
_In_ int code_page, _In_ BOOL bError) noexcept {
// LCMapString will map past the null terminator. We must find the null
// terminator if it occurs in the string before cchSrc characters
// and cap the number of characters to be considered.
if (cchSrc > 0) {
const int cchSrcCnt = static_cast<int>(__strncnt(lpSrcStr, cchSrc));
// Include the null terminator if the source string is terminated within
// the buffer.
if (cchSrcCnt < cchSrc) {
cchSrc = cchSrcCnt + 1;
} else {
cchSrc = cchSrcCnt;
}
}
// Convert string and return the requested information. Note that we are converting to a wide string so there is not
// a one-to-one correspondence between number of wide chars in the input string and the number of *bytes* in the
// buffer. However, there had *better be* a one-to-one correspondence between the number of wide characters and the
// number of multibyte characters or the resulting mapped string will be worthless to the user.
// find out how big a buffer we need (includes null terminator if any)
const int inbuff_size = MultiByteToWideChar(
code_page, bError ? MB_PRECOMPOSED | MB_ERR_INVALID_CHARS : MB_PRECOMPOSED, lpSrcStr, cchSrc, nullptr, 0);
if (0 == inbuff_size) {
return 0;
}
// allocate enough space for wide chars
const __crt_scoped_stack_ptr<wchar_t> inwbuffer(_malloca_crt_t(wchar_t, inbuff_size));
if (!inwbuffer) {
return 0;
}
// do the conversion
if (0 == MultiByteToWideChar(code_page, MB_PRECOMPOSED, lpSrcStr, cchSrc, inwbuffer.get(), inbuff_size)) {
return 0;
}
// get size required for string mapping
int retval = LCMapStringEx(LocaleName, dwMapFlags, inwbuffer.get(), inbuff_size, nullptr, 0, nullptr, nullptr, 0);
if (0 == retval) {
return 0;
}
if (dwMapFlags & LCMAP_SORTKEY) {
// retval is size in BYTES
if (0 != cchDest) {
if (retval > cchDest) {
return retval;
}
const auto wide_dest = reinterpret_cast<LPWSTR>(lpDestStr); // lgtm [cpp/incorrect-string-type-conversion]
// do string mapping
if (0
== LCMapStringEx(
LocaleName, dwMapFlags, inwbuffer.get(), inbuff_size, wide_dest, cchDest, nullptr, nullptr, 0)) {
return retval;
}
}
} else {
// retval is size in wide chars
int outbuff_size = retval;
// allocate enough space for wide chars (includes null terminator if any)
const __crt_scoped_stack_ptr<wchar_t> outwbuffer(_malloca_crt_t(wchar_t, outbuff_size));
if (!outwbuffer) {
return retval;
}
// do string mapping
if (0
== LCMapStringEx(LocaleName, dwMapFlags, inwbuffer.get(), inbuff_size, outwbuffer.get(), outbuff_size,
nullptr, nullptr, 0)) {
return retval;
}
if (0 == cchDest) {
// get size required
retval = WideCharToMultiByte(code_page, 0, outwbuffer.get(), outbuff_size, nullptr, 0, nullptr, nullptr);
} else {
// convert mapping
retval =
WideCharToMultiByte(code_page, 0, outwbuffer.get(), outbuff_size, lpDestStr, cchDest, nullptr, nullptr);
}
}
return retval;
}

Specifically, every single return retval; (except for the last one, which is the success path) looks like it's a failure case where we should actually be returning 0 (failure).

Early on, if we can't allocate an inwbuffer, we return 0, which makes total sense:

const __crt_scoped_stack_ptr<wchar_t> inwbuffer(_malloca_crt_t(wchar_t, inbuff_size));
if (!inwbuffer) {
return 0;
}

But later, if we can't allocate an outwbuffer, we return retval;, which makes no sense whatsoever:

const __crt_scoped_stack_ptr<wchar_t> outwbuffer(_malloca_crt_t(wchar_t, outbuff_size));
if (!outwbuffer) {
return retval;
}

Similarly for the other cases.

Notes:

  1. This code is very old and somewhat scary. We shouldn't mess with it until we have a crystal-clear understanding of what's going on, and ideally test cases that exercise various codepaths. Our test coverage here is very minimal.
    • I'm filing this issue as a reminder to thoroughly investigate this in the future - I specifically don't want to see a PR that just mechanically replaces lines with return 0; without a thorough analysis and evidence that this won't damage behavior in some way.
  2. We were confused long ago, and dllexported this (marked as _CRTIMP2), but it's used only within the STL and is not user-visible. I believe it's never been user-visible in the VS 2015+ era (need to confirm this). If it was never user-visible, then we need to maintain its signature for dllexport validation but we need not preserve its behavior.
    • I really don't like this function (beginning with the totally incorrect comment "Tries to use NLS API call LCMapStringA"), especially how it calls LCMapStringEx for LCMAP_SORTKEY (this is what I'm looking at for CodeQL). Only xstrxfrm.cpp has 2 calls with LCMAP_SORTKEY, so if we ever gain the understanding/courage to mess with this, we should consider splitting up this function's behavior.
@StephanTLavavej StephanTLavavej added the question Further information is requested label Sep 26, 2024
@cpplearner
Copy link
Contributor

__crtLCMapStringA is used by _Strxfrm, _Toupper, and _Tolower.

  • _Toupper and _Tolower each converts a single character. It should be extremely unlikely (if not impossible) for them to meet an allocation failure.
  • _Strxfrm calls __crtLCMapStringA twice. The first call is used to compute the required size without filling the output buffer, which means __crtLCMapStringA is expected to return the required size regardless of whether it can write to the buffer.

STL/stl/src/xstrxfrm.cpp

Lines 77 to 90 in 1e312b3

// Inquire size of dst string in BYTES
const int dstlen =
__crtLCMapStringA(locale_name, LCMAP_SORTKEY, string2, static_cast<int>(n2), nullptr, 0, codepage, TRUE);
if (dstlen != 0) {
retval = dstlen;
// if not enough room, return amount needed
if (dstlen <= static_cast<int>(n1)) {
// Map src string to dst string
__crtLCMapStringA(locale_name, LCMAP_SORTKEY, string2, static_cast<int>(n2), string1,
static_cast<int>(n1), codepage, TRUE);
}
}

@StephanTLavavej
Copy link
Member Author

is expected to return the required size regardless of whether it can write to the buffer.

Ah, but we pass 0 for cchDest when we're doing that. Thus, if (0 != cchDest) is inactive in the SORTKEY block, which otherwise does nothing:

if (0 != cchDest) {
if (retval > cchDest) {
return retval;
}
const auto wide_dest = reinterpret_cast<LPWSTR>(lpDestStr); // lgtm [cpp/incorrect-string-type-conversion]
// do string mapping
if (0
== LCMapStringEx(
LocaleName, dwMapFlags, inwbuffer.get(), inbuff_size, wide_dest, cchDest, nullptr, nullptr, 0)) {
return retval;
}
}

We then fall through to the end of the function, where we use the "successful" return retval; - that's the only one I don't think is bogus.

@AlexGuteniev
Copy link
Contributor

Quote from the offline documentation I still use when the offline is bogus (as I thing currently happens)

Return Values
Returns the number of characters or bytes in the translated string or sort key, including a terminating null character, if successful. If the function succeeds and the value of cchDest is 0, the return value is the size of the buffer required to hold the translated string or sort key, including a terminating null character.

This function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:

  • ERROR_INSUFFICIENT_BUFFER
  • ERROR_INVALID_FLAGS
  • ERROR_INVALID_PARAMETER

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants