Certain "emoji" are still half-sized #900
Comments
This issue may be related to https://github.com/microsoft/terminal/blob/master/src/types/CodepointWidthDetector.cpp. The width it reports for the emoji 🛠 (\U0001F6E0) is actually 1. In this project, https://github.com/fumiyas/wcwidth-cjk, the width with emoji detection enabled is 2.
////
// https://github.com/microsoft/terminal/blob/734fc1dcc6de4315d4cc91944c5ea83b7b8a7e1a/src/types/CodepointWidthDetector.cpp
#include <algorithm>
#include <array>
#include <cstddef> // size_t
#include <cstdint> // uint8_t
#include <iterator>
namespace unicode {
enum class CodepointWidth : uint8_t {
Narrow,
Wide,
Ambiguous, // could be narrow or wide depending on the current codepage and
// font
Invalid // not a valid unicode codepoint
};
// used to store range data in CodepointWidthDetector's internal map
struct UnicodeRange final {
unsigned int lowerBound;
unsigned int upperBound;
CodepointWidth width;
};
static bool operator<(const UnicodeRange &range,
const unsigned int searchTerm) {
return range.upperBound < searchTerm;
}
static constexpr std::array<UnicodeRange, 285> s_wideAndAmbiguousTable{
// generated from
// http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
// anything not present here is presumed to be Narrow.
UnicodeRange{0xa1, 0xa1, CodepointWidth::Ambiguous},
UnicodeRange{0xa4, 0xa4, CodepointWidth::Ambiguous},
UnicodeRange{0xa7, 0xa8, CodepointWidth::Ambiguous},
UnicodeRange{0xaa, 0xaa, CodepointWidth::Ambiguous},
UnicodeRange{0xad, 0xae, CodepointWidth::Ambiguous},
UnicodeRange{0xb0, 0xb4, CodepointWidth::Ambiguous},
UnicodeRange{0xb6, 0xba, CodepointWidth::Ambiguous},
UnicodeRange{0xbc, 0xbf, CodepointWidth::Ambiguous},
UnicodeRange{0xc6, 0xc6, CodepointWidth::Ambiguous},
UnicodeRange{0xd0, 0xd0, CodepointWidth::Ambiguous},
UnicodeRange{0xd7, 0xd8, CodepointWidth::Ambiguous},
UnicodeRange{0xde, 0xe1, CodepointWidth::Ambiguous},
UnicodeRange{0xe6, 0xe6, CodepointWidth::Ambiguous},
UnicodeRange{0xe8, 0xea, CodepointWidth::Ambiguous},
UnicodeRange{0xec, 0xed, CodepointWidth::Ambiguous},
UnicodeRange{0xf0, 0xf0, CodepointWidth::Ambiguous},
UnicodeRange{0xf2, 0xf3, CodepointWidth::Ambiguous},
UnicodeRange{0xf7, 0xfa, CodepointWidth::Ambiguous},
UnicodeRange{0xfc, 0xfc, CodepointWidth::Ambiguous},
UnicodeRange{0xfe, 0xfe, CodepointWidth::Ambiguous},
UnicodeRange{0x101, 0x101, CodepointWidth::Ambiguous},
UnicodeRange{0x111, 0x111, CodepointWidth::Ambiguous},
UnicodeRange{0x113, 0x113, CodepointWidth::Ambiguous},
UnicodeRange{0x11b, 0x11b, CodepointWidth::Ambiguous},
UnicodeRange{0x126, 0x127, CodepointWidth::Ambiguous},
UnicodeRange{0x12b, 0x12b, CodepointWidth::Ambiguous},
UnicodeRange{0x131, 0x133, CodepointWidth::Ambiguous},
UnicodeRange{0x138, 0x138, CodepointWidth::Ambiguous},
UnicodeRange{0x13f, 0x142, CodepointWidth::Ambiguous},
UnicodeRange{0x144, 0x144, CodepointWidth::Ambiguous},
UnicodeRange{0x148, 0x14b, CodepointWidth::Ambiguous},
UnicodeRange{0x14d, 0x14d, CodepointWidth::Ambiguous},
UnicodeRange{0x152, 0x153, CodepointWidth::Ambiguous},
UnicodeRange{0x166, 0x167, CodepointWidth::Ambiguous},
UnicodeRange{0x16b, 0x16b, CodepointWidth::Ambiguous},
UnicodeRange{0x1ce, 0x1ce, CodepointWidth::Ambiguous},
UnicodeRange{0x1d0, 0x1d0, CodepointWidth::Ambiguous},
UnicodeRange{0x1d2, 0x1d2, CodepointWidth::Ambiguous},
UnicodeRange{0x1d4, 0x1d4, CodepointWidth::Ambiguous},
UnicodeRange{0x1d6, 0x1d6, CodepointWidth::Ambiguous},
UnicodeRange{0x1d8, 0x1d8, CodepointWidth::Ambiguous},
UnicodeRange{0x1da, 0x1da, CodepointWidth::Ambiguous},
UnicodeRange{0x1dc, 0x1dc, CodepointWidth::Ambiguous},
UnicodeRange{0x251, 0x251, CodepointWidth::Ambiguous},
UnicodeRange{0x261, 0x261, CodepointWidth::Ambiguous},
UnicodeRange{0x2c4, 0x2c4, CodepointWidth::Ambiguous},
UnicodeRange{0x2c7, 0x2c7, CodepointWidth::Ambiguous},
UnicodeRange{0x2c9, 0x2cb, CodepointWidth::Ambiguous},
UnicodeRange{0x2cd, 0x2cd, CodepointWidth::Ambiguous},
UnicodeRange{0x2d0, 0x2d0, CodepointWidth::Ambiguous},
UnicodeRange{0x2d8, 0x2db, CodepointWidth::Ambiguous},
UnicodeRange{0x2dd, 0x2dd, CodepointWidth::Ambiguous},
UnicodeRange{0x2df, 0x2df, CodepointWidth::Ambiguous},
UnicodeRange{0x300, 0x36f, CodepointWidth::Ambiguous},
UnicodeRange{0x391, 0x3a1, CodepointWidth::Ambiguous},
UnicodeRange{0x3a3, 0x3a9, CodepointWidth::Ambiguous},
UnicodeRange{0x3b1, 0x3c1, CodepointWidth::Ambiguous},
UnicodeRange{0x3c3, 0x3c9, CodepointWidth::Ambiguous},
UnicodeRange{0x401, 0x401, CodepointWidth::Ambiguous},
UnicodeRange{0x410, 0x44f, CodepointWidth::Ambiguous},
UnicodeRange{0x451, 0x451, CodepointWidth::Ambiguous},
UnicodeRange{0x1100, 0x115f, CodepointWidth::Wide},
UnicodeRange{0x2010, 0x2010, CodepointWidth::Ambiguous},
UnicodeRange{0x2013, 0x2016, CodepointWidth::Ambiguous},
UnicodeRange{0x2018, 0x2019, CodepointWidth::Ambiguous},
UnicodeRange{0x201c, 0x201d, CodepointWidth::Ambiguous},
UnicodeRange{0x2020, 0x2022, CodepointWidth::Ambiguous},
UnicodeRange{0x2024, 0x2027, CodepointWidth::Ambiguous},
UnicodeRange{0x2030, 0x2030, CodepointWidth::Ambiguous},
UnicodeRange{0x2032, 0x2033, CodepointWidth::Ambiguous},
UnicodeRange{0x2035, 0x2035, CodepointWidth::Ambiguous},
UnicodeRange{0x203b, 0x203b, CodepointWidth::Ambiguous},
UnicodeRange{0x203e, 0x203e, CodepointWidth::Ambiguous},
UnicodeRange{0x2074, 0x2074, CodepointWidth::Ambiguous},
UnicodeRange{0x207f, 0x207f, CodepointWidth::Ambiguous},
UnicodeRange{0x2081, 0x2084, CodepointWidth::Ambiguous},
UnicodeRange{0x20ac, 0x20ac, CodepointWidth::Ambiguous},
UnicodeRange{0x2103, 0x2103, CodepointWidth::Ambiguous},
UnicodeRange{0x2105, 0x2105, CodepointWidth::Ambiguous},
UnicodeRange{0x2109, 0x2109, CodepointWidth::Ambiguous},
UnicodeRange{0x2113, 0x2113, CodepointWidth::Ambiguous},
UnicodeRange{0x2116, 0x2116, CodepointWidth::Ambiguous},
UnicodeRange{0x2121, 0x2122, CodepointWidth::Ambiguous},
UnicodeRange{0x2126, 0x2126, CodepointWidth::Ambiguous},
UnicodeRange{0x212b, 0x212b, CodepointWidth::Ambiguous},
UnicodeRange{0x2153, 0x2154, CodepointWidth::Ambiguous},
UnicodeRange{0x215b, 0x215e, CodepointWidth::Ambiguous},
UnicodeRange{0x2160, 0x216b, CodepointWidth::Ambiguous},
UnicodeRange{0x2170, 0x2179, CodepointWidth::Ambiguous},
UnicodeRange{0x2189, 0x2189, CodepointWidth::Ambiguous},
UnicodeRange{0x2190, 0x2199, CodepointWidth::Ambiguous},
UnicodeRange{0x21b8, 0x21b9, CodepointWidth::Ambiguous},
UnicodeRange{0x21d2, 0x21d2, CodepointWidth::Ambiguous},
UnicodeRange{0x21d4, 0x21d4, CodepointWidth::Ambiguous},
UnicodeRange{0x21e7, 0x21e7, CodepointWidth::Ambiguous},
UnicodeRange{0x2200, 0x2200, CodepointWidth::Ambiguous},
UnicodeRange{0x2202, 0x2203, CodepointWidth::Ambiguous},
UnicodeRange{0x2207, 0x2208, CodepointWidth::Ambiguous},
UnicodeRange{0x220b, 0x220b, CodepointWidth::Ambiguous},
UnicodeRange{0x220f, 0x220f, CodepointWidth::Ambiguous},
UnicodeRange{0x2211, 0x2211, CodepointWidth::Ambiguous},
UnicodeRange{0x2215, 0x2215, CodepointWidth::Ambiguous},
UnicodeRange{0x221a, 0x221a, CodepointWidth::Ambiguous},
UnicodeRange{0x221d, 0x2220, CodepointWidth::Ambiguous},
UnicodeRange{0x2223, 0x2223, CodepointWidth::Ambiguous},
UnicodeRange{0x2225, 0x2225, CodepointWidth::Ambiguous},
UnicodeRange{0x2227, 0x222c, CodepointWidth::Ambiguous},
UnicodeRange{0x222e, 0x222e, CodepointWidth::Ambiguous},
UnicodeRange{0x2234, 0x2237, CodepointWidth::Ambiguous},
UnicodeRange{0x223c, 0x223d, CodepointWidth::Ambiguous},
UnicodeRange{0x2248, 0x2248, CodepointWidth::Ambiguous},
UnicodeRange{0x224c, 0x224c, CodepointWidth::Ambiguous},
UnicodeRange{0x2252, 0x2252, CodepointWidth::Ambiguous},
UnicodeRange{0x2260, 0x2261, CodepointWidth::Ambiguous},
UnicodeRange{0x2264, 0x2267, CodepointWidth::Ambiguous},
UnicodeRange{0x226a, 0x226b, CodepointWidth::Ambiguous},
UnicodeRange{0x226e, 0x226f, CodepointWidth::Ambiguous},
UnicodeRange{0x2282, 0x2283, CodepointWidth::Ambiguous},
UnicodeRange{0x2286, 0x2287, CodepointWidth::Ambiguous},
UnicodeRange{0x2295, 0x2295, CodepointWidth::Ambiguous},
UnicodeRange{0x2299, 0x2299, CodepointWidth::Ambiguous},
UnicodeRange{0x22a5, 0x22a5, CodepointWidth::Ambiguous},
UnicodeRange{0x22bf, 0x22bf, CodepointWidth::Ambiguous},
UnicodeRange{0x2312, 0x2312, CodepointWidth::Ambiguous},
UnicodeRange{0x231a, 0x231b, CodepointWidth::Wide},
UnicodeRange{0x2329, 0x232a, CodepointWidth::Wide},
UnicodeRange{0x23e9, 0x23ec, CodepointWidth::Wide},
UnicodeRange{0x23f0, 0x23f0, CodepointWidth::Wide},
UnicodeRange{0x23f3, 0x23f3, CodepointWidth::Wide},
UnicodeRange{0x2460, 0x24e9, CodepointWidth::Ambiguous},
UnicodeRange{0x24eb, 0x254b, CodepointWidth::Ambiguous},
UnicodeRange{0x2550, 0x2573, CodepointWidth::Ambiguous},
UnicodeRange{0x2580, 0x258f, CodepointWidth::Ambiguous},
UnicodeRange{0x2592, 0x2595, CodepointWidth::Ambiguous},
UnicodeRange{0x25a0, 0x25a1, CodepointWidth::Ambiguous},
UnicodeRange{0x25a3, 0x25a9, CodepointWidth::Ambiguous},
UnicodeRange{0x25b2, 0x25b3, CodepointWidth::Ambiguous},
UnicodeRange{0x25b6, 0x25b7, CodepointWidth::Ambiguous},
UnicodeRange{0x25bc, 0x25bd, CodepointWidth::Ambiguous},
UnicodeRange{0x25c0, 0x25c1, CodepointWidth::Ambiguous},
UnicodeRange{0x25c6, 0x25c8, CodepointWidth::Ambiguous},
UnicodeRange{0x25cb, 0x25cb, CodepointWidth::Ambiguous},
UnicodeRange{0x25ce, 0x25d1, CodepointWidth::Ambiguous},
UnicodeRange{0x25e2, 0x25e5, CodepointWidth::Ambiguous},
UnicodeRange{0x25ef, 0x25ef, CodepointWidth::Ambiguous},
UnicodeRange{0x25fd, 0x25fe, CodepointWidth::Wide},
UnicodeRange{0x2605, 0x2606, CodepointWidth::Ambiguous},
UnicodeRange{0x2609, 0x2609, CodepointWidth::Ambiguous},
UnicodeRange{0x260e, 0x260f, CodepointWidth::Ambiguous},
UnicodeRange{0x2614, 0x2615, CodepointWidth::Wide},
UnicodeRange{0x261c, 0x261c, CodepointWidth::Ambiguous},
UnicodeRange{0x261e, 0x261e, CodepointWidth::Ambiguous},
UnicodeRange{0x2640, 0x2640, CodepointWidth::Ambiguous},
UnicodeRange{0x2642, 0x2642, CodepointWidth::Ambiguous},
UnicodeRange{0x2648, 0x2653, CodepointWidth::Wide},
UnicodeRange{0x2660, 0x2661, CodepointWidth::Ambiguous},
UnicodeRange{0x2663, 0x2665, CodepointWidth::Ambiguous},
UnicodeRange{0x2667, 0x266a, CodepointWidth::Ambiguous},
UnicodeRange{0x266c, 0x266d, CodepointWidth::Ambiguous},
UnicodeRange{0x266f, 0x266f, CodepointWidth::Ambiguous},
UnicodeRange{0x267f, 0x267f, CodepointWidth::Wide},
UnicodeRange{0x2693, 0x2693, CodepointWidth::Wide},
UnicodeRange{0x269e, 0x269f, CodepointWidth::Ambiguous},
UnicodeRange{0x26a1, 0x26a1, CodepointWidth::Wide},
UnicodeRange{0x26aa, 0x26ab, CodepointWidth::Wide},
UnicodeRange{0x26bd, 0x26be, CodepointWidth::Wide},
UnicodeRange{0x26bf, 0x26bf, CodepointWidth::Ambiguous},
UnicodeRange{0x26c4, 0x26c5, CodepointWidth::Wide},
UnicodeRange{0x26c6, 0x26cd, CodepointWidth::Ambiguous},
UnicodeRange{0x26ce, 0x26ce, CodepointWidth::Wide},
UnicodeRange{0x26cf, 0x26d3, CodepointWidth::Ambiguous},
UnicodeRange{0x26d4, 0x26d4, CodepointWidth::Wide},
UnicodeRange{0x26d5, 0x26e1, CodepointWidth::Ambiguous},
UnicodeRange{0x26e3, 0x26e3, CodepointWidth::Ambiguous},
UnicodeRange{0x26e8, 0x26e9, CodepointWidth::Ambiguous},
UnicodeRange{0x26ea, 0x26ea, CodepointWidth::Wide},
UnicodeRange{0x26eb, 0x26f1, CodepointWidth::Ambiguous},
UnicodeRange{0x26f2, 0x26f3, CodepointWidth::Wide},
UnicodeRange{0x26f4, 0x26f4, CodepointWidth::Ambiguous},
UnicodeRange{0x26f5, 0x26f5, CodepointWidth::Wide},
UnicodeRange{0x26f6, 0x26f9, CodepointWidth::Ambiguous},
UnicodeRange{0x26fa, 0x26fa, CodepointWidth::Wide},
UnicodeRange{0x26fb, 0x26fc, CodepointWidth::Ambiguous},
UnicodeRange{0x26fd, 0x26fd, CodepointWidth::Wide},
UnicodeRange{0x26fe, 0x26ff, CodepointWidth::Ambiguous},
UnicodeRange{0x2705, 0x2705, CodepointWidth::Wide},
UnicodeRange{0x270a, 0x270b, CodepointWidth::Wide},
UnicodeRange{0x2728, 0x2728, CodepointWidth::Wide},
UnicodeRange{0x273d, 0x273d, CodepointWidth::Ambiguous},
UnicodeRange{0x274c, 0x274c, CodepointWidth::Wide},
UnicodeRange{0x274e, 0x274e, CodepointWidth::Wide},
UnicodeRange{0x2753, 0x2755, CodepointWidth::Wide},
UnicodeRange{0x2757, 0x2757, CodepointWidth::Wide},
UnicodeRange{0x2776, 0x277f, CodepointWidth::Ambiguous},
UnicodeRange{0x2795, 0x2797, CodepointWidth::Wide},
UnicodeRange{0x27b0, 0x27b0, CodepointWidth::Wide},
UnicodeRange{0x27bf, 0x27bf, CodepointWidth::Wide},
UnicodeRange{0x2b1b, 0x2b1c, CodepointWidth::Wide},
UnicodeRange{0x2b50, 0x2b50, CodepointWidth::Wide},
UnicodeRange{0x2b55, 0x2b55, CodepointWidth::Wide},
UnicodeRange{0x2b56, 0x2b59, CodepointWidth::Ambiguous},
UnicodeRange{0x2e80, 0x2e99, CodepointWidth::Wide},
UnicodeRange{0x2e9b, 0x2ef3, CodepointWidth::Wide},
UnicodeRange{0x2f00, 0x2fd5, CodepointWidth::Wide},
UnicodeRange{0x2ff0, 0x2ffb, CodepointWidth::Wide},
UnicodeRange{0x3000, 0x303e, CodepointWidth::Wide},
UnicodeRange{0x3041, 0x3096, CodepointWidth::Wide},
UnicodeRange{0x3099, 0x30ff, CodepointWidth::Wide},
UnicodeRange{0x3105, 0x312e, CodepointWidth::Wide},
UnicodeRange{0x3131, 0x318e, CodepointWidth::Wide},
UnicodeRange{0x3190, 0x31ba, CodepointWidth::Wide},
UnicodeRange{0x31c0, 0x31e3, CodepointWidth::Wide},
UnicodeRange{0x31f0, 0x321e, CodepointWidth::Wide},
UnicodeRange{0x3220, 0x3247, CodepointWidth::Wide},
UnicodeRange{0x3248, 0x324f, CodepointWidth::Ambiguous},
UnicodeRange{0x3250, 0x32fe, CodepointWidth::Wide},
UnicodeRange{0x3300, 0x4dbf, CodepointWidth::Wide},
UnicodeRange{0x4e00, 0xa48c, CodepointWidth::Wide},
UnicodeRange{0xa490, 0xa4c6, CodepointWidth::Wide},
UnicodeRange{0xa960, 0xa97c, CodepointWidth::Wide},
UnicodeRange{0xac00, 0xd7a3, CodepointWidth::Wide},
UnicodeRange{0xe000, 0xf8ff, CodepointWidth::Ambiguous},
UnicodeRange{0xf900, 0xfaff, CodepointWidth::Wide},
UnicodeRange{0xfe00, 0xfe0f, CodepointWidth::Ambiguous},
UnicodeRange{0xfe10, 0xfe19, CodepointWidth::Wide},
UnicodeRange{0xfe30, 0xfe52, CodepointWidth::Wide},
UnicodeRange{0xfe54, 0xfe66, CodepointWidth::Wide},
UnicodeRange{0xfe68, 0xfe6b, CodepointWidth::Wide},
UnicodeRange{0xff01, 0xff60, CodepointWidth::Wide},
UnicodeRange{0xffe0, 0xffe6, CodepointWidth::Wide},
UnicodeRange{0xfffd, 0xfffd, CodepointWidth::Ambiguous},
UnicodeRange{0x16fe0, 0x16fe1, CodepointWidth::Wide},
UnicodeRange{0x17000, 0x187ec, CodepointWidth::Wide},
UnicodeRange{0x18800, 0x18af2, CodepointWidth::Wide},
UnicodeRange{0x1b000, 0x1b11e, CodepointWidth::Wide},
UnicodeRange{0x1b170, 0x1b2fb, CodepointWidth::Wide},
UnicodeRange{0x1f004, 0x1f004, CodepointWidth::Wide},
UnicodeRange{0x1f0cf, 0x1f0cf, CodepointWidth::Wide},
UnicodeRange{0x1f100, 0x1f10a, CodepointWidth::Ambiguous},
UnicodeRange{0x1f110, 0x1f12d, CodepointWidth::Ambiguous},
UnicodeRange{0x1f130, 0x1f169, CodepointWidth::Ambiguous},
UnicodeRange{0x1f170, 0x1f18d, CodepointWidth::Ambiguous},
UnicodeRange{0x1f18e, 0x1f18e, CodepointWidth::Wide},
UnicodeRange{0x1f18f, 0x1f190, CodepointWidth::Ambiguous},
UnicodeRange{0x1f191, 0x1f19a, CodepointWidth::Wide},
UnicodeRange{0x1f19b, 0x1f1ac, CodepointWidth::Ambiguous},
UnicodeRange{0x1f200, 0x1f202, CodepointWidth::Wide},
UnicodeRange{0x1f210, 0x1f23b, CodepointWidth::Wide},
UnicodeRange{0x1f240, 0x1f248, CodepointWidth::Wide},
UnicodeRange{0x1f250, 0x1f251, CodepointWidth::Wide},
UnicodeRange{0x1f260, 0x1f265, CodepointWidth::Wide},
UnicodeRange{0x1f300, 0x1f320, CodepointWidth::Wide},
UnicodeRange{0x1f32d, 0x1f335, CodepointWidth::Wide},
UnicodeRange{0x1f337, 0x1f37c, CodepointWidth::Wide},
UnicodeRange{0x1f37e, 0x1f393, CodepointWidth::Wide},
UnicodeRange{0x1f3a0, 0x1f3ca, CodepointWidth::Wide},
UnicodeRange{0x1f3cf, 0x1f3d3, CodepointWidth::Wide},
UnicodeRange{0x1f3e0, 0x1f3f0, CodepointWidth::Wide},
UnicodeRange{0x1f3f4, 0x1f3f4, CodepointWidth::Wide},
UnicodeRange{0x1f3f8, 0x1f43e, CodepointWidth::Wide},
UnicodeRange{0x1f440, 0x1f440, CodepointWidth::Wide},
UnicodeRange{0x1f442, 0x1f4fc, CodepointWidth::Wide},
UnicodeRange{0x1f4ff, 0x1f53d, CodepointWidth::Wide},
UnicodeRange{0x1f54b, 0x1f54e, CodepointWidth::Wide},
UnicodeRange{0x1f550, 0x1f567, CodepointWidth::Wide},
UnicodeRange{0x1f57a, 0x1f57a, CodepointWidth::Wide},
UnicodeRange{0x1f595, 0x1f596, CodepointWidth::Wide},
UnicodeRange{0x1f5a4, 0x1f5a4, CodepointWidth::Wide},
UnicodeRange{0x1f5fb, 0x1f64f, CodepointWidth::Wide},
UnicodeRange{0x1f680, 0x1f6c5, CodepointWidth::Wide},
UnicodeRange{0x1f6cc, 0x1f6cc, CodepointWidth::Wide},
UnicodeRange{0x1f6d0, 0x1f6d2, CodepointWidth::Wide},
UnicodeRange{0x1f6eb, 0x1f6ec, CodepointWidth::Wide},
UnicodeRange{0x1f6f4, 0x1f6f8, CodepointWidth::Wide},
UnicodeRange{0x1f910, 0x1f93e, CodepointWidth::Wide},
UnicodeRange{0x1f940, 0x1f94c, CodepointWidth::Wide},
UnicodeRange{0x1f950, 0x1f96b, CodepointWidth::Wide},
UnicodeRange{0x1f980, 0x1f997, CodepointWidth::Wide},
UnicodeRange{0x1f9c0, 0x1f9c0, CodepointWidth::Wide},
UnicodeRange{0x1f9d0, 0x1f9e6, CodepointWidth::Wide},
UnicodeRange{0x20000, 0x2fffd, CodepointWidth::Wide},
UnicodeRange{0x30000, 0x3fffd, CodepointWidth::Wide},
UnicodeRange{0xe0100, 0xe01ef, CodepointWidth::Ambiguous},
UnicodeRange{0xf0000, 0xffffd, CodepointWidth::Ambiguous},
UnicodeRange{0x100000, 0x10fffd, CodepointWidth::Ambiguous}};
size_t CalculateWidthInternal(char32_t rune) {
const auto it = std::lower_bound(s_wideAndAmbiguousTable.begin(),
s_wideAndAmbiguousTable.end(), rune);
// For characters that are not _in_ the table, lower_bound will return the
// nearest item that is. We must check its bounds to make sure that our hit
// was a true hit.
if (it != s_wideAndAmbiguousTable.end() && rune >= it->lowerBound &&
rune <= it->upperBound) {
switch (it->width) {
case CodepointWidth::Ambiguous:
return 0;
case CodepointWidth::Wide:
return 2;
case CodepointWidth::Narrow:
return 1;
default:
break;
}
return 0;
}
return 1;
}
} // namespace unicode
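To see why 🛠 comes out as width 1, here is a minimal sketch of the same `lower_bound` lookup over a four-entry excerpt of the table above. The names `Width`, `Range`, `kExcerpt` and `cellWidth` are illustrative only and are not part of the Terminal source. U+1F6E0 falls in the gap between `0x1f6d0..0x1f6d2` and `0x1f6eb..0x1f6ec`, so the lookup lands on the next range, fails the lower-bound check, and falls through to the Narrow default.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

enum class Width : std::uint8_t { Narrow, Wide, Ambiguous };

struct Range {
    char32_t lower;
    char32_t upper;
    Width width;
};

// A four-entry excerpt of the table above, sorted by code point.
constexpr std::array<Range, 4> kExcerpt{{
    {0x1f680, 0x1f6c5, Width::Wide},
    {0x1f6cc, 0x1f6cc, Width::Wide},
    {0x1f6d0, 0x1f6d2, Width::Wide},
    {0x1f6eb, 0x1f6ec, Width::Wide},
}};

std::size_t cellWidth(char32_t rune) {
    // Find the first range whose upper bound is not below `rune`.
    const auto it = std::lower_bound(
        kExcerpt.begin(), kExcerpt.end(), rune,
        [](const Range& r, char32_t value) { return r.upper < value; });
    // lower_bound already guarantees rune <= it->upper, so only the
    // lower bound needs checking to confirm a true hit.
    if (it != kExcerpt.end() && rune >= it->lower) {
        return it->width == Width::Wide ? 2 : 1;
    }
    return 1; // anything not in the table is presumed Narrow
}
```

With this excerpt, `cellWidth(0x1F680)` (🚀) is 2 while `cellWidth(0x1F6E0)` (🛠) is 1, which matches the behavior reported in this issue.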
It's... slightly more complicated than that. In addition to the table laid out in UCD EastAsianWidth 12.0, which expressly avoids specifying Emoji, there's the Emoji 12.0 table. That specifies which characters are emoji, but we can't just import that table as-is. It specifies a lot of things that aren't emoji as being emoji. Like this:
If we ingest this table as is, we'll look even more wrong than we already are.
@DHowett-MSFT After testing, I found that I only need to add an emoji_width table containing the missing emoji symbols to get these emoji widths right. (Some emoji display at different widths in different fonts.) The Unicode intervals to be added are as follows:
struct interval {
char32_t first;
char32_t last;
};
constexpr const interval emoji_width[] = {
{0x2194, 0x2199},
{0x21A9, 0x21AA},
{0x231A, 0x231B},
{0x2328, 0x2328},
{0x23CF, 0x23CF},
{0x23E9, 0x23F3},
{0x23F8, 0x23FA},
{0x24C2, 0x24C2},
{0x25AA, 0x25AB},
{0x25B6, 0x25B6},
{0x25C0, 0x25C0},
{0x25FB, 0x25FE},
// u2600~u27BF // fast check
{0x2600, 0x27BF},
//---
{0x2934, 0x2935},
{0x2B05, 0x2B07},
{0x2B1B, 0x2B1C},
{0x2B50, 0x2B50},
{0x2B55, 0x2B55},
{0x3030, 0x3030},
{0x3297, 0x3297},
{0x3299, 0x3299},
// 0x1F004 0x1F0CF ... unicode double width 2
{0x1F300, 0x1F64F},
{0x1F680, 0x1F6FF},
{0x1F900, 0x1F9FF},
};
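One way such a supplementary table could be consulted is a binary search over the sorted intervals, falling back to the existing width when there is no hit. This is only a sketch: `Interval`, `kEmojiWidth`, `inEmojiTable` and `widthWithEmoji` are hypothetical names, and the table is an excerpt of the intervals listed above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

struct Interval {
    char32_t first;
    char32_t last;
};

// Excerpt of the proposed intervals; must stay sorted and non-overlapping.
constexpr Interval kEmojiWidth[] = {
    {0x231A, 0x231B},
    {0x2600, 0x27BF},
    {0x2B50, 0x2B50},
    {0x1F300, 0x1F64F},
    {0x1F680, 0x1F6FF},
    {0x1F900, 0x1F9FF},
};

bool inEmojiTable(char32_t rune) {
    // `it` is the first interval that starts after `rune`;
    // the only candidate that can contain `rune` is the one before it.
    const auto it = std::upper_bound(
        std::begin(kEmojiWidth), std::end(kEmojiWidth), rune,
        [](char32_t value, const Interval& iv) { return value < iv.first; });
    return it != std::begin(kEmojiWidth) && rune <= std::prev(it)->last;
}

// Hypothetical combination: the emoji table wins, otherwise keep the base width.
std::size_t widthWithEmoji(char32_t rune, std::size_t baseWidth) {
    return inEmojiTable(rune) ? 2 : baseWidth;
}
```

Under this scheme 🛠 (U+1F6E0) falls inside `0x1F680..0x1F6FF` and is promoted to two cells. As the later comments in this thread point out, blanket ranges like `0x2600..0x27BF` also promote many symbols that default to text presentation.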
From Egmont Koblinger:

> In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas they by design have no idea of the font being used. In many terminals the font can also be changed runtime and it's absolutely not feasible to then rearrange the cells. In some other cases there is no font at all (e.g. the libvterm headless terminal emulation library, or a detached screen/tmux), or there are multiple fonts at once (a screen/tmux attached from multiple graphical emulators).
>
> The only way to do that is via some external agreement on the number of cells, which is typically the Unicode EastAsianWidth, often accessed via wcwidth(). It's not perfect (changes through Unicode versions, has ambiguous characters, etc.) but is still the best we have.
>
> glibc's wcwidth() reports 1 for ambiguous width characters, so the de facto standard is that in terminals they are narrow.
>
> If the glyph is wider then the terminal has to figure out what to do. It could crop it (newer versions of Konsole, as far as I know), overflow to the right (VTE), shrink it (Kitty I believe does this), etc.

See Also:
https://bugzilla.gnome.org/show_bug.cgi?id=767529
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9
https://www.unicode.org/reports/tr11/tr11-34.html

Salient point from proposed update to Unicode Standard Annex 11:

> Note: The East_Asian_Width property is not intended for use by modern terminal emulators without appropriate tailoring on a case-by-case basis.

Fixes #2066
Fixes #2375
Related to #900
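The convention described in the quote (ambiguous-width characters count as narrow) can be written down as a small mapping from width class to cell count. This is a sketch with made-up names (`EastAsianWidth`, `cellsFor`), not code from any terminal:

```cpp
#include <cstddef>

enum class EastAsianWidth { Narrow, Wide, Ambiguous };

// Number of terminal cells reserved for one code point of the given class.
constexpr std::size_t cellsFor(EastAsianWidth w) {
    switch (w) {
    case EastAsianWidth::Wide:
        return 2;
    case EastAsianWidth::Ambiguous:
        return 1; // treated as narrow, matching what the quote says of glibc's wcwidth()
    case EastAsianWidth::Narrow:
        return 1;
    }
    return 1; // unreachable for valid enumerators; keeps compilers quiet
}
```

Because the answer depends only on the code point's class and never on the font, an application and the terminal can both compute it and stay in agreement about the cursor position.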
@miniksa So definitely all the emoji that use a variation selector to force a "character" into emoji rendering are affected. As mentioned, that's probably because two code points are being jammed into one cell width, which is why the emoji comes out small. Spider (U+1F577 U+FE0F) was previously mentioned as an example. So detecting a VS16 and adjusting the character width based on the number of following characters would do it. Is that part of the #2928 PR?
@benc-uk yep, it's because the actual icon is two characters: a "standard" cloud character, followed by a variation selector (VS16, U+FE0F) which tells the terminal to "force" it to emoji style rather than text style. VSCode uses xterm internally and it knows how to interpret that (though it has other spacing issues). Windows Terminal currently only sees it as "one character" in terms of width, so that's why it's half-sized.
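The idea floated here (detect U+FE0F after a base character and widen the cell) might look roughly like the sketch below. It is an illustration only, not what Windows Terminal does; `baseWidth` is a hypothetical stand-in that reports every code point as one cell, and a real implementation would also need to check that the base character actually has an emoji variation sequence.

```cpp
#include <cstddef>
#include <string_view>

constexpr char32_t kVS16 = 0xFE0F; // VARIATION SELECTOR-16 (emoji presentation)

// Hypothetical stand-in for a real width table: everything is one cell.
std::size_t baseWidth(char32_t) { return 1; }

std::size_t cellsFor(std::u32string_view text) {
    std::size_t cells = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == kVS16) {
            continue; // a stray selector occupies no cell of its own
        }
        std::size_t w = baseWidth(text[i]);
        if (i + 1 < text.size() && text[i + 1] == kVS16) {
            w = 2; // base + VS16 requests emoji presentation: reserve two cells
            ++i;   // consume the selector together with its base
        }
        cells += w;
    }
    return cells;
}
```

With this rule a bare U+1F577 takes one cell and U+1F577 U+FE0F takes two. The catch is that the application on the other end must apply the same rule, or its idea of the cursor position drifts from the terminal's.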
The table that we refer to in `CodepointWidthDetector.cpp` to determine whether or not a codepoint should be rendered as Wide vs Narrow was based off EastAsianWidth[1]. If a codepoint wasn't included in this table, they're considered Narrow. Many emojis aren't specified in the EAW list, so this PR supplements our table with emoji codepoints from emoji-data[2] in order to render most, if not all, emojis as full-width.

There are certain codepoints I've added to the comments (in case we want to add them officially to the table in the future) that Microsoft decided to give an emoji presentation even if it's specified as Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode emoji list. These include all of the Mahjong Tiles block, different direction pencils (✎✐), different pointing index fingers (☜, ☞) among others. I have no idea if I've captured all of them, as I don't know of an easy way to detect which are Microsoft specific emojis.

## Validation Steps Performed

I have looked at so many emojis that I dream emoji. These screenshots aren't encompassing _all_ emoji but I've tried to grab a couple from all across the codepoint ranges:

Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900
🎉 This issue was addressed in #5795, which has now been successfully released.
This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retains the comments about the emoji we left out that are Microsoft-specific.

This brings us fully in line with the most popular Terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well.

## Detailed Description of the Pull Request / Additional comments

Late Friday evening, I tested my emoji test file on iTerm2. In so doing, I realized that @j4james and @leonMSFT were right the entire time in #5914: Emoji that require `U+FE0F` must not be double-width by default.

I finally banged up a powershell script that parses the UCD and emits a codepoint width table. Once checked in, this will be definitive.

Refs #900, #5914.
Fixes #5941.
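For readers curious what "parses the UCD" involves: data lines in files such as EastAsianWidth.txt and emoji-data.txt have the shape `first..last ; property # comment` (or a single code point instead of a range). The PR's script is PowerShell and is not reproduced here; the following is an independent C++ sketch of parsing one such line, with `WidthEntry`, `trim` and `parseLine` as illustrative names.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

struct WidthEntry {
    std::uint32_t first;
    std::uint32_t last;
    std::string property; // e.g. "W", "A", "Emoji_Presentation"
};

// Removes leading and trailing spaces and tabs.
std::string_view trim(std::string_view s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string_view::npos) return {};
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses one data line; returns nullopt for blank, comment-only or malformed lines.
std::optional<WidthEntry> parseLine(std::string_view line) {
    line = trim(line.substr(0, line.find('#'))); // drop the trailing comment
    const auto semi = line.find(';');
    if (line.empty() || semi == std::string_view::npos) return std::nullopt;

    const std::string_view range = trim(line.substr(0, semi));
    const std::string_view prop = trim(line.substr(semi + 1));
    if (range.empty() || prop.empty()) return std::nullopt;

    // "1F300..1F320" is a range; "00A1" is a single code point.
    const auto dots = range.find("..");
    const std::string lo(range.substr(0, dots));
    const std::string hi(dots == std::string_view::npos ? range
                                                        : range.substr(dots + 2));
    try {
        return WidthEntry{static_cast<std::uint32_t>(std::stoul(lo, nullptr, 16)),
                          static_cast<std::uint32_t>(std::stoul(hi, nullptr, 16)),
                          std::string(prop)};
    } catch (const std::exception&) {
        return std::nullopt; // not a hexadecimal code point
    }
}
```

A generator along these lines would keep only the entries whose property is `W`, `F`, `A` or `Emoji_Presentation`, merge adjacent ranges, and print them as table rows. That filtering step is omitted from the sketch.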
Hey, I was linked here from 4747. I made a post back in late April asking if there was a way I could revert the glyph scaling. I was pretty hopeful when I came across the post again, so I followed the link above to the store and downloaded the most recent version. (I don't know if it helps or not, but I am on the Win10 slow ring.) Anyway, things to note: pros
cons
I bounce between a few monospaced fonts customized with the devicon/font-awesome/powerline font type packages. At first I thought all I had to do was find a sweet spot for the glyphs, and believe me, I tried to cheat the system with my own font builds before I gave up and moved on. Anyway, I digress. Some of the BASIC wingdings and things like the check marks, and other glyphs which maintain similar advances to the main font chars, don't appear to be affected all that much. Here's an example of how extreme the scaling can be: https://i.imgur.com/nzdWrK9.png. I moved away towards Hyper because of the customization it provides and kind of forgot about Windows Terminal until the other day. It's actually such a clean experience; my ONLY gripes are the tabs being massive and the font issue, and I guess I take my fonts pretty seriously, so it's something I do care about, although I know not everyone does. I can live without full-blown UI customizations and what have you, but the display has got to be good. (And for the record, it is super crisp. I REALLY like what has been done.) (Just give me my full-width glyphs.)
So, icon fonts occupy codepoints that are reserved in all versions of Unicode and do not have the "wide glyph" or "emoji presentation" flags set. The best we can do going forward is to render them over two cells (spill) and let the righthand cell destroy the right half of the spillover. Folks who take fonts seriously should attempt to get those glyph width changes made standard 😉 until then, it's an absolute crapshoot. More info at #5095 (comment) (closed).
@meandmymind The default representation for ⚠ is "narrow, single-width, non-emoji". The bug here is that it is yellow (emoji presentation), not that it is small.
Environment
Steps to reproduce
Set up a standard Powerline prompt using Fira Code in WSL(2) and navigate to a git repository that contains edits. Note that if you run
printf '✏'
the pencil is double the size of the one in the Powerline prompt.
Expected behavior
I expect emojis in the prompt to appear at the same size as those later in the same line.
Actual behavior
emoji size appears to be halved? See pic.