This repository has been archived by the owner on Dec 22, 2023. It is now read-only.

Question Regarding Combined Characters and Regex #525

Open
AlanBurkhart opened this issue Jun 12, 2022 · 0 comments

AlanBurkhart commented Jun 12, 2022

I have my own Regex find-replace dialog that has always worked pretty well, except when a text document contains characters that take more than one UTF-16 code unit (a single code point stored as a surrogate pair, like the emoji below). Each such character ahead of the match throws the match index off by one character position. In this case I wasn't searching for the offending character itself but for specific text that came after it. For example:

🕜 &#.128348; &#.x1F55C; Clock Face One-thirty

Searching for the ampersand matches the # sign instead. If I paste another clock face character into the line, the search matches the "1". Is there a practical method for dealing with this? (Dots are inserted in the example line so that the entities display as text instead of being rendered as characters.)
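
One way to picture the symptom: it looks like the regex engine reports the match index in UTF-16 code units while the text control counts whole code points (this is an assumption; the issue does not name the control or the runtime). The sketch below shows the index arithmetic only. Python strings are already indexed by code point, so the UTF-16 offsets are simulated, and the helper names `utf16_len`, `utf16_to_codepoint_index` and `codepoint_to_utf16_index` are hypothetical, not part of any library.

```python
def utf16_len(ch: str) -> int:
    """Number of UTF-16 code units needed to store one code point."""
    # Code points above U+FFFF are stored as a surrogate pair (2 units).
    return 2 if ord(ch) > 0xFFFF else 1


def utf16_to_codepoint_index(text: str, utf16_index: int) -> int:
    """Convert an offset counted in UTF-16 code units to one counted in code points.

    An offset that falls in the middle of a surrogate pair is rounded up
    to the next whole character.
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        units += utf16_len(ch)
    return len(text)


def codepoint_to_utf16_index(text: str, cp_index: int) -> int:
    """Convert an offset counted in code points to one counted in UTF-16 code units."""
    return sum(utf16_len(ch) for ch in text[:cp_index])


# The example line from the question, with the clock face written as an escape.
line = "\U0001F55C &#128348; &#x1F55C; Clock Face One-thirty"

cp_pos = line.index("&")                           # position in code points
u16_pos = codepoint_to_utf16_index(line, cp_pos)   # position a UTF-16 engine would report

# Using the UTF-16 offset directly as a character position overshoots by one
# for each surrogate pair before the match, which lands on the "#".
wrong_char = line[u16_pos]
right_char = line[utf16_to_codepoint_index(line, u16_pos)]
```

With one clock face in front, the UTF-16 offset is one larger than the code point offset, so the uncorrected index selects the character after the ampersand. With two clock faces the gap grows to two, which matches the "1" described above. If the control instead counts user-perceived characters (grapheme clusters), the same idea applies but the counting rule would be different.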
