
Do not parse hashtag emoji as tag — pt 2 (#2245)
* Do not parse hashtag emoji as tag (#2242)

* fix: prevent hashtag emoji from being parsed as tag

* chore: fmt

* fix: properly calculate length of tag

* Add a couple tests

---------

Co-authored-by: Mary <[email protected]>
estrattonbailey and mary-ext authored Feb 29, 2024
1 parent 4d062cb commit 61b3d25
Showing 4 changed files with 32 additions and 12 deletions.
5 changes: 5 additions & 0 deletions .changeset/dull-hotels-beam.md
@@ -0,0 +1,5 @@
+---
+'@atproto/api': patch
+---
+
+Prevent hashtag emoji from being parsed as a tag
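Why the emoji was caught: the keycap number sign #️⃣ is the sequence U+0023 (a literal '#'), U+FE0F (variation selector-16) and U+20E3 (combining enclosing keycap). The old tag pattern only required the character after '#' to be neither a digit nor whitespace, which U+FE0F satisfies. A minimal sketch running both patterns from detection.ts against the new test input; `firstMatch` and the constant names are hypothetical helpers for illustration, not part of @atproto/api.

```typescript
// Keycap number sign emoji: U+0023 NUMBER SIGN, U+FE0F VARIATION SELECTOR-16,
// U+20E3 COMBINING ENCLOSING KEYCAP.
const KEYCAP_HASH = '#\ufe0f\u20e3'

// Tag pattern before this commit: '#' then any char that is not a digit or whitespace.
const OLD_TAG_RE = /(?:^|\s)(#[^\d\s]\S*)(?=\s)?/g
// Tag pattern after this commit: the char right after '#' must not be U+FE0F.
const NEW_TAG_RE = /(^|\s)#((?!\ufe0f)[^\d\s]\S*)(?=\s)?/g

// Hypothetical helper: first match of a global regex, trimmed, or null.
function firstMatch(re: RegExp, text: string): string | null {
  re.lastIndex = 0 // global regexes keep state between exec() calls
  const m = re.exec(text)
  return m ? m[0].trim() : null
}

const input = `this ${KEYCAP_HASH}tag should not be a tag`
console.log(firstMatch(OLD_TAG_RE, input) === `${KEYCAP_HASH}tag`) // true
console.log(firstMatch(NEW_TAG_RE, input)) // null
```

The negative lookahead only inspects the first character after '#', so '##️⃣tag' (a '#' followed by the emoji) still yields a tag, which is what the second new test case expects.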
5 changes: 5 additions & 0 deletions .changeset/short-suits-destroy.md
@@ -0,0 +1,5 @@
+---
+'@atproto/api': patch
+---
+
+Properly calculate length of tag
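A sketch of the length rule before and after, assuming, as detection.ts does, that `tag.length` counts UTF-16 code units. Before the fix the captured tag still included its '#', and the check `tag.length > 66` accepted up to 65 characters after the '#' even though the comment promised a maximum of 64. Because '#' is itself Unicode punctuation, the trailing-punctuation strip could also reduce an input like '#?' to an empty tag. `oldAccepts` and `newAccepts` are hypothetical names, not part of @atproto/api.

```typescript
// Unicode punctuation class, as used by detection.ts to strip trailing punctuation.
const TRAILING_PUNCT = /\p{P}+$/gu

// Hypothetical: the old check, where the captured tag still includes the leading '#'.
function oldAccepts(tagWithHash: string): boolean {
  const tag = tagWithHash.trim().replace(TRAILING_PUNCT, '')
  return !(tag.length > 66)
}

// Hypothetical: the new check, where the tag is captured without the '#'.
function newAccepts(tag: string): boolean {
  const stripped = tag.trim().replace(TRAILING_PUNCT, '')
  return !(stripped.length === 0 || stripped.length > 64)
}

const tag65 = 'a'.repeat(65)
console.log(oldAccepts('#' + tag65), newAccepts(tag65)) // true false

// '#' is Unicode punctuation (Po), so the old strip could empty the whole tag.
console.log('#?'.replace(TRAILING_PUNCT, '') === '') // true
console.log(oldAccepts('#?'), newAccepts('?')) // true false
```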
14 changes: 6 additions & 8 deletions packages/api/src/rich-text/detection.ts
@@ -70,27 +70,25 @@ export function detectFacets(text: UnicodeString): Facet[] | undefined {
     }
   }
   {
-    const re = /(?:^|\s)(#[^\d\s]\S*)(?=\s)?/g
+    const re = /(^|\s)#((?!\ufe0f)[^\d\s]\S*)(?=\s)?/g
     while ((match = re.exec(text.utf16))) {
-      let [tag] = match
-      const hasLeadingSpace = /^\s/.test(tag)
+      let [, leading, tag] = match
 
       tag = tag.trim().replace(/\p{P}+$/gu, '') // strip ending punctuation
 
-      // inclusive of #, max of 64 chars
-      if (tag.length > 66) continue
+      if (tag.length === 0 || tag.length > 64) continue
 
-      const index = match.index + (hasLeadingSpace ? 1 : 0)
+      const index = match.index + leading.length
 
       facets.push({
         index: {
           byteStart: text.utf16IndexToUtf8Index(index),
-          byteEnd: text.utf16IndexToUtf8Index(index + tag.length), // inclusive of last char
+          byteEnd: text.utf16IndexToUtf8Index(index + 1 + tag.length),
         },
         features: [
           {
             $type: 'app.bsky.richtext.facet#tag',
-            tag: tag.replace(/^#/, ''),
+            tag: tag,
           },
         ],
       })
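To see the updated loop end to end, here is a self-contained approximation of the tag branch of detectFacets operating on a plain string. It is a sketch, not the library code: the real function takes a UnicodeString and calls its utf16IndexToUtf8Index method, which is replaced here by a hypothetical helper of the same name that counts UTF-8 bytes by hand, and TagFacet is a simplified stand-in for the Facet type.

```typescript
// Simplified stand-in for the Facet type from @atproto/api (tag facets only).
interface TagFacet {
  index: { byteStart: number; byteEnd: number }
  features: { $type: 'app.bsky.richtext.facet#tag'; tag: string }[]
}

// Hypothetical replacement for UnicodeString#utf16IndexToUtf8Index: the number
// of UTF-8 bytes used by the first `i` UTF-16 code units of `s`.
function utf16IndexToUtf8Index(s: string, i: number): number {
  let bytes = 0
  for (let k = 0; k < i; k++) {
    const c = s.charCodeAt(k)
    const next = s.charCodeAt(k + 1)
    if (c < 0x80) {
      bytes += 1
    } else if (c < 0x800) {
      bytes += 2
    } else if (
      c >= 0xd800 && c <= 0xdbff && k + 1 < i &&
      next >= 0xdc00 && next <= 0xdfff
    ) {
      bytes += 4 // a surrogate pair encodes one astral code point
      k++
    } else {
      bytes += 3
    }
  }
  return bytes
}

function detectTags(text: string): TagFacet[] {
  const facets: TagFacet[] = []
  // Same pattern as the new regex in detection.ts.
  const re = /(^|\s)#((?!\ufe0f)[^\d\s]\S*)(?=\s)?/g
  let match: RegExpExecArray | null
  while ((match = re.exec(text))) {
    // Group 1 is the leading whitespace (or ''), group 2 the tag without '#'.
    const [, leading, rawTag] = match
    const tag = rawTag.trim().replace(/\p{P}+$/gu, '') // strip ending punctuation
    if (tag.length === 0 || tag.length > 64) continue

    const index = match.index + leading.length // UTF-16 index of the '#'
    facets.push({
      index: {
        byteStart: utf16IndexToUtf8Index(text, index),
        byteEnd: utf16IndexToUtf8Index(text, index + 1 + tag.length), // +1 for '#'
      },
      features: [{ $type: 'app.bsky.richtext.facet#tag', tag }],
    })
  }
  return facets
}

const [facet] = detectTags('body #a1')
console.log(facet.index, facet.features[0].tag) // { byteStart: 5, byteEnd: 8 } a1
```

Capturing the leading whitespace as group 1 replaces the old hasLeadingSpace test: the '#' always sits at match.index + leading.length, and because the tag no longer contains '#', byteEnd has to add 1 for it explicitly.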
20 changes: 16 additions & 4 deletions packages/api/tests/rich-text-detection.test.ts
@@ -241,15 +241,16 @@ describe('detectFacets', () => {
       ['body #1', [], []],
       ['body #a1', ['a1'], [{ byteStart: 5, byteEnd: 8 }]],
       ['#', [], []],
+      ['#?', [], []],
       ['text #', [], []],
       ['text # text', [], []],
       [
-        'body #thisisa64characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
-        ['thisisa64characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'],
-        [{ byteStart: 5, byteEnd: 71 }],
+        'body #thisisa64characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
+        ['thisisa64characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'],
+        [{ byteStart: 5, byteEnd: 70 }],
       ],
       [
-        'body #thisisa65characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab',
+        'body #thisisa65characterstring_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab',
         [],
         [],
       ],
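The updated fixture offsets follow from simple arithmetic: when the prefix and the tag are ASCII, UTF-8 byte offsets equal UTF-16 indices, so byteEnd = byteStart + 1 + tag length, where the 1 is the '#'. A 64-character tag after 'body ' therefore ends at byte 70, and the old expectation of 71 corresponds to a 65-character tag. The helper below is hypothetical and only valid for ASCII input; the 64-character string is built to have the fixture's shape, not copied from it.

```typescript
// Hypothetical helper: expected facet range for an ASCII-only prefix and tag,
// where one character is one UTF-8 byte. Callers must pass ASCII strings.
function asciiTagRange(
  prefix: string,
  tag: string,
): { byteStart: number; byteEnd: number } {
  const byteStart = prefix.length // the '#' sits right after the prefix
  return { byteStart, byteEnd: byteStart + 1 + tag.length } // +1 for '#'
}

// 'thisisa64characterstring_' is 25 characters, so 39 more make 64.
const tag64 = 'thisisa64characterstring_' + 'a'.repeat(39)
console.log(tag64.length) // 64
console.log(asciiTagRange('body ', tag64)) // { byteStart: 5, byteEnd: 70 }
console.log(asciiTagRange('body ', tag64 + 'a').byteEnd) // 71
```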
@@ -297,6 +298,17 @@ describe('detectFacets', () => {
           { byteStart: 17, byteEnd: 22 },
         ],
       ],
+      ['this #️⃣tag should not be a tag', [], []],
+      [
+        'this ##️⃣tag should be a tag',
+        ['#️⃣tag'],
+        [
+          {
+            byteStart: 5,
+            byteEnd: 16,
+          },
+        ],
+      ],
     ]
 
     for (const [input, tags, indices] of inputs) {
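The byteEnd of 16 in the new emoji fixture can be checked by hand: everything before the tag is ASCII, but U+FE0F and U+20E3 each take three bytes in UTF-8. The sketch below sums UTF-8 widths per code point using the RFC 3629 ranges; `utf8Width` and `utf8Length` are illustrative names, not part of @atproto/api. It also shows why the tag clears the 64-character limit: in UTF-16, '#️⃣tag' is only 6 code units.

```typescript
// UTF-8 width of one Unicode code point (RFC 3629 ranges).
function utf8Width(codePoint: number): number {
  if (codePoint < 0x80) return 1
  if (codePoint < 0x800) return 2
  if (codePoint < 0x10000) return 3
  return 4
}

// Total UTF-8 length of a string, iterating by code point rather than code unit.
function utf8Length(s: string): number {
  return Array.from(s).reduce((sum, ch) => sum + utf8Width(ch.codePointAt(0)!), 0)
}

const prefix = 'this '
const tag = '#\ufe0f\u20e3tag' // the keycap emoji followed by 'tag'
const byteStart = utf8Length(prefix)
// The facet covers the leading '#' delimiter plus the tag: 1 + 1 + 3 + 3 + 3 = 11 bytes.
const byteEnd = byteStart + utf8Length('#' + tag)

console.log(byteStart, byteEnd) // 5 16
console.log(tag.length) // 6
```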
