-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: parse deep link correctly #573
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/ExpensiMark.js
Outdated
@@ -476,9 +470,28 @@ export default class ExpensiMark { | |||
} | |||
replacedText = replacedText.concat(textToCheck.substr(startIndex, (match.index - startIndex))); | |||
|
|||
// we want to avoid matching email address domains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// we want to avoid matching email address domains | |
// We want to avoid matching domains in email addresses so we don't render them as URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's from old PR, I just moved it to other line. but new comment is more readable.
lib/ExpensiMark.js
Outdated
@@ -476,9 +470,28 @@ export default class ExpensiMark { | |||
} | |||
replacedText = replacedText.concat(textToCheck.substr(startIndex, (match.index - startIndex))); | |||
|
|||
// we want to avoid matching email address domains | |||
let abort = false; | |||
let shouldRetryByAtSign = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "by at sign" mean? are we doing another try by removing the @ sign, by adding the @ sign, or by testing against a new regex with/out @ sign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also feeling that variable name is not reasonable.
I think isReparsedByAutolink
is more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you think about new one?
lib/ExpensiMark.js
Outdated
// e.g. [email protected]/https://www.test.com | ||
// If the matched string faces @ sign before it, | ||
// We will retry to apply autolink rule to the string(e.g. /https://www.test.com) except for domain(e.g. expensify.com) after @ sign. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this comment to make it clearer?
- What is the e.g. an example of? shouldn't that be after another comment?
- What does "faces @ sign before it"? I suggest "If the matches string has a leading @ sign" -- use "leading" or "trailing" depending on the case
- It should say "why"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the e.g. an example of? shouldn't that be after another comment?
yes, it's example of comment. It should be after comment normally, but I tended to point '/https://www.test.com' and '[email protected]' in comment details
What does "faces @ sign before it"? I suggest "If the matches string has a leading @ sign" -- use "leading" or "trailing" depending on the case
Your suggestion is good, I will change comment as you recommended. "if the matched string has a leading @ sign"
It should say "why"
By a Old PR, if matched string has leading @ sign, It simply aborts to apply autolink rule. But we have edge cases like [email protected]/https://www.test.com, where https://www.test.com should be parsed as url.
To explain above reason, I added a example of edge test string.
As a result, I will change comment as following:
// If the matched Url has a leading @ sign, The Url's domain can be domain of email address.
// At this case, The matched string should not be parsed as Url,
// but it can contain another Url which should be parsed. (e.g. [email protected]/https://www.test.com)
// So We will retry to apply autolink rule to the string(e.g. /https://www.test.com) except for domain(e.g. expensify.com) after @ sign.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly reminder, your thoughts ?
lib/ExpensiMark.js
Outdated
shouldRetryByAtSign = true; | ||
replacedText = replacedText.concat(domainMatch[1] + this.replace(domainMatch[3], {filterRules: ['autolink']})); | ||
} else { | ||
abort = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a comment explaining why we're setting abort to true here? It's not immediately obvious to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__tests__/ExpensiMark-HTML-test.js
Outdated
@@ -609,6 +609,60 @@ test('Test a url ending with a closing parentheses autolinks correctly', () => { | |||
expect(parser.replace(testString)).toBe(resultString); | |||
}); | |||
|
|||
test('Test urls autolinks correctly', () => { | |||
let testString = '[email protected] https://www.expensify.com\n' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little hard to read and review, so I was thinking maybe we should add a test for each of these cases, but I think a better idea would be to make this an array with entries for each case, and I think even better would be an array of arrays, where the test string and the expected string are next to each other for easy comparison, readability, review, and updates.
If you agree, can you update accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about dividing it into 3~4 test sections according to test string type, like Expensify/App#16762 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, but I still arrays are better than string concats, and an array of array such that the test and expected strings are next to each other is better than both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following implementation is acceptable ?
const testCases = [
{
testString: '[email protected] https://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a> <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: '[email protected]://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>-<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
];
testCases.forEach(testCase => {
expect(parser.replace(testCase.testString)).toBe(testCase.resultString);
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that looks better than what I had in mind :)
}, | ||
{ | ||
testString: 'expensify.com -expensify.com @expensify.com', | ||
resultString: '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">expensify.com</a> -<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">expensify.com</a> @expensify.com', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a blocker, and it's not specific to this test case, but I'm wondering if we should use ://
instead of https://
, in case there are sites that don't support https so we don't end up rendering broken links
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't check https://
in the test case, you were about to point other cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean. What I was wondering was wether the result string should be <a href="https://expensify.com/"...
or <a href="://expensify.com/"...
which would use whatever protocol is already being used.
This is related to my comment here, but we can keep this as is for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. <a href="https://expensify.com/"...
. is being used
}, | ||
{ | ||
testString: '//www.expensify.com?name=test&[email protected]', | ||
resultString: '//<a href="https://www.expensify.com?name=test&[email protected]" target="_blank" rel="noreferrer noopener">www.expensify.com?name=test&[email protected]</a>', |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
resultString: '//<a href="https://www.expensify.com?name=test&[email protected]" target="_blank" rel="noreferrer noopener">www.expensify.com?name=test&[email protected]</a>', | ||
}, | ||
{ | ||
testString: '//staging.new.expensify.com/[email protected]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test case for a string that starts with ://
like ://staging.new.expensify.com/[email protected]
and ensure that renders a link?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ://
should be part of link unless it has http
or https
as prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that we need the following test case instead of with //
?
{
testString: '://staging.new.expensify.com/[email protected]',
resultString: '://<a href="https://staging.new.expensify.com/[email protected]" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]</a>',
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is my old school thinking, but on browsers, a link like ://expensify.com
translates into https://expensify.com
if the protocol being used on that page is https, and to http://expensify.com
is the protocol is http. That's what made me think we should parse ://expensify.com
as a URL, but maybe we don't need to do that, just wanted to get more thoughts.
This isn't a blocker
}, | ||
{ | ||
testString: 'https://www.expensify.com?name=test&[email protected]', | ||
resultString: '<a href="https://www.expensify.com?name=test&[email protected]" target="_blank" rel="noreferrer noopener">https://www.expensify.com?name=test&[email protected]</a>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, since &
shouldn't be present in an href, this is only for HTML content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'ts from old PR, and we have other test cases to write it in html attribute.
I think &: should be present there.
https://stackoverflow.com/questions/3705591/do-i-encode-ampersands-in-a-href
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thanks for sharing that! TIL you can and should html-encode & signs in html attributes
}, | ||
{ | ||
testString: '[email protected] //staging.new.expensify.com/[email protected]&redirectUrl=https://google.com', | ||
resultString: '<a href="mailto:[email protected]">[email protected]</a> //<a href="https://staging.new.expensify.com/[email protected]&redirectUrl=https://google.com" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]&redirectUrl=https://google.com</a>', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/ExpensiMark.js
Outdated
// We want to avoid matching domains in email addresses so we don't render them as URLs. | ||
// If the matched Url has a leading @ sign, The Url's domain can be a domain of email address. | ||
// At this case, The matched string should not be parsed as Url, | ||
// But it can contain another Url which should be parsed. (e.g. [email protected]/https://www.test.com) | ||
// So We will retry to apply autolink rule to the string(e.g. /https://www.test.com) except for domain(e.g. expensify.com) after @ sign. | ||
let abort = false; | ||
let isReparsedByAutolink = false; | ||
|
||
if ((match.index !== 0) && (textToCheck[match.index - 1] === '@')) { | ||
const domainRegex = new RegExp('^(([a-z-0-9]+\\.)+[a-z]{2,})(\\S*)', 'i'); | ||
const domainMatch = domainRegex.exec(match[2]); | ||
|
||
if ((domainMatch !== null) && (domainMatch[3] !== '')) { | ||
isReparsedByAutolink = true; | ||
replacedText = replacedText.concat(domainMatch[1] + this.replace(domainMatch[3], {filterRules: ['autolink']})); | ||
} else { | ||
abort = true; | ||
} | ||
} | ||
|
||
if (abort || match[1].includes('<pre>')) { | ||
replacedText = replacedText.concat(textToCheck.substr(match.index, (match[0].length))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still hard to understand, so here's a suggestion to make things clearer, but I'm not sure if what I'm saying in the comments, or if the updated variable names are accurate, so I could use your help
// We want to avoid matching domains in email addresses so we don't render them as URLs,
// but we need to check if there are valid URLs after the email address and render them accordingly,
// e.g. [email protected]/https://www.test.com
let isDoneMatching = false;
let shouldApplyAutoLinkAgain = true;
// If we find a URL with a leading @ sign, we need look for other domains in the rest of the string
if ((match.index !== 0) && (textToCheck[match.index - 1] === '@')) {
const domainRegex = new RegExp('^(([a-z-0-9]+\\.)+[a-z]{2,})(\\S*)', 'i');
const domainMatch = domainRegex.exec(match[2]);
// If we find another domain, we apply the auto link rule again and set a flag to avoid doing it
if ((domainMatch !== null) && (domainMatch[3] !== '')) {
replacedText = replacedText.concat(domainMatch[1] + this.replace(domainMatch[3], {filterRules: ['autolink']}));
shouldApplyAutoLinkAgain = false;
} else {
// Otherwise, we're done applying rules
isDoneMatching = true;
}
}
// Here we should probably explain in words what the code is doing, why, and what the condition means
if (isDoneMatching || match[1].includes('<pre>')) {
replacedText = replacedText.concat(textToCheck.substr(match.index, (match[0].length)));
} else if (shouldApplyAutoLinkAgain) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Here we should probably explain in words what the code is doing, why, and what the condition means
it's out of scope for this PR.
// If we find a URL with a leading @ sign, we need look for other domains in the rest of the string
If we find a URL with a leading @ sign, we need look for other valid URL in the rest of the string.
// If we find another domain, we apply the auto link rule again and set a flag to avoid doing it
If we find another string trailing domain, we apply the auto link rule again and set a flag to avoid doing it
@cead22 I amended something on your comment, I think it's more accurate to explain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's out of scope for this PR.
I understand you didn't write this in this PR, but we usually improve code around the code we're editing as we go, and given this isn't super clear, I think we should update it in this PR
If we find a URL with a leading @ sign, we need look for other valid URL in the rest of the string.
Small correction: If we find a URL with a leading @ sign, we need to look for other valid URLs in the rest of the string.
If we find another string trailing domain, we apply the auto link rule again and set a flag to avoid doing it
I don't think "string trailing domain" makes sense, how about
If we find another domain in the remainder of the string, we apply the auto link rule again and set a flag to avoid re-doing below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you didn't write this in this PR, but we usually improve code around the code we're editing as we go, and given this isn't super clear, I think we should update it in this PR
for the condition match[1].includes('<pre>')
, we can comment like below.
// We don't want to apply link rule if match[1]
contains the code block inside the [] of the markdown e.g. [```example```](https://example.com)
other conditions's flags are already explained.
What the codes do can be figured out without comments, first one is to concatenate string without applying rule, second one is to apply rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we find another domain in the remainder of the string, we apply the auto link rule again and set a flag to avoid re-doing below
it's good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
@0xmiroslav wanna give a final review on the latest update before we merge? |
reviewed and retested. all good to merge. |
Fixed Issues
$ Expensify/App#16762
Proposal: Expensify/App#16762 (comment)
Tests
Go to a chat and send following test strings and verify that urls and emails are parsed as expected.
Test Strings
Demo Video:
Screencast.from.6-9-23.09.04.20.webm
Autotest cases:
ExpensiMark-HTML-test.js new tests