Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: parse deep link correctly #573

Merged
merged 3 commits into from
Sep 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
105 changes: 105 additions & 0 deletions __tests__/ExpensiMark-HTML-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -609,6 +609,111 @@ test('Test a url ending with a closing parentheses autolinks correctly', () => {
expect(parser.replace(testString)).toBe(resultString);
});

test('Test urls autolinks correctly', () => {
const testCases = [
{
testString: '[email protected] https://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a> <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: '[email protected]://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>-<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: '[email protected]/https://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>/<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: '[email protected]?https://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>?<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: '[email protected]>https://www.expensify.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>&gt;<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a>',
},
{
testString: 'https://staging.new.expensify.com/details/[email protected]',
resultString: '<a href="https://staging.new.expensify.com/details/[email protected]" target="_blank" rel="noreferrer noopener">https://staging.new.expensify.com/details/[email protected]</a>',
},
{
testString: 'staging.new.expensify.com/details',
resultString: '<a href="https://staging.new.expensify.com/details" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/details</a>',
},
{
testString: 'https://www.expensify.com?name=test&[email protected]',
resultString: '<a href="https://www.expensify.com?name=test&amp;[email protected]" target="_blank" rel="noreferrer noopener">https://www.expensify.com?name=test&amp;[email protected]</a>',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong, since &amp; shouldn't be present in an href, this is only for HTML content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'ts from old PR, and we have other test cases to write it in html attribute.
I think &amp: should be present there.
https://stackoverflow.com/questions/3705591/do-i-encode-ampersands-in-a-href

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of scope for this PR. All resultString in test cases converts & to &amp; (not only changes from this PR)
And this is actually decoded again:

Screenshot 2023-09-07 at 10 13 41 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks for sharing that! TIL you can and should html-encode & signs in html attributes

},
{
testString: 'https://staging.new.expensify.com/[email protected]',
resultString: '<a href="https://staging.new.expensify.com/[email protected]" target="_blank" rel="noreferrer noopener">https://staging.new.expensify.com/[email protected]</a>',
},
{
testString: 'staging.new.expensify.com/[email protected]',
resultString: '<a href="https://staging.new.expensify.com/[email protected]" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]</a>',
},
{
testString: 'http://necolas.github.io/react-native-web/docs/?path=/docs/components-pressable--disabled',
resultString: '<a href="http://necolas.github.io/react-native-web/docs/?path=/docs/components-pressable--disabled" target="_blank" rel="noreferrer noopener">http://necolas.github.io/react-native-web/docs/?path=/docs/components-pressable--disabled</a>',
},
{
testString: '-https://www.expensify.com /https://www.expensify.com @https://www.expensify.com',
resultString: '-<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a> /<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">https://www.expensify.com</a> @https://www.expensify.com',
},
{
testString: 'expensify.com -expensify.com @expensify.com',
resultString: '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">expensify.com</a> -<a href="https://expensify.com" target="_blank" rel="noreferrer noopener">expensify.com</a> @expensify.com',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a blocker, and it's not specific to this test case, but I'm wondering if we should use :// instead of https://, in case there are sites that don't support https so we don't end up rendering broken links

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't check https:// in the test case, you were about to point other cases ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. What I was wondering was wether the result string should be <a href="https://expensify.com/"... or <a href="://expensify.com/"... which would use whatever protocol is already being used.

This is related to my comment here, but we can keep this as is for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. <a href="https://expensify.com/"... . is being used

},
{
testString: 'https//www.expensify.com',
resultString: 'https//<a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">www.expensify.com</a>',
},
{
testString: '//www.expensify.com?name=test&[email protected]',
resultString: '//<a href="https://www.expensify.com?name=test&amp;[email protected]" target="_blank" rel="noreferrer noopener">www.expensify.com?name=test&amp;[email protected]</a>',

This comment was marked as resolved.

},
{
testString: '//staging.new.expensify.com/[email protected]',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test case for a string that starts with :// like ://staging.new.expensify.com/[email protected] and ensure that renders a link?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think :// should be part of link unless it has http or https as prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that we need the following test case instead of with // ?

{
            testString: '://staging.new.expensify.com/[email protected]',
            resultString: '://<a href="https://staging.new.expensify.com/[email protected]" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]</a>',
        },

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is my old school thinking, but on browsers, a link like ://expensify.com translates into https://expensify.com if the protocol being used on that page is https, and to http://expensify.com is the protocol is http. That's what made me think we should parse ://expensify.com as a URL, but maybe we don't need to do that, just wanted to get more thoughts.

This isn't a blocker

resultString: '//<a href="https://staging.new.expensify.com/[email protected]" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]</a>',
},
{
testString: '/[email protected]',
resultString: '/details?login=<a href="mailto:[email protected]">[email protected]</a>',
},
{
testString: '?name=test&[email protected]',
resultString: '?name=test&amp;email=<a href="mailto:[email protected]">[email protected]</a>',
},
{
testString: 'example.com/https://www.expensify.com',
resultString: '<a href="https://example.com/https://www.expensify.com" target="_blank" rel="noreferrer noopener">example.com/https://www.expensify.com</a>',
},
{
testString: '[email protected] staging.new.expensify.com/[email protected]&redirectUrl=https://google.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a> <a href="https://staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com</a>',
},
{
testString: '[email protected] //staging.new.expensify.com/[email protected]&redirectUrl=https://google.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a> //<a href="https://staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com" target="_blank" rel="noreferrer noopener">staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com</a>',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be URL-encoding @ signs as %40? I thought it was weird to have explicit @ signs in URLs at first and testing this testString on slack, I noticed it renders something very different which isn't related to this, but made me wonder again

Copy link
Contributor Author

@Antasel Antasel Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot_1

On discord, we have explicit @ signs in Urls,
and on this Github platform as well

},
{
testString: '[email protected]://staging.new.expensify.com/[email protected]&redirectUrl=https://google.com',
resultString: '<a href="mailto:[email protected]">[email protected]</a>-<a href="https://staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com" target="_blank" rel="noreferrer noopener">https://staging.new.expensify.com/[email protected]&amp;redirectUrl=https://google.com</a>',
},
{
testString: '[email protected]/https://example.com/[email protected][email protected]',
resultString: '<a href="mailto:[email protected]\">[email protected]</a>/<a href="https://example.com/[email protected][email protected]" target="_blank" rel="noreferrer noopener">https://example.com/[email protected][email protected]</a>',
},
{
testString: '[email protected]/[email protected]/https://example.com/[email protected][email protected]',
resultString: '<a href="mailto:[email protected]">[email protected]</a>/<a href="mailto:[email protected]">[email protected]</a>/<a href="https://example.com/[email protected][email protected]" target="_blank" rel="noreferrer noopener">https://example.com/[email protected][email protected]</a>',
},
];

testCases.forEach(testCase => {
expect(parser.replace(testCase.testString)).toBe(testCase.resultString);
});
});

test('Test markdown style email link with various styles', () => {
const testString = 'Go to ~[Expensify]([email protected])~ '
+ '_[Expensify]([email protected])_ '
Expand Down
32 changes: 24 additions & 8 deletions lib/ExpensiMark.js
Original file line number Diff line number Diff line change
Expand Up @@ -446,12 +446,6 @@ export default class ExpensiMark {
let startIndex = 0;

while (match !== null) {
// we want to avoid matching email address domains
let abort = false;
if ((match.index !== 0) && (textToCheck[match.index - 1] === '@')) {
abort = true;
}

// we want to avoid matching ending ) unless it is a closing parenthesis for the URL
if (textToCheck[(match.index + match[2].length) - 1] === ')' && !match[2].includes('(')) {
match[0] = match[0].substr(0, match[0].length - 1);
Expand All @@ -476,9 +470,31 @@ export default class ExpensiMark {
}
replacedText = replacedText.concat(textToCheck.substr(startIndex, (match.index - startIndex)));

if (abort || match[1].includes('<pre>')) {
// We want to avoid matching domains in email addresses so we don't render them as URLs,
// but we need to check if there are valid URLs after the email address and render them accordingly,
// e.g. [email protected]/https://www.test.com
let isDoneMatching = false;
let shouldApplyAutoLinkAgain = true;

// If we find a URL with a leading @ sign, we need look for other domains in the rest of the string
if ((match.index !== 0) && (textToCheck[match.index - 1] === '@')) {
const domainRegex = new RegExp('^(([a-z-0-9]+\\.)+[a-z]{2,})(\\S*)', 'i');
const domainMatch = domainRegex.exec(match[2]);

// If we find another domain in the remainder of the string, we apply the auto link rule again and set a flag to avoid re-doing below.
if ((domainMatch !== null) && (domainMatch[3] !== '')) {
replacedText = replacedText.concat(domainMatch[1] + this.replace(domainMatch[3], {filterRules: ['autolink']}));
shouldApplyAutoLinkAgain = false;
} else {
// Otherwise, we're done applying rules
isDoneMatching = true;
}
}

// We don't want to apply link rule if match[1] contains the code block inside the [] of the markdown e.g. [```example```](https://example.com)
if (isDoneMatching || match[1].includes('<pre>')) {
replacedText = replacedText.concat(textToCheck.substr(match.index, (match[0].length)));
} else {
} else if (shouldApplyAutoLinkAgain) {
const urlRegex = new RegExp(`^${LOOSE_URL_REGEX}$|^${URL_REGEX}$`, 'i');

// `match[1]` contains the text inside the [] of the markdown e.g. [example](https://example.com)
Expand Down