I note that url-bot seems to decline to recognize a substring of an IRC message as a URL if that substring is not delimited on each side by whitespace (or by the start or end of the IRC message). Seeing no evidence in the existing issue tickets that this has been questioned previously, I would like to suggest that this may be overly conservative.
In particular, Appendix C of the current IETF RFC on URI syntax, RFC 3986, suggests delimiting URLs not only with whitespace but also with double quotation marks or with < and > (I tend to follow the latter suggestion), and recommends that,
For robustness, software that accepts user-typed URI [sic] should attempt to recognize and strip [...] delimiters [...]
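As a minimal sketch of what following that recommendation could look like in Rust, the scanner below finds `<...>` and `"..."` spans anywhere in a message, including spans with no surrounding whitespace, strips the delimiters, and drops embedded whitespace (Appendix C says whitespace added to wrap a long URI should be ignored on extraction). The names `extract_delimited` and `looks_like_url` are hypothetical, and the `http`/`https` prefix check is only a placeholder for real URL validation.

```rust
/// Hypothetical placeholder check; a real bot would use a proper URL parser.
fn looks_like_url(candidate: &str) -> bool {
    candidate.starts_with("http://") || candidate.starts_with("https://")
}

/// Finds URIs enclosed in `<...>` or `"..."` anywhere in `message`,
/// strips the delimiters and any embedded whitespace, and returns the
/// candidates that pass `looks_like_url`.
fn extract_delimited(message: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = message;
    // Locate the next opening delimiter; both are single-byte ASCII,
    // so byte offsets around them are always valid char boundaries.
    while let Some(start) = rest.find(&['<', '"'][..]) {
        let open = rest.as_bytes()[start];
        let close = if open == b'<' { '>' } else { '"' };
        let body_start = start + 1;
        let len = match rest[body_start..].find(close) {
            Some(len) => len,
            None => {
                // Unmatched opener: skip it and keep scanning.
                rest = &rest[body_start..];
                continue;
            }
        };
        let body = &rest[body_start..body_start + len];
        if open == b'<' && body.contains('<') {
            // Stray '<' such as "a < b": resume scanning just after it.
            rest = &rest[body_start..];
            continue;
        }
        // Ignore whitespace that may have been inserted to wrap the URI.
        let candidate: String = body.chars().filter(|c| !c.is_whitespace()).collect();
        if looks_like_url(&candidate) {
            found.push(candidate);
        }
        // Continue after the closing delimiter (also single-byte).
        rest = &rest[body_start + len + 1..];
    }
    found
}
```

For example, `see<https://example.com>now` yields `https://example.com` even though the URL touches other text on both sides, which a whitespace-only splitter cannot do.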
Hi, thanks for the suggestion, and yes, I totally agree. The parser url-bot currently uses is rather simplistic: it only splits message strings on whitespace. Achieving good results, or adhering more closely to the spec, would clearly require a more sophisticated parser than the one we have.
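For comparison, here is a rough sketch of whitespace-only splitting, assuming it amounts to something like `split_whitespace` followed by a scheme check, next to a cheap incremental improvement that peels `<`, `>` and `"` off the edges of each token. The helpers `is_url`, `split_only` and `split_and_trim` are hypothetical and are not url-bot's actual code.

```rust
/// Hypothetical stand-in for whatever scheme check the bot performs.
fn is_url(token: &str) -> bool {
    token.starts_with("http://") || token.starts_with("https://")
}

/// Whitespace-only splitting: a token counts only if the whole token is a URL.
fn split_only(message: &str) -> Vec<&str> {
    message.split_whitespace().filter(|t| is_url(t)).collect()
}

/// Same splitting, but strip RFC 3986 Appendix C delimiters (`<`, `>`, `"`)
/// from the start and end of each token before checking it.
fn split_and_trim(message: &str) -> Vec<&str> {
    message
        .split_whitespace()
        .map(|t| {
            t.trim_start_matches(&['<', '"'][..])
                .trim_end_matches(&['>', '"'][..])
        })
        .filter(|t| is_url(t))
        .collect()
}
```

Trimming catches the common `<https://example.com/>` and `"https://example.org"` cases, but a URL glued to other text, such as `see<https://example.com>`, still slips through, which is why a context-aware extractor is the more complete fix.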
As it turns out, there is a crate, urlocate, that appears to be designed for exactly what we would need here: extracting URLs from surrounding context. That makes it a good candidate, and it also has no dependencies, which is nice. So I think it is worth investigating as a direction to take with this.