-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid redirecting user to Via if target URL is known to embed client #155
Conversation
If the target URL is known to embed the client, sending the user directly to that URL will be a better experience than sending the user through a proxy service. There are various ways that we could detect whether a target URL embeds the client. This commit implements the dumbest possible approach of a hardcoded list of URL patterns. For consistency with how annotation search works, HTTP and HTTPS are treated as equivalent. The query string and fragment are also ignored when testing for a possible match.
90bb105
to
4c0ba42
Compare
Codecov Report
@@ Coverage Diff @@
## master #155 +/- ##
==========================================
- Coverage 90.43% 90.32% -0.11%
==========================================
Files 10 12 +2
Lines 533 641 +108
==========================================
+ Hits 482 579 +97
- Misses 51 62 +11
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'm glad we're using a simple hard-coded list here, rather than trying to do something automatic. I think just giving people at Hypothesis an admin page to edit the list manually would be a sensible next evolution of it (if we find ourselves having to change the list often). Cool that this solution is already gonna apply to two of our own sites, as well!
bouncer/embed_detector.py
Outdated
|
||
Patterns are shell-style wildcards ('*' matches any number of chars, '?' | ||
matches a single char). | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attribute docstrings are actually supposed to go below the attribute, not above it. I'm not sure whether or not tools (e.g. Sphinx) will pick this up if it's above the attribute. To be honest, though, this isn't public API (not even in the sense of API that this module exports for other bouncer modules to use), it's only used within this module, so I think a simple code comment rather than a docstring would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll make it a comment.
"web.hypothes.is/blog/*", | ||
] | ||
|
||
COMPILED_PATTERNS = [re.compile(fnmatch.translate(pat)) for pat in PATTERNS] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar but I'm wondering why it's necessary to translate the fnmatch pattern into a regex? Can't fnmatch be used directly to match against the URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fnmatch
internally works by translating the shell-style wildcard into a regex which is then matched against the target string. This is a small optimization by pre-compiling the target pattern.
Agreed. For that we would need to decide where to store the list and depending on where that is, provide APIs + UIs to manage it. |
fnmatch patterns have enough flexibility for this use case and this avoids the need to remember to escape periods, which is easy to forget otherwise. The resulting patterns are also easier to read.
5a34ed0
to
73e4789
Compare
If the target URL is known to embed the client, sending the user
directly to that URL will be a better experience than sending the user
through a proxy service.
There are various ways that we could detect whether a target URL embeds
the client. This commit implements the dumbest possible approach of a
hardcoded list of URL patterns.
The initial list includes a couple of Hypothesis-related URLs. Our first use for this will be in solving hypothesis/product-backlog#814. We'll add the URL pattern(s) for that once they have been confirmed.