Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redirecting user to Via if target URL is known to embed client #155

Merged
merged 2 commits into from
Sep 28, 2018

Conversation

robertknight
Copy link
Member

If the target URL is known to embed the client, sending the user
directly to that URL will be a better experience than sending the user
through a proxy service.

There are various ways that we could detect whether a target URL embeds
the client. This commit implements the dumbest possible approach of a
hardcoded list of URL patterns.

The initial list includes a couple of Hypothesis-related URLs. Our first use for this will be in solving hypothesis/product-backlog#814. We'll add the URL pattern(s) for that once they have been confirmed.

If the target URL is known to embed the client, sending the user
directly to that URL will be a better experience than sending the user
through a proxy service.

There are various ways that we could detect whether a target URL embeds
the client. This commit implements the dumbest possible approach of a
hardcoded list of URL patterns.

For consistency with how annotation search works, HTTP and HTTPS are
treated as equivalent. The query string and fragment are also ignored
when testing for a possible match.
@codecov
Copy link

codecov bot commented Sep 27, 2018

Codecov Report

Merging #155 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   90.43%   90.32%   -0.11%     
==========================================
  Files          10       12       +2     
  Lines         533      641     +108     
==========================================
+ Hits          482      579      +97     
- Misses         51       62      +11
Impacted Files Coverage Δ
bouncer/views.py 89.88% <100%> (-4.35%) ⬇️
tests/bouncer/views_test.py 100% <100%> (ø) ⬆️
bouncer/embed_detector.py 100% <100%> (ø)
tests/bouncer/embed_detector_test.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa33921...73e4789. Read the comment docs.

@seanh seanh self-assigned this Sep 28, 2018
Copy link
Contributor

@seanh seanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I'm glad we're using a simple hard-coded list here, rather than trying to do something automatic. I think just giving people at Hypothesis an admin page to edit the list manually would be a sensible next evolution of it (if we find ourselves having to change the list often). Cool that this solution is already gonna apply to two of our own sites, as well!


Patterns are shell-style wildcards ('*' matches any number of chars, '?'
matches a single char).
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attribute docstrings are actually supposed to go below the attribute, not above it. I'm not sure whether or not tools (e.g. Sphinx) will pick this up if it's above the attribute. To be honest, though, this isn't public API (not even in the sense of API that this module exports for other bouncer modules to use), it's only used within this module, so I think a simple code comment rather than a docstring would be fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll make it a comment.

"web.hypothes.is/blog/*",
]

COMPILED_PATTERNS = [re.compile(fnmatch.translate(pat)) for pat in PATTERNS]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar but I'm wondering why it's necessary to translate the fnmatch pattern into a regex? Can't fnmatch be used directly to match against the URL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fnmatch internally works by translating the shell-style wildcard into a regex which is then matched against the target string. This is a small optimization by pre-compiling the target pattern.

@robertknight
Copy link
Member Author

. I think just giving people at Hypothesis an admin page to edit the list manually would be a sensible next evolution of it (if we find ourselves having to change the list often)

Agreed. For that we would need to decide where to store the list and depending on where that is, provide APIs + UIs to manage it.

fnmatch patterns have enough flexibility for this use case and this avoids the
need to remember to escape periods, which is easy to forget otherwise. The
resulting patterns are also easier to read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants