
Commit

Allow ACLJs to use *, SURT wildcard to match all URLs (#882)
Also adds tests and documentation
tw4l authored Apr 3, 2024
1 parent d1e1636 commit 86ee3bd
Showing 5 changed files with 26 additions and 0 deletions.
10 changes: 10 additions & 0 deletions docs/manual/access-control.rst
@@ -105,6 +105,12 @@ Given these rules, a user would:
* but would receive an 'access blocked' error message when viewing ``http://httpbin.org/`` (block)
* would receive a 404 not found error when viewing ``http://httpbin.org/anything`` (exclude)

To match any possible URL in an .aclj file, set ``*,`` as the leading SURT, for example::

    *, - {"access": "allow"}

Lines starting with ``*,`` should generally be at the end of the file, respecting the reverse alphabetical order.


Access Types: allow, block, exclude, allow_ignore_embargo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -149,6 +155,10 @@ To make this work, pywb must be running behind an Apache or Nginx system that is

For example, this header may be set based on IP range, or based on password authentication.

To allow a user access to all URLs, overriding more specific rules and the ``default_access`` configuration setting, use the ``*,`` SURT::

    *, - {"access": "allow", "user": "staff"}

Further examples of how to set this header will be provided in the deployments section.

**Note: Do not use the user-based rules without configuring proper authentication on an Apache or Nginx frontend to set or remove this header, otherwise the 'X-Pywb-ACL-User' can easily be faked.**
4 changes: 4 additions & 0 deletions pywb/warcserver/access_checker.py
@@ -260,6 +260,10 @@ def find_access_rule(self, url, ts=None, urlkey=None, collection=None, acl_user=
            if key.startswith(acl_key):
                acl_obj = CDXObject(acl)

            # Check for "*," in ACL, which matches any URL
            if acl_key == b"*,":
                acl_obj = CDXObject(acl)

            if acl_obj:
                user = acl_obj.get('user')
                if user == acl_user:
1 change: 1 addition & 0 deletions sample_archive/access/allow_all.aclj
@@ -0,0 +1 @@
*, - {"access": "allow", "user": "staff"}
7 changes: 7 additions & 0 deletions tests/config_test_access.yaml
@@ -62,6 +62,13 @@ collections:
    acl_paths:
      - ./sample_archive/access/pywb.aclj

  pywb-wildcard-surt:
    index_paths: ./sample_archive/cdx/
    archive_paths: ./sample_archive/warcs/
    default_access: block
    acl_paths:
      - ./sample_archive/access/allow_all.aclj




4 changes: 4 additions & 0 deletions tests/test_acl.py
@@ -96,5 +96,9 @@ def test_allowed_different_coll_acl_dir(self):

        assert '"http://httpbin.org/anything/resource.json"' in resp.text

    def test_allow_all_acl_user_specific(self):
        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', status=451)

        assert 'Access Blocked' in resp.text

        resp = self.testapp.get('/pywb-wildcard-surt/mp_/http://example.com/', headers={"X-Pywb-Acl-User": "staff"}, status=200)
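
Note: the behavior this commit adds can be illustrated with a small standalone sketch. It is not pywb's actual `find_access_rule` implementation; the function `find_rule`, its parameters, and the plain-string line parsing are hypothetical simplifications that assume rules are already sorted in reverse alphabetical order and that a rule applies only when its `user` field equals the requesting user (or both are absent).

```python
import json


def find_rule(urlkey, acl_lines, acl_user=None, default_access="block"):
    """Return the access value of the first applicable rule (hypothetical helper).

    acl_lines: .aclj-style lines of the form '<SURT key> - <JSON>',
    assumed to be sorted in reverse alphabetical order.
    """
    for line in acl_lines:
        # Split at the first ' - ' separator between the key and the JSON body
        acl_key, _, body = line.partition(" - ")
        rule = json.loads(body)

        # A rule matches on SURT prefix, or unconditionally if its key is '*,'
        if urlkey.startswith(acl_key) or acl_key == "*,":
            # Apply the rule only if its user matches the requesting user
            if rule.get("user") == acl_user:
                return rule["access"]

    # No applicable rule: fall back to the collection default
    return default_access


rules = [
    'org,httpbin)/ - {"access": "block"}',
    '*, - {"access": "allow", "user": "staff"}',
]

print(find_rule("com,example)/", rules))                    # anonymous user
print(find_rule("com,example)/", rules, acl_user="staff"))  # staff user
```

Under these assumptions, an anonymous request for `com,example)/` falls through to the default (`block`), while the same request from `staff` is allowed by the `*,` rule, mirroring the 451 and 200 responses checked in `test_allow_all_acl_user_specific`.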
