Remove "proxy" references and fix scripts' paths #210

Open. Wants to merge 2 commits into base: master.

15 changes: 2 additions & 13 deletions job-conf.rst
@@ -14,7 +14,6 @@ Example

id: myjob
time_limit: 60 # seconds
-proxy: 127.0.0.1:8000 # point at warcprox for archiving
ignore_robots: false
max_claimed_sites: 2
warcprox_meta:
@@ -186,16 +185,6 @@ enforced at the seed level. If a time limit is specified at the top level, it
is inherited by each seed as described above, and enforced individually on each
seed.

-``proxy``
-~~~~~~~~~
-+--------+----------+---------+
-| type | required | default |
-+========+==========+=========+
-| string | no | *none* |
-+--------+----------+---------+
-HTTP proxy, with the format ``host:port``. Typically configured to point to
-warcprox for archival crawling.
-
``ignore_robots``
~~~~~~~~~~~~~~~~~
+---------+----------+-----------+
@@ -226,8 +215,8 @@ to contact the operator if the crawl is causing problems.
+============+==========+===========+
| dictionary | no | ``false`` |
+------------+----------+-----------+
-Specifies the ``Warcprox-Meta`` header to send with every request, if ``proxy``
-is configured. The value of the ``Warcprox-Meta`` header is a json blob. It is
+Specifies the ``Warcprox-Meta`` header to send with every request.
+The value of the ``Warcprox-Meta`` header is a json blob. It is
used to pass settings and information to warcprox. Warcprox does not forward
the header on to the remote site. For further explanation of this field and
its uses see
5 changes: 2 additions & 3 deletions vagrant/README.rst
@@ -31,16 +31,15 @@ Then you can run brozzler-new-site:

::

-(brozzler-ve3)vagrant@brzl:~$ brozzler-new-site --proxy=localhost:8000 http://example.com/
+(brozzler-ve3)vagrant@brzl:~$ brozzler-new-site http://example.com/


-Or brozzler-new-job (make sure to set the proxy to localhost:8000):
+Or brozzler-new-job:

::

(brozzler-ve3)vagrant@brzl:~$ cat >job1.yml <<EOF
id: job1
-proxy: localhost:8000 # point at warcprox for archiving
seeds:
- url: https://example.org/
EOF
7 changes: 4 additions & 3 deletions vagrant/vagrant-brozzler-new-job.py
@@ -33,16 +33,17 @@ def main(argv=[]):
'job_conf_file', metavar='JOB_CONF_FILE',
help='brozzler job configuration file in yaml')
args = arg_parser.parse_args(args=argv[1:])
+args.job_conf_file = os.path.realpath(args.job_conf_file)

# cd to path with Vagrantfile so "vagrant ssh" knows what to do
-os.chdir(os.path.dirname(__file__))
+os.chdir(os.path.realpath(os.path.dirname(__file__)))

with open(args.job_conf_file, 'rb') as f:
subprocess.call([
'vagrant', 'ssh', '--',
'f=`mktemp` && cat > $f && '
-'/home/vagrant/brozzler-ve3/bin/python '
-'/home/vagrant/brozzler-ve3/bin/brozzler-new-job $f'],
+'/opt/brozzler-ve3/bin/python '
+'/opt/brozzler-ve3/bin/brozzler-new-job $f'],
stdin=f)

if __name__ == '__main__':
2 changes: 1 addition & 1 deletion vagrant/vagrant-brozzler-new-site.py
@@ -71,7 +71,7 @@ def main(argv=[]):
options.append('--verbose')

# cd to path with Vagrantfile so "vagrant ssh" knows what to do
-os.chdir(os.path.dirname(__file__))
+os.chdir(os.path.realpath(os.path.dirname(__file__)))

cmd = (
'/opt/brozzler-ve3/bin/python /opt/brozzler-ve3/bin/brozzler-new-site '
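
Note on the path changes: the sketch below is not part of this PR. It is a minimal, standalone illustration of why wrapping `os.path.dirname(__file__)` in `os.path.realpath()` and resolving `job_conf_file` before the `chdir` both matter. The helper name `script_dir` and the file name `job1.yml` used here are hypothetical, and a literal script name stands in for `__file__`, which may or may not be relative depending on the Python version and how the script is invoked.

```python
import os
import tempfile


def script_dir(script_path):
    """Return an absolute directory for script_path, safe for os.chdir()."""
    return os.path.realpath(os.path.dirname(script_path))


# If the script path is a bare file name, os.path.dirname() returns
# the empty string.
bare_dirname = os.path.dirname('vagrant-brozzler-new-site.py')

try:
    os.chdir(bare_dirname)  # chdir to '' is an error
    chdir_failed = False
except OSError:
    chdir_failed = True

# realpath() turns the empty string into the current working directory.
resolved_dir = script_dir('vagrant-brozzler-new-site.py')

# A relative file argument has to be resolved before the chdir, otherwise
# it is looked up relative to the new directory.
original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as work, \
        tempfile.TemporaryDirectory() as other:
    os.chdir(work)
    with open('job1.yml', 'w') as f:
        f.write('id: job1\n')
    absolute_conf = os.path.realpath('job1.yml')  # resolve first
    os.chdir(other)                               # then change directory
    relative_still_found = os.path.exists('job1.yml')
    absolute_still_found = os.path.exists(absolute_conf)
    os.chdir(original_cwd)  # leave the temp dirs before they are removed
```

After the second `chdir`, only the path resolved up front still points at the file, which matches the order used in `vagrant-brozzler-new-job.py`: `realpath` on the argument first, `os.chdir` second.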