A basic and tiny crawler that uses the Web-State-Machine algorithms.
First, install the prerequisites: Node.js, npm, and Bower.
Install npm dependencies
npm install
Install browser dependencies
bower install
You can serve the website from any web server, but since some of the examples in /sites/ are written in PHP, you may want to run this:
php -S localhost:8000
This requires PHP 5.4.0 or higher. For details, see the PHP built-in web server documentation.
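You could also use Python, although it will not execute the PHP examples. With Python 2:
python -m SimpleHTTPServer 8000
With Python 3, the equivalent is:
python3 -m http.server 8000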
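Since Node.js is already a prerequisite, a static server can also be sketched with its built-in modules. The names resolvePath and createStaticServer below are hypothetical and not part of this project, and like the Python option this only serves static files, so the PHP examples will not execute.

```javascript
// Sketch only: a tiny static file server using Node.js built-ins.
// `resolvePath` and `createStaticServer` are hypothetical names for illustration.
const http = require('http');
const fs = require('fs');
const path = require('path');

// Small MIME table; extend it if the site serves other file types.
const MIME_TYPES = {
  '.html': 'text/html',
  '.js': 'application/javascript',
  '.css': 'text/css',
  '.json': 'application/json',
  '.png': 'image/png',
};

// Map a request URL to a file inside `root`, or return null if the
// request is malformed or would escape the root directory.
function resolvePath(root, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, 'http://localhost').pathname);
  } catch (err) {
    return null; // malformed URL or percent-encoding
  }
  const rootDir = path.resolve(root);
  const target = path.resolve(rootDir, '.' + pathname);
  const inside = target === rootDir || target.startsWith(rootDir + path.sep);
  if (!inside) return null;
  // Directory-style URLs fall back to index.html.
  return pathname.endsWith('/') ? path.join(target, 'index.html') : target;
}

// Build an HTTP server that serves files from `root`.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const file = resolvePath(root, req.url);
    if (file === null) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = MIME_TYPES[path.extname(file).toLowerCase()] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}

// Usage (port 8000 is an assumption, chosen to match the commands above):
// createStaticServer(process.cwd()).listen(8000);
```

Uncomment the last line, save the file under any name, and run it with node from the project root to serve the site on localhost:8000.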
Don't open index.html directly from your filesystem, because you will get a JavaScript error such as Error: Permission denied to access property 'document' or SecurityError: The operation is insecure.
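The errors above come from the browser's same-origin policy. As a rough illustration (a simplified sketch, not the crawler's code or the browser's actual algorithm), the check can be approximated with the standard URL API: pages loaded from file:// get an opaque origin that serializes as "null" and matches nothing. The function name canScriptAccess is hypothetical, and real browsers treat file origins in implementation-specific ways.

```javascript
// Simplified same-origin comparison using the WHATWG URL API,
// which is built into both Node.js and browsers.
// `canScriptAccess` is a hypothetical name for illustration only.
function canScriptAccess(parentUrl, frameUrl) {
  const parentOrigin = new URL(parentUrl).origin;
  const frameOrigin = new URL(frameUrl).origin;
  // Opaque origins (e.g. file:// pages) serialize as the string "null"
  // and are never considered equal to any origin, including themselves.
  if (parentOrigin === 'null' || frameOrigin === 'null') return false;
  // Otherwise scheme, host, and port must all match.
  return parentOrigin === frameOrigin;
}

// Served over HTTP from the same host and port: access is allowed.
console.log(canScriptAccess('http://localhost:8000/index.html',
                            'http://localhost:8000/sites/example.php'));

// Opened from the filesystem: access is denied.
console.log(canScriptAccess('file:///home/me/index.html',
                            'file:///home/me/sites/example.html'));
```

This is why serving the project over HTTP on a single host and port lets index.html read the document of the pages it loads in an iframe.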
You should consider launching Google Chrome with --disable-web-security to bypass the same-origin policy (SOP) when JavaScript runs inside an iframe.
Note: this will not give access to websites that specify the origin in their response headers.
More information can be found in the Stack Overflow question chrome-disable-same-origin-policy.
Use gruntjs
grunt
At this time, there is not much in the Gruntfile yet. Livereload works, and we may add minification and other cool things later ;)
- Provide a way to run headlessly using PhantomJS, or something like DalekJS to run this in multiple browsers.
- Fix the vanilla infinite loop at the end of the crawl.
  - Temp fix: uncheck "Explore automatically" in "WSM settings" when it's done
- Set a max width for "Next element to click:" in "WSM statistics"
- Move "AutoRefresh" to a fixed place so it's easier to click when the graphs update automatically
Pull requests are welcome!