µ-Crawler

A basic and tiny crawler that uses the Web-State-Machine algorithms.

Installation

First, install the dependencies: Node.js, npm and Bower.

Install npm dependencies

npm install

Install browser dependencies

bower install

Usage

You can serve the site from any web server, but since some of the examples in /sites/ are written in PHP, you may want to use PHP's built-in server:

php -S localhost:8000

Requires PHP version 5.4.0 or higher. For details, see PHP Built-in web server

You could also use Python. With Python 2:

python -m SimpleHTTPServer 8000

With Python 3, the equivalent module is http.server:

python3 -m http.server 8000

Don't open index.html directly from your filesystem: you will get JavaScript errors such as Error: Permission denied to access property 'document' or SecurityError: The operation is insecure.
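Since Node.js is already a dependency, the same job could in principle be done with a small Node script. The following is only a minimal sketch, not part of µ-Crawler: the names resolvePath and createStaticServer are hypothetical, and it serves static files only, so the PHP examples in /sites/ would still need a PHP-capable server.

```javascript
// Minimal static file server sketch (hypothetical helper, not part of µ-Crawler).
// It serves static files only; PHP examples still need a PHP server.
const http = require('http');
const fs = require('fs');
const path = require('path');

// Small extension-to-MIME table; extend as needed.
const MIME_TYPES = {
  '.html': 'text/html',
  '.js': 'application/javascript',
  '.css': 'text/css',
  '.json': 'application/json',
  '.gif': 'image/gif',
  '.png': 'image/png'
};

// Map a request URL to a file path inside `root`. Returns null when the URL
// is malformed or the resolved path would escape the root directory.
function resolvePath(root, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(requestUrl.split('?')[0]);
  } catch (e) {
    return null;
  }
  if (pathname.endsWith('/')) {
    pathname += 'index.html';
  }
  const absoluteRoot = path.resolve(root);
  const filePath = path.resolve(absoluteRoot, '.' + pathname);
  if (filePath !== absoluteRoot && !filePath.startsWith(absoluteRoot + path.sep)) {
    return null;
  }
  return filePath;
}

// Build an HTTP server that serves files below `root`. It does not start
// listening until .listen() is called on the returned server.
function createStaticServer(root) {
  return http.createServer((req, res) => {
    const filePath = resolvePath(root, req.url);
    if (filePath === null) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const ext = path.extname(filePath).toLowerCase();
      const type = MIME_TYPES[ext] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  });
}
```

To try it, append createStaticServer(__dirname).listen(8000); to the script, run it with node from the project root, and open http://localhost:8000/ in the browser.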

Same Origin Policy

Consider launching Google Chrome with the --disable-web-security flag to bypass the Same Origin Policy (SOP) when executing JavaScript inside an iframe.

Note: this flag still won't allow access to websites that specify the origin in their headers.

More info can be found on Stack Overflow: chrome-disable-same-origin-policy
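To make the restriction concrete, here is a small sketch of what the Same Origin Policy checks and how a script could detect that an iframe is off limits. Both function names are hypothetical and not taken from the µ-Crawler sources. Two URLs share an origin only when scheme, host and port all match, and a page opened from the filesystem has an opaque origin that matches nothing.

```javascript
// Sketch only: isSameOrigin and getFrameDocument are hypothetical helpers.

// Two URLs share an origin only when scheme, host and port all match.
function isSameOrigin(urlA, urlB) {
  const a = new URL(urlA).origin;
  const b = new URL(urlB).origin;
  // Opaque origins (for example file:// URLs) serialize to the string
  // "null" and must never be treated as matching.
  return a !== 'null' && a === b;
}

// Try to reach the document inside an iframe. Across origins the browser
// either returns null for contentDocument or throws (for example a
// SecurityError) when contentWindow.document is read.
function getFrameDocument(frame) {
  try {
    return frame.contentDocument || frame.contentWindow.document;
  } catch (e) {
    return null;
  }
}
```

With the site served from http://localhost:8000, isSameOrigin('http://localhost:8000/index.html', 'http://localhost:8000/sites/a.php') is true, while the same page on port 8080 or opened through file:// is not. The path /sites/a.php is only an illustrative name.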

Development

Use gruntjs

grunt

There is not much in the Gruntfile yet. Livereload works for now, and we may add minification and other cool things later ;)

Todo

- Provide a way to run headlessly using PhantomJS, or something like dalekjs to run this in multiple browsers.
- Fix the vanilla infinite loop at the end of the crawl.
  - Temp fix: uncheck Explore automatically in WSM settings when it's done
- Set a max width for Next element to click: in WSM statistics
- Move AutoRefresh to a fixed place so it's easier to click when the graphs update automatically

Pull requests are welcome!
