Change Log since v0.6.2:
v0.7.0 26edf33
- added option to limit crawling to specific file extensions (html, htm, txt)
- added check to keep the crawler from following offsite URLs (see the filtering sketch after this list)
- added "-delay" flag to avoid rate limiting ("-delay 100" sets a 100 ms delay between URL requests)
- added a write buffer for better performance on large files (see the buffered-write sketch below)
- increased the maximum crawl depth from 5 to 100 (not recommended, but enabled for edge cases)
- fixed an out-of-bounds slice bug when crawling URLs containing null characters
- fixed a bug that occurred when the requested crawl depth exceeded the number of URLs available to crawl
- fixed crawl depth calculation
- optimized code, which now runs 2.8x faster than v0.6.x in benchmark testing
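For reference, the extension filter, offsite check, and "-delay" pause described above could look roughly like the sketch below. This is a minimal illustration assuming spider is written in Go; shouldCrawl, allowedExts, and baseHost are hypothetical names for illustration only, not spider's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"path"
	"strings"
	"time"
)

// allowedExts mirrors the extension whitelist added in v0.7.0.
var allowedExts = map[string]bool{".html": true, ".htm": true, ".txt": true}

// shouldCrawl (hypothetical helper) keeps only same-site URLs whose path
// ends in an allowed extension.
func shouldCrawl(raw, baseHost string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	if u.Host != "" && u.Host != baseHost {
		return false // offsite URL: skip
	}
	return allowedExts[strings.ToLower(path.Ext(u.Path))]
}

func main() {
	delayMs := flag.Int("delay", 0, "delay in ms between URL requests")
	flag.Parse()

	baseHost := "example.com"
	queue := []string{
		"https://example.com/index.html",
		"https://other.org/page.html", // offsite: filtered out
		"https://example.com/notes.txt",
	}

	for _, raw := range queue {
		if !shouldCrawl(raw, baseHost) {
			continue
		}
		fmt.Println("fetching", raw)
		time.Sleep(time.Duration(*delayMs) * time.Millisecond) // "-delay 100" => 100 ms pause
	}
}
```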
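The write buffer amounts to batching many small writes into fewer large disk writes. A minimal sketch of that technique in Go, with a hypothetical output file name:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	// results.txt is a hypothetical output name, not spider's actual output file.
	f, err := os.Create("results.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Wrap the file in a buffered writer so many small writes are
	// batched into fewer, larger writes to disk.
	w := bufio.NewWriterSize(f, 1<<20) // 1 MiB buffer
	defer w.Flush()                    // flush any remaining buffered data on exit

	for i := 0; i < 100000; i++ {
		fmt.Fprintf(w, "line %d\n", i)
	}
}
```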
v0.7.1 81a5439
- added progress bars for word/ngram processing and file-writing operations
- added RAM usage monitoring (see the memory-stats sketch after this list)
- optimized the order of operations for faster processing and lower RAM usage
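The RAM usage monitoring could be done with the Go runtime's memory statistics; the sketch below shows the general technique only and is not spider's actual implementation (reportRAM is a hypothetical helper, and the allocation loop just stands in for crawl/ngram work).

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// reportRAM prints the heap currently in use and the total memory obtained
// from the OS, as reported by the Go runtime.
func reportRAM() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("heap in use: %d MiB, obtained from OS: %d MiB\n",
		m.HeapInuse/(1<<20), m.Sys/(1<<20))
}

func main() {
	// Poll memory usage periodically while work is in progress.
	go func() {
		for {
			reportRAM()
			time.Sleep(2 * time.Second)
		}
	}()

	// Placeholder workload: allocate memory so the reported numbers change.
	work := make([][]byte, 0)
	for i := 0; i < 50; i++ {
		work = append(work, make([]byte, 1<<20))
		time.Sleep(100 * time.Millisecond)
	}
}
```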
TO-DO: refactor code
SHA256 Checksums
045bca70d0f8be6326c9bae5c4f412f0af183f2859b5ac30f4e6efdfe06316bd spider_amd64.bin
8b0525a46a6aca19256e1326338a59e58585933558e85319d68cb0c609c500b2 spider_amd64-darwin
9671739d795c8913659c8169827124ba78725aef3205579d688d058571a9c96b spider_amd64.exe
50093e85868b77f40e5ece131597e9bbcda646fb2a80970a4d6791e7292a8f01 spider_arm64.bin
0ce2d7b5b232f82c3fe3fd5ca45659e7746400db4ac7842e29757fc226f26d76 spider_armhf.bin
Jotti Antivirus Scan Results
https://virusscan.jotti.org/en-US/filescanjob/fhv8de86sm,jrqzgdwd2b,sc18q9y8uj,gmi0zwcqs2,kjjl0g74m9