Releases: topherPedersen/asktoby.php
AskToby.PHP Version 3.0.0
Version 2 of AskToby.PHP used a simple three-table database. It ran well up until around 100,000 pages crawled. At that point the crawler slowed nearly to a halt, most likely due to the sheer size of the tables. Version 3 seeks to fix this problem by breaking each of the three original tables into twenty-six smaller tables. Other than breaking up the tables, this version is essentially the same. However, changing the database design required a complete rewrite of the crawler. With this release I hope that the crawler will be able to index 1,000,000 pages, as opposed to the 100,000 crawled by Version 2 of the toby-crawler.
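One way to picture the split, as a minimal sketch rather than the actual Version 3 schema: the note above does not say how rows are assigned to the twenty-six tables, so this example assumes one table per letter a-z, chosen by the first letter of a keyword. The base table name `keywords` and the function `table_for` are hypothetical, and the sketch is written in Python for brevity even though the project itself is PHP.

```python
import string

# Hypothetical base table name; the real Version 3 schema is not shown in these notes.
BASE_TABLE = "keywords"


def table_for(keyword: str, base: str = BASE_TABLE) -> str:
    """Return the table for a keyword, assuming one table per letter a-z."""
    first = keyword.strip().lower()[:1]
    if len(first) != 1 or first not in string.ascii_lowercase:
        # Assumption: anything that does not start with a-z goes to the "a" table.
        first = "a"
    # The suffix always comes from a fixed alphabet, never from raw user input.
    return f"{base}_{first}"


print(table_for("Toby"))   # keywords_t
print(table_for("42"))     # keywords_a
```

With a layout like this, each lookup or insert touches a table holding only a fraction of the rows, which is the effect the redesign is after.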
AskToby.PHP Version 3.0.0a1 [ALPHA]
I've started work on AskToby.PHP Version 3. The crawler has been written, but no user interface code exists yet. The crawler also has a major bug that needs to be fixed. However, the core code for Version 3 is all here!
AskToby.PHP Version 2.0.6
"Move Fast and Break Things"
Release Notes: Tweaked the crawl speed a bit. Version 2 still has tons of bugs, but I'm still adding small patches here and there because I'm trying to crawl 1 million pages before shutting down the AskToby.PHP crawler on my server. At the time of this release the AskToby.PHP crawler has crawled 88,000 pages in about a week. After the bot crawls 1,000,000 pages, I will stop patching Version 2 and will move on to Version 3.
AskToby.PHP Version 2.0.5
"Move Fast and Break Things"
Dialed back the crawl rate because it was causing my virtual private server to crash. It appears that as the MySQL database grows, performance decreases dramatically. Will likely need to read up on database design and then redesign the database.
AskToby.PHP Version 2.0.4
"Move Fast And Break Things"
Security Patch & Bug Fix Added
NOTE: The 2.0.4 version of the crawler was able to scrape 87,444 web pages in 7 days on a cheap GoDaddy VPS before the server began crashing today. It may be time to move on to writing version 2.1.
AskToby.PHP Version 2.0.3
"Move Fast And Break Things"
Added a Security Patch (SQL Injection)
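The note does not say how the injection hole was closed. A common fix is to bind user input as a query parameter instead of concatenating it into the SQL string. The sketch below shows that technique using Python's built-in `sqlite3` module as a stand-in for MySQL; the `pages` table, its columns, and the `search` function are hypothetical and are not taken from the AskToby.PHP code.

```python
import sqlite3


def search(conn: sqlite3.Connection, term: str) -> list:
    """Look up pages by keyword with a bound parameter (no string concatenation)."""
    # The "?" placeholder makes the driver treat `term` purely as data,
    # so quotes or SQL keywords inside it cannot change the query.
    cur = conn.execute("SELECT url FROM pages WHERE keyword = ?", (term,))
    return [row[0] for row in cur.fetchall()]


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, keyword TEXT)")
conn.execute("INSERT INTO pages VALUES ('https://example.com', 'toby')")
conn.commit()

print(search(conn, "toby"))          # matches the one stored row
print(search(conn, "x' OR '1'='1"))  # classic injection string matches nothing
```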
asktoby.php Version 2.0.2
"Done Is Better Than Perfect"
Version 2.0.2 just adds a little styling to the user interface. It definitely still needs work to look good on different screen sizes, but should look decent on most laptops. No work has been done on the mobile version of the interface. In fact, there is a little snippet of JavaScript code that whites out the interface completely on mobile and triggers an alert that reads: "I haven't made asktoby.php mobile friendly yet. So if you're seeing this JavaScript alert, join my open source project and make it mobile friendly!".
asktoby.php Version 2.0.0
"Done is Better Than Perfect"
I completely rewrote the tobycrawler bot to increase its performance. The old crawler ran in series, crawling one site before crawling the next. Version 2.0.0 has been rewritten to crawl 20 sites at a time in parallel. This has resulted in a 10x performance increase. However, further gains will likely have to come from more server bandwidth, as bandwidth appears to limit crawl speed much more than computing power does. Also, please note that this code was committed to GitHub exactly as it was when I first ran the new crawler successfully, and has not been refactored at all. Likewise, much of the new code was written on a quirky text editor on my Chromebook which has a tendency to throw off all of my indentation. But hey, it works!
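The release note does not describe how the PHP crawler runs its 20 fetches at once. As a rough illustration of the idea only, the sketch below uses a thread pool in Python; the names `fetch`, `crawl`, and `MAX_PARALLEL` are hypothetical. Because each worker spends most of its time waiting on the network, overlapping the waits is what produces the speedup, which fits the observation that bandwidth rather than CPU is the limit.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

MAX_PARALLEL = 20  # batch size taken from the release note


def fetch(url: str) -> bytes:
    """Download one page; the call blocks while waiting on the network."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def crawl(urls, fetch_page=fetch, workers=MAX_PARALLEL):
    """Fetch up to `workers` pages at once and return {url: body or None}."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch_page(url)
        except Exception:
            # One failing site should not stop the rest of the batch.
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling `crawl(list_of_urls)` returns a dictionary with one entry per URL. Passing a stub as `fetch_page` lets the function be tried without touching the network.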
asktoby.php Version 1.0.0
"Done is Better than Perfect"
This is the code as it was when I made my first successful search query. There are tons of bugs, and the web crawler couldn't be any slower, but it works! The web crawler currently runs in series, one site crawled at a time. However, Version 2 will need to run in parallel, with possibly 100 sites at a time, in order to successfully index every top-level domain homepage on the web.
Version 1.0 crawls at a pace of 3 sites a minute. This works out to roughly 1.6 million sites a year (3 × 60 × 24 × 365 = 1,576,800). However, with Version 2.0 the web crawler should be able to hit more than 100 million sites a year, enough to index every top-level domain homepage on the web.
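The throughput figures can be checked with a few lines of arithmetic. This is a small sketch that assumes the crawler runs nonstop for a 365-day year; the function names are hypothetical.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600, ignoring leap years


def sites_per_year(sites_per_minute: float) -> float:
    """Yearly total for a crawler running nonstop at a fixed rate."""
    return sites_per_minute * MINUTES_PER_YEAR


def rate_needed(yearly_target: float) -> float:
    """Sites per minute required to reach a yearly target."""
    return yearly_target / MINUTES_PER_YEAR


print(sites_per_year(3))                # 1576800
print(round(rate_needed(100_000_000)))  # 190
```

At 3 sites a minute the yearly total is about 1.58 million. Reaching 100 million sites a year takes about 190 sites a minute, roughly 63 times the Version 1.0 pace.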