This project crawls and rips entire Tumblr image blogs into a local cache, and provides a flexible interface for navigating the stored images.
It is something of a re-implementation of the Tumblr Collage Chrome extension, which I felt became too unstable on larger image sites. Because the original extension runs entirely in the browser, it wasn't practical to implement some things I really wanted, like saving your current location in a large collection of images.
This is also just a little demo project I built to experiment with a number of technologies I had my eye on for a while, so don't expect much in the way of support.
- CouchDB database, with all images as attachments.
- Elasticsearch used to index and query the data. (Indexes the couch documents via the couchdb river)
- Node.js based proxy that is mostly a straight pass-through to ES/Couch.
- CouchApp to host the UI. (my first)
- node.io based scraping back-end, allowing multi-threaded mirroring of sites. (my first)
- Angular based front end (my first)
- Yeoman used for the couchapp generator (my first time using)
- Bower used for client-side dependencies, as I'm usually a browserify guy (my first time)
- Grunt used for complex build processes (my first time using for non-trivial things)
- Masonry based cascading grid layout (my first time using)
- ngInfiniteScroll for infinite scrolling (my first time using)
- couchdb running on localhost:5984
- elasticsearch running on localhost:9200
- node.js + npm
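Before going further, it can help to confirm that these requirements are actually in place. The following is only a minimal sketch: it assumes `curl` is installed and that both services answer a plain GET on their root URL at the ports listed above. `check_service` is a helper name made up for this example, not part of the project.

```shell
#!/bin/sh
# Report whether an HTTP service answers at the given URL.
# Usage: check_service NAME URL   (hypothetical helper, not part of this repo)
check_service() {
  # -s hides progress output, -f treats HTTP errors as failures,
  # --max-time keeps the check from hanging on a dead port.
  if curl -sf --max-time 5 "$2" > /dev/null; then
    echo "$1: ok"
  else
    echo "$1: not reachable at $2"
  fi
}

check_service couchdb http://localhost:5984
check_service elasticsearch http://localhost:9200

# node and npm only need to be on the PATH.
for tool in node npm; do
  if command -v "$tool" > /dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: not found"
  fi
done
```

Anything reported as not reachable or not found should be covered by the install steps that follow.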
# how to get said requirements running :
# get the osx command line tools first
# either from developer.apple.com/download, or from xcode.
# install homebrew
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
# install elastic search
brew install elasticsearch
plugin -install elasticsearch/elasticsearch-river-couchdb/1.2.0
# install couch db from the package on couchdb.apache.org
# install nvm for node
touch ~/.profile
curl https://raw.github.com/creationix/nvm/master/install.sh | sh
source ~/.nvm/nvm.sh
# install node
nvm install 0.10
nvm use 0.10
npm install -g grunt-cli bower node.io
# check out this repo
git clone 'whatever'
cd browsr
# main app
npm install
# couch app
cd couch-app
npm install
bower install
sh install.sh
# builds and pushes the app to couchdb
grunt
# run the main node app
cd ..
node index.js
The application should now be listening on localhost:5000.
To download blogs, either use the command line:
cd $browsrDir
node.io tumblr sitename [start] [amount] [perbatch]
# ex:
node.io tumblr jl8comic 1 200 10
The same job is also exposed over HTTP by the node app, so you can request the following URL to do the same:
http://localhost:5000/jobs/tumblr/jl8comic/1/200/10
I use a bookmarklet to do that.
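For scripting instead of a bookmarklet, the same URL can be assembled and requested from the shell. This is a rough sketch, not something shipped with the project: `job_url`, `start_job` and the `BROWSR_HOST` variable are names invented here, and it assumes the URL takes all four path segments in the same order as the command line arguments, as in the example above.

```shell
#!/bin/sh
# Build the job URL for a blog: job_url SITENAME START AMOUNT PERBATCH
# All four segments are passed explicitly, mirroring the example URL;
# whether the server accepts fewer segments is not known.
# BROWSR_HOST is a hypothetical override; the default is the documented address.
job_url() {
  echo "${BROWSR_HOST:-http://localhost:5000}/jobs/tumblr/$1/$2/$3/$4"
}

# Trigger a download job over HTTP (the node app must be running).
start_job() {
  curl -s "$(job_url "$@")"
}

# Print the URL for the example blog without sending a request.
job_url jl8comic 1 200 10
```

Calling `start_job jl8comic 1 200 10` would then request that URL, which should have the same effect as opening it in the browser.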
- Better initiation of download tasks, rather than cli or url hacking
- Show completion status of download tasks
- Store download task history for future re-init.
- Schedule regular fetching of specific tasks.
- Configure right-click-action to have 'favorite' and 'tag' modes.
- Figure out how to set a proper 'lastSeen' tag on multiple images at once.
- Integrate with the Tumblr API through the passport-tumblr auth strategy. Should sync likes and follows.