A Node.js daemon application that scrapes IMDb movie data and inserts it into a local MongoDB database. It includes a movie dashboard that shows records as they are inserted into the local database, updated in real time over WebSockets.
- MongoDB installed
- Node.js installed
- Make sure you have Node.js and MongoDB installed.
- cd to the application root and run `npm install`, which installs the dependencies.
- Run `node app.js` to start the daemon.
- Open `localhost:3000` to see the scraped movie data pushed to the dashboard, and check your local MongoDB instance for the collected data.
- MongoDB GridFS for storing the images scraped from IMDb.
- WebSockets for the real-time dashboard.
- cheerio - HTML parser
- express - for the dashboard app
- jade - templating engine
- mongodb - Node.js driver for MongoDB
- socket.io - for the real-time push
The application is configured through the config.json file in the application root. The config parameters are:
- `mongodb`: The MongoDB connection URL, which contains the IP, port, authentication, and database details. If you are connecting to a local MongoDB instance, leave it unchanged. By default the application uses the `imdb` database on the MongoDB instance at localhost:27017.
- `mongocollection`: The name of the collection into which the movie data is inserted. Defaults to `events`.
- `movieId`: The IMDb movie id from which to scrape data. Set it to 1 to scrape all the data. The movie id is the integer in the IMDb URL after the `tt` prefix.
- `application_port`: The port on which the dashboard app runs. Defaults to 3000.
- `req_pool`: The HTTP request pool size, that is, the maximum number of HTTP requests made in parallel to the IMDb website.
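The parameters above can be illustrated with a small sketch. It is not taken from the application's source: the names `DEFAULTS`, `resolveConfig`, `titleUrl`, and `runPool` are hypothetical, the connection string is simply the documented default (database `imdb` at localhost:27017) written as a standard MongoDB URL, and zero-padding the movie id to 7 digits is an assumption based on the usual shape of IMDb title URLs. The sketch shows how defaults might be merged with config.json, how a `movieId` maps to a URL, and what a request pool of size `req_pool` means in practice.

```javascript
// Illustrative sketch only; none of these names come from the application itself.
const DEFAULTS = {
  mongodb: 'mongodb://localhost:27017/imdb', // documented default: db "imdb" on localhost:27017
  mongocollection: 'events',                 // documented default collection
  application_port: 3000,                    // documented default dashboard port
};

// Merge user-supplied settings (for example the parsed config.json) over the defaults.
function resolveConfig(userConfig) {
  return Object.assign({}, DEFAULTS, userConfig);
}

// Build a title URL from an integer movie id.
// Assumption: ids are zero-padded to at least 7 digits after the "tt" prefix.
function titleUrl(movieId) {
  return 'https://www.imdb.com/title/tt' + String(movieId).padStart(7, '0') + '/';
}

// Run async tasks with at most `limit` in flight at once (the idea behind req_pool).
// Each task is a function returning a promise; results keep the input order.
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With this sketch, `resolveConfig(require('./config.json'))` would yield the effective settings, and `runPool(tasks, config.req_pool)` would cap the number of simultaneous requests at the configured pool size.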