-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathINSTALL.txt
79 lines (43 loc) · 1.97 KB
/
INSTALL.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
INSTALL
DEV ENVIRONMENT
pip install virtualenv
pip install virtualenvwrapper
** setup virtualenvwrapper - http://virtualenvwrapper.readthedocs.org/en/latest/
mkvirtualenv scraperdev
workon scraperdev
pip install scrapy feedparser xlrd mysql-python python-dateutil psycopg2 pdfminer beautifulsoup
** manual install of FTTools (should setup so we can install directly from the git repo)
Prerequisites
Python 2.6+
Scrapy 0.16 pip install scrapy (web scrpaing framework)
Feedparser 5.1 pip install feedparser (for parsing RSS feeds)
xlrd 0.7 pip install xldr (for parsing excel files)
mysql-python pip install mysql-python (python interface to mysql)
python-dateutil 2.1 pip install python-dateutil (fuzzy date parsing)
psycopg2 pip install psycopg2 (python interface to PostGreSQL)
FTTools 1.1.5 [manaul install] (fusion table sync utility)
pdfminer 20110515 pip install pdfminer (pdf parser)
beautifulsoup 3.2.1 pip install beautifulsoup (html parser)
fracfocustool 0.0.2 [manual install] (FracFocus.org pdf parser)
MYSQL database setup
create a new database
create a user will full permissions to the db
import schema/mysql.sql
test data in ./data/
PGSQL database setup
create a new database
create a user with full permissions to the db
install POSTGIS extensions into database
CREATE EXTENSION postgis;
import schema/pgsql.sql
Scraper Settings
update nrc/nrc/settings.py
set database parameters
Test Scenarios
Dependency test
scrapy crawl BotMonitor
with an empty database this should do nothing. But it will test all teh dependencies nicely
RssFeeds
import data/RssFeed_Data.sql into the mysql database
scrapy crawl RssFeedScraper
This should scrape items from 2 feeds and generate some alerts