Merge pull request #6 from codders/feat/web-interface
Added web interface for Flathunter
codders authored May 27, 2020
2 parents 77f163f + d194a06 commit 60757d4
Showing 37 changed files with 978 additions and 50 deletions.
2 changes: 1 addition & 1 deletion .coveragerc
@@ -1,7 +1,7 @@
 [run]
 branch = True
 source = flathunter
-command_line = -m unittest discover -s test
+command_line = -m pytest

 [report]
 exclude_lines =
24 changes: 24 additions & 0 deletions .gcloudignore
@@ -0,0 +1,24 @@
# This file specifies files that are *not* uploaded to Google Cloud Platform
# using gcloud. It follows the same syntax as .gitignore, with the addition of
# "#!include" directives (which insert the entries of the given .gitignore-style
# file at that point).
#
# For more information, run:
# $ gcloud topic gcloudignore
#
.gcloudignore
# If you would like to upload your .git directory, .gitignore file or files
# from your .gitignore file, remove the corresponding line
# below:
.git
.gitignore

# Python pycache:
__pycache__/
# Ignored by the build system
/setup.cfg
/Pipfile
/Pipfile.lock

# Don't upload the database
processed_ids.db
1 change: 0 additions & 1 deletion .travis.yml
@@ -1,6 +1,5 @@
 language: python
 python:
-- "3.5"
 - "3.6"
 - "3.7"
 - "3.8"
5 changes: 4 additions & 1 deletion Pipfile
@@ -13,10 +13,13 @@ PyYAML = "==5.1.2"
 requests = "==2.22.0"
 urllib3 = "==1.25.6"
 lxml = "==4.4.1"
-html5lib = "==1.0.1"
 coverage = "*"
 codecov = "*"
 requests-mock = "*"
+Flask = "*"
+Flask-API = "*"
+firebase-admin = "*"
+mock-firestore = "*"

 [dev-packages]

333 changes: 319 additions & 14 deletions Pipfile.lock

Large diffs are not rendered by default.

83 changes: 79 additions & 4 deletions README.md
@@ -10,6 +10,18 @@ A Telegram bot to help people with their flat search

Flathunter is a Python application which periodically [scrapes](https://en.wikipedia.org/wiki/Web_scraping) property listings sites that the user has configured to find new apartment listings, and sends notifications of the new apartment to the user via [Telegram](https://en.wikipedia.org/wiki/Telegram_%28software%29).

## Table of Contents

- [Background](#background)
- [Install](#install)
- [Usage](#usage)
- [Command-line Interface](#command-line-interface)
- [Web Interface](#web-interface)
- [Testing](#testing)
- [Credits](#credits)
- [Contributing](#contributing)
- [License](#license)

## Background

There are at least four different rental property marketplace sites that are widely used in Germany - [ImmoScout24](https://www.immobilienscout24.de/), [immowelt](https://www.immowelt.de/), [WG-Gesucht](https://www.wg-gesucht.de/) and [ebay Kleinanzeigen](https://www.ebay-kleinanzeigen.de/). Most people end up searching through listings on all four sites on an almost daily basis during their flat search.
@@ -18,7 +30,7 @@ With Flathunter, instead of visiting the same pages on the same four sites every

## Install

-Flathunter is a Python (v3.5+) project - you will need Python3 installed to run the code. We recommend using [pipenv](https://pipenv-fork.readthedocs.io/en/latest/) to setup and configure your project. Install `pipenv` according to the instructions on the `pipenv` site, then run:
+Flathunter is a Python (v3.6+) project - you will need Python3 installed to run the code. We recommend using [pipenv](https://pipenv-fork.readthedocs.io/en/latest/) to setup and configure your project. Install `pipenv` according to the instructions on the `pipenv` site, then run:

```sh
$ pipenv install
@@ -32,6 +44,14 @@ $ pipenv shell

to launch a Python environment with the dependencies that your project requires.

Note that a `requirements.txt` file is included in this repository for compatibility with Google Cloud. It should not be treated as canonical.

For development purposes, you need to install the flathunter module in your current environment. Simply run:

```sh
pip install -e .
```

### Configuration

Before running the project for the first time, copy `config.yaml.dist` to `config.yaml`. The `urls` and `telegram` sections of the config file must be configured according to your requirements before the project will run.
@@ -62,8 +82,42 @@ To use the distance calculation feature a [Google API-Key](https://developers.go

Since this feature is not free, it is "disabled". Read line 62 in hunter.py to re-enable it.

### Google Cloud Deployment

You can run `flathunter` on Google's App Engine, in the free tier, at no cost. To get started, first install the [Google Cloud SDK](https://cloud.google.com/sdk/docs) on your machine, and run:

```
$ gcloud init
```

to set up the SDK. You will need to create a new cloud project (or connect to an existing project). The Flathunters organisation uses the `flathunters` project ID to deploy the application. If you need access to deploy to that project, contact the maintainers.

```
$ gcloud config set project flathunters
```

You will need to add the project ID to `config.yaml` under the key `google_cloud_project_id`.

Google Cloud [doesn't currently support Pipfiles](https://stackoverflow.com/questions/58546089/does-google-app-engine-flex-support-pipfile). To work around this restriction, the `Pipfile` and `Pipfile.lock` have been added to `.gcloudignore`, and a `requirements.txt` file has been generated using `pip freeze`. You may need to update the `requirements.txt` if the Pipfile has been updated. You will need to remove the line `pkg-resources==0.0.0` from `requirements.txt` for a successful deploy.
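
The clean-up step above can be scripted. The following is a minimal sketch, not part of the repository: the helper name `strip_pkg_resources` is hypothetical, and the sample pins are taken from the `Pipfile` purely for illustration.

```python
def strip_pkg_resources(requirements_text):
    """Return requirements text without the 'pkg-resources==0.0.0' line.

    Hypothetical helper: 'pip freeze' can emit this placeholder line on some
    systems, and the README says it must be removed for a successful deploy.
    """
    kept = [
        line
        for line in requirements_text.splitlines()
        if line.strip() != "pkg-resources==0.0.0"
    ]
    return "\n".join(kept) + "\n"


# Illustrative input only; a real file would come from 'pip freeze'.
frozen = "lxml==4.4.1\npkg-resources==0.0.0\nrequests==2.22.0\n"
cleaned = strip_pkg_resources(frozen)
```

Writing `cleaned` back to `requirements.txt` is left out so the sketch has no side effects.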

To deploy the app, run:

```
$ gcloud app deploy
```

Your project will need to have the [Cloud Build API](https://console.developers.google.com/apis/api/cloudbuild.googleapis.com/overview) enabled, which requires it to be linked to a billing-enabled account. It also needs [Cloud Firestore API](https://console.cloud.google.com/apis/library/firestore.googleapis.com) to be enabled for the project. Firestore needs to be configured in [Native mode](https://cloud.google.com/datastore/docs/upgrade-to-firestore).

Instead of running with a timer, the web interface depends on periodic calls to the `/hunt` URL to trigger searches (this avoids the need to have a long-running process in the on-demand compute environment). You can configure Google Cloud to automatically hit the URL by deploying the cron job:

```
$ gcloud app deploy cron.yaml
```
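
To illustrate the request-driven design described above, here is a rough sketch of a handler that performs one hunt pass per request to `/hunt`. The project's real handler lives in the Flask app and is not shown in this diff, so everything here is an assumption: the sketch uses a plain standard-library WSGI callable rather than Flask, and `FakeHunter`, `make_app` and `call` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


class FakeHunter:
    """Hypothetical stand-in for the project's Hunter; it only counts passes."""

    def __init__(self):
        self.runs = 0

    def hunt_flats(self):
        self.runs += 1


def make_app(hunter):
    """Build a WSGI app that runs exactly one hunt pass per GET of /hunt."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/hunt":
            hunter.hunt_flats()  # one pass, no sleep loop
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"hunt complete\n"]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    return app


def call(app, path):
    """Invoke the WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


hunter = FakeHunter()
app = make_app(hunter)
status, body = call(app, "/hunt")
```

Because the scheduler in `cron.yaml` supplies the timing, the handler itself holds no loop or timer, which is what lets it run in an on-demand environment.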

## Usage

### Command-line Interface

By default, the application runs on the commandline and outputs logs to `stdout`. It will poll in a loop and send updates after each run. The `processed_ids.db` file contains details of which listings have already been sent to the Telegram bot - if you delete that, it will be recreated, and you may receive duplicate listings.

```
@@ -77,18 +131,39 @@ optional arguments:
 --config CONFIG, -c CONFIG
 Config file to use. If not set, try to use
 '~git-clone-dir/config.yaml'
 ```

+### Web Interface

+You can alternatively launch the web interface by running the `main.py` application:

+```
+$ python main.py
+```

+This uses the same config file as the Command-line Interface, and launches a web page at [http://localhost:8080](http://localhost:8080).

+Alternatively, run the server directly with Flask:

+```
+$ FLASK_APP=flathunter.web flask run
+```

 ## Testing

-The `unittest`-based test suite can be run with:
+The test suite can be run with `pytest`:

 ```sh
+$ pytest
+```

+from the project root. If you encounter the error `ModuleNotFoundError: No module named 'flathunter'`, run:

+```sh
-$ python -m unittest discover -s test
+pip install -e .
 ```

-from the project root.
+to make the current project visible to your pip environment.

 ## Maintainers

1 change: 1 addition & 0 deletions app.yaml
@@ -0,0 +1 @@
runtime: python37
4 changes: 4 additions & 0 deletions cron.yaml
@@ -0,0 +1,4 @@
cron:
- description: "Hunt for flats"
url: /hunt
schedule: every 30 minutes synchronized
11 changes: 5 additions & 6 deletions flathunter.py
@@ -12,6 +12,7 @@
 from flathunter.crawl_immowelt import CrawlImmowelt
 from flathunter.idmaintainer import IdMaintainer
 from flathunter.hunter import Hunter
+from flathunter.config import Config

 __author__ = "Jan Harrie"
 __version__ = "1.0"
@@ -41,12 +42,12 @@ def launch_flat_hunt(config):
     searchers = [CrawlImmobilienscout(), CrawlWgGesucht(), CrawlEbayKleinanzeigen(), CrawlImmowelt()]
     id_watch = IdMaintainer('%s/processed_ids.db' % os.path.dirname(os.path.abspath(__file__)))

-    hunter = Hunter(config)
-    hunter.hunt_flats(searchers, id_watch)
+    hunter = Hunter(config, searchers, id_watch)
+    hunter.hunt_flats()

     while config.get('loop', dict()).get('active', False):
         time.sleep(config.get('loop', dict()).get('sleeping_time', 60 * 10))
-        hunter.hunt_flats(searchers, id_watch)
+        hunter.hunt_flats()


 def main():
@@ -63,9 +64,7 @@ def main():

     # load config
     config_handle = args.config
-    __log__.info("Using config %s" % config_handle.name)
-    print(config_handle.name)
-    config = yaml.safe_load(config_handle.read())
+    config = Config(config_handle.name)

     # check config
     if not config.get('telegram', dict()).get('bot_token'):
26 changes: 26 additions & 0 deletions flathunter/config.py
@@ -0,0 +1,26 @@
import os
import yaml
import logging

class Config:

    __log__ = logging.getLogger(__name__)

    def __init__(self, filename=None, string=None):
        if string is not None:
            self.config = yaml.safe_load(string)
            return
        if filename is None:
            filename = os.path.dirname(os.path.abspath(__file__)) + "/../config.yaml"
        self.__log__.info("Using config %s" % filename)
        with open(filename) as file:
            self.config = yaml.safe_load(file)

    def __iter__(self):
        return self.config.__iter__()

    def __getitem__(self, value):
        return self.config[value]

    def get(self, key, value=None):
        return self.config.get(key, value)
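
The wrapper keeps the dict-style access that callers such as `launch_flat_hunt` rely on. The sketch below illustrates that lookup pattern with a hypothetical `DictConfig` stand-in that takes an already-parsed dict, so it runs without PyYAML; `loop_settings` is likewise a name invented for this example.

```python
class DictConfig:
    """Hypothetical stand-in mirroring Config's accessors, minus YAML loading."""

    def __init__(self, data):
        self.config = data

    def __iter__(self):
        return iter(self.config)

    def __getitem__(self, key):
        return self.config[key]

    def get(self, key, default=None):
        return self.config.get(key, default)


def loop_settings(config):
    """Read the 'loop' section with the same defaults flathunter.py uses."""
    loop = config.get('loop', dict())
    return loop.get('active', False), loop.get('sleeping_time', 60 * 10)


# Section present: values come from the config.
settings_on = loop_settings(DictConfig({'loop': {'active': True, 'sleeping_time': 300}}))
# Section missing: both defaults apply (no looping, 600 seconds).
settings_off = loop_settings(DictConfig({'telegram': {'bot_token': 'dummy'}}))
```

One caveat of this pattern: the `dict()` default only applies when the key is absent, so a `loop:` key that is present but empty would yield `None` and the chained `.get` would fail.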
2 changes: 1 addition & 1 deletion flathunter/crawl_ebaykleinanzeigen.py
@@ -27,7 +27,7 @@ def get_page(self, search_url):
         resp = requests.get(search_url, headers={'User-Agent': self.USER_AGENT}) # TODO add page_no in url
         if resp.status_code != 200:
             self.__log__.error("Got response (%i): %s" % (resp.status_code, resp.content))
-        return BeautifulSoup(resp.content, 'html5lib')
+        return BeautifulSoup(resp.content, 'html.parser')

     def extract_data(self, soup):
         entries = list()
2 changes: 1 addition & 1 deletion flathunter/crawl_wggesucht.py
@@ -54,7 +54,7 @@ def extract_data(self, soup):
 title = title_row.text.strip()
 url = base_url + title_row.find('a')['href']
 detail_string = row.find("div", { "class": "col-xs-11" }).text.strip().split("|")
-details_array = list(map(lambda s: re.sub(' +', ' ', re.sub('\W', ' ', s.strip())), detail_string))
+details_array = list(map(lambda s: re.sub(' +', ' ', re.sub(r'\W', ' ', s.strip())), detail_string))
 numbers_row = row.find("div", { "class": "middle" })
 price = numbers_row.find("div", { "class": "col-xs-3" }).text.strip()
 rooms = re.findall(r'\d Zimmer', details_array[0])[0][:1]
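
The only change in this hunk is the raw-string prefix on the `\W` pattern. In a normal string literal `'\W'` is an unrecognised escape sequence, which Python 3.6+ reports with a warning, whereas `r'\W'` passes the same two characters to `re` without one. The standalone sketch below reproduces the cleaning steps on an invented detail string; `clean_details` and the sample text are illustrative assumptions, not project code.

```python
import re


def clean_details(detail_text):
    """Split on '|', replace non-word characters with spaces, collapse runs of spaces."""
    parts = detail_text.strip().split("|")
    return [re.sub(' +', ' ', re.sub(r'\W', ' ', s.strip())) for s in parts]


# Invented sample in the rough shape of a listing's detail line.
details_array = clean_details("2 Zimmer-Wohnung | Berlin  Mitte | Musterstr. 1")

# Same extraction as the crawler: first match, first character only.
rooms = re.findall(r'\d Zimmer', details_array[0])[0][:1]
```

Note that `[:1]` keeps a single character, so this extraction only captures one-digit room counts.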
60 changes: 60 additions & 0 deletions flathunter/googlecloud_idmaintainer.py
@@ -0,0 +1,60 @@
import logging
import firebase_admin
import traceback
import datetime
from firebase_admin import credentials
from firebase_admin import firestore

from flathunter.config import Config

class ConnectionWrapper:

    def __init__(self, db):
        self.db = db

    def __enter__(self):
        # Return a new client for this thread
        return firestore.client()

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is not None:
            traceback.print_exception(exc_type, exc_value, tb)
            return False
        return True

class GoogleCloudIdMaintainer:
    __log__ = logging.getLogger(__name__)

    def __init__(self):
        project_id = Config().get('google_cloud_project_id')
        if project_id is None:
            raise Exception("Need to provide a google_cloud_project_id in config.yaml")
        firebase_admin.initialize_app(credentials.ApplicationDefault(), {
            'projectId': project_id
        })
        self.db = firestore.client()

    def connect(self):
        return ConnectionWrapper(self.db)

    def add(self, expose_id, connection=None):
        self.__log__.debug('add(' + str(expose_id) + ')')
        self.db.collection(u'exposes').document(str(expose_id)).set({ u'id': expose_id })

    def get(self, connection=None):
        res = []
        for doc in self.db.collection(u'exposes').stream():
            res.append(doc.to_dict()[u'id'])

        self.__log__.info('already processed: ' + str(len(res)))
        self.__log__.debug(str(res))
        return res

    def get_last_run_time(self, connection=None):
        for doc in self.db.collection(u'executions').order_by(u'timestamp', direction=firestore.Query.DESCENDING).limit(1).stream():
            return doc.to_dict()[u'timestamp']

    def update_last_run_time(self, connection=None):
        time = datetime.datetime.now()
        self.db.collection(u'executions').add({ u'timestamp': time })
        return time
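
For local experiments without Firestore credentials, the same four-method interface can be approximated in memory. This is a sketch under assumptions rather than project code: `InMemoryIdMaintainer` is a hypothetical name, and it mirrors only the behaviour visible in the class above (IDs keyed by their string form, the most recent timestamp returned, `None` when nothing has run yet).

```python
import datetime


class InMemoryIdMaintainer:
    """Hypothetical in-memory stand-in with the GoogleCloudIdMaintainer method set."""

    def __init__(self):
        self.exposes = {}      # str(expose_id) -> expose_id, like the 'exposes' collection
        self.executions = []   # run timestamps, like the 'executions' collection

    def add(self, expose_id, connection=None):
        # Keyed by the string form, so adding the same ID twice is a no-op overwrite.
        self.exposes[str(expose_id)] = expose_id

    def get(self, connection=None):
        return list(self.exposes.values())

    def get_last_run_time(self, connection=None):
        # Most recent timestamp, or None if no run has been recorded.
        return max(self.executions) if self.executions else None

    def update_last_run_time(self, connection=None):
        time = datetime.datetime.now()
        self.executions.append(time)
        return time


maintainer = InMemoryIdMaintainer()
maintainer.add(42)
maintainer.add(42)  # duplicate, stored once
maintainer.add(7)
seen = maintainer.get()
before = maintainer.get_last_run_time()
stamp = maintainer.update_last_run_time()
```

The `connection` parameters are accepted and ignored, as in the Firestore-backed class, so either object can be passed wherever the other is expected.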