WIP: Refactor #72

Closed
wants to merge 60 commits into from
60 commits
015a38b
Add regression test
y0hy0h Dec 4, 2017
fb7c4f8
Create aachen package and move parser file and tests into it
y0hy0h Jan 10, 2018
4615e5c
Fix aachen_test.py import and rename to regression_test.py
y0hy0h Jan 10, 2018
f909889
Test parsers/ on build
y0hy0h Jan 11, 2018
fd8547e
Make XML output deterministic
y0hy0h Jan 11, 2018
b235233
Base regression test on snapshots
y0hy0h Jan 19, 2018
8f15ab7
Update .gitignore to exclude virtual environment directories
y0hy0h Jan 19, 2018
dc8cc32
Determine parser's URL programmatically
y0hy0h Jan 19, 2018
1727544
Make regression tests generic
y0hy0h Jan 19, 2018
0c963c1
Flatten Aachener parser package
y0hy0h Jan 19, 2018
0172252
Print feedback when updating snapshots
y0hy0h Jan 19, 2018
65c2b26
Add missing build dependency
y0hy0h Jan 19, 2018
d673ff6
Make regression test independent of request-mock package
y0hy0h Jan 21, 2018
12adcc4
Remove unneeded build dependency
y0hy0h Jan 21, 2018
39ef9c7
Add copyable output on failed regression test
y0hy0h Jan 21, 2018
626ce6b
Respect encoding during snapshot upgrade
y0hy0h Jan 21, 2018
c8f11fc
Only allow snapshot updates for individual canteens
y0hy0h Jan 22, 2018
a8e1538
Allow updates of all parsers-under-test' snapshots with flag
y0hy0h Jan 22, 2018
ad40add
Use upstream PyOpenMensa determinism instead of workaround
y0hy0h Jan 24, 2018
9078f01
Detect URL's to store as snapshots automatically
y0hy0h Feb 4, 2018
f3d1dc0
Update snapshots for Aachen
y0hy0h Feb 4, 2018
acd931e
Use unittest.mock for intercepting requests
y0hy0h Feb 5, 2018
811d3b6
Fix HTTP requests not being intercepted
y0hy0h Feb 5, 2018
ff43e5d
Pretty print website snapshot
y0hy0h Feb 5, 2018
519ac05
Fix Aachener usage of requests for testability
y0hy0h Feb 6, 2018
ecce574
Update snapshots
y0hy0h Feb 6, 2018
ff3fa65
Refactor Aachener parser
y0hy0h Jan 22, 2018
663f8eb
Use model for Meal
y0hy0h Jan 22, 2018
e7ccc7b
Use model for table entry
y0hy0h Jan 22, 2018
af1f8f0
Add model for category
y0hy0h Jan 25, 2018
3700e45
Refactor Aachener methods
y0hy0h Jan 25, 2018
a618997
Move up side effects
y0hy0h Jan 25, 2018
140c38b
Reduce use of LazyBuilder and fix model
y0hy0h Jan 25, 2018
c263bb8
Remove docstrings
y0hy0h Jan 29, 2018
0740615
Move conversions from model to user
y0hy0h Feb 2, 2018
8e57932
Fix category ordering
y0hy0h Feb 2, 2018
2c45b52
Use model-based XML generation
y0hy0h Feb 2, 2018
4d055cc
Move error handling up
y0hy0h Feb 2, 2018
cefb263
Remove unused LazyBuilder code
y0hy0h Feb 2, 2018
ba0c5c7
Move aachen into own package, implement legend parsing for new website
y0hy0h Feb 2, 2018
e8d1ab5
Move Aachen to own package, implement parsing for alternate website
y0hy0h Feb 2, 2018
c754b1b
Implement equality for model and rename feed_model to openmensa_model
y0hy0h Feb 2, 2018
7567911
Fix parser bugs
y0hy0h Feb 2, 2018
00f4667
Refactor OpenMensa model for encapsulation
y0hy0h Feb 4, 2018
6bfddcf
Make Aachener model hashable
y0hy0h Feb 6, 2018
2b635fe
Determine available categories programmatically
y0hy0h Feb 6, 2018
8cd88bb
Filter out meals containing unavailability disclaimer
y0hy0h Feb 6, 2018
be1d9e4
Handle closed days
y0hy0h Feb 6, 2018
e42d8c7
Update Aachener snapshots
y0hy0h Feb 6, 2018
28a46ed
Defend against whitespace terrorism
y0hy0h Feb 6, 2018
47000db
Factor Aachen into smaller methods
y0hy0h Feb 6, 2018
b691a12
Refactor Aachener parser for readability
y0hy0h Feb 6, 2018
6dd874d
Export openmensa_model in setup.py
y0hy0h Feb 6, 2018
911960c
Use OrderedCounter
y0hy0h Feb 6, 2018
22f3d99
Make OpenMensa `Price` closer to XML, replace it with Aachener model
y0hy0h Feb 7, 2018
11a48ef
Allow complete initialization of OpenMensa model objects
y0hy0h Feb 7, 2018
9a20377
Offload conversion from custom to OpenMensa model from parser to mode…
y0hy0h Feb 7, 2018
b61d956
Move local ignored folders out of .gitignore
y0hy0h Feb 7, 2018
d4880a3
Rename OpenMensa model's `DayClosed` to `ClosedDay`
y0hy0h Feb 7, 2018
87c3572
Fix Aachener model import
y0hy0h Feb 8, 2018
1 change: 0 additions & 1 deletion .gitignore
@@ -1,3 +1,2 @@
__pycache__
build
.idea/
1 change: 1 addition & 0 deletions .travis.yml
@@ -13,6 +13,7 @@ script:
- dpkg --info ../*.deb
- dpkg --contents ../*.deb
- sudo build_scripts/travis-install-and-setup-deb.sh
- py.test-3 ./parsers -vv
- wget --output-document=/dev/null --input-file=build_scripts/test-urls.txt
- wget --output-document=/dev/null --input-file=build_scripts/maybe-urls.txt || true
after_success:
2 changes: 1 addition & 1 deletion build_scripts/travis-install-deps.sh
@@ -1,4 +1,4 @@
#!/bin/bash
set -ex
apt-get update -qq
-apt-get install -qq python3 python3-setuptools python3-bs4 python3-lxml uwsgi uwsgi-plugin-python3 devscripts debhelper
+apt-get install -qq python3 python3-setuptools python3-bs4 python3-lxml python3-pytest uwsgi uwsgi-plugin-python3 devscripts debhelper
136 changes: 0 additions & 136 deletions parsers/aachen.py

This file was deleted.

3 changes: 3 additions & 0 deletions parsers/aachen/__init__.py
@@ -0,0 +1,3 @@
from . import model
from . import openmensa_model
from .aachen import parser
221 changes: 221 additions & 0 deletions parsers/aachen/aachen.py
@@ -0,0 +1,221 @@
import copy
import re
from urllib import request

from bs4 import BeautifulSoup as parse, NavigableString

from parsers.aachen.utils import OrderedCounter
from pyopenmensa.feed import buildLegend, convertPrice, extractDate
from utils import Parser
from . import model as Aachen
from . import openmensa_model as OpenMensa


def parse_url(url, today=False):
    raw_this_week_html = request.urlopen(url + '_diese_woche.html').read()
    raw_next_week_html = request.urlopen(url + '_naechste_woche.html').read()

    document_this_week = parse(raw_this_week_html, 'lxml')
    document_next_week = parse(raw_next_week_html, 'lxml')

    this_week_days = parse_document(document_this_week)
    next_week_days = parse_document(document_next_week)
    all_days = this_week_days + next_week_days

    legend = parse_legend(document_this_week)
    feed = convert_to_openmensa_feed(all_days, legend)
    return feed.to_string()


def parse_legend(legend_container):
    legend_div = legend_container.find(attrs={'class': 'bottom-wrap'})
    additive_container, allergens = legend_div.find_all('div')

    raw_legend_string = additive_container.text + allergens.text
    regex = r'\((?P<name>[\dA-Z]+)\) (?P<value>[\wäüöÄÜÖß ]+)'
    return buildLegend(text=raw_legend_string, regex=regex)


def parse_document(document):
    table = document.find(attrs={'class': 'dc-wrap'}).table

    category_counter = OrderedCounter(parse_categories(table))

    day_columns = transpose_table_to_day_columns(table)
    all_days = [parse_day(category_counter, column) for column in day_columns]
    return all_days


def parse_categories(category_container):
    table_rows = category_container.find_all('tr')
    # Get first cell of all rows, excluding header row
    category_table_cells = [row.find('td') for row in table_rows[1:]]

    categories = [
        parse_category(cell) for cell in category_table_cells
    ]
    return categories


def parse_category(category_cell):
    if len(category_cell.contents) == 3:  # <td>Klassiker <br> 2,60€</td>
        category_name, price = parse_main_category(category_cell)
    else:  # <td>Nebenbeilage</td>
        category_name, price = parse_side_category(category_cell)

    return Aachen.Category(category_name, price)


def parse_main_category(category_cell):
    category_name_element, _, price_string_element = category_cell.children
    category_name = str(category_name_element)

    price_string = str(price_string_element)
    price = convertPrice(price_string)
    # Subsidized categories
    if category_name in ['Tellergericht', 'Vegetarisch', 'Empfehlung des Tages', 'Klassiker',
                         'Süßspeise']:
        subsidized_roles = [Aachen.Role('student'), Aachen.Role('other', 150)]
        price = Aachen.PriceWithRoles(price, subsidized_roles)

    return category_name, price


def parse_side_category(category_cell):
    category_name = str(category_cell.text)
    price = None

    return category_name, price


def transpose_table_to_day_columns(table):
    return [
        [row.contents[column_index] for row in table.find_all('tr')]
        for column_index in range(1, 6)
    ]


def parse_day(category_counter, day_column):
    categories = extract_categories(category_counter, day_column)

    day_date_string = day_column[0].text
    date = extractDate(day_date_string)

    if all(map(is_empty_category, categories)):
        return OpenMensa.ClosedDay(date)
    else:
        return OpenMensa.Day(date, categories)


def is_empty_category(category):
    return len(category.meals) == 0


def extract_categories(category_counter, day_column):
    row_counter = 1
    categories = []
    for (template_category, occurrences) in category_counter.items():
        category = copy.deepcopy(template_category)
        for meal_number in range(occurrences):
            meal = parse_meal(day_column[row_counter])
            if meal:
                category.append(meal)

            row_counter += 1

        if not is_empty_category(category):
            categories.append(category)
    return categories


def parse_meal(meal_container):
    if 'main-dish' in meal_container.parent['class']:
        description_container = meal_container.find('p', attrs={'class': 'dish-text'})
    elif 'side-dish' in meal_container.parent['class']:
        description_container = meal_container
    else:
        raise ValueError("Element {} should have a parent with either the `main-dish` "
                         "or `side-dish` class.".format(meal_container))

    if description_container and description_container.text:
        return parse_meal_description(description_container)
    else:
        return None


def parse_meal_description(description_container):
    raw_description = get_description(description_container)

    if re.search(r'((heute )?kein (\w)*angebot|geschlossen)', raw_description, re.IGNORECASE):
        return None

    all_note_keys, description_without_notes = extract_note_keys(description_container,
                                                                 raw_description)

    return Aachen.Meal(description_without_notes, all_note_keys)


def get_description(description_container):
    description_elements = description_container.contents
    description_string_parts = [element.string for element in description_elements
                                if isinstance(element, NavigableString)]
    # Clean leading and trailing whitespace
    description_string_parts = list(map(
        lambda string: re.sub(r'(^\s+|\s+$)', '', string),
        description_string_parts
    ))
    # Clean redundant whitespace
    description_string_parts = list(map(
        lambda string: re.sub(r'\s+', ' ', string),
        description_string_parts
    ))
    raw_description = ' | '.join(description_string_parts)
    return raw_description


def extract_note_keys(description_container, raw_description):
    note_regex = re.compile(r' \(((?:[A-Z\d]+,?)+)\)')

    all_note_keys = set()
    for match in note_regex.finditer(raw_description):
        note_group = match.group(1)
        note_keys = note_group.split(',')
        all_note_keys.update(note_keys)

    if description_container.parent.find('img', attrs={'class': 'vegan'}) is not None:
        all_note_keys.add('vegan')

    # Remove notes from description
    cleaned_description = note_regex.sub('', raw_description)

    return all_note_keys, cleaned_description


def convert_to_openmensa_feed(all_days, legend):
    canteen = OpenMensa.Canteen()
    for day in all_days:
        if isinstance(day, OpenMensa.ClosedDay):
            canteen.insert(OpenMensa.ClosedDay(day.date))
        else:
            openmensa_categories = [category.convert_to_openmensa_model(legend)
                                    for category in day.categories]
            openmensa_day = OpenMensa.Day(day.date, openmensa_categories)
            canteen.insert(openmensa_day)
    return canteen


parser = Parser(
    'aachen',
    handler=parse_url,
    shared_prefix='http://www.studierendenwerk-aachen.de/files/content/Downloads/Gastronomie/Speiseplaene/speiseplan_mensa_'
)

parser.define('academica', suffix='academica')
parser.define('ahorn', suffix='ahornstrasse')
parser.define('templergraben', suffix='bistro_templergraben')
parser.define('bayernallee', suffix='bayernallee')
parser.define('eupenerstrasse', suffix='eupener_strasse')
parser.define('goethestrasse', suffix='goethestrasse')
parser.define('vita', suffix='vita')
parser.define('suedpark', suffix='suedpark')
parser.define('juelich', suffix='juelich')
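
Note on `OrderedCounter`: `parsers/aachen/aachen.py` imports it from `parsers.aachen.utils`, but that module is not part of the diff shown here. The sketch below is an assumption about what such a helper might look like, based on the well-known `Counter` plus `OrderedDict` recipe from the Python `collections` documentation; the actual implementation in this PR may differ. It illustrates why `extract_categories` can walk the table rows in order: each distinct category keeps its first-seen position, and its count says how many rows it spans. The category names are hypothetical plain strings, whereas the parser counts hashable `Aachen.Category` objects.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that keeps keys in the order they were first seen.

    Assumed implementation (standard recipe), not the PR's own code.
    """


# Hypothetical category names in table order; a repeated name means
# the category spans several consecutive rows.
rows = ['Tellergericht', 'Klassiker', 'Klassiker', 'Hauptbeilagen', 'Hauptbeilagen']
counter = OrderedCounter(rows)

# (category, number_of_rows) pairs in first-seen order.
category_rows = list(counter.items())
print(category_rows)
```

On Python 3.7 and later a plain `Counter` already preserves insertion order, so the subclass mainly matters for older interpreters such as the ones in use when this PR was written.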
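
Note on the note-key handling: the regex in `extract_note_keys` can be exercised without any HTML. The following is a minimal standalone sketch that reuses the same pattern; the helper name `split_notes` and the sample meal text are made up for illustration and do not come from the PR or from a real menu. The vegan image check is left out because it needs the parsed document.

```python
import re

# Same pattern as in extract_note_keys: a space, then a parenthesised,
# comma-separated group of upper-case letters and digits.
NOTE_REGEX = re.compile(r' \(((?:[A-Z\d]+,?)+)\)')


def split_notes(raw_description):
    """Return (note_keys, description_without_notes) for a raw meal string."""
    note_keys = set()
    for match in NOTE_REGEX.finditer(raw_description):
        note_keys.update(match.group(1).split(','))
    return note_keys, NOTE_REGEX.sub('', raw_description)


# Hypothetical input in the ' | '-joined form produced by get_description.
keys, description = split_notes('Hähnchenbrust (A,A1,3) | Reis (F)')
print(sorted(keys), description)
```

One thing that may be worth a test case: because the pattern allows a trailing comma inside the parentheses, an input such as `' (A,)'` would add an empty string to the key set after `split(',')`.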