Data sources for ParkAPI2


This repository hosts the data sources (downloaders, scrapers) for the parkendd.de service, which lists the number of free spaces in parking lots across Germany and abroad.

The repository for the database and API is ParkAPI2.

Usage

The scraper.py file is a command-line tool for developing, testing and finally integrating new data sources. Its output is always JSON-formatted.

Each data source is called a Pool and usually represents one website from which lot data is collected.

Listing

To view the list of all pool IDs, type:

python scraper.py list

Scraping

To download and extract data, type:

python scraper.py scrape [-p <pool-id> ...] [--cache]

The -p or --pools parameter optionally filters the available sources by a list of pool IDs.

The optional --cache parameter caches all web requests, which is a fair thing to do during scraper development. If you have old cache files and want to create new ones, run with --cache write to fire new web requests and write the new files, then use --cache afterwards.
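
For example, to scrape only the my-city pool from the example further below, reusing cached requests:

python scraper.py scrape -p my-city --cache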

Validation

python scraper.py validate [-mp <max-priority>] [-p <pool-id> ...] [--cache]

The validate command validates the resulting snapshot data against the json schema and prints warnings for fields that should be defined. Use -mp 0 or --max-priority 0 to print only severe errors, and --max-priority 1 to include warnings about missing data in the most important fields like latitude, longitude, address and capacity.

Use validate-text to print the data in a human-friendly format.
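
For example, to validate only the my-city pool, including warnings about the most important fields, while reusing cached requests:

python scraper.py validate -mp 1 -p my-city --cache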

Contribution

Please feel free to ask questions by opening a new issue.

A data source needs to define a PoolInfo object and, for each parking lot, a LotInfo and a LotData object (defined in util/structs.py). The Python file that defines the source can be placed at the project root or in a sub-directory and is automatically detected by scraper.py, as long as it sub-classes util.ScraperBase.

An example of scraping an HTML-based website:

from typing import List
from util import *


class MyCity(ScraperBase):
    
    POOL = PoolInfo(
        id="my-city",
        name="My City",
        public_url="https://www.mycity.de/parken/",
        source_url="https://www.mycity.de/parken/auslastung/",
        attribution_license="CC-0",
    )

    def get_lot_data(self) -> List[LotData]:
        timestamp = self.now()
        soup = self.request_soup(self.POOL.source_url)
        
        lots = []
        for div in soup.find_all("div", {"class": "special-parking-div"}):

            # ... extract lot_id, last_updated, state, lot_occupied
            #     and lot_total from the html dom

            lots.append(
                LotData(
                    id=name_to_id("mycity", lot_id),
                    timestamp=timestamp,
                    lot_timestamp=last_updated,
                    status=state,
                    num_occupied=lot_occupied,
                    capacity=lot_total,
                )
            )

        return lots

The PoolInfo is a static attribute of the scraper class, and the get_lot_data method must return a list of LotData objects. LotData is deliberately basic and does not contain any further information about the parking lot, only the ID, status, number of occupied spaces and total capacity.
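
For orientation, here is a rough sketch of what the LotData structure might look like; the field names and types are inferred purely from the example above, and the authoritative definition lives in util/structs.py:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# sketch only: field names and types are inferred from the example
# above, not copied from util/structs.py
@dataclass
class LotData:
    id: str                            # unique lot ID, e.g. from name_to_id()
    timestamp: datetime                # time of the scrape (self.now())
    lot_timestamp: Optional[datetime]  # last update reported by the website
    status: str                        # status of the lot, e.g. open or closed
    num_occupied: Optional[int]        # number of currently occupied spaces
    capacity: Optional[int]            # total number of spaces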

Meta information

Additional lot information is taken either from a geojson file or from the get_lot_infos method of the scraper class. scraper.py will merge the LotInfo and the LotData together to create the final output, which must comply with the json schema.

The geojson file should have the same name as the scraper file, e.g. example.geojson. If the file exists, it will be used and its properties must fit the util.structs.LotInfo object. If it does not exist, the get_lot_infos method on the scraper class will be called and should return a list of LotInfo objects.
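
A minimal sketch of such a method, assuming LotInfo accepts fields like those mentioned in the validation section (latitude, longitude, address, capacity); the exact signature is defined in util/structs.py:

from typing import List
from util import *


class MyCity(ScraperBase):

    # POOL and get_lot_data as in the example above ...

    def get_lot_infos(self) -> List[LotInfo]:
        # sketch only: the field names are assumptions and the
        # "rathaus-garage" lot is purely hypothetical
        return [
            LotInfo(
                id=name_to_id("mycity", "rathaus-garage"),
                name="Rathaus-Garage",
                latitude=52.52,
                longitude=13.405,
                address="Beispielstraße 1, 12345 My City",
                capacity=250,
            ),
        ]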

Some websites do provide most of the required information, and it might be easier to scrape it from the web pages than to write the geojson file by hand. However, it is not good practice to scrape this static info every other minute. To generate a geojson file from the lot_info data:

# delete the old file if it exists
rm example.geojson  
# run `get_lot_infos` and write to geojson 
#   (and filter for the `example` pool) 
python scraper.py write-geojson -p example

The command show-geojson will write the contents to stdout for inspection.
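
For reference, a geojson file for a single lot might look roughly like this; the property names are assumptions and must match whatever util.structs.LotInfo expects:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [13.405, 52.52]},
      "properties": {
        "id": "mycity-rathaus-garage",
        "name": "Rathaus-Garage",
        "address": "Beispielstraße 1, 12345 My City",
        "capacity": 250
      }
    }
  ]
}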
