This repository is part of the multi-repository project dystonse. See the main repository for more information.
Java-powered tools supporting the dystonse algorithm. The aim is to bridge the gap between Berlin's real-time data (in the proprietary HAFAS format) and other available data, which is published as GTFS realtime feeds, so that software can be built that works on either kind of data stream.
Currently there is only a very simple data logger to get delay information from VBB and write it to a database.
More tools, structure and documentation will follow, see below.
Compile using Maven by running:
mvn package
The Import tool makes a request to the VBB real-time HAFAS API (see explanations here) to get the current position and delay of vehicles and writes them into a MySQL table. It is at a very early stage of development, but it has already collected over four million records.
usage: Import [-h <arg>] [-u <arg>] [-p <arg>] -d <arg> [-r <arg>] [-help | -c | -s]
-h,--host <arg> Hostname or IP of the database server
-u,--user <arg> User name for the database server
-p,--password <arg> Password for the database server
-d,--database <arg> The database name
-r,--rect <arg> Use this bounding box to limit queries. Provide four values separated by
semicolons
-help,--help Print command line syntax
-c,--create-table Executes a CREATE TABLE statement instead of inserting data
-s,--show-table Prints out a CREATE TABLE statement instead of inserting data
You should use -c or -s on the first run to initialize the table schema.
For each invocation without -c or -s, it performs a single request to the API and another one to the database. For continuous data collection, you might set up a cron job that runs every minute.
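The -r option takes four values separated by semicolons. As a rough illustration of how such an argument could be validated, here is a minimal sketch. The class `BoundingBoxArg` is hypothetical and not taken from the tool's source, and since this README does not state the order or meaning of the four values, the sketch returns them in the order given without interpreting them.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the dystonse-tools source.
final class BoundingBoxArg {

    // Splits a "-r" style argument into exactly four numbers.
    // The order and meaning of the values is not specified in this README,
    // so they are returned in the order given.
    static double[] parse(String arg) {
        String[] parts = arg.trim().split(";");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "Expected four values separated by semicolons, got " + parts.length);
        }
        double[] values = new double[4];
        for (int i = 0; i < 4; i++) {
            // Double.parseDouble throws NumberFormatException on malformed input.
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // The example values are arbitrary and only illustrate the format.
        System.out.println(Arrays.toString(parse("13.2;52.4;13.6;52.6")));
    }
}
```

When passing such a value on a shell command line, the argument needs to be quoted, because an unquoted semicolon ends the command.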
The Geocode tool is at a very early stage of development and does not yet do what it is supposed to do. It can currently be used to view a small subset of the collected data on a map. See the code for details.
usage: Geocode [-help] -r <arg> [-h <arg>] [-u <arg>] [-p <arg>] -d <arg>
-help,--help Print command line syntax
-r,--route <arg> Name of the route which shall be shown.
-h,--host <arg> Hostname or IP of the database server
-u,--user <arg> User name for the database server
-p,--password <arg> Password for the database server
-d,--database <arg> The database name
Later on, it will:
- Fetch the locations of some vehicles (filter criteria tbd.)
- Fetch the corresponding route shapes (note that each route might have several alternative shapes)
- (maybe) fetch the schedule data for the vehicle
- match each vehicle position to a specific route shape
- find the position of the vehicle along the route to compute its route kilometer (Streckenkilometer)
- (maybe) perform a fresh estimation of where it should have been at that time / when it should have been at that place
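The matching and along-route steps planned above could be sketched as follows. This is not the tool's implementation. The class `ShapeMatcher` is hypothetical, and the sketch assumes the coordinates have already been projected into a planar system, so that straight-line distances are meaningful. Raw latitude/longitude values would need such a projection first.

```java
// Hypothetical sketch, not part of the Geocode tool.
final class ShapeMatcher {

    // Returns the distance along the polyline (in the units of the input
    // coordinates) of the point on the polyline that is closest to (px, py).
    // xs/ys hold the shape vertices; coordinates are assumed to be planar.
    static double distanceAlong(double[] xs, double[] ys, double px, double py) {
        if (xs.length != ys.length || xs.length < 2) {
            throw new IllegalArgumentException("Need at least two vertices");
        }
        double bestSquaredDistance = Double.POSITIVE_INFINITY;
        double bestAlong = 0.0;
        double lengthSoFar = 0.0;
        for (int i = 0; i + 1 < xs.length; i++) {
            double dx = xs[i + 1] - xs[i];
            double dy = ys[i + 1] - ys[i];
            double segmentLength = Math.hypot(dx, dy);
            // Fraction t in [0, 1] of the closest point on this segment.
            double t = 0.0;
            if (segmentLength > 0) {
                t = ((px - xs[i]) * dx + (py - ys[i]) * dy) / (segmentLength * segmentLength);
                t = Math.max(0.0, Math.min(1.0, t));
            }
            double cx = xs[i] + t * dx;
            double cy = ys[i] + t * dy;
            double squaredDistance = (px - cx) * (px - cx) + (py - cy) * (py - cy);
            if (squaredDistance < bestSquaredDistance) {
                bestSquaredDistance = squaredDistance;
                bestAlong = lengthSoFar + t * segmentLength;
            }
            lengthSoFar += segmentLength;
        }
        return bestAlong;
    }
}
```

Since each route might have several alternative shapes, one option would be to run this per shape and keep the shape with the smallest distance to the vehicle. The sketch only returns the along-route distance and would need to be extended to report that distance as well.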
This tool will use statistical algorithms to form a model that can predict the future delay of vehicles based on the current delay and other relevant predictive variables.
Possible variables (in order of descending obviousness):
- Current delay
- Route (discrete value)
- Direction (discrete value)
- Position on route
- Time of day
- Day of week (discrete value)
- Type of day, e.g. weekday or weekend (derived, discrete value)
- Type of vehicle (discrete value)
- Age of measurement (to account for temporarily repeating patterns of delay, e.g. due to construction)
- Current delay of previous vehicle
- Headway between current and previous vehicle (derived)
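Two of the variables above are marked as derived. A minimal sketch of how they might be computed is shown below. The class and method names are illustrative only, public holidays are ignored, and the headway is assumed to be the difference between the times at which the two vehicles pass the same place.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical sketch of the derived variables; names are illustrative only.
final class DerivedFeatures {

    enum DayType { WEEKDAY, WEEKEND }

    // Derives the type of day from the day of the week.
    // Public holidays are not considered in this sketch.
    static DayType dayType(LocalDateTime time) {
        DayOfWeek day = time.getDayOfWeek();
        return (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY)
                ? DayType.WEEKEND
                : DayType.WEEKDAY;
    }

    // Headway in seconds between the previous vehicle and the current one,
    // taken as the difference of their passing times at the same place.
    static long headwaySeconds(LocalDateTime previousVehicle, LocalDateTime currentVehicle) {
        return Duration.between(previousVehicle, currentVehicle).getSeconds();
    }
}
```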
A first query on that model that is actually useful for planning a trip from X to Y would look like this: For this vehicle, which is now at place A and should already be at place B, what's the probability distribution of times when it will depart at place X?
For each of several possible departure times at place C, we can then ask: For this vehicle, given that it will depart at place C at time t and should by then be at place D, what's the probability distribution of times when it will arrive at place Y? This is the same kind of query with different input values. In this scenario, the higher variance of predictions further into the future is already handled by the explicit enumeration of possible departure times at place C.
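The text above does not spell out how the results of the enumerated queries are combined. One possible approach, shown here only as a sketch, is to weight each conditional arrival distribution by the probability of its departure time and sum the results. The class `TwoStageQuery` is hypothetical, times are assumed to be discrete integers (for example minutes since midnight), and the conditional model is a placeholder supplied by the caller.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntFunction;

// Hypothetical sketch; the conditional model is a placeholder supplied by the caller.
final class TwoStageQuery {

    // departureAtC: probability of each possible departure time t at place C.
    // arrivalGivenDeparture: for a departure time t, the distribution of
    // arrival times at place Y.
    // Returns the combined arrival distribution:
    // sum over t of P(depart at t) * P(arrive at a | depart at t).
    static Map<Integer, Double> arrivalDistribution(
            Map<Integer, Double> departureAtC,
            IntFunction<Map<Integer, Double>> arrivalGivenDeparture) {
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, Double> departure : departureAtC.entrySet()) {
            Map<Integer, Double> conditional = arrivalGivenDeparture.apply(departure.getKey());
            for (Map.Entry<Integer, Double> arrival : conditional.entrySet()) {
                double weighted = departure.getValue() * arrival.getValue();
                result.merge(arrival.getKey(), weighted, Double::sum);
            }
        }
        return result;
    }
}
```

If the departure probabilities sum to one and each conditional distribution sums to one, the combined distribution sums to one as well.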
In the near future, dystonse-tools will support the database schemas defined by the popular Python tools gtfsdb and gtfsrdb, as well as the data they create. By then, dystonse-tools should be able to create the needed tables automatically.
If you want to have those schemas now, you can either download, install and run those tools to create the needed tables (which may be quite a hassle) or have a look at the SQL scripts in the directory schema.
Over the course of 2017, the following tools/features are planned:
- GtfsRealtimeExport - Output a GTFS realtime feed like bullrunner-gtfs-realtime-generator does
- Analyse - Perform several statistical analyses on the collected delay data, focused on delay prediction
- MapToShape - Map (lat,lon)-Locations to shapes from a GTFS feed
You can follow @Dysonse on Twitter to stay up to date or get in touch.