-
Notifications
You must be signed in to change notification settings - Fork 15
DatabaseImport
This page is for documenting step-by-step how we go from raw BEST datasets for routes, stops, and schedules to the final ChaloBEST database, and GTFS.
The BEST datasets are officially updated every four months of the year, with effect from 1 March, 1 July, and 1 November annually. BEST has shared these datasets with us reliably since January 2011.
Data comes in spreadsheet (xlsx) format with the following interlinked tables, on which ChaloBEST database depends for complete import:
- RouteAtlas contains schedules of every current bus route in detail, especially time, date, and route variations.
- RouteDetails contains the current list of all routes with with a 4 digit code, all stops in each route with a 4 digit code, and distance between major stops every 1km or so.
- RouteMaster contains a formatted list of all current routes, with mappings between the 4 digit code found in RouteDetails and its matching alias in RouteAtlas.
- StopMaster contains the current list of all bus stops with a 4 digit code each, and an area code of 4 digits for the area in which the stop falls, and a road code of 4 digits for the road on which the stop falls.
- AreaMaster contains the current list of areas in Mumbai with a 4 digit code each.
- RoadMaster contains the current list of areas in Mumbai with a 4 digit code each.
make sure order of columns is the same as the older versions of the
csv files.
In the AM,N,PM columns and others
Mostly containing 'TR' 'I' 'PS' values.
Ctrl+F for these patterns when in the excel sheet is usually enough.
Relative Location of this folder:
Chalobest/db_csv_files
Chalobest/chaloBEST
imports/import_atlas.py
imports/data_mapper.py
import_atlas is used for the Atlas.csv. Atlas is a separate file because its easily the most fuzzy and error prone import we have in ChaloBEST.
data_mapper is used to import the rest of the BEST csv data files. Import of StopMaster, AreaMaster and RoadMaster as given by BEST, with foreign keys for StopCode, AreaCode and RoadCode which link them together.
The structure of Data mapper: - CsvLoader('') This function creates an import process skeleton. Generates
_error.json file for any errors encountered in the errors/ folder. - *_save() methods
- Which take in data one row at a time.
This allows us to tweak any one save file process,
if the need arises.
- route_save(), road_save()
- helper functions
- fire_up()
- This function simply calls CsvLoader() for each csv file present.
Uses a saveorder dict for ordering the table imports.
runs the helper functions
and initiates importUniqueRoute process
calls the postload cleanup functions
and the fix_missing_data scripts ().
In an ideal world, this should be the only function to call.
from imports import data_mapper as dm
from imports import import_atlas as ia
from imports import fix_missing_atlas_data as fixes
dm.fire_up()
fixes.do()
This will produce a bunch of stats and log files for manual cleaning.
dm.CsvLoader('Csv file name') loads the csv files individually, in case one needs it.
As given in RouteAtlas, every route has unique variations, from the full extent of the From-To route to midway stops and morning and night routes while buses are shunted between home depots. Therefore every Route has several UniqueRoutes. The Atlas import is a bit non automatable given the nature of anomalies in it. It will have to be done manually, the code steps generate various files.
The import process is composed of
1. CsvToJson()
Creates a json file out of the csv file.
2. processJson()
- Description -
- Creates a AtlasCopied.json (with empty
- fields from the csv being replaced with values above them )
- Requires -
- routeMapping.json
- Which is a dictionary of routecode and route Alias
- Needed for linking the atlas to the Route table.
3. groupUnique()
4. importUniqueroutes()
- Areas Admin http://chalobest.in/admin/mumbai/area/
- Roads Admin http://chalobest.in/admin/mumbai/road/
- Stops Admin http://chalobest.in/admin/mumbai/stop/
- Holidays Admin http://chalobest.in/admin/mumbai/holiday/
- Error Log for BEST
- Questions for BEST
As given in RouteAtlas, in many instances the Headway/Frequency and Runtime for UniqueRoutes are missing for some variations. What we do in this case is to copy up and down... back and forth...
- GTFS Validation Results http://gtfs.chalobest.in/gtfs_0.9.5.log.html
- http://trip.chalobest.in
- Amalgamate Mumbai Suburban Railways schedules/timetable