Skip to content

Best database

shekhark edited this page Jan 12, 2013 · 1 revision

Table of Contents

Atlas Import Revamp, 11-12 March 2011

We continue to be plagued by problems to do with missing values in BEST Atlas, for purpose of calculating schedule and trip times in GTFS and for anything to do with bus schedules and timings.

In December 2011 we last attempted to process and import the BEST Atlas (November 2011 edition) by isolating UniqueRoutes with specific schedule (day of the week or holiday) types. We did not, however, import all the rows in Atlas and lost a great deal of important data for routing and scheduling.

See below the different columns in BEST Atlas and specific descriptions of the problems of missing values and problems encountered with the import scripts.

From and To Stops

  • Many rows are missing From and To stops.
  • Check with Sanjay how missing From and To stops are treated
  • Should take the prior row's value for From stop or To stop.

Full and Unique Routes

  • There seems to be a mistake in the way that FullRoutes have been imported from Atlas, where if there is more than one row where Span is the maximum, only one is imported.
  • This has significant cascading effects when BEST Atlas is parsed into FullRoutes and UniqueRoutes in the ChaloBEST database. This must be urgently corrected.
  • Route 259 (Oshiwara to Gorai Depot has same from and to stop) in UniqueRoutes. Route 523 Ltd has the wrong To stop (Dindoshi-Dindoshi).
  • For 81 Ltd in in Atlas there are two schedules for "MS" schedule type, but the ChaloBEST database only has one.
  • We hypothesise that a simplifying assumption was made at the time of import of UniqueRoutes that the combination of sub-route (unique from/to) and schedule type is unique. In fact, there can be multiple schedule types that share the same sub-route.
  • This is instantiated in 81 Ltd where there are two schedules for "MS", one in morning-afternoon and another in afternoon-evening.
  • As seen below, there are only three UniqueRoutes for 81LTD, whereas in Atlas there are five (see rows 314-318):
  • Or for 174, there are nine rows for unique routes, but only two schedule types (MS&HOL and SUN), and in the ChaloBEST DB only two have been imported:

Schedule Type

  • Where ScheduleType is null, within each unique route/row, copy the values from the prior row where it is not null.
  • Already done in Sanjay's earlier Atlas import. Check for odd strings like "FW", "AW", etc.

Route Span

  • Where RouteSpan is null, do not assume zero, calculate the cumulative distance between From stop and To stop in that row.
  • Lookup to RouteDetails (foreign key to StopCode) and add up the Distance between the stops to get the correct distance.

RunTime

  • Where Runtime values are null, calculate the runtime using the speed and time of day and day of the week. To do this:
    • Limit by time of day in FirstFrom, FirstTo, LastFrom, LastTo and corresponding Runtime column in Atlas for four runtimes: 7.00-11.00, 11.00-17.00, 17.00-20.00, or 20.00-7.00.
    • Calculate the average speed of the full route which has a corresponding Runtime column for the time slab, and multiply that value by the RouteSpan for the route with missing values.
  • Populate the missing Runtime values for *all* missing fields within a route, schedule and runtime slab recursively.

Headway/Frequency

  • Where Headway values are null, find the nearest headway time of day and day of the week. To do this:
    • Limit by time of day in FirstFrom, FirstTo, LastFrom, LastTo and corresponding Headway columns in Atlas for five headways: 0.00-7.00, 7.00-11.00, 11.00-17.00, 17.00-20.00, or 20.00-0.00.
    • Find the next nearest Headway value within the schedule type (time and day of the week).
  • Populate the missing Headway values for *all* missing fields within a route, schedule and headway slab recursively.

BEST Database Review, 7 December 2011

A Tale of Two Atlases

In meetings with the BEST Traffic Planning Department in Nov and Dec 2011, we were able to clarify issues with the earlier datasets they shared with us (documented below) and request cleaner and more raw data. The BEST has also undertaken steps to correct the earlier problems of HardCodedRoutes which did not follow the earlier four-digit routecode logic (first three digits for Route Number and last for Route Type).

As of November 2011, they have now re-organised the RouteType, with unique numbers for nine different types of buses. They have also now sent us entirely new tables for RouteMaster and RoadMaster, and expanded columns inside StopMaster to denote roads and direction. Their new StopMaster dataset contains many more stops than before, as they now show stops between intersecting roads as per RoadMaster.

While the Master tables for Routes, Stops, Areas, Roads, and Fares have a clean logic and relations via codes and keys, the Atlas is revised more frequently (every three to four months) and comes in different versions. Atlas contains all the routes, but no keys to join via RouteCode in the Master and RouteDetails tables. AtlasRaw comes with RouteCodes, but with all the special AS, AC and Express Routes missing (the erstwhile HardCodedRoutes, see below).

The Master tables and RouteDetails tables are beautifully structured and make it easy for us to add new tables for Landmarks, add StopNames and AreaNames, draw AreaPolygons and RoadLines, etc. within a simple relational-spatial database schema. The problem is with matching and joining the data in Atlas and AtlasRaw with their respective RouteCodes and StopCodes, for which we propose to create a new table called AtlasNew (see diagram below).

Creating AtlasNew from BEST's Atlas and AtlasRaw

To rectify these issues with transforming BEST's Atlas into a proper relational database, First we must join RouteCode used in the RouteMaster and RouteDetails tables to RouteAlias in Atlas. Secondly, we must match and join the full name text strings in the From and To fields in Atlas to StopCodes in StopMaster. Both of these will involve some amount of fuzzy text-matching.

Problem 1: Matching RouteAlias with RouteCode

1. To join RouteAlias in Atlas to RouteAlias in RouteMaster, we need to do a fuzzy match between the route number and suffix "Ltd", "R", "Exp", etc. in Atlas RouteAlias with the full form of this route in RouteMaster RouteAlias. We noted that in the latter master table there were variations of routes that are not noted in Atlas, which only shows one route number/type for all the rows with variations of the route. However certain routes like 2 have regular, limited and now AC services, so there are routecodes for 0020, 0021, and 0026.

2. Since route types in Atlas are only indicated as a text string RouteAlias, the only way to distinguish the different rows per route in Atlas is by their Span, since distances vary with different route variations. In RouteMaster every RouteCode has a unique Distance which allows us to distinguish between variations in that route. Also in RouteDetails Distance the sum of the values per RouteCode will also equal the RouteMaster Distance. If there is any value less than the maximum Atlas Span, then the route is a variation of the stereotypical full route, i.e. one where Atlas From is the same stop as the first StopCode for any route in RouteDetails and also where Atlas To is the same stop as the last StopCode in RouteDetails for any given route.

3. AtlasRaw is an alternative version of Atlas with different information, but which has RouteCodes for every RouteAlias. Using these two tables, we can perform the same task as in 1. and 2. above much more easily. However, the problem of missing 122 HardCodedRoutes from AtlasRaw remains. In Atlas all BEST routes are given, with a total number of 1755 routes and their variations. In AtlasRaw only 1603 routes and their variations are given.

4. When we last confronted this problem in July 2011, we found it was enough to copy and paste the RouteCode column from AtlasRaw into Atlas and manually enter the remaining 122 RouteCodes by hand, with reference to the HardCodedRoutes table where the remaining routes are indicated. So for the 122 routes leftover (assuming that the rest of the 1603 routes in Atlas and AtlasRaw match perfectly in 1. above) we can populate AtlasNew with the RouteCodes in HardCodedRoutes.

Problem 2: Matching From and To with StopCodes

5. Dealing with variations in any full route is the second big problem of matching/replacing Atlas From and Atlas To with their respective StopCodes in StopMaster to produce our cleaner and better NewAtlas. We decided to try this by calculating distances.

6. Since we can assume that the full routes have the same Atlas Span as RouteMaster Distance and match them automatically with the StopCodes for Atlas From and Atlas To contained in RouteDetails, we are left with the others which are shorter versions of the full route.

7. In the latest StopMaster table from BEST we confirmed that the new column Chowki is a true/false binary which indicates if the StopCode is a route terminus or not. From an eyeball inspections there are only around 10% of StopMaster StopCodes which are true Chowki. This means there are only around 250-300 possible string matches for non-full routes in Atlas From and Atlas To which we must match to their StopCodes and put in NewAtlas.

8. AtlasRaw also contains a field called Primary for every "RouteCode", indicating if the route is 1 All Days, 2 for Mon-Fri, 3 for Saturdays, and 4 for Sundays/Holidays. This can be matched with Schedule in Atlas with some massaging of the data, since most variations of full routes are based on the day of the week. Comparison of Atlas with AtlasRaw may yield more insights in how to solve this second problem.

Notes on Processing and Lookup Tables, 17 Dec 2011

9. In processing Atlas as above into AtlasNew, the question arose as to how to treat the many fields which have null values. The rule is that only RouteCode/RouteAlias (subject to join), Depot and Schedule should inherit the previous value in their respective columns if they are null values.

10. All other fields, mainly Headway, should remain null as in the original Atlas. We noted that null values for Headway only occur in minor variations of the full route, for which BEST does not maintain rigourous schedules.

11. The Depot field in Atlas and StopMaster is a two or three letter code which corresponds to the terminus depots of every bus route. We have manually created a Depot lookup table which contains the depot's letter code, the full form of its name (i.e. Dharavi, Worli, Bandra, Magathana, Dindoshi, etc.) and the corresponding StopCode for the depot. These StopCodes were taken from StopMaster manually, and it appears that for several depots there are more than one StopCode. Dharavi Depot (DH) has four unique StopCodes, Gorai Depot (GRD), Magathane (MGT) and Kurla (KLD) each have two unique StopCodes.

12. We have also manually created lookup tables for Schedule which resolves the letter codes in Atlas to their numbered days of the week, with HOL given the number 8. We have also created a separate Holidays table with a list of the dates and names of state holidays from Maharashtra Govt Gazette in MM.DD.YY format.

Database Schema, Tables and Keys

The table below shows the schema and design for our integrated ChaloBEST database, which includes:

  • BOLD for all primary and foreign keys for joining these tables -- RouteCode, StopCode, RoadCode, AreaCode, Schedule and Holiday:
  • RED for the BEST Atlas and AtlasRaw, AreaMaster, StopMaster, RoadMaster, FareMaster, RouteMaster, RouteDetails provided by the BEST;
  • ORANGE for fields in Atlas needed rectify the above problems and join RouteAlias to RouteCodes and From and To to StopCodes;
  • GREY for redundant or currently unused fields in the original BEST datasets;
  • YELLOW for all fields and lookup tables for timing (headway/frequency) and schedule calculation;
  • PINK for all fields concerned with distance (span/km) calculation;
  • GREEN for our new Area, Stop, Road and Landmark tables for Marathi, alternative names, storing location (point, line or polygon), and the slug for the permanent URL at http://chalobest.in/*;
  • BLUE for spatial fields/geometries of lat-lon points for StopCodes, polygons for AreaCodes, and lines for RoadCodes;

Database Diagram

Atlas AtlasRaw StopMaster AreaMaster RoadMaster FareMaster RouteMaster RouteDetails RouteTypes HardCodedRoutes AtlasNew Schedule Holidays Depot Stop Area Road Landmark
RouteAlias RouteAlias StopCode AreaCode RoadCode Slab RouteCode RouteCode RouteCode RouteCode RouteCode Schedule Day DepotCode StopCode AreaCode RoadCode LandmarkName
BusesAM RouteCode StopName AreaName RoadName ORD RouteAlias StopSerial RouteType RouteAlias RouteType Days Name DepotName StopNameMarathi AreaNameMarathi RoadNameMarathi LandmarkMarathi
BusesNoon From StopFlag LTD From StopCode FareType FareType FromStopCode Slug Slug Slug Slug
BusesPM FirstFrom Chowki EXP To Stage FirstFrom AlternativeName AlternativeName AlternativeName LandmarkLatLon
RouteType LastFrom RoadCode AS Distance Km LastFrom LatLonUp AreaPolygon RoadLine
Depot To AreaCode ACEXP Stages ToStopCode LatLonDown
From FirstTo Depot FirstTo
FirstFrom LastTo LastTo
LastFrom AM Span
To Noon HeadwayUpTo7
FirstTo PM Headway7to11
LastTo Schedule Headway11to17
Span Primary Headway17to20
Runtime7to11 Headway20toLast
Runtime11to17 Schedule
Runtime17toEnd Depot
RuntimeNight Primary
HeadwayUpto7
Headway7to11
Headway11to17
Headway17to20
Headway20toLast
ReliefPoint
TravelTime
Schedule

Database Integration, July-August 2011

This is the design schema for our integrated database, which includes:

  • The core Atlas, AreaMaster, StopMaster and Route tables provided by the BEST (coloured in RED);
  • Supplementary tables for routes and scheduling to augment the BEST data, derived from their additional tables (coloured in YELLOW);
  • New AreaLocation, StopLocation, and AreaName, StopName tables created by Gnowledge and CAMP for Marathi translations and alternative names (coloured in GREEN);
  • Data contributed by MESN for storing landmarks, streets, and features in PostGIS for lat-lon coordinates for StopCodes and polygons for AreaCodes (coloured in BLUE);
  • All primary and foreign keys for joining these tables -- RouteCode, AreaCode, StopCode, Schedule and Holiday (indicated as BOLD):
Atlas StopMaster AreaMaster ExpressRoutes AC-ExpressRoutes AS-Routes RegularRoutes RouteType HardCodedRoutes Schedule Holidays StopLocation AreaLocation StopName AreaName
RouteCode StopCode AreaCode RouteCode RouteCode RouteCode RouteCode RouteCode RouteCode Schedule Holiday StopCode AreaCode StopCode AreaCode
RouteAlias StopName AreaName StopSerial StopSerial StopSerial StopSerial RouteType RouteAlias Days Days LatLonUp LatLonPolygon StopNameMarathi AreaNameMarathi
From AreaCode StageNo StageNo StageNo StageNo LatLonDown ShortName ShortName
FirstFrom DisplayName StopCode StopCode StopCode StopCode AlternativeName1 AlternativeName1
LastFrom Length Stage Stage Stage Stage AlternativeName2 AlternativeName2
To Km Km Km Km AlternativeName3 AlternativeName3
FirstTo StopName StopName StopName StopName Street
LastTo Landmark1
RouteSpan Landmark2
RunTime7to11 Landmark3
RunTime11to17
RunTime17toEnd
RunTimeNight
HeadwayBefore7
Headway7to11
Headway11to17
Headway17to20
Headway20toLast
ReliefPoint
TravelTime
Schedule

BEST Database, May 2011

Original Database Schema from BEST

Courtesy of Mr Datta Ghogare of BEST Traffic Planning Dept, on 10 May 2011 we received two excel files, "best_atlas" is the same as the earlier atlas.dbf and covers time and schedule for all routes, "best_route_data" has seven sheets, all eight tables are shown below. RouteCodes is a key to the different route codes, StopMaster is the earlier stopmast.dbf, AreaMaster is areamast.dbf, and these tables were earlier all in routedetl.dbf but are now separated by type: ExpressRoutes, AC-ExpressRoutes, AS-Routes, and Regular Routes.

Atlas RouteCodes* StopMaster AreaMaster ExpressRoutes AC-ExpressRoutes AS-Routes RegularRoutes
Route RouteCodes StopCode A_Code (AreaCode) RNO (RouteCode) Route Route RNO (RouteCode)
From RouteType StopName AreaName StopSr (StopSerial) StopSerial StopSerial StopSr (StopSerial)
FirstFrom FareType AreaCode StageNo StageNo StageNo StageNo
LastFrom HardCodedRoutes DisplayName StopCD (StopCode) StopCode StopCode StopCD (StopCode)
To RouteAlias Length Stage Stage Stage Stage
FirstTo FareType Km Km Km Km
LastTo StopName StopName StopName StopName
RouteSpan
RunTime7to11
RunTime11to17
RunTime17toEnd
RunTimeNight
HeadwayBefore7
Headway7to11
Headway11to17
Headway17to20
Headway20toLast
ReliefPoint
TravelTime
ScheduleType

Problems with BEST Database

  • The table above called RouteCodes contains two tables which divide route codes into normal codes which follow a consistent logic, and HardCodedRoutes which have been added on an ad-hoc basis (see explanation of logic below):
RouteCodes SpecialRoutes
RouteCode HardCodedRoutes
RouteType RouteAlias
FareType FareType
  • The original BEST Atlas spreadsheet above did not contain RouteCodes, by which the Atlas can be joined to the four route tables (AS, Express, AC-Express and Regular Routes). When we approached BEST for a revised Atlas, they sent us one without RunTime or Headway as in the original Atlas. However we were able to manually join the RouteCode in the new Atlas with the old one by simple copy and paste of the 1559 routes. The remaining 123 routes had to be coded manually using the HardCodedRoutes table in the SpecialRoutes table.
  • RouteCode (also named as RNO or Route in the four route tables) is the main key linking Atlas to the four route tables and always will contain four digits. The first three indicate the bus route number (if it is double digit the first digit will be 0). The last digit indicates the RouteType:
    • 0 Ordinary
    • 1 Limited
    • 2 Ordinary Extra
    • 3 Limited Extra
    • 4 Ring Ordinary
    • 5 Ring Limited
    • 6 Express
    • 9 AS Regular or AC Express
    • Since the fourth digit of RouteCode always indicates type, we should always query RouteCode directly rather than the Route in Atlas which often has sufffixes like Ltd., Exp., R, etc.
    • This works for 1559 routes which follow the route coding system explained above.
  • It appears that since the introduction of so many new categories of bus, the old route coding system is being bypassed in favour of hard coding numbers for new routes on an ad hoc basis, hence the need for the HardCodedRoutes table. However except for these 123 hard coded routes, all other routes follow the above system.

Revised BEST Database Schema, June 2011

This has been superseded and absorbed in our newly integrated database with spatial coordinates, landmarks, and Marathi translation. See database integration notes above.

Atlas RouteType HardCodedRoutes Schedule Holidays StopMaster AreaMaster ExpressRoutes AC-ExpressRoutes AS-Routes RegularRoutes
RouteAlias RouteCode RouteCode Schedule Holidays StopCode AreaCode RouteCode RouteCode RouteCode RouteCode
From RouteType RouteAlias Days StopName AreaName StopSerial StopSerial StopSerial StopSerial
FirstFrom AreaCode StageNo StageNo StageNo StageNo
LastFrom DisplayName StopCode StopCode StopCode StopCode
To Length Stage Stage Stage Stage
FirstTo Km Km Km Km
LastTo StopName StopName StopName StopName
RouteSpan
RunTime7to11
RunTime11to17
RunTime17toEnd
RunTimeNight
HeadwayBefore7
Headway7to11
Headway11to17
Headway17to20
Headway20toLast
ReliefPoint
TravelTime
Schedule
  • In the revised schema we have split the RouteCodes table into RouteType and HardCodedRoutes. We have discarded the FareType data as it is not needed.
  • Note that all RouteCodes are four digits, where the first three digits are the bus route number (except for HardCodedRoutes) and the fourth digit of both RouteCode and HardCodedRoute always corresponds RouteType.
  • HardCodedRoutes is an anomaly in the BEST's own route coding logic since the first three digits are just an internal ID, hence the need for a lookup table.
  • Schedule and Holiday will always be the first to check in any query depending on the day of the week when the SMS is received.
    • Schedule decodes the days of the week, numbered from 1-7 for Monday to Sunday, and "H" for holidays.
    • The fixed dates for state holidays in 2011 taken from Maharashtra Govt Gazette are in MM.DD.YY format.
  • In this revised Atlas, we have renamed Route as RouteAlias to avoid confusion.
  • Also note the following fields have been renamed as they are keys for linking between tables:
    • A_Code as AreaCode.
    • StopSr as StopSerial and StopCd as StopCode.
    • RNO and Route as RouteCode.

BusRoutes.in (BRI) Database Schema, April 2011

In the workshop on 30 April 2011 we explored mapping the schema from the BEST database to the models used by http://busroutes.in in the hope of borrowing their interface for developing a web and OpenStreetMap frontend to the service.

BusRoutes.in Data Model

Stage Routes RouteStage
display.name display.name route
lat slug stage
lon mtc_name* sequence
location* types
mtc.name* start
importance end
is.terminus stages
softlinks time
city fare
type
city

Mapping BRI Data to BEST Schema

This is the mapping of the BEST to busroutes.in schemas we devised in the workshop on 30 April 2011. We will have to rework this schema to bring in the BEST data which was lost in the import to the BusRoutes model, or design a new model ourselves. We decided in July 2011 against any such attempt and will build our own web frontend.

Stage Routes RouteStage
display.name = stopname display.name = route route = route_code
lat = null slug = assigned by django stage
lon = null mtc_name = area_code sequence
location = null types = "b" (constant)
mtc.name = area_code start = from/stopcode
importance = null end = to/stopcode
is.terminus = null stages = link via RouteStage
softlinks = null time = null
city = "mumbai" (constant) fare = null
type = "b" (constant)
city = mumbai (constant)

BEST Database August 2005

This section is very old and is now deprecated. Only reproduced here for reference.

These are the four linked tables provided by the BEST Traffic Planning Dept as of August 2005, which we imported into postgresql database in the 30 December 2010 workshop, and have used for all testing and queries up to May 2010 when we received new data from BEST.

  • Note that the keys linking the four tables are area code, route code, and stop code. Area codes are spelled both as "area_code" and "areacode", so are stop codes spelled both as "stop_code" and "stopcode" in different tables.
  • Also note that routes include both the bus route number and suffixes such as "LTD.", "EXP", for special buses. All routes of the same number share a route code but have different route numbers. We need a separate linked column for "types" (see below) to denote these special busses (as also "AC", "AS", and other new prefixes since 2005).
areamast.dbf routdetl.dbf stopmast.dbf atlas.dbf
area_code rout_code stopcode routecode
area_name stop_sr stopname route
stop_code areacode from
stage_code areaname first_from
kms last_from
area_code to
first_to
last_to
routedist
run_time1
run_time2
run_time3
run_time4
frequency1
frequency2
frequency3
frequency4
schedule
Clone this wiki locally