Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update routes for specific dates #638

Open
wants to merge 56 commits into
base: master
Choose a base branch
from
Open

Update routes for specific dates #638

wants to merge 56 commits into from

Conversation

Brian-Lee
Copy link
Contributor

@Brian-Lee Brian-Lee commented Apr 24, 2020

Fixes #626

Proposed changes

Get the feed from transitfeeds (https://api.transitfeeds.com/v1/getFeedVersions?key={your-key}3&feed=sfmta%2F60&page=1&limit=10&err=1&warn=1)
Use get the last 10 (changeable) previous versions of the GTFS files.
Use those files in save_routes.py to save routes to subfolders in /backend/data.
Subfolders and files have date suffixs.

These older versions of route JSON files can then be used for historical queries.
...

Screenshot

folder_screenshot

...
Tests:
Command: python3 save_routes.py --agency trimet
(saves route_v3a_trimet_2020-MM-DD) where MM-DD is today's date)
...
Command: python3 save_routes.py --agency trimet --date 2000-02-02
(gives error "an active GTFS for this date was not found
" as we don't have a GTFS file qualifying for 2002-02-02)
...
Command: python3 save_routes.py --agency muni --date 2020-02-22
(saves routes_v3a_muni-2020-02-18 as gtfs-muni-2020-02-18.zip is the recentmost qualifying GTFS)
image

...

...

@Brian-Lee Brian-Lee changed the title Brians branch Fixes #626 Apr 24, 2020
@Brian-Lee Brian-Lee added the DO NOT MERGE A draft or abandoned PR label Apr 24, 2020
@EddyIonescu EddyIonescu changed the title Fixes #626 Update routes for specific dates Apr 26, 2020
Copy link
Member

@EddyIonescu EddyIonescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I think you're trying to do too much in this PR. My goal for #626 was for save_routes.py to take in a date argument (right now it always uses today's date) and based on that date, to look up the appropriate GTFS file (which is already there; a different script will get them from transitfeeds; for now you'd test by manually downloading older GTFS files and renaming them).

More specifically, you would:

  • allow save_routes to take in a date argument (optional, otherwise uses today's date and behaves the same as it currently does, which is to fetch the latest GTFS and use that).
  • If a date is provided, save_routes passes the filepath to the proper GTFS (ie. the latest GTFS that is at or before the date provided) to GtfsScraper (let's use gtfs_path instead of archiving_url). If gtfs_path is None (it's an optional argument) then it downloads the latest GTFS as it already does.
  • Right now the GTFS is stored as gtfs-{agency}.zip. With the date, a possible format could be gtfs-{agency}_YYYY-MM-DD.zip. For now you would create those files manually as to test your PR. The date of the GTFS will be its start date (which is the version_date in TransitFeeds).
  • When the date was passed in, the output will be stored in a a folder named the same way you've already implemented.

Each of the points below will be its own issue and PR (I'll create separate issues for these), which you're free to take on (they'll be marked as Good First Issue).

  • Archived versions of GTFS files will be stored in S3 as to not have to rely on TransitFeeds each time we might want to re-make the route-configs. gtfs_path will turn from a local path to an S3 path that you'll have to fetch.
  • Write a script that, given a date and agency, downloads the appropriate GTFS from TransitFeeds and uploads it to S3. It doesn't download anything from TransitFeeds if the GTFS is already in S3. This script would be deployed to run before save_routes does.
  • Have save_routes get the appropriate GTFS from S3 instead of downloading the latest.

backend/save_routes.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
@Brian-Lee Brian-Lee requested a review from EddyIonescu May 7, 2020 03:05
Copy link
Member

@EddyIonescu EddyIonescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Brian-Lee I'm really sorry for the delay in reviewing this, I'll be faster in the follow-ups. Getting closer, I've added some comments. Request a review from me in the PR whenever you're ready. Thanks!

backend/save_routes.py Outdated Show resolved Hide resolved
backend/models/gtfs.py Outdated Show resolved Hide resolved
backend/models/routeconfig.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
backend/models/gtfs.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
backend/save_routes.py Outdated Show resolved Hide resolved
@Brian-Lee
Copy link
Contributor Author

Hey @Brian-Lee I'm really sorry for the delay in reviewing this, I'll be faster in the follow-ups. Getting closer, I've added some comments. Request a review from me in the PR whenever you're ready. Thanks!

No worries! Thanks for helping me!

@Brian-Lee Brian-Lee requested a review from EddyIonescu May 21, 2020 23:11
Copy link
Member

@EddyIonescu EddyIonescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, could you, in the PR description, write out a list of what you did to test your PR? A section on Tests is part of the PR template (not completely sure), so that's missing.

It should look something like this:

Tests:
Command: python3 save_routes.py --agency trimet
...
Command: python3 save_routes.py --agency trimet --date 2000-02-02
(should give error as we don't have a GTFS from then)
...
Command: python3 save_routes.py --agency trimet --date 2020-02-22
... (saves route_v3a_trimet_2020-02-22...)

backend/save_routes.py Show resolved Hide resolved
backend/save_routes.py Show resolved Hide resolved
self.agency = agency
self.agency_id = agency_id = agency.id
gtfs_cache_dir = f'{util.get_data_dir()}/gtfs-{agency_id}'

download_gtfs_data(agency, gtfs_cache_dir)
get_gtfs_data(agency, gtfs_cache_dir, gtfs_path=gtfs_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For say trimet, the GTFS directory ends up being:
/data/gtfs-trimet/gtfs-trimet-2020-02-22 instead of /data/gtfs-trimet_2020-02-22 (for a sample date).

Then, in line 127 the GTFS that's passed in ends up being the one in gtfs_cache_dir, so gtfs_path ends up not being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an if-else which should solve that. I think there are cleaner ways, and I haven't yet done a good job of testing this fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to re-request a review at this point. Thank you so much for looking through this over and over again!

@Brian-Lee
Copy link
Contributor Author

Just wondering, could you, in the PR description, write out a list of what you did to test your PR? A section on Tests is part of the PR template (not completely sure), so that's missing.

It should look something like this:

Tests:
Command: python3 save_routes.py --agency trimet
...
Command: python3 save_routes.py --agency trimet --date 2000-02-02
(should give error as we don't have a GTFS from then)
...
Command: python3 save_routes.py --agency trimet --date 2020-02-22
... (saves route_v3a_trimet_2020-02-22...)

Great suggestion. I put in a few hypothetical test cases. The actual results depend on the date and the contents of your directory.

@Brian-Lee Brian-Lee requested a review from EddyIonescu May 22, 2020 18:13
@hathix
Copy link
Member

hathix commented May 28, 2020

Eddy, does this look good at this point?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow running save_routes on historic GTFS files
3 participants