Add Stagecoach Open Data as a source #120

Open
JackGilmore opened this issue Aug 4, 2022 · 3 comments · May be fixed by #258
Assignees
Labels
data engineering Things related to data: scraping, cleaning, labelling, transformation new source Adding a new data source to the pipeline research Further information is needed

Comments

@JackGilmore
Member

On the back of some Twitter enquiries about bus open data I discovered Stagecoach publish their schedules and fares as open data: https://www.stagecoachbus.com/open-data

As these are just file downloads listed on a page, we'll need to write a scraper for this.

Considerations

  • What is the license for these? It doesn't appear to be explicitly stated. It may be worth emailing Stagecoach at the address on the page to ask.
  • The downloads themselves are just zip files, split up by region, that contain XML files
    • Should we consider unzipping these files and serving the individual XML files?
    • The file downloads cover regions outside Scotland (e.g. England and Wales). Should we include these?
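To make the scraping and unzipping questions above a bit more concrete, here is a rough sketch using only the Python standard library. It assumes the page exposes the downloads as plain `<a href="….zip">` links, which hasn't been checked against the real page; the names `ZipLinkParser`, `find_zip_links` and `extract_xml_members` are made up for illustration, and fetching the page itself is left out.

```python
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points at a .zip file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".zip"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_zip_links(html, base_url):
    """Return absolute URLs of all .zip links found in the HTML text."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.links


def extract_xml_members(zip_bytes):
    """Return {member_name: bytes} for each XML file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.lower().endswith(".xml")
        }
```

Keeping the link discovery separate from the unzip step means either decision (serve the zips as-is, or serve individual XML files) can be made later without rewriting the scraper.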
@JackGilmore JackGilmore added research Further information is needed data engineering Things related to data: scraping, cleaning, labelling, transformation labels Aug 4, 2022
@johnnymck

The page states that the data is

available to the public, for personal, educational or commercial use

so we should be fine to publish this ourselves.

Furthermore, the data is auto-updating, i.e. it changes when fares get updated, so we should really make a wee scrape script to refresh this data every so often.
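One possible way to handle the "every so often" part is to hash each download and only republish the files whose hash has changed since the last run. The sketch below assumes the downloads are already in memory as `{name: bytes}` and that a small JSON state file is acceptable; `changed_files` and the state file layout are hypothetical, not anything in the current pipeline.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a download's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def changed_files(downloads: dict, state_path: Path) -> list:
    """Return the names whose content differs from the previous run.

    `downloads` maps a file name to its bytes. The hashes from this run
    are written to `state_path` so the next run can compare against them.
    """
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}
    current = {name: sha256_of(data) for name, data in downloads.items()}
    changed = sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
    state_path.write_text(json.dumps(current, indent=2))
    return changed
```

On the first run every file counts as changed; after that, a scheduled job would only need to reprocess the names this returns.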

@JackGilmore
Member Author

Sweet! I think the best way to split these up would be to have a dataset per region and then have the individual files with the schedules and fares attached to the dataset e.g.

@JackGilmore
Member Author

@johnnymck did you manage to get anything started on this that you could commit to a branch/fork?

@JackGilmore JackGilmore added the new source Adding a new data source to the pipeline label Oct 8, 2022
@JackGilmore JackGilmore linked a pull request Oct 29, 2023 that will close this issue
12 tasks
@JackGilmore JackGilmore self-assigned this Oct 29, 2023
JackGilmore added a commit that referenced this issue Nov 11, 2023
- Update Scottish Parliament scraper to use new common classes for JSON output
- Update merge_data.py to use common JSON schema for ScotParl and Stagecoach
2 participants