feat: Add GBFS feeds to the database (#674)

cka-y authored Aug 14, 2024
1 parent c1525c5 commit 9d4b8d5
Showing 21 changed files with 689 additions and 274 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/db-update-dev.yml
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ on:
      - main
    paths:
      - 'liquibase/changelog.xml'
-     - 'api/src/scripts/populate_db.py'
+     - 'api/src/scripts/populate_db*'
  repository_dispatch: # Update on mobility-database-catalog repo dispatch
    types: [ catalog-sources-updated ]
  workflow_dispatch:
2 changes: 1 addition & 1 deletion .github/workflows/db-update-qa.yml
@@ -6,7 +6,7 @@ on:
      - main
    paths:
      - 'liquibase/changelog.xml'
-     - 'api/src/scripts/populate_db.py'
+     - 'api/src/scripts/populate_db*'
  workflow_dispatch:
jobs:
  update:
22 changes: 20 additions & 2 deletions .github/workflows/db-update.yml
@@ -189,15 +189,33 @@ jobs:
        id: getpath
        run: echo "PATH=$(realpath sources.csv)" >> $GITHUB_OUTPUT

-     - name: Update Database Content
+     - name: Download systems.csv
+       run: wget -O systems.csv https://raw.githubusercontent.com/MobilityData/gbfs/master/systems.csv
+
+     - name: Get full path of systems.csv
+       id: getsyspath
+       run: echo "PATH=$(realpath systems.csv)" >> $GITHUB_OUTPUT
+
+     - name: GTFS - Update Database Content
        run: scripts/populate-db.sh ${{ steps.getpath.outputs.PATH }} > populate.log

-     - name: Upload log file for verification
+     - name: GBFS - Update Database Content
+       run: scripts/populate-db.sh ${{ steps.getsyspath.outputs.PATH }} gbfs >> populate-gbfs.log
+
+     - name: GTFS - Upload log file for verification
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: populate-${{ inputs.ENVIRONMENT }}.log
          path: populate.log

+     - name: GBFS - Upload log file for verification
+       if: ${{ always() }}
+       uses: actions/upload-artifact@v4
+       with:
+         name: populate-gbfs-${{ inputs.ENVIRONMENT }}.log
+         path: populate-gbfs.log
+
  update-gcp-secret:
    name: Update GCP Secrets
    if: ${{ github.event_name == 'repository_dispatch' || github.event_name == 'workflow_dispatch' }}
5 changes: 4 additions & 1 deletion README.md
@@ -1,7 +1,7 @@
# Mobility Feed API

![Deploy Feeds API - QA](https://github.com/MobilityData/mobility-feed-api/workflows/Deploy%20Feeds%20API%20-%20QA/badge.svg?branch=main)
-![Deploy Web App - QA](https://github.com/MobilityData/mobility-feed-api/actions/workflows/web-app.yml/badge.svg?branch=main)
+![Deploy Web App - QA](https://github.com/MobilityData/mobility-feed-api/actions/workflows/web-qa.yml/badge.svg?branch=main)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

The Mobility Feed API serves a list of open mobility data sources from across the world. This repository is the initial effort to convert the current [The Mobility Database Catalogs](https://github.com/MobilityData/mobility-database-catalogs) into an API service.
@@ -10,6 +10,9 @@ The Mobility Feed API service a list of open mobility data sources from across t

Mobility Feed API is not released yet; any code or service hosted is considered **Work in Progress**. For more information regarding the current Mobility Database Catalog, go to [The Mobility Database Catalogs](https://github.com/MobilityData/mobility-database-catalogs).

## GBFS Feeds
The repository also includes GBFS feeds extracted from [`systems.csv`](https://github.com/MobilityData/gbfs/blob/master/systems.csv) in the [GBFS repository](https://github.com/MobilityData/gbfs). However, these feeds are not being served yet. The supported versions of these feeds are specified in the file [api/src/scripts/gbfs_utils/gbfs_versions.py](https://github.com/MobilityData/mobility-feed-api/blob/main/api/src/scripts/gbfs_utils/gbfs_versions.py).

# Authentication

To access the Mobility Feed API, users need to authenticate using an access token. Here is the step-by-step process to obtain and use an access token:
6 changes: 5 additions & 1 deletion api/src/database/database.py
@@ -7,7 +7,7 @@
from sqlalchemy import create_engine, inspect
from sqlalchemy.orm import load_only, Query, class_mapper, Session

-from database_gen.sqlacodegen_models import Base, Feed, Gtfsfeed, Gtfsrealtimefeed
+from database_gen.sqlacodegen_models import Base, Feed, Gtfsfeed, Gtfsrealtimefeed, Gbfsfeed
from sqlalchemy.orm import sessionmaker
import logging
from typing import Final
@@ -43,6 +43,10 @@ def configure_polymorphic_mappers():
    gtfsrealtimefeed_mapper.inherits = feed_mapper
    gtfsrealtimefeed_mapper.polymorphic_identity = Gtfsrealtimefeed.__tablename__.lower()

+   gbfsfeed_mapper = class_mapper(Gbfsfeed)
+   gbfsfeed_mapper.inherits = feed_mapper
+   gbfsfeed_mapper.polymorphic_identity = Gbfsfeed.__tablename__.lower()


class Database:
    """
2 changes: 2 additions & 0 deletions api/src/feeds/impl/feeds_api_impl.py
@@ -59,6 +59,7 @@ def get_feed(
        feed = (
            FeedFilter(stable_id=id, provider__ilike=None, producer_url__ilike=None, status=None)
            .filter(Database().get_query_model(Feed))
+           .filter(Feed.data_type != "gbfs")  # Filter out GBFS feeds
            .first()
        )
        if feed:
@@ -79,6 +80,7 @@ def get_feeds(
            status=status, provider__ilike=provider, producer_url__ilike=producer_url, stable_id=None
        )
        feed_query = feed_filter.filter(Database().get_query_model(Feed))
+       feed_query = feed_query.filter(Feed.data_type != "gbfs")  # Filter out GBFS feeds
        # Results are sorted by provider
        feed_query = feed_query.order_by(Feed.provider, Feed.stable_id)
        feed_query = feed_query.options(*BasicFeedImpl.get_joinedload_options())
1 change: 1 addition & 0 deletions api/src/feeds/impl/search_api_impl.py
@@ -35,6 +35,7 @@ def add_search_query_filters(query, search_query, data_type, feed_id, status) ->
    Filter values are trimmed and converted to lowercase.
    The search query is also converted to its unaccented version.
    """
+   query = query.filter(t_feedsearch.c.data_type != "gbfs")  # Filter out GBFS feeds
    if feed_id:
        query = query.where(t_feedsearch.c.feed_stable_id == feed_id.strip().lower())
    if data_type:
Empty file.
82 changes: 82 additions & 0 deletions api/src/scripts/gbfs_utils/comparison.py
@@ -0,0 +1,82 @@
import pandas as pd
from sqlalchemy.orm import joinedload
from database_gen.sqlacodegen_models import Gbfsfeed


def generate_system_csv_from_db(df, db_session):
    """Generate a DataFrame from the database with the same columns as the CSV file."""
    stable_ids = "gbfs-" + df["System ID"]
    query = db_session.query(Gbfsfeed)
    query = query.filter(Gbfsfeed.stable_id.in_(stable_ids.to_list()))
    query = query.options(
        joinedload(Gbfsfeed.locations), joinedload(Gbfsfeed.gbfsversions), joinedload(Gbfsfeed.externalids)
    )
    feeds = query.all()
    data = []
    for feed in feeds:
        system_id = feed.externalids[0].associated_id
        auto_discovery_url = feed.auto_discovery_url
        feed.gbfsversions.sort(key=lambda x: x.version, reverse=False)
        supported_versions = [version.version for version in feed.gbfsversions]
        data.append(
            {
                "System ID": system_id,
                "Name": feed.operator,
                "URL": feed.operator_url,
                "Country Code": feed.locations[0].country_code,
                "Location": feed.locations[0].municipality,
                "Auto-Discovery URL": auto_discovery_url,
                "Supported Versions": " ; ".join(supported_versions),
            }
        )
    return pd.DataFrame(data)


def compare_db_to_csv(df_from_db, df_from_csv, logger):
    """Compare the database to the CSV file and return the differences."""
    df_from_csv = df_from_csv[df_from_db.columns]
    df_from_db = df_from_db.fillna("")
    df_from_csv = df_from_csv.fillna("")

    if df_from_db.empty:
        logger.info("No data found in the database.")
        return None, None

    # Align both DataFrames by "System ID"
    df_from_db.set_index("System ID", inplace=True)
    df_from_csv.set_index("System ID", inplace=True)

    # Find rows that are in the CSV but not in the DB (new feeds)
    missing_in_db = df_from_csv[~df_from_csv.index.isin(df_from_db.index)]
    if not missing_in_db.empty:
        logger.info("New feeds found in CSV:")
        logger.info(missing_in_db)

    # Find rows that are in the DB but not in the CSV (deprecated feeds)
    missing_in_csv = df_from_db[~df_from_db.index.isin(df_from_csv.index)]
    if not missing_in_csv.empty:
        logger.info("Deprecated feeds found in DB:")
        logger.info(missing_in_csv)

    # Find rows that are in both, but with differences
    common_ids = df_from_db.index.intersection(df_from_csv.index)
    df_db_common = df_from_db.loc[common_ids]
    df_csv_common = df_from_csv.loc[common_ids]
    differences = df_db_common != df_csv_common
    differing_rows = df_db_common[differences.any(axis=1)]

    if not differing_rows.empty:
        logger.info("Rows with differences:")
        for idx in differing_rows.index:
            logger.info(f"Differences for System ID {idx}:")
            db_row = df_db_common.loc[idx]
            csv_row = df_csv_common.loc[idx]
            diff = db_row != csv_row
            logger.info(f"DB Row: {db_row[diff].to_dict()}")
            logger.info(f"CSV Row: {csv_row[diff].to_dict()}")
            logger.info(80 * "-")

    # Merge differing rows with missing_in_db to capture all new or updated feeds
    all_differing_or_new_rows = pd.concat([differing_rows, missing_in_db]).reset_index()

    return all_differing_or_new_rows, missing_in_csv
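Editor's note (not part of the commit): the alignment logic in `compare_db_to_csv` boils down to index-based set operations on two DataFrames keyed by "System ID". A minimal self-contained sketch, using hypothetical sample data, of how new, deprecated, and modified rows fall out of those operations:

```python
# Sketch of the comparison pattern: align two DataFrames on "System ID",
# then classify rows as new (CSV only), deprecated (DB only), or changed.
import pandas as pd

df_db = pd.DataFrame(
    {"System ID": ["a", "b"], "Name": ["Bike A", "Bike B"]}
).set_index("System ID")
df_csv = pd.DataFrame(
    {"System ID": ["a", "c"], "Name": ["Bike A v2", "Bike C"]}
).set_index("System ID")

new_feeds = df_csv[~df_csv.index.isin(df_db.index)]    # rows only in the CSV
deprecated = df_db[~df_db.index.isin(df_csv.index)]    # rows only in the DB
common = df_db.index.intersection(df_csv.index)
changed = df_db.loc[common][(df_db.loc[common] != df_csv.loc[common]).any(axis=1)]

print(list(new_feeds.index))   # ['c']
print(list(deprecated.index))  # ['b']
print(list(changed.index))     # ['a']
```

The elementwise `!=` works here because both `.loc[common]` slices share the same index and columns after alignment.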
81 changes: 81 additions & 0 deletions api/src/scripts/gbfs_utils/fetching.py
@@ -0,0 +1,81 @@
import requests


def fetch_data(auto_discovery_url, logger, urls=[], fields=[]):
    """Fetch data from the auto-discovery URL and return the specified fields."""
    fetched_data = {}
    if not auto_discovery_url:
        return
    try:
        response = requests.get(auto_discovery_url)
        response.raise_for_status()
        data = response.json()
        for field in fields:
            fetched_data[field] = data.get(field)
        feeds = None
        for lang_code, lang_data in data.get("data", {}).items():
            if isinstance(lang_data, list):
                lang_feeds = lang_data
            else:
                lang_feeds = lang_data.get("feeds", [])
            if lang_code == "en":
                feeds = lang_feeds
                break
            elif not feeds:
                feeds = lang_feeds
        logger.info(f"Feeds found from auto-discovery URL {auto_discovery_url}: {feeds}")
        if feeds:
            for url in urls:
                fetched_data[url] = get_field_url(feeds, url)
        return fetched_data
    except requests.RequestException as e:
        logger.error(f"Error fetching data for autodiscovery url {auto_discovery_url}: {e}")
        return fetched_data


def get_data_content(url, logger):
    """Utility function to fetch data content from a URL."""
    try:
        if url:
            response = requests.get(url)
            response.raise_for_status()
            system_info = response.json().get("data", {})
            return system_info
    except requests.RequestException as e:
        logger.error(f"Error fetching data content for url {url}: {e}")
        return None


def get_field_url(fields, field_name):
    """Utility function to get the URL of a specific feed by name."""
    for field in fields:
        if field.get("name") == field_name:
            return field.get("url")
    return None


def get_gbfs_versions(gbfs_versions_url, auto_discovery_url, auto_discovery_version, logger):
    """Get the GBFS versions from the gbfs_versions_url."""
    # Default version info extracted from auto-discovery url
    version_info = {
        "version": auto_discovery_version if auto_discovery_version else "1.0",
        "url": auto_discovery_url,
    }
    try:
        if not gbfs_versions_url:
            return [version_info]
        logger.info(f"Fetching GBFS versions from: {gbfs_versions_url}")
        data = get_data_content(gbfs_versions_url, logger)
        if not data:
            logger.warning(f"No data found in the GBFS versions URL -> {gbfs_versions_url}.")
            return [version_info]
        gbfs_versions = data.get("versions", [])

        # Append the version info from auto-discovery if it doesn't exist
        if not any(gv.get("version") == auto_discovery_version for gv in gbfs_versions):
            gbfs_versions.append(version_info)

        return gbfs_versions
    except Exception as e:
        logger.error(f"Error fetching version data: {e}")
        return [version_info]
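Editor's note (not part of the commit): `fetch_data` prefers the English feed list and otherwise falls back to the first language block it encounters, handling both the keyed-by-language layout and the flat-list layout of `gbfs.json`. A condensed sketch of that selection logic, using a hypothetical `pick_feeds` helper and made-up sample payloads:

```python
# Sketch of the language-preference logic: return the "en" feed list if
# present, otherwise the first feed list seen while iterating "data".
def pick_feeds(data):
    feeds = None
    for lang_code, lang_data in data.get("data", {}).items():
        # lang_data may be a bare list (newer layouts) or {"feeds": [...]}
        lang_feeds = lang_data if isinstance(lang_data, list) else lang_data.get("feeds", [])
        if lang_code == "en":
            return lang_feeds
        if feeds is None:
            feeds = lang_feeds
    return feeds

sample = {
    "data": {
        "fr": {"feeds": [{"name": "gbfs_versions", "url": "https://example.com/fr"}]},
        "en": {"feeds": [{"name": "gbfs_versions", "url": "https://example.com/en"}]},
    }
}
print(pick_feeds(sample)[0]["url"])  # https://example.com/en
```

With a flat layout such as `{"data": {"feeds": [...]}}`, the single list is returned unchanged, mirroring the `isinstance(lang_data, list)` branch in the commit.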
19 changes: 19 additions & 0 deletions api/src/scripts/gbfs_utils/gbfs_versions.py
@@ -0,0 +1,19 @@
OFFICIAL_VERSIONS = [
    "1.0",
    "1.1-RC",
    "1.1",
    "2.0-RC",
    "2.0",
    "2.1-RC",
    "2.1-RC2",
    "2.1",
    "2.2-RC",
    "2.2",
    "2.3-RC",
    "2.3-RC2",
    "2.3",
    "3.0-RC",
    "3.0-RC2",
    "3.0",
    "3.1-RC",
]
27 changes: 27 additions & 0 deletions api/src/scripts/gbfs_utils/license.py
@@ -0,0 +1,27 @@
LICENSE_URL_MAP = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CDLA-Permissive-1.0": "https://cdla.io/permissive-1-0/",
    "ODC-By-1.0": "https://www.opendatacommons.org/licenses/by/1.0/",
}

DEFAULT_LICENSE_URL = "https://creativecommons.org/licenses/by/4.0/"


def get_license_url(system_info, logger):
    """Get the license URL from the system information."""
    try:
        if system_info is None:
            return None

        # Fetching license_url or license_id
        license_url = system_info.get("license_url")
        if not license_url:
            license_id = system_info.get("license_id")
            if license_id:
                return LICENSE_URL_MAP.get(license_id, DEFAULT_LICENSE_URL)
            return DEFAULT_LICENSE_URL
        return license_url
    except Exception as e:
        logger.error(f"Error fetching license url data from system info {system_info}: \n{e}")
        return None
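Editor's note (not part of the commit): `get_license_url` resolves a feed's license in three steps: an explicit `license_url` wins, then a known `license_id` is mapped through `LICENSE_URL_MAP`, and anything else falls back to `DEFAULT_LICENSE_URL`. A condensed sketch of that order, using a hypothetical `resolve_license` helper and an abbreviated map:

```python
# Sketch of the license resolution order from get_license_url:
# license_url > mapped license_id > default license.
LICENSE_URL_MAP = {"CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/"}
DEFAULT_LICENSE_URL = "https://creativecommons.org/licenses/by/4.0/"

def resolve_license(system_info):
    if system_info is None:
        return None
    url = system_info.get("license_url")
    if url:
        return url  # explicit URL always wins
    license_id = system_info.get("license_id")
    if license_id:
        # unknown ids also fall back to the default
        return LICENSE_URL_MAP.get(license_id, DEFAULT_LICENSE_URL)
    return DEFAULT_LICENSE_URL

print(resolve_license({"license_id": "CC0-1.0"}))  # mapped id
print(resolve_license({}))                         # default fallback
```

Note that an unrecognized `license_id` is silently replaced by the default URL rather than rejected, matching the commit's behavior.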
