feat: Automate location extraction and english translation (#642)
Showing 27 changed files with 1,413 additions and 355 deletions.
This file was deleted.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,26 @@
## Function Workflow

1. **Eventarc Trigger**: The original function is triggered by a `CloudEvent` indicating a GTFS dataset upload. It parses the event data to identify the dataset, then calculates the bounding box and location information from the GTFS feed.

2. **Pub/Sub Triggered Function**: A new function is triggered by Pub/Sub messages. This enables batch processing of dataset extractions: multiple datasets can be processed in parallel instead of one after another.

3. **HTTP Triggered Batch Function**: Another function, triggered by an HTTP request, identifies all latest datasets that lack bounding box or location information. It then publishes one message per dataset to the Pub/Sub topic to trigger the extraction process.

4. **Data Parsing**: Extracts `stable_id`, `dataset_id`, and the GTFS feed `url` from the triggering event or message.

5. **GTFS Feed Processing**: Retrieves bounding box coordinates and other location-related information from the GTFS feed located at the provided URL.

6. **Database Update**: Updates the bounding box and location information for the dataset in the database.

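Steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the code from this commit: it assumes the Pub/Sub message data is base64-encoded JSON carrying the three keys named in step 4, and it assumes the bounding box is derived from the `stop_lat` and `stop_lon` columns of the feed's `stops.txt`. The function names `parse_pubsub_payload` and `bounding_box_from_stops` are hypothetical.

```python
import base64
import csv
import io
import json


def parse_pubsub_payload(envelope: dict) -> tuple[str, str, str]:
    """Return (stable_id, dataset_id, url) from a Pub/Sub envelope.

    Assumption: the message data is base64-encoded JSON with these keys.
    """
    raw = base64.b64decode(envelope["message"]["data"]).decode("utf-8")
    payload = json.loads(raw)
    return payload["stable_id"], payload["dataset_id"], payload["url"]


def bounding_box_from_stops(stops_csv: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) from stops.txt content."""
    lons: list[float] = []
    lats: list[float] = []
    for row in csv.DictReader(io.StringIO(stops_csv)):
        # Coordinates are optional for some location types, so skip empty ones.
        if not row.get("stop_lat") or not row.get("stop_lon"):
            continue
        lats.append(float(row["stop_lat"]))
        lons.append(float(row["stop_lon"]))
    if not lons:
        raise ValueError("stops.txt contains no usable coordinates")
    return min(lons), min(lats), max(lons), max(lats)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    message = {"stable_id": "mdb-1", "dataset_id": "mdb-1-2024", "url": "https://example.com/gtfs.zip"}
    envelope = {"message": {"data": base64.b64encode(json.dumps(message).encode()).decode()}}
    print(parse_pubsub_payload(envelope))

    stops = "stop_id,stop_lat,stop_lon\nA,45.50,-73.60\nB,45.55,-73.55\nC,,\n"
    print(bounding_box_from_stops(stops))
```

The bounding box is returned in the same `(min_longitude, min_latitude, max_longitude, max_latitude)` order that `create_polygon_wkt_element` unpacks in `bounding_box_extractor.py`.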
## Expected Behavior

- Bounding boxes and location information are extracted for the latest datasets that are missing them. Datasets are processed individually when uploaded (Eventarc trigger) or in batch (HTTP-triggered function publishing to Pub/Sub).

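The batch path could look roughly like the sketch below. It is an assumption-laden illustration: datasets are modeled as plain dictionaries with hypothetical `bounding_box` and `location` keys, and `publish` stands in for whatever Pub/Sub client call the real function uses.

```python
import json
from typing import Callable, Iterable


def publish_missing_extractions(
    datasets: Iterable[dict], publish: Callable[[bytes], None]
) -> int:
    """Publish one message per dataset lacking extraction results.

    Returns the number of messages published. The dictionary keys used
    here are hypothetical and only illustrate the selection logic.
    """
    count = 0
    for dataset in datasets:
        has_bounding_box = dataset.get("bounding_box") is not None
        has_location = dataset.get("location") is not None
        if has_bounding_box and has_location:
            continue  # Nothing to extract for this dataset.
        message = {
            "stable_id": dataset["stable_id"],
            "dataset_id": dataset["dataset_id"],
            "url": dataset["url"],
        }
        publish(json.dumps(message).encode("utf-8"))
        count += 1
    return count
```

Passing the publisher in as a callable keeps the selection logic testable without a Pub/Sub emulator; for example, `publish_missing_extractions(rows, sent.append)` collects the encoded messages in a list named `sent`.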
## Function Configuration

The functions rely on the following environment variables:
- `FEEDS_DATABASE_URL`: The database URL for connecting to the database containing GTFS datasets.

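As a hedged example of how a function might read this setting, the sketch below fails fast when the variable is missing. The helper name `get_database_url` is hypothetical and does not come from this commit.

```python
import os


def get_database_url() -> str:
    """Return FEEDS_DATABASE_URL or raise if it is unset or empty."""
    url = os.getenv("FEEDS_DATABASE_URL")
    if not url:
        raise RuntimeError("FEEDS_DATABASE_URL is not set")
    return url
```

Failing at startup with a clear message is usually easier to diagnose than a connection error raised later inside a database call.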
## Local Development

Local development of these functions should follow standard practices for GCP serverless functions. For general instructions on setting up the development environment, refer to the main [README.md](../README.md) file.
4 changes: 2 additions & 2 deletions
...ns-python/extract_bb/function_config.json → ...hon/extract_location/function_config.json
File renamed without changes.
File renamed without changes.
File renamed without changes.
42 changes: 42 additions & 0 deletions
functions-python/extract_location/src/bounding_box/bounding_box_extractor.py
@@ -0,0 +1,42 @@
import numpy
from geoalchemy2 import WKTElement

from database_gen.sqlacodegen_models import Gtfsdataset


def create_polygon_wkt_element(bounds: numpy.ndarray) -> WKTElement:
    """
    Create a WKTElement polygon from bounding box coordinates.
    @:param bounds (numpy.ndarray): Bounding box coordinates.
    @:return WKTElement: The polygon representation of the bounding box.
    """
    min_longitude, min_latitude, max_longitude, max_latitude = bounds
    points = [
        (min_longitude, min_latitude),
        (min_longitude, max_latitude),
        (max_longitude, max_latitude),
        (max_longitude, min_latitude),
        (min_longitude, min_latitude),
    ]
    wkt_polygon = f"POLYGON(({', '.join(f'{lon} {lat}' for lon, lat in points)}))"
    return WKTElement(wkt_polygon, srid=4326)


def update_dataset_bounding_box(session, dataset_id, geometry_polygon):
    """
    Update the bounding box of a dataset in the database.
    @:param session (Session): The database session.
    @:param dataset_id (str): The ID of the dataset.
    @:param geometry_polygon (WKTElement): The polygon representing the bounding box.
    @:raises Exception: If the dataset is not found in the database.
    """
    dataset: Gtfsdataset | None = (
        session.query(Gtfsdataset)
        .filter(Gtfsdataset.stable_id == dataset_id)
        .one_or_none()
    )
    if dataset is None:
        raise Exception(f"Dataset {dataset_id} does not exist in the database.")
    dataset.bounding_box = geometry_polygon
    session.add(dataset)
    session.commit()
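
To see the WKT string that `create_polygon_wkt_element` builds without installing `geoalchemy2`, the ring construction can be reproduced in plain Python. The helper `bounding_box_wkt` below is a hypothetical stand-in that returns only the string; it does not wrap it in a `WKTElement` or attach the SRID.

```python
def bounding_box_wkt(bounds) -> str:
    """Build a closed WKT polygon ring from (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bounds
    ring = [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),  # Repeat the first point to close the ring.
    ]
    return "POLYGON((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"


if __name__ == "__main__":
    print(bounding_box_wkt((-73.6, 45.5, -73.55, 45.55)))
    # POLYGON((-73.6 45.5, -73.6 45.55, -73.55 45.55, -73.55 45.5, -73.6 45.5))
```

WKT lists each coordinate as `x y`, which for SRID 4326 data here means longitude first, then latitude.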