I haven't worked with data across packages so my insight is rather limited.
I think this is a reasonable approach. I'm not sure about the differences between gtfstools and gtfsio, but using S3 classes could be useful within gtfstools:
as_dt_gtfs = function(x, ...) {
UseMethod("as_dt_gtfs")
}
as_dt_gtfs.tidygtfs = function(x, ...) {
# convert from tidygtfs
}
as_dt_gtfs.gtfs = function(x, ...) {
# convert from gtfs
}
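A minimal, self-contained sketch of how this dispatch would behave. Only the generic and method names come from the snippet above; the mock objects and the method bodies are assumptions for illustration (a real method would convert tables and columns, not just relabel the object).

```r
# Generic: dispatches on the first element of class(x) that has a method.
as_dt_gtfs <- function(x, ...) {
  UseMethod("as_dt_gtfs")
}

# Hypothetical placeholder bodies: they only relabel the object so that the
# dispatch path is visible. Real methods would also convert tables/columns.
as_dt_gtfs.tidygtfs <- function(x, ...) {
  structure(unclass(x), class = c("dt_gtfs", "gtfs", "list"), source = "tidygtfs")
}

as_dt_gtfs.gtfs <- function(x, ...) {
  structure(unclass(x), class = c("dt_gtfs", "gtfs", "list"), source = "gtfs")
}

# Mock GTFS objects: plain lists of data.frames carrying the class vectors
# used by gtfsio and tidytransit.
tables <- list(routes = data.frame(route_id = "1"))
plain_gtfs <- structure(tables, class = c("gtfs", "list"))
tidy_gtfs <- structure(tables, class = c("tidygtfs", "gtfs"))

from_plain <- as_dt_gtfs(plain_gtfs)
from_tidy <- as_dt_gtfs(tidy_gtfs)

attr(from_plain, "source")  # which method handled the gtfsio-style object
attr(from_tidy, "source")   # which method handled the tidygtfs-style object
class(from_tidy)
```

Because a tidygtfs object also inherits from gtfs, the more specific method is picked first, and any other gtfs-inheriting object falls through to the general one.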
Again, I can't speak to issues with gtfstools, but a gtfs object is always a list of data.frames. Tibbles extend In my experience you just run into problems if you want to use data.table syntax in your workflow. Things like
-
Thanks @dhersz for those insightful thoughts. I still maintain that interoperability between our packages is really important, and anything that reduces or frustrates that is likely to lead to fewer people using our software. Moreover, this broader community forming around GTFS in R is relatively unique in my experience across a lot of R sub-communities, in terms of having so many different developers of different yet overlapping packages all actively cooperating and contributing. One of my ulterior motivations here is actually for us to be some kind of "meta benchmark" in community software development. I think maximal interoperability is a key component of that, which is why I tend to push rather strongly on this issue. I won't likely have a lot of time for the next couple of months to do too much here, but will be jumping very actively back into GTFS stuff from around Sept 22 onwards. That work will start at the natural start point for all of us of
-
Hi guys. I've finally been able to spend some time on this again after a long hiatus, and decided to put some thought into how to better "structure" the interoperability. First, I'd like to say that I followed @polettif's advice and implemented a
library(gtfstools)
data_path <- system.file(
"extdata",
"google_transit_nyc_subway.zip",
package = "tidytransit"
)
gtfs <- gtfsio::import_gtfs(data_path)
class(gtfs)
#> [1] "gtfs" "list"
gtfs$calendar[, .(start_date, end_date)] |> head()
#> start_date end_date
#> 1: 20180624 20181028
#> 2: 20180624 20181028
#> 3: 20180624 20181028
#> 4: 20180624 20181028
#> 5: 20180624 20181028
#> 6: 20180624 20181028
dt_gtfs <- as_dt_gtfs(gtfs)
class(dt_gtfs)
#> [1] "dt_gtfs" "gtfs" "list"
dt_gtfs$calendar[, .(start_date, end_date)] |> head()
#> start_date end_date
#> 1: 2018-06-24 2018-10-28
#> 2: 2018-06-24 2018-10-28
#> 3: 2018-06-24 2018-10-28
#> 4: 2018-06-24 2018-10-28
#> 5: 2018-06-24 2018-10-28
#> 6: 2018-06-24 2018-10-28
tidygtfs <- tidytransit::read_gtfs(data_path)
class(tidygtfs)
#> [1] "tidygtfs" "gtfs"
tidygtfs$stop_times[, c("arrival_time", "departure_time")] |> head()
#> # A tibble: 6 × 2
#> arrival_time departure_time
#> <time> <time>
#> 1 06'00" 06'00"
#> 2 07'30" 07'30"
#> 3 09'00" 09'00"
#> 4 10'30" 10'30"
#> 5 12'00" 12'00"
#> 6 13'00" 13'00"
class(tidygtfs$stop_times$arrival_time)
#> [1] "hms" "difftime"
dt_gtfs <- as_dt_gtfs(tidygtfs)
class(dt_gtfs)
#> [1] "tidygtfs" "gtfs"
dt_gtfs$stop_times[, c("arrival_time", "departure_time")] |> head()
#> arrival_time departure_time
#> 1: 00:06:00 00:06:00
#> 2: 00:07:30 00:07:30
#> 3: 00:09:00 00:09:00
#> 4: 00:10:30 00:10:30
#> 5: 00:12:00 00:12:00
#> 6: 00:13:00 00:13:00
class(dt_gtfs$stop_times$arrival_time)
#> [1] "character"
You'll notice that these methods always return a
Following the same logic, we need to make sure that packages that include non-standard representations always return adequately typed GTFS objects. That is to say, any {gtfstools} function that returns a GTFS object should return a
A current violation of the first rule (add additional classes to signal deviations from the standard) are objects that result from
Therefore, I propose the following next steps in the development cycle geared towards interoperability:
Wrapping everything up, I'd like to hear what you think about the proposed "rules" and implementation plans outlined above. I'd be happy to help with the implementation of these methods in {tidytransit} (and {gtfsrouter}, if need be) if you're short on time in the following months. I'll be on vacation from next week to early January, but I think I can have a working prototype in {gtfstools} by the middle of next week and could start working on the other packages from mid-January onward. Cheers!
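The first rule mentioned in this comment (add additional classes to signal deviations from the standard) can be sketched in base R roughly as follows. The mock object is an assumption for illustration; the class names are the ones used in this thread.

```r
# Mock standard object, classed the way gtfsio classes its output.
standard <- structure(
  list(calendar = data.frame(start_date = 20180624L)),
  class = c("gtfs", "list")
)

# A package that deviates from the standard prepends its own class instead
# of replacing the existing ones, so the object remains a "gtfs".
deviating <- standard
class(deviating) <- c("dt_gtfs", class(deviating))

inherits(deviating, "gtfs")     # generic GTFS code can still accept it
inherits(deviating, "dt_gtfs")  # the deviation is signalled explicitly
inherits(standard, "dt_gtfs")   # the standard object stays unmarked
```

Prepending rather than replacing keeps methods written for the base class usable, while letting each package recognise its own representation.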
-
Some progress on the {gtfstools} side of things,
data_path <- system.file(
"extdata",
"google_transit_nyc_subway.zip",
package = "tidytransit"
)
tidygtfs <- tidytransit::read_gtfs(data_path)
gtfstools::get_trip_speed(tidygtfs)
#> trip_id origin_file speed
#> 1: ASP18GEN-1037-Sunday-00_000600_1..S03R shapes 24.32779
#> 2: ASP18GEN-1037-Sunday-00_002600_1..S03R shapes 24.32779
#> 3: ASP18GEN-1037-Sunday-00_004600_1..S03R shapes 24.32779
#> 4: ASP18GEN-1037-Sunday-00_006600_1..S03R shapes 24.32779
#> 5: ASP18GEN-1037-Sunday-00_007200_1..N03R shapes 24.11986
#> ---
#> 17142: BSP18GEN-R087-Weekday-00_142400_R..N27R shapes 24.23187
#> 17143: BSP18GEN-R087-Weekday-00_143500_R..S27R shapes 23.87020
#> 17144: BSP18GEN-R087-Weekday-00_144600_R..N27R shapes 25.79521
#> 17145: BSP18GEN-R087-Weekday-00_146600_R..N27R shapes 25.79521
#> 17146: BSP18GEN-R087-Weekday-00_148600_R..N27R shapes 25.79521
filtered_gtfs <- gtfstools::filter_by_route_id(tidygtfs, "1")
filtered_gtfs$trips
#> route_id service_id trip_id
#> 1: 1 ASP18GEN-1037-Sunday-00 ASP18GEN-1037-Sunday-00_000600_1..S03R
#> 2: 1 ASP18GEN-1037-Sunday-00 ASP18GEN-1037-Sunday-00_002600_1..S03R
#> 3: 1 ASP18GEN-1037-Sunday-00 ASP18GEN-1037-Sunday-00_004600_1..S03R
#> 4: 1 ASP18GEN-1037-Sunday-00 ASP18GEN-1037-Sunday-00_006600_1..S03R
#> 5: 1 ASP18GEN-1037-Sunday-00 ASP18GEN-1037-Sunday-00_007200_1..N03R
#> ---
#> 1042: 1 ASP18GEN-1087-Weekday-00 ASP18GEN-1087-Weekday-00_144900_1..N03R
#> 1043: 1 ASP18GEN-1087-Weekday-00 ASP18GEN-1087-Weekday-00_145900_1..N03R
#> 1044: 1 ASP18GEN-1087-Weekday-00 ASP18GEN-1087-Weekday-00_147200_1..N03R
#> 1045: 1 ASP18GEN-1087-Weekday-00 ASP18GEN-1087-Weekday-00_148550_1..N03R
#> 1046: 1 ASP18GEN-1087-Weekday-00 ASP18GEN-1087-Weekday-00_149900_1..N03R
#> trip_headsign direction_id block_id shape_id
#> 1: South Ferry 1 1..S03R
#> 2: South Ferry 1 1..S03R
#> 3: South Ferry 1 1..S03R
#> 4: South Ferry 1 1..S03R
#> 5: Van Cortlandt Park - 242 St 0 1..N03R
#> ---
#> 1042: Van Cortlandt Park - 242 St 0 1..N03R
#> 1043: Van Cortlandt Park - 242 St 0 1..N03R
#> 1044: Van Cortlandt Park - 242 St 0 1..N03R
#> 1045: Van Cortlandt Park - 242 St 0 1..N03R
#> 1046: Van Cortlandt Park - 242 St 0 1..N03R
class(filtered_gtfs)
#> [1] "dt_gtfs" "gtfs" "list"
😃 As you can see, we have applied two functions from the {gtfstools} toolset to a
gtfsio_gtfs <- gtfsio::import_gtfs(data_path)
gtfstools::get_trip_duration(gtfsio_gtfs)
#> trip_id duration
#> 1: ASP18GEN-1037-Sunday-00_000600_1..S03R 58.0
#> 2: ASP18GEN-1037-Sunday-00_002600_1..S03R 58.0
#> 3: ASP18GEN-1037-Sunday-00_004600_1..S03R 58.0
#> 4: ASP18GEN-1037-Sunday-00_006600_1..S03R 58.0
#> 5: ASP18GEN-1037-Sunday-00_007200_1..N03R 58.5
#> ---
#> 19886: SIR-SP2018-SI017-Sunday-00_138600_SI..S03R 42.0
#> 19887: SIR-SP2018-SI017-Sunday-00_141100_SI..N03R 42.0
#> 19888: SIR-SP2018-SI017-Sunday-00_141600_SI..S03R 42.0
#> 19889: SIR-SP2018-SI017-Sunday-00_144100_SI..N03R 42.0
#> 19890: SIR-SP2018-SI017-Sunday-00_147100_SI..N03R 42.0
When converting
tidygtfs_sf <- tidytransit::gtfs_as_sf(tidygtfs)
head(tidygtfs_sf$shapes)
#> Simple feature collection with 6 features and 1 field
#> Geometry type: LINESTRING
#> Dimension: XY
#> Bounding box: xmin: -74.01513 ymin: 40.70207 xmax: -73.89858 ymax: 40.88925
#> Geodetic CRS: WGS 84
#> shape_id geometry
#> 1 1..N03R LINESTRING (-74.01366 40.70...
#> 2 1..N12R LINESTRING (-74.01366 40.70...
#> 3 1..N13R LINESTRING (-74.01366 40.70...
#> 4 1..S03R LINESTRING (-73.89858 40.88...
#> 5 1..S04R LINESTRING (-73.90087 40.88...
#> 6 1..S12R LINESTRING (-73.95036 40.82...
dt_gtfs <- gtfstools::as_dt_gtfs(tidygtfs_sf)
head(dt_gtfs$shapes)
#> shape_id shape_dist_traveled shape_pt_lon shape_pt_lat shape_pt_sequence
#> 1: 1..N03R 0.0000 -74.01366 40.70207 1
#> 2: 1..N03R 157.6631 -74.01479 40.70320 2
#> 3: 1..N03R 161.4820 -74.01482 40.70323 3
#> 4: 1..N03R 165.1992 -74.01485 40.70325 4
#> 5: 1..N03R 168.8195 -74.01487 40.70328 5
#> 6: 1..N03R 172.3934 -74.01489 40.70331 6
Other than that, most of the conversion process is just converting tibbles to data.tables and converting some columns (tidytransit uses {hms} to handle time columns, gtfsio saves dates as integers in the yyyymmdd format, etc). Most {gtfstools} functions can already handle other types of GTFS objects (using the dev version), with the exception of
I'm going on vacation tomorrow and will be away until the beginning of January. I hope you enjoy your celebrations and I'm excited for what 2023 may bring to us. Cheers!
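The two column conversions mentioned in this comment (integer yyyymmdd dates and {hms} times) can be sketched in base R as below. The helper names `int_to_date()` and `secs_to_string()` are hypothetical and are not {gtfstools} functions; the {hms} column is mimicked with a plain difftime so that the package is not required.

```r
# gtfsio-style date (integer in yyyymmdd format) -> Date.
int_to_date <- function(x) {
  as.Date(as.character(x), format = "%Y%m%d")
}

# Duration stored in seconds (as in hms/difftime columns) -> "HH:MM:SS".
# Hours are not wrapped at 24, since GTFS times may exceed 24:00:00.
secs_to_string <- function(x) {
  secs <- as.integer(round(as.numeric(x, units = "secs")))
  sprintf(
    "%02d:%02d:%02d",
    secs %/% 3600L,
    (secs %% 3600L) %/% 60L,
    secs %% 60L
  )
}

# Date as read by gtfsio for the calendar table shown earlier.
int_to_date(20180624L)

# Stand-in for an hms column: 6 minutes, 7.5 minutes and 25 hours.
arrival_time <- as.difftime(c(360, 450, 90000), units = "secs")
secs_to_string(arrival_time)
```

The first two time values correspond to the `06'00"` and `07'30"` entries printed by tidytransit above, which come out as `00:06:00` and `00:07:30` after conversion.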
-
Hi all.
Recently we have had a few issues/discussions (ipeaGIT/gtfstools#48, ipeaGIT/gtfs2gps#245 and ipeaGIT/gtfs2gps#238) on compatibility between GTFS objects created by different packages.
The overall issue seems to be that each of our packages includes its own read_gtfs() function, in which ad-hoc decisions are made (e.g. converting a field to a specific class, converting the whole table to a specific type of data frame, etc). In the end, we have many objects which are not necessarily compatible with our functions.
@mpadge suggested ditching the GTFS reading functions in favor of converting from standards at the beginning of every function call if we needed that. My immediate reaction was negative, because I originally imagined each of our packages having its own "opinionated" view on how each table/field should be dealt with (for example, tidytransit converts tables to tibbles, gtfstools converts dates to Date, etc). To separate each of our own "views", we could easily assign a different class to a GTFS object (dt_gtfs in the case of gtfstools, tidygtfs in the case of tidytransit) and done, we wouldn't have problems dealing with it internally.
The problem is that when we do that we lose package interoperability. If I want to use a tidygtfs object with a gtfstools function, I won't be able to. In the case of tidytransit and gtfstools the difference makes sense, because I can't use data.table syntax with tibbles (although I think the opposite may be possible, but I haven't checked it). But preventing gtfs2gps and gtfsrouter objects from using gtfstools functions because they don't assign a dt_gtfs class to their object is lame, because they also use data.table - there wouldn't be a syntax issue preventing their objects from being processed with gtfstools functions. Of course, some of the required fields may be absent, or may be of different types, but that's exactly what gtfsio assert_file_exists(), assert_field_exists() and assert_field_class() are for.
So it seems like assigning a custom class to a GTFS object makes sense in some cases, but not in others. It doesn't make sense to assign a dt_gtfs class, because gtfsio objects are already data.tables by default and that prevents other data.table based packages from using gtfstools functions. It makes sense to assign a tidygtfs class, because it prevents incompatible syntax from being used.
So I'm now leaning towards Mark's suggestion in favor of ditching gtfstools custom read and write functions and instead replacing the date conversion with a function like convert_dates() or whatever. But why am I exposing this discussion to gtfsio, instead of keeping it in gtfstools? Because we also want to be able to use tidytransit objects in our functions!
Since gtfsio returns data.table-based objects anyway, we could ditch the dt_gtfs class altogether and instead just rely on the default c("gtfs", "list") classes. Again, I don't think it's wise to do that in the case of tidytransit due to the reasons listed above, so we could create (in gtfsio) a convert_to_standard() generic and a specific convert_to_standard.tidygtfs() method, in which we would convert the tibble-based objects to data.table-based objects (and convert any necessary columns as well).
So now, if I wanted to use a tidygtfs in any of the gtfstools functions I could simply check:
The main advantage of this approach, instead of creating the conversion function inside tidytransit for example, is that we wouldn't need to import the package, and could instead keep our dependency on gtfsio only.
The issue now would be how to deal with gtfstools/gtfs2gps/gtfsrouter -> tidytransit. I tried using tidytransit::gtfs_as_sf() with a gtfstools object and it seems to have worked, which reinforces the feeling that tidyverse syntax is compatible with data.tables, but it still requires more robust testing. If that's the case, though, we wouldn't have to do much (anything?), and tidytransit would already work with any gtfs-inheriting objects.
If you like the proposed solution I can go ahead and implement a first draft of it.
Sorry for the long post, but I felt like the discussion deserved some context and analysis, instead of simply jumping directly to the proposal. Cheers, all the best!