Revise travel day #82

Open
wants to merge 19 commits into
base: 53-paths-with-79

Conversation

sgreenbury (Collaborator) commented Jan 8, 2025

This PR:

  • Adds a configurable common day of travel for households and moves the filtering of NTS trips to before matching, to increase matches with non-missing trips
  • Updates how a sample travel day is chosen for individuals, given pwkstat in SPC
  • Adds an initial approach for matching individuals who remain unmatched after household matching
  • Revises the variable used for region to "PSUStatsReg_B01ID", as this field has values for 2021 and 2022

sgreenbury changed the base branch from main to 53-paths-with-79 January 8, 2025 18:34
sgreenbury force-pushed the 53-paths-with-79-revise-trav-day branch from c122be5 to ce5a261 January 14, 2025 18:39

# Ensure that the households have at least one day in `nts_days_of_week` that
# all household members have trips for
if config.parameters.common_household_day:
Collaborator Author

The new boolean parameter common_household_day determines whether all individuals in a household need to have a TravDay in common.

# Ensure that the households have at least one day in `nts_days_of_week` that
# all household members have trips for
if config.parameters.common_household_day:
    hids = households_with_common_travel_days(
Collaborator Author

Gets the subset of households in which all individuals share at least one TravDay from the set of configured days (config.parameters.nts_days_of_week).

        nts_trips, config.parameters.nts_days_of_week
    )
else:
    hids = households_with_travel_days_in_nts_weeks(
Collaborator Author

Gets the subset of households in which every individual has at least one TravDay from the set of configured days (config.parameters.nts_days_of_week); the day need not be shared across the household.

        nts_trips, config.parameters.nts_days_of_week
    )

# Subset individuals and households given filtering of trips
Collaborator Author

Subsets individuals and households to the households selected above before matching, so that matches have the required TravDays.

Comment on lines +956 to +968
# match remaining individuals
remaining_ids = spc_edited.loc[
    ~spc_edited.index.isin(matches_ind.keys()), "id"
].to_list()
matches_remaining_ind = match_remaining_individuals(
    df1=spc_edited,
    df2=nts_individuals,
    matching_columns=["age_group", "sex"],
    remaining_ids=remaining_ids,
    show_progress=True,
)
matches_ind.update(matches_remaining_ind)

Collaborator Author

Add matching for any remaining individuals that were part of unmatched households. It might be worth considering if this should be more configurable.
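
For readers unfamiliar with this kind of fallback, the sketch below shows one plausible way to match leftover individuals on a few categorical columns. It is not the body of `match_remaining_individuals`: the function name, the list-of-dicts inputs, the `IndividualID` key, and the seeded random choice are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def match_remaining_individuals_sketch(
    spc_rows, nts_rows, matching_columns, remaining_ids, seed=0
):
    """Match each remaining SPC id to one randomly chosen NTS individual
    with identical values in `matching_columns`."""
    rng = random.Random(seed)

    # Index NTS individuals by their values on the matching columns
    candidates = defaultdict(list)
    for row in nts_rows:
        key = tuple(row[col] for col in matching_columns)
        candidates[key].append(row["IndividualID"])

    spc_by_id = {row["id"]: row for row in spc_rows}
    matches = {}
    for spc_id in remaining_ids:
        key = tuple(spc_by_id[spc_id][col] for col in matching_columns)
        pool = candidates.get(key)
        if pool:  # leave the individual unmatched if no candidate exists
            matches[spc_id] = rng.choice(pool)
    return matches


# Hypothetical data
spc_rows = [
    {"id": 10, "age_group": "30-39", "sex": "F"},
    {"id": 11, "age_group": "70+", "sex": "M"},
]
nts_rows = [
    {"IndividualID": 501, "age_group": "30-39", "sex": "F"},
    {"IndividualID": 502, "age_group": "30-39", "sex": "F"},
    {"IndividualID": 503, "age_group": "40-49", "sex": "M"},
]
matches = match_remaining_individuals_sketch(
    spc_rows, nts_rows, ["age_group", "sex"], remaining_ids=[10, 11]
)
print(matches)  # id 10 maps to 501 or 502; id 11 has no candidate
```

Passing a longer `matching_columns` list narrows each candidate pool, which is one way the step could be made more configurable.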

src/acbm/preprocessing.py (outdated)
Comment on lines +112 to +130
# PSUGOR_B02ID but does not have values for 2021 and 2022
# region_dict = {
# -10.0: "DEAD",
# -9.0: "DNA",
# -8.0: "NA",
# 1.0: "North East",
# 2.0: "North West",
# 3.0: "Yorkshire and the Humber",
# 4.0: "East Midlands",
# 5.0: "West Midlands",
# 6.0: "East of England",
# 7.0: "London",
# 8.0: "South East",
# 9.0: "South West",
# 10.0: "Wales",
# 11.0: "Scotland",
# }

# PSUStatsReg_B01ID, which does have values for 2021 and 2022
Collaborator Author

Could be made configurable or removed before merging.

# In the PSU table, create a column with the region names
psu["region_name"] = psu["PSUGOR_B02ID"].map(region_dict)
# psu["region_name"] = psu["PSUGOR_B02ID"].map(region_dict)
Collaborator Author

As above

Comment on lines +135 to +149
1.0: "Northern, Metropolitan",
2.0: "Northern, Non-metropolitan",
3.0: "Yorkshire / Humberside, Metropolitan",
4.0: "Yorkshire / Humberside, Non-metropolitan",
5.0: "East Midlands",
6.0: "East Anglia",
7.0: "South East (excluding London Boroughs)",
8.0: "London Boroughs",
9.0: "South West",
10.0: "West Midlands, Metropolitan",
11.0: "West Midlands, Non-metropolitan",
12.0: "North West, Metropolitan",
13.0: "North West, Non-metropolitan",
14.0: "Wales",
15.0: "Scotland",
Collaborator Author

Revised regions using field "PSUStatsReg_B01ID" as this provides values for 2021 and 2022.
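
As a small illustration of how the revised mapping behaves, the sketch below applies the region names listed in the diff to a few codes in plain Python. The helper `region_names` and the sample codes are hypothetical; in the PR the mapping is applied to a data frame column instead.

```python
# Region names as listed in the diff for PSUStatsReg_B01ID
region_dict = {
    1.0: "Northern, Metropolitan",
    2.0: "Northern, Non-metropolitan",
    3.0: "Yorkshire / Humberside, Metropolitan",
    4.0: "Yorkshire / Humberside, Non-metropolitan",
    5.0: "East Midlands",
    6.0: "East Anglia",
    7.0: "South East (excluding London Boroughs)",
    8.0: "London Boroughs",
    9.0: "South West",
    10.0: "West Midlands, Metropolitan",
    11.0: "West Midlands, Non-metropolitan",
    12.0: "North West, Metropolitan",
    13.0: "North West, Non-metropolitan",
    14.0: "Wales",
    15.0: "Scotland",
}


def region_names(codes, mapping=region_dict):
    """Return the region name for each code, or None when the code is not mapped."""
    return [mapping.get(code) for code in codes]


# Hypothetical codes; -8.0 stands in for a code absent from the mapping
print(region_names([8.0, 14.0, -8.0]))  # ['London Boroughs', 'Wales', None]
```

Codes outside the mapping come back as `None` here, so any downstream step that relies on the region name needs to handle unmapped rows.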

]

# Generate random sample of days by household
get_chosen_day(config).to_parquet(
Collaborator Author

Get a chosen day for each individual to represent a "sample" day given the configured days of the week and whether the household is configured to share a common day.

Comment on lines +649 to +678
# If pwkstat = 1 (full time)
# and a work travel day is available
pl.when(pl.col("pwkstat").eq(1) & pl.col("TravDayWork").is_not_null())
.then(pl.col("TravDayWork"))
.otherwise(
    # If pwkstat = 1 (full time)
    # and a work travel day is NOT available
    pl.when(pl.col("pwkstat").eq(1) & pl.col("TravDayWork").is_null())
    .then(pl.col("TravDayAny"))
    .otherwise(
        # If pwkstat = 2 (part time)
        # and a work travel day is available
        # and a non-work travel day is available
        pl.when(
            pl.col("pwkstat").eq(2)
            & pl.col("TravDayWork").is_not_null()
            & pl.col("TravDayNonWork").is_not_null()
        )
        .then(
            # Sample either TravDayWork or TravDayNonWork
            # stochastically given config
            pl.col("TravDayWork")
            # TODO: update from config
            if np.random.random() < 1
            else pl.col("TravDayNonWork")
        )
        .otherwise(pl.col("TravDayAny"))
    )
)
.alias("ChosenTravDay")
Collaborator Author

Samples a chosen day given an individual's pwkstat value to increase the likelihood of choosing a day that includes a work trip.
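
One detail worth noting for the TODO in the expression above: `np.random.random() < 1` is an ordinary Python conditional, so it is evaluated once when the expression is built rather than once per row, and with a threshold of 1 it always selects TravDayWork. A minimal per-individual sketch of the intended branching in plain Python follows. The function `choose_travel_day` and the `p_work` probability are hypothetical stand-ins for whatever the config ends up providing.

```python
import random


def choose_travel_day(
    pwkstat, trav_day_work, trav_day_non_work, trav_day_any, p_work, rng
):
    """Pick a travel day for one individual, following the branches in the diff."""
    # Full time: prefer a work day, fall back to any day
    if pwkstat == 1:
        return trav_day_work if trav_day_work is not None else trav_day_any
    # Part time with both kinds of day available: draw once per individual
    if (
        pwkstat == 2
        and trav_day_work is not None
        and trav_day_non_work is not None
    ):
        return trav_day_work if rng.random() < p_work else trav_day_non_work
    # Everyone else: any available day
    return trav_day_any


rng = random.Random(42)
# Hypothetical part-time individual with work day 2 and non-work day 5
draws = [choose_travel_day(2, 2, 5, 5, p_work=0.6, rng=rng) for _ in range(1000)]
print(draws.count(2) / len(draws))  # close to 0.6
```

In a data frame setting, the same effect would need one random draw per row rather than a single draw at expression-build time.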

sgreenbury marked this pull request as ready for review January 16, 2025 20:51
matches_remaining_ind = match_remaining_individuals(
    df1=spc_edited,
    df2=nts_individuals,
    matching_columns=["age_group", "sex"],
Collaborator Author

Could extend matching_columns here to make matching more precise when households are not used, e.g. by adding employment status and urban/rural classification.

sgreenbury force-pushed the 53-paths-with-79-revise-trav-day branch from b210e29 to 1dd281b January 22, 2025 18:28