Revise travel day #82
base: 53-paths-with-79
c122be5
to
ce5a261
Compare
# Ensure that the households have at least one day in `nts_days_of_week` that
# all household members have trips for
if config.parameters.common_household_day:
The new boolean parameter `common_household_day` determines whether all individuals in a household need to have a `TravDay` in common.
# Ensure that the households have at least one day in `nts_days_of_week` that
# all household members have trips for
if config.parameters.common_household_day:
    hids = households_with_common_travel_days(
Gets the subset of households where all individuals have a common `TravDay` that is in the set of configured days (`config.parameters.nts_days_of_week`).
        nts_trips, config.parameters.nts_days_of_week
    )
else:
    hids = households_with_travel_days_in_nts_weeks(
Gets the subset of households where all individuals have any `TravDay` that is in the set of configured days (`config.parameters.nts_days_of_week`).
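The difference between the two filters can be sketched in plain Python. This is only an illustration, not the PR's implementation: the function names and the input shape (a mapping from household ID to each member's set of `TravDay` values) are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Set

# Hypothetical input shape: household id -> individual id -> set of TravDay values
TravelDays = Dict[Hashable, Dict[Hashable, Set[int]]]


def common_day_households(travel_days: TravelDays, days_of_week: List[int]) -> List[Hashable]:
    """Households where every member shares at least one configured day."""
    allowed = set(days_of_week)
    hids = []
    for hid, members in travel_days.items():
        # Intersect the configured days with every member's travel days
        shared = allowed.intersection(*members.values())
        if members and shared:
            hids.append(hid)
    return hids


def any_day_households(travel_days: TravelDays, days_of_week: List[int]) -> List[Hashable]:
    """Households where every member has some configured day, not necessarily shared."""
    allowed = set(days_of_week)
    return [
        hid
        for hid, members in travel_days.items()
        if members and all(days & allowed for days in members.values())
    ]


example = {
    "h1": {"a": {1, 3}, "b": {3, 5}},  # both members travel on day 3
    "h2": {"a": {1}, "b": {2}},        # each has a configured day, none shared
    "h3": {"a": {6}, "b": {1}},        # member "a" has no configured day
}
common = common_day_households(example, [1, 2, 3])
any_ = any_day_households(example, [1, 2, 3])
```

With days 1 to 3 configured, only `h1` passes the common-day filter, while `h1` and `h2` pass the less strict one.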
        nts_trips, config.parameters.nts_days_of_week
    )

# Subset individuals and households given filtering of trips
Subset to the households selected above before matching, to ensure that matches have the required `TravDay`s.
# match remaining individuals
remaining_ids = spc_edited.loc[
    ~spc_edited.index.isin(matches_ind.keys()), "id"
].to_list()
matches_remaining_ind = match_remaining_individuals(
    df1=spc_edited,
    df2=nts_individuals,
    matching_columns=["age_group", "sex"],
    remaining_ids=remaining_ids,
    show_progress=True,
)
matches_ind.update(matches_remaining_ind)
Adds matching for any remaining individuals that were part of unmatched households. It might be worth considering whether this should be more configurable.
# PSUGOR_B02ID but does not have values for 2021 and 2022
# region_dict = {
#     -10.0: "DEAD",
#     -9.0: "DNA",
#     -8.0: "NA",
#     1.0: "North East",
#     2.0: "North West",
#     3.0: "Yorkshire and the Humber",
#     4.0: "East Midlands",
#     5.0: "West Midlands",
#     6.0: "East of England",
#     7.0: "London",
#     8.0: "South East",
#     9.0: "South West",
#     10.0: "Wales",
#     11.0: "Scotland",
# }

# PSUStatsReg_B01ID but does not have values for 2021 and 2022
Could be made configurable, or removed before merging.
# In the PSU table, create a column with the region names
psu["region_name"] = psu["PSUGOR_B02ID"].map(region_dict)
# psu["region_name"] = psu["PSUGOR_B02ID"].map(region_dict)
As above
    1.0: "Northern, Metropolitan",
    2.0: "Northern, Non-metropolitan",
    3.0: "Yorkshire / Humberside, Metropolitan",
    4.0: "Yorkshire / Humberside, Non-metropolitan",
    5.0: "East Midlands",
    6.0: "East Anglia",
    7.0: "South East (excluding London Boroughs)",
    8.0: "London Boroughs",
    9.0: "South West",
    10.0: "Wales",
    11.0: "Scotland",
    10.0: "West Midlands, Metropolitan",
    11.0: "West Midlands, Non-metropolitan",
    12.0: "North West, Metropolitan",
    13.0: "North West, Non-metropolitan",
    14.0: "Wales",
    15.0: "Scotland",
Revised regions using field "PSUStatsReg_B01ID", as this provides values for 2021 and 2022.
]

# Generate random sample of days by household
get_chosen_day(config).to_parquet(
Gets a chosen day for each individual to represent a "sample" day, given the configured days of the week and whether households are configured to share a common day.
# If pwkstat = 1 (full time)
# and a work travel day is available
pl.when(pl.col("pwkstat").eq(1) & pl.col("TravDayWork").is_not_null())
.then(pl.col("TravDayWork"))
.otherwise(
    # If pwkstat = 1 (full time)
    # and a work travel day is NOT available
    pl.when(pl.col("pwkstat").eq(1) & pl.col("TravDayWork").is_null())
    .then(pl.col("TravDayAny"))
    .otherwise(
        # If pwkstat = 2 (part time)
        # and a work travel day is available
        # and a non-work travel day is available
        pl.when(
            pl.col("pwkstat").eq(2)
            & pl.col("TravDayWork").is_not_null()
            & pl.col("TravDayNonWork").is_not_null()
        )
        .then(
            # Sample either TravDayWork or TravDayNonWork
            # stochastically given config
            pl.col("TravDayWork")
            # TODO: update from config
            if np.random.random() < 1
            else pl.col("TravDayNonWork")
        )
        .otherwise(pl.col("TravDayAny"))
    )
)
.alias("ChosenTravDay")
Samples a chosen day given an individual's `pwkstat` value to increase the likelihood of choosing a day that includes a work trip.
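The selection rules in the hunk above can be sketched per individual in plain Python. This is an illustration under stated assumptions, not the PR's code: the function name `choose_travel_day` and the `p_work_day` parameter are hypothetical, with `p_work_day` standing in for the config value that the TODO refers to, and `None` playing the role of a null column value. In the hunk as written, `np.random.random() < 1` is evaluated once in Python while the expression is built and is always true, so the sketch draws once per individual instead.

```python
import random
from typing import Optional


def choose_travel_day(
    pwkstat: int,
    trav_day_work: Optional[int],
    trav_day_non_work: Optional[int],
    trav_day_any: Optional[int],
    p_work_day: float = 0.5,  # hypothetical config value, not from the PR
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Pick a travel day following the when/then/otherwise rules of the hunk."""
    rng = rng or random.Random()
    # Full time with a work travel day available: use it
    if pwkstat == 1 and trav_day_work is not None:
        return trav_day_work
    # Part time with both a work and a non-work day: sample between them
    if pwkstat == 2 and trav_day_work is not None and trav_day_non_work is not None:
        return trav_day_work if rng.random() < p_work_day else trav_day_non_work
    # Everything else (including full time without a work day): any travel day
    return trav_day_any


full_time = choose_travel_day(1, 2, 4, 2)
fallback = choose_travel_day(1, None, 4, 4)
part_time_work = choose_travel_day(2, 2, 4, 2, p_work_day=1.0)
part_time_non_work = choose_travel_day(2, 2, 4, 2, p_work_day=0.0)
```

Passing a seeded `random.Random` as `rng` would make the part-time draw reproducible across runs.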
matches_remaining_ind = match_remaining_individuals(
    df1=spc_edited,
    df2=nts_individuals,
    matching_columns=["age_group", "sex"],
Could update `matching_columns` here to enable more precise matching when not using households, e.g. by adding employment status and urban/rural classification.
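To illustrate the effect of adding columns, here is a minimal sketch of exact matching on a list of columns using plain dictionaries. It does not reproduce the signature or behaviour of `match_remaining_individuals`; the function `match_on_columns`, the record layout, and the `employment_status` column name are all assumptions made for the example.

```python
import random
from collections import defaultdict


def match_on_columns(targets, donors, matching_columns, rng=None):
    """Match each target to a random donor with equal values in matching_columns.

    Targets without any candidate donor are left out of the result.
    """
    rng = rng or random.Random(0)
    # Group donor ids by their values in the matching columns
    pools = defaultdict(list)
    for donor in donors:
        key = tuple(donor[col] for col in matching_columns)
        pools[key].append(donor["id"])
    matches = {}
    for target in targets:
        key = tuple(target[col] for col in matching_columns)
        candidates = pools.get(key)
        if candidates:
            matches[target["id"]] = rng.choice(candidates)
    return matches


donors = [
    {"id": "n1", "age_group": "30-39", "sex": "F", "employment_status": "full_time"},
    {"id": "n2", "age_group": "30-39", "sex": "F", "employment_status": "retired"},
]
targets = [
    {"id": "s1", "age_group": "30-39", "sex": "F", "employment_status": "retired"},
]

coarse = match_on_columns(targets, donors, ["age_group", "sex"])
precise = match_on_columns(targets, donors, ["age_group", "sex", "employment_status"])
```

With only `age_group` and `sex`, either donor can be drawn for `s1`; adding the third column leaves `n2` as the only candidate. The trade-off is that more columns can leave some individuals with no candidate at all, so a fallback to fewer columns may be needed.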
b210e29
to
1dd281b
Compare
This PR:
`pwkstat` in SPC
"PSUStatsReg_B01ID" as this is included for 2021 and 2022