Basic Description

The scaling function should be used to scale values within a hierarchy so that the more detailed levels, aggregated together, equal the aggregate level. Input data `dt` is defined by a set of `id_cols` along with a column that is to be scaled over, `col_stem`.

The column to be scaled can be one of two types of variable, `col_type`:
- Categorical variables like location or sex (`col_type = categorical`).
- Numeric interval variables like age or year, defined by the start and end of each interval (`col_type = interval`).
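As a rough illustration of the expected shape of the input (hypothetical data; the column names `location`, `year`, `age_start`, `age_end`, and `value` are just examples, not the function's required names):

```r
library(data.table)

# hypothetical input: subnational values to be scaled to the national total,
# with id_cols = c("location", "year", "age_start", "age_end") and col_stem = "location"
dt <- data.table(
  location  = c("usa", "california", "texas"),
  year      = 2010,
  age_start = 0,    # interval columns are defined by the start ...
  age_end   = 125,  # ... and end of each interval
  value     = c(300, 30, 25)
)
```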
There are two basic types of use cases:

1. Input data is expected to be "square" and is an exact match with the pre-defined hierarchy. Only basic assertions need to be done, and the function should be optimized for speed.
2. Input data is not "square" and may not match up exactly with the pre-defined hierarchy. More detailed assertions and standardization need to be done.
   - Example 1: aggregating across locations, where some years may have different sets of locations available.
   - Example 2: aggregating across locations, where some locations may have different age groups available, so they need to be collapsed to the most detailed common age groups prior to each level of aggregation.
   - Example 3: aggregating across age groups, where some locations may have different age groups available, so each detailed age group needs to map correctly to the aggregate and each location needs to cover the entire expected age range.
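Whichever use case applies, the core operation is the same: rescale the detailed values so they sum to the aggregate value within each group of the remaining `id_cols`. A minimal sketch of that step, continuing the hypothetical `dt` above:

```r
library(data.table)

# split the hypothetical example into the aggregate and the detailed rows
agg      <- dt[location == "usa"]
detailed <- dt[location != "usa"]

# join the aggregate value onto the detailed rows, then rescale so the
# detailed values sum to the aggregate within each year/age group
detailed <- merge(
  detailed,
  agg[, .(year, age_start, age_end, agg_value = value)],
  by = c("year", "age_start", "age_end")
)
detailed[, value := value * agg_value / sum(value), by = .(year, age_start, age_end)]
detailed[, agg_value := NULL]
```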
Implementation Details

Assertions

Square datasets only (a sketch of the squareness check follows this list):
- All combinations of the unique values of each of the `id_cols` exist.
- All of the most detailed nodes in the hierarchy exist.
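A minimal sketch of the squareness check, assuming a data.table `dt` and hypothetical `id_cols` (a paired interval column like `age_start`/`age_end` would need slightly more care than shown here):

```r
library(data.table)

# every combination of the unique values of each id_col is expected to exist
id_cols  <- c("location", "year", "age_start")
expected <- do.call(CJ, lapply(id_cols, function(col) unique(dt[[col]])))
setnames(expected, id_cols)

# anti-join: expected combinations that are missing from the input
missing_rows <- expected[!dt, on = id_cols]
if (nrow(missing_rows) > 0) stop("input data is not square")
```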
Non-square datasets only:
- Each level of scaling needs to check for square data and potentially collapse interval columns to the most detailed common intervals, as sketched below.
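One way to find the most detailed common intervals (a sketch under the assumption that each location's age intervals are contiguous and non-overlapping) is to intersect the interval breakpoints across locations:

```r
library(data.table)

# collect each location's age breakpoints (interval starts and ends)
breaks_by_loc <- dt[, .(breaks = list(sort(unique(c(age_start, age_end))))), by = location]

# breakpoints shared by every location define the most detailed common intervals
common_breaks <- Reduce(intersect, breaks_by_loc$breaks)
common_intervals <- data.table(
  age_start = head(common_breaks, -1),
  age_end   = tail(common_breaks, -1)
)
```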
What is the expected behavior when...

...it is not possible to scale to one aggregate given the available input data? `missing_dt_severity` (implemented)
- For example, when scaling to a national location, one subnational may be missing.
- Default is to throw an error.
- Warn or ignore, then skip the impossible scaling and continue with the others.
- Skip the check and attempt the scaling anyway.
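A rough sketch of how a `*_severity` argument could be handled (the helper name and the exact set of severity values are assumptions, not the function's actual interface):

```r
# hypothetical helper: react to a failed check according to the severity setting
handle_severity <- function(msg, severity = c("stop", "warning", "none", "skip")) {
  severity <- match.arg(severity)
  if (severity == "stop") stop(msg)
  if (severity == "warning") warning(msg)
  # "none" and "skip" fall through: the caller drops the affected aggregates
  # (or skips the check entirely) and continues scaling the rest
  invisible(severity)
}
```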
...interval variables do not exactly match up in the input data? `collapse_interval_cols` (implemented)
- For example, when scaling to a national location, one subnational may have five-year age groups while another has single-year age groups.
- Default is to throw an error.
- Option to automatically collapse to the most detailed common intervals.
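A sketch of the optional collapse, reusing `common_breaks` from the sketch in the assertions section and assuming every location's intervals nest within the common intervals:

```r
library(data.table)

# map each original age interval to the common interval that contains it
idx <- findInterval(dt$age_start, common_breaks)
dt[, `:=`(common_start = common_breaks[idx], common_end = common_breaks[idx + 1])]

# collapse values to the common intervals before each level of aggregation
collapsed <- dt[, .(value = sum(value)), by = .(location, year, common_start, common_end)]
setnames(collapsed, c("common_start", "common_end"), c("age_start", "age_end"))
```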
...scaling a categorical variable, and one of the interval `id_cols` has overlapping intervals? `overlapping_dt_severity` (implemented)
- For example, when scaling subnational to national values, the subnationals have a mix of five-year and single-year age groups, and some subnationals have both.
- Default is to throw an error.
- Warn or ignore, then drop the overlapping intervals and continue.
- Skip the check and continue with the scaling.
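A sketch of one way to detect overlapping intervals (hypothetical column names): sort within each location and flag any interval that starts before the previous one ends.

```r
library(data.table)

# within each location and year, intervals sorted by start should not overlap
setorder(dt, location, year, age_start, age_end)
overlapping <- dt[, .SD[age_start < shift(age_end, fill = -Inf)], by = .(location, year)]

if (nrow(overlapping) > 0) {
  # default is to error; warn/ignore would drop these rows, "skip" bypasses the check
  stop("overlapping intervals found in the input data")
}
```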
...scaling a categorical variable with multiple levels in the mapping, but one level is missing? `collapse_missing` (implemented)
- For example, when scaling county to state to national values, some years may not have state values. It should be possible to collapse the mapping so that the county level maps directly to the national level.
- Default is to throw an error.
- Option to automatically drop missing nodes from the mapping.
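A sketch of the collapse, using a hypothetical `mapping` with `child`/`parent` columns: reroute the children of any missing node to that node's own parent, then drop the missing nodes.

```r
library(data.table)

# hypothetical county -> state -> national mapping
mapping <- data.table(
  child  = c("county_a", "county_b", "state_1",  "state_2",  "national"),
  parent = c("state_1",  "state_2",  "national", "national", NA)
)
present <- c("county_a", "county_b", "national")  # state level missing this year

# reroute children of missing nodes to their grandparent until every parent is present
while (any(!is.na(mapping$parent) & !mapping$parent %in% present)) {
  mapping <- merge(mapping, mapping[, .(parent = child, grandparent = parent)],
                   by = "parent", all.x = TRUE)
  mapping[!is.na(parent) & !parent %in% present, parent := grandparent]
  mapping[, grandparent := NULL]
}
mapping <- mapping[child %in% present | is.na(parent)]
```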
...`value_cols` have `NA` values (like #49)? `na_value_severity` (implemented)
- Default is to throw an error.
- Warn or ignore, then drop the missing values and continue with the scaling.
- Skip the check for `NA` values and include them in the scaling.
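A sketch of the proposed check (hypothetical `value_cols`):

```r
library(data.table)

value_cols <- "value"
na_rows <- dt[rowSums(is.na(dt[, ..value_cols])) > 0]

if (nrow(na_rows) > 0) {
  # default is to error; warn/ignore would drop these rows before scaling,
  # while "skip" would leave them in and let NAs propagate through the scaling
  stop("value_cols contain NA values")
}
```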
Implementation steps

1. Clean up the testing script for scaling (right now it is potentially too long and hard to follow).
2. Add a `square` argument to determine the amount of flexibility in the inputs.
3. Add a `na_value_severity` argument.