Is your feature request related to a problem? Please describe.
linelist::tags_df() is not comparable to datatagr::labels_df()
linelist::tags_df() generates an output that secures downstream analysis in the outbreak analytics pipeline
datatagr::labels_df() generates an output that helps me to showcase the dataset with labels
Can we have a function in datatagr that keeps the ability to tag columns and returns a validated set of them for secure downstream analysis? Can datatagr::make_datatagr() create a tagged data frame? If this has been discussed elsewhere, I am happy to read it.
In the reprex below I compare package features.
library(datatagr)
library(linelist)
library(labelled)
library(dplyr)

# linelist ----------------------------------------------------------------

dataset <- outbreaks::mers_korea_2015$linelist

dataset %>%
  dplyr::as_tibble() %>%
  linelist::make_linelist(
    location = "place_infect",
    date_onset = "dt_onset"
  ) %>%
  linelist::validate_linelist() %>%
  linelist::tags_df()
#> # A tibble: 162 × 2
#>    date_onset location
#>    <date>     <fct>
#>  1 2015-05-11 Middle East
#>  2 2015-05-18 Outside Middle East
#>  3 2015-05-20 Outside Middle East
#>  4 2015-05-25 Outside Middle East
#>  5 2015-05-25 Outside Middle East
#>  6 2015-05-24 Outside Middle East
#>  7 2015-05-21 Outside Middle East
#>  8 2015-05-26 Outside Middle East
#>  9 NA         Outside Middle East
#> 10 2015-05-21 Outside Middle East
#> # ℹ 152 more rows

# datatagr ----------------------------------------------------------------

datatagr_out <- cars %>%
  dplyr::as_tibble() %>%
  # Create a datatagr object
  datatagr::make_datatagr(
    speed = 'Miles per hour'
  ) %>%
  # Validate the data are of a specific type
  datatagr::validate_datatagr(
    speed = 'numeric'
  ) %>%
  # extract dataframe of labelled variables
  datatagr::labels_df()

datatagr_out
#> # A tibble: 50 × 2
#>    `Miles per hour`  dist
#>               <dbl> <dbl>
#>  1                4     2
#>  2                4    10
#>  3                7     4
#>  4                7    22
#>  5                8    16
#>  6                9    10
#>  7               10    18
#>  8               10    26
#>  9               10    34
#> 10               11    17
#> # ℹ 40 more rows

# The action below may not be expected to be done in an analysis pipeline
datatagr_out %>%
  # standardize column names of a data frame
  cleanepi::standardize_column_names()
#> # A tibble: 50 × 2
#>    miles_per_hour  dist
#>             <dbl> <dbl>
#>  1              4     2
#>  2              4    10
#>  3              7     4
#>  4              7    22
#>  5              8    16
#>  6              9    10
#>  7             10    18
#>  8             10    26
#>  9             10    34
#> 10             11    17
#> # ℹ 40 more rows

# labelled ----------------------------------------------------------------

var_label(cars) <- list(
  speed = 'Miles per hour'
)

cars %>%
  labelled::var_label()
#> $speed
#> [1] "Miles per hour"
#>
#> $dist
#> NULL
In direct response to the issue title: There is no tags_df() because the naming of tags has been dropped throughout the package (pending the rename of the package).
All functionality that remains is indeed labels_df(), and good to hear the feedback around how it does or does not work for you 😊 We will not be reintroducing the tags_df() as the naming does not fit, but I am happy to consider your second suggested change for integration ("get only the labelled columns"). It may make sense to only have the labelled and validated ones in there. In order to make that comparison, could you add a direct comparison between linelist and datatagr, for the same data?
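To make the "get only the labelled columns" idea concrete, here is a minimal base R sketch of the behaviour under discussion. It assumes labels are stored in each column's `label` attribute, which is the {labelled} convention. The helper name `labelled_only()` is hypothetical and is not a datatagr function.

```r
# Hypothetical helper: NOT part of datatagr. Assumes labels live in each
# column's "label" attribute, the convention used by {labelled}.
labelled_only <- function(x) {
  # TRUE for every column that carries a non-NULL "label" attribute
  has_label <- vapply(
    x,
    function(col) !is.null(attr(col, "label", exact = TRUE)),
    logical(1)
  )
  # Keep only the labelled columns; drop = FALSE preserves the data frame
  x[, has_label, drop = FALSE]
}

# Label one column of the built-in cars data, leave dist unlabelled
df <- cars
attr(df$speed, "label") <- "Miles per hour"

out <- labelled_only(df)
names(out)
```

In this sketch the unlabelled `dist` column is dropped and the label on `speed` is retained, which is roughly what a labelled-only `labels_df()` output would look like.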
Your third proposed change ("get standardised column names"), I am not sure about. The package scope is not to wrangle variable names into a prettier format. In your example, the renaming of speed into miles_per_hour does not necessarily make the output of labels_df more usable, if we also retain the labels. It may make sense if we drop the label attribute when using labels_df, and put the label information in the variable name (snake_case formatted), but not both. Would you be okay with dropping the labels and interoperability with labelled in that scenario?
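For illustration, the "label goes into the variable name, label attribute is dropped" option could look roughly like the base R sketch below. Both `to_snake_case()` and `labels_to_names()` are hypothetical names used only here, and the sketch again assumes labels are stored in the `label` attribute.

```r
# Hypothetical helpers: NOT part of datatagr or cleanepi.

# Convert free text such as "Miles per hour" into snake_case
to_snake_case <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of other characters become "_"
  gsub("^_+|_+$", "", x)           # trim leading and trailing underscores
}

# Rename each labelled column after its label, then drop the label
labels_to_names <- function(x) {
  for (i in seq_along(x)) {
    lab <- attr(x[[i]], "label", exact = TRUE)
    if (!is.null(lab)) {
      names(x)[i] <- to_snake_case(lab)
      attr(x[[i]], "label") <- NULL
    }
  }
  x
}

# Assumed setup: one labelled column on the built-in cars data
df2 <- cars
attr(df2$speed, "label") <- "Miles per hour"

out2 <- labels_to_names(df2)
names(out2)
```

The trade-off is visible here: once the label is moved into the column name, the output no longer carries anything that {labelled} can read back.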
Created on 2024-10-08 with reprex v2.1.1
Describe the solution you'd like
datatagr::tags_df() function to get tagged-only and validated-only columns for downstream analysis
datatagr::labels_df() to get only the labelled columns (motivating downstream analysis restricted to labelled and validated columns only)
datatagr::labels_df() to get standardised column names (to avoid using cleanepi downstream) with labels interoperable with {labelled} (possibly)