File Structure

edambo edited this page Dec 8, 2024 · 10 revisions

This page describes the organization of the files in the detect_fu_interviews_public repository. This information can be used to navigate and appropriately modify folders and files.

Folder structure

  • codebooks: This folder contains the codebooks for the F/U interview data frames. It also contains the code used to create the codebooks.

    • variable_descriptions: This folder contains RDS files of variable names and descriptions extracted from the codebook attributes.
  • data: This folder contains data files (e.g., csv, Rds).

  • data_management: This folder contains code files used to import, clean, and transform the F/U interview data.

    • qaqc: This folder contains code files used to check the quality of the data.
    • unique_person_identification: This folder contains code files used to create unique person identifiers.
  • docs: This folder contains Word, PDF, and other documents that aren't direct inputs to, or outputs of, any data management or analysis code, but they do provide context or other useful information.

  • exploratory: This folder contains code for minor one-off and exploratory analyses.

  • r: This folder contains R scripts. Typically, R scripts are only used for writing custom functions.

What do all the code files in this repository do?

In general, you won't be able to do much with the files in this repository without the DETECT Follow-Up Interviews data. So, if you haven't already, you will need to download the DETECT data to your computer. The purpose of each file in this repository is described below:

  • data_management folder

    • data_01_aps_investigations_import.qmd - Cleans the APS investigations dataset and creates an RDS file containing the cleaned data.

    • data_01_consent_import.qmd - Cleans the consent dataset and creates an RDS file containing the cleaned data.

    • data_01_clutter_scale_import.qmd - Cleans the clutter scale dataset and creates an RDS file containing the cleaned data.

    • data_01_general_health_import.qmd - Cleans the general health dataset and creates an RDS file containing the cleaned data.

    • data_01_observational_measures_import.qmd - Cleans the observational measures dataset and creates an RDS file containing the cleaned data.

    • data_01_self-report_import.qmd - Cleans the self-report dataset and creates an RDS file containing the cleaned data.

    • data_01_sociodemographic_information_import.qmd - Cleans the sociodemographic information dataset and creates an RDS file containing the cleaned data.

    • data_01_lead_panel_assessment_import.qmd - Cleans the LEAD panel assessment dataset and creates an RDS file containing the cleaned data.

    • data_01_participant_import.qmd - Cleans the participant dataset and creates an RDS file containing the cleaned data.

    • unique_person_identification

      • data_01_unique_person_fastlink_detect_fu_data.qmd - Creates fastlink output to identify unique people with different MedStar IDs in the DETECT FU Interview study and assign unique identifiers to them.
      • data_02_unique_person_detect_fu_data.qmd - Includes code that processes the fastlink output and outputs a participant dataset with a unique identifier.
      • data_unique_person_01_within_set_aps.qmd - Cleans data from APS for recreating participant unique identifiers.
  • codebooks folder

    • data_01_aps_investigations_codebook.qmd - Creates codebook for the APS investigations dataset using the RDS file generated by data_02_aps_investigations_import.qmd.

    • data_01_clutter_scale_codebook.qmd - Creates codebook for the clutter scale dataset using the RDS file generated by data_03_clutter_scale_import.qmd.

    • data_01_general_health_codebook.qmd - Creates codebook for the general health dataset using the RDS file generated by data_04_general_health_import.qmd.

    • data_01_observational_measures_codebook.qmd - Creates codebook for the observational measures dataset using the RDS file generated by data_05_observational_measures_import.qmd.

    • data_01_self_report_codebook.qmd - Creates codebook for the self-report dataset using the RDS file generated by data_06_self-report_import.qmd.

    • data_01_sociodemographic_information_codebook.qmd - Creates codebook for the sociodemographic information dataset using the RDS file generated by data_07_sociodemographic_information_import.qmd.

    • data_01_lead_panel_assessment_codebook.qmd - Creates codebook for the LEAD panel assessment dataset using the RDS file generated by data_08_lead_panel_assesment_import.qmd.

    • data_01_participant_codebook.qmd - Creates codebook for the participant dataset using the RDS file generated by data_09_participant_import.qmd.

    • data_01_merged_detect_fu_data_codebook.qmd - Creates codebook for the merged data set using the RDS file generated by data_10_merged_detect_fu_data.qmd.

  • r folder

    • add_attributes_code.R - WIP

    • add_shade_column.R - This file contains a function that adds a new column called shade as the first column of a data frame. The value of shade alternates between TRUE and FALSE according to the value of the var column. The column is used to add a background to the DataTable so that every other variable is shaded.

    • broad_check_message.R - Function that generates a data file info message.

    • cont_stats.R - This file contains a function that creates statistical summary columns for a numeric column. The columns contain calculated values for N, mean, and 95% confidence interval, and for median and 95% confidence interval. It uses the n_mean_ci and n_median_ci functions.

    • cont_stats_grouped.R - This file contains a function that creates the same statistical summary columns (N, mean, and 95% confidence interval; median and 95% confidence interval) for a numeric column grouped by another column. It uses the n_mean_ci_grouped and n_median_ci_grouped functions.

    • convert_label_to_cb_add_col_attributes.R - This file contains a function that creates a new file using the code in data_survey_23_codebook.Rmd with the label function replaced by the cb_add_col_attributes function (WIP L2C).

    • data_cleaning_tools.R - This file contains functions used in the data cleaning files in the data_management folder. These functions summarize variable values to help in the data-cleaning process.

    • fact_reloc.R - This file contains a function that creates a new factor version of a column and then positions the new factor column directly after the non-factor version of the column in the data frame. The new factor column automatically uses the _f naming convention. Internally, the function uses a combination of factor() and dplyr::relocate().

    • get_unique_value_summary.R - This file contains a function that creates a summary table of unique values in target columns, with counts.

    • identify_codebook_variables_to_update.R - The function in this file compares the variables included in the last run of the codebook file to the variables in the last run of the data cleaning file, making it easier to determine which variables need to be added to the codebook and which need to be removed.

    • import_data.R - This file contains a function that imports the data that will be used to create master tables. This dataset, combined_participant_data.rds, is created in link2care_public/data_survey_21_merge.Rmd. Additionally, this code assumes that this file is being run from the SharePoint General folder. The data will be imported at the top of every .qmd file. That way, the details in the Administrative Information table on the home page are correct for all tables (L2C).

    • lead_determination_vs_detect_tool_confusion_matrix_by_abuse_type.R - The function in this file creates a data frame with two variables: The final abuse determination made by the LEAD panel (for a specified abuse type or the aggregate of all types); and the abuse determination made using the DETECT tool at the initial visit. This data frame is then used to create a formatted confusion matrix and a summary table containing prevalence, sensitivity, and specificity calculations.

    • lead_determination_vs_detect_tool_item_confusion_matrix.R - The function in this file creates a data frame with two variables: The final abuse determination made by the LEAD panel; and the abuse determination made using the DETECT tool at the initial visit (for a specified DETECT tool item or the aggregate of all items). This data frame is then used to create a formatted confusion matrix and a summary table containing prevalence, sensitivity, and specificity calculations.

    • lead_positive_votes_analysis.R - The function in this file creates a data frame that summarizes the LEAD panel data. It includes columns that indicate: the total number of positive votes; the proportion of positive votes; whether there are any positive votes for each type of abuse for each assessment; whether there are any positive votes across all subtypes of abuse for each assessment; the LEAD Assessment determination based on a majority vote for each assessment; and the final LEAD Assessment determination, which is the result of the majority vote of the secondary assessment (if one was done) or of the initial assessment (if a secondary assessment wasn't done).

    • merge_unique_id_to_detect_data_set.R - The function in this file merges the unique person ID from the participant_unique_id.rds file to a DETECT FU Interview data set.

    • missingness_pattern.R - The function in this file creates a flextable showing missingness patterns for a data frame.

    • missingness_summary.R - The function in this file creates a flextable showing a missingness summary of the number of missing and non-missing rows for each data frame variable specified.

    • n_mean_ci.R - The function in this file creates statistical summary columns containing calculated values for N, mean, and 95% confidence interval for a numeric column.

    • n_mean_ci_grouped.R - The function in this file creates statistical summary columns containing calculated values for N, mean, and 95% confidence interval for a numeric column grouped by another column.

    • n_median_ci.R - The function in this file creates statistical summary columns containing calculated values for N, median, and 95% confidence interval for a numeric column.

    • n_median_ci_grouped.R - The function in this file creates statistical summary columns containing calculated values for N, median, and 95% confidence interval for a numeric column grouped by another column.

    • n_percent_ci.R - The function in this file creates statistical summary columns containing calculated values for N, overall percent, and 95% confidence interval for a categorical column.

    • n_percent_ci_grouped.R - The function in this file creates statistical summary columns containing calculated values for N, overall percent, and 95% confidence interval for a categorical column grouped by another column.

    • nums_to_na.R - The functions in this file convert selected numeric values to NA.

    • recoding_factoring_relocating.R - The functions in this file recode character columns to numeric columns, convert the numeric columns into factors, and then relocate the factor versions of the columns directly after the numeric versions.

    • unique_case_count.R - The function in this file creates a summary that includes the count and proportion for each unique ID grouped by a given variable.

    • variable_descriptions.R - The function in this file generates a data frame of column names and descriptions based on the attributes created in the codebook generation files using the cb_add_col_attributes function.
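
To make the summary helpers listed above more concrete, here is a minimal base R sketch of the kind of calculation n_mean_ci.R is described as performing (N, mean, and 95% confidence interval for a numeric column). This is not the repository's code. The function name n_mean_ci_sketch, its arguments, the output column names (lcl, ucl), and the use of a t-distribution interval are all assumptions made for illustration; the actual function may use a different signature or interval method.

```r
# Hypothetical stand-in for the repository's n_mean_ci function.
# Assumption: a t-distribution interval; the real method may differ.
n_mean_ci_sketch <- function(data, col, conf_level = 0.95) {
  x <- data[[col]]
  x <- x[!is.na(x)]                        # drop missing values first
  n <- length(x)
  m <- mean(x)
  se <- stats::sd(x) / sqrt(n)             # standard error of the mean
  t_crit <- stats::qt(1 - (1 - conf_level) / 2, df = n - 1)
  data.frame(
    var  = col,
    n    = n,
    mean = m,
    lcl  = m - t_crit * se,                # lower confidence limit
    ucl  = m + t_crit * se                 # upper confidence limit
  )
}

# Toy data for illustration only (not DETECT data)
toy <- data.frame(age = c(70, 75, 80, 85, NA))
result <- n_mean_ci_sketch(toy, "age")
print(result)
```

A grouped variant, as described for n_mean_ci_grouped.R, could apply the same calculation to each subset of rows defined by a second column.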

File naming conventions

  • Keep data cleaning and data analysis in separate Rmd files.

    • Data cleaning files should be named:
      • data_[order number]_[purpose]
      • Example: data_03_prep_for_sna
    • Analysis files that do not directly create a table or figure should be named:
      • analysis_[order number]_[brief summary of content]
      • Example: analysis_01_exploratory
    • Analysis files that DO directly create a table or figure should be named:
      • table_[brief summary of content] or
      • fig_[brief summary of content]
      • Example: table_network_characteristics
  • Images should be PNG files saved to the img/ folder and given a descriptive name.

  • Word and PDF files should be saved to the docs/ folder and given a descriptive name.

  • RDS, RData, CSV, Excel, etc. files should be saved to the data/ folder and given a descriptive name.
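
The naming patterns above can be checked mechanically. The following base R sketch classifies a file name against the four patterns. The helper classify_file_name is hypothetical and not part of the repository. It also assumes details the conventions do not state: a two-digit order number, and a description made of lowercase letters, digits, underscores, or hyphens.

```r
# Hypothetical helper: classify a file name against the naming conventions.
# Returns "data", "analysis", "table", or "fig", or NA if nothing matches.
classify_file_name <- function(name) {
  # Strip any directory and file extension (e.g., ".qmd", ".Rmd")
  stem <- tools::file_path_sans_ext(basename(name))
  # Assumed patterns: two-digit order number, lowercase description
  patterns <- c(
    data     = "^data_[0-9]{2}_[a-z0-9_-]+$",
    analysis = "^analysis_[0-9]{2}_[a-z0-9_-]+$",
    table    = "^table_[a-z0-9_-]+$",
    fig      = "^fig_[a-z0-9_-]+$"
  )
  hits <- names(patterns)[vapply(patterns, grepl, logical(1), x = stem)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# The examples given in the conventions above
classify_file_name("data_03_prep_for_sna")
classify_file_name("analysis_01_exploratory")
classify_file_name("table_network_characteristics")
```

A check like this could be run over the output of list.files() to flag files whose names fall outside the conventions.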
