Review the code used to create the analysis data #33

Open
10 tasks
mbcann01 opened this issue Dec 4, 2023 · 19 comments
Assignees
Labels
data wrangling A data wrangling task documentation Improvements or additions to documentation

Comments

@mbcann01
Member

mbcann01 commented Dec 4, 2023

Overview

Several GRAs have worked hard to create an analysis data frame from the separate files exported by FileMaker Pro as part of the DETECT follow-up interviews. I need to go review them all for correctness and stylistic consistency. Additionally, I need to create some instructions for using/updating files in the future.

Links

Tasks

  • Review all of the data import and cleaning files.
    • Use the here package to facilitate file import and export.
    • Remove files that are no longer needed. Are the QAQC files needed? If so, does it make sense to consolidate them with the data cleaning files?
    • Make sure that the code is all styled consistently and in accordance with the Tidyverse Style Guide.
  • Add attributes for the codebook.
  • Do we want to create separate codebooks for each separate data frame with instructions for merging them in the description?
  • Change the names of all the files that clean the individual data sets to "data_01...". For example, "data_03_clutter_scale_import.qmd" to "data_01_clutter_scale_import.qmd". Why? Because there is no natural order and they don't build upon one another. They are independent code bases.
  • Update the Wiki. Document where the files are, what they do, and how they fit together.
  • Make sure you are able to recreate the codebook files.
  • Update the data on SharePoint. Let Ebie know.
@mbcann01 mbcann01 self-assigned this Dec 4, 2023
@mbcann01 mbcann01 converted this from a draft issue Dec 4, 2023
@mbcann01 mbcann01 added documentation Improvements or additions to documentation data wrangling A data wrangling task labels Dec 4, 2023
@mbcann01 mbcann01 added this to the Prepare data for analysis milestone Dec 4, 2023
@mbcann01
Member Author

@edambo I think you will take a first pass at this, correct?

@edambo
Collaborator

edambo commented Jan 10, 2024

I'm not sure I understand @mbcann01. Are you asking if I worked on these already? I think I did, unless you noticed something I missed.

@mbcann01
Member Author

Hi @edambo! I haven't checked yet. We just talked about it on Monday, so I didn't assume that you'd already done it. If so, great!

@edambo
Collaborator

edambo commented Jan 11, 2024

@mbcann01 Ah, yes. I think I did this already if you mean deleting the files.

@mbcann01
Member Author

@edambo What about the other tasks listed above? Did you happen to do either of them?

@edambo
Collaborator

edambo commented Jan 11, 2024

@mbcann01 Yes, I think they were done before the holidays. I changed the style as you requested, but it's possible I missed something. Let me know if this is the case.

@mbcann01
Member Author

@edambo , I started looking through the code. In data_01_aps_investigations_import.qmd line 26, the code to import the data looks like this:

aps_inv <- read_csv("../data/filemaker_pro_exports/aps_investigations_import.csv")

However, that produces an error on my computer because that is not the file path to the data. Of course, I can just change the file path in the code to match the file path on my computer, but I want it to run on your computer too. What is the path to this file on your computer?
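
One way to make the import independent of each person's working directory is to build the path with the `here` package, as the task list suggests. The following is only a minimal sketch: the folder and file names are copied from the line quoted above, and whether they match the actual repository layout is an assumption.

```r
# Build the path from the project root instead of the current working
# directory. The folder and file names come from the quoted import line;
# their location relative to the project root is an assumption.
aps_inv_path <- here::here(
  "data", "filemaker_pro_exports", "aps_investigations_import.csv"
)

# Only read the file if it exists, so the sketch runs on any machine.
if (file.exists(aps_inv_path)) {
  aps_inv <- readr::read_csv(aps_inv_path)
} else {
  message("File not found at: ", aps_inv_path)
}
```

Because `here::here()` resolves from the project root, the same line should work for anyone who opens the project, as long as the data folder sits in the same place relative to that root.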

@edambo
Collaborator

edambo commented Mar 1, 2024 via email

@edambo
Collaborator

edambo commented Mar 1, 2024 via email

@mbcann01
Member Author

mbcann01 commented Mar 15, 2024

2024-03-15

Left off at:

  • Review all of the data import and cleaning files.
  • Finished data_02.
  • Going through check_consenting_participants.qmd to see if that needs to be a separate file or if it can be combined with data_02. Left off on line 22 -- creating a more efficient way to read in the files.

Copy and paste for commits:

Brad's review of data_02_consent_import.qmd
Part of #33 
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/check_consenting_participants.qmd. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.

mbcann01 added a commit that referenced this issue Mar 15, 2024
Part of #33
- Use the `here` package to facilitate file import and export.
- Checked for overlap with `qaqc/data_01_aps_recode_factors.Rmd`. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.
mbcann01 added a commit that referenced this issue Mar 18, 2024
Part of #33
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/check_consenting_participants.qmd. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.
@mbcann01
Member Author

mbcann01 commented Mar 21, 2024

2024-03-21

Left off at:

  • Review all of the data import and cleaning files.
  • Finished data_02.
  • Going through check_consenting_participants.qmd to see if that needs to be a separate file or if it can be combined with data_02. Left off on line 115 -- trying to recreate Ebie's results.

Copy and paste for commits:

Brad's review of check_consenting_participants.qmd
Part of #33 
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/check_consenting_participants.qmd. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.
- Simplified some of the code.

@mbcann01
Member Author

mbcann01 commented Mar 22, 2024

2024-03-22

Left off at:

  • Review all of the data import and cleaning files.
  • Finished data_02.
  • Going through check_consenting_participants.qmd to see if that needs to be a separate file or if it can be combined with data_02. Left off on line 118 -- trying to recreate Ebie's results.

Copy and paste for commits:

Brad's review of check_consenting_participants.qmd
Part of #33 
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/check_consenting_participants.qmd. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.
- Simplified some of the code.

@mbcann01
Member Author

mbcann01 commented Apr 4, 2024

2024-04-04

Left off at:

  • Review all of the data import and cleaning files.
  • Finished data_02.
  • Going through check_consenting_participants.qmd to see if that needs to be a separate file or if it can be combined with data_02. Left off on line 118 -- trying to recreate Ebie's results.
  • During this process, I discovered that there is an NA value for MedStar ID in the self_report_import.rds data frame. I need to remove that row and then come back to check_consenting_participants.qmd
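
A minimal sketch of how the row with the missing ID could be inspected before deciding whether to drop it. The data frame and the column name `medstar_id` are made up for illustration and are not taken from self_report_import.rds.

```r
# Toy data; the column names and values are hypothetical.
self_rep <- data.frame(
  medstar_id = c("a1", NA, "c3"),
  response   = c("Yes", "No", "Yes")
)

# Look at the rows with a missing ID first.
missing_id <- dplyr::filter(self_rep, is.na(medstar_id))

# Drop them only after confirming they are not real interviews.
self_rep_clean <- dplyr::filter(self_rep, !is.na(medstar_id))
```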

Reviewing data_06_self_report_import.qmd

  • I left off on line 148. I started making a function to recode the character variables to numeric, then create factor versions, and then relocate the factor versions next to the numeric versions. This should remove a lot of repetition from the code, but it probably doesn't buy me a lot. PLEASE do the other bullets below BEFORE finishing the function.
  • Figure out if the row with an NA value for MedStar ID needs to be removed or not.
  • There is a lot of tidyselect code being used to select columns inside of across() for recoding factors. Let's use explicit column names instead so that the code is easier to reason about.
  • Change coding for all "Yes/No" columns from "1/2" to "1/0".
  • Spot check the factor code.
  • Finish the recode_factor_relocate function (totally optional).
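
The recode, factor, and relocate steps listed above could look roughly like the sketch below. It is not the code in the repository: the data and column names are placeholders, and it assumes the "Yes/No" columns are already numeric with 1 = Yes and 2 = No.

```r
library(dplyr)

# Toy data; column names are placeholders, not DETECT variables.
toy <- data.frame(
  id         = 1:4,
  felt_safe  = c(1, 2, 2, NA),
  needs_help = c(2, 2, 1, 1)
)

recoded <- toy |>
  mutate(
    # Name the columns explicitly rather than using a tidyselect helper.
    # Recode 1/2 (assumed Yes/No) to 1/0; anything else becomes NA.
    across(
      c(felt_safe, needs_help),
      ~ case_when(.x == 1 ~ 1, .x == 2 ~ 0, TRUE ~ NA_real_)
    ),
    # Create factor versions of the recoded columns.
    felt_safe_f  = factor(felt_safe,  levels = c(0, 1), labels = c("No", "Yes")),
    needs_help_f = factor(needs_help, levels = c(0, 1), labels = c("No", "Yes"))
  ) |>
  # Place each factor version directly after its numeric version.
  relocate(felt_safe_f, .after = felt_safe) |>
  relocate(needs_help_f, .after = needs_help)
```

Listing the columns by name inside `across()` is more typing than a tidyselect helper, but it makes it obvious which columns are touched.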

Copy and paste for commits:

Brad's review of data_06_self_report_import.qmd
Part of #33 
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/data_01_self_report_recode_factors.Rmd. There was nothing in the QAQC file that wasn't also in the data import file. Deleted the QAQC file.
- Added two carriage returns before level one headings.
- Dropped row with missing data.

@mbcann01
Member Author

mbcann01 commented Apr 5, 2024

2024-04-05

Reviewing data_06_self_report_import.qmd

  • Left off on line 163
  • I created some helper functions that recode, factor, and relocate. Use them to:
    • There is a lot of tidyselect code being used to select columns inside of across() for recoding factors. Let's use explicit column names instead so that the code is easier to reason about.
    • Change coding for all "Yes/No" columns from "1/2" to "1/0".
    • Check to make sure the "Don't know" values are being recoded correctly.
    • Spot check the factor code. data_01_self_report_recode_factor.Rmd seems more accurate so far.

Note: I got an error saying NA's were introduced by coercion. It had to do with the way "Don't know" was written. Just run unique(self_rep$neglect_go_help). Then, copy and paste. It won't look any different to your eye, but it should fix the problem.
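
The note does not say exactly how the stored "Don't know" differs from the typed one. One common cause is a typographic (curly) apostrophe in the exported data, and the sketch below assumes that case purely for illustration. The lookup values are made up.

```r
# Two strings that look alike on screen but are not equal.
typed  <- "Don't know"        # straight apostrophe (U+0027), as typed in code
stored <- "Don\u2019t know"   # assumed: curly apostrophe (U+2019) in the data

typed == stored

# Compare the code points of the fourth character in each string.
utf8ToInt(substr(typed, 4, 4))
utf8ToInt(substr(stored, 4, 4))

# A recode keyed on the typed string misses the stored value and yields NA.
# The numeric codes here are hypothetical.
lookup <- c("Yes" = 1, "No" = 0, "Don't know" = 7)
unname(lookup[stored])
```

Copying the value from the output of `unique()` works because the pasted string then contains the same character as the data.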

When you are done reviewing data_06_self_report_import.qmd, change coding for all "Yes/No" columns from "1/2" to "1/0" in data_01 and data_02.

Then, go back to reviewing check_consenting_participants.qmd.

mbcann01 added a commit that referenced this issue Apr 5, 2024
Going through check_consenting_participants.qmd to see if that needs to be a separate file or if it can be combined with data_02. Left off on line 118 -- trying to recreate Ebie's results.
mbcann01 added a commit that referenced this issue Apr 5, 2024
Part of #33
While reviewing this file, I discovered that there is an NA value for MedStar ID in the self_report_import.rds data frame. I need to remove that row and then come back to this file.
mbcann01 added a commit that referenced this issue Apr 5, 2024
Part of #33
- Use the `here` package to facilitate file import and export.
- Figure out if the row with an NA value for MedStar ID needs to be removed or not.
mbcann01 added a commit that referenced this issue Apr 5, 2024
#33
I created this code while reviewing `data_06_self_report_import.qmd`. At the time of writing, it doesn't seem like the payoff of changing all of the code is worth the effort. However, I want to save the code -- at least for now -- in case I change my mind or think I will find it useful in some other context. So, I moved it to a new file: exploratory/recoding_factoring_relocating.qmd
mbcann01 added a commit that referenced this issue Apr 6, 2024
#33
Using recoding_factoring_relocating.R to make the code easier to read.
@mbcann01
Member Author

mbcann01 commented Apr 9, 2024

2024-04-09

Reviewing data_06_self_report_import.qmd

  • Left off on line 510
  • Base this file off of qaqc/data_01_self_report_recode_factors.Rmd. That file recoded columns more accurately in some cases.
  • There is a weird encoding issue with the "Number of times" columns. Figure that out, then resume cleaning up the rest of the code for recoding, factoring, and relocating.

When you are done reviewing data_06_self_report_import.qmd, change coding for all "Yes/No" columns from "1/2" to "1/0" in data_01 and data_02.

Then, go back to reviewing check_consenting_participants.qmd.

  • Use the double colon method instead of loading the readr package.

@mbcann01
Member Author

mbcann01 commented Apr 10, 2024

2024-04-10, 2024-04-11

Reviewing data_06_self_report_import.qmd

  • Base this file off of qaqc/data_01_self_report_recode_factors.Rmd. That file recoded columns more accurately in some cases.
  • When you are done reviewing data_06_self_report_import.qmd, change coding for all "Yes/No" columns from "1/2" to "1/0" in data_01 and data_02.
  • Then, go back to reviewing check_consenting_participants.qmd.
    • Use the double colon method instead of loading the readr package.

Copy and paste for commits:

Brad's review of data_01_aps_investigations_import.qmd
Part of #33 
- Started using the functions in recoding_factoring_relocating.R and nums_to_na.R to clean and transform categorical variables.
- Changed coding for all "Yes/No" columns from "1/2" to "1/0".
- Spot check the factor code.
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/data_01_aps_recode_factors.Rmd. After a review, I concluded that we are safe to delete the QAQC file.

mbcann01 added a commit that referenced this issue Apr 10, 2024
#33
Use double colon method instead.
mbcann01 added a commit that referenced this issue Apr 10, 2024
mbcann01 added a commit that referenced this issue Apr 10, 2024
#33
Delete this file when Brad has finished reviewing the repo.
mbcann01 added a commit that referenced this issue Apr 10, 2024
mbcann01 added a commit that referenced this issue Apr 10, 2024
mbcann01 added a commit that referenced this issue Apr 10, 2024
mbcann01 added a commit that referenced this issue Apr 10, 2024
@mbcann01
Member Author

mbcann01 commented Apr 11, 2024

2024-04-11

  • Finished reviewing data_06_self_report_import.qmd, data_01_aps_investigations_import.qmd, and data_02_consent_import.qmd
  • Go back to reviewing check_consenting_participants.qmd
  • Trying to recreate Ebie's results.
  • Need to just finish checking for error messages.
  • Her code is returning two rows. My code is returning 4 rows.
  • Also, remove the need to import detect_fu_data_merged.rds

@mbcann01
Member Author

mbcann01 commented Apr 16, 2024

2024-04-16

  • Finished reviewing check_consenting_participants.qmd.
  • We found records that needed to be deleted from some of the data sets downloaded from FileMaker Pro. After removing those records, the code in check_consenting_participants.qmd will no longer return the same results. For example, the MedStar ID ending in "...ff587" should not have been included in aps_investigations_import.rds, so we went back to data_01_aps_investigations_import.qmd and removed it. Now, when we run the code below to look for MedStar IDs that appear in the APS Investigations data, but not the consent data, "...ff587" will no longer appear. Therefore, we are primarily keeping this file around as a record of what we did rather than something we need to continue doing. Having said that, there could be additional rows that need to be removed in the future if people get on FM Pro and start clicking things. That will sometimes cause FM Pro to automatically generate values (e.g., name) in the survey data.
  • Next, review data_03_clutter_scale_import.qmd
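
The check described above, looking for MedStar IDs that appear in the APS Investigations data but not in the consent data, can be sketched with a filtering join. The data frames, the column name `medstar_id`, and the ID values are all made up. This is not the code in check_consenting_participants.qmd.

```r
# Toy data; IDs and column names are hypothetical.
aps_inv <- data.frame(
  medstar_id = c("a1", "b2", "c3"),
  finding    = c("valid", "invalid", "valid")
)
consent <- data.frame(medstar_id = c("a1", "c3"))

# Rows in the APS data with no matching consent record.
no_consent <- dplyr::anti_join(aps_inv, consent, by = "medstar_id")

# APS data restricted to IDs that do have a consent record.
aps_inv_clean <- dplyr::semi_join(aps_inv, consent, by = "medstar_id")
```

Once the unmatched rows are removed upstream in the import file, `no_consent` comes back empty, which is why the check stops returning the same results.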

mbcann01 added a commit that referenced this issue Apr 16, 2024
Part of #33
- Removed the tidyselect code being used to select columns inside of across() for recoding factors. Now, we use explicit column names instead so that the code is easier to reason about.
- Started using the functions in recoding_factoring_relocating.R and nums_to_na.R to clean and transform categorical variables.
- Changed coding for all "Yes/No" columns from "1/2" to "1/0".
- Spot check the factor code.
- Finish the recode_factor_relocate function (totally optional).
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/data_01_self_report_recode_factors.Rmd. After a review, I concluded that we are safe to delete the QAQC file.
mbcann01 added a commit that referenced this issue Apr 16, 2024
Part of #33
- Started using the functions in recoding_factoring_relocating.R and nums_to_na.R to clean and transform categorical variables.
- Changed coding for all "Yes/No" columns from "1/2" to "1/0".
- Spot check the factor code.
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/data_01_aps_recode_factors.Rmd. After a review, I concluded that we are safe to delete the QAQC file.
mbcann01 added a commit that referenced this issue Apr 16, 2024
Part of #33
- Checked MedStar IDs for participants who did not give consent to participate.
- Removed records from the follow-up interview survey data sets for MedStar IDs that did not have a consent document on file.
mbcann01 added a commit that referenced this issue Apr 16, 2024
#33
Change the hyphen to an underscore in the file name.
@mbcann01
Member Author

mbcann01 commented Apr 26, 2024

2024-04-26

Copy and paste for commits:

Brad's review of data_03_clutter_scale_import.qmd
Part of #33 
- Started using the functions in recoding_factoring_relocating.R and nums_to_na.R to clean and transform categorical variables.
- Changed coding for all "Yes/No" columns from "1/2" to "1/0".
- Spot check the factor code.
- Use the `here` package to facilitate file import and export.
- Made headings more consistent.
- Checked for overlap with qaqc/data_01_clutter_recode_factors.Rmd. After a review, I concluded that we are safe to delete the QAQC file.

mbcann01 added a commit that referenced this issue Apr 26, 2024
Did this while working on #33
Move unit tests for nums_to_na.R and recoding_factoring_relocating.R to the tests folder.

This removes the problem of the test data hanging around in the global environment and is ultimately a more sustainable solution.
Projects
Status: In Progress
Development

No branches or pull requests

2 participants