Merge MedStar data with APS data #27
2019-06-21: Left off at line 99
Left off on 470: Get the code to work, then move it around and make it pretty.
If it runs out of memory again after trying min.weight = 0.05, then I'm going to have to go back to fastLink and manual review. I just have to get this data merge done.
Left off trying to figure out the best way to reduce the data.
Left off at line 923. When there is more than one match, keep the closest in time only.
Need to make sure I'm using rules to filter matches in a very systematic way.
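A minimal sketch of the "keep the closest in time only" rule, written in Python for illustration rather than in the project's R code. The record IDs, dates, and the `keep_closest_in_time` helper are all hypothetical, and tie-breaking by first occurrence is an assumption, not a rule stated in this issue.

```python
from datetime import date

# Hypothetical candidate pairs: (medstar_id, aps_id, medstar_date, aps_date).
candidate_pairs = [
    ("m1", "a1", date(2019, 1, 10), date(2019, 1, 12)),
    ("m1", "a2", date(2019, 1, 10), date(2019, 3, 1)),
    ("m2", "a3", date(2019, 2, 5), date(2019, 2, 4)),
]


def keep_closest_in_time(pairs):
    """For each MedStar record, keep only the APS match closest in time."""
    best = {}
    for medstar_id, aps_id, medstar_date, aps_date in pairs:
        gap_days = abs((aps_date - medstar_date).days)
        # Assumption: on a tie, the pair seen first is kept, so the rule
        # always gives the same answer for the same input order.
        if medstar_id not in best or gap_days < best[medstar_id][0]:
            best[medstar_id] = (gap_days, aps_id)
    return {medstar_id: aps_id for medstar_id, (_, aps_id) in best.items()}


closest = keep_closest_in_time(candidate_pairs)
```

Writing each filter as a small function like this is one way to keep the filtering rules explicit and applied in a fixed order.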
- Part of #27
- Replaced spaces with underscores in address street name.
- Deleted data_medstar_epcr_02_variable_management.nb.html. It's unnecessary and just takes up extra space.
- Saved medstar_epcr_02_variable_management.rds as RDS instead of Feather. It doesn't seem like Feather ever really caught on.
Separated data_medstar_aps_merged_01_merge.Rmd into multiple files.
- Part of #27
- The first file is data_medstar_aps_merged_01_recordlinkage.Rmd.
- Also created data_medstar_aps_merged_02_refine_possible_matches.Rmd
Left off at data_medstar_aps_merged_01_recordlinkage.Rmd, line 64.
Goal: We want to measure the agreement between the results of DETECT screenings and the results of APS investigations.
Problem 1: Currently, the results of the DETECT screenings are in a dataset we received from MedStar Mobile Healthcare and the results of APS investigations are in a separate dataset we received from APS. We need to merge the two separate datasets into a single dataset that can be used for analysis.
Problem 2: There is no common identifier variable in both datasets that we can use to match records in the MedStar data with records in the APS data. Therefore, we will have to match based on name and date of birth, which we have in both datasets.
Problem 3: Although we have name and date of birth (dob) in both datasets, we can't match records across datasets in a deterministic way (i.e., IF first name = John in MedStar AND first name = John in APS THEN match, ELSE no match) because there are typos in the data. For example, "John" and "Jon" can clearly be the same person (i.e., same last name, dob, and address), but an exact comparison would treat them as two different people.
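To make the "John" vs. "Jon" problem concrete, here is a small sketch in Python (for illustration only; the project code is in R). It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, which is not the string comparator of any particular record linkage package. The `exact_match` and `fuzzy_match` helpers and the 0.8 threshold are assumptions made up for this example.

```python
from difflib import SequenceMatcher


def exact_match(name_a, name_b):
    """Deterministic rule: names match only if identical (ignoring case)."""
    return name_a.strip().lower() == name_b.strip().lower()


def fuzzy_match(name_a, name_b, threshold=0.8):
    """Tolerant rule: names match if their similarity ratio is high enough.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    a = name_a.strip().lower()
    b = name_b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


deterministic_result = exact_match("John", "Jon")   # typo breaks the match
tolerant_result = fuzzy_match("John", "Jon")        # typo is tolerated
different_names = fuzzy_match("John", "Jane")       # still rejected
```

A single-field threshold like this is only the intuition. Probabilistic linkage combines agreement across several fields (name, dob, address) into one match weight.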
Solution: Therefore, we will need to link records across the datasets probabilistically. R has at least two packages that are designed for probabilistic record linking:
Steps in the record linking process:
- [ ] Next step...
Old stuff....
I copied "data_medstar_aps_merged_01.Rmd" from the 5-week analysis project to the 1-year analysis project. Before moving on to trying to get fastLink to work or writing your own matching algorithm, see if you can get this file to work using the new RecordLinkage big data classes.
https://cran.r-project.org/web/packages/RecordLinkage/vignettes/BigData.pdf
After you finish matching, consider breaking this code up into 3 separate files: