Preserve unmapped values #163

Open
ALightNHS opened this issue Feb 3, 2023 · 6 comments

Comments

@ALightNHS

Is it possible to preserve the unmapped and missing/invalid source values when converting from source tables to the CDM tables?

ALightNHS reopened this Feb 3, 2023
@PhilAppleby
Collaborator

I would need to know more about what you mean by "preserve".

Rejected data does not generate CDM output, as that would be meaningless.

@ALightNHS
Author

ALightNHS commented Feb 6, 2023

Thank you for your response.
I would like to see unmapped (potentially error-prone) source values in the CDM output, as this would help to identify data quality issues and inconsistencies where "similar" fields are captured in different systems.

For example, if patient height is stored in two different datasets, it would be useful to map these data sources, and then identify inconsistent source values per patient in the CDM.
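To make the height example concrete, here is a minimal sketch of the kind of per-patient consistency check described above. It is not part of CaRROT and does not use its API; it assumes height values from two sources have already been collected as hypothetical `(person_id, source_dataset, height_cm)` tuples, and `find_inconsistent` and `tolerance_cm` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical records: (person_id, source_dataset, height_cm).
# Assumption: values from two datasets have already been gathered in one place.
records = [
    (1, "dataset_a", 172.0),
    (1, "dataset_b", 172.5),
    (2, "dataset_a", 160.0),
    (2, "dataset_b", 185.0),
    (3, "dataset_a", 150.0),
]

def find_inconsistent(records, tolerance_cm=2.0):
    """Return sorted person_ids whose heights across sources differ by more than tolerance_cm."""
    by_person = defaultdict(list)
    for person_id, _source, value in records:
        by_person[person_id].append(value)
    # Only people with more than one value can be inconsistent.
    return sorted(
        pid for pid, values in by_person.items()
        if len(values) > 1 and max(values) - min(values) > tolerance_cm
    )

print(find_inconsistent(records))  # [2]
```

The tolerance is an arbitrary illustrative threshold; small differences between systems (rounding, measurement date) would usually be expected and not flagged.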

Another side to this question is: how would we map source fields containing unstructured text data to the CDM? It wouldn't be feasible to express mappings for clinical notes with the same JSON-config mapping logic.

@PhilAppleby
Collaborator

Hello again, could you let me have more information on the particular use-case you have in mind?

The software was designed, in collaboration with data partners, to map from input values to output OMOP concepts. Placing an input value in the OMOP output, unless explicitly mapped as a "source_value", would be a violation of this principle.

Also, with reference to your height example: if a person's information is captured in two different datasets, this tool will not be aware of it, as it works on each dataset individually. We could not use information from one dataset for the other without explicit approval to do so, and therefore this tool has been designed to work on each dataset in complete isolation.

Additionally, a file, "summary.tsv", is produced; it contains no detailed data but reports the number of rejected inputs as percentages.

Finally, we do have manual methods for mapping clinical notes to OMOP concepts; you would need to contact our data team for guidance on that.

@ALightNHS
Author

Hi Phil,
Firstly, I would like to say thank you for your responses and patience. I am very excited about CaRROT and believe that it will have a significant impact.

With your permission, could we exchange emails to discuss this further?

Otherwise, I will try to explain further. I realise that my particular use-case for CaRROT (and the CDM in general) goes against their intended designs. I am trying to take advantage of the CDM's relational schema to integrate multiple data sources, identify inconsistencies, and then to diagnose and resolve these at source.

@PhilAppleby
Collaborator

Hi there, as the development of CaRROT-CDM is funded by health data research projects, we can discuss this further if you contact me at my University of Dundee email address: [email protected]. Could you also identify yourself so I know with whom I'm talking?

@ALightNHS
Author

Hi Phil, my name is Anthony Lighterness - I'm a data scientist at The Christie NHS FT. I'll send you an email, thank you for that!
