-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow levels to be missing for fct_concept
#40
base: main
Are you sure you want to change the base?
Conversation
@mlondschien just had a quick look at this. Iiuc, what you're after is a concept class that allows for returning arbitrary strings. Sure, we could make the I guess the reason why we never felt a need for this is inter-source data harmonization: if you define a concept with categorical values, all data sources need to stick to the same "labels" otherwise you can't really do any "source-agnostic" down-steam work. For the concrete use-case ("caregiver"), is the set of levels very large? Or is there another reason for not specifying this ahead of time? Could this concept be implemented for other datasets as well? Not super sure whether this actually works, as we never tried out this specifically. The heavy reliance on S3 was in part to allow for extensibility, so you might be able to define your own concept class ( |
Yes, around ~10'000 levels.
I'm using this locally. It does work. Is there a downside to using levels here? |
@mlondschien I assume in that case you are not actually interested in the "values" themselves but more in a grouping this induces? If this is actually the case, why not simply return an "auto increment" ID (integer valued; in data.table you could use the If you were interested in contributing such a concept to ricu, we'd have to think about how to do this across datasets. We'd have to somehow "namespace" the IDs, as the MIMIC caregiver no. 1 almost surely is distinct from the HiRID caregiver no. 1. But step by step. |
Yes, I one-hot encode them later. However, I prefer leaving them as is, as this simplifies downstream analysis when talking to collaborators. IIUC factors are nothing other than
I currently simply treat |
Any chance we could merge this? This would allow me to work with the main branch of ricu and not my personal fork. |
The docs state
Currently, a set of factor levels needs to be specified. Sometimes this is not desirable. This PR allows not specifying
levels
infct_cncpt
.