String columns for labels are not identified as discrete when dtype is object #6

Open
kayla-jackson opened this issue Oct 1, 2024 · 0 comments
Labels
bug Something isn't working

kayla-jackson (Collaborator) commented Oct 1, 2024

In the following example, using bulk_labels from the .obs attribute works fine, because the labels are correctly identified as categorical:

import scanpy as sc
from concordex.utils._labels import Labels

# Categorical labels
ad = sc.datasets.pbmc68k_reduced()
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)

...but if we update the column so that its dtype is object, the labels are incorrectly described as continuous:

# Object labels
ad.obs['bulk_labels'] = ad.obs['bulk_labels'].astype(object)
labels = Labels("bulk_labels")
labels.extract(ad)
print(labels.labeltype)

This will almost certainly be a problem if a pandas reader (e.g. pd.read_csv) is used to read in metadata from a file. I'm wondering whether I should do the conversion internally with a warning, or stop with an error. I'm guessing that continuous columns containing string representations of NULL/NaN will also be read in as object, so internal conversion would be the wrong thing to do in that case. We could implement some of the R logic here and make a proper "guess" of the column type, but I'd like to avoid checking each item of the column to tell a generic object column apart from a true string column.
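One possible shape for such a "guess", sketched with plain pandas only. The helper name `guess_label_type`, the return values "discrete"/"continuous", and the 0.5 threshold are all assumptions made for illustration; none of this is concordex's actual API or the R logic mentioned above. The idea is to avoid a Python-level loop over the items by attempting a single vectorized numeric parse of the column:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def guess_label_type(col: pd.Series, min_numeric_frac: float = 0.5) -> str:
    """Hypothetical helper: classify a column as 'discrete' or 'continuous'.

    `min_numeric_frac` is an assumed threshold, not a value from concordex.
    """
    # Categorical and boolean columns are treated as discrete.
    if isinstance(col.dtype, pd.CategoricalDtype) or is_bool_dtype(col):
        return "discrete"
    # Genuine numeric dtypes are treated as continuous.
    if is_numeric_dtype(col):
        return "continuous"

    # Object/string columns: try one vectorized numeric parse.
    non_null = col.dropna()
    if non_null.empty:
        return "discrete"
    parsed = pd.to_numeric(non_null, errors="coerce")
    # Fraction of non-missing entries that parse as numbers.
    numeric_frac = parsed.notna().mean()
    return "continuous" if numeric_frac >= min_numeric_frac else "discrete"


# String labels stored as object are reported as discrete.
print(guess_label_type(pd.Series(["B cell", "T cell", "B cell"], dtype=object)))
# Numbers mixed with a string placeholder are reported as continuous.
print(guess_label_type(pd.Series(["1.5", "missing", "2.0"], dtype=object)))
```

A caveat with this heuristic: integer cluster labels stored as strings (e.g. "0", "1", "2") would parse as numbers and be reported as continuous, so a warning or an explicit user override would probably still be needed.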

kayla-jackson added the bug label Oct 1, 2024