-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add process for converting merged objects to AnnData #613
Conversation
I've gone ahead and tested this and this is ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! The only thing I am thinking is I know I saw some comment go by somewhere (github? slack?) recently from @jashapiro about updating the extension we use for hdf5 files... Should we do this? @jashapiro please weigh in!
I think if we did that then we would need to change all the individual AnnData file extensions and then dev would also have to update their code. They can have |
It is in #616, which seems like the place to do it (but AlexsLemonade/scpcaTools#244 has to happen first, as we do checks in |
Closes #604
⚠️ Stacked on #610 (Because this is slightly dependent on decisions made in #610, I'm going to wait to request review)
This PR adds a process to convert the merged SCE objects to an AnnData object. When converting to AnnData, I chose to use the same scripts that we use in the main workflow to convert the object and then move the counts. The reason for this is I think it's probably best to be consistent in how we format our objects. In particular, with the AnnData objects, we add the sample metadata as columns in the
colData
and do a little bit of renaming. I don't think there's any reason we shouldn't be doing that here.This process doesn't differ much from the conversion process in the main workflow, except here, we know that the only time we will have an altExp is if the
has_adt
value istrue
. I use that value to determine if the feature file should be created.Also, all of the objects are considered processed objects here, so I run
move_counts_anndata.py
for all HDF5 files. Again, I'm doing this so that how we name things is consistent with the other AnnData objects.The last idea I had was maybe we also want to add the sample metadata to the
colData
of the merged objects. That way, things like diagnosis, age, etc., would be easier to work with instead of living in a separate data frame in themetadata
. What do we think?