Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process for converting merged objects to AnnData #613

Merged
merged 6 commits into from
Dec 14, 2023

Conversation

allyhawkins
Copy link
Member

Closes #604
⚠️ Stacked on #610 (Because this is slightly dependent on decisions made in #610, I'm going to wait to request review)

This PR adds a process to convert the merged SCE objects to an AnnData object. When converting to AnnData, I chose to use the same scripts that we use in the main workflow to convert the object and then move the counts. The reason for this is I think it's probably best to be consistent in how we format our objects. In particular, with the AnnData objects, we add the sample metadata as columns in the colData and do a little bit of renaming. I don't think there's any reason we shouldn't be doing that here.

This process doesn't differ much from the conversion process in the main workflow, except here, we know that the only time we will have an altExp is if the has_adt value is true. I use that value to determine if the feature file should be created.

Also, all of the objects are considered processed objects here, so I run move_counts_anndata.py for all HDF5 files. Again, I'm doing this so that how we name things is consistent with the other AnnData objects.

The last idea I had was maybe we also want to add the sample metadata to the colData of the merged objects. That way, things like diagnosis, age, etc., would be easier to work with instead of living in a separate data frame in the metadata. What do we think?

@allyhawkins allyhawkins marked this pull request as draft December 8, 2023 18:33
Base automatically changed from allyhawkins/altexp-merged-objects to development December 11, 2023 15:17
@allyhawkins allyhawkins marked this pull request as ready for review December 14, 2023 15:47
@allyhawkins
Copy link
Member Author

I've gone ahead and tested this and this is ready for review!

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The only thing I am thinking is I know I saw some comment go by somewhere (github? slack?) recently from @jashapiro about updating the extension we use for hdf5 files... Should we do this? @jashapiro please weigh in!

@allyhawkins
Copy link
Member Author

LGTM! The only thing I am thinking is I know I saw some comment go by somewhere (github? slack?) recently from @jashapiro about updating the extension we use for hdf5 files... Should we do this? @jashapiro please weigh in!

I think if we did that then we would need to change all the individual AnnData file extensions and then dev would also have to update their code. They can have .hdf5, .h5ad, or .h5, so I don't think it's worth the headache personally.

@jashapiro
Copy link
Member

It is in #616, which seems like the place to do it (but AlexsLemonade/scpcaTools#244 has to happen first, as we do checks in scpcaTools)

@allyhawkins allyhawkins merged commit 8a4ac59 into development Dec 14, 2023
3 checks passed
@allyhawkins allyhawkins deleted the allyhawkins/merged-sce-to-anndata branch December 14, 2023 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants