Accessing Metadata from AnnData #50

Open
shahrozeabbas opened this issue Sep 27, 2022 · 6 comments

@shahrozeabbas

Hello,

I am attempting to export metadata after running inferCNV. I am able to export the CNV score and the Leiden clusters; however, I would like to access everything that is also available in the R version, such as the loss or gain of each chromosome for each cell. The R version outputs a large table (~200 columns) with data for each chromosome. Is it possible to access this somehow using the Python version?

@grst
Member

grst commented Sep 28, 2022

Hi,

the matrix with CNV scores is stored in

> adata.obsm["X_cnv"]
<184x9111 sparse matrix of type '<class 'numpy.float64'>'
	with 214913 stored elements in Compressed Sparse Row format>

where each row is a cell and each column a genomic region.

Additionally, information about which columns of this matrix belong to which chromosome is stored in

> adata.uns["cnv"]["chr_pos"]
{'chr1': 0,
 'chr2': 915,
 'chr3': 1574,
 'chr4': 2141,
 'chr5': 2454,
 'chr6': 2902,
 'chr7': 3394,
 'chr8': 3874,
 'chr9': 4195,
 'chr10': 4564,
 'chr11': 4955,
 'chr12': 5494,
 'chr13': 6009,
 'chr14': 6179,
 'chr15': 6499,
 'chr16': 6791,
 'chr17': 7209,
 'chr18': 7787,
 'chr19': 7928,
 'chr20': 8523,
 'chr21': 8781,
 'chr22': 8880}

i.e. in this example

adata.obsm["X_cnv"][:, 0:915]

contains the scores for chr1.
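Generalizing the chr1 example: each chromosome's columns run from its own offset in chr_pos up to the next chromosome's offset, and the last chromosome runs to the end of the matrix. A minimal sketch of that bookkeeping follows; the helper name `chromosome_slices` is hypothetical (not part of infercnvpy), and it sorts by offset rather than relying on dict order.

```python
import numpy as np


def chromosome_slices(chr_pos, n_bins):
    """Map each chromosome to the X_cnv column slice it occupies (hypothetical helper).

    chr_pos: chromosome -> first column index, as in adata.uns["cnv"]["chr_pos"].
    n_bins:  total number of columns, i.e. adata.obsm["X_cnv"].shape[1].
    Each chromosome ends where the next one (by offset) starts; the last ends at n_bins.
    """
    ordered = sorted(chr_pos.items(), key=lambda kv: kv[1])
    ends = [start for _, start in ordered[1:]] + [n_bins]
    return {chrom: slice(start, end) for (chrom, start), end in zip(ordered, ends)}


# Toy stand-in for adata.obsm["X_cnv"]: 3 cells x 10 bins (made-up numbers)
cnv = np.arange(30, dtype=float).reshape(3, 10)
slices = chromosome_slices({"chr1": 0, "chr2": 4, "chr3": 7}, cnv.shape[1])
chr2_block = cnv[:, slices["chr2"]]  # columns 4, 5, 6 for every cell
```

With the real object, `chromosome_slices(adata.uns["cnv"]["chr_pos"], adata.obsm["X_cnv"].shape[1])["chr22"]` would be `slice(8880, 9111)` for the example above, and the sparse matrix can be indexed with these slices directly.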

hope that helps,
Gregor

@shahrozeabbas
Author

Hello,

Yes, this is helpful, thanks! However, it looks like this info is a superset of the table described here. Is there a way to get the 'map_metadata_from_infercnv.txt' described in that link directly from the infercnv object? Or, alternatively, is there a way to calculate these data from what's available in adata.obsm["X_cnv"]?

Thank you for your help,
Shahroze

@grst
Member

grst commented Oct 6, 2022

Unfortunately, segmentation (e.g., using an HMM) is currently not implemented in infercnvpy (see also #1).
In principle, you can aggregate the CNV matrix over a region you are interested in. For example, indices 915:1200 refer (roughly) to the first half of chromosome 2. If you are interested in this region, you could do

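import numpy as np
cnv_mat = adata.obsm["X_cnv"]
# mean over bins 915:1200 (roughly the first half of chr2); flatten the (n_cells, 1) sparse-mean result
chr2_score = np.asarray(cnv_mat[:, 915:1200].mean(axis=1)).ravel()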

to get a score for each cell.
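The same aggregation can be repeated for every chromosome to get a cells x chromosomes table, which is the closest analogue to the per-chromosome columns asked about above. This is a minimal sketch, assuming whole-chromosome means are what you want; `per_chromosome_scores` is a hypothetical helper, and the values are averaged smoothed scores, not the HMM-based gain/loss calls of the R inferCNV table.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def per_chromosome_scores(cnv_mat, chr_pos, cell_names=None):
    """Mean CNV score per cell and chromosome (hypothetical helper).

    cnv_mat: cells x bins matrix such as adata.obsm["X_cnv"] (sparse or dense).
    chr_pos: chromosome -> first column index, as in adata.uns["cnv"]["chr_pos"].
    Returns a DataFrame with one row per cell and one column per chromosome.
    """
    ordered = sorted(chr_pos.items(), key=lambda kv: kv[1])
    ends = [start for _, start in ordered[1:]] + [cnv_mat.shape[1]]
    scores = {}
    for (chrom, start), end in zip(ordered, ends):
        # .mean(axis=1) on a sparse matrix returns an (n_cells, 1) matrix; flatten it
        scores[chrom] = np.asarray(cnv_mat[:, start:end].mean(axis=1)).ravel()
    return pd.DataFrame(scores, index=cell_names)


# Toy example: 2 cells x 4 bins, chr1 = bins 0-1, chr2 = bins 2-3 (made-up numbers)
X = sp.csr_matrix(np.array([[0.1, 0.3, -0.2, -0.4], [0.0, 0.0, 0.2, 0.2]]))
scores = per_chromosome_scores(X, {"chr1": 0, "chr2": 2}, cell_names=["cell_a", "cell_b"])
```

On real data, `per_chromosome_scores(adata.obsm["X_cnv"], adata.uns["cnv"]["chr_pos"], cell_names=adata.obs_names)` would give a table that can be joined onto `adata.obs` and exported. Averaging over a whole chromosome will dilute focal events, so narrower ranges (like 915:1200 above) may be more informative.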

@zhangpebbels

Hello,
I've run into some trouble. I want to find the CNV region of a deletion on chr8, i.e., which genes are deleted on chr8. But from 'X_cnv' I can only get the numbers, not the corresponding gene IDs, and the 'chromosome' metadata is not paired with 'X_cnv'. I ran infercnvpy excluding "X,Y,MT,nan", but the numbers are wrong: chr14 has 170 genes, but in ['chr_pos'] chr14 has 180.

>>adata.var['chromosome'].value_counts()
chr1      525
chr2      347
chr17     308
chr19     307
chr11     301
chr12     287
chr3      281
chr6      277
chr5      240
chr7      237
chr16     207
chr10     192
chr4      188
chr9      170
chr14     170
chr8      169
chrX      152
chr15     147
chr20     135
chr22     128
chr13      90
chr18      68
chr21      49
chrMT      18
chrnan      3

chr1 0
chr2 525
chr3 872
chr4 1153
chr5 1341
chr6 1581
chr7 1858
chr8 2095
chr9 2264
chr10 2434
chr11 2626
chr12 2927
chr13 3214
chr14 3313
chr15 3483
chr16 3630
chr17 3837
chr18 4145
chr19 4244
chr20 4551
chr21 4686
chr22 4785

So could you add the gene IDs to ['X_cnv'], or suggest another way to get the CNV region? Thanks so much, looking forward to your reply.
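To see exactly where genes and bins diverge, the number of X_cnv columns per chromosome can be computed as the difference between consecutive chr_pos offsets and compared with the gene counts from `adata.var["chromosome"].value_counts()`. A minimal sketch using a subset of the numbers posted above; `bins_per_chromosome` is a hypothetical helper, and the last chromosome is only counted when the total number of columns is passed in.

```python
def bins_per_chromosome(chr_pos, n_bins=None):
    """Number of X_cnv columns per chromosome, from consecutive chr_pos offsets.

    The last chromosome (by offset) is only included if n_bins (X_cnv.shape[1]) is given.
    """
    ordered = sorted(chr_pos.items(), key=lambda kv: kv[1])
    counts = {}
    for i, (chrom, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            counts[chrom] = ordered[i + 1][1] - start
        elif n_bins is not None:
            counts[chrom] = n_bins - start
    return counts


# Subset of the numbers posted above: gene counts from adata.var["chromosome"],
# offsets from adata.uns["cnv"]["chr_pos"]
genes = {"chr12": 287, "chr13": 90, "chr14": 170, "chr15": 147}
chr_pos = {"chr12": 2927, "chr13": 3214, "chr14": 3313, "chr15": 3483, "chr16": 3630}

bins = bins_per_chromosome(chr_pos)
for chrom, n_genes in genes.items():
    print(chrom, "genes:", n_genes, "bins:", bins[chrom])
```

On this subset, chr12, chr14 and chr15 have as many bins as genes, while chr13 has 99 bins for 90 genes.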

@grst
Member

grst commented Jan 9, 2023

Hi @zhangpebbels,

yes, the metadata in var does not correspond to the data in X_cnv, as one is based on genes and the other on bins that may consist of multiple genes.

@redst4r has been working on a feature to retrieve genes for each bin in #58. But there are still some tests failing and I'm not entirely sure what the status of that PR is.
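Until per-bin gene lists are available, one workaround is to go back to the gene-level coordinates: once you have picked a genomic region (e.g., a deleted stretch of chr8 seen on the heatmap), list the genes in `adata.var` that overlap it. This is a minimal sketch, assuming `adata.var` carries `chromosome`, `start` and `end` columns as infercnvpy expects; `genes_in_region` is a hypothetical helper, and it does not translate bin indices into coordinates, so the region boundaries have to come from you.

```python
import pandas as pd


def genes_in_region(var, chromosome, start, end):
    """Return var rows for genes overlapping [start, end] on `chromosome` (hypothetical helper).

    Assumes `var` has 'chromosome', 'start' and 'end' columns; result is sorted by start.
    """
    hits = var[
        (var["chromosome"] == chromosome)
        & (var["end"] >= start)  # gene ends after the region starts
        & (var["start"] <= end)  # gene starts before the region ends
    ]
    return hits.sort_values("start")


# Toy stand-in for adata.var (made-up genes and coordinates)
var = pd.DataFrame(
    {
        "chromosome": ["chr8", "chr8", "chr8", "chr14"],
        "start": [1_000, 50_000, 200_000, 1_000],
        "end": [5_000, 60_000, 250_000, 5_000],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
region_genes = list(genes_in_region(var, "chr8", 40_000, 210_000).index)
```

With the real object this would be `genes_in_region(adata.var, "chr8", region_start, region_end)`, where `region_start` and `region_end` are placeholders for the coordinates you are interested in.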

@jpark27

jpark27 commented Oct 28, 2023

Hi, @grst @redst4r

Thank you for sharing the great infercnvpy wrapper. I've been trying to annotate the matching genes on the heatmap plot (e.g., all or a subset of the matching gene symbols along the bottom), but turning on show_gene_labels=True only shows the relevant segment. Is there a workaround I could try? Perhaps @redst4r already found a solution but didn't get to update the repo? Any tips would be much appreciated :-)

best,
Jun
