Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexError while running sEVtras calculator #30

Open
songnsnow opened this issue Nov 4, 2024 · 3 comments
Open

IndexError while running sEVtras calculator #30

songnsnow opened this issue Nov 4, 2024 · 3 comments

Comments

@songnsnow
Copy link

songnsnow commented Nov 4, 2024

Hi, I've been trying to run the second part (SEVtras.ESAI_calculator) of the tool, but I am getting the error below.

/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py:81: FutureWarning: Use anndata.concat instead of AnnData.concatenate, AnnData.concatenate is deprecated and will be removed in the future. See the tutorial for concat at: https://anndata.readthedocs.io/en/latest/concatenation.html
  adata_combined = adata_cell_raw.concatenate(adata_ev, batch_key = OBSev)
/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/scanpy/preprocessing/_pca.py:317: ImplicitModificationWarning: Setting element `.obsm['X_pca']` of view, initializing view as actual.
  adata.obsm[key_obsm] = X_pca
/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py:126: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  near_neighbor_dat.loc[i, 'times'] = tmp_times[0]
Traceback (most recent call last):
  File "/project/kleinman/seol.song/from_hydra/SEVtras/scripts/hgg/sevtras_calc.py", line 25, in <module>
    SEVtras.ESAI_calculator(
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/main.py", line 188, in ESAI_calculator
    celltype_e_number, adata_evS, adata_com = deconvolver(adata_ev, adata_cell, species, OBSsample, OBScelltype, OBSev, OBSMpca, cellN, Xraw, normalW)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/SEVtras/functional.py", line 126, in deconvolver
    near_neighbor_dat.loc[i, 'times'] = tmp_times[0]
                                        ~~~~~~~~~^^^
  File "/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/venv_3.11.5/lib/python3.11/site-packages/pandas/core/series.py", line 1118, in __getitem__
    return self._values[key]
           ~~~~~~~~~~~~^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

I looked into the deconvolver function, and I believe the error is caused by empty tmp_times Series. I think some of the rows (cells) in near_neighbor_dat are not tracing back to a celltype value, which might be causing the error.

Also, I noticed that the X_pca and X_umap values of adata_cell is not being utilized at least up until the deconvolver function, since adata_combined is being reprocessed with scanpy downstream analysis. However, I would like to use the X_pca and X_umap values that I imported from my analysis through Seurat.

Here is my code used for converting my SeuratObject to h5ad. I used the code provided by the SEVtras tutorial (there was a code for exporting the PCA and UMAP values from Seurat, there was no code to input the values into the scanpy h5ad object, so I wrote my own for importing PCA and UMAP values).

# Load 
import scanpy as sc
import anndata
from scipy import io
from scipy.sparse import coo_matrix, csr_matrix
import os
import pandas as pd

# Provided by SEVtras
save_path = '/home/kleinman/seol.song/projects/kleinman/seol.song/from_hydra/SEVtras/data/'
X = io.mmread(os.path.join(save_path, "hgg_processed_matrix.mtx"))
adata = anndata.AnnData(X=X.transpose().tocsr())
metadata = pd.read_csv(os.path.join(save_path, "hgg_processed_metadata.csv"))

with open(os.path.join(save_path, "hgg_processed_gene_names.csv"),'r') as f:
    gene_names = f.read().splitlines()

adata.obs = metadata
adata.obs.index = adata.obs["barcode"]
adata.var.index = gene_names

# Load PCA csv
pca_df = pd.read_csv(os.path.join(save_path, "hgg_processed_pca.csv"), index_col=0)

# Add PCA and UMAP values to the AnnData object
adata.obsm['X_pca'] = pca_df.values
adata.obsm['X_umap'] = metadata[['UMAP_1', 'UMAP_2']].values

# Remove the 'barcode' column from adata.obs
adata.obs = adata.obs.drop(columns=['barcode'])

In conclusion, these are my final questions:

  1. What exactly is causing the IndexError?
  2. Can I use PCA and UMAP values from my Seurat analysis for running the code (+ drawing the UMAP)? This is crucial to me since I would like to reduce batch effects, which would require integrative analysis and altering of PCA and UMAP.

Thanks for your help in advance!

@RuiqiaoHe
Copy link
Member

Could you please check the shape of adata_ev and adata_cell that you input for SEVtras.ESAI_calculator by following code?
print(adata_ev)
print(adata_cell)
There may be a mismatch problem between the two adata objects.

@songnsnow
Copy link
Author

Here is the output from print(adata_ev) and print(adata_cell).

>>> print(adata_ev)
AnnData object with n_obs × n_vars = 4383 × 24040
    obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'score', 'batch', 'sEV', 'score1'
>>> print(adata_cell)
AnnData object with n_obs × n_vars = 36076 × 31706
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'raw_batch_number', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'score', 'batch', 'sEV', 'score1', 'barcode_prefix', 'batch_number', 'cell.barcode', 'sample.id', 'sample.id.new', 'Paper_ID', 'Sample', 'percent.mito', 'percent.ribo', 'G1.S.score_Whitfield', 'G2.M.score_Whitfield', 'S.Score', 'G2M.Score', 'Phase', 'Paper_group', 'Paper_subgroup', 'Location', 'Age_group', 'Sex', 'Technology', 'Consensus_Jessa', 'Consensus_Pombo_Raw', 'CAWPE_Score_Pombo', 'Class', 'Celltype', 'celltype', 'Level2_Celltype', 'clst_itg', 'seurat_clusters', 'UMAP_1', 'UMAP_2'
    obsm: 'X_pca', 'X_umap'

@RuiqiaoHe
Copy link
Member

RuiqiaoHe commented Nov 5, 2024

At the moment I can't figure out what the bug is. Would it be convenient to share testing adata objects for me? I'll test it in my environment. The address is ruiqiaohe#gmail.com (change # to @).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants