Simulate errors out when celltype names are only numbers, requires text prefix to run correctly. #100

Open
nagendraKU opened this issue Jul 2, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@nagendraKU

I ran scaden simulate with the Leiden cluster numbers as celltype names. I got the following error message, and the data.h5ad file was not created.

INFO Datasets: ['testdata_all_bat'] bulk_simulator.py:84
INFO Simulating data from testdata_all_bat bulk_simulator.py:89
INFO Loading testdata_all_bat dataset ... bulk_simulator.py:141
INFO Merging unknown cell types: ['unknown'] bulk_simulator.py:107
INFO Subsampling testdata_all_bat ... bulk_simulator.py:110
/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
warnings.warn("Transforming to str index.", ImplicitModificationWarning)
... storing 'ds' as categorical
Traceback (most recent call last):
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 209, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 247, in write_dataframe
col_names = [check_key(c) for c in df.columns]
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 247, in <listcomp>
col_names = [check_key(c) for c in df.columns]
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 109, in check_key
raise TypeError(f"{key} of type {typ} is an invalid key. Should be str.")
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/ku_user/scadendl/bin/scaden", line 8, in <module>
sys.exit(main())
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/main.py", line 48, in main
cli()
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/main.py", line 215, in simulate
fmt=data_format,
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulate.py", line 22, in simulation
bulk_simulator.simulate()
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulation/bulk_simulator.py", line 90, in simulate
self.simulate_dataset(dataset)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/scaden/simulation/bulk_simulator.py", line 130, in simulate_dataset
ann_data.write(os.path.join(self.out_dir, dataset + ".h5ad"))
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py", line 1911, in write_h5ad
as_dense=as_dense,
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 111, in write_h5ad
write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
File "/usr/lib64/python3.6/functools.py", line 807, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/h5ad.py", line 130, in write_attribute_h5ad
_write_method(type(value))(f, key, value, *args, **kwargs)
File "/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_io/utils.py", line 216, in func_wrapper
) from e
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
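
The traceback shows the writer rejecting an integer column name (0) in obs. A likely cause, not confirmed against the scaden source, is that numeric-only labels are parsed as integers when the celltype file is read and later end up as column names. The sketch below reproduces that type inference with pandas alone; the "Celltype" column name and the in-memory file content are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical celltype file content; the "Celltype" column name is an assumption.
raw = "Celltype\n0\n1\n0\n"

# Default parsing infers an integer dtype for numeric-only labels.
inferred = pd.read_csv(io.StringIO(raw))
# Forcing dtype=str keeps the labels as text.
as_text = pd.read_csv(io.StringIO(raw), dtype=str)

inferred_labels = inferred["Celltype"].unique().tolist()
text_labels = as_text["Celltype"].unique().tolist()

# Using the inferred labels as column names yields non-string keys,
# which is the kind of key the error message complains about.
fractions = pd.DataFrame([[0.5, 0.5]], columns=inferred_labels)
non_str_columns = [c for c in fractions.columns if not isinstance(c, str)]

print(inferred_labels, text_labels, non_str_columns)
```
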

I then prepended "celltype_" to the Leiden cluster numbers (e.g. celltype_13) in the celltype file, and simulate then ran correctly, generating the data.h5ad file. I still get the following warning message, though.

/home/ku_user/scadendl/lib64/python3.6/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
warnings.warn("Transforming to str index.", ImplicitModificationWarning)
... storing 'ds' as categorical
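
The renaming workaround can be scripted rather than done by hand. This is a minimal sketch using only the standard library; it assumes a tab-separated celltype file with a "Celltype" column, and prefix_celltypes is a hypothetical helper, not part of scaden.

```python
import csv
import io


def prefix_celltypes(src, dst, column="Celltype", prefix="celltype_"):
    """Copy a tab-separated table from src to dst, prefixing every label in `column`."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # csv reads every field as text, so plain concatenation is safe here.
        row[column] = prefix + row[column]
        writer.writerow(row)


# In-memory demo; with real files, pass open file handles instead.
src = io.StringIO("Celltype\n0\n13\n")
dst = io.StringIO()
prefix_celltypes(src, dst)
result = dst.getvalue()
print(result)
```

With files on disk, open the input and output with newline="" as the csv module documentation recommends.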

@nagendraKU changed the title from "Simulate errors out when celltype names are only numbers, requires text suffix to run correctly." to "Simulate errors out when celltype names are only numbers, requires text prefix to run correctly." on Jul 2, 2021
@KevinMenden
Owner

Hi @nagendraKU,

Thanks for reporting that. Yes, using only numbers as celltype names can cause problems. I will try to catch that and issue a better warning.
You can ignore the ImplicitModificationWarning, though; that shouldn't cause a problem.
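
One way such a check could look is sketched below. It is an illustration only, not the check that scaden implements; check_celltype_names and its message are hypothetical.

```python
def check_celltype_names(names):
    """Return the names as a list, or raise a descriptive error for non-string names."""
    names = list(names)
    bad = [n for n in names if not isinstance(n, str)]
    if bad:
        raise ValueError(
            f"Celltype names must be strings, but found: {bad!r}. "
            "Consider adding a text prefix such as 'celltype_'."
        )
    return names


# String names pass through unchanged.
ok = check_celltype_names(["celltype_0", "celltype_13"])

# Numeric names fail early with a readable message instead of a write-time TypeError.
try:
    check_celltype_names([0, 13])
    message = None
except ValueError as err:
    message = str(err)
```

Failing before the simulation starts would avoid running the subsampling step only to crash when the result is written.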

@KevinMenden added the enhancement label on Jul 2, 2021