Identical values reported for "var.nnz" and "var.n_measured_obs" for different datasets retrieved via get_anndata() #1281

Open
khughitt opened this issue Sep 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@khughitt

Describe the bug

adata.var.nnz and adata.var.n_measured_obs report the same (global?) values regardless of the dataset queried.

I tested this with two different dataset ids, but presumably it applies to all queries and not just those pulling a single dataset.

To Reproduce

import cellxgene_census

d1 = "00ff600e-6e2e-4d76-846f-0eec4f0ae417"
d2 = "0c9a8cfb-6649-4d52-b418-6d8e56bd7afe"

with cellxgene_census.open_soma(census_version="2024-07-01") as census:
    ad1 = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        obs_value_filter=f"dataset_id == '{d1}'"
    )

    ad2 = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        obs_value_filter=f"dataset_id == '{d2}'"
    )

    # True
    (ad1.var.nnz == ad2.var.nnz).all()

    # True
    (ad1.var.n_measured_obs == ad2.var.n_measured_obs).all()

Expected behavior

Query/dataset-specific values should be returned.
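
As a possible interim workaround for nnz, the per-gene non-zero counts can be recomputed from the matrix that the query returns. This is a minimal sketch, not the library's own method: query_nnz is a hypothetical helper name, and the two small matrices are toy stand-ins for ad1.X and ad2.X, assuming X is a cells x genes SciPy sparse matrix. n_measured_obs cannot be derived from X alone, so it is not covered here.

```python
import numpy as np
from scipy import sparse


def query_nnz(X) -> np.ndarray:
    """Count non-zero entries per column (gene) of a cells x genes matrix."""
    # (X != 0) stays sparse; summing over axis 0 gives one count per gene.
    return np.asarray((X != 0).sum(axis=0)).ravel()


# Toy stand-ins for ad1.X and ad2.X: 3 cells x 4 genes each (made-up values).
X1 = sparse.csr_matrix(np.array([[1, 0, 0, 2], [0, 0, 3, 4], [5, 0, 0, 6]]))
X2 = sparse.csr_matrix(np.array([[0, 7, 0, 0], [0, 8, 0, 0], [9, 0, 0, 0]]))

nnz1 = query_nnz(X1)
nnz2 = query_nnz(X2)
print(nnz1)  # [2 0 1 3]
print(nnz2)  # [1 2 0 0]
```

With real query results, something like ad1.var["nnz_query"] = query_nnz(ad1.X) would store the recomputed counts next to the reported ones (nnz_query is an illustrative column name).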

Environment

Arch Linux (64-bit)

Package                              Version
------------------------------------ --------------
aiobotocore                          2.14.0
aiohappyeyeballs                     2.4.0
aiohttp                              3.10.5
aioitertools                         0.12.0
aiosignal                            1.3.1
amply                                0.1.6
anndata                              0.10.9
anyio                                4.4.0
appdirs                              1.4.4
argon2-cffi                          23.1.0
argon2-cffi-bindings                 21.2.0
argparse-dataclass                   2.0.0
array_api_compat                     1.8
arrow                                1.3.0
asttokens                            2.4.1
async-lru                            2.0.4
attmap                               0.13.2
attrs                                24.2.0
Babel                                2.14.0
beautifulsoup4                       4.12.3
bleach                               6.1.0
bokeh                                3.5.2
botocore                             1.35.7
Brotli                               1.1.0
cached-property                      1.5.2
cellxgene-census                     1.15.0
certifi                              2024.8.30
cffi                                 1.17.1
charset-normalizer                   3.3.2
click                                8.1.7
cloudpickle                          3.0.0
colorama                             0.4.6
colorcet                             3.1.0
comm                                 0.2.2
conda-inject                         1.3.2
ConfigArgParse                       1.7
connection-pool                      0.0.3
contourpy                            1.3.0
cycler                               0.12.1
cytoolz                              0.12.3
dask                                 2024.8.2
dask-expr                            1.1.13
datashader                           0.16.3
datrie                               0.8.2
debugpy                              1.8.5
decorator                            5.1.1
defusedxml                           0.7.1
distributed                          2024.8.2
docutils                             0.21.2
dpath                                2.2.0
eido                                 0.2.2
entrypoints                          0.4
exceptiongroup                       1.2.2
executing                            2.1.0
fastjsonschema                       2.20.0
fonttools                            4.53.1
fqdn                                 1.5.1
frozenlist                           1.4.1
fsspec                               2024.9.0
get-annotations                      0.1.2
gitdb                                4.0.11
GitPython                            3.1.43
h11                                  0.14.0
h2                                   4.1.0
h5py                                 3.11.0
hdf5plugin                           5.0.0
hpack                                4.0.0
httpcore                             1.0.5
httpx                                0.27.2
humanfriendly                        10.0
hyperframe                           6.0.1
idna                                 3.8
igraph                               0.11.6
imagecodecs                          2024.6.1
imageio                              2.35.1
immutables                           0.20
importlib_metadata                   8.4.0
importlib_resources                  6.4.4
iniconfig                            2.0.0
ipykernel                            6.29.5
ipython                              8.27.0
ipywidgets                           8.1.5
isoduration                          20.11.0
jedi                                 0.19.1
Jinja2                               3.1.4
jmespath                             1.0.1
joblib                               1.4.2
json5                                0.9.25
jsonpointer                          3.0.0
jsonschema                           4.23.0
jsonschema-specifications            2023.12.1
jupyter_client                       8.6.2
jupyter_core                         5.7.2
jupyter-events                       0.10.0
jupyter-lsp                          2.2.5
jupyter_server                       2.14.2
jupyter_server_terminals             0.5.3
jupyterlab                           4.2.5
jupyterlab_pygments                  0.3.0
jupyterlab_server                    2.27.3
jupyterlab_widgets                   3.0.13
kiwisolver                           1.4.7
lazy_loader                          0.4
legacy-api-wrap                      1.4
leidenalg                            0.10.2
llvmlite                             0.43.0
locket                               1.0.0
logmuse                              0.2.6
lz4                                  4.3.3
markdown-it-py                       3.0.0
MarkupSafe                           2.1.5
matplotlib                           3.9.2
matplotlib-inline                    0.1.7
mdurl                                0.1.2
mistune                              3.0.2
msgpack                              1.0.8
multidict                            6.0.5
multipledispatch                     0.6.0
munkres                              1.1.4
natsort                              8.4.0
nbclient                             0.10.0
nbconvert                            7.16.4
nbformat                             5.10.4
nest_asyncio                         1.6.0
networkx                             3.3
notebook_shim                        0.2.4
numba                                0.60.0
numpy                                1.26.4
overrides                            7.7.0
packaging                            24.1
pandas                               2.2.2
pandocfilters                        1.5.0
param                                2.1.1
parso                                0.8.4
partd                                1.4.2
patsy                                0.5.6
peppy                                0.40.5
pexpect                              4.9.0
pickleshare                          0.7.5
pillow                               10.4.0
pip                                  24.2
pkgutil_resolve_name                 1.3.10
plac                                 1.4.3
platformdirs                         4.2.2
pluggy                               1.5.0
prometheus_client                    0.20.0
prompt_toolkit                       3.0.47
psutil                               6.0.0
ptyprocess                           0.7.0
PuLP                                 2.8.0
pure_eval                            0.2.3
pyarrow                              17.0.0
pyarrow-hotfix                       0.6
pycparser                            2.22
pyct                                 0.5.0
Pygments                             2.18.0
pynndescent                          0.5.13
pyparsing                            3.1.4
PySocks                              1.7.1
pytest                               8.3.2
python-dateutil                      2.9.0
python-json-logger                   2.0.7
pytz                                 2024.1
PyWavelets                           1.7.0
PyYAML                               6.0.2
pyzmq                                26.2.0
referencing                          0.35.1
requests                             2.32.3
reretry                              0.11.8
rfc3339-validator                    0.1.4
rfc3986-validator                    0.1.1
rich                                 13.7.1
rpds-py                              0.20.0
s3fs                                 2024.9.0
scanpy                               1.10.2
scikit-image                         0.24.0
scikit-learn                         1.5.1
scikit-misc                          0.1.4
scipy                                1.14.1
seaborn                              0.13.2
Send2Trash                           1.8.3
session-info                         1.0.0
setuptools                           73.0.1
six                                  1.16.0
slack_sdk                            3.32.0
smart_open                           7.0.4
smmap                                5.0.0
snakemake                            8.20.1
snakemake-interface-common           1.17.3
snakemake-interface-executor-plugins 9.2.0
snakemake-interface-report-plugins   1.0.0
snakemake-interface-storage-plugins  3.3.0
sniffio                              1.3.1
somacore                             1.0.11
sortedcontainers                     2.4.0
soupsieve                            2.5
stack-data                           0.6.2
statsmodels                          0.14.2
stdlib-list                          0.10.0
tabulate                             0.9.0
tblib                                3.0.0
terminado                            0.18.1
texttable                            1.7.0
threadpoolctl                        3.5.0
throttler                            1.2.2
tifffile                             2024.8.30
tiledb                               0.29.1
tiledbsoma                           1.11.4
tinycss2                             1.3.0
tomli                                2.0.1
toolz                                0.12.1
toposort                             1.10
tornado                              6.4.1
tqdm                                 4.66.5
traitlets                            5.14.3
types-python-dateutil                2.9.0.20240906
typing_extensions                    4.12.2
typing-utils                         0.1.0
tzdata                               2024.1
ubiquerg                             0.8.0
umap-learn                           0.5.6
uri-template                         1.3.0
urllib3                              2.2.2
veracitools                          0.1.3
wcwidth                              0.2.13
webcolors                            24.8.0
webencodings                         0.5.1
websocket-client                     1.8.0
wheel                                0.44.0
widgetsnbextension                   4.0.13
wrapt                                1.16.0
xarray                               2024.7.0
xyzservices                          2024.9.0
yarl                                 1.10.0
yte                                  1.5.4
zict                                 3.0.0
zipp                                 3.20.1
zstandard                            0.23.0

Additional context

I checked the docs to make sure that this is not expected behavior. They also suggest that the values should be relative to the dataset(s) queried:

n_measured_obs — the “measured” cells for this gene, effectively the number of cells for which this gene was measured in their respective dataset.

source: https://chanzuckerberg.github.io/cellxgene-census/articles/2023/20231012-normalized_layer_precalc_stats.html
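
To make the distinction concrete, here is a toy sketch of how a global n_measured_obs differs from a query-specific one under my reading of the definition above. All dataset names, gene names, and counts are made up for illustration; this is not how the Census computes the field.

```python
# Hypothetical toy data: the genes each dataset measured, and the
# dataset that each cell belongs to.
measured_genes = {
    "dataset_a": {"GENE1", "GENE2"},
    "dataset_b": {"GENE2", "GENE3"},
}
cell_dataset = ["dataset_a", "dataset_a", "dataset_a", "dataset_b", "dataset_b"]
genes = ("GENE1", "GENE2", "GENE3")


def n_measured_obs(cells, gene):
    """Number of the given cells whose dataset measured `gene`."""
    return sum(gene in measured_genes[d] for d in cells)


# Global: every cell. Query-specific: only cells from dataset_a.
only_a = [d for d in cell_dataset if d == "dataset_a"]

global_counts = {g: n_measured_obs(cell_dataset, g) for g in genes}
query_counts = {g: n_measured_obs(only_a, g) for g in genes}
print(global_counts)  # {'GENE1': 3, 'GENE2': 5, 'GENE3': 2}
print(query_counts)   # {'GENE1': 3, 'GENE2': 3, 'GENE3': 0}
```

The behavior reported in this issue corresponds to getting the first dictionary for every query, where I would expect the second.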

--

Thanks for all of your work on this!

It's appreciated.

@khughitt added the bug ("Something isn't working") label on Sep 16, 2024
@ivirshup
Collaborator

Thanks for the bug report @khughitt!

We are tracking this, and it looks related to #1284. I think your interpretation is correct; I'm just checking in with the schema owners on internal channels to make sure.
