
bug in read when omero metadata is present #60

Closed
LucaMarconato opened this issue Dec 6, 2022 · 5 comments · Fixed by #127
Comments

@LucaMarconato (Member) commented Dec 6, 2022

Converting the morphology_mip of the xenium dataset (see spatialdata-sandbox) leads to the following .zattrs.

{
  "multiscales" : [ {
    "metadata" : {
      "method" : "loci.common.image.SimpleImageScaler",
      "version" : "Bio-Formats 6.10.1"
    },
    "axes" : [ {
      "name" : "t",
      "type" : "time"
    }, {
      "name" : "c",
      "type" : "channel"
    }, {
      "name" : "z",
      "type" : "space"
    }, {
      "unit" : "micrometer",
      "name" : "y",
      "type" : "space"
    }, {
      "unit" : "micrometer",
      "name" : "x",
      "type" : "space"
    } ],
    "name" : "Image0",
    "datasets" : [ {
      "path" : "0",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 0.2125, 0.2125 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "1",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 0.425, 0.425 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "2",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 0.85, 0.85 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "3",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 1.7, 1.7 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "4",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 3.4, 3.4 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "5",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 6.8, 6.8 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "6",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 13.6, 13.6 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "7",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 27.2, 27.2 ],
        "type" : "scale"
      } ]
    }, {
      "path" : "8",
      "coordinateTransformations" : [ {
        "scale" : [ 1.0, 1.0, 1.0, 54.4, 54.4 ],
        "type" : "scale"
      } ]
    } ],
    "version" : "0.4"
  } ],
  "omero" : {
    "channels" : [ {
      "color" : "FFFFFF",
      "coefficient" : 1,
      "active" : true,
      "label" : "DAPI",
      "window" : {
        "min" : 0.0,
        "max" : 5658.0,
        "start" : 0.0,
        "end" : 5658.0
      },
      "family" : "linear",
      "inverted" : false
    } ],
    "rdefs" : {
      "defaultT" : 0,
      "model" : "greyscale",
      "defaultZ" : 0
    }
  }
}

The function _read_multiscale() from spatialdata/_io/read.py has the following values for node.metadata:

In[3]: node.metadata
Out[3]: 
{'axes': [{'name': 't', 'type': 'time'},
  {'name': 'c', 'type': 'channel'},
  {'name': 'z', 'type': 'space'},
  {'unit': 'micrometer', 'name': 'y', 'type': 'space'},
  {'unit': 'micrometer', 'name': 'x', 'type': 'space'}],
 'name': ['DAPI'],
 'coordinateTransformations': [[{'scale': [1.0, 1.0, 1.0, 0.2125, 0.2125],
    'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 0.425, 0.425], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 0.85, 0.85], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 1.7, 1.7], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 3.4, 3.4], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 6.8, 6.8], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 13.6, 13.6], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 27.2, 27.2], 'type': 'scale'}],
  [{'scale': [1.0, 1.0, 1.0, 54.4, 54.4], 'type': 'scale'}]],
 'visible': [True],
 'contrast_limits': [[0.0, 5658.0]],
 'colormap': [[[0, 0, 0], [1, 1, 1]]]}

Note how the name is ['DAPI'] instead of Image0. Having a list instead of a string triggers a bug downstream (see traceback):

/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/anndata/experimental/pytorch/_annloader.py:18: UserWarning: Сould not load pytorch.
  warnings.warn("Сould not load pytorch.")
Traceback (most recent call last):
  File "/Users/macbook/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/222.4459.20/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/macbook/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/222.4459.20/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/macbook/embl/projects/basel/spatialdata-io/src/spatialdata_io/readers/read_xenium.py", line 189, in <module>
    sdata = SpatialData.read("./data.zarr/")
  File "/Users/macbook/embl/projects/basel/spatialdata/spatialdata/_core/_spatialdata.py", line 242, in read
    sdata = read_zarr(file_path)
  File "/Users/macbook/embl/projects/basel/spatialdata/spatialdata/_io/read.py", line 82, in read_zarr
    images[k] = _read_multiscale(node, fmt)
  File "/Users/macbook/embl/projects/basel/spatialdata/spatialdata/_io/read.py", line 39, in _read_multiscale
    return MultiscaleSpatialImage.from_dict(multiscale_image)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/datatree/datatree.py", line 793, in from_dict
    new_node = cls(name=node_name, data=data)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/multiscale_spatial_image/multiscale_spatial_image.py", line 32, in __init__
    super().__init__(data=data, name=name, parent=parent, children=children)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/datatree/datatree.py", line 326, in __init__
    ds = _coerce_to_dataset(data)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/datatree/datatree.py", line 68, in _coerce_to_dataset
    ds = data.to_dataset()
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/xarray/core/dataarray.py", line 620, in to_dataset
    result = self._to_dataset_whole(name)
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/xarray/core/dataarray.py", line 568, in _to_dataset_whole
    if name in self.coords:
  File "/Users/macbook/miniconda3/envs/ome/lib/python3.10/site-packages/xarray/core/coordinates.py", line 74, in __contains__
    return key in self._names
TypeError: unhashable type: 'list'

We need to check more datasets (with more than one channel) to see where exactly the bug lies (could also be in multiscale-spatial-image). For the moment I am using a workaround to deal with this edge case.

Also, 'visible', 'contrast_limits' and 'colormap' are wrapped in lists, so this bug could probably also manifest when dealing with them.
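The workaround mentioned above can be sketched as a small normalization step applied to node.metadata before building the MultiscaleSpatialImage. The helper name and the unwrapping policy below are illustrative assumptions, not the actual spatialdata code:

```python
def normalize_multiscale_metadata(metadata: dict) -> dict:
    """Unwrap metadata values that come back as lists.

    'name' can arrive as the list of omero channel labels (e.g. ['DAPI'])
    instead of the multiscale name string; an unhashable list later crashes
    xarray's DataArray.to_dataset() with "TypeError: unhashable type: 'list'".
    Sketch of a workaround only, not the spatialdata implementation.
    """
    fixed = dict(metadata)
    name = fixed.get("name")
    if isinstance(name, list):
        # single-channel case seen in the xenium example above
        fixed["name"] = name[0] if len(name) == 1 else "-".join(map(str, name))
    return fixed


meta = {"name": ["DAPI"], "visible": [True], "contrast_limits": [[0.0, 5658.0]]}
print(normalize_multiscale_metadata(meta)["name"])  # DAPI
```

A string name keeps xarray's `name in self.coords` check (the line that raises in the traceback) working, since strings are hashable.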

@giovp (Member) commented Feb 20, 2023

hitting this right now, here are some comments and questions:

  • The reason for supporting omero metadata is mostly to save channel names from xarray and to allow non-numerical channel names. This is easy and doable.
  • If there is other info in the omero metadata, do we want to load it in memory at all? I would be against it at the moment because we don't really have any use for it (except for napari, which could make use of it; I opened an issue here: omero metadata info used in napari ome/napari-ome-zarr#79)

I'd be curious to hear more thoughts from @scverse/spatialdata and @melonora on this.

I think that first I'd just add support for labels <> xarray channel coordinates, but maybe there is something else that should be prioritised.

@melonora (Collaborator) commented Feb 20, 2023

I agree it is mostly important for viewing. In my opinion, the particular use cases for this in general are:

  1. When dealing with highly multiplexed images, not having to set all the viewing parameters every time.
  2. Reproducibility of annotation views

However, I feel these become more important once there is a proper multicanvas view in napari. For multicanvas view we want to support view configs. So perhaps for now it doesn't need to have a high priority, unless it is easy to implement, in which case I would say yes, it would be nice to have this in memory for the reasons mentioned above.

Just to be certain, are we here speaking of omero metadata according to the transitional omero metadata spec in ome-ngff?

Edit
The reason it is perhaps more important with a multicanvas view is that it would allow showing individual channels right next to one another with linked cameras. There the view does not depend on which channels you try to combine in a composite image. With a composite image, the viewing parameters can depend heavily on the channels shown at a particular moment.

@giovp (Member) commented Feb 20, 2023

The function _read_multiscale() from spatialdata/_io/read.py has the following values for node.metadata

I also just tried, and the error is different; now it throws an error here:

encoded_ngff_transformations = multiscales[0]["coordinateTransformations"]

because the xenium example doesn't have coordinateTransformations at the top level, only inside the scales. I believe this is because the spec is flexible about this. I think this raises a larger issue of how much compatibility we want between our readers and the ones found in the wild that adhere to v0.4. I think we want to be perfectly compatible, but I would wait for the coordinate transforms to be merged in v0.5 before really worrying about that.
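A reader that wants to tolerate both layouts could look in both places. The helper below is a hypothetical sketch, not the spatialdata reader; key names follow the NGFF v0.4 spec, where per-dataset coordinateTransformations are required but the top-level multiscale entry is optional:

```python
def get_scale_transformations(multiscale: dict) -> list:
    """Collect coordinateTransformations per scale level, tolerating files
    (like the bioformats2raw-written xenium example) that omit the optional
    top-level multiscale entry. Sketch only."""
    # optional top-level transformations, applied after the per-dataset ones
    top_level = multiscale.get("coordinateTransformations", [])
    return [
        d.get("coordinateTransformations", []) + top_level
        for d in multiscale.get("datasets", [])
    ]


ms = {
    "datasets": [
        {"path": "0", "coordinateTransformations": [{"scale": [1.0, 1.0, 1.0, 0.2125, 0.2125], "type": "scale"}]},
        {"path": "1", "coordinateTransformations": [{"scale": [1.0, 1.0, 1.0, 0.425, 0.425], "type": "scale"}]},
    ]
}
print(len(get_scale_transformations(ms)))  # 2
```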

@giovp (Member) commented Feb 20, 2023

I will close this issue and open two new ones: one regarding compatibility and one regarding omero metadata.

@LucaMarconato (Member, Author) commented Feb 21, 2023

A few more details on this.

  1. The .zattrs shown above was generated with a first draft of the xenium reader in spatialdata-io, which used bioformats2raw. Now that we parse lazily with dask-image into a MultiscaleSpatialImage object and use ome-zarr-py for writing, we don't produce .zattrs like that anymore.

  2. I agree on waiting for the v0.5 transformations to be more defined before polishing the IO for transformations (I have created an issue on that: Conversion between "new transforms" and "ngff transforms" #114). Meanwhile the code writes images to Zarr with ome-zarr-py, passing an empty transformation list like this (from write.py):

        write_multi_scale_ngff(
            pyramid=data,
            group=_get_group_for_writing_data(),
            fmt=fmt,
            axes=parsed_axes,
            coordinate_transformations=None,
            storage_options=storage_options,
            **metadata,
        )

which creates the transformations for the multiscale inside datasets and puts axes at the top level.

We then add the v0.5 transformations inside "coordinateTransformations" with the function overwrite_coordinate_transformations_raster().

This mix of v0.4 and v0.5 transformations exists because ome-zarr-py still requires the v0.4 transformations; this way we can still read the data (I think the IO would break if axes were not present at the top level) and the multiscale transformations. When ome-zarr-py is updated we can remove the axes from the top level and adjust the multiscale transformations to v0.5, which basically amounts to changing the names of the various scales from "0", "1", ... to "/0", "/1", ...
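The scale-path rename described above is mechanical; a sketch (the helper name is hypothetical, and the "/0", "/1" convention is taken from the comment above, not from a finalized spec):

```python
def rename_scale_paths_to_v05(datasets: list) -> list:
    # v0.5 (per the comment above) refers to the scale levels as "/0", "/1", ...
    # instead of the bare "0", "1", ... used in v0.4; idempotent sketch only.
    return [{**d, "path": "/" + d["path"].lstrip("/")} for d in datasets]


print(rename_scale_paths_to_v05([{"path": "0"}, {"path": "1"}]))
# [{'path': '/0'}, {'path': '/1'}]
```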

  3. I agree with Giovanni to start by parsing only the channel names from the omero metadata and storing them as xarray coordinates (and writing them back to the omero metadata instead of into the xarray object). We can always additionally store the full omero metadata in memory if @melonora needs it.
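Extracting just the channel labels from the transitional omero block is straightforward. A sketch against the .zattrs structure shown at the top of the issue (the helper name is hypothetical):

```python
def omero_channel_labels(zattrs: dict) -> list:
    # Pull only the per-channel "label" entries from the transitional omero
    # metadata, e.g. to use as xarray channel coordinates; fall back to the
    # channel index when a label is missing. Sketch only.
    channels = zattrs.get("omero", {}).get("channels", [])
    return [ch.get("label", str(i)) for i, ch in enumerate(channels)]


zattrs = {"omero": {"channels": [{"label": "DAPI", "color": "FFFFFF"}]}}
print(omero_channel_labels(zattrs))  # ['DAPI']
```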
