total coordinates #199

Open
d-v-b opened this issue Jun 21, 2023 · 14 comments

@d-v-b
Contributor

d-v-b commented Jun 21, 2023

problem statement

Although we store images as N-dimensional arrays, it's important to remember that images are made of individual measurements / samples, and that in many imaging modalities each sample has its own coordinate metadata. I'm going to use the term "total coordinates" to describe coordinate metadata for each sample of an image; this stands in contrast to coordinate metadata that assigns a single coordinate value to an entire image. OME-NGFF does not support total coordinates; without them, we are under-describing our images, which impairs our ability to properly analyze, share, and reproduce bioimaging datasets.

examples

I will illustrate this with a few examples:

  • Point-scanning microscopes acquire individual samples at different times, and at different axial positions (unless the field curvature is null). Properly mapping samples into physical coordinates requires encoding the acquisition time, and the axial position, of each individual sample.
  • sCMOS cameras often run in rolling-shutter mode, in which case rows of samples are acquired sequentially to form a single image. As with the above example, mapping samples into physical coordinates requires encoding the acquisition time of each individual sample.
  • Fast volumetric microscopy often requires a continuously moving objective lens, which means that each 2D plane is acquired across a varying axial interval (i.e., the first row and the last row of pixels are at different axial positions in any given image). Similar to the above examples, we need to encode the z position of each individual sample in order to relate our data to physical coordinates.

Note that these are not mutually exclusive scenarios -- a point-scanning system with field curvature may use a fast z-scanner for rapid 3D imaging. If speed demands it, we can make the point-scanning bidirectional, and the z-scanning as well, which means that temporally adjacent samples / planes are no longer spatially adjacent. Encoding "total coordinates" for this image would require storing three spatial coordinates and one temporal coordinate for each individual sample.

matrix transformations don't work

I note that matrix transformations completely fail here, because

  • matrix transformations would require all data to have bloated dimensions (a 2D image has to be promoted to 4D for a time + z axis embedding)
  • matrix transformations can't handle irregular temporal or spatial sampling, which are very real properties of image acquisition
  • Matrix transformations are basically functions, but for things like spatially varying field curvature or spatially varying illumination intensity, there is no guarantee that a functional form of the coordinate can be found. In these cases we will be stuck storing the coordinates as data.

Unsatisfying alternatives include raster-scanning-specific metadata, field-curvature-specific metadata, etc., but this seems like a pretty ugly way to solve a general problem, and it's not easily extensible.

total coordinates in ome-ngff

Is there demand for storing these kinds of coordinates in ome-ngff? If not, we can close this issue, with the caveat that we are under-representing our imaging data.

It will surprise no one that my proposed solution is to use the CF dimensions + coordinates model.

This has come up before:

@bogovicj
Contributor

bogovicj commented Jun 21, 2023

Depending on how general what you have in mind is, I think much of this can be covered by the coordinates type of transformation, which I expect will be in v0.5 (and is close to the CF coordinates model).

It allows one to explicitly (with an array) specify the coordinates at every sample of some other array
(or some general coordinate system, but let's ignore that for now).

A pretty general example

Say we have a 2D array (size N x M) and want to encode 4 coordinate measurements at every sample:

  • two spatial positions (x,y)
  • the exact time that the measurement was taken for that array sample (t)
  • the pressure in the room at that time (p), for some reason

We'd store an array of size 4 x N x M where that first dimension holds the [p,t,y,x] coordinates.
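For concreteness, here is a minimal numpy sketch of that layout; the shapes, values, and variable names are purely illustrative and not part of any spec:

import numpy as np

N, M = 4, 5  # illustrative image shape

# per-sample coordinate planes, each shaped like the image (N x M)
y, x = np.meshgrid(np.arange(N) * 0.5, np.arange(M) * 0.5, indexing="ij")  # positions
t = np.arange(N * M, dtype=float).reshape(N, M) * 0.1                      # acquisition times
p = np.full((N, M), 101.3)                                                 # room pressure

# a single 4 x N x M array whose first dimension holds the [p, t, y, x] coordinates
coords = np.stack([p, t, y, x])
assert coords.shape == (4, N, M)

# all four coordinate values for the sample at index (i, j)
print(coords[:, 2, 3])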

Limitations

All coordinates need to be of the same type (usually floating point).

I've spec'd that all coordinates need to be stored in the same array, and I believe I recommend
that the coordinate dimension be the fastest varying, since most use cases that I'm aware of
benefit from that access pattern.

For example, we could not assign a string-valued coordinate to one dimension, which might be nice if we had a channel dimension, say.

One way we might address that limitation

(if we need to)

A nice way to generalize my spec could be to have the array that stores coordinates be a vector type (we've talked about this before). So for the above example, the coordinate array might instead be N x M and store 4-vectors. If we allow those vectors to be of mixed type, then it would be fine to store something like
[0.5, 1.1, "DAPI", 2229] for a coordinate, and that could be nice.
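If it helps make the idea concrete, here is one way to sketch such mixed-type coordinate vectors using a numpy structured dtype; this is purely illustrative and not something the current spec defines:

import numpy as np

N, M = 2, 3  # illustrative image shape

# a per-sample "vector type" holding two floats, a string, and an integer
coord_dtype = np.dtype([
    ("x", "f8"),
    ("y", "f8"),
    ("channel", "U16"),
    ("frame", "i8"),
])

# an N x M coordinate array whose elements are mixed-type 4-vectors
coords = np.zeros((N, M), dtype=coord_dtype)
coords[0, 1] = (0.5, 1.1, "DAPI", 2229)

print(coords[0, 1])             # (0.5, 1.1, 'DAPI', 2229)
print(coords["channel"][0, 1])  # 'DAPI'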

If there are use cases for that, I'd be happy to talk through something like this after v0.5 is done.

@d-v-b
Contributor Author

d-v-b commented Jun 21, 2023

Thanks John, that's super helpful! I missed / forgot about that part of your PR. I will continue the conversation about that particular proposal over in #138

I'll keep this issue open for general discussion of the idea.

@clbarnes

clbarnes commented Jul 5, 2023

Are total coordinates even compatible with chunked storage? Presumably you have to hope that the coordinates are monotonically increasing in each dimension, but even then, you'd potentially have to go fishing to find the chunk any given coordinate is in, and probably store some information about average coordinate spacing to even start looking in the right place.

Also, is this very different to #178 , in the limit?

@d-v-b
Contributor Author

d-v-b commented Jul 8, 2023

Are total coordinates even compatible with chunked storage? Presumably you have to hope that the coordinates are monotonically increasing in each dimension, but even then, you'd potentially have to go fishing to find the chunk any given coordinate is in, and probably store some information about average coordinate spacing to even start looking in the right place.

I don't see any intrinsic conflict between total coordinates and chunked storage. Xarray basically implements what I'm describing, and they have no issue using dask arrays or zarr arrays to store coordinate variables.
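To make that concrete, here is a minimal sketch (assuming xarray, dask, and zarr are installed; names and chunk sizes are illustrative) in which a 2D per-sample coordinate variable is written to chunked zarr storage alongside the data:

import numpy as np
import xarray as xr

shape = (64, 64)
data = np.random.randint(0, 255, shape)
# a 2D per-sample coordinate variable (e.g. acquisition time)
t = np.arange(np.prod(shape), dtype=float).reshape(shape) * 0.1

image = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"t": (("y", "x"), t)},
    name="image",
)

# the data and the coordinate variable are both stored as chunked zarr arrays
image.to_dataset().chunk({"y": 32, "x": 32}).to_zarr("chunked_example.zarr", mode="w")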

Also, is this very different to #178 , in the limit?

Yes, it is very different. I'm proposing associating each sample of an image with additional data. So for a given image, you can't have more elements in a coordinate variable than samples in your image. But the points described in #178 are not associated with individual samples of an image, they are vectors in continuous space, and you could (in principle) have many more points in an image than you have discrete samples.

@clbarnes

clbarnes commented Jul 8, 2023

Don't xarray's coordinates still describe a grid (there may be another feature I'm not aware of)? Even if the spacing in either axis isn't regular, it's still made up of squares. If you had total coordinates (which I'm interpreting as "points aren't necessarily on a grid, although possibly not far off"), how would you decide which chunk to look for a particular point in, when indexing with world coordinates? Around the chunk edges you'd get some bleeding across, where a point voxel-ly in one chunk is world-ly in the next chunk over. Near the corners of a 2D chunk you might have to look in 4 chunks, 8 chunks for 3D, etc.. That feels more like a spatial lookup problem, like that in the other issue.

@d-v-b
Contributor Author

d-v-b commented Jul 8, 2023

Maybe it's easier to express with an example, which simulates a small 2D image from a raster-scanning imaging system (e.g., a confocal microscope, or an SEM):

import numpy as np
from xarray import DataArray

shape = (9,9)
sampling_interval = .1

# cool data that came from the instrument as 2D image
data = np.random.randint(0, 255, shape)

# an array of values simulating spherical distortion in z
z = np.hypot(*np.meshgrid(np.arange(-4,5), 
                               np.arange(-4,5)))

# an array of timepoints that simulates bidirectional scan timing
t = (np.ones(np.prod(shape)) * sampling_interval).cumsum().reshape(shape)
for y in range(shape[0]):
    if y % 2 == 1:
        t[y] = t[y][::-1]

xdata = DataArray(data, 
          dims=('x','y'),
          coords={'x': ('x', np.arange(shape[0]) * .2, {'units': 'nm'}),
                  'y': ('y', np.arange(shape[1]) * .1, {'units': 'nm'}),
                  'z': (('y', 'x'), z, {'units': 'nm'}),
                  't': (('y', 'x'), t, {'units': 'ms'})},
         attrs = {'foo': 'bar'})

The data itself is 2D, because that's what comes out of the instrument, which is why we have 2 dims. But each point of the data came from an event in real life (i.e., in 4D space). To describe that process more completely we need some way to associate z values and t values with each one of our samples. We accomplish this by defining an array of z values, and an array of t values, and adding them in as multidimensional coordinates.

Note that this is not a contrived example -- any sufficiently fast raster-scanning scope is described by this data structure; we just don't typically represent the data from these scopes very precisely.

As far as I can tell, the usefulness of this representation is orthogonal to chunking, or using a coordinate as an index (which of course you could do here, e.g. by using regular numpy operations on the coordinate arrays).
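As an illustration of that last point, here are a couple of coordinate-based lookups on the xdata object defined above (continuing the snippet, using ordinary xarray/numpy operations):

# keep only samples acquired during the first millisecond of the scan
early = xdata.where(xdata.t < 1.0)

# find the sample with the largest axial offset and read out all of its coordinates
iy, ix = np.unravel_index(int(xdata.z.argmax()), xdata.z.shape)
sample = xdata.isel(y=iy, x=ix)
print(float(sample), {name: coord.values for name, coord in sample.coords.items()})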

@d-v-b
Contributor Author

d-v-b commented Jul 8, 2023

Maybe coining the term "total coordinates" was confusing, and I should have said "CF conventions / xarray data model", but I worry that this turns people off of the idea 😅

@bogovicj
Contributor

bogovicj commented Jul 10, 2023

@clbarnes

Don't xarray's coordinates still describe a grid

This is my understanding also, and would like to know if I'm missing something.

@d-v-b

I should have said "CF conventions / xarray data model"

My understanding of this proposal is that it is more general than xarray's model (though iirc CF has something more general). Where xarray is not as general as the coordinates transform I mention above is in the way @clbarnes described - it describes an axis-aligned grid. Every discrete slice of a multi-dim array must have the same coordinate value. In your example, every sample in data[0,:] would have x=0 right?

xarray's coordinate model is supported in my PR using a combination of coordinates and by_dimension.

I'm sure it's possible to flatten the data and have a coordinate for every sample, but I'm not sure this is a typical or intended use.

@d-v-b
Contributor Author

d-v-b commented Jul 10, 2023

@clbarnes

Don't xarray's coordinates still describe a grid

I'm not sure if I understand this question. Maybe the term "grid" is overloaded here, because the image data itself is a grid by definition, and so any array with shape isomorphic to the image data can be said to "describe a grid". I chose the word "isomorphic" here to include the case of 1D coordinate variables like x and y in my example that are effectively a compressed representation of an ND array.

But even if the image data is stored as a grid, that doesn't mean it was actually acquired from a perfect grid. In fact, I think image data is rarely, if ever, acquired from a truly perfect grid in 4D. There is no constraint in xarray requiring that the values of a coordinate variable represent a perfect grid.

@bogovicj

Every discrete slice of a multi-dim array must have the same coordinate value. In your example, every sample in data[0,:] would have x=0 right?

Only because x was defined as a 1D coordinate variable. 2D coordinate variables like t and z do not have this property. Both t and z have a coordinate value for each sample in the data. Note that this is not a contrived scenario -- field curvature and sample acquisition times are real facts about fast raster scanning microscopy.
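Concretely, slicing the earlier xdata example shows the difference (this is just standard xarray behaviour):

row = xdata.isel(x=0)
print(row.x.values)  # a single value: x is a 1D coordinate along the 'x' dimension
print(row.z.values)  # nine values: z varies within the slice
print(row.t.values)  # likewise for t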

xarray's coordinate model is supported in my PR using a combination of coordinates and by_dimension.

Can you show an example of how you would express the t and z coordinate variables from my example with coordinates and by_dimension?

@clbarnes

The angle I'm coming from is this:

  • image samples are stored in an axis-aligned grid which is implicitly isotropic
  • voxel resolution metadata can make those cubes anisotropic, but still of consistent size
  • coordinate arrays (like xarray) can make the cubes inconsistently sized, but still cubes
  • non-orthogonal coordinate arrays could feasibly make the cubes into hyper-trapezoids
    • affine transformations too, I guess? With some added fun twists

In all of these cases, you can go from a world coordinate to a voxel coordinate and back easily; if you have a world coordinate you don't have to go fishing to find the voxel coordinate and therefore the chunk coordinate (except in the coordinate-array case, but that's low-dimensional and so not much of a problem).

Point clouds are stored in a tree of whatever kind; looking up the data for given world coordinate is considerably harder, and storage is much less efficient.

If I'm understanding you correctly, you're looking for something which sits in between those two extremes: something which is stored like an array, but can be loaded more like a point cloud (where the intensity of a voxel can be looked up by the somewhat-arbitrary coordinates associated with it). This feels like a worst of both worlds situation - you may be able to guess a region of array that any given point might lie in, given its world coordinates, but you don't get the spatial tree's lookup efficiency, or the array's predictability of where that point might be. If a voxel is on the corner of one chunk, the coordinates associated with it might mean that its world coordinates put it closer to the samples in the next chunk(s) over. So which chunk do you load in order to find the intensity, or do you just fish around until you find it?

It totally makes sense to store this data in order to feed into a registration step (which John is much more qualified to speak on, of course!) so that you can then get it into a more access-efficient format (e.g. an axis-aligned grid). But I'm struggling to see an efficient way to use it for lookups more generally.

@d-v-b
Contributor Author

d-v-b commented Jul 11, 2023

If I'm understanding you correctly, you're looking for something which sits in between those two extremes: something which is stored like an array, but can be loaded more like a point cloud (where the intensity of a voxel can be looked up by the somewhat-arbitrary coordinates associated with it). This feels like a worst of both worlds situation - you may be able to guess a region of array that any given point might lie in, given its world coordinates, but you don't get the spatial tree's lookup efficiency, or the array's predictability of where that point might be. If a voxel is on the corner of one chunk, the coordinates associated with it might mean that its world coordinates put it closer to the samples in the next chunk(s) over. So which chunk do you load in order to find the intensity, or do you just fish around until you find it?

Look at the example I gave here. That's what I'm looking for. There's nothing in that example that resembles a point cloud. There's a 2D image, with dimensions named y and x; dimension y is associated with a 1D array named y, and dimension x is associated with a 1D array named x. Both of these arrays contain regularly spaced values, but they didn't have to. Alone, these two coordinate variables are sufficient to associate each element of the 2D image with a pair of values that define a position in a 2D space; but as per the description of the example (raster scanning imaging data from real life), we need to specify the z and t values for each sample. This is accomplished via z and t arrays, which are both 2D, because the z and t values vary along the two dimensions of the data. This is a compact description of a single image produced by a raster-scanning microscope. There's nothing to do with point clouds or chunking in this example, so I'm not sure where those come in.

If I'm understanding you correctly, you're looking for something which sits in between those two extremes: something which is stored like an array, but can be loaded more like a point cloud (where the intensity of a voxel can be looked up by the somewhat-arbitrary coordinates associated with it).

I don't think this portrayal is correct, and furthermore (going back to the xarray code example), the z and t coordinates of individual samples from a microscopy image are anything but arbitrary!

I ask you to consider the following question: given that raster-scanning images are 2D, but each sample of the image is acquired sequentially (i.e., at different times), and at different axial positions (because of field curvature), how would you represent the image data + x + y + z + t coordinates as a data structure? That's the central problem here. At this level of detail, it has nothing to do with point clouds or chunks or even storage.

@bogovicj
Contributor

Can you show an example of how you would express the t and z coordinate variables from my example with coordinates and by_dimension?

Happy to. We'll need four zarr arrays (a sketch of creating them follows the list):

  • xCoords containing the data np.arange(shape[0]) * .2
  • yCoords containing the data np.arange(shape[1]) * .1
  • zCoords containing the data in your z variable
  • tCoords containing the data in your t variable
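For concreteness, a minimal sketch of creating those four arrays with zarr-python 2.x; the store name is illustrative, and the coordinate values are recomputed from the xarray example above:

import numpy as np
import zarr

shape = (9, 9)  # matches the xarray example above

# recompute the coordinate values from the xarray example
x = np.arange(shape[0]) * 0.2
y = np.arange(shape[1]) * 0.1
z = np.hypot(*np.meshgrid(np.arange(-4, 5), np.arange(-4, 5)))
t = (np.ones(np.prod(shape)) * 0.1).cumsum().reshape(shape)
t[1::2] = t[1::2, ::-1]  # bidirectional scan: every other row is reversed in time

root = zarr.open_group("coords_example.zarr", mode="w")
for name, values in {"xCoords": x, "yCoords": y, "zCoords": z, "tCoords": t}.items():
    root.create_dataset(name, data=values)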

First we need to define coordinate systems:
{
    "coordinateSystems": [
        { 
            "name": "/my/data", 
            "axes": [ 
                {"name": "x", "type": "array"},
                {"name": "y", "type": "array"}
            ]
        },
        {
            "name": "out", 
            "axes": [ 
                {"name": "x", "type": "space", "unit": "nm"}, 
                {"name": "y", "type": "space", "unit": "nm"}, 
                {"name": "z", "type": "space", "unit": "nm"}, 
                {"name": "t", "type": "time",  "unit": "ms"}
            ]
        }
    ]
}

Using only coordinates and byDimension, we transform the array's coordinates to the 4D output coordinates:

{
    "coordinateTransformations": [ 
        {
            "type": "byDimension",
            "input": "/my/data",
            "output": "out",
            "transformations" : [
                { "type": "coordinates", "path": "xCoords", "input": ["x"], "output": ["x"]},
                { "type": "coordinates", "path": "yCoords", "input": ["y"], "output": ["y"]},
                { "type": "coordinates", "path": "zCoords", "input": ["y","x"], "output": ["z"]},
                { "type": "coordinates", "path": "tCoords", "input": ["y","x"], "output": ["t"]},
            ]
        }
    ]
}

As you'd expect, it mirrors the xarray code pretty closely; for example, the line
{ "type": "coordinates", "path": "zCoords", "input": ["y","x"], "output": ["z"]},
communicates literally the same information as
'z': (('y', 'x'), z, {'units': 'nm'}),

For this example, I'd prefer using a scale transformation for x and y like this:

{
    "coordinateTransformations": [ 
        {
            "type": "byDimension",
            "input": "/my/data",
            "output": "out",
            "transformations" : [
                { "type": "scale", "scale": [0.2, 0.1], "input": ["x", "y"], "output": ["x", "y"]}, 
                { "type": "coordinates", "path": "zCoords", "input": ["y","x"], "output": ["z"]},
                { "type": "coordinates", "path": "tCoords", "input": ["y","x"], "output": ["t"]},
            ]
        }
    ]
}
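For illustration, here is a sketch of how a consumer might evaluate the first byDimension example above for a single array sample, reusing the coords_example.zarr store from the earlier sketch. It assumes (and this is an assumption, not something stated above) that each coordinates array is indexed by its "input" axes in the order listed:

import zarr

# reuse the arrays written in the earlier sketch
root = zarr.open_group("coords_example.zarr", mode="r")

def total_coordinates(ix: int, iy: int) -> dict:
    # assumption: zCoords and tCoords are indexed as [y, x], matching "input": ["y", "x"]
    return {
        "x": root["xCoords"][ix],
        "y": root["yCoords"][iy],
        "z": root["zCoords"][iy, ix],
        "t": root["tCoords"][iy, ix],
    }

# the 4D 'out' coordinates of the sample at array index (x=3, y=7)
print(total_coordinates(3, 7))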

@d-v-b
Contributor Author

d-v-b commented Jul 11, 2023

Thanks @bogovicj, that example is super helpful!

@thewtex how does this look to you re: xarray compatibility?

@clbarnes

I think we've been talking at cross purposes a bit - I do understand the information you're trying to express, and that is probably the right way to do it (a separate 2(+channel)D array). I guess I was getting ahead of myself and thinking about how such data would be used, but that is out of scope.
