Storing additional metadata with indices? #26

bwpriest · 2022-12-22T20:30:20Z

Right now, we can use saltatlas::dhnsw::query_with_features() to run a visitor lambda that is a function of the features associated with the nearest neighbors of the query point. These features are the same elements used to compute the metric that determines nearness. However, it would also be useful to store and return other metadata associated with the neighbors.

For example, say that we want to perform nearest neighbors regression. We have several samples of a domain X with a range Y. We ingest samples of (X, Y) into our nearest neighbors index, using distance in X as our metric. Then, a visitor lambda for a query q would find the set of neighbor indices N, and obtain the associated data X_N, Y_N. We then compute our prediction of the value of the response in Y for q based upon a weighted average of the Y_N, where the weights come from the distances of q to the elements of X_N.
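To make the prediction step concrete, here is a minimal standalone sketch of the weighted average. It assumes inverse-distance weights, which is only one possible choice, and it does not call saltatlas at all: the `neighbor` struct and `weighted_prediction` function are hypothetical names standing in for whatever a visitor lambda would receive.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one returned neighbor: its distance to the query q
// (measured in X) and its stored scalar response (taken from Y).
struct neighbor {
  double distance;
  double response;
};

// Predict the response at q as an inverse-distance-weighted average of the
// neighbors' responses. Inverse-distance weighting is an assumption here;
// any weight that decreases with distance could be substituted.
double weighted_prediction(const std::vector<neighbor>& neighbors) {
  assert(!neighbors.empty());
  double weight_sum = 0.0;
  double weighted_responses = 0.0;
  for (const auto& n : neighbors) {
    // A neighbor that coincides with q would get infinite weight, so return
    // its response directly instead of dividing by zero.
    if (n.distance == 0.0) return n.response;
    const double weight = 1.0 / n.distance;
    weight_sum += weight;
    weighted_responses += weight * n.response;
  }
  return weighted_responses / weight_sum;
}
```

With this weighting, two neighbors at distances 1.0 and 0.5 receive weights 1 and 2, so the closer neighbor contributes twice as much to the prediction.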

It is currently possible to perform a workflow like this by concatenating each sample of (X, Y) into a single vector and writing the distance function so that it considers only the features in X and ignores the responses in Y. We can then use saltatlas::dhnsw::query_with_features() to get these whole concatenated vectors and perform our weighted sum using the components corresponding to Y.
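A rough sketch of that workaround, written as standalone C++ rather than against the saltatlas API: the layout constant `num_features` and the helpers `feature_only_l2` and `response_part` are hypothetical names, and how the distance function is handed to the index is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout of a concatenated point: the first num_features entries
// hold the sample from X, and the remaining entries hold the response from Y.
constexpr std::size_t num_features = 2;

// Euclidean distance that reads only the X part of each concatenated vector,
// so the responses stored after it never influence nearness.
float feature_only_l2(const std::vector<float>& a,
                      const std::vector<float>& b) {
  assert(a.size() >= num_features && b.size() >= num_features);
  float sum = 0.0f;
  for (std::size_t i = 0; i < num_features; ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Copy out the Y part of a concatenated vector, e.g. inside a visitor lambda
// that has received the whole vector for a neighbor.
std::vector<float> response_part(const std::vector<float>& point) {
  assert(point.size() >= num_features);
  const auto offset = static_cast<std::ptrdiff_t>(num_features);
  return std::vector<float>(point.begin() + offset, point.end());
}
```

Note that both parts have to share the element type `float` in this sketch, which is exactly the restriction the workaround imposes.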

However, this approach requires a fair amount of work from the user, and requires that X and Y elements be represented by the same type. It would be preferable for saltatlas to instead encode the features and other data separately, and to support visitor query functions that can be functions of either or both.


bwpriest · 2023-01-10T05:32:44Z

@steiltre any thoughts on this? Were you planning on having the user get indices to another distributed container (e.g. a ygm::container::map) to look up associated information about returned neighbors, or were you thinking about some version of what I described above? I'm happy to discuss at some point in the near future. Although I can use the workaround I described for now, I need this functionality for some applications.

steiltre · 2023-01-11T05:44:00Z

I was thinking of having the additional metadata stored alongside the indexed points so they can be easily gathered at the same time I'm gathering the nearest neighbor features in query_with_features(). I haven't thought through the details, but I was thinking of adding template parameters to determine whether metadata is needed and the types of metadata.
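As an illustration only, one way the template-parameter idea could look is sketched below. Nothing here exists in saltatlas: `no_metadata` and `point_store` are hypothetical names, and the sketch is a plain in-memory container, not a distributed index.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical tag type meaning "no metadata is stored with the points".
struct no_metadata {};

// Hypothetical container, not part of saltatlas: keeps each feature vector
// next to one metadata value whose type is chosen by a template parameter.
template <typename Feature, typename Metadata = no_metadata>
class point_store {
 public:
  // Compile-time flag telling callers whether metadata was requested.
  static constexpr bool has_metadata = !std::is_same_v<Metadata, no_metadata>;

  // Store a point and its metadata together; returns the assigned index.
  std::size_t insert(Feature feature, Metadata metadata = Metadata{}) {
    m_points.emplace_back(std::move(feature), std::move(metadata));
    return m_points.size() - 1;
  }

  const Feature& feature(std::size_t index) const {
    return m_points.at(index).first;
  }

  const Metadata& metadata(std::size_t index) const {
    return m_points.at(index).second;
  }

 private:
  std::vector<std::pair<Feature, Metadata>> m_points;
};
```

Because features and metadata are separate template parameters in this sketch, they no longer need to share a type, and code that does not need metadata can leave the second parameter at its default.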

One question I had about your use case: are you expecting to know the feature vector and metadata at insertion time, and have the metadata remain constant?

bwpriest · 2023-01-11T05:47:12Z

For my current use case I am expecting to know features and metadata at insert time, and they will remain constant. Future versions of the use case may involve the features changing over time, although I think that will be a lot more complicated for saltatlas.
