Storing additional metadata with indices? #26

Open
bwpriest opened this issue Dec 22, 2022 · 3 comments


@bwpriest
Member

Right now, we can use saltatlas::dhnsw::query_with_features() to run a visitor lambda that is a function of the features associated with the nearest neighbors of the query point. These features are the same elements used to compute the metric that determines nearness. However, it is desirable to also store and return other metadata associated with the neighbors.

For example, say that we want to perform nearest neighbors regression. We have several samples of a domain X with a range Y. We ingest samples of (X, Y) into our nearest neighbors index, using distance in X as our metric. Then, a visitor lambda for a query q would find the set of neighbor indices N, and obtain the associated data X_N, Y_N. We then compute our prediction of the value of the response in Y for q based upon a weighted average of the Y_N, where the weights come from the distances of q to the elements of X_N.

It is currently possible to perform a workflow like this by concatenating each sample of (X, Y) into a single vector and writing the distance function so that it considers only the components belonging to X and ignores the responses in Y. We can then use saltatlas::dhnsw::query_with_features() to get these whole concatenated vectors and compute our weighted average using the components corresponding to Y.
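A minimal sketch of that workaround, assuming the concatenated sample is a `std::vector<float>` whose first `x_dim` entries are the features and whose remaining entries are the responses. The names `x_dim`, `feature_only_l2`, and `response_part` are hypothetical, and the exact signature saltatlas expects for a distance function is not shown in this thread, so the one below is only a guess at its shape.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout: [x_0, ..., x_{x_dim-1}, y_0, ...]. The value 2 is arbitrary.
constexpr std::size_t x_dim = 2;

// Euclidean distance computed over the X components only; the trailing Y
// components are ignored, so they do not influence which neighbors are found.
float feature_only_l2(const std::vector<float>& a, const std::vector<float>& b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < x_dim; ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}

// Recover the Y components from a concatenated vector returned by a query.
std::vector<float> response_part(const std::vector<float>& v) {
  return std::vector<float>(v.begin() + x_dim, v.end());
}
```

Because everything lives in one vector, X and Y are forced to share an element type (here `float`), which is the limitation raised in the next paragraph.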

However, this approach requires a fair amount of work from the user, and requires that X and Y elements be represented by the same type. It would be preferable for saltatlas to instead encode the features and other data separately, and to support visitor query functions that can be functions of either or both.

@bwpriest
Member Author

@steiltre any thoughts on this? Were you planning on having the user get indices to another distributed container (e.g. a ygm::container::map) to look up associated information about returned neighbors, or were you thinking about some version of what I described above? I'm happy to discuss at some point in the near future. Although I can use the workaround I described for now, I need this functionality for some applications.

@steiltre
Collaborator

I was thinking of having the additional metadata stored alongside the indexed points so they can be easily gathered at the same time I'm gathering the nearest neighbor features in query_with_features(). I haven't thought through the details, but I was thinking of adding template parameters to determine whether metadata is needed and the types of metadata.
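For readers following along, one possible shape for such template parameters is sketched below. This is purely illustrative and not the saltatlas design: `no_metadata` and `indexed_point` are hypothetical names, and the idea of using a default tag type to mean "no metadata" is an assumption rather than anything decided in this thread.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical tag type meaning "this index stores no metadata".
struct no_metadata {};

// Hypothetical stored record: features used by the metric, plus optional
// metadata of a user-chosen type kept alongside them.
template <typename Feature, typename Metadata = no_metadata>
struct indexed_point {
  // Compile-time flag a query routine could branch on with `if constexpr`.
  static constexpr bool has_metadata = !std::is_same_v<Metadata, no_metadata>;

  Feature features;
  Metadata metadata;  // an empty tag object when no metadata is requested
};
```

With a layout like this, features and metadata can have different types, and code that gathers neighbor features could gather the metadata in the same pass.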

One question I had about your use case: are you expecting to know the feature vector and metadata at insertion time, and have the metadata remain constant?

@bwpriest
Member Author

For my current use case, I expect to know both the features and the metadata at insertion time, and they will remain constant. Future versions of the use case may involve the features changing over time, although I think that will be a lot more complicated for saltatlas.
