Right now, we can use `saltatlas::dhnsw::query_with_features()` to compute a visitor lambda that is a function of the features associated with the nearest neighbors of the query point. These features are the same elements used to compute the metric that determines nearness. However, it would be useful to also store and return other metadata associated with the neighbors.
For example, say that we want to perform nearest neighbors regression. We have several samples of a domain `X` with a range `Y`. We ingest samples of `(X, Y)` into our nearest neighbors index, using distance in `X` as our metric. Then, a visitor lambda for a query `q` would find the set of neighbor indices `N`, and obtain the associated data `X_N`, `Y_N`. We then compute our prediction of the value of the response in `Y` for `q` based upon a weighted average of the `Y_N`, where the weights come from the distances of `q` to the elements of `X_N`.
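To make the regression step concrete, here is a minimal standalone sketch. It does not use saltatlas at all: a brute-force search stands in for the index lookup, and `Sample`, `l2_distance`, `k_nearest`, and `predict` are hypothetical names chosen for illustration. Inverse-distance weighting is an assumption as well; the description above only says that the weights come from the distances.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sample type: a feature vector x and a scalar response y.
struct Sample {
  std::vector<double> x;
  double y;
};

// Euclidean distance between two feature vectors of equal length.
double l2_distance(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Brute-force stand-in for the index query: returns (distance, y) pairs
// for the k samples nearest to q, sorted by increasing distance.
std::vector<std::pair<double, double>> k_nearest(
    const std::vector<Sample>& samples, const std::vector<double>& q,
    std::size_t k) {
  std::vector<std::pair<double, double>> out;
  out.reserve(samples.size());
  for (const Sample& s : samples) {
    out.emplace_back(l2_distance(s.x, q), s.y);
  }
  k = std::min(k, out.size());
  std::partial_sort(out.begin(),
                    out.begin() + static_cast<std::ptrdiff_t>(k), out.end());
  out.resize(k);
  return out;
}

// Inverse-distance weighted average of the neighbor responses (assumed
// weighting scheme). A neighbor at (near) zero distance decides the result.
double predict(const std::vector<std::pair<double, double>>& neighbors) {
  assert(!neighbors.empty());
  const double eps = 1e-12;
  double num = 0.0;
  double den = 0.0;
  for (const std::pair<double, double>& n : neighbors) {
    if (n.first < eps) return n.second;  // query coincides with a sample
    const double w = 1.0 / n.first;
    num += w * n.second;
    den += w;
  }
  return num / den;
}
```

In the requested feature, the body of `predict` is roughly what the visitor lambda would run once it has both the distances to `X_N` and the responses `Y_N` in hand.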
It is currently possible to perform a workflow like this by concatenating each sample of `(X, Y)` into a single vector, and then writing the distance function so that it only considers the features in `X` and ignores the responses in `Y`. We can then use `saltatlas::dhnsw::query_with_features()` to get these whole concatenated vectors and perform our weighted sum using the components corresponding to `Y`.
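The workaround can be sketched as below. This is only an illustration under stated assumptions: the layout (the first `dim_x` components are features, the rest are responses), the value of `dim_x`, and the helper names `concatenate`, `feature_only_distance`, and `response_part` are all hypothetical, and the exact distance-function signature that saltatlas expects is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout of a stored point: [x_0 .. x_{dim_x-1}, y_0 .. y_{m-1}].
// dim_x = 2 is an arbitrary value picked for this example.
const std::size_t dim_x = 2;

// Pack features and responses into one vector. Both must share the same
// element type, which is the limitation raised in this issue.
std::vector<float> concatenate(const std::vector<float>& x,
                               const std::vector<float>& y) {
  assert(x.size() == dim_x);
  std::vector<float> point(x);
  point.insert(point.end(), y.begin(), y.end());
  return point;
}

// Euclidean distance over the feature part only; the trailing response
// components are deliberately ignored.
float feature_only_distance(const std::vector<float>& a,
                            const std::vector<float>& b) {
  assert(a.size() >= dim_x && b.size() >= dim_x);
  float sum = 0.0f;
  for (std::size_t i = 0; i < dim_x; ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Recover the response components from a concatenated point, as the
// visitor would have to do before computing the weighted sum.
std::vector<float> response_part(const std::vector<float>& point) {
  assert(point.size() >= dim_x);
  return std::vector<float>(
      point.begin() + static_cast<std::ptrdiff_t>(dim_x), point.end());
}
```

Every caller has to agree on `dim_x` and slice the vector by hand, which is the bookkeeping that separate feature and metadata storage would remove.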
However, this approach requires a fair amount of work from the user, and requires that `X` and `Y` elements be represented by the same type. It would be preferable for saltatlas to instead encode the features and other data separately, and to support visitor query functions that can be functions of either or both.
@steiltre any thoughts on this? Were you planning on having the user get indices to another distributed container (e.g. a ygm::container::map) to look up associated information about returned neighbors, or were you thinking about some version of what I described above? I'm happy to discuss at some point in the near future. Although I can use the workaround I described for now, I need this functionality for some applications.
I was thinking of having the additional metadata stored alongside the indexed points so they can be easily gathered at the same time I'm gathering the nearest neighbor features in query_with_features(). I haven't thought through the details, but I was thinking of adding template parameters to determine whether metadata is needed and the types of metadata.
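For discussion purposes only, one way the "template parameters for metadata" idea could look is sketched below. Nothing here is existing or planned saltatlas API: `no_metadata`, `indexed_point`, and `has_metadata()` are hypothetical names, and the real design may differ considerably.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical tag type meaning "this index stores no metadata".
struct no_metadata {};

// Hypothetical stored point: the features drive the distance metric, while
// the metadata travels with the point and can have an unrelated type.
template <typename Feature, typename Metadata = no_metadata>
struct indexed_point {
  std::vector<Feature> features;
  Metadata metadata;

  // Compile-time switch that code gathering neighbors could branch on.
  static constexpr bool has_metadata() {
    return !std::is_same<Metadata, no_metadata>::value;
  }
};
```

With something along these lines, the regression example would store `indexed_point<float, double>`, and existing feature-only users would keep the default.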
One question I had about your use case: are you expecting to know the feature vector and metadata at insertion time, and have the metadata remain constant?
For my current use case I am expecting to know features and metadata at insert time, and they will remain constant. Future versions of the use case may involve the features changing over time, although I think that will be a lot more complicated for saltatlas.