Initial Subset with Streaming SO #29
Comments
Oof, thanks for the report! Okay, I can look into it. I'm not sure when I'll get to it, so you might be better off looking into an alternative approach in the meantime, but I'll let you know when I fix it. |
Thanks. Any insight into the best number of nearest neighbors to use? I've started with 1000. Also, where to put the "pre-computed" distances for the sparse matrix encoding? I don't see it as a parameter. |
If you use precomputed distances you can set `metric="precomputed"` and
then pass the sparse matrix into `fit` or `fit_transform` as normal.
I think that there's been some work suggesting that using log2(n_examples)
neighbors is sufficient to achieve some theoretical properties, but I can't
remember what those properties are.
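
As a rough illustration of the two points above, the sketch below builds a sparse similarity matrix that keeps only about log2(n_examples) neighbors per row. The helper name `knn_similarity` and the `exp(-squared distance)` similarity are assumptions made for the example; they are not part of apricot.

```python
import math

import numpy as np
from scipy.sparse import csr_matrix


def knn_similarity(X, k=None):
    """Sparse similarity matrix keeping each row's k largest similarities.

    Hypothetical helper for illustration only; not an apricot function.
    """
    n = X.shape[0]
    if k is None:
        # Heuristic mentioned in the thread: about log2(n_examples) neighbors.
        k = max(1, math.ceil(math.log2(n)))

    # Dense similarities in (0, 1], where 1 means identical points.
    # This is O(n^2 * d) memory, so it only suits small inputs.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-sq_dists)

    # Column indices of the k largest similarities in each row (self included).
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    S = csr_matrix((sim[rows, cols], (rows, cols)), shape=(n, n))

    # Symmetrize: keep an entry if either endpoint selected the other.
    return S.maximum(S.T)


rng = np.random.default_rng(0)
X = rng.random((64, 3))
S = knn_similarity(X)  # k defaults to ceil(log2(64)) = 6
```

The resulting `S` could then be passed to `fit` or `fit_transform` on a selector created with `metric="precomputed"`, as described above.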
|
The code seems to work, but what distance metric is it actually using? |
It doesn't calculate anything itself. It assumes that you're passing in a
similarity matrix where 1 is most similar, as opposed to a distance matrix
where 0 means closest. A problem is that most standard similarity
functions don't produce sparse similarity matrices, even if many of the
elements are small. If you manually produce a similarity matrix that is
sparse, though, it knows how to use that sparsity to speed up the algorithm.
On Wed, Oct 13, 2021 at 10:57 PM devinity1337 wrote:
from apricot import FacilityLocationSelection
import numpy
from scipy.sparse import csr_matrix

# Random symmetric matrix, treated as precomputed similarities
X = numpy.random.uniform(0, 1, size=(6000, 6000))
X = (X + X.T) / 2.
# Zero out everything below 0.9 so the matrix is sparse
X[X < 0.9] = 0.0
X_sparse = csr_matrix(X)

# Dense equivalent, for comparison:
# FacilityLocationSelection(500, 'precomputed', verbose=True).fit(X)
FacilityLocationSelection(500, 'precomputed', verbose=True).fit(X_sparse)
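
If the starting point is a distance matrix rather than similarities, one possible conversion is sketched below. The helper name `distances_to_sparse_similarity`, the `1 - D / max(D)` rescaling, and the default threshold are all assumptions chosen for illustration; other similarity functions may suit a given dataset better.

```python
import numpy as np
from scipy.sparse import csr_matrix


def distances_to_sparse_similarity(D, threshold=0.9):
    """Turn distances (0 = closest) into sparse similarities (1 = most similar).

    Hypothetical helper for illustration only; not an apricot function.
    """
    D = np.asarray(D, dtype=float)
    scale = D.max()
    # Rescale so the largest distance maps to similarity 0 and distance 0 to 1.
    S = 1.0 - D / scale if scale > 0 else np.ones_like(D)
    # Drop weak similarities so the matrix actually becomes sparse.
    S[S < threshold] = 0.0
    return csr_matrix(S)


D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
S_sparse = distances_to_sparse_similarity(D, threshold=0.5)
```

Note that this conversion keeps the diagonal at 1, so every example is maximally similar to itself, whereas the random matrix above has a mostly zeroed diagonal after thresholding.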
|
Gives an error:
ValueError: operands could not be broadcast together with shapes (995,) (1000,)
The size mismatch (1000 - 995 = 5) is equal to the initial subset size, and the error only occurs when I use an initial subset, so something goes wrong when streaming is combined with an initial subset.
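
For readers unfamiliar with this class of error, the snippet below reproduces the same NumPy message in isolation. It is not apricot's code, and the array names are hypothetical; it only shows how combining an array sized for the full batch with one that has had the initial-subset entries dropped would fail in this way.

```python
import numpy as np

batch_size = 1000  # length of one array in the error message
n_initial = 5      # initial subset size implied by 1000 - 995

# Hypothetical bookkeeping: one array covers the full batch, the other
# has had the initial-subset entries removed.
full_batch_gains = np.zeros(batch_size)
remaining_gains = np.zeros(batch_size - n_initial)


def combine(a, b):
    """Add two arrays, returning NumPy's error text if shapes are incompatible."""
    try:
        return a + b
    except ValueError as err:
        return str(err)


message = combine(remaining_gains, full_batch_gains)
print(message)
```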