Initial Subset with Streaming SO #29

Open
devinity1337 opened this issue Oct 8, 2021 · 5 comments

Comments

@devinity1337

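from apricot import FacilityLocationSelection
import numpy

# 1000 points with 50 positive (log-normal) features
X = numpy.exp(numpy.random.randn(1000, 50))
print(X)

# Squared Pearson correlation between points, used only to score the result below
X_corr = numpy.corrcoef(X) ** 2

# Request 10 samples, seeding the selection with 5 fixed indices, using the streaming partial_fit
model = FacilityLocationSelection(10, 'corr', initial_subset=[1, 5, 6, 8, 10])
model.partial_fit(X)

# Facility-location score: each point's best squared correlation to a selected point, summed
print(model.ranking, X_corr[model.ranking].max(axis=0).sum())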

This gives an error:

ValueError: operands could not be broadcast together with shapes (995,) (1000,)

The size mismatch (1000 - 995 = 5) equals the size of the initial subset, and the error only occurs when I pass an initial subset, so something seems to go wrong when streaming with partial_fit is combined with an initial subset.
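The shapes in the error fit that reading: one array appears to be indexed over the 995 candidates left after removing the 5 initial-subset points, while another spans all 1000 points. The snippet below is a minimal NumPy illustration of that kind of mismatch and one way to keep the arrays aligned. It assumes nothing about apricot's actual internals; the names `current_max`, `candidates`, and `gains` are hypothetical.

```python
import numpy

n = 1000
initial_subset = [1, 5, 6, 8, 10]

# Hypothetical per-point state tracked over ALL n points,
# e.g. each point's best similarity to the current selection.
current_max = numpy.zeros(n)

# Hypothetical candidate pool with the initial subset removed: n - 5 = 995 entries.
candidates = numpy.setdiff1d(numpy.arange(n), initial_subset)
gains = numpy.ones(candidates.shape[0])

# Combining them directly fails with the reported shapes:
# gains + current_max  ->  ValueError: operands could not be broadcast together with shapes (995,) (1000,)

# One way to keep them aligned: scatter the candidate values back into a
# length-n array, leaving the already-selected initial points at 0.
full_gains = numpy.zeros(n)
full_gains[candidates] = gains
total = current_max + full_gains  # both arrays now have shape (1000,)
```

If the real cause is similar, the fix would be to index both arrays over the same set of points before combining them.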

@jmschrei
Owner

jmschrei commented Oct 8, 2021

Oof, thanks for the report! Okay, I can look into it. I'm not sure when I'll get to it, so you might be better served looking into an alternative approach in the meantime, but I'll let you know when I fix it.
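One possible stopgap for data of this size is to run a greedy facility-location selection directly in NumPy. The sketch below is not apricot's implementation: it assumes a dense, non-negative similarity matrix fits in memory, uses squared correlation as the similarity (matching the score computed from `X_corr` in the report), and treats `k` as the number of new points added on top of the initial subset. The function name `greedy_facility_location` is hypothetical.

```python
import numpy

def greedy_facility_location(sim, k, initial_subset=()):
    """Naive greedy maximization of f(S) = sum_i max_{j in S} sim[i, j].

    sim: (n, n) non-negative similarity matrix.
    k: number of NEW points to add on top of initial_subset
       (assumes k <= n - len(initial_subset)).
    Returns (indices of the new points in selection order, their marginal gains).
    """
    n = sim.shape[0]
    # Best similarity from each point to the current selection (0 for the empty set).
    current = numpy.zeros(n)
    for j in initial_subset:
        current = numpy.maximum(current, sim[:, j])

    available = numpy.ones(n, dtype=bool)
    available[numpy.asarray(list(initial_subset), dtype=int)] = False

    selected, gains = [], []
    for _ in range(k):
        # Gain of adding column j: sum_i max(sim[i, j] - current[i], 0).
        candidate_gains = numpy.maximum(sim - current[:, None], 0).sum(axis=0)
        candidate_gains[~available] = -numpy.inf
        best = int(numpy.argmax(candidate_gains))
        selected.append(best)
        gains.append(candidate_gains[best])
        current = numpy.maximum(current, sim[:, best])
        available[best] = False
    return numpy.array(selected), numpy.array(gains)

# Usage on data shaped like the report's (smaller n keeps the dense matrix cheap).
rng = numpy.random.default_rng(0)
X = numpy.exp(rng.standard_normal((200, 50)))
X_corr = numpy.corrcoef(X) ** 2
initial = [1, 5, 6, 8, 10]
ranking, gains = greedy_facility_location(X_corr, 5, initial)
full = numpy.concatenate([initial, ranking])
print(full, X_corr[full].max(axis=0).sum())
```

Each greedy step rescans every column, so this costs O(n^2) per selected point; it is meant only as a correctness reference, not a replacement for apricot's lazy and streaming optimizers.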

@devinity1337
Author

Thanks. Any insight into the best number of nearest neighbors to use? I've started with 1000.

Also, where do I put the "precomputed" distances for the sparse matrix encoding? I don't see a parameter for it.
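Later in this thread, the matrix itself is passed as `X` together with the `'precomputed'` metric, rather than through a separate parameter. The following is a minimal sketch of building such a sparse nearest-neighbor matrix by hand. It assumes the matrix should hold similarities (larger means more similar) and uses cosine similarity on positive features purely as an example. The helper `knn_similarity` is hypothetical, this is not how apricot builds its own neighbor graphs, and it makes no recommendation about the value of `k`.

```python
import numpy
from scipy.sparse import csr_matrix

def knn_similarity(X, k):
    """Sparse (n, n) cosine-similarity matrix keeping each row's k most similar points.

    The result is symmetrized with an element-wise max, so it does not depend on
    whether rows or columns are read as exemplars.
    """
    # Row-normalize so that the dot product of two rows is their cosine similarity.
    X_norm = X / numpy.linalg.norm(X, axis=1, keepdims=True)
    sim = X_norm @ X_norm.T  # dense (n, n): fine for a sketch, not for very large n
    n = sim.shape[0]

    # Column indices of the k largest entries in each row (unordered among themselves).
    cols = numpy.argpartition(-sim, k - 1, axis=1)[:, :k].ravel()
    rows = numpy.repeat(numpy.arange(n), k)
    S = csr_matrix((sim[rows, cols], (rows, cols)), shape=(n, n))
    return S.maximum(S.T)

rng = numpy.random.default_rng(0)
X = numpy.exp(rng.standard_normal((300, 20)))  # positive features, as in the report
S = knn_similarity(X, k=10)
```

The resulting `S` could then be passed the same way as in the later snippet, e.g. `FacilityLocationSelection(n, 'precomputed').fit(S)`. The sketch does not check how apricot handles asymmetric matrices or explicitly stored zeros.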

@jmschrei
Owner

jmschrei commented Oct 13, 2021 via email

@devinity1337
Author

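from apricot import FacilityLocationSelection
import numpy
from scipy.sparse import csr_matrix

# Random symmetric 6000 x 6000 matrix with entries in [0, 1]
X = numpy.random.uniform(0, 1, size=(6000, 6000))
X = (X + X.T) / 2.
# Sparsify: keep only entries >= 0.9 and zero out the rest
X[X < 0.9] = 0.0
X_sparse = csr_matrix(X)

# The same selection on the dense and on the sparse matrix, with the 'precomputed' metric
#FacilityLocationSelection(500, 'precomputed', verbose=True).fit(X)
FacilityLocationSelection(500, 'precomputed', verbose=True).fit(X_sparse)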

The code seems to work, but what distance metric is it actually using?
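Assuming, as we understand apricot's documentation, that `'precomputed'` means the entries are read directly as pairwise similarities, no further distance metric would be applied, and entries zeroed out by the threshold would simply count as zero similarity. Under that assumption, the facility-location value of a selection can be checked by hand on the sparse matrix, as in this sketch. Both helpers are hypothetical; `distances_to_similarities` shows one common conversion (the max-minus-distance transform) for anyone starting from distances, and it is not an apricot function.

```python
import numpy
from scipy.sparse import csr_matrix

def facility_location_value(S, selected):
    """f(selected) = sum_i max_{j in selected} S[i, j] for a sparse similarity matrix S.

    Entries absent from S are treated as similarity 0.
    """
    # Dense (n, len(selected)) slice; small as long as few points are selected.
    cols = S[:, selected].toarray()
    return cols.max(axis=1).sum()

def distances_to_similarities(D):
    """One common conversion, s = max(D) - d. Expects a DENSE array: applied after
    sparsifying, missing entries would be read as distance 0."""
    return D.max() - D

# Same construction as the snippet above, at a smaller size.
rng = numpy.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 500))
X = (X + X.T) / 2.0
X[X < 0.9] = 0.0
X_sparse = csr_matrix(X)

print(facility_location_value(X_sparse, [0, 1, 2]))
```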

@jmschrei
Owner

jmschrei commented Oct 14, 2021 via email
