Unsupervised models and prediction #99
Most unsupervised learning does not consider prediction part of the task. I suggest not approaching these algorithms from a supervised "fit then predict" mindset at all; it does not do them justice.
Thanks a lot for your feedback!
I agree that a post-hoc nearest-neighbors approach to enable prediction does not feel right and does not do the algorithms justice. However, I'm mainly interested in unsupervised outlier detection algorithms that can be used for prediction fairly naturally. As far as I can tell, this includes most of the neighbor-based approaches, where you can simply compare a new point to the points in the training set. Edit: I also wouldn't say, for example, that prediction disturbs the meaning of the LOF score.
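To illustrate what "compare a new point to the points in the training set" can look like, here is a minimal sketch of a neighbor-based detector with a fit/score interface. The class and method names are illustrative, not taken from OutlierDetection.jl, ELKI, or any other library:

```python
import math

class KNNDetector:
    """Sketch of a neighbor-based outlier detector that supports
    scoring previously unseen points. Illustrative only."""

    def __init__(self, k=3):
        self.k = k
        self.train = []

    def fit(self, points):
        # "Training" just memorizes the training points.
        self.train = list(points)
        return self

    def score(self, point):
        # Outlier score: distance to the k-th nearest training point.
        dists = sorted(math.dist(point, p) for p in self.train)
        return dists[self.k - 1]

detector = KNNDetector(k=2).fit([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
low = detector.score((0.5, 0.5))    # close to the training data -> small score
high = detector.score((10.0, 10.0)) # far from the training data -> large score
```

Note that scoring a new point this way never alters the fitted model, which is why (as discussed above) it does not disturb the meaning of the training-set scores, even though the two sets of scores are computed under different conditions.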
It doesn't disturb the meaning, but the results of prediction will not agree with the results of the original algorithm.
And in many cases it will even be better to depart from, e.g., LOF and try to cleanly transfer the key idea into an open-world "train-test split" setting, rather than hacking this into the original closed-world setting. This makes things both a lot easier and more efficient than trying to stay close to the original LOF.
@davnn for implementing a set of outlier detection algorithms in Julia, it may be worth trying to implement the generalized pattern introduced here: Schubert, E., Zimek, A. & Kriegel, H.-P. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc 28, 190–237 (2014). https://doi.org/10.1007/s10618-012-0300-z

This avoids redundancy when implementing many related detectors. It's not possible to do this very elegantly in Java right now, because generics cannot be primitives, and most of the time we are dealing with primitive values (e.g., a local density estimate). There is a partial implementation of this pattern in ELKI for the parallelized versions, but that code has to work around a lot of Java limitations. For a Julia implementation, however, this may be an interesting starting point: approach it as a multi-stage mapping process, with the stages shared across different detectors.
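One way to read that multi-stage idea is roughly: (1) derive a context (e.g. a neighborhood) per point, (2) map each context to a local model value (e.g. a density estimate), (3) compare each point's model to those of its context. The sketch below is an illustrative reading under those assumptions, not the paper's exact formalism, and the LOF-like formulas at the end are simplified:

```python
import math

def knn_context(points, k):
    """Stage 1: for each point, the indices of its k nearest neighbors."""
    def neighbors(i):
        order = sorted(range(len(points)),
                       key=lambda j: math.dist(points[i], points[j]))
        return [j for j in order if j != i][:k]
    return [neighbors(i) for i in range(len(points))]

def local_model(points, context, estimate):
    """Stage 2: map each point's context to a local model value."""
    return [estimate(points, i, ctx) for i, ctx in enumerate(context)]

def compare(models, context, combine):
    """Stage 3: compare each point's model to its neighbors' models."""
    return [combine(models[i], [models[j] for j in ctx])
            for i, ctx in enumerate(context)]

# A simplified LOF-like detector assembled from the shared stages:
def mean_knn_dist(points, i, ctx):
    return sum(math.dist(points[i], points[j]) for j in ctx) / len(ctx)

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ctx = knn_context(points, k=2)
density = [1.0 / d for d in local_model(points, ctx, mean_knn_dist)]
# Score: ratio of mean neighbor density to own density (high = outlying).
scores = compare(density, ctx, lambda m, ms: sum(ms) / len(ms) / m)
```

The point of the pattern is that only `mean_knn_dist` and the final `combine` step are detector-specific; stages 1–3 are shared, so a different detector swaps in different stage functions rather than reimplementing the whole pipeline.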
Thanks for the hint! I've already used that paper as a reference for the different approaches a couple of times ;). I'll investigate how those generalizations could be implemented further down the line. |
If you used that paper as a reference, I would appreciate it if it were included in the references list of the package. ;-)
Hi, I would like to use ELKI to validate our OutlierDetection.jl algorithms. It appears that, for unsupervised algorithms, there is no facility to learn a model from one dataset and predict on another, right? Has this been done somewhere previously?