Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added pair counting fmeasure metric #220

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dinarior
Copy link

@dinarior dinarior commented Aug 22, 2021

I often use this metric, I think it's worth having.

refs:
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.7233&rep=rep1&type=pdf

Included also is precision and recall for clustering, I was not sure about the proper name (e.g. precision is already in use by Julia base).

The _pair_confusion_matrix is translated from sklearn's https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/metrics/cluster/_supervised.py#L154

there is a small duplication with the rand index, which also require this matrix, but as I did not want to modify it to use my new function right now, but in a separated or (if at all).

@kmsquire
Copy link
Contributor

The _pair_confusion_matrix is translated from sklearn's https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/metrics/cluster/_supervised.py#L154

If it's a direct translation, you'll have to include the license here (or ask the scikit-learn folks if a translation of their code can be MIT licensed. But I'm guessing that would be difficult.).

I don't do much with this package, but I can review this. However, we'll need to figure out the license stuff first (i.e., do we really want to include BSD licensed code here.)

If _pair_confusion_matrix is simple (and it sounds like it should be), you could just include a description of the code, remove it from here, and someone else can implement it. Maybe me, but it can be anyone that hasn't seen the scikit-learn code. That way, we wouldn't have to worry about the license.

@nalimilan
Copy link
Member

Given how short and simple the code is, it probably won't have to be considered as derived from NumPy if you adapt it to make it more Julian and more efficient, as in the end the only think that will remain from NumPy is the algorithm. For example, sum(c.*c) should we written as sum(abs2, c), sum(c,dims=1)[:] as vec(sum(c, dims=1)) and so on.

@dinarior
Copy link
Author

Thanks,
I intend to rewrite it, maybe extract common functionalities from the ARI metric, hopefully, will get to it soon enough.

@wildart
Copy link
Contributor

wildart commented Dec 25, 2021

I've just re-implemented this functionality in #227 to fix ARI calculations.

@dinarior
Copy link
Author

Great!, I will wait for it to get pushed and update this commit accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants