Implement entire database retrieval metrics (after getting checkpoints) #54

Open
2 of 4 tasks
AdrianM0 opened this issue May 12, 2024 · 3 comments
Labels: analysis (refers to metrics and any analysis scripts), data (related to dataset creation/curation)

Comments

@AdrianM0
Collaborator

AdrianM0 commented May 12, 2024

  • check retrieval between non-paired modalities
  • embedding arithmetic
  • check whether embeddings of rearranged SMILES strings for the same molecule are close to each other
  • top-k retrieval between the central modality and the other modalities
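
A minimal sketch of the top-k retrieval item above, assuming that paired embeddings share the same index in both lists and that cosine similarity is the ranking score (the issue specifies neither). The function names and the toy vectors are hypothetical, not taken from the project's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_accuracy(queries, candidates, k):
    """Fraction of queries whose paired candidate (same index) is in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        sims = [cosine(q, c) for c in candidates]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(candidates)), key=lambda j: sims[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(queries)

# Toy embeddings (made up): central modality as queries, another modality as candidates.
central = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
other = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.1], [0.0, 0.7, 0.7]]

top1 = top_k_accuracy(central, other, k=1)
top2 = top_k_accuracy(central, other, k=2)
```

The same function could be reused for the rearranged-SMILES check by passing embeddings of two different SMILES strings of the same molecules as `queries` and `candidates`.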
AdrianM0 self-assigned this on May 12, 2024
AdrianM0 added the data and analysis labels on May 12, 2024
@kjappelbaum
Collaborator

For this, at least for the subset where we can cleanly convert between modalities, I think it would be great to retrieve from a dataset of the same size.

In other cases, I would sample to a fixed size.
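
One way to sample a retrieval pool to a fixed size could look like the sketch below. It assumes candidates are addressed by integer index and that a seeded random subset is acceptable; `sample_fixed_pool` and the dataset sizes are hypothetical.

```python
import random

def sample_fixed_pool(n_items, pool_size, seed=0):
    """Return sorted indices of a reproducible random subset of fixed size."""
    if pool_size > n_items:
        raise ValueError("pool_size cannot exceed the number of items")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(range(n_items), pool_size))

# Two datasets of different (made-up) sizes are reduced to the same pool size.
pool_a = sample_fixed_pool(n_items=1000, pool_size=100, seed=42)
pool_b = sample_fixed_pool(n_items=250, pool_size=100, seed=42)
```

With one correct match per query, the chance level of top-k accuracy is k divided by the pool size, so equal pool sizes keep that baseline identical across modalities.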

@kjappelbaum
Collaborator

kjappelbaum commented Jul 15, 2024

Additional metrics to consider:

  • mean reciprocal rank (MRR)
  • discounted cumulative gain (DCG)
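
For reference, both metrics can be sketched in a few lines. This assumes 1-based ranks of the correct item for MRR and the linear-gain form of DCG, rel_i / log2(i + 1); an exponential-gain variant (2^rel - 1) is also common, and the thread does not say which one is intended. The function names and input values are hypothetical.

```python
import math

def mean_reciprocal_rank(ranks):
    """Mean of 1 / rank, where each rank is the 1-based position of the correct item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def dcg(relevances, k=None):
    """Linear-gain DCG of a ranked list of relevance scores, optionally cut at k."""
    rels = relevances if k is None else relevances[:k]
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))

# Made-up inputs: three queries whose correct items landed at ranks 1, 2 and 4,
# and one ranked list with graded relevance scores.
mrr = mean_reciprocal_rank([1, 2, 4])
score = dcg([3, 2, 0, 1])
score_at_2 = dcg([3, 2, 0, 1], k=2)
```

When each query has exactly one relevant item at rank r, this DCG reduces to 1 / log2(r + 1), so it penalizes low ranks more gently than the reciprocal rank does.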

@AdrianM0
Collaborator Author

> additional metrics to consider:
>
>   • mean reciprocal rank
>   • Discounted cumulative gain (DCG)

I am currently computing MRR on the batch only.
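
To illustrate how a batch-only rank relates to a rank over the entire database, here is a small sketch. It assumes a single similarity score per candidate for one query and counts only strictly higher scores, so ties are resolved in favour of the correct item; `rank_of_match` and the scores are hypothetical.

```python
def rank_of_match(scores, true_idx, candidate_ids):
    """1-based rank of true_idx among candidate_ids, ordered by descending score."""
    target = scores[true_idx]
    better = sum(1 for j in candidate_ids if j != true_idx and scores[j] > target)
    return 1 + better

# Made-up similarity scores of one query against five database entries.
scores = [0.2, 0.9, 0.5, 0.7, 0.95]
true_idx = 2

full_rank = rank_of_match(scores, true_idx, range(len(scores)))  # entire database
batch_rank = rank_of_match(scores, true_idx, [0, 2, 3])          # batch only

full_rr = 1.0 / full_rank
batch_rr = 1.0 / batch_rank
```

Because the batch is a subset of the database, the batch rank can never be worse than the full-database rank, so batch-only MRR is an upper bound on the entire-database value.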
