Un-embed clusters as an alternative to summarizing #22

Open
enjalot opened this issue Feb 21, 2024 · 1 comment
Labels: enhancement, help wanted, python
Milestone: 2.0

Comments

@enjalot
Owner

enjalot commented Feb 21, 2024

With vec2text, we should be able to get a reasonably useful sentence out of the average embedding of a cluster. This could serve as the cluster label, or perhaps as guidance when summarizing the cluster into a label.
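For illustration, the "average embedding of a cluster" could be computed roughly like this. This is a minimal standard-library sketch, not code from this repo: `cluster_centroids`, its arguments, and the re-normalization step are all assumptions. Re-normalizing is included because the mean of unit-length vectors is generally shorter than unit length, and an inversion model trained on unit-length embeddings may behave better on a unit-length input.

```python
from collections import defaultdict
from math import sqrt


def cluster_centroids(embeddings, labels, normalize=True):
    """Return {cluster_label: mean embedding} for parallel lists.

    `embeddings` is a list of equal-length float lists and `labels` holds
    the cluster id of each row (both names are hypothetical).
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")

    # Group rows by cluster id.
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)

    centroids = {}
    for label, vecs in groups.items():
        dim = len(vecs[0])
        # Component-wise mean over all members of the cluster.
        mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        if normalize:
            # Assumption: rescale to unit length, skipping all-zero means.
            norm = sqrt(sum(x * x for x in mean))
            if norm > 0:
                mean = [x / norm for x in mean]
        centroids[label] = mean
    return centroids
```

Whether the plain mean or the re-normalized mean inverts to a better sentence is something we would have to try out.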

https://github.com/jxmorris12/vec2text/

Pre-trained inversion models are available for some embedding models, such as OpenAI's text-embedding-ada-002, and perhaps others. Part of this issue might be helping to train inversion models for the other supported embedding models in our list.
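A rough sketch of how the pre-trained ada-002 model could be wired in, based on the usage shown in the vec2text README (`load_pretrained_corrector` and `invert_embeddings`). The wrapper names `make_vec2text_inverter` and `label_clusters` are hypothetical, and the `num_steps` default and device handling are assumptions that would need checking against the installed version.

```python
from typing import Callable, Dict, Hashable, Sequence

# An inverter maps one embedding vector to one sentence.
Inverter = Callable[[Sequence[float]], str]


def make_vec2text_inverter(
    model_name: str = "text-embedding-ada-002", num_steps: int = 20
) -> Inverter:
    """Build an inverter backed by vec2text.

    The imports are lazy because torch and vec2text are heavy dependencies.
    """
    import torch
    import vec2text

    corrector = vec2text.load_pretrained_corrector(model_name)

    def invert(embedding: Sequence[float]) -> str:
        # vec2text expects a batch, so wrap the single vector in a list.
        # Assumption: the README moves the tensor to the GPU with .cuda();
        # the device may need adjusting to match where the model lives.
        batch = torch.tensor([list(embedding)], dtype=torch.float32)
        texts = vec2text.invert_embeddings(
            embeddings=batch, corrector=corrector, num_steps=num_steps
        )
        return texts[0]

    return invert


def label_clusters(
    centroids: Dict[Hashable, Sequence[float]], invert: Inverter
) -> Dict[Hashable, str]:
    """Turn {cluster_id: centroid} into {cluster_id: candidate label}."""
    return {cid: invert(vec).strip() for cid, vec in centroids.items()}
```

Keeping the inverter as a plain callable means `label_clusters` can be exercised with a stub, and other embedding models could plug in later without touching the labeling code.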

One could imagine a new API endpoint that takes in an embedding vector and outputs a sentence. We could also have an alternative summarize script that uses this instead of (or in conjunction with) summarizing. We currently have a description field per cluster that is not really being used; it could be populated with this, or we could add another field.
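To make the endpoint and the description field concrete, here is a framework-agnostic sketch. Everything in it is hypothetical: the request shape `{"embedding": [...]}`, the response key `sentence`, the cluster dicts with `id` and `description` keys, and the function names. The `invert` argument stands in for whatever un-embedding function we end up using.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

Inverter = Callable[[Sequence[float]], str]


def handle_unembed(body: bytes, invert: Inverter) -> Tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, JSON payload)."""
    try:
        data = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}

    vec = data.get("embedding") if isinstance(data, dict) else None
    is_numbers = isinstance(vec, list) and bool(vec) and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec
    )
    if not is_numbers:
        return 400, {"error": "'embedding' must be a non-empty list of numbers"}

    return 200, {"sentence": invert(vec)}


def fill_descriptions(
    clusters: List[dict],
    centroids: Dict[object, Sequence[float]],
    invert: Inverter,
    overwrite: bool = False,
) -> List[dict]:
    """Populate each cluster's 'description' from its un-embedded centroid."""
    for cluster in clusters:
        centroid = centroids.get(cluster["id"])
        if centroid is None:
            continue  # no centroid known for this cluster
        if cluster.get("description") and not overwrite:
            continue  # keep any description that is already set
        cluster["description"] = invert(centroid)
    return clusters
```

The handler returns a status and a payload instead of depending on a web framework, so it could be wrapped in whatever route style the server already uses.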

@enjalot added the enhancement, help wanted, and python labels on Feb 21, 2024
@dhruv-anand-aintech
Contributor

Could BERTopic also be a viable alternative to using an LLM for summarization/topic name generation?
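For context, BERTopic names topics from keywords scored with a class-based TF-IDF. The sketch below is a simplified standard-library imitation of that idea, not BERTopic's own implementation or API; the function name, the tokenizer, and the exact weighting are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ctfidf_keywords(docs, labels, top_n=3):
    """Return {cluster_label: top keywords} using a class-based TF-IDF.

    Each cluster is treated as one big document. A term scores highly when
    it is frequent inside the cluster but rare across the whole corpus.
    """
    per_cluster = {}
    for doc, label in zip(docs, labels):
        tokens = re.findall(r"[a-z0-9]+", doc.lower())
        per_cluster.setdefault(label, Counter()).update(tokens)
    if not per_cluster:
        return {}

    # Corpus-wide term counts and the average number of words per cluster.
    total = Counter()
    for counts in per_cluster.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_cluster)

    keywords = {}
    for label, counts in per_cluster.items():
        n_words = sum(counts.values())
        scores = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[label] = ranked[:top_n]
    return keywords
```

Keywords like these need no LLM call, and they could also be passed to the summarizer as hints alongside an un-embedded sentence.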

@enjalot added this to the 2.0 milestone on Sep 20, 2024

2 participants