It would be good to support trained UMAPs that are linked to a particular embedding model. Similar to the way we train an SAE to "unfold" a representative set of directions in the embedding model's latent space, we could train a representative UMAP by trying to cover the space of the model.
The quality and the size of that UMAP may need to be explored, but it seems worth it to allow the user to map any dataset using a supported embedding model (like nomic-embed-text-v1.5) to the same 2D space. This would let you quickly see which parts of the model space "light up" for a given dataset.
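A sketch of the fit-once / transform-many workflow this would enable. umap-learn follows scikit-learn's fit/transform API, so the same pattern applies directly; here PCA stands in for UMAP so the sketch runs without umap-learn installed, and the vectors are random placeholders rather than real nomic-embed-text-v1.5 embeddings:

```python
import numpy as np
from sklearn.decomposition import PCA  # stand-in reducer; umap.UMAP exposes the same fit/transform API

rng = np.random.default_rng(0)

# Placeholder for a representative sample covering the embedding
# model's latent space (e.g. 768-dim nomic-embed-text-v1.5 vectors).
representative = rng.normal(size=(1000, 768))

# Train the reducer once on the representative sample; it would be
# persisted alongside the embedding model it is linked to.
reducer = PCA(n_components=2).fit(representative)

# Any later dataset embedded with the same model can be projected into
# the same fixed 2D space, showing which regions "light up" for it.
new_dataset = rng.normal(size=(50, 768))
coords = reducer.transform(new_dataset)
print(coords.shape)  # (50, 2)
```

The key design point is that `fit` happens once per embedding model, while `transform` is cheap and repeatable per dataset, so every dataset lands in the same shared 2D coordinate frame.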
See https://enjalot.github.io/latent-taxonomy/ for an example of a UMAP calculated on the top activating samples for SAE features.