This repo contains tools for interpreting protein language models using sparse autoencoders (SAEs). Our SAE visualizer is available at interprot.com and our SAE model weights are on HuggingFace. For more information, check out our preprint.
`viz` contains the frontend app for visualizing SAE features. `interprot` is a Python package for SAE training, evaluation, and interpretation.
Check out this demo notebook for SAE inference with a custom input sequence.
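For a rough picture of what the notebook does, here is a minimal sketch that extracts per-residue ESM-2 activations with the `transformers` library and passes them through a stand-in SAE encoder. The layer index, SAE dimension, and the untrained linear encoder are placeholders, not the repo's actual API; the notebook loads the trained SAE weights from HuggingFace.

```python
# Sketch of SAE inference on a custom sequence (hypothetical names; the real
# loading code lives in the demo notebook and the interprot package).
import torch
from transformers import AutoTokenizer, EsmModel

# 1. Embed the sequence with ESM-2 and grab hidden states from one layer.
model_name = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(model_name)
esm = EsmModel.from_pretrained(model_name)
esm.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # any protein sequence
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    hidden_states = esm(**inputs, output_hidden_states=True).hidden_states

layer = 24  # assumption: pick the layer the SAE was trained on
acts = hidden_states[layer].squeeze(0)  # (num_tokens, d_model)

# 2. Encode with the SAE. Placeholder: an untrained linear map + ReLU stands in
#    for the trained SAE encoder; load the real weights from HuggingFace instead.
d_model, d_sae = acts.shape[-1], 4096  # d_sae is an assumption
encoder = torch.nn.Linear(d_model, d_sae)
latents = torch.relu(encoder(acts))  # per-residue latent activations

print(latents.shape)  # one row of latent activations per token
```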
The visualizer is a React app backed by RunPod serverless functions that serve our SAEs. To run it locally:
```bash
cd viz
pnpm install
pnpm run dev
```
The RunPod serverless functions live in their own repos:
- SAE inference: https://github.com/liambai/sae-inference
- SAE steering: https://github.com/liambai/sae-steering
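For reference, RunPod serverless endpoints are invoked over HTTP; the sketch below shows the general request shape. The endpoint ID and input payload are placeholders, and the actual request schema used by the visualizer is defined in the repos above.

```python
# Sketch of calling a RunPod serverless endpoint (endpoint ID and input schema
# are placeholders; see the sae-inference repo for the real request format).
import os
import requests

ENDPOINT_ID = "your-endpoint-id"        # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]  # your RunPod API key

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"}},  # assumed payload
    timeout=120,
)
print(response.json())
```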
The visualizer and several of our analysis scripts require generating files (referred to as visualization files) that summarize each SAE latent.
- Generate the visualization files using `interprot/make_viz_files/__main__.py`.
- Compute family specificity using `interprot/scripts/run_compute_family_specificity.py`.
- Classify latents by activation pattern using `interprot/scripts/run_viz_file_analysis.py`. This also computes many more statistics about the latents.
The input sequences to the visualization file generation script can be found here.
Install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

Build and run the development Docker container:

```bash
docker compose build
docker compose run --rm interprot bash
```
We find linear probes over SAE latents to be a powerful tool for uncovering interpretable features. Here's a demo notebook on the subcellular localization classification task.
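As a rough illustration of the idea, the sketch below fits a logistic-regression probe on a matrix of per-protein SAE latent activations. The data is random placeholder data, and the L1 penalty is just one way to encourage a sparse probe; the demo notebook shows the actual pipeline for the subcellular localization task.

```python
# Sketch of a linear probe over SAE latents (assumes you already have
# per-protein latent activations X and labels y; see the demo notebook
# for how to produce them).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 4096))       # placeholder: pooled SAE latents per protein
y = rng.integers(0, 2, size=500)  # placeholder: binary localization labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000, penalty="l1", solver="liblinear")
probe.fit(X_train, y_train)
print("test accuracy:", probe.score(X_test, y_test))

# The largest-magnitude coefficients point to the latents most predictive of the label.
top_latents = np.argsort(-np.abs(probe.coef_[0]))[:10]
print("most predictive latents:", top_latents)
```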