FREUD (Feature Retrieval, Editing and Understanding for Developers) is a codebase for discovering and analyzing intermediate activations in audio models. It provides:
- Code for training sparse autoencoders on audio model activations
- An interactive GUI for inspecting base model activations as well as learned autoencoder features
Currently, it is compatible with OpenAI's Whisper family of models.
Checkpoints with the corresponding training run logs are available on Huggingface.
You can demo the GUI here. Input an MLP neuron index to see mel spectrograms of audio that strongly activate that neuron, with its activation values overlaid; for example, strong activations for index 0 correspond to an "m" phoneme. You can also record or upload a short audio clip and see which features it activates most strongly.
- Create a virtual env. I used conda and python 3.10:
conda create -n whisper-interp python=3.10
- Activate your virtual env and install pytorch:
conda activate whisper-interp; conda install pytorch -c pytorch
- Install the rest of the dependencies:
pip install -r requirements.txt
- Download the LibriSpeech datasets:
python -m src.scripts.download_audio_datasets
- Running the GUI requires NodeJS; install it if it isn't already on your machine. Once installed, `cd` into the `gui` directory and run `npm install` to install the GUI dependencies.
- Install ffmpeg if it isn't already installed on your machine.
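If you want a quick sanity check that the environment came together (optional, and a generic sketch rather than anything shipped with this repo), a few lines of Python can confirm that PyTorch, ffmpeg, and NodeJS are all reachable:

```python
# Optional post-install sanity check: verify PyTorch is importable and that
# ffmpeg / NodeJS are on the PATH of the activated environment.
import shutil

import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
for tool in ("ffmpeg", "node", "npm"):
    print(tool, "->", shutil.which(tool) or "NOT FOUND")
```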
- Config files in `configs/features` and `configs/train` specify `"cuda"` as the device, but you can change this field to `"cpu"` if you're not using an NVIDIA GPU (see the sketch after this list).
- If you're low on disk space:
  - You can set the `collect_max` parameter for configs in `configs/features` to an integer amount to save activations for only that many files.
  - Collection steps are optional, at the cost of slower feature search and training:
    - For feature search, omit the `--from_disk` flag.
    - For training, set the `from_disk` field in the training config to `false`.
- Once you've started the GUI webserver, `cd` into the `gui` directory and run the command `npm run start`. The GUI will be displayed at `http://localhost:3000/`. The client code assumes that the GUI server is running on port 5555 of `localhost`, which will not be the case if you are running the server on a remote machine. In that case, edit the file `gui/src/ActivationDisplay.js` to set `API_BASE_URL` to the correct remote URL.
- If you find activation search too slow, set the `--files_to_search` flag of `src.scripts.gui_server` to N in order to search through only N files in the dataset.
- I look for interesting features by inputting clips to the Upload Audio tab of the GUI, noting the top feature indexes for the uploaded clip, and then checking whether the pattern holds for files returned by Activation Search for those indexes. I've found this to be more productive (and fun!) than browsing indexes at random.
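If you want to apply the device and `collect_max` tweaks from the notes above to every feature config at once, a short script along these lines can do it. This is only a sketch: it assumes the configs are flat JSON files with top-level `device` and `collect_max` fields, so check the actual files in `configs/features` before running it.

```python
# Sketch: switch every feature config to CPU and cap activation collection.
# Assumes flat JSON configs with top-level "device" and "collect_max" fields;
# verify against the actual files in configs/features before running.
import json
from pathlib import Path

for path in Path("configs/features").glob("*.json"):
    config = json.loads(path.read_text())
    if config.get("device") == "cuda":
        config["device"] = "cpu"   # no NVIDIA GPU available
    config["collect_max"] = 1000   # cache activations for at most 1000 files
    path.write_text(json.dumps(config, indent=2))
    print(f"updated {path}")
```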
According to previous results, "neurons in the MLP layers of the encoder are highly interpretable." Follow the steps in this section to replicate the results of section 1.1 of the linked post.
- Collect MLP activations from the speech dataset:
python -m src.scripts.collect_activations --config configs/features/tiny_block_2_mlp_1_test.json
- Start the GUI server and follow step 3 of General Notes to view activations:
python -m src.scripts.gui_server --config configs/features/tiny_block_2_mlp_1.json --from_disk
Interesting things to note:
- The top activations for the first 50 MLP neurons follow the pattern laid out in the linked section's table. However, if you look at strongly negative activations by setting the activation value to 0 and checking "use absolute value", you'll see that the most strongly negative activations also appear to follow the same pattern!
- When MLP neurons correspond to features, those features tend to be phonetic rather than anything of broader semantic meaning. A few potential exceptions:
- 1110 activates before pauses between words where you would expect a comma to appear in the transcript
- 38 appears to activate most strongly at the start of an exclamation?
If you like, you can repeat the same steps above on the residual stream output of block 2 rather than just MLP activations. As per section 1.2 of the link, looking at single indices for these activations will fail to yield human-comprehensible features, though some do have maybe-interesting activation patterns:
- 85 alternates high and low on the scale of 0.7 seconds
- 232 has a strong negative activation roughly equidistant between strong positive activations at the silence before speech
These steps will train a sparse autoencoder dictionary for block 2 of Whisper Tiny, following the autoencoder architecture of Interpreting OpenAI's Whisper.
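For orientation, here is a minimal sketch of an L1-regularized sparse autoencoder of this general kind. The layer sizes and L1 coefficient are placeholder assumptions, and this is not the exact architecture or training loop used by `src.scripts.train_sae`; it only illustrates the reconstruction-plus-sparsity objective.

```python
# Illustrative L1-regularized SAE (placeholder sizes, not the repo's exact code).
# Whisper Tiny's residual width is 384; the expansion factor and l1_coeff are
# assumptions for the sake of the example.
import torch
import torch.nn as nn

class L1SAE(nn.Module):
    def __init__(self, d_model: int = 384, n_features: int = 384 * 8):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        latents = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(latents)
        return recon, latents

def sae_loss(recon, latents, x, l1_coeff: float = 1e-3):
    mse = (recon - x).pow(2).mean()        # reconstruction term
    l1 = latents.abs().sum(dim=-1).mean()  # sparsity penalty on the dictionary
    return mse + l1_coeff * l1
```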
- Collect block 2 activations for the train, validation and test datasets:
python -m src.scripts.collect_activations --config configs/features/tiny_block_2_train; python -m src.scripts.collect_activations --config configs/features/tiny_block_2_dev; python -m src.scripts.collect_activations --config configs/features/tiny_block_2_test;
- Train a SAE:
python -m src.scripts.train_sae --config configs/train/tiny_l1.json
- TensorBoard training logs and checkpoints will be saved to the `runs/` directory
- Once the run has completed to your satisfaction, you can collect trained SAE activations:
python -m src.scripts.collect_activations --config configs/features/tiny_l1_sae.json
- Start the GUI server and follow step 3 of General Notes to view activations:
python -m src.scripts.gui_server --config configs/features/tiny_l1_sae.json --from_disk
These steps will train a sparse autoencoder based on EleutherAI's implementation of TopK autoencoders. It uses the TopK activation and AuxK loss introduced by Gao et al. 2024 in order to combat dead dictionary entries.
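As a rough illustration of the idea (not EleutherAI's or this repo's code), a TopK SAE keeps only the k largest latent pre-activations per example and zeroes the rest, enforcing sparsity directly rather than through an L1 penalty; the AuxK term then trains otherwise-dead latents to reconstruct the residual error. The sketch below uses assumed sizes and omits AuxK for brevity.

```python
# Illustrative TopK activation inside an SAE (assumed sizes; AuxK omitted).
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    def __init__(self, d_model: int = 384, n_features: int = 384 * 8, k: int = 32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        pre_acts = self.encoder(x)
        # Keep only the k largest pre-activations per example; zero out the rest.
        topk = torch.topk(pre_acts, self.k, dim=-1)
        latents = torch.zeros_like(pre_acts).scatter(-1, topk.indices, topk.values)
        recon = self.decoder(latents)
        return recon, latents
```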
- Follow step 1 of the section above to (optionally) collect block 2 activations.
- Train a SAE:
python -m src.scripts.train_sae --config configs/train/tiny_topk.json
- See the step 2 note above for logging and checkpoint information
- After the run's completion, collect trained SAE activations:
python -m src.scripts.collect_activations --config configs/features/tiny_topk_sae.json
- Start the GUI server and follow step 3 of General Notes to view activations:
python -m src.scripts.gui_server --config configs/features/tiny_topk_sae.json --from_disk
- Collect block 16 activations for the train and validation datasets:
python -m src.scripts.collect_activations --config configs/features/large_v3_block_16_train_10k.json; python -m src.scripts.collect_activations --config configs/features/large_v3_block_16_dev
- To economize disk space, `configs/features/large_v3_block_16_train_10k.json` only collects activations for 10,000 files, but you can alter that number in the config as you wish (or omit caching activations to disk altogether; see General Note 2). A rough disk estimate is sketched after this list.
- Once the run has completed, collect trained SAE activations:
python -m src.scripts.collect_activations --config configs/features/large_v3_l1_sae.json
- Start the GUI server and follow step 3 of General Notes to view activations:
python -m src.scripts.gui_server --config configs/features/large_v3_l1_sae.json --from_disk
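For a back-of-envelope sense of why the 10,000-file cap helps: Whisper large-v3's encoder emits 1500 frames of width 1280 per 30-second input, so caching full-precision activations adds up quickly. The figures below assume float32 per-frame activations are stored, which may not match the repo's actual on-disk format.

```python
# Back-of-envelope disk estimate (assumes full per-frame float32 activations
# are cached; the repo's actual storage format may differ).
frames_per_file = 1500   # Whisper encoder output frames per 30 s clip
d_model = 1280           # Whisper large-v3 hidden size
bytes_per_value = 4      # float32
n_files = 10_000

per_file = frames_per_file * d_model * bytes_per_value
total_gb = per_file * n_files / 1e9
print(f"~{per_file / 1e6:.1f} MB per file, ~{total_gb:.0f} GB for {n_files} files")
# ~7.7 MB per file, ~77 GB for 10000 files
```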
Gong et al. 2023 demonstrated that, unlike most ASR models, Whisper Large encodes information about background noise deep into its intermediate representations. Following the paper, we train on the AudioSet dataset and test on ESC-50. I found L1-regularized SAE training to be unstable, so I trained a TopK SAE instead.
- Download the AudioSet and ESC-50 datasets:
python -m src.scripts.download_audio_datasets --dataset audioset; python -m src.scripts.download_audio_datasets --dataset esc-50
- Collect activations:
python -m src.scripts.collect_activations --config configs/features/large_v1_block_16_audioset_train.json
- Train a SAE:
python -m src.scripts.train_sae --config configs/train/tiny_topk.json
- Collect SAE activations:
python -m src.scripts.collect_activations --config configs/features/topk_large_v1_whisper-at.json
- Start the GUI server and follow step 3 of General Notes to view activations:
python -m src.scripts.gui_server --config configs/features/topk_large_v1_whisper-at.json --from_disk