InterpTools is intended to be a growing toolkit for doing interpretability work with LLM-style models. As we identify new tools we need, we will keep adding them, with the goal of making it as simple, instructive, and fun as possible to extract, analyze, and modify model internals.
At this point, I am relying largely on the transformer_lens Python library. Documentation here:
Here are a few tutorial Colab notebooks that should also help you get started with TransformerLens and model interpretability more generally: