Use AI to generate guides to code repositories.
NEO: Can you fly that thing?
TRINITY: Not yet. Tank, I need a pilot program for a military B-212 helicopter. Hurry!
[seconds later ...]
TRINITY: Let's go.

— The Matrix (1999)
You can see the output of repo-guide on its own repository at https://wolfmanstout.github.io/repo-guide/. This is automatically generated and published after every release. As of 1/4/2025, this consumes under 25K tokens on each run and costs less than 1 cent on Gemini 1.5 Flash (in fact it's using Gemini 2.0 Experimental, which is currently free).
NOTE: The guides generated by repo-guide are designed to complement, not replace, human-authored documentation. This project aims to make open source contribution more accessible by providing detailed guides that go beyond what's practical for human authors to maintain. We intend to empower the end user, and never be perceived as AI slop. Every page of documentation includes an "AI-generated" notice in the footer for full transparency.
For more on why I built this, check out my blog post: Repo-guide: Mapping Code Repositories with AI.
Install this tool using `pip`, `pipx`, or `uv tool install`, e.g.:

```shell
pip install repo-guide
```
By default it uses Gemini Flash as the AI model, which requires an API key. You can either set the `LLM_GEMINI_KEY` environment variable or install Simon Willison's LLM command-line tool and use `llm keys set gemini` to store your key in a configuration file.
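The two key-configuration options above, spelled out. The key value here is a placeholder, and `llm keys set gemini` prompts interactively for the real key:

```shell
# Option 1: environment variable (placeholder value shown; use your real key)
export LLM_GEMINI_KEY="your-key-here"

# Option 2: store it with Simon Willison's llm tool instead
# llm keys set gemini   # prompts for the key and saves it to llm's config file
```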
DISCLAIMER: LLM API calls may cost money. Although this tool displays token counts and provides methods to limit token usage, you are ultimately responsible for any costs incurred, including costs that may be higher than expected due to bugs in this tool. Consider setting hard limits or other protections in your API accounts where possible.
This tool currently only supports Git repositories, with some additional features for GitHub repositories (e.g. links to files).
Typical usage:
```shell
repo-guide <path_to_cloned_repo_or_subdirectory>
```

This will create a `generated_docs` directory within the current directory, populate it with an AI-generated Markdown guide, then run a private MkDocs server at `localhost:8000` to serve the docs.
It will show a progress bar as it generates docs, including how many tokens the model has used (combining input + output). You can start viewing the docs immediately, and the page will automatically reload as new docs are generated.

If you kill the server and need to restart it later, by default it will reuse any previously generated Markdown files, so you can simply rerun the same command. You can also add `--no-resume` to delete and regenerate the files, or `--no-gen` to explicitly disable doc generation (e.g. even if a new directory has been added).
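The restart options above, side by side. This is a sketch assuming `repo-guide` is installed and `my_repo` is a local clone:

```shell
repo-guide my_repo              # resume: reuse existing Markdown, generate only what's missing
repo-guide my_repo --no-resume  # start over: delete and regenerate the generated files
repo-guide my_repo --no-gen     # serve only: never generate, even for newly added directories
```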
If you wish to deploy the generated guide, add either `--build` or `--gh-deploy`. The former will simply build a static HTML site in `generated_docs/site` that you can copy to any host, and the latter will build and deploy to GitHub Pages, as described in the MkDocs documentation: Deploying your docs. For example, you can fork a repo, then run `repo-guide my_fork --gh-deploy` to deploy to your fork's GitHub Pages (https://username.github.io/my_fork). The first time you do this, you will also need to navigate to your fork's settings on GitHub, click Pages, then choose "Deploy from branch" and "gh-pages" as the branch.
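The fork-and-deploy flow above as one command sequence. This is a sketch: `username/my_fork` is a placeholder for your own fork, and the Pages branch setting remains a one-time manual step in the GitHub web UI:

```shell
git clone https://github.com/username/my_fork  # your fork (placeholder name)
repo-guide my_fork --gh-deploy                 # generate docs, build, and push to the gh-pages branch
# First time only: in the fork's settings on GitHub, open Pages and
# choose "Deploy from branch" with "gh-pages" as the branch.
```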
If you wish to customize the MkDocs flags used for serving or deploying your guide, you can add `--no-serve` when building the guide and run MkDocs commands directly. You'll need to install MkDocs and the necessary dependencies (e.g. with `uv tool install mkdocs --with mkdocs-material,bleach,bleach-allowlist`), then you can run commands like `mkdocs serve -f generated_docs/mkdocs.yml`. If you deploy to GitHub Pages, add `--rename-dot-github` when running repo-guide so that any documentation files generated for a `.github` directory are put into `_github` instead of `.github`, which would otherwise not be served.
Here are some of the most common flags you may want to use:
- `--output-dir`: Change where the generated docs are written.
- `-v` or `--verbose`: Print details on doc generation progress instead of a progress bar.
- `--model`: Set the LLM model to use. As of 2/5/2025, the default is Gemini 2.0 Flash, which costs a dollar for 10 million tokens and has a 1 million token context window, making it a great fit. You can try other Gemini models, or OpenAI models if `OPENAI_API_KEY` is set, as supported by simonw/llm and simonw/llm-gemini.
- `--token-budget`: Set an approximate token budget to avoid overspending. Tokens are counted after each LLM call, so the actual number may be higher.
- `--custom-instructions` and `--custom-instructions-file`: Use either of these to append custom instructions to the system prompt. Let me know if you come up with something that significantly improves the general result quality!
For a full description of command-line flags, run:

```shell
repo-guide --help
```

You can also use:

```shell
python -m repo_guide --help
```
If the command fails, either due to an error or hitting the token budget, simply rerun the command and it will resume and retry (unless `--no-resume` is applied). Most common model errors (e.g. rate limiting) should be automatically retried with exponential backoff. You can `--ignore` large generated or binary files that aren't automatically filtered out (the tool automatically respects `.gitignore` files and ignores files annotated by `git ls-files --eol` as non-text). If you still hit the model token limit, try setting `--files-token-limit`, which is applied per-directory.
LLMs are unpredictable, and the generated Markdown may contain errors and broken links. The system prompt tries to mitigate common issues, but they happen anyway. The only real fix will be better models.
To contribute to this tool, use uv. The following command will establish the venv and run the tests:

```shell
uv run pytest
```

To run repo-guide locally, use:

```shell
uv run repo-guide
```