Documenting a project can be tedious, especially as the project grows. At the same time, understanding the structure of a large project is a useful resource when starting a new one: you can use an existing structure as a template, reuse specific parts of it, or improve on it. To automate this work, I had the idea of using an LLM to track the dependencies between functions across a project's scripts. Once all the dependencies are tracked, the LLM produces a directed acyclic graph (DAG) of them in the form of a script in the Mermaid language.
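To give an idea of what the target output looks like, here is a minimal sketch that renders a list of dependencies as a Mermaid flowchart. The function names in the example are hypothetical, and in the project itself the Mermaid script is written by the LLM, not by code like this.

```python
def to_mermaid(edges):
    """Render (caller, callee) pairs as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for caller, callee in edges:
        # Each dependency becomes one arrow: caller --> callee
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


# Hypothetical dependencies, for illustration only
edges = [
    ("main", "load_scripts"),
    ("main", "create_dag"),
    ("create_dag", "call_llm"),
]
print(to_mermaid(edges))
```

The printed text can be pasted into any Mermaid renderer to display the graph.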
The project is divided into two parts:
- extract_an_process: Extracts the python files from a list of paths, reads them as strings and stacks them together.
- create_dag: Reads the stacked python scripts as a single string and, with the help of the Langchain library and the OpenAI API, generates a script in the Mermaid language describing an acyclic graph of the interactions between the functions and objects within their modules. This graph reflects the inner structure of the project the python scripts belong to.
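The extraction step described above could be sketched as follows. This is an assumption about how such a step might work, not the project's actual code: the function name `stack_scripts` and the header format placed before each file are hypothetical.

```python
from pathlib import Path


def stack_scripts(paths):
    """Read each .py file and join the sources into one string.

    Each source is preceded by a comment header naming its path, so that
    a reader (or an LLM) can tell which module a function belongs to.
    """
    chunks = []
    for raw in paths:
        path = Path(raw)
        if path.suffix != ".py":
            continue  # skip anything that is not a python file
        source = path.read_text(encoding="utf-8")
        chunks.append(f"# --- {path} ---\n{source}")
    return "\n\n".join(chunks)
```

Keeping the file path next to each source is a design choice of this sketch: without it, the module boundaries would be lost once the files are stacked.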
These important files are also used to customize the behavior of the app:
- params.py: Contains the dict of OpenAI model parameters, such as the GPT model version (currently GPT 4o) and the temperature, which controls the variability of the model output (currently 0.7). You can change these arguments and add others.
- prepromt.py: Contains the preprompt, i.e. the instructions the GPT model follows to understand the scripts and generate the desired output.
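As an illustration, a parameter dict of the kind described above might look like the sketch below. The variable name `MODEL_PARAMS`, the key names and the helper `with_overrides` are assumptions; only the model version and the temperature value come from the description.

```python
# Hypothetical shape of the parameter dict; the key names are assumptions.
MODEL_PARAMS = {
    "model": "gpt-4o",    # GPT model version
    "temperature": 0.7,   # higher values give more varied output
}


def with_overrides(params, **overrides):
    """Return a copy of params with some values replaced or added."""
    return {**params, **overrides}


# Example: a less variable configuration for more repeatable graphs
low_variability = with_overrides(MODEL_PARAMS, temperature=0.0)
```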
The main tool used in this project is Langchain for Python, a library for interacting with LLM APIs or local LLMs and for building applications around the capabilities of LLMs.
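A minimal sketch of such an interaction is shown below, assuming the `langchain-openai` package is installed and `OPENAI_API_KEY` is set in the environment. The helper names `build_messages` and `generate_mermaid` are hypothetical, and the project's own code may be organised differently.

```python
def build_messages(preprompt, stacked_scripts):
    """Pair the instructions (system role) with the stacked source (human role)."""
    return [("system", preprompt), ("human", stacked_scripts)]


def generate_mermaid(preprompt, stacked_scripts, model="gpt-4o", temperature=0.7):
    """Send the messages to an OpenAI chat model and return the reply text.

    Assumes langchain-openai is installed and OPENAI_API_KEY is available.
    """
    # Imported here so the rest of the module loads without the package
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model, temperature=temperature)
    reply = llm.invoke(build_messages(preprompt, stacked_scripts))
    return reply.content
```

The messages are passed as plain (role, content) pairs, not through a prompt template, so that curly braces in the python sources are not mistaken for template variables.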
Follow these steps:
- Clone the current repository using
gh repo clone https://github.com/DridrM/langchain-auto-dag-app/tree/master
- Create an OpenAI account and create an OPENAI_API_KEY
- Create a .env file inside the project directory and paste the key into it under the name OPENAI_API_KEY (with this exact name)
- Install pyenv and poetry
- Install direnv and hook it into your .bashrc or .zshrc
- Install python 3.12 with pyenv
- Make python 3.12 local to your project directory
- Tell poetry to use python 3.12 with this command in your terminal (linux systems):
poetry env use $(pyenv which python)
- Create the virtual environment with
poetry shell
- Inside the shell, install all the requirements and the project configuration with
poetry install
- Leave the shell with
exit
and run these commands:
chmod +x install.sh
chmod +x auto_dag.sh
The install.sh file is used to create the command line interface. The auto_dag.sh file is the interface between the command line tool and the main python script.
- Execute the install.sh file with
bash install.sh
or
./install.sh
You are now ready to use the command line tool.
Simply type in your shell
autodag path/to/script_1 path/to/script_2 ... path/to/script_n
with the paths to the python scripts of your project.
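For illustration, a python entry point accepting one or more script paths in this way could be sketched with the standard argparse module. This is an assumption about how the arguments might be handled; in the project the command goes through auto_dag.sh, and the function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse one or more script paths from the command line."""
    parser = argparse.ArgumentParser(
        prog="autodag",
        description="Generate a Mermaid dependency graph from python scripts.",
    )
    # nargs="+" requires at least one path and collects them into a list
    parser.add_argument("paths", nargs="+", help="paths to the python scripts")
    return parser.parse_args(argv)


# Example with hypothetical paths
args = parse_args(["path/to/script_1", "path/to/script_2"])
print(args.paths)
```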