Infollama is a Python server that manages a token-protected proxy for Ollama.
Infollama also retrieves and displays, in a real-time UI, useful details about the Ollama server, including available models, running models, file sizes, RAM usage, and more. It also provides hardware information, in particular GPU and RAM usage.
- Run a proxy to access your Ollama API server on localhost, LAN and WAN
- Protect your Ollama server with one token per user or usage
- Display useful details about the Ollama server (models, running models, sizes) and hardware information (CPU, GPUs, RAM and VRAM usage)
- Log Ollama API calls in a log file (HTTP log format) with different levels: NEVER, ERROR, INFO, PROMPT and ALL, including the full JSON prompt request
- Python 3.10 or higher
- Ollama server running on your local machine (See Ollama repository)
- Tested on Linux Ubuntu, Windows 10/11, and macOS with Apple Silicon (Mx) chips
- Clone the repository:

```bash
git clone https://github.com/toutjavascript/infollama-proxy.git
cd infollama-proxy
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

- Install the required dependencies:

```bash
pip install -r requirements.txt
```
Run the script with the following command:

```bash
python proxy.py
```

Open your browser and navigate to http://localhost:11430/info to access the Infollama Proxy web UI.
You can modify the launch configuration with these parameters:

```
usage: proxy.py [-h] [--base_url BASE_URL] [--host HOST] [--port PORT] [--cors CORS] [--anonym ANONYM] [--log LOG]

--base_url BASE_URL  The base_url of the localhost Ollama server (default: http://localhost:11434)
--host HOST          The host name for the proxy server (default: 0.0.0.0)
--port PORT          The port for the proxy server (default: 11430)
--cors CORS          The CORS policy for the proxy server (default: *)
--anonym ANONYM      Authorize anonymous access to the proxy server, without a token (default: False)
--log LOG            Define the log level stored in proxy.log (default: PROMPT; can be NEVER|ERROR|INFO|PROMPT|ALL)
```
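For example, to run the proxy on a different port with a less verbose log level (the values below are only illustrative):

```bash
python proxy.py --port 11431 --log INFO
```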
This repository is under heavy construction. To update the source code from GitHub, open a terminal in the infollama-proxy folder and pull the latest changes:

```bash
git pull
```
Infollama is not only a proxy server but also a powerful web UI that displays hardware status, such as GPU usage and temperatures, memory usage, and other information.

You can now use the proxy to chat with your Ollama server. Infollama works as an OpenAI-compatible LLM server; you must set the base URL to use port 11430:

- base_url is now http://localhost:11430/v1

Do not forget to provide a valid token, starting with `pro_`, as defined in the `users.conf` file:

- api_key = "pro_xxxxxxxxxxxxxx"
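As an illustration, here is a minimal sketch using the `openai` Python package pointed at the proxy (the model name and token are placeholders):

```python
from openai import OpenAI

# Point the OpenAI client at the Infollama proxy instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:11430/v1",
    api_key="pro_xxxxxxxxxxxxxx",  # a token defined in users.conf
)

# Chat with any model served by the Ollama instance behind the proxy
response = client.chat.completions.create(
    model="falcon3:1b",  # example model; use any model pulled in Ollama
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(response.choices[0].message.content)
```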
Token definitions are set in the `users.conf` file. On first launch, `users.conf` is created from the `users.default.conf` file. This text file lists the tokens line by line with this format:

```
user_type:user_name:token
```

- `user_type` can be `user` or `admin`. An `admin` user can access more APIs (pull, delete, copy, ...) and can view the full log file in the web UI.
- `user_name` is a simple text string.
- `token` is a string that must start with `pro_`.
- Parameters are separated with `:`.

If the `--anonym` parameter is set at startup, `users.conf` is ignored and all accesses are authorised. The user name is then set to `openbar`.
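For example, a `users.conf` file might look like this (the user names and tokens below are purely illustrative):

```
admin:alice:pro_a1b2c3d4e5f6
user:bob:pro_f6e5d4c3b2a1
```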
You can log every prompt that is sent to the server. Note that responses are not logged, to preserve privacy and disk space. This proxy app has several levels of logging:

- `NEVER`: No logs at all.
- `ERROR`: Log only errors and unauthorised requests.
- `INFO`: Log useful accesses (not api/ps, api/tags, ...), excluding prompts.
- `PROMPT`: Log useful accesses (not api/ps, api/tags, ...), including prompts.
- `ALL`: Log every event, including prompts.

By default, the level is set to `PROMPT`.
The log file uses the Apache server log format. For example, one line at the `PROMPT` level looks like this:

```
127.0.0.1 - user1 [16/Jan/2025:15:53:10] "STREAM /v1/chat/completions HTTP/1.1" 200 {'model': 'falcon3:1b', 'messages': [{'role': 'system', 'content': "You are a helpful web developer assistant and you obey to user's commands"}, {'role': 'user', 'content': ' Give me 10 python web servers. Tell me cons and pros. Conclude by choosing the easiest one. Do not write code.'}], 'stream': True, 'max_tokens': 1048}
```
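As a quick illustration, here is a minimal sketch (not part of the project) that counts proxy.log entries per user, assuming the Apache-like format shown above:

```python
from collections import Counter

# Tally requests per user from proxy.log.
# In the format above, the user name is the third whitespace-separated field.
counts = Counter()
with open("proxy.log", encoding="utf-8") as log:
    for line in log:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1

for user, total in counts.most_common():
    print(f"{user}: {total} requests")
```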
Correcting bugs and user issues is the priority.
- Add buttons to start and stop models
- Add dark/light display mode
- Secure token storage with HTTPOnly cookie or browser keychain if available
- Add a GPU database to compare LLM performances
- Create a more efficient installation process (Docker and .bat)
- Add a simple API that returns the current usage from the server (running models, hardware details, free available VRAM, ...)
- Add a web UI to view or export logs (by user or full log if admin is connected)
- Add integrated support for tunneling to web
- Add a fallback system to access another LLM provider if the current one is down
- Add an easy LLM speed benchmark
- Add a log file size checker
Because I needed two functionalities:

- Access to the Ollama server on the LAN and over the web. As Ollama is not protected by token access, I needed to manage it in a simple way.
- A real-time view of the Ollama server status
If you see the error message `Error get_device_info(): no module name 'distutils'`, try updating your install with:

```bash
pip install -U pip setuptools wheel
```
Fully tested with solutions like:

- ngrok

```bash
ngrok http http://localhost:11430
```

- bore.pub (but no SSL support)

```bash
bore local 11430 --to bore.pub
```
IF YOU OPEN INFOLLAMA OVER THE WEB, DO NOT FORGET TO CHANGE THE DEFAULT TOKENS IN THE `users.conf` FILE.

With web access, the diagram shows access from outside your LAN.
We welcome contributions from the community. Please feel free to open an issue or a pull request.