Many computers lack the powerful GPU needed to run large language models locally with tools like Ollama, which keeps many users from conveniences such as article optimization, meeting summary extraction, and English email composition. The method below lets a local Ollama client use Google Colab's free GPU for fast AI responses.
Naturally, the first step is to have a Google account. This is widely covered online, so we won't delve into the details here.
Visit the prepared Ollama.ipynb at .
Sign up for Ngrok (free) and copy your authtoken from the Ngrok Dashboard. Then, in code block 3 of the Colab notebook, replace token="Your Ngrok token" with your actual token.
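For context, the notebook's three steps boil down to roughly the following sketch. This assumes the notebook uses the pyngrok package; the cell layout and names here are illustrative, not the notebook's exact code:

```python
# Sketch of what such a Colab notebook typically does; not the exact code.
import os
import subprocess
from pyngrok import ngrok

# Step 1 (normally a shell cell): install Ollama inside the Colab VM.
# !curl -fsSL https://ollama.com/install.sh | sh

# Step 2: start the Ollama server in the background.
os.environ["OLLAMA_HOST"] = "0.0.0.0"          # accept non-local connections
server = subprocess.Popen(["ollama", "serve"])  # serves on port 11434

# Step 3 (code block 3): tunnel the Ollama port through Ngrok.
ngrok.set_auth_token("Your Ngrok token")        # <-- your actual token goes here
tunnel = ngrok.connect(11434, host_header="localhost:11434")
print(tunnel.public_url)                        # e.g. https://xxxxxxx.ngrok-free.app
```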
Choose the T4 GPU for your session.
Follow steps 1, 2, and 3 in the notebook. After completing step 3, you will receive a URL like https://xxxxxxx.ngrok-free.app.
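You can sanity-check the tunnel before configuring anything locally: Ollama's root endpoint answers with a short status string. A minimal check, assuming the requests package and the placeholder URL from step 3:

```python
import requests

# Replace with the URL printed by step 3 of the notebook.
OLLAMA_URL = "https://xxxxxxx.ngrok-free.app"

# The extra header skips Ngrok's free-tier browser warning page, if it appears.
resp = requests.get(OLLAMA_URL, headers={"ngrok-skip-browser-warning": "true"})
resp.raise_for_status()
print(resp.text)  # expected: "Ollama is running"
```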
Install Ollama from the Ollama Download Page; builds are available for macOS, Linux, and Windows.
On your computer, set the environment variable with export OLLAMA_HOST=https://xxxxxxx.ngrok-free.app/ (on Windows, use set in cmd or $env: in PowerShell instead of export).
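If you would rather not touch environment variables, the official ollama Python package (installed separately with pip install ollama) accepts the host explicitly. A minimal sketch, again with the placeholder URL:

```python
from ollama import Client

# Point the client at the Ngrok tunnel instead of the default localhost:11434.
client = Client(host="https://xxxxxxx.ngrok-free.app")

# Assumes the gemma model has already been pulled (see the next step).
reply = client.chat(
    model="gemma",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply["message"]["content"])
```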
Execute ollama run model_name, for example ollama run gemma, and wait for the model to load. Although it appears to run locally, inference actually happens on the remote Colab T4 GPU.
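Since everything goes over Ollama's HTTP API, you can also call the tunnel directly, for instance via the /api/generate endpoint. A sketch assuming the requests package and the gemma model pulled by the previous step:

```python
import requests

OLLAMA_URL = "https://xxxxxxx.ngrok-free.app"

# Non-streaming generation request against Ollama's REST API.
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "gemma",
        "prompt": "Summarize this meeting note in two sentences: ...",
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["response"])
```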
Now you can type in questions and get answers, or point other apps at Ollama, for example configuring the OpenAI-Translator tool to use Ollama, which removes the need for a VPN to reach ChatGPT and the risk of account bans.
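Many third-party tools speak the OpenAI API rather than Ollama's native one. Recent Ollama versions also expose an OpenAI-compatible endpoint under /v1, so such tools can be pointed at the tunnel too. A sketch assuming the openai Python package and an Ollama build with this compatibility layer:

```python
from openai import OpenAI

# Ollama ignores the API key, but the client requires one, so any string works.
client = OpenAI(
    base_url="https://xxxxxxx.ngrok-free.app/v1",
    api_key="ollama",
)

completion = client.chat.completions.create(
    model="gemma",
    messages=[{"role": "user", "content": "Draft a short, polite English email declining a meeting."}],
)
print(completion.choices[0].message.content)
```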
Note: the free tier of Google Colab limits GPU use to 12 hours per day. If you find this setup useful, consider the Pay As You Go option, which gives you 100 GPU compute units usable over 90 days.