This sample demonstrates how to deploy Ollama with Defang, along with a Next.js frontend using the AI SDK for smooth streaming conversations. By default it runs a very small model (`llama3.2:1b`) which can perform well with just a CPU, but we've included lines that you can uncomment in the compose file to enable GPU support and run a larger model like `gemma:7b`. If you want to deploy to a GPU-powered instance, you will need to use your own AWS account with Defang BYOC.
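To give a sense of what uncommenting those lines enables, here is a minimal sketch of an Ollama service with a GPU reservation. The image, port, and `MODEL` variable are illustrative assumptions, so follow the comments in this sample's actual compose file rather than this sketch.

```yaml
# Hypothetical sketch only; this sample's compose file may differ.
services:
  ollama:
    image: ollama/ollama        # assumed image, for illustration
    ports:
      - "11434:11434"           # Ollama's default API port
    environment:
      - MODEL=gemma:7b          # assumed variable; switches from llama3.2:1b
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia    # standard Compose GPU reservation syntax
              count: 1
              capabilities: [gpu]
```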
## Prerequisites

- Download the Defang CLI
- (Optional) An AWS account authenticated for Defang BYOC, if you plan to deploy to your own cloud
- (Optional, for local development) The Docker CLI
## Development

To run the application locally, you can use the following command:

```bash
docker compose -f compose.dev.yaml up
```
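Once the stack is up, the Next.js frontend streams tokens from Ollama through the AI SDK. The sketch below shows the general shape of such a streaming chat component; it is not this sample's actual code, and the `/api/chat` route is an assumption.

```tsx
'use client';
// Minimal streaming chat sketch built on the AI SDK's useChat hook.
// Posts messages to /api/chat (assumed route) and renders tokens as they arrive.
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <p key={m.id}>
          {m.role}: {m.content}
        </p>
      ))}
      <input value={input} onChange={handleInputChange} placeholder="Say something..." />
    </form>
  );
}
```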
## Deployment

> Note: Make sure you have downloaded the Defang CLI first.

Deploy your application to the Defang Playground by opening up your terminal and typing `defang up`.

Keep in mind that the Playground does not support GPU instances.
If you want to deploy to your own cloud account, you can use Defang BYOC:

- Authenticate your AWS account, and make sure you have properly set environment variables like `AWS_PROFILE`, `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`.
- Run `defang up` in a terminal that has access to your AWS environment variables.
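For example, in a shell session (placeholder values; substitute your own profile and region):

```bash
# Placeholder values for illustration; use your own AWS profile and region.
export AWS_PROFILE=my-profile
export AWS_REGION=us-west-2
defang up
```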
---

Title: Ollama
Short Description: Ollama is a tool that lets you easily run large language models.
Tags: AI, LLM, ML, Llama, Mistral, Next.js, AI SDK
Languages: TypeScript