From 67911a1879b9c8d42e53162c51de8e13f96969a7 Mon Sep 17 00:00:00 2001
From: Vikramjeet Singh <72499426+VikramxD@users.noreply.github.com>
Date: Tue, 3 Dec 2024 13:02:29 +0530
Subject: [PATCH] Update README.md
---
README.md | 119 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 92 insertions(+), 27 deletions(-)
diff --git a/README.md b/README.md
index 82760ff..dfd49cc 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
MinMochi
-Minimalist API Server for Mochi Text-to-Video Generation
+Minimalist API Server for Mochi and LTX Text-to-Video Generation
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -11,14 +11,16 @@
## 🚀 Overview
-**MinMochi** serves the Genmo Mochi text-to-video model as a production-ready API. Generate high-quality videos from text prompts with minimal setup.
+**MinMochi** serves both the Genmo Mochi and Lightricks LTX text-to-video models as a production-ready API. Generate high-quality videos from text prompts with minimal setup.
## 🛠️ System Requirements
- 🐍 Python 3.10+
- 🎮 GPU Requirements:
- Recommended: NVIDIA A100 or H100
- - Suitable: NVIDIA A6000 or A40
+ - Minimum: NVIDIA A6000 or A40
+ - Mochi: 16GB+ VRAM
+ - LTX: 24GB+ VRAM
- ☁️ Active AWS account
- 🐳 Docker
@@ -38,7 +40,7 @@ uv pip install -e . --no-build-isolation
## ⚙️ Configuration
-MinMochi uses Pydantic settings for configuration management. The configuration is split into three main modules:
+MinMochi uses Pydantic settings for configuration management. The configuration is split into the following modules:
### 1. Mochi Settings (`mochi_settings.py`)
```python
@@ -63,7 +65,23 @@ num_frames = 150
fps = 10
```
-### 2. AWS Settings (`aws_settings.py`)
+### 2. LTX Settings (`ltx_settings.py`)
+```python
+# Default settings; override with LTX_-prefixed environment variables
+model_name = "LTX-Video"
+ckpt_dir = "checkpoints" # Directory containing model components
+device = "cuda"
+
+# Video Generation Settings
+num_inference_steps = 40
+guidance_scale = 3.0
+height = 480
+width = 704
+num_frames = 121
+frame_rate = 25
+```
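+
+For example, the defaults above can be overridden through the environment before the server starts (a minimal sketch; the `LTX_` prefix comes from the settings module, but the exact variable names and casing are assumptions based on pydantic-settings conventions):
+
+```python
+# Sketch: override LTX defaults via LTX_-prefixed environment variables
+# before launching the server. The variable names below are assumptions.
+import os
+import subprocess
+
+env = os.environ.copy()
+env["LTX_NUM_INFERENCE_STEPS"] = "30"  # favor speed over quality
+env["LTX_GUIDANCE_SCALE"] = "3.5"      # upper end of the recommended range
+subprocess.run(["python", "api/ltx_serve.py"], env=env)
+```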
+
+### 3. AWS Settings (`aws_settings.py`)
```python
# Override with environment variables
AWS_ACCESS_KEY_ID = ""
@@ -72,34 +90,47 @@ AWS_REGION = "ap-south-1"
AWS_BUCKET_NAME = "diffusion-model-bucket"
```
-### 3. Model Weights Settings (`mochi_weights.py`)
-```python
-output_dir = Path("weights")
-repo_id = "genmo/mochi-1-preview"
-model_file = "dit.safetensors"
-decoder_file = "decoder.safetensors"
-encoder_file = "encoder.safetensors"
-dtype = "bf16" # Options: "fp16", "bf16"
+## 🎨 Prompt Engineering Guide
+
+### For LTX Model
+Structure your prompts around cinematic details, in this order:
+1. Start with main action
+2. Add specific movement details
+3. Describe visual elements precisely
+4. Include environment details
+5. Specify camera angles
+6. Describe lighting and colors
+
+Example LTX Prompt:
```
+A red maple leaf slowly falls through golden autumn sunlight in a serene forest. The leaf twirls and dances as it descends, casting delicate shadows. Sunbeams filter through trees, creating a warm, dappled lighting effect. The camera follows the leaf in a gentle downward tracking shot.
+```
+
+Parameter Guidelines (LTX):
+- Resolution: height and width must each be divisible by 32 (e.g., 480x704)
+- Frames: must satisfy the 8n+1 pattern (e.g., 121, 161)
+- Guidance scale: 3.0-3.5 recommended
+- Steps: 40+ for quality, 20-30 for speed
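+
+A quick sanity check for these constraints (an illustrative helper, not part of the API):
+
+```python
+# Illustrative check for the LTX parameter constraints listed above.
+def validate_ltx_params(height: int, width: int, num_frames: int) -> None:
+    if height % 32 or width % 32:
+        raise ValueError(f"Resolution {width}x{height} must be divisible by 32")
+    if num_frames % 8 != 1:
+        raise ValueError(f"num_frames must satisfy 8n+1 (e.g., 121, 161), got {num_frames}")
+
+validate_ltx_params(height=480, width=704, num_frames=121)  # passes
+```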
## 🎬 Usage
-### Launch Server
+### Launch Servers
```bash
-python src/api/mochi_serve.py
+# Launch Mochi Server
+python api/mochi_serve.py
+
+# Launch LTX Server
+python api/ltx_serve.py
```
### Generate Videos
+#### Mochi API
```python
import requests
-import json

url = "http://localhost:8000/api/v1/video/mochi"
payload = {
"prompt": "A beautiful sunset over the mountains",
- "negative_prompt": "",
"num_inference_steps": 100,
"guidance_scale": 7.5,
"height": 480,
@@ -107,11 +138,41 @@ payload = {
"num_frames": 150,
"fps": 10
}
+
+response = requests.post(url, json=payload)
+print(response.json())
+```
+
+#### LTX API
+```python
+import requests
+
+url = "http://localhost:8000/api/v1/video/ltx"
+payload = {
+ "prompt": "A red maple leaf slowly falls...",
+ "negative_prompt": "worst quality, inconsistent motion, blurry",
+ "num_inference_steps": 40,
+ "guidance_scale": 3.0,
+ "height": 480,
+ "width": 704,
+ "num_frames": 121,
+ "frame_rate": 25,
+ "seed": 42
+}
-response = requests.post(url, json=[payload])
+response = requests.post(url, json=payload)
print(response.json())
```
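+
+Generation can take several minutes at the default step counts, so a request timeout and explicit error handling are worth adding (a sketch; the 15-minute ceiling is an assumption, tune it to your hardware):
+
+```python
+# Sketch: add a timeout and error handling around the LTX request above.
+import requests
+
+url = "http://localhost:8000/api/v1/video/ltx"
+payload = {"prompt": "A red maple leaf slowly falls...", "num_frames": 121}
+
+try:
+    response = requests.post(url, json=payload, timeout=900)  # 15 min ceiling (assumption)
+    response.raise_for_status()
+    print(response.json())
+except requests.exceptions.Timeout:
+    print("Request timed out; try fewer frames or inference steps.")
+except requests.exceptions.HTTPError as err:
+    print(f"Server returned an error: {err}")
+```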
+### cURL Example (LTX)
+```bash
+curl -X POST http://localhost:8000/api/v1/video/ltx \
+-H "Content-Type: application/json" \
+-d '{
+ "prompt": "A red maple leaf slowly falls...",
+ "height": 480,
+ "width": 704,
+ "num_frames": 121,
+ "num_inference_steps": 40,
+ "guidance_scale": 3.0
+}'
+```
+
## 📊 Monitoring
### Metrics
@@ -124,18 +185,23 @@ Prometheus metrics available at `/metrics`:
- Structured logging with loguru
- Log rotation at 100MB
- 1-week retention period
-- Logs stored in `logs/api.log`
+- Logs stored in `logs/api.log` and `logs/ltx_api.log`
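+
+A quick smoke test for the metrics endpoint (a sketch; port 8000 is an assumption based on the API examples above):
+
+```python
+# Fetch the first few exposed Prometheus metrics.
+import requests
+
+metrics = requests.get("http://localhost:8000/metrics", timeout=10)
+metrics.raise_for_status()
+print(metrics.text[:500])
+```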
## 🎛️ GPU Memory Requirements
+### Mochi Model
| Resolution | Frames | Min GPU Memory |
|------------|--------|----------------|
| 480x480 | 60 | 16GB |
| 576x576 | 60 | 20GB |
| 768x768 | 60 | 24GB |
-
-
+### LTX Model
+| Resolution | Frames | Min GPU Memory |
+|------------|--------|----------------|
+| 480x704 | 121 | 24GB |
+| 576x832 | 121 | 32GB |
+| 720x1280 | 121 | 40GB |
## 📄 License
@@ -143,14 +209,13 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
## 🙏 Acknowledgments
-- [Genmo.ai](https://genmo.ai) for the original Mochi model
+- [Genmo.ai](https://genmo.ai) for the Mochi model
+- [Lightricks](https://www.lightricks.com/) for the LTX-Video model
- [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
- [LitServe](https://github.com/Lightning-AI/litserve) - API framework
---
-
-[Report Bug](https://github.com/vikramxD/minimochi/issues) • [Request Feature](https://github.com/vikramxD/minimochi/issues)
-
+Made with ❤️ by VikramxD