From 67911a1879b9c8d42e53162c51de8e13f96969a7 Mon Sep 17 00:00:00 2001
From: Vikramjeet Singh <72499426+VikramxD@users.noreply.github.com>
Date: Tue, 3 Dec 2024 13:02:29 +0530
Subject: [PATCH] Update README.md

---
 README.md | 119 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 92 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index 82760ff..dfd49cc 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
Cute Mochi Logo

MinMochi

-

Minimalist API Server for Mochi Text-to-Video Generation

+

Minimalist API Server for Mochi and LTX Text-to-Video Generation

 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -11,14 +11,16 @@
 
 ## 🚀 Overview
 
-**MinMochi** serves the Genmo Mochi text-to-video model as a production-ready API. Generate high-quality videos from text prompts with minimal setup.
+**MinMochi** serves both the Genmo Mochi and Lightricks LTX text-to-video models as a production-ready API. Generate high-quality videos from text prompts with minimal setup.
 
 ## 🛠️ System Requirements
 
 - 🐍 Python 3.10+
 - 🎮 GPU Requirements:
   - Recommended: NVIDIA A100 or H100
-  - Suitable: NVIDIA A6000 or A40
+  - Minimum: NVIDIA A6000 or A40
+  - Mochi: 16GB+ VRAM
+  - LTX: 24GB+ VRAM
 - ☁️ Active AWS account
 - 🐳 Docker
 
@@ -38,7 +40,7 @@ uv pip install -e . --no-build-isolation
 
 ## ⚙️ Configuration
 
-MinMochi uses Pydantic settings for configuration management. The configuration is split into three main modules:
+MinMochi uses Pydantic settings for configuration management. The configuration is split into multiple modules:
 
 ### 1. Mochi Settings (`mochi_settings.py`)
 ```python
@@ -63,7 +65,23 @@ num_frames = 150
 fps = 10
 ```
 
-### 2. AWS Settings (`aws_settings.py`)
+### 2. LTX Settings (`ltx_settings.py`)
+```python
+# Default settings, can be overridden with LTX_ prefixed env variables
+model_name = "LTX-Video"
+ckpt_dir = "checkpoints" # Directory containing model components
+device = "cuda"
+
+# Video Generation Settings
+num_inference_steps = 40
+guidance_scale = 3.0
+height = 480
+width = 704
+num_frames = 121
+frame_rate = 25
+```
+
+### 3. AWS Settings (`aws_settings.py`)
 ```python
 # Override with environment variables
 AWS_ACCESS_KEY_ID = ""
@@ -72,34 +90,47 @@ AWS_REGION = "ap-south-1"
 AWS_BUCKET_NAME = "diffusion-model-bucket"
 ```
 
-### 3. Model Weights Settings (`mochi_weights.py`)
-```python
-output_dir = Path("weights")
-repo_id = "genmo/mochi-1-preview"
-model_file = "dit.safetensors"
-decoder_file = "decoder.safetensors"
-encoder_file = "encoder.safetensors"
-dtype = "bf16" # Options: "fp16", "bf16"
+## 🎨 Prompt Engineering Guide
+
+### For LTX Model
+Structure your prompts focusing on cinematic details:
+1. Start with main action
+2. Add specific movement details
+3. Describe visual elements precisely
+4. Include environment details
+5. Specify camera angles
+6. Describe lighting and colors
+
+Example LTX Prompt:
 ```
+A red maple leaf slowly falls through golden autumn sunlight in a serene forest. The leaf twirls and dances as it descends, casting delicate shadows. Sunbeams filter through trees, creating a warm, dappled lighting effect. The camera follows the leaf in a gentle downward tracking shot.
+```
+
+Parameter Guidelines (LTX):
+- Resolution: Must be divisible by 32 (e.g., 480x704)
+- Frames: Must follow pattern 8n+1 (e.g., 121, 161)
+- Guidance Scale: 3.0-3.5 recommended
+- Steps: 40+ for quality, 20-30 for speed
 
 ## 🎬 Usage
 
-### Launch Server
+### Launch Servers
 ```bash
-python src/api/mochi_serve.py
+# Launch Mochi Server
+python3 api/mochi_serve.py
+
+# Launch LTX Server
+python api/ltx_serve.py
 ```
 
 ### Generate Videos
 
+#### Mochi API
 ```python
-import requests
-import json
-
 url = "http://localhost:8000/api/v1/video/mochi"
 payload = {
     "prompt": "A beautiful sunset over the mountains",
-    "negative_prompt": "",
     "num_inference_steps": 100,
     "guidance_scale": 7.5,
     "height": 480,
@@ -107,11 +138,41 @@ payload = {
     "num_frames": 150,
     "fps": 10
 }
+```
+
+#### LTX API
+```python
+url = "http://localhost:8000/api/v1/video/ltx"
+payload = {
+    "prompt": "A red maple leaf slowly falls...",
+    "negative_prompt": "worst quality, inconsistent motion, blurry",
+    "num_inference_steps": 40,
+    "guidance_scale": 3.0,
+    "height": 480,
+    "width": 704,
+    "num_frames": 121,
+    "frame_rate": 25,
+    "seed": 42
+}
 
-response = requests.post(url, json=[payload])
+response = requests.post(url, json=payload)
 print(response.json())
 ```
 
+### CURL Example (LTX)
+```bash
+curl -X POST http://localhost:8000/api/v1/video/ltx \
+-H "Content-Type: application/json" \
+-d '{
+    "prompt": "A red maple leaf slowly falls...",
+    "height": 480,
+    "width": 704,
+    "num_frames": 121,
+    "num_inference_steps": 40,
+    "guidance_scale": 3.0
+}'
+```
+
 ## 📊 Monitoring
 
 ### Metrics
@@ -124,18 +185,23 @@ Prometheus metrics available at `/metrics`:
 - Structured logging with loguru
 - Log rotation at 100MB
 - 1-week retention period
-- Logs stored in `logs/api.log`
+- Logs stored in `logs/api.log` and `logs/ltx_api.log`
 
 ## 🎛️ GPU Memory Requirements
 
+### Mochi Model
 | Resolution | Frames | Min GPU Memory |
 |------------|--------|----------------|
 | 480x480 | 60 | 16GB |
 | 576x576 | 60 | 20GB |
 | 768x768 | 60 | 24GB |
-
-
 
+### LTX Model
+| Resolution | Frames | Min GPU Memory |
+|------------|--------|----------------|
+| 480x704 | 121 | 24GB |
+| 576x832 | 121 | 32GB |
+| 720x1280 | 121 | 40GB |
 
 ## 📄 License
 
@@ -143,14 +209,13 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 
 ## 🙏 Acknowledgments
 
-- [Genmo.ai](https://genmo.ai) for the original Mochi model
+- [Genmo.ai](https://genmo.ai) for the Mochi model
+- [Lightricks](https://www.lightricks.com/) for the LTX-Video model
 - [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
 - [LitServe](https://github.com/Lightning-AI/litserve) - API framework
 
 ---
-[Report Bug](https://github.com/vikramxD/minimochi/issues) • [Request Feature](https://github.com/vikramxD/minimochi/issues)
+Made with ❤️ by VikramxD
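
The LTX parameter guidelines added by this patch (resolution divisible by 32, frame count of the form 8n+1) can be checked on the client before a request is sent. The following is a minimal sketch, assuming the divisibility rule applies to both height and width; `validate_ltx_params` and `nearest_valid_frames` are hypothetical helper names and are not part of MinMochi.

```python
def validate_ltx_params(height: int, width: int, num_frames: int) -> list[str]:
    """Return a list of problems; an empty list means the values fit the stated rules."""
    problems = []
    # Assumption: "divisible by 32" applies to each dimension separately.
    if height <= 0 or height % 32 != 0:
        problems.append(f"height {height} is not a positive multiple of 32")
    if width <= 0 or width % 32 != 0:
        problems.append(f"width {width} is not a positive multiple of 32")
    # Frame counts must have the form 8n + 1 (e.g. 121 = 8 * 15 + 1).
    if num_frames < 1 or (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames {num_frames} is not of the form 8n + 1")
    return problems


def nearest_valid_frames(num_frames: int) -> int:
    """Round a requested frame count to the nearest value of the form 8n + 1."""
    # Integer arithmetic with round-half-up; n is clamped at 0 so the result is >= 1.
    n = max(0, (num_frames + 3) // 8)
    return 8 * n + 1


if __name__ == "__main__":
    # Values taken from the LTX request payload in the README.
    print(validate_ltx_params(height=480, width=704, num_frames=121))
    # A frame count borrowed from the Mochi defaults, adjusted for LTX.
    print(nearest_valid_frames(150))
```

Under this reading of the rule, 480x704 and 576x832 pass, while a height of 720 is flagged because 720 is not a multiple of 32 (704 and 736 are the nearest multiples).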