---
title: Exllama
emoji: 😽
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
header: mini
fullWidth: true
license: apache-2.0
short_description: 'Chat: exllama v2'
---

# Exllama Chat 😽


A Gradio-based chat interface for ExLlamaV2, serving the Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. It delivers high-performance inference on consumer GPUs with Flash Attention support.

## 🌟 Features

- 🚀 Powered by the ExLlamaV2 inference library
- 💨 Flash Attention support for optimized performance
- 🎯 Supports multiple instruction-tuned models:
  - Mistral-7B-Instruct v0.3
  - Meta's Llama-3-70B-Instruct
- ⚡ Dynamic text generation with adjustable parameters
- 🎨 Clean, modern UI with dark mode support

## 🎮 Parameters

Customize your chat experience with these adjustable parameters:

- **System Message**: Set the AI assistant's behavior and context
- **Max Tokens**: Control response length (1-4096)
- **Temperature**: Adjust response creativity (0.1-4.0)
- **Top-p**: Fine-tune response diversity (0.1-1.0)
- **Top-k**: Control vocabulary sampling (0-100)
- **Repetition Penalty**: Discourage repetitive text (0.0-2.0)
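As a minimal sketch of how the UI ranges above can be enforced before they reach the sampler, here is a small validator. The helper names (`clamp`, `sanitize`, `PARAM_RANGES`) are illustrative and not part of the app's actual code; only the ranges come from the list above.

```python
def clamp(value, lo, hi):
    """Clip a numeric value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))

# (minimum, maximum) for each adjustable sampling parameter,
# matching the ranges exposed in the UI above.
PARAM_RANGES = {
    "max_tokens": (1, 4096),
    "temperature": (0.1, 4.0),
    "top_p": (0.1, 1.0),
    "top_k": (0, 100),
    "repetition_penalty": (0.0, 2.0),
}

def sanitize(params: dict) -> dict:
    """Return a copy of params with each known value clipped to its range."""
    out = dict(params)
    for name, (lo, hi) in PARAM_RANGES.items():
        if name in out:
            out[name] = clamp(out[name], lo, hi)
    return out

print(sanitize({"temperature": 9.5, "top_k": -3, "max_tokens": 512}))
# → {'temperature': 4.0, 'top_k': 0, 'max_tokens': 512}
```

Clamping (rather than rejecting) out-of-range values keeps the chat responsive even when a slider value arrives slightly outside its bounds.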

## 🛠️ Technical Details

- **Framework**: Gradio 5.5.0
- **Models**: ExLlamaV2-compatible models
- **UI**: Custom-themed interface built on Gradio's Soft theme
- **Optimization**: Flash Attention for improved performance
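Instruction-tuned models like Mistral-7B-Instruct expect chat history serialized into their instruct template before generation. The sketch below shows one common way to do this for Mistral-style models, which wrap each user turn in `[INST] ... [/INST]` and, lacking a dedicated system role, typically prepend the system message to the first user turn. The function name and the exact template the Space uses are assumptions for illustration.

```python
def build_prompt(history, user_message, system_message=""):
    """Serialize Gradio-style chat history into a Mistral-instruct prompt.

    history: list of (user, assistant) message pairs from previous turns.
    user_message: the new user message awaiting a reply.
    """
    # Collect all turns; the final turn has no assistant reply yet.
    turns = list(history) + [(user_message, None)]

    # Mistral instruct models have no system role, so the system message
    # is commonly folded into the first user turn.
    if system_message:
        first_user, first_assistant = turns[0]
        turns[0] = (f"{system_message}\n\n{first_user}", first_assistant)

    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

print(build_prompt([("Hello", "Hi there")], "How are you?"))
# → <s>[INST] Hello [/INST] Hi there</s>[INST] How are you? [/INST]
```

The resulting string is what gets tokenized and streamed through the generator; keeping prompt construction in a pure function like this makes the template easy to swap when switching to Llama-3's different chat format.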

## 🔗 Links

## 📝 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

## 🙏 Acknowledgments


Made with ❤️ using ExLlamaV2 and Gradio
