DefTruth

🎯 #pragma unroll

🛠@xlite-dev, 🎟@vipshop, Contributor ☁️@PaddlePaddle, @vllm-project⚡️

1.7k followers · 145 following

@xlite-dev, @vipshop
Guangzhou, China
UTC +08:00

Pinned

xlite-dev/lite.ai.toolkit (Public)

🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, Face-Fusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TRT.

C++ · 4k stars · 738 forks

vllm-project/vllm (Public)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 45.5k stars · 7k forks

xlite-dev/Awesome-LLM-Inference (Public)

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Python · 3.9k stars · 275 forks

PaddlePaddle/FastDeploy (Public)

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ mainstream scenarios and 150+ SOTA models with end-to-end…

C++ · 3.2k stars · 476 forks

xlite-dev/CUDA-Learn-Notes (Public)

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda · 3.5k stars · 381 forks

xlite-dev/ffpa-attn-mma (Public)

📚FFPA (Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity for large headdim, 1.8x~3x↑🎉 faster than SDPA EA.

Cuda · 169 stars · 7 forks