    Repositories list

    • fp6_llm

      Public
      Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
      Cuda
      Apache License 2.0
      Updated Oct 28, 2024
    • blog

      Public
      Public repo for HF blog posts
      Jupyter Notebook
      Updated Oct 25, 2023
    • FSA

      Public
      Webpage for FSA
      HTML
      Updated Oct 3, 2023
    • flash-llm

      Public
      Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
      Cuda
      Apache License 2.0
      Updated Sep 24, 2023
    • Updated Apr 27, 2022
    • Python
      MIT License
      Updated Mar 18, 2022
    • Conference talks given by FSA Lab, University of Sydney
      Updated Jul 28, 2021