From d6001123e08cc52d08f90427eb5cc3225a47c800 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Mon, 15 Jul 2024 05:38:13 +0000
Subject: [PATCH] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 docs/h2o.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/h2o.md b/docs/h2o.md
index 4d73c2005f2..d7b80f0108f 100644
--- a/docs/h2o.md
+++ b/docs/h2o.md
@@ -3,7 +3,7 @@
 2. [Usage](#usage)
 
 ## Introduction
-**Heavy-Hitter Oracal (H2O)** is a novel approach for implementing the KV cache wihich significantly reduces memory footprint.
+**Heavy-Hitter Oracal (H2O)** is a novel approach for implementing the KV cache which significantly reduces memory footprint.
 
 This methods base on the fact that the accumulated attention scores of all tokens in attention blocks adhere to a power-law distribution. It suggests that there exists a small set of influential tokens that are critical during generation, named heavy-hitters (H2). H2 provides an opportunity to step away from the combinatorial search problem and identify an eviction policy that maintains accuracy.
 
@@ -46,4 +46,4 @@ user_model = LlamaForCausalLM.from_pretrained(
     trust_remote_code=args.trust_remote_code)
 ```
 
-Please refer to [h2o example](../examples/huggingface/pytorch/text-generation/h2o/run_generation.py) for the details.
\ No newline at end of file
+Please refer to [h2o example](../examples/huggingface/pytorch/text-generation/h2o/run_generation.py) for the details.