refactor(README): add more info about flash-attn installation
ganler committed May 23, 2024
1 parent a48ce9a commit 00269af
Showing 1 changed file: README.md (49 additions, 7 deletions)
@@ -37,12 +37,15 @@ pip install -r requirements.txt

## 🏁 Search Needle Function (SNF)

Search Needle Function (SNF) is the first and foundational RepoQA task, designed to exercise LLMs' **long-context code understanding and retrieval** abilities.
Its real-life counterpart is precise code search driven by a function description.

<details><summary>🔎 More dataset details <i>:: click to expand ::</i></summary>
<div>

> [!Note]
>
> SNF includes 500 tests (5 programming languages x 10 repos x 10 needle functions) where an LLM is given:
>
> 1. A large code context sorted by file dependency
> 2. A natural-language description of the needle function that does not reveal keywords such as the function name
@@ -51,6 +54,9 @@ Its corresponding real-life application is to perform precise code search from u
> The evaluator passes a test if the retrieved function is syntactically closer to the ground truth than any other candidate function (candidates are parsed with `treesitter`)
> and its similarity to the ground truth exceeds a user-defined threshold (0.8 by default); see the sketch right after this section.
</div>
</details>
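
For intuition, here is a minimal, hypothetical sketch of the pass criterion described in the dataset details above. It is not RepoQA's actual evaluator (which parses candidate functions with `treesitter`); `difflib` is only a stand-in for its similarity metric:

```python
# Hypothetical sketch of the SNF pass criterion; the similarity metric here
# (difflib) and all names are illustrative stand-ins, not RepoQA's code.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in syntactic similarity between two code snippets."""
    return SequenceMatcher(None, a, b).ratio()


def passes(retrieved: str, needle: str, candidates: list[str], threshold: float = 0.8) -> bool:
    """Pass iff the needle is the candidate closest to the retrieved code
    and their similarity clears the threshold (0.8 by default)."""
    closest = max(candidates, key=lambda fn: similarity(retrieved, fn))
    return closest == needle and similarity(retrieved, needle) >= threshold
```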

You can run the SNF evaluation using various backends:

### OpenAI Compatible Servers
@@ -74,17 +80,24 @@ repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
```

<details><summary>🔎 Context extension for small-ctx models <i>:: click to expand ::</i></summary>
<div>

> [!Tip]
>
> For example, `Meta-Llama-3-8B-Instruct` has 8k context but running the default 16k test needs more (approx. 20k).
> There are two ways to unlock a model's context at inference time:
>
> 1. **Direct Extension**: Edit `max_position_embeddings` in the model's `config.json` (e.g., `hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json`) to something like `22528`.
> 2. **[Dynamic RoPE Scaling](https://blog.eleuther.ai/yarn/#dynamic-scaling)**:
> To extend `Meta-Llama-3-8B-Instruct` from 8k to 32k (4x), edit the `config.json`:
>
> `"rope_scaling": {"type": "dynamic", "factor": 4.0}`
>
> Note: These manual edits apply to vLLM `<0.4.3` and HuggingFace transformers; RepoQA will automatically configure dynamic RoPE for vLLM `>= 0.4.3`. A minimal sketch of such a `config.json` patch is shown right after this section.
</div>
</details>
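
As a concrete illustration of the two options above, here is a small, hypothetical helper that patches a local HF snapshot's `config.json`; the path, defaults, and function name are illustrative only and not part of RepoQA:

```python
# Hypothetical config patcher for the two context-unlocking options above.
# Path, default values, and function name are illustrative, not RepoQA code.
import json
from pathlib import Path


def unlock_context(config_path: str,
                   max_pos: int | None = 22528,
                   rope_factor: float | None = 4.0) -> None:
    path = Path(config_path)  # e.g. hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/<hash>/config.json
    cfg = json.loads(path.read_text())
    if max_pos is not None:      # option 1: direct extension of the position limit
        cfg["max_position_embeddings"] = max_pos
    if rope_factor is not None:  # option 2: dynamic RoPE scaling (4x ~= 8k -> 32k)
        cfg["rope_scaling"] = {"type": "dynamic", "factor": rope_factor}
    path.write_text(json.dumps(cfg, indent=2))
```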

> [!Note]
>
> Reference evaluation time:
@@ -98,6 +111,35 @@ repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
```

> [!Tip]
>
> Installing [flash-attn](https://github.com/Dao-AILab/flash-attention) and
> additionally setting `--attn-implementation "flash_attention_2"` can largely
> lower the memory requirement.

<details><summary>🔨 Having trouble installing `flash-attn`? <i>:: click to expand ::</i></summary>
<div>

> [!Tip]
>
> If you have trouble with `pip install flash-attn --no-build-isolation`,
> you can try directly using [pre-built wheels](https://github.com/Dao-AILab/flash-attention/releases):
>
> ```bash
> export FLASH_ATTN_VER=2.5.8 # check latest version at https://github.com/Dao-AILab/flash-attention/releases
> export CUDA_VER="cu122" # check available ones at https://github.com/Dao-AILab/flash-attention/releases
> export TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")
> export PY_VER=$(python -c "import platform; print(''.join(platform.python_version().split('.')[:2]))")
> export OS_ARCH=$(python -c "import platform; print(f'{platform.system().lower()}_{platform.machine()}')")
>
> export WHEEL=flash_attn-${FLASH_ATTN_VER}+${CUDA_VER}torch${TORCH_VER}cxx11abiFALSE-cp${PY_VER}-cp${PY_VER}-${OS_ARCH}.whl
> wget https://github.com/Dao-AILab/flash-attention/releases/download/v${FLASH_ATTN_VER}/${WHEEL}
> pip install ${WHEEL}
> ```
</div>
</details>
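
After installation (either route), a quick sanity check can confirm the wheel matches your environment. This snippet is only a suggestion and assumes a CUDA-enabled `torch` is already installed:

```python
# Quick sanity check that flash-attn imports and reports the local torch/CUDA setup.
import flash_attn
import torch

print("flash-attn:", flash_attn.__version__)
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
```
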
### Google Generative AI API (Gemini)
