How do you feed long texts to a model? #2

Closed
CorentinvdBdO opened this issue Oct 2, 2023 · 3 comments

@CorentinvdBdO

I naively tried adding examples to https://github.com/mit-han-lab/streaming-llm/blob/main/data/mt_bench.jsonl, including some around 4k tokens long, without changing anything in the script. I get:

ASSISTANT: Token indices sequence length is longer than the specified maximum sequence length for this model (3905 > 2048). Running this sequence through the model will result in indexing errors
- - - - - - - - - d - d - d - d - - - - - - - - - - d - d d d d d - d - d … (the generation continues as a long degenerate stream of "d", "d0", "et", and punctuation tokens)

Did I misunderstand "infinite-length inputs without sacrificing efficiency and performance"?

@Guangxuan-Xiao
Collaborator

Guangxuan-Xiao commented Oct 3, 2023

As illustrated in our run_streaming_llama.py, the KV cache eviction occurs only before the prompt input and generation. This means the demo code isn't designed for single, long input samples.

However, for long text inputs with LLMs, you can refer to our perplexity evaluation script, where we feed the text and evict the KV cache token by token.
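
For reference, here is a minimal sketch of that token-by-token scheme using plain Hugging Face `transformers` (assuming an older version where `past_key_values` is the legacy tuple of per-layer `(key, value)` tensors). The model name, window sizes, and input file are illustrative assumptions, and the sketch omits the position re-mapping that the actual implementation patches into the attention layers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any causal LM with a 2k-4k window
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

n_sink, n_recent = 4, 2044  # 4 attention sinks + a recent window (illustrative sizes)

def evict(past_key_values, n_sink, n_recent):
    """Keep only the first n_sink and last n_recent positions of each layer's KV cache."""
    new_past = []
    for k, v in past_key_values:  # k, v: [batch, heads, seq_len, head_dim]
        if k.size(2) <= n_sink + n_recent:
            new_past.append((k, v))
        else:
            new_past.append((
                torch.cat([k[:, :, :n_sink], k[:, :, -n_recent:]], dim=2),
                torch.cat([v[:, :, :n_sink], v[:, :, -n_recent:]], dim=2),
            ))
    return tuple(new_past)

text = open("long_text.txt").read()  # assumption: your long input document
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

past, nlls = None, []
with torch.no_grad():
    for i in range(input_ids.size(1) - 1):
        out = model(input_ids[:, i : i + 1], past_key_values=past, use_cache=True)
        past = evict(out.past_key_values, n_sink, n_recent)
        # negative log-likelihood of the next token given the (evicted) cache
        nlls.append(torch.nn.functional.cross_entropy(out.logits[:, -1, :], input_ids[:, i + 1]))

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```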

As highlighted in our README's FAQ section, StreamingLLM doesn't enlarge the LLM context window. If you want to expand the context window, consider using a model like Llama-2-7B-32K-Instruct for your experiments.
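
For example, a minimal sketch of the latter (assuming the Together AI release of that model on the Hugging Face Hub; the prompt file and generation arguments are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "togethercomputer/Llama-2-7B-32K-Instruct"  # 32K-context model on the Hub
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

prompt = open("long_prompt.txt").read()  # e.g. one of your ~4k-token samples
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```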

@CorentinvdBdO
Author

Ok! Thank you for your answer. I knew it was too good to be true, but it's still a great achievement!

@gembancud

gembancud commented Oct 3, 2023

Hijacking this thread (tell me if I should open a separate issue), but would adding more sink tokens similarly act as "state" once the sliding mechanism starts evicting tokens? I suspect the model would have to learn to use that register cache during training.
Similar to the ViT paper, a research direction could be to look for activation outliers and check whether they are likewise removed once registers/sinks are available. That could hopefully make quantization a bit easier! Exciting ideas, and amazing work!

EDIT: In case this wasn't clear: a sink cache plus a sliding window of tokens, computed autoregressively, is similar to an RNN because of the "state". We've somehow come back around to having an RNN-like "hidden state" alongside the attention mechanism.
