
Autocomplete: completion is truncated (stops streaming) at the wrong place #3994

Open
ferenci84 opened this issue Feb 6, 2025 · 8 comments
Labels: area:autocomplete (relates to the autocomplete feature), kind:bug (indicates an unexpected problem or unintended behavior)

Comments


ferenci84 commented Feb 6, 2025


Relevant environment info

- OS:
- Continue version: 0.9.260
- IDE version:
- Model: qwen2.5-coder:1.5b-base (ollama)
- config.json:
  
"tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },

The problem

The log shows that the completion itself is fine, but only part of it appears in the editor. The model completes "1__tabAutocompleteModel": { but only "1__ is shown.

To reproduce

Open the testing sandbox, add this file:

{
  "1__tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "",
    "apiBase": "https://codestral.mistral.ai/v1",
    "completionOptions": {
      "maxTokens": 256
    }
  },

  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 3B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  "3__tabAutocompleteModel": {
    "title": "Together Qwen2.5 Coder",
    "provider": "together",
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "apiKey": ""
  }
}

(A bit awkward, but this is how I switch between different autocomplete models, and this is where the error happens.)

Save it as test_config.json and reload the window. Place the cursor on the empty line between the first two model blocks and trigger autocomplete with a hotkey.

For me, the completion is:

"2__

In the output, the completion is:

"2__tabAutocompleteModel": {<|cursor|>
}

What causes the problem

I debugged, and found that the problem is caused by this line in core/autocomplete/filtering/streamTransforms/StreamTransformPipeline.ts:

    charGenerator = stopAtStartOf(charGenerator, suffix);

It's reasonable to stop streaming at the start of the suffix, but the check doesn't work correctly here: the point where the stream stops is not actually the start of the suffix.
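As a simplified, hypothetical illustration (this is not the actual Continue implementation; stopAtStartOfLoose, its minMatch threshold, and the targetPart slicing are all invented for demonstration): if the stop condition only checks that the buffered characters occur somewhere in the head of the suffix, rather than that the stream is exactly aligned with the suffix, it fires on a false match and truncates the completion exactly as in the log:

```typescript
// Hypothetical sketch of a stop-at-start-of-suffix transform with a
// containment-based match (NOT the real Continue code).
function stopAtStartOfLoose(
  completion: string,
  suffix: string,
  minMatch = 10,
): string {
  const targetPart = suffix.trimStart().slice(0, 30); // head of the suffix
  let out = "";
  let buffer = "";
  for (const ch of completion) {
    buffer += ch;
    // Loose check: the buffer merely has to occur inside targetPart.
    while (buffer.length > 0 && !targetPart.includes(buffer)) {
      out += buffer[0];
      buffer = buffer.slice(1);
    }
    if (buffer.length >= minMatch) {
      return out; // stop: only the text before the false match was emitted
    }
  }
  return out + buffer;
}

const suffix = '  "tabAutocompleteModel": {\n    "title": "Qwen2.5-Coder 3B",';
const completion = '"2__tabAutocompleteModel": {';
console.log(stopAtStartOfLoose(completion, suffix)); // prints "2__
```

The containment check starts matching at "tab…" inside the suffix head, so everything after "2__ is swallowed, reproducing the truncation from the report.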

Log output

==========================================================================
==========================================================================
##### Completion options #####
{
  "contextLength": 8096,
  "maxTokens": 256,
  "model": "qwen2.5-coder:1.5b-base",
  "temperature": 0.01,
  "stop": [
    "<|endoftext|>",
    "<|fim_prefix|>",
    "<|fim_middle|>",
    "<|fim_suffix|>",
    "<|fim_pad|>",
    "<|repo_name|>",
    "<|file_sep|>",
    "<|im_start|>",
    "<|im_end|>",
    "/src/",
    "#- coding: utf-8",
    "```"
  ]
}

##### Prompt #####
Prefix: 
// test_config.json
{
  "1__tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "",
    "apiBase": "https://codestral.mistral.ai/v1",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  
Suffix: 
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 3B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  "3__tabAutocompleteModel": {
    "title": "Together Qwen2.5 Coder",
    "provider": "together",
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "apiKey": ""
  }
}
==========================================================================
==========================================================================
Completion:
"2__tabAutocompleteModel": {<|cursor|>
}
@dosubot dosubot bot added area:autocomplete Relates to the auto complete feature kind:bug Indicates an unexpected problem or unintended behavior labels Feb 6, 2025
@ferenci84 (Contributor, Author) commented:

Further investigation suggests that the problem is caused by this change:

[screenshot of the change]

In my case "tabAutocompleteModel": { was the targetPart and tabAutocompleteModel" was the buffer.
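A quick check on those two values (recreated here hypothetically from the debug session) shows why a substring-style comparison is a likely culprit: the buffer is neither equal to the targetPart nor a prefix of it, yet it does occur inside it, so a containment-based stop condition fires even though the completion has not actually reached the suffix:

```typescript
// Values recreated (hypothetically) from the debug session:
const targetPart: string = '"tabAutocompleteModel": {'; // start of the suffix
const buffer: string = 'tabAutocompleteModel"';         // buffered streamed chars

console.log(targetPart === buffer);         // false
console.log(targetPart.startsWith(buffer)); // false
console.log(targetPart.includes(buffer));   // true -- containment matches
```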

@ferenci84 (Contributor, Author) commented:

Why not ship a smaller default max_tokens for the completion models? It currently defaults to 2048 or 4096; I had to set it to 100–200 to avoid overly long responses.
A smaller default would avoid cases where the model generates too many tokens and takes too long. It could also be combined with less restrictive filtering, without the adverse effect of overly large results.

@Patrick-Erichsen (Collaborator) commented:

cc'ing @tomasz-stefaniak here, who has more context on this logic.


inimaz commented Feb 7, 2025

I don't know if it is related, but it looks like it: the suggestions now include the <|cursor|> tag. [screenshot]
Is this intended? If so, how can I disable it?

@ferenci84 (Contributor, Author) commented:

> I don't know if it is related but it looks like it. Now as part of the suggestions it shows the <|cursor|> tag. Is this intended? If so, how to disable this?

I think it's related to the Qwen model (my example also had that tag); the tag should probably be added as a stop word in the Qwen template. It should be an easy fix.
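As a hedged sketch of that workaround at the user-config level (assuming Continue merges user-supplied completionOptions.stop entries with the template's stop tokens; worth verifying against the Qwen template before relying on it), the tag could be added as an extra stop word:

```json
"tabAutocompleteModel": {
  "title": "Qwen2.5-Coder 1.5B",
  "provider": "ollama",
  "model": "qwen2.5-coder:1.5b-base",
  "completionOptions": {
    "maxTokens": 256,
    "stop": ["<|cursor|>"]
  }
}
```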

@tomasz-stefaniak (Collaborator) commented:

> Why not have a smaller default max_tokens setting for the completion models? It defaults to 2048 or 4096, I had to set it to 100 to 200 to avoid too long responses.

Do you mind opening a PR for this? You're right that in most cases such a long response is not needed.

Regarding truncation, I'd have to look into this in more detail. We have a relatively thorough testing suite based on some test cases that have historically caused problems. We'd need to verify that whatever fix we apply doesn't break too many of these: https://github.com/continuedev/continue/blob/main/core/autocomplete/filtering/test/testCases.ts#L5

@ferenci84 (Contributor, Author) commented:

> > Why not have a smaller default max_tokens setting for the completion models? It defaults to 2048 or 4096, I had to set it to 100 to 200 to avoid too long responses.
>
> Do you mind opening a PR for this? You're right that in most cases such a long response is not needed.

Yes, I'll provide a PR.

@ferenci84 (Contributor, Author) commented:

@tomasz-stefaniak I sent PR #4448, which makes maxTokens default to 256 for autocomplete when nothing is set.
