
Autocomplete: completion is truncated (stops streaming) at the wrong place #3994

Open
ferenci84 opened this issue Feb 6, 2025 · 8 comments
Labels: area:autocomplete (relates to the autocomplete feature), kind:bug (indicates an unexpected problem or unintended behavior)

Comments


ferenci84 commented Feb 6, 2025


Relevant environment info

- OS:
- Continue version: 0.9.260
- IDE version:
- Model: qwen2.5-coder:1.5b-base (ollama)
- config.json:
  
"tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },

The problem

The log shows that the completion itself is fine, but only part of it appears in the editor. The model completes "1__tabAutocompleteModel": { but only "1__ is shown.

To reproduce

Open the testing sandbox, add this file:

{
  "1__tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "",
    "apiBase": "https://codestral.mistral.ai/v1",
    "completionOptions": {
      "maxTokens": 256
    }
  },

  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 3B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  "3__tabAutocompleteModel": {
    "title": "Together Qwen2.5 Coder",
    "provider": "together",
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "apiKey": ""
  }
}

(A bit awkward, but this is how I switch between different autocomplete models, and this is where the error happens.)

Save it as test_config.json and reload the window. Place the cursor on the empty line between the first two model blocks and trigger autocomplete with a hotkey.

For me, the completion is:

"2__

In the output, the completion is:

"2__tabAutocompleteModel": {<|cursor|>
}

What causes the problem

I debugged, and found that the problem is caused by this line in core/autocomplete/filtering/streamTransforms/StreamTransformPipeline.ts:

    charGenerator = stopAtStartOf(charGenerator, suffix);

It's reasonable to stop streaming at the start of the suffix, but the check doesn't work correctly here: the point where the stream stops is not actually the start of the suffix.
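As a simplified, hypothetical illustration (this is not the actual Continue implementation; stopAtStartOfLoose, its minMatch threshold, and the targetPart slicing are all invented for demonstration): if the stop condition only checks that the buffered characters occur somewhere in the head of the suffix, rather than that the stream is exactly aligned with the suffix, it fires on a false match and truncates the completion exactly as in the log:

```typescript
// Hypothetical sketch of a stop-at-start-of-suffix transform with a
// containment-based match (NOT the real Continue code).
function stopAtStartOfLoose(
  completion: string,
  suffix: string,
  minMatch = 10,
): string {
  const targetPart = suffix.trimStart().slice(0, 30); // head of the suffix
  let out = "";
  let buffer = "";
  for (const ch of completion) {
    buffer += ch;
    // Loose check: the buffer merely has to occur inside targetPart.
    while (buffer.length > 0 && !targetPart.includes(buffer)) {
      out += buffer[0];
      buffer = buffer.slice(1);
    }
    if (buffer.length >= minMatch) {
      return out; // stop: only the text before the false match was emitted
    }
  }
  return out + buffer;
}

const suffix = '  "tabAutocompleteModel": {\n    "title": "Qwen2.5-Coder 3B",';
const completion = '"2__tabAutocompleteModel": {';
console.log(stopAtStartOfLoose(completion, suffix)); // prints "2__
```

The containment check starts matching at "tab…" inside the suffix head, so everything after "2__ is swallowed, reproducing the truncation from the report.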

Log output

==========================================================================
==========================================================================
##### Completion options #####
{
  "contextLength": 8096,
  "maxTokens": 256,
  "model": "qwen2.5-coder:1.5b-base",
  "temperature": 0.01,
  "stop": [
    "<|endoftext|>",
    "<|fim_prefix|>",
    "<|fim_middle|>",
    "<|fim_suffix|>",
    "<|fim_pad|>",
    "<|repo_name|>",
    "<|file_sep|>",
    "<|im_start|>",
    "<|im_end|>",
    "/src/",
    "#- coding: utf-8",
    "```"
  ]
}

##### Prompt #####
Prefix: 
// test_config.json
{
  "1__tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "mistral",
    "model": "codestral-latest",
    "apiKey": "",
    "apiBase": "https://codestral.mistral.ai/v1",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  
Suffix: 
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 3B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "completionOptions": {
      "maxTokens": 256
    }
  },
  "3__tabAutocompleteModel": {
    "title": "Together Qwen2.5 Coder",
    "provider": "together",
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "apiKey": ""
  }
}
==========================================================================
==========================================================================
Completion:
"2__tabAutocompleteModel": {<|cursor|>
}
@dosubot dosubot bot added area:autocomplete Relates to the auto complete feature kind:bug Indicates an unexpected problem or unintended behavior labels Feb 6, 2025
@ferenci84 (Contributor, Author) commented:

Further investigation suggests that the problem is caused by this change:

[screenshot of the change]

In my case "tabAutocompleteModel": { was the targetPart and tabAutocompleteModel" was the buffer.
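A quick check on those two values (recreated here hypothetically from the debug session) shows why a substring-style comparison is a likely culprit: the buffer is neither equal to the targetPart nor a prefix of it, yet it does occur inside it, so a containment-based stop condition fires even though the completion has not actually reached the suffix:

```typescript
// Values recreated (hypothetically) from the debug session:
const targetPart: string = '"tabAutocompleteModel": {'; // start of the suffix
const buffer: string = 'tabAutocompleteModel"';         // buffered streamed chars

console.log(targetPart === buffer);         // false
console.log(targetPart.startsWith(buffer)); // false
console.log(targetPart.includes(buffer));   // true -- containment matches
```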

@ferenci84 (Contributor, Author) commented:

Why not ship a smaller default max_tokens for the completion models? It currently defaults to 2048 or 4096; I had to set it to 100–200 to avoid overly long responses.
A smaller default would avoid cases where the model generates too many tokens and takes too long. It could also be combined with less restrictive filtering, without the adverse effect of overly large results.

@Patrick-Erichsen (Collaborator) commented:

cc'ing @tomasz-stefaniak here, who has more context on this logic.


inimaz commented Feb 7, 2025

I don't know if it is related, but it looks like it: the suggestions now include the <|cursor|> tag. [screenshot]
Is this intended? If so, how can I disable it?

@ferenci84 (Contributor, Author) commented:

> I don't know if it is related but it looks like it. Now as part of the suggestions it shows the <|cursor|> tag. Is this intended? If so, how to disable this?

I think it's related to the Qwen model (my example also had that tag); the tag should probably be added as a stop word in the Qwen template. It should be an easy fix.
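As a hedged sketch of that workaround at the user-config level (assuming Continue merges user-supplied completionOptions.stop entries with the template's stop tokens; worth verifying against the Qwen template before relying on it), the tag could be added as an extra stop word:

```json
"tabAutocompleteModel": {
  "title": "Qwen2.5-Coder 1.5B",
  "provider": "ollama",
  "model": "qwen2.5-coder:1.5b-base",
  "completionOptions": {
    "maxTokens": 256,
    "stop": ["<|cursor|>"]
  }
}
```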

@tomasz-stefaniak (Collaborator) commented:

> Why not have a smaller default max_tokens setting for the completion models? It defaults to 2048 or 4096, I had to set it to 100 to 200 to avoid too long responses.

Do you mind opening a PR for this? You're right that in most cases such a long response is not needed.

Regarding truncation, I'd have to look into this in more detail. We have a relatively thorough testing suite based on some test cases that have historically caused problems. We'd need to verify that whatever fix we apply doesn't break too many of these: https://github.com/continuedev/continue/blob/main/core/autocomplete/filtering/test/testCases.ts#L5

@ferenci84 (Contributor, Author) commented:

> > Why not have a smaller default max_tokens setting for the completion models? It defaults to 2048 or 4096, I had to set it to 100 to 200 to avoid too long responses.
>
> Do you mind opening a PR for this? You're right that in most cases such a long response is not needed.

Yes, I'll provide a PR.

@ferenci84 (Contributor, Author) commented:

@tomasz-stefaniak I sent PR #4448, which makes maxTokens default to 256 for autocomplete when nothing is set.
