Code Completion and Custom OpenAI #759

Open
Pietro395 opened this issue Nov 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Pietro395

What happened?

I am using CodeGPT via Custom OpenAI with Openrouter.
I configured the template as suggested, but I never get code, only sentences like this:
It looks like you're working on a Python function that interacts with an OPC-UA server and performs some operations

[screenshot from 2024-11-11 12:38:34]

For code completion I am using DeepSeek Coder, with these settings:
[screenshot of code completion settings]

Relevant log output or stack trace

No response

Steps to reproduce

No response

CodeGPT version

2.12.3

Operating System

Linux

@Pietro395 added the bug label Nov 11, 2024
@carlrobertoh (Owner)

I'm not sure if any of the OpenRouter models can be used for code completions, as they only provide an OpenAI-compatible Chat Completions API.

If you want to use DeepSeek Coder for code completions, you must configure the DeepSeek API explicitly and use the following configuration: https://api-docs.deepseek.com/guides/fim_completion
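For reference, the linked guide boils down to a call roughly like this (a minimal sketch using the OpenAI Python SDK; the /beta base URL and model name are my reading of the DeepSeek docs, so double-check them there):

```python
# Hedged sketch of a DeepSeek FIM (fill-in-the-middle) completion call,
# per https://api-docs.deepseek.com/guides/fim_completion.
from openai import OpenAI

client = OpenAI(
    api_key="<DEEPSEEK_API_KEY>",              # your own key
    base_url="https://api.deepseek.com/beta",  # FIM is served under /beta
)

response = client.completions.create(
    model="deepseek-chat",
    prompt="def fib(n):\n    ",    # code before the cursor
    suffix="\n    return result",  # code after the cursor
    max_tokens=128,
)
print(response.choices[0].text)    # the generated "middle"
```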

@Pietro395 (Author)

It seems the problem is that DeepSeek Coder, for example, does not support FIM in the OpenRouter version.
I am using gpt-3.5-turbo now and it seems to work. I will run more tests, thanks!

@carlrobertoh (Owner)

I will post the same text here as I did on Reddit for others as well.

Unfortunately, in most cases, the models supported by OpenRouter are only accessible via the /v1/chat/completions endpoint, which means they cannot be used for code completion. This is due to the request format; the exception is gpt-3.5-turbo-instruct, which can be accessed via the legacy /v1/completions endpoint by setting the correct prefix and suffix parameters.
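To make the request-format difference concrete, here is a rough sketch of the two payloads (illustrative values only):

```python
# Chat Completions: structured messages and no suffix field, so there is
# no way to tell the model what comes *after* the cursor.
chat_request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "def fib(n):"}],
}

# Legacy Completions: a raw prompt plus an optional suffix, which is
# exactly what fill-in-the-middle code completion needs.
fim_request = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "def fib(n):\n    ",    # code before the cursor
    "suffix": "\n    return result",  # code after the cursor
    "max_tokens": 64,
}
```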

Fortunately, you have a few other options to get this working:

The easiest option is likely to use our native llama.cpp local integration. The plugin comes with a pre-packaged llama.cpp backend along with a user-friendly interface that allows you to download and run models without needing any other third-party clients, such as Ollama or Open WebUI. Each model is already bound to an appropriate FIM template, so you don't have to worry about constructing the prompt yourself (see the sketch below).
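To illustrate what such a template produces (the sentinel tokens below follow Qwen2.5-Coder's published FIM format; other model families use different tokens):

```python
# Illustrative only: a FIM template folds the text around the cursor
# into a single prompt using model-specific sentinel tokens.
prefix = "def add(a, b):\n    "
suffix = "\n\nprint(add(1, 2))"

fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# The model then generates the "middle", e.g. "return a + b".
```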

Simply choose and download the model (each model is downloaded directly from HuggingFace with a nice loading progress indicator). Once the model is downloaded, start the server and you're good to go!

However, please note that the llama.cpp integration can only be used on UNIX-based systems; the logic for building and running the server is not yet supported on Windows machines. If you're using Windows, you'll likely find Ollama to be your best friend. The extension can recognize the models you have already downloaded.

For other models provided by cloud providers, you can use the Custom OpenAI provider. This option is highly configurable for various needs, including on-premise models hosted within your company.

Let's say you want to use the newest Qwen2.5 Coder 32B model. One option is to use it with the Fireworks API via the /v1/completions endpoint, which supports raw input, allowing us to send the pre-built FIM prompt (a raw request matching these steps is sketched after the list):

  • Select the Custom OpenAI provider and use the Fireworks preset template
  • Fill in your API key obtained from their account page
  • Choose the proper FIM template (CodeQwen 2.5) from the Code Completion section
  • Replace the model body parameter value with accounts/fireworks/models/qwen2p5-coder-32b-instruct
  • Click apply
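For the curious, that configuration amounts to a raw request roughly like the sketch below (the base URL is my assumption of Fireworks' OpenAI-compatible completions endpoint; verify it against their docs):

```python
import requests

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/completions",  # assumed base URL
    headers={"Authorization": "Bearer <FIREWORKS_API_KEY>"},
    json={
        "model": "accounts/fireworks/models/qwen2p5-coder-32b-instruct",
        # The plugin sends the pre-built FIM prompt as raw input:
        "prompt": "<|fim_prefix|>def add(a, b):\n    <|fim_suffix|><|fim_middle|>",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])
```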

The same model can also be used for other features, such as regular chats, by simply replacing the model value in the Chat Completions tab.

We have many existing preset configurations in place. For instance, to use the Codestral model, simply select the Mistral AI template, enter the API key, and click apply. The values are already pre-filled, so the additional steps from the previous example are unnecessary.
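For context, Codestral's native FIM calls look roughly like this (Mistral documents a dedicated /v1/fim/completions endpoint; whether the preset uses this exact URL is an assumption, so check the pre-filled values in the template):

```python
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/fim/completions",  # Mistral's documented FIM endpoint
    headers={"Authorization": "Bearer <MISTRAL_API_KEY>"},
    json={
        "model": "codestral-latest",
        "prompt": "def add(a, b):\n    ",  # code before the cursor
        "suffix": "\n\nprint(add(1, 2))",  # code after the cursor
        "max_tokens": 64,
    },
)
print(resp.json())  # the completion text is inside the "choices" array
```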

@Pietro395 (Author)


I will also answer you here as well as on Reddit:

Thanks for the clarification. I am doing some more testing with OpenRouter, and setting the code completion URL to https://openrouter.ai/api/v1/completions seems to work with Qwen 2.5 Coder.
I will continue to use it and give you feedback.
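For anyone replicating this, the equivalent raw request would look roughly like the sketch below (the model slug is my assumption; check OpenRouter's model list for the exact ID):

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "qwen/qwen-2.5-coder-32b-instruct",  # assumed model slug
        "prompt": "def add(a, b):\n    return",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])
```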
