[DISCUSSION] I've now got this working with Ollama's chat completion API #16
Comments
Hi @jukofyork. Thank you for your input. I've merged your changes.
Hi @gradusnikov, No problem and glad to be of help! I've got the Ollama port running really well now. I've still got to tidy it up, and I've stripped out a lot of stuff that didn't really work well yet with locally run LLMs: function calling via prompts barely worked and needed streaming turned off, the local LLMs couldn't really create a working diff file, a lot of the stuff specific to JavaDoc and the Eclipse Java AST tree, etc. One thing you might want to add to your code is to make the right-click menu context sensitive.
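One way to express that kind of enablement check, as a minimal sketch (the handler class name and the exact condition are invented for illustration, not taken from the port):

```java
import org.eclipse.e4.core.di.annotations.CanExecute;
import org.eclipse.e4.core.di.annotations.Execute;
import org.eclipse.jface.text.ITextSelection;
import org.eclipse.jface.viewers.ISelection;
import org.eclipse.ui.IEditorPart;
import org.eclipse.ui.IWorkbenchWindow;
import org.eclipse.ui.PlatformUI;
import org.eclipse.ui.texteditor.ITextEditor;

public class AskAiHandler { // hypothetical handler name

    // Disables (greys out) the menu entry unless a text editor with a non-empty selection is active.
    @CanExecute
    public boolean canExecute() {
        IWorkbenchWindow window = PlatformUI.getWorkbench().getActiveWorkbenchWindow();
        if (window == null || window.getActivePage() == null) {
            return false;
        }
        IEditorPart editor = window.getActivePage().getActiveEditor();
        if (!(editor instanceof ITextEditor)) {
            return false;
        }
        ISelection selection = ((ITextEditor) editor).getSelectionProvider().getSelection();
        return selection instanceof ITextSelection && ((ITextSelection) selection).getLength() > 0;
    }

    @Execute
    public void execute() {
        // ... send the selected code to the assistant view ...
    }
}
```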
Then the command needs to be hooked up in 'fragment.e4xmi'.

I did also find that the view part as defined using the 'fragment.e4xmi' file was buggy on Linux and would appear blank after opening and closing it, so I moved it out into a class that extends ViewPart to fix this. The bug is probably not your code's fault but rather the Linux version of Eclipse's browser, which has a known bug where it shows up blank unless you use 'export WEBKIT_DISABLE_COMPOSITING_MODE=1' or 'export WEBKIT_DISABLE_DMABUF_RENDERER=1'. I'm not sure if it's possible with the 'fragment.e4xmi' view, but with one extended from ViewPart you can easily add to the toolbar and dropdown menu, along the lines of the sketch below.
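A rough illustration of that, with hypothetical class and action names rather than the actual code from the fork:

```java
import org.eclipse.jface.action.Action;
import org.eclipse.jface.action.IMenuManager;
import org.eclipse.jface.action.IToolBarManager;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.ui.part.ViewPart;

public class AssistantView extends ViewPart { // hypothetical view class

    @Override
    public void createPartControl(Composite parent) {
        // ... create the Browser widget etc. here ...

        Action clearAction = new Action("Clear Chat") {
            @Override
            public void run() {
                // ... clear the conversation ...
            }
        };

        // Programmatic contributions - no fragment.e4xmi editing required.
        IToolBarManager toolbar = getViewSite().getActionBars().getToolBarManager();
        toolbar.add(clearAction);

        IMenuManager menu = getViewSite().getActionBars().getMenuManager();
        menu.add(clearAction);
    }

    @Override
    public void setFocus() {
    }
}
```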
I did actually manage to get the right-click context menu working without the 'fragment.e4xmi' stuff (which then lets you programmatically add to the menu instead of having to edit each command, menu item, etc. by hand), but for some reason it didn't work properly with the dependency injection and I need to revisit it to see what was wrong.

I've also made it so that you can send "system" messages to the view and they come up in a blue chat bubble. This might be worth adding to your code so that you can print out error messages for the user to see when they can't connect properly to the OpenAI servers, etc.

One final thing I've done is refine the prompts using the LLMs themselves. I have 8 now: code generation (for boilerplate code), code completion, discuss, document, code review, refactor, optimize and debugging (plus the "fix errors" special case). To refine them I used several of the different local LLMs (codellama, deepseek-coder, etc.) with instructions like "I'm writing an Eclipse IDE plugin for an AI assistant and would like you to comment on how good my prompts are". I then got them to make really wordy prompts with lots of detail for each of the 8 categories above. This eventually produced really great prompts, but they used a lot of tokens to encode... So finally I used the most competent local LLM I have (deepseek-llm-67b), which was about the only one that truly got what I was doing (and didn't get confused and start trying to answer the prompts it was supposed to be refining! 🤣), to compress them down whilst keeping the crucial information. You can probably use ChatGPT itself for this whole process, but I did find that a few iterations of expanding the prompts into very wordy/detailed versions and then compressing them back down works extremely well. After a few iterations they eventually can't find anything useful to add or change and the final prompt is worded very well for them to comprehend.

I'm going to tidy up the code over the next few days: see if I can get the context menu working, harden the code that communicates with the Ollama server, etc. But after that I'm happy to share everything I've got back with you - I've no intention of making/supporting a proper fork and it will otherwise stay private. I will then try to see if I can get the stuff you have working with Eclipse's Java AST tree to work with Eclipse's CDT AST tree for C++.

Anyway, just want to say thanks for creating this - I think it's a great project and I hope it becomes more popular!

Juk
Sounds great, looking forward to trying this!
Hi @jukofyork, I find function calling very useful, especially after adding web search and web read. I think I will add more, as this is a simple and quite powerful way to make the LLM answer more accurately. I have not tried function calling with other LLMs, but maybe the approach from around 6 months ago would work, where people were defining function definitions as part of the system message, along with the function call format? Or I can simply make function calling something you can disable in Settings?
Hi again, I've got the communication with the Ollama server working fairly robustly at last: their server is a Go wrapper around llama.cpp's server.

I've struggled a lot with the dependency injection: either things not getting injected, causing baffling null pointer exceptions, or other weird things like the ILog (which I've now added a listener to so it displays in blue chat bubbles) seeming to have multiple copies instead of being a singleton, etc. I'm still not 100% sure why, but I think it's Eclipse's own dependency injection somehow interfering. Anyway, I had to strip a lot of it away to make sure everything works.

I've iterated over a few different methods of using the interface and finally settled on the right-click context menu and a toggle button to decide if the full file should be sent as extra context or not. This, along with grabbing and appending anything in the edit box to the end of the prompt message, seems to be the most usable.

I did consider seeing if I could get a "tree" of responses like a lot of the LLM web apps implement (with undo, sideways edits, etc.) and possibly even see if I can journal the stuff getting sent to the Browser widget to serialise and restore, but I don't think it will really be that useful as long conversations soon exhaust the context windows of all the available locally runnable LLMs...

I've added lots of other little fixes like rate-limiting and buffering the streaming to 5 events per second, as I found it could start to lag the Eclipse main UI thread badly for some of the smaller/faster models that can send 20-50 events per second.

Anyway, it's mainly just a case of tidying up the View code and I will share it back via GitHub and hopefully some of the stuff will be useful.
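A minimal sketch of what that kind of rate-limited flush can look like (this is an assumption about the mechanism, not the actual code from the fork; the class and method names are invented): the streaming thread appends tokens to a buffer and a scheduled task pushes the batched text to the SWT UI thread at most five times a second.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

import org.eclipse.swt.widgets.Display;

public class StreamingBuffer { // hypothetical helper class

    private final StringBuilder pending = new StringBuilder();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public StreamingBuffer(Display display, Consumer<String> appendToView) {
        // Flush at most 5 times per second instead of once per streamed token.
        flusher.scheduleAtFixedRate(() -> {
            String chunk;
            synchronized (pending) {
                if (pending.length() == 0) {
                    return;
                }
                chunk = pending.toString();
                pending.setLength(0);
            }
            // Hand the batched text to the UI thread in one go.
            display.asyncExec(() -> appendToView.accept(chunk));
        }, 200, 200, TimeUnit.MILLISECONDS);
    }

    // Called from the network/streaming thread for every token received.
    public void append(String token) {
        synchronized (pending) {
            pending.append(token);
        }
    }

    public void shutdown() {
        flusher.shutdown();
    }
}
```

The streaming callback then just calls append(token) for each event it receives, so the UI only ever sees the batched updates.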
Yeah, there is quite an interesting discussion on this here: They are defining the functions in the system message and then doing 4-5-shot teaching by making the first few messages examples of calling the functions.
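As a rough illustration of that pattern (the function names and JSON shape below are invented purely for the example, not taken from the linked discussion or this plugin):

```java
// Illustrative only: the model is told which "functions" exist and how to format
// a call in the system message, and the client then parses any JSON reply.
String systemMessage =
      "You are a coding assistant inside the Eclipse IDE.\n"
    + "You may call one of the following functions by replying with a single JSON object and nothing else:\n"
    + "  {\"function\": \"web_search\", \"arguments\": {\"query\": \"<search terms>\"}}\n"
    + "  {\"function\": \"read_web_page\", \"arguments\": {\"url\": \"<url>\"}}\n"
    + "If no function is needed, just answer normally.";
```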
@gradusnikov I have added lots of things you might find useful, e.g.:
The final thing I want to do is allow multiple copies of the view to be opened, and then I'll upload to GitHub later this week. I'm happy for others to try it out and use it, but Ollama is very buggy and I don't want to be spending lots of time helping people get Ollama working, or step on @gradusnikov's toes since it's his project after all and I've stripped out as much as I have added... I'll try and create a plug-in installer and add some instructions too, but it's more going to be left as a foundation for others to build on rather than an active fork I want to maintain.
I've done my best to commit the code to GitHub (no idea why it's ended up in a subfolder like that though 😕): https://github.com/jukofyork/aiassistant

The bits that are probably most useful to you:
There are also lots of small changes to do with the main view you may or may not want to use:
The prompts are the best I can come up with after a couple of months of trying. In general I've found the fewer newlines the better, and starting your tasks with a '#' symbol seems to help them (possibly they think it's a markup header, or maybe they have even been overtrained on Python comments). I've made it so the prompts use the StringTemplate library now: https://github.com/antlr/stringtemplate4/blob/master/doc/cheatsheet.md and added several other possibly useful context variables and a special...

I've had to strip out all of the...

I also found the...

I think from reading the issues here and on the Eclipse marketplace the Javax/Jakarta stuff and...

I also had to move all the dependencies into the main plug-in, as for some reason this caused me a lot of problems too (possibly because I was trying to edit the forked version though).

I did have more advanced code for the networking (due to Ollama being so buggy and crashing often from OOM errors), but had to remove it as I found that, due to the way Eclipse only uses a single GUI thread, it caused more problems than it solved (i.e. the menus kept freezing, etc.).

One thing I didn't fix but probably needs looking at is the O(n^2) complexity of the way the streamed tokens get added to the browser window: it gets slower and slower and starts to cause the main Eclipse GUI thread to stall. The best solution I could find without completely rewriting the code for this is to use...

There are probably a lot of other changes that I've forgotten to mention here, but I would just like to say thanks for creating the base project!
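For anyone unfamiliar with StringTemplate, the templated prompts end up looking roughly like this; the attribute names (`selection`, `fileContext`, etc.) are placeholders rather than the actual variables used in the fork:

```java
import org.stringtemplate.v4.ST;

public class PromptExample {

    public static String buildPrompt(String selectedText, String fullFileText) {
        // '#' task header and few newlines, per the observations above.
        String template =
              "# Task: review the following <lang> code and point out bugs and improvements. "
            + "<if(fileContext)>Full file for context: <fileContext> <endif>"
            + "Selected code: <selection>";

        ST prompt = new ST(template); // default '<' and '>' delimiters
        prompt.add("lang", "Java");
        prompt.add("selection", selectedText);
        if (fullFileText != null) {
            prompt.add("fileContext", fullFileText); // optional extra context toggled by the user
        }
        return prompt.render();
    }
}
```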
Just noticed there is some random 'ToDo' list with prompts coming up as the main readme - I'll see if I can tidy it up tomorrow (I don't really use Git and seem to always make a mess of it).

I've also deliberately not added any actual binary release for the plugin as: firstly, I don't want to take away from this project, and secondly, I don't want to become an Ollama technical support person... If anybody wants to use it, then you just need to install the plugin development stuff in Eclipse, use 'Import Fragments' and 'Export plugin' and it should work.
Hi jukofyork! Thank you very much for your edits. I will try to integrate your changes with the main branch.
Cheers!
/w.
No problem and I hope it is helpful :) I've updated the README to hopefully better explain how to build/use the forked version, and have added a few pictures and notes that I might have forgotten to mention above. I have some other work I need to do for the next few weeks, but the next things I want to look at are:
I'll be sure to share back anything I find and will have a look through your latest code when I get back to it - the fork is based on a pull I made sometime last December and I see you have made quite a lot of changes since then. There are also quite a few changes to the Ollama API underway: OpenAI compatibility, function calling, etc. are on their ToDo list, so it's probably a good time to leave it and see what they do next too.
I'm finding Ollama to be too buggy to use now - it seems for each bug they fix they create 2 more, and their Golang wrapper of llama.cpp's server is getting more and more impenetrable to fix anything... It's some strange mix of llama.cpp's...

So it looks like I'm going to have to start using the llama.cpp server directly, but I'm not sure if I should leave or just remove the Ollama server code now:
and so on... It's really so buggy now that I don't actually trust that what is getting sent to the server is what you expect (the bug where the system message was getting ignored went unnoticed for months!). The problem is that if I leave the Ollama code in, then options like "Add Context" won't actually work (nor will any future multi-shot prompts), but at the same time I'm reluctant to remove it as sometime in the future they may actually start to fix some of these bugs. Things like being able to list available models, load new models, and allow the GPU VRAM to be unloaded after 5 minutes if you don't send keep-alive messages were all much nicer than what is going to be possible with the stock llama.cpp server 😕

On another note, I have been researching how the completion engine works in Eclipse:
and specifically the CDT code that is used for C++ completion:
It looks horrifically complex, but like anything in Eclipse it is probably not that bad if you can get a minimal working example running... I doubt I'll have time to look at this properly for a few weeks, but I think it would be worth looking into.
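In case it helps as a starting point, the plain JFace content-assist interface is quite small; a do-nothing processor looks like this (the class name and the hard-coded proposal are just placeholders, and the JDT/CDT editors layer a lot of extra machinery on top of this):

```java
import org.eclipse.jface.text.ITextViewer;
import org.eclipse.jface.text.contentassist.CompletionProposal;
import org.eclipse.jface.text.contentassist.ICompletionProposal;
import org.eclipse.jface.text.contentassist.IContentAssistProcessor;
import org.eclipse.jface.text.contentassist.IContextInformation;
import org.eclipse.jface.text.contentassist.IContextInformationValidator;

public class LlmCompletionProcessor implements IContentAssistProcessor { // hypothetical name

    @Override
    public ICompletionProposal[] computeCompletionProposals(ITextViewer viewer, int offset) {
        // A real implementation would send the text around the cursor to the model
        // and turn its answer into one or more proposals.
        String suggestion = "/* completion from LLM */";
        return new ICompletionProposal[] {
            new CompletionProposal(suggestion, offset, 0, suggestion.length())
        };
    }

    @Override
    public IContextInformation[] computeContextInformation(ITextViewer viewer, int offset) {
        return null;
    }

    @Override
    public char[] getCompletionProposalAutoActivationCharacters() {
        return null;
    }

    @Override
    public char[] getContextInformationAutoActivationCharacters() {
        return null;
    }

    @Override
    public String getErrorMessage() {
        return null;
    }

    @Override
    public IContextInformationValidator getContextInformationValidator() {
        return null;
    }
}
```

For your own editor this gets wired up via a SourceViewerConfiguration; hooking into the existing Java/C++ editors is where the JDT/CDT-specific complexity comes in.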
@gradusnikov Not sure if you are still working on this, but I've got Eclipse's internal spell-checker working now. It should be pretty much a drop-in replacement for the...

I think spelling mistakes likely harm LLMs quite significantly due to having to tokenise in strange / out-of-distribution ways, so it's probably a good way to boost quality for free. With some more reading it should be possible to make it ignore text inside code blocks too, but I haven't had a chance to look yet.
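For reference, a sketch of driving the platform spelling service programmatically, under the assumption that the org.eclipse.ui.editors spelling support is what is meant by "Eclipse's internal spell-checker" (the wrapper class and the logging are invented for the example):

```java
import org.eclipse.core.runtime.NullProgressMonitor;
import org.eclipse.jface.text.Document;
import org.eclipse.jface.text.IDocument;
import org.eclipse.ui.editors.text.EditorsUI;
import org.eclipse.ui.texteditor.spelling.ISpellingProblemCollector;
import org.eclipse.ui.texteditor.spelling.SpellingContext;
import org.eclipse.ui.texteditor.spelling.SpellingProblem;
import org.eclipse.ui.texteditor.spelling.SpellingService;

public final class PromptSpellChecker { // hypothetical helper class

    // Runs the platform spelling service over a prompt string and reports any problems found.
    public static void check(String prompt) {
        IDocument document = new Document(prompt);
        SpellingService service = EditorsUI.getSpellingService();
        ISpellingProblemCollector collector = new ISpellingProblemCollector() {
            @Override public void beginCollecting() { }
            @Override public void accept(SpellingProblem problem) {
                // getOffset()/getLength() give the misspelled range,
                // getProposals() the suggested corrections.
                System.out.println("Possible typo at offset " + problem.getOffset()
                        + ": " + problem.getMessage());
            }
            @Override public void endCollecting() { }
        };
        service.check(document, new SpellingContext(), collector, new NullProgressMonitor());
    }
}
```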
This might also be useful for your temperature field: http://www.java2s.com/example/java-src/pkg/org/eclipse/wb/swt/doublefieldeditor-1e135.html Not having a real-valued scalar type was a serious oversight in SWT, I think.
There is also this that I bookmarked: http://www.java2s.com/Code/Java/SWT-JFace-Eclipse/SWTCompletionEditor.htm It's very out of date (from a 2004 book), but likely a good starting point to implement auto-complete using "fill-in-the-middle" LLMs. If I get a chance I will look into this and report back.

EDIT: Here is the book the source came from too: https://livebook.manning.com/book/swt-jface-in-action/chapter-5/88
Not sure if you are still working on this, but I've got LaTeX rendering working via MathJax v3.2.2. I had endless problems with this when I tried in the past, due to the formatting getting all mangled... but I found a trick to avoid it by encoding the LaTeX as Base64 in Java:

```java
private static String convertInLineLatexToHtml(String line) {
    String inlineLatexPatterns =
        "\\$(.*?)\\$|" +         // Single $ pairs
        "\\\\\\((.*?)\\\\\\)";   // \( \) pairs
    Pattern inlineLatexPattern = Pattern.compile(inlineLatexPatterns);
    return inlineLatexPattern.matcher(line).replaceAll(match -> {
        // Check each capture group since we don't know which pattern matched
        for (int i = 1; i <= match.groupCount(); i++) {
            String content = match.group(i);
            if (content != null) {
                String base64Content = Base64.getEncoder().encodeToString(content.getBytes());
                return "<span class=\"inline-latex\">" + base64Content + "</span>";
            }
        }
        return match.group(); // fallback, shouldn't happen
    });
}
```

and for blocks:

```java
private static void flushLatexBlockBuffer(StringBuilder latexBlockBuffer, StringBuilder htmlOutput) {
    if (latexBlockBuffer.length() > 0) {
        htmlOutput.append("<span class=\"block-latex\">");
        htmlOutput.append(Base64.getEncoder().encodeToString(latexBlockBuffer.toString().getBytes()));
        htmlOutput.append("</span>\n");
        latexBlockBuffer.setLength(0); // Clear the buffer after processing to avoid duplicate content.
    }
}
```

Then decoding it from the HTML tags in JS:

```javascript
function renderLatex() {
    // Convert block latex tags
    document.querySelectorAll('.block-latex').forEach(elem => {
        let decodedLatex = atob(elem.innerHTML);
        elem.outerHTML = '\\\[' + decodedLatex + '\\\]';
    });
    // Convert inline latex tags
    document.querySelectorAll('.inline-latex').forEach(elem => {
        let decodedLatex = atob(elem.innerHTML);
        elem.outerHTML = '\\\(' + decodedLatex + '\\\)';
    });
    MathJax.typeset();
}
```

I only tried this again after finding that... It should work on all valid LaTeX, block and inline, using both the dollar and bracket delimiters. The only thing it won't do is allow multiline... It also needs to be rendered only once (hence the...).
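For context on the wiring, this part is an assumption about the setup rather than something from the post above: the chat page would load MathJax v3 once in its head, and the Java side would trigger renderLatex() a single time after the streamed reply has finished.

```java
import org.eclipse.swt.browser.Browser;

public final class MathJaxSupport { // hypothetical glue class

    // Loaded once in the <head> of the chat page so renderLatex() has MathJax available.
    static final String MATHJAX_HEAD =
          "<script>MathJax = { tex: { inlineMath: [['\\\\(', '\\\\)']] } };</script>\n"
        + "<script src=\"https://cdn.jsdelivr.net/npm/mathjax@3.2.2/es5/tex-chtml.js\"></script>";

    // Call exactly once, after the final streamed chunk has been appended to the chat HTML.
    static void typesetLatex(Browser browser) {
        browser.execute("renderLatex();");
    }
}
```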
Not an issue but I can't see any discussion board for this project...
I've now got this working directly with the Ollama chat completion API endpoint, so it's possible to use it with local LLM instances:
https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-chat-completion
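For anyone who wants to try the endpoint standalone first, a minimal non-streaming request looks something like this (the model name and messages are placeholders, and the default port 11434 is assumed):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaChatExample {
    public static void main(String[] args) throws Exception {
        // Placeholder model name and messages.
        String body = """
            {
              "model": "codellama",
              "stream": false,
              "messages": [
                {"role": "system", "content": "You are a coding assistant inside Eclipse."},
                {"role": "user", "content": "Explain what a java.util.concurrent.Phaser is."}
              ]
            }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/chat")) // default Ollama port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // With "stream": false the reply is a single JSON object whose
        // message.content field holds the assistant's answer.
        System.out.println(response.body());
    }
}
```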
I originally tried to use LiteLLM to emulate the OpenAI API and then have it communicate with Ollama but it didn't really work.
So instead I've just made a hacked version of the plug-in and got it to communicate directly, and after finally getting to the bottom of why it was hanging after a couple of pages of text (see my other issue post for the solution) it seems to be working pretty well. The main changes needed were:
AFAIK none of the open-source LLMs can handle the function format OpenAI's models use, so that isn't active yet, but I'm pretty sure I can get it to work using prompts at least to some extent. LiteLLM seems to have the ability to do this using the "--add_function_to_prompt" command line option:
https://litellm.vercel.app/docs/proxy/cli
I can probably tidy up the code in a couple of days if anyone is interested?