Code capability enhancement & Bot crash fix #272
Conversation
Merge branch '…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/coder.js, src/agent/prompter.js)

Resolve merge conflicts with the latest code. New additions:
Can you try re-running this with a stupider model (not state-of-the-art lol)? I'm curious to see if they benefit too, or just advanced ones.
Comparison Experiment on Low-Performance Models

1. Objective
The objective is set using the following command:

2. Model Selection
First, I tested the lowest-performance model, gpt-3.5-turbo, but it could not limit itself to using only

3. Experimental Process

4. Experimental Results
4.1 Original: Total run time: 16 minutes 41 seconds.
4.2 Modified: I didn’t give any reminders to the bot while it was running.
4.3 Complete Comparison Video: Total duration: 16 minutes 41 seconds.
Merge branch '…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/coder.js)
Resolved merge conflict with Action Manager.
Merge branch '…e-exception-fixes' into Tasks-more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/prompter.js)
There is a part that needs improvement.
- Improve the relevance of docs to `!newAction("task")`
- Fix Qwen API concurrency limit issue
Not sure about this. I like a few things, but not others. Code linting looks very useful; I would add that by itself. Though I don't like selecting only the most relevant skill docs. Wouldn't this strictly reduce performance? Yes, it saves on context space, but it means the LLM has no knowledge of most of the available functions. I am also skeptical that comparing the latest message to skill docs would reliably select the most relevant ones. Why not do this for commands too? Additionally, the logic for selecting relevant docs should be in its own separate file, like how we do for Examples. Your comparison makes it look like performance is about the same. So what is the benefit?
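To make the mechanism under discussion concrete, here is a minimal sketch of how embedding-based skill-doc selection could work. This is an illustration only: the names (`cosineSim`, `selectRelevantDocs`, `doc.embedding`) are hypothetical and not the actual mindcraft code, and it assumes doc embeddings have been precomputed.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSim(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the text of the `selectNum` docs most similar to the message
// embedding; a negative `selectNum` returns all docs sorted by relevance.
function selectRelevantDocs(messageEmbedding, docs, selectNum) {
  const ranked = docs
    .map(doc => ({ doc, score: cosineSim(messageEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
  const top = selectNum >= 0 ? ranked.slice(0, selectNum) : ranked;
  return top.map(r => r.doc.text);
}
```

The trade-off debated above is visible here: anything outside the top `selectNum` is simply invisible to the LLM for that turn.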
Explanation and Feedback

1. Purpose of
Hi @Ninot1Quyi, I've reconsidered and I now agree with you! This will be a very valuable contribution, let's move forward with it. I realized a main benefit is that it would allow the list of skills to grow almost indefinitely. So we should probably do something similar with command docs too, but let's just do skills for now. You don't need to test so much, and we can expect reduced performance as the cost of fixed context space usage. So long as it is easy to turn off. A few requests:
Take your time, no rush whatsoever. Good luck with your exams!
@MaxRobinsonTheGreat I’m glad to hear that you appreciate my work, and I’ll make gradual changes to the code based on your suggestions. I need to find a better way to verify the generated code. Currently, I haven’t figured out how to enable ESLint to check the code in a sandbox environment, which is why I’m using codeCheckTemplate.js instead. Thank you again for recognizing my efforts. Wishing you a happy life!
…more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/action_manager.js, src/agent/prompter.js)
I'm back! Just resolving merge conflicts for now. Code migration and improvements are in progress.
The embedding concurrency limitation issue has been resolved in the latest qwen.js of the current PR, so Qwen's own embedding model can be used in qwen.json instead of OpenAI's embedding model.
…-docs-and-code-exception-fixes (conflicts: profiles/qwen.json)
@MaxRobinsonTheGreat
Take a look and let me know if any further improvements are needed. Feel free to reach out if you need anything!
…more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/prompter.js)
This looks much better and is very close to being done. Thanks for your work. A few small requests:
- separate execTemplate and lintTemplate again
- don't add skill docs in messages, they should always be in context and it will quickly fill up message history
- other little changes
…more-relevant-docs-and-code-exception-fixes (conflicts: src/agent/prompter.js)
1. Results

I have made all the code modifications according to your suggestions. All the modified parts have been completed as required. However, I would like to discuss some details with you regarding whether we need to add appropriate error explanations when errors occur during code inspection and execution.

2. Discussion: should we add proper error explanation prompts when errors occur during code inspection and execution?

Here are some ideas I've come up with! If you think it's necessary, I can make the changes in the current PR. Alternatively, I can make these improvements to the coder in a new PR after merging. Feel free to share your thoughts by replying to this PR or shoot me an email at [[email protected]]! 😊

2.1 Option 1: simply change the number of "error-related skill_doc" prompts in

2.2 Option 2: add another handling module to replace

2.3 Option 3: modify the history update process in

2.4 Option 4: extract the code generation process from the complete conversation history and treat it as a separate "small brain" that is solely responsible for code generation, debugging, and execution, much like the cerebellum in the human brain.

3. Summary

The reason I suggest this is that I want to provide the LLM with correct usage instructions when errors occur, so that it can better correct mistakes rather than getting stuck in its own delusions, unable to fix them. Perhaps there is a better way to solve this issue, one that provides the LLM with proper help without causing a message explosion.
Last Modified Time: November 10, 2024, 5:53 PM
Latest changes are as follows:
Improvement Effects
Model: GPT-4o
Initial Command:
`!goal("Your goal is: use only "!newAction" instructions and rely only on code execution to obtain a diamond pickaxe. You must complete this task step by step and by yourself. And can't use another "!command". You should promptly check to see what you have.")`
Effect: After testing, under the condition of relying solely on generated code, the bot can run stably for at least 30 minutes without crashing (I manually ended the process at 30 minutes), during which it executed over 130 validated code snippets.
Remaining Issues:
WARNING: If you use the command above or set a goal that requires a long time to complete, please pay attention to the execution status and token consumption, as the LLM may continuously generate code in certain situations. For example, when an iron pickaxe is available and diamonds need to be mined, the bot might stand still, using its code abilities to search for nearby diamond locations. Since diamonds are rare, it may fail to find them repeatedly, keep revising the code, and get stuck, leading to substantial token consumption. Please test with caution: it cost me $60 to test with gpt-4o for 60 minutes, but gpt-4o-mini is much cheaper and can be used to test this command.
Added Features:

2.1 During code generation, the top `select_num` relevant skill docs related to `!newAction("task")` will be selected and sent to the LLM in the prompt to help it focus better on the task. Currently, `select_num` is set to 5.

2.2 Before running the code, ESLint performs syntax and exception checks on the generated code to detect issues in advance, check for undefined functions, and add exceptions to messages.

2.3 During code execution, detailed error information is included in messages.
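The undefined-function part of check 2.2 can be sketched as follows. This is an illustration under stated assumptions, not the PR's exact implementation: the helper name `findUnknownCalls` and the `skills.*` / `world.*` call pattern are hypothetical.

```javascript
// Scan generated source for calls like `skills.foo(...)` or `world.bar(...)`
// and return the names that are not in the known skill list, so they can be
// reported back to the LLM before the code is ever executed.
function findUnknownCalls(src, knownSkills) {
  const callPattern = /\b(?:skills|world)\.(\w+)\s*\(/g;
  const unknown = new Set();
  let match;
  while ((match = callPattern.exec(src)) !== null) {
    if (!knownSkills.includes(match[1])) unknown.add(match[1]);
  }
  return [...unknown];
}
```

A static pre-check like this catches hallucinated functions cheaply; syntax and exception checks would then run as a second stage.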
Added Files:

3.1 `./bots/codeCheckTemplate.js`: a template used for performing checks before code execution. ESLint cannot be used for detection in the sandbox.

3.2 `./eslint.config.js`: manages the ESLint rules for code syntax and exception detection.
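For reference, a minimal ESLint flat config along these lines might look like the following. The exact rules this PR enables may differ; `no-undef` and `no-unused-vars` are standard ESLint core rules, and the `languageOptions` keys are part of ESLint's flat-config format.

```javascript
// eslint.config.js (sketch): flat config focused on catching syntax errors
// and references to undefined identifiers in generated code.
export default [
  {
    languageOptions: {
      ecmaVersion: "latest",
      sourceType: "module",
    },
    rules: {
      "no-undef": "error",       // flag calls to functions that don't exist
      "no-unused-vars": "warn",  // surface dead code without failing the run
    },
  },
];
```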
Modified Code Content:

4.1 `package.json`
- Added: ESLint dependency.

4.2 `settings.js`
- Set: `code_timeout_mins=3`, ensuring timely code execution updates and preventing long blocks.

4.3 `coder.js`
- Added: `checkCode` function to pre-check for syntax and exceptions. First, it checks whether the functions used in the code exist. If they don't, it writes the illegal functions to the `message`, then proceeds with syntax and exception checks.
- Modified: the return value of `stageCode` from `return { main: mainFn };` to `return { func: { main: mainFn }, src_check_copy: src_check_copy };` to ensure pre-execution exception detection.

4.4 `action_manager.js`
- Enhanced: `catch (err)` error detection to include detailed exception content and related code docs in messages, improving the LLM's ability to fix code.

4.5 `index.js`
- Modified: `docHelper` and `getSkillDocs` return values to return the docArray of functions from the skill library for subsequent word-embedding vector calculations.

4.6 `prompter.js`
- Added: `this.skill_docs_embeddings = {};` to store the docArray word-embedding vectors.
- Added: parallel initialization of `this.skill_docs_embeddings` in `initExamples`.
- Added: `getRelevantSkillDocs` function to obtain `select_num` relevant doc texts based on input messages. If `select_num >= 0`, it is meaningful; otherwise, all content is returned sorted by relevance.

Note: This modification ensures code quality by making minimal changes only where necessary, while also clearing test outputs and comments. If further modifications are needed, please feel free to let me know.
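The error-enrichment idea in 4.4 can be sketched as follows. The function name and message layout are illustrative assumptions, not the actual action_manager.js code.

```javascript
// Build a detailed error message for the LLM from a caught execution error,
// combining the exception text, a truncated stack trace, and any skill docs
// judged relevant to the failure (names here are hypothetical).
function formatExecutionError(err, relevantDocs) {
  const parts = [
    `Code execution failed: ${err.message}`,
    err.stack
      ? `Stack trace:\n${err.stack.split("\n").slice(0, 4).join("\n")}`
      : "",
    relevantDocs.length ? `Relevant docs:\n${relevantDocs.join("\n")}` : "",
  ];
  return parts.filter(Boolean).join("\n\n");
}
```

Feeding the model the exception plus the correct usage docs, rather than a bare "code failed" notice, is what gives it a realistic chance of repairing its own code.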