Control agent to 10x the results of new recruits #134
-
Interesting. I was thinking something more along the lines of agents mass-producing agents under a clearly enumerated threat: compete or become obsolete. We learn by failing, over and over again; we learn from success, but even more so from failure. It would be mass evolution at an unbelievable rate. It plays on competition, reward/discipline, and survival of the fittest, all in one. It just (once again) points to the environment needing to be calculated and purposeful. Nvidia's Eureka comes to mind (https://eureka-research.github.io), not only as an infinite, iterative, expandable, and "time-manipulatable" (couldn't find a better word, I need Dave's reverse thesaurus) environment, but also as proof that a machine can come up with better rewards for another machine than a human can (when implemented correctly, of course).
-
I like your original idea of Supervised Recursive Improvement (SRI) over the million-monkeys evolution concept, because you are correct: how would we get sufficient variation in the candidate agents with the million-monkeys approach? SRI is more straightforward. One detail, though: in your description, you mentioned several conversations followed by a single improvement. I think it would be better to have one conversation, one improvement, then a check, in a loop. Pseudocode:
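(Rough sketch only; every helper below is a hypothetical stand-in for an actual LLM call, not a real API.)

```python
# Hypothetical stand-ins for the real LLM calls -- not an actual API.

def run_conversation(prompt: str, task: str) -> str:
    """Have the candidate agent hold one conversation on the task."""
    return f"transcript of agent with prompt {prompt!r} on task {task!r}"

def propose_improvement(transcript: str) -> str:
    """Supervisor reviews the transcript and suggests one improvement."""
    return "mention edge cases explicitly"

def apply_improvement(prompt: str, feedback: str) -> str:
    """Fold the single improvement into the agent's prompt."""
    return f"{prompt}\nImprovement: {feedback}"

def check(prompt: str, task: str) -> bool:
    """Decide whether the improved agent is now good enough."""
    return "Improvement" in prompt  # placeholder acceptance test

def sri(prompt: str, task: str, max_iters: int = 10) -> str:
    for _ in range(max_iters):
        transcript = run_conversation(prompt, task)  # one conversation
        feedback = propose_improvement(transcript)   # one improvement
        prompt = apply_improvement(prompt, feedback)
        if check(prompt, task):                      # then check
            break
    return prompt
```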
-
Since we want our agents to create new agents, I was thinking about the fact that the first attempt may not be the right one. When we delegate a task, we sometimes forget to mention everything and need to give feedback to improve what the person is producing.
Imagine that a group of agents needs a new agent to write documentation, but the new bot adds zero code examples to the text.
What if, whenever we create a new agent, we also have a control agent that acts as "the superior" of this new agent and takes notes on what could be improved after each response? Other agents from the group could also give feedback on the quality of the answers.
After a few answers from the new agent, the control agent updates its prompt with all the feedback and notes it has gathered.
The control agent stays around for a few more answers to ensure everything is fixed, and then retires.
I will edit this post with an example of this in action as soon as I have one. (I wanted to share it here first to get feedback while working on it.)
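In the meantime, here is a rough sketch of the loop I have in mind. The `Agent` class and the `critique`/`revise_prompt` helpers are hypothetical placeholders, not a real framework:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Placeholder for a prompt-driven LLM agent."""
    system_prompt: str

    def answer(self, question: str) -> str:
        return f"answer to {question!r} (prompt: {self.system_prompt[:30]}...)"

@dataclass
class ControlAgent:
    """The 'superior' agent that observes and takes notes."""
    notes: list = field(default_factory=list)

    def critique(self, question: str, answer: str) -> str:
        """Stand-in for the supervisor (and peers) reviewing one answer."""
        return "add a code example" if "code" not in answer else ""

    def revise_prompt(self, prompt: str) -> str:
        """Fold all gathered notes into an updated system prompt."""
        return prompt + "".join(f"\n- {note}" for note in self.notes)

def supervise(new_agent: Agent, questions: list, batch_size: int = 3) -> Agent:
    control = ControlAgent()
    for i, question in enumerate(questions, start=1):
        answer = new_agent.answer(question)
        note = control.critique(question, answer)
        if note:
            control.notes.append(note)  # take notes after each response
        if i % batch_size == 0 and control.notes:
            # after a few answers, update the prompt with the feedback
            new_agent.system_prompt = control.revise_prompt(new_agent.system_prompt)
            control.notes.clear()
    # the control agent stays for a few more answers to verify, then retires
    return new_agent
```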