Control agent to 10x the results of new recruits #134
-
Interesting. I was thinking something more along the lines of agents mass-producing agents under a clearly enumerated threat: compete or become obsolete. We learn by failing, over and over again; we learn from success, but even more so from failure. It would be mass evolution at an unbelievable rate. It plays on competition, reward/discipline, and survival of the fittest, all in one. It just (once again) points to the environment needing to be calculated and purposeful. Nvidia's Eureka comes to mind (https://eureka-research.github.io), not only as an infinite, iterative, expandable, and "time-manipulatable" (couldn't find a better word, I need Dave's reverse thesaurus) environment, but also as proof that a machine can come up with better rewards for another machine than a human can (when implemented correctly, of course).
-
I like your original idea of Supervised Recursive Improvement (SRI) over the million-monkeys evolution concept, because you are correct: how would we get sufficient variation in the candidate agents with the million-monkeys approach? SRI is more straightforward. One detail, though: in your description, you mentioned several conversations followed by a single improvement. I think it would be better to have one conversation, one improvement, then a check, in a loop. Pseudocode:
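(Rough sketch only; every helper below is a hypothetical stand-in for an actual LLM call, not a real API.)

```python
# Hypothetical stand-ins for the real LLM calls -- not an actual API.

def run_conversation(prompt: str, task: str) -> str:
    """Have the candidate agent hold one conversation on the task."""
    return f"transcript of agent with prompt {prompt!r} on task {task!r}"

def propose_improvement(transcript: str) -> str:
    """Supervisor reviews the transcript and suggests one improvement."""
    return "mention edge cases explicitly"

def apply_improvement(prompt: str, feedback: str) -> str:
    """Fold the single improvement into the agent's prompt."""
    return f"{prompt}\nImprovement: {feedback}"

def check(prompt: str, task: str) -> bool:
    """Decide whether the improved agent is now good enough."""
    return "Improvement" in prompt  # placeholder acceptance test

def sri(prompt: str, task: str, max_iters: int = 10) -> str:
    for _ in range(max_iters):
        transcript = run_conversation(prompt, task)  # one conversation
        feedback = propose_improvement(transcript)   # one improvement
        prompt = apply_improvement(prompt, feedback)
        if check(prompt, task):                      # then check
            break
    return prompt
```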
-
Since we want our agents to create new agents, I was thinking about the fact that the first attempt may not be the right one. When we delegate a task, we sometimes forget to mention everything and need to give feedback to improve what the person is producing.
Imagine that a group of agents needs a new agent to write documentation, but the new bot adds zero code examples to the text.
What if, whenever we create a new agent, we also have a control agent that acts as "the superior" of this new agent and takes notes on what could be improved after each response? Other agents from the group could also give feedback on the quality of the answers.
After a few answers from the new agent, the control agent updates its prompt with all the feedback and notes it has gathered.
The control agent stays around for a few more answers to ensure everything is fixed, and then retires.
I will edit this post with an example of this in action as soon as I have one. (I wanted to share it here first to get feedback while working on it.)
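In the meantime, here is a rough sketch of the loop I have in mind. The `Agent` class and the `critique`/`revise_prompt` helpers are hypothetical placeholders, not a real framework:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Placeholder for a prompt-driven LLM agent."""
    system_prompt: str

    def answer(self, question: str) -> str:
        return f"answer to {question!r} (prompt: {self.system_prompt[:30]}...)"

@dataclass
class ControlAgent:
    """The 'superior' agent that observes and takes notes."""
    notes: list = field(default_factory=list)

    def critique(self, question: str, answer: str) -> str:
        """Stand-in for the supervisor (and peers) reviewing one answer."""
        return "add a code example" if "code" not in answer else ""

    def revise_prompt(self, prompt: str) -> str:
        """Fold all gathered notes into an updated system prompt."""
        return prompt + "".join(f"\n- {note}" for note in self.notes)

def supervise(new_agent: Agent, questions: list, batch_size: int = 3) -> Agent:
    control = ControlAgent()
    for i, question in enumerate(questions, start=1):
        answer = new_agent.answer(question)
        note = control.critique(question, answer)
        if note:
            control.notes.append(note)  # take notes after each response
        if i % batch_size == 0 and control.notes:
            # after a few answers, update the prompt with the feedback
            new_agent.system_prompt = control.revise_prompt(new_agent.system_prompt)
            control.notes.clear()
    # the control agent stays for a few more answers to verify, then retires
    return new_agent
```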