
Moa #2628

Draft: gembancud wants to merge 39 commits into main

Conversation


@gembancud gembancud commented Dec 14, 2024

Add Mixture of Architects (MOA) Feature

Why choose between r1, o3, and sonnet, when you can have 'em all!

Overview

This PR introduces a powerful new feature called "Mixture of Architects" (MOA) - a collaborative AI architecture where multiple LLM "architects" work together to solve programming tasks. Each architect maintains its own conversation thread while being able to see and respond to other architects' proposals, enabling true multi-agent collaboration.

EDIT: Below is a long-winded explanation of the idea. But it should be concisely demonstrated in a later comment: here

Click to expand long explanation

Key Features

Multiple Architect Collaboration

  • Support for multiple LLM architects working together
  • Each architect identified by NATO phonetic name (alpha, bravo, charlie, etc.)
  • First architect (alpha) is always the main model
  • Each architect sees all other architects' proposals, enabling true collaborative discussion

Discussion Flow

The discussion proceeds in rounds, with each round following this pattern:

  1. User submits a query/request
  2. Architects respond sequentially:
    • Each architect sees:
      • Original user query
      • All previous architects' proposals (XML fenced)
    • Each architect provides:
      • Their analysis/instructions
      • Their own proposal (in XML fence)
    • Can reference, support, critique or object to other architects' proposals
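The sequential round described above can be sketched as a simple loop. This is an illustrative stand-in, not the PR's actual code: `Architect` and `run_round` are hypothetical names, and the `respond` stub stands in for a real LLM call.

```python
from dataclasses import dataclass

@dataclass
class Architect:
    """One LLM 'architect' in the round (illustrative stand-in,
    not the PR's ArchitectAgent class)."""
    name: str  # NATO phonetic name, e.g. "alpha"

    def respond(self, user_query, prior_proposals):
        # A real implementation would send user_query plus the
        # XML-fenced prior proposals to this architect's LLM;
        # this stub just returns a placeholder proposal.
        return f"{self.name}: proposal for '{user_query}'"

def run_round(architects, user_query):
    """One discussion round: architects answer sequentially, each
    seeing every proposal made earlier in the same round."""
    proposals = []
    for arch in architects:
        # arguments are evaluated before append, so each architect
        # sees only the proposals made before its turn
        proposals.append((arch.name, arch.respond(user_query, proposals)))
    return proposals

round1 = run_round(
    [Architect("alpha"), Architect("bravo"), Architect("charlie")],
    "Improve error handling",
)
```

The key property is ordering: alpha responds with no prior proposals, bravo sees alpha's, charlie sees both.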

Commands

Users can interact with MOA using three main commands:

  1. /discuss <message> (or just type normally) - Start/continue a discussion round
  2. /code <message> - Move to implementation phase
  3. /drop <architect-name> - Remove an architect from the discussion
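A minimal dispatcher for these commands might look like the following sketch. The function name `parse_command` is hypothetical; the only behavior taken from the PR description is that plain text defaults to /discuss.

```python
def parse_command(line):
    """Map raw user input to a (command, argument) pair.
    Plain text (no leading slash) defaults to /discuss,
    per the MOA design above."""
    if line.startswith("/"):
        cmd, _, arg = line.partition(" ")
        if cmd in ("/discuss", "/code", "/drop"):
            return cmd, arg.strip()
    # unrecognized or slash-free input is treated as discussion
    return "/discuss", line.strip()
```

For example, `parse_command("/code do it")` yields `("/code", "do it")`, while bare text yields a /discuss pair.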

Implementation Phase

When moving to implementation (/code), the entire discussion history is compiled chronologically with full context. The editor coder then decides how to implement the changes based on:

  • The full discussion history
  • The final user message
  • Their own analysis of the proposals
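The chronological compilation step could be sketched as below. `compile_history` is a hypothetical helper, not the PR's actual function; it only mirrors the XML fencing conventions described in this PR.

```python
def compile_history(rounds, final_user_message):
    """Flatten all discussion rounds into one chronological
    transcript for the editor coder. Each round is a
    (user_query, [(architect_name, proposal), ...]) pair."""
    parts = []
    for user_query, proposals in rounds:
        parts.append(f"<user_message>\n{user_query}\n</user_message>")
        for name, proposal in proposals:
            parts.append(
                f"<architect name='{name.upper()}'>\n"
                f"<proposal>\n{proposal}\n</proposal>\n"
                f"</architect>"
            )
    # the final /code message comes last, guiding implementation
    parts.append(f"<user_message>\n{final_user_message}\n</user_message>")
    return "\n\n".join(parts)
```

The editor then receives this single string as context, with the user's /code message as the last word.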

Technical Implementation

Key Components

  • MixtureOfArchitectsCoder: Main class implementing the MOA functionality
  • ArchitectAgent: Class representing individual architects
  • XML fencing for clear message boundaries:
    • <user_message> - Contains user queries
    • <proposal> - Contains an architect's specific proposal
    • <architect name='NAME'> - Contains full architect responses
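The three fences above can be produced by trivial helpers; these function names are illustrative assumptions, not the PR's API:

```python
def fence_user(msg):
    """Wrap a user query in <user_message> tags."""
    return f"<user_message>\n{msg}\n</user_message>"

def fence_proposal(text):
    """Wrap an architect's specific proposal."""
    return f"<proposal>\n{text}\n</proposal>"

def fence_architect(name, body):
    """Wrap an architect's full response, tagged with its name."""
    return f"<architect name='{name.upper()}'>\n{body}\n</architect>"
```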

Collaborative Design

  • All architects see all proposals
  • Architects can directly reference and critique each others' proposals
  • No formal consensus mechanism - the editor coder makes implementation decisions
  • User guides the final implementation through their /code message

Benefits

  1. Enhanced Problem Solving: Multiple perspectives and approaches to each programming task
  2. Better Decision Making: Architects can critique and improve upon each others' proposals
  3. Flexible Architecture: Easy to add or remove architects during discussion
  4. Clear Communication: Structured XML format ensures clear boundaries between different architects' inputs
  5. User Control: Users can guide the discussion and choose when to move to implementation

Example Usage & Message Flow

Visual Flow Representation

User Query 1: "Let's improve the error handling in our API endpoints"
└── Alpha's View:
    User Query 1
    └── Alpha Response + Proposal
        └── Bravo's Proposal
            └── Charlie's Proposal
                └── User Query 2
                    └── Alpha Response + Proposal
                        └── ...

└── Bravo's View:
    User Query 1 + Alpha's Proposal
    └── Bravo Response + Proposal
        └── Charlie's Proposal
            └── User Query 2 + Alpha's Proposal
                └── ...

└── Charlie's View:
    User Query 1 + Alpha's Proposal + Bravo's Proposal
    └── Charlie Response + Proposal
        └── User Query 2 + Alpha's Proposal + Bravo's Proposal
            └── ...

Detailed Example with Clear Message Flow

🗣️ User Initiates

User: Let's improve the error handling in our API endpoints

👤 ALPHA's Turn

What Alpha Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

Alpha's Response:

ALPHA: I suggest we implement a structured approach to error handling.
<proposal>
1. Create a base ErrorResponse class:
   class ErrorResponse {
     code: string;
     message: string;
     details?: object;
     timestamp: Date;
   }
2. Implement standardized try-catch blocks
3. Add error codes and messages mapping
</proposal>

👤 BRAVO's Turn

What Bravo Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

<architect name='ALPHA'>
<proposal>
1. Create a base ErrorResponse class:
   class ErrorResponse {
     code: string;
     message: string;
     details?: object;
     timestamp: Date;
   }
2. Implement standardized try-catch blocks
3. Add error codes and messages mapping
</proposal>
</architect>

Bravo's Response:

BRAVO: Building on Alpha's proposal, we should consider logging and monitoring.
<proposal>
4. Add structured logging in catch blocks:
   - Use Winston or Pino for structured JSON logging
   - Include request context (requestId, user, endpoint)
5. Implement error tracking metrics:
   - Count by error type
   - Track error rates and patterns
6. Set up monitoring dashboards for:
   - Error frequency
   - Response times
   - Error patterns over time
</proposal>

👤 CHARLIE's Turn

What Charlie Sees:

<user_message>
Let's improve the error handling in our API endpoints
</user_message>

<architect name='ALPHA'>
[Alpha's proposal]
</architect>

<architect name='BRAVO'>
[Bravo's proposal]
</architect>

Charlie's Response:

CHARLIE: Good suggestions for internal handling. Let's enhance the API client experience too.
<proposal>
7. Add detailed error documentation:
   - OpenAPI/Swagger error schemas
   - Error code catalog with solutions
8. Implement retry mechanisms:
   - Exponential backoff for 5xx errors
   - Retry-After header support
9. Add rate limiting handling:
   - Clear 429 responses
   - Rate limit headers
   - Client-side rate tracking
</proposal>

Key Points About the Flow

  1. Sequential Processing:

    • Each architect sees all previous proposals
    • Later architects can build on or critique earlier suggestions
    • Full context is maintained throughout the discussion
  2. XML Structure:

    • <user_message> wraps user inputs
    • <architect name='NAME'> wraps each architect's full response
    • <proposal> wraps specific proposals within responses
  3. Context Accumulation:

    • Alpha sees only the user's message
    • Bravo sees user's message + Alpha's proposal
    • Charlie sees user's message + Alpha's + Bravo's proposals
  4. Implementation Phase:

    • /code command triggers the editor
    • Editor receives complete discussion history
    • Makes informed decisions based on all architects' input
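The context-accumulation rule in point 3 can be expressed compactly. This is a sketch under the PR's description, with a hypothetical function name:

```python
def visible_context(architect_index, user_message, proposals):
    """What the i-th architect (0 = alpha) sees at its turn:
    the user message plus proposals from all earlier architects."""
    ctx = [f"<user_message>\n{user_message}\n</user_message>"]
    for name, proposal in proposals[:architect_index]:
        ctx.append(
            f"<architect name='{name}'>\n"
            f"<proposal>\n{proposal}\n</proposal>\n"
            f"</architect>"
        )
    return "\n\n".join(ctx)
```

Alpha (index 0) sees only the user message; each later architect sees one more proposal than its predecessor.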

Testing

  • Tested with various model combinations
  • Verified XML parsing and message handling
  • Tested command processing and architect management
  • Validated implementation phase with different types of code changes

Future Enhancements

  1. Add support for architect voting/consensus mechanisms
  2. Implement architect specialization (e.g., security expert, performance expert)
  3. Add ability to save/load architect configurations
  4. Enhance discussion visualization

Breaking Changes

None. This is a new feature that doesn't affect existing functionality.

Dependencies

No new dependencies required.


This PR represents a significant enhancement to aider's capabilities, enabling more sophisticated and collaborative code generation and modification. The Mixture of Architects approach provides a unique way to leverage multiple LLMs for better code quality and more thorough problem solving.

Please contact me at discord for discussion :)



CLAassistant commented Dec 14, 2024

CLA assistant check
All committers have signed the CLA.

@LuciferMornens

Ngl this is a hell of a PR.

I hope @paul-gauthier accepts it.

@jerzydziewierz

key question -- is this any good? @gembancud can you provide any kind of evaluations, or performance metrics?
any reference tasks that were not solved by a single architect but were solved by MoA ?

@gembancud
Author

key question -- is this any good? @gembancud can you provide any kind of evaluations, or performance metrics? any reference tasks that were not solved by a single architect but were solved by MoA ?

This is with Sonnet and GPT-4o together.

- dirname: 2024-12-18-11-12-24--trial_run9
  test_cases: 133
  model: openrouter/anthropic/claude-3.5-sonnet:beta, openai/gpt-4o
  edit_format: diff
  commit_hash: 49eb1d2-dirty
  pass_rate_1: 65.4
  pass_rate_2: 82.7
  percent_cases_well_formed: 100.0
  error_outputs: 8
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 7
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 1
  exhausted_context_windows: 0
  test_timeouts: 0
  command: aider --model openrouter/anthropic/claude-3.5-sonnet:beta, openai/gpt-4o
  date: 2024-12-18
  versions: 0.68.1.dev
  seconds_per_case: 58.4
  total_cost: 5.1431

Will look at chaining o1 and 1206 as well, as soon as my rate limits relax a bit!

@jerzydziewierz

@gembancud so I understand that it is not currently any better than Sonnet alone.

@paul-gauthier I recommend rejecting because: having to wait approx. 1 minute per question is fundamentally incompatible with the original vision of Aider -- being a user-interactive system.

@gembancud it's a great exercise, but at this time I would recommend that you keep this private; we should avoid overloading Aider with too many features of academic-only merit.


aj47 commented Dec 20, 2024

[screenshot]
/drop doesn't seem to be working for me


aj47 commented Dec 20, 2024

no autocomplete suggestion for /discuss

@gembancud
Author

/drop doesn't seem to be working for me

My bad, I accidentally overrode /drop. It should be working now; removing an architect in MOA is now /ignore.

My deepest apologies, I have been a bad contributor, not posting a method of use for this technique. I will do so soon; I've made prompt changes, and I pretty much use it as a daily driver now.
I have 3 architects working for me: Sonnet 3.5, GPT-4o, and Gemini 1206. Having 3 refinement steps from 3 different models before you trigger code implementation means the work is under the lens multiple times. Definitely still not perfect, but it hits a much larger scope compared to /architect. By the time the last model writes its instructions, everything is neatly refined. Error handling, type validation, and design concerns all come along as side effects rather than the architect just straight up doing your tasks.

@paul-gauthier
Collaborator

Thanks for your interest in aider and for preparing this PR.

This is a very large PR that radically alters how aider would function. It seems unlikely that I could merge it without a pretty strong set of objective, quantitative evidence that it provides significant benefits.

Have you been benchmarking this approach?

Author

gembancud commented Dec 21, 2024

Thank you for the attention! :)

I have been recently dismayed by jerzydziewierz's remarks, coupled with the fact that the expensive benchmark tests I ran didn't breach the saturation mark. I have not had the time or financial capacity to test beyond the code editing benchmark, unfortunately. More of my benchmarks are in a Discord thread, easily found in the showcases channel.

You may notice the commits here are continuous; I tweak it nearly every day, as it is my main driver for development moving forward. I do think the code quality here is much better compared to /architect, but that is anecdotal evidence, coupled with the fact that I may be biased as the author.

I am on the lookout for QoL suggestions though, to make it easier for everyone to try it out; adoption is much more organic if people look beyond the benchmarks. I know that's partly true because the LMSYS leaderboard does not have Sonnet 3.5 at the top, yet a lot of people in our circles advocate for it largely from personal experience. In that regard, I would prefer getting feedback on MOA that way as well.

But if my message does not answer the base need for quantitative evidence, then I'm fine with postponing the fight until the next release of benchmarks with reduced saturation. On a minor note, MOA is much more impressive in the code quality aspect, which I think is disregarded by unit-testing benchmarks. If quantifiable, it would be something like Chatbot Arena.

@jerzydziewierz

Dear @gembancud ,

sorry if you feel insulted,

My personal experience with these auto-coding tools is that they very quickly fall into the trap of under-specification: the problem shifts into eliciting what the user even wants and needs in the first place, rather than providing the solution.

Hence, Aider has been envisioned as a coder-interactive tool rather than an auto-chatbot arena with some agentic effects on the source code.

As @paul-gauthier said, Aider is by now relatively mature as a tool, and it is unlikely to simply accept such a major change of direction into the main repo as-is.

May I suggest that you fork aider (if the licence permits) and develop your vision there; when you can demonstrate to a few people that your approach is superior, word of mouth will surely spread.

As to benchmarking, you indeed do not need to demonstrate superiority on any of the big official benchmarks that may cost hundreds of dollars to run.

Just demonstrate it nicely on one or a few examples specifically tailored to the strength of what you are proposing here.

I honestly wish you best of luck with your passion project.

@VatsaPatel

Hi @gembancud, I am happy to sponsor any benchmarking cost that you may need :)

@gembancud
Author

I'm back with some developments and a short demo of how it can be used.

YouTube demo video here:
Watch the video

Observations

Benchmark performance is believed to be poor. That's based on the testing I did with GPT-4o and o1-mini at least. I haven't tested it with better models due to budget constraints.

If that didn't turn you off, good, because I use this method every day and pretty much avoid any other AI code assistants, because they don't have multi-model collab. :)
Some findings:

  • Have architects discuss toward building the SIMPLEST solution. Having them improve previous code will just add more complexity and confuse the coder about what to implement.
  • Because there are different ideas from different models, it's important to have a compiler step separate from the coder step. This way the coder doesn't have to pick out the implementations from different models and do search/replace on top. In practice, ideas are followed better and search/replace errors are minimal.
  • Having said that, the flow now is ARCHITECTS -> COMPILER -> CODER
  • All architects are expected to follow a format. I find this makes it easier to read: since there's more to read, knowing where to skim keeps you fast.
  • I haven't benchmarked it, but I feel more confident ordering models by intelligence; you wouldn't want a junior engineer correcting senior engineers' outputs. My base is pretty much 3.6 Sonnet, because alpha is currently set as compiler and coder. I then order it with r1 and, more recently, o3-mini.
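The revised ARCHITECTS -> COMPILER -> CODER flow described above can be sketched as a three-stage pipeline. Everything here is illustrative: the stage functions are toy stand-ins for LLM calls, and `run_pipeline` is a hypothetical name.

```python
def run_pipeline(architect_fns, compiler_fn, coder_fn, task):
    """ARCHITECTS -> COMPILER -> CODER, per the findings above.
    Architects propose sequentially (each seeing earlier proposals),
    the compiler merges the proposals into one plan, and the coder
    turns that single plan into edits -- so the coder never has to
    reconcile competing implementations itself."""
    proposals = []
    for fn in architect_fns:
        # each architect sees the task plus all earlier proposals
        proposals.append(fn(task, list(proposals)))
    plan = compiler_fn(task, proposals)
    return coder_fn(plan)

# toy stand-ins for the three LLM roles
archs = [lambda t, prior, i=i: f"idea-{i}" for i in range(3)]
plan = run_pipeline(archs,
                    lambda t, ps: " + ".join(ps),
                    lambda merged: f"edits for: {merged}",
                    "add retries")
```

Separating the compiler from the coder is the design point: the coder receives one merged plan, not three competing proposals.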

Usage (Try it yourself)

Testing the MOA Branch for Aider

This branch introduces the MOA feature to Aider. Follow the steps below to test out the branch in your own environment.

Step 1: Clone the Repository

Clone the branch from my fork using the following command:

git clone -b moa https://github.com/gembancud/aider.git

Then navigate into the repository:

cd aider

Step 2: Set Up a Virtual Environment (Recommended)

Warning: Installing in the global environment may overwrite your current packages. We strongly recommend using a virtual environment.

Using venv

Create and activate a virtual environment with:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Or Using conda

Alternatively, create and activate a Conda environment:

conda create --name aider_env python=3.x  # Replace 3.x with your desired Python version
conda activate aider_env

Step 3: Install the Package in Editable Mode

With your virtual environment activated, install the package in editable mode:

pip install -e .

Step 4: Run Aider with the MOA Feature

Now you can run the application with the following recommended command:

aider --no-auto-commit --no-auto-lint --no-suggest-shell-commands --no-show-model-warnings --model openrouter/anthropic/claude-3.5-sonnet:beta --architect --editor-model openrouter/anthropic/claude-3.5-sonnet:beta --moa r1 o3-mini

Modify as you see fit :)
for the video, i used

 aider --no-auto-commit --no-auto-lint --no-suggest-shell-commands --no-show-model-warnings --model openrouter/anthropic/claude-3.5-sonnet:beta --architect --editor-model openrouter/anthropic/claude-3.5-sonnet:beta --moa openrouter/openai/o3-mini

Step 5: Workflow is as follows

  1. Add files and chat with the AI. The default mode is like /ask; you can keep discussing to steer the conversation.
  2. If you want it coded, /code <message> will trigger that. It's important in the message to direct how you want it coded, e.g. /code let's implement alpha's solution, ignore charlie. The AI will then compile your instructions and edit.
  3. Keep your tasks short, and /clear and /reset every so often.

With that said, I limit my commands to /code, /add <files>, /drop <files>, /read-only <files>, /clear, and /reset. For more convenient commands, please feel free to discuss what could be interesting to add.

Warning

  • Does not work with /ask and /chat modes right now, unfortunately; if you want to switch, open another terminal for that.
  • Alpha is pretty much the compiler and coder. I recommend sticking with 3.5 Sonnet as the base.



gitkenan commented Feb 1, 2025

Hey! Thanks for commenting on my feature request - looks like you beat me to it with this idea.

Since you asked me to take a look, I'll give you my first impressions. What you've done here is very cool and it's certainly inspiring. Open-source needs people like you who push the boundary and build new things on top of existing software, which goes without saying.

However, I found it a little difficult to understand why I should use MoA. Big PRs are quite hard to swallow, and for them to go well, the presentation of your PR needs to be just as impactful as it is big in terms of the number of lines introduced. It would help your case a lot if you had a clear demonstration that provides a strong argument for why I should use it, and it'd also help if the usage could be simplified.

I say this because I really believe in the idea, but I think the presentation and approach need work. After spending around 5-10 minutes looking into the PR, I still struggled to understand how I'd go about using it. If the idea were as simple as a way to configure or enhance the architect mode, this would be a much easier pill to swallow. I really appreciate your honesty about the benchmark results, but as you can imagine, people need to see positive results in order to consider such a big change. Even if it were a small enhancement, I think people need to see convincing proof that it's a real improvement.

In all cases, thanks for showing me this, and I really hope some variation of this feature gets implemented soon, because there's no doubt in my mind about its massive potential. Perhaps we can work together on it if you're interested (in particular, I'm considering developing my own personal enhancement of the architect mode that implements these ideas, but I'm open to working together on this if you'd like).

All the best and thanks again!
