Would you rather have one super-smart AI or a swarm of dumber ones?
Do you wait around for one all-encompassing, super-smart AI to emerge, or get started sooner with a swarm of a dozen passably smart ones?
I’ve been experimenting with different ways of harnessing Generative AI tools like ChatGPT and Llama to perform tasks as part of ongoing automation pipelines. Quite frankly, it’s amazing how much content they can generate on demand that looks and feels like 100% authentic, organic, human-authored material.
The Context Window
In my primitive use of these tools, one of the major limiting factors is the size of the “context window” - often equated to a human’s memory or attention span. Add too many concepts and data points to the context window and the model starts forgetting bits, like what you’ve actually asked it to do.
Prompt Engineering is a bit of a dark art. It responds well to experimentation and trial and error, and that experimentation tells you that more context - and more carefully curated context - gives you better results. But add too much context and the AI starts forgetting the earlier instructions you gave it and “improvises.”
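As a rough sanity check before sending a prompt, you can estimate whether it will fit. The 4-characters-per-token ratio below is a common rule of thumb for English text (not tokenizer-accurate), and the 128k window and response headroom are assumptions for illustration:

```python
# Rough token-budget check before sending a prompt. The 4-chars-per-token
# heuristic is an approximation, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def fits_in_context(prompt: str, context_window: int = 128_000,
                    reserved_for_response: int = 4_000) -> bool:
    """Leave headroom for the model's reply - the window covers both sides."""
    return estimate_tokens(prompt) + reserved_for_response <= context_window

print(fits_in_context("review this function please"))  # True - tiny prompt
print(fits_in_context("x" * 1_000_000))                # False - ~250k tokens
```

In practice you’d swap the heuristic for the model’s actual tokenizer, but the budgeting logic stays the same.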
The Goal
I want to build code-reviewing tools using Generative AI. I can’t just stuff 1 million lines of code into the context of my prompt - that’s a sure-fire way to blow past even the 128k-token limit of most of the available AI offerings.
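One naive workaround is to slice the codebase into review-sized pieces. This sketch splits on raw line counts with a small overlap so context isn’t lost at the boundaries - the chunk size and overlap are arbitrary assumptions, and a real tool would split on function or class boundaries instead:

```python
# Naively slice a large codebase into overlapping, review-sized chunks.
# Chunk size and overlap are illustrative; real tools split on
# function/class boundaries rather than raw line counts.

def chunk_lines(lines: list[str], chunk_size: int = 400,
                overlap: int = 20) -> list[list[str]]:
    """Return overlapping windows of lines, each at most chunk_size long."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + chunk_size])
    return chunks

source = [f"line {i}" for i in range(1_000)]
chunks = chunk_lines(source)
print(len(chunks))  # 3 chunks, each within the 400-line budget
```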
The Challenge
Do I wait for a 1 million token model? A billion token model? Maybe they’ll come into existence, but maybe they won’t. Does context size equate to intelligence? Is it the direction that the industry is even going to take?
Creative Thinking Time
I’ve tried several ways of getting around the limitations here. One thing that makes a massive difference to the quality of responses is RAG (Retrieval-Augmented Generation), which essentially shifts the context out of the prompt and into a database to be queried on the fly - optimising the information “in memory” at any time.
But even with RAG techniques, if you ask too much of an AI at once, it misses things. Those Context Windows apply both to the information you give it and the response it gives you.
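To make the RAG idea concrete, here’s a toy retrieval step: instead of stuffing every document into the prompt, score each chunk against the query and keep only the top few. Real systems use embedding models and a vector database; this bag-of-words overlap score (and the sample docs) are just stand-ins:

```python
# Toy RAG retrieval: rank chunks by word overlap with the query and
# keep the top k, so only the most relevant context enters the prompt.

from collections import Counter

def score(query: str, doc: str) -> float:
    """Fraction of the doc's words that also appear in the query."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / (len(doc.split()) or 1)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "def parse_config(path): opens the YAML config file",
    "class UserAuth handles login tokens and password hashing",
    "unit tests for the payment gateway retry logic",
]
print(retrieve("where is password hashing handled?", docs, k=1))
# → the UserAuth document wins on overlap
```

The retrieved chunks - not the whole corpus - then become the context for the prompt, which is what keeps you inside the window.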
Swarms and Roleplaying
Rather than creating an all-purpose, super-smart AI via elaborate Prompt Engineering, how about we take the swarm approach and build a dozen or so tightly-focused, role-oriented sub-agents?
Instead of asking an AI to perform a code review and loading it up with a checklist of 100 quality-control checks and a dozen metric reports (code coverage, dependency scans, and so on), let’s create a system of a dozen parallel agents - one for testing, another for security, another for dependencies - all tied up neatly with an overall summariser.
Which is going to be better? Will this even work? Only time (and experimentation) will tell!