Summary
- Microsoft 365 Copilot mixes GPT drafting with Claude fact-checking for stronger research outputs.
- With Critique, Researcher scores 13.8% higher on the DRACO benchmark by combining drafting with accuracy and citation checks.
- Council shows multiple model answers and disagreements, letting you assemble the best workflow.
Things are getting interesting in the world of AI. First, companies used each other’s AI models. Then, we had a moment where everyone battened down the hatches and began focusing purely on making their model the best one. Now, we’re entering an era where each AI model does something the others can’t, so the only way to provide a truly stellar service is to mix different LLMs together.
Such is the case with Microsoft 365 Copilot’s newest agent, which combines GPT and Claude when performing research. GPT handles the drafting, while Claude acts as the strict editor who fact-checks the result and ensures everything is up to par. And the best part is, it works.
Microsoft 365 Copilot Researcher’s new feature delivers the best of two worlds
Two experts are better than one
In a press release, Microsoft revealed that Copilot Cowork is moving into the Frontier preview program. Copilot Cowork combines Claude’s capabilities with Microsoft’s own to create an agent you can delegate work to. It goes beyond the simple ‘chatbot style’ of LLMs and becomes a digital assistant of sorts.
One of the more exciting new features is a tool for Researcher that pairs two LLMs so that each handles the task it does best. As Microsoft describes it:
Researcher’s new Critique feature takes this even further, putting GPT and Claude to work together on every response: GPT drafts, Claude reviews for accuracy, completeness, and citation integrity before it’s delivered. […] The results are measurable—Researcher now scores 13.8% higher on the Deep Research Accuracy, Completeness, and Objectivity, or DRACO benchmark, the industry standard for deep research quality.
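To make the draft-then-review idea concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the function names, the revise step, and the retry loop are all assumptions made for illustration, and the toy functions stand in for calls to two different models.

```python
from typing import Callable, List

def research_with_critique(
    prompt: str,
    draft: Callable[[str], str],
    review: Callable[[str, str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 2,
) -> str:
    """One model drafts, a second reviews; revise until no issues remain.

    The revise loop is an assumption; the source only says the draft is
    reviewed before delivery.
    """
    text = draft(prompt)
    for _ in range(max_rounds):
        issues = review(prompt, text)
        if not issues:
            break
        text = revise(text, issues)
    return text

# Hypothetical stand-ins for the drafting and reviewing models.
def toy_draft(prompt: str) -> str:
    return f"Draft answer to: {prompt}"

def toy_review(prompt: str, text: str) -> List[str]:
    # Flags a draft that carries no citation marker.
    return [] if "[citation]" in text else ["missing citation"]

def toy_revise(text: str, issues: List[str]) -> str:
    return text + " [citation]" if "missing citation" in issues else text

result = research_with_critique("What is DRACO?", toy_draft, toy_review, toy_revise)
print(result)
```

The point of the split is that the reviewer never sees its own work, so it has no stake in defending the draft.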
Researcher will also come with a tool called ‘Council’ that hands your prompt to several models and lets you see what each one says and where they agree or disagree. As such, Microsoft’s plan seems to be less about relying on Copilot to do all the heavy lifting and more about tapping into the might of different AI companies to create a service that can handle every step of the workflow.
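The Council idea, sending one prompt to several models and surfacing where they agree or differ, can be sketched in a few lines of Python. This is an illustration only: the model names are hypothetical, and comparing normalised strings is a crude assumption standing in for whatever comparison the real feature performs.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def council(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, List[str]]:
    """Ask every model the same prompt and group model names by answer."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, ask in models.items():
        # Normalise whitespace and case so trivially different answers match.
        answer = ask(prompt).strip().lower()
        groups[answer].append(name)
    return dict(groups)

def has_disagreement(groups: Dict[str, List[str]]) -> bool:
    """More than one distinct answer means the models disagree."""
    return len(groups) > 1

# Hypothetical models, each just a function from prompt to answer.
toy_models = {
    "model_a": lambda prompt: "Paris",
    "model_b": lambda prompt: " paris ",
    "model_c": lambda prompt: "Lyon",
}

groups = council("Capital of France?", toy_models)
for answer, names in groups.items():
    print(answer, names)
print("Disagreement:", has_disagreement(groups))
```

Showing the grouping rather than a single merged answer leaves the final judgement with the reader, which matches how the feature is described.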

