
Will AI agents stop talking in text? The breakthrough of multi-agent systems in latent space
In the world of AI agents, a paradigm shift is emerging whose consequences could be much larger than they first appear. Until now, when multiple agents collaborate, the flow is almost always the same: a model reasons, translates its internal state into text, and passes that text to another agent, which must then reconstruct its own internal representation of the problem from those words.
It is an intuitive mechanism, but deeply inefficient. Every transition from latent space to text and then back from text to latent space introduces cost, latency, and loss of semantic information, especially when the task requires multiple rounds of collaboration between different models.
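To make the loss concrete, here is a deliberately tiny sketch, not taken from the paper. It assumes a made-up four-word vocabulary in which each token maps to a fixed 2-D embedding, and a made-up sender state. The function names `verbalize` and `read` are hypothetical. The point is only that forcing a continuous state through the nearest token throws away whatever the token cannot express, while handing over the state directly does not.

```python
import math

# Hypothetical toy vocabulary: each token is tied to one fixed 2-D embedding.
VOCAB = {
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
}

def verbalize(state):
    """Collapse a continuous state to the single nearest token (lossy)."""
    return min(VOCAB, key=lambda tok: math.dist(VOCAB[tok], state))

def read(token):
    """Rebuild a state from a token: only the token's embedding survives."""
    return VOCAB[token]

sender_state = (0.6, 0.7)  # made-up internal state of the sending agent

via_text = read(verbalize(sender_state))  # latent -> text -> latent
via_latent = sender_state                 # latent handed over directly

text_error = math.dist(sender_state, via_text)
latent_error = math.dist(sender_state, via_latent)

print(verbalize(sender_state), round(text_error, 3), latent_error)
# prints: up 0.671 0.0
```

Real models emit many tokens rather than one, so the loss per message is far smaller than in this toy, but the same round trip is repeated at every handoff between agents.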
This insight is the starting point for Recursive Multi-Agent Systems (RecursiveMAS), a paper published on arXiv on April 27, 2026, by researchers affiliated with UIUC, Stanford, NVIDIA, and MIT. The paper proposes treating the entire multi-agent system as a unified recursive computation in latent space, rather than as a sequence of text messages between separate agents.
The core of the proposal is a lightweight module called RecursiveLink, designed to connect heterogeneous agents and allow the direct transfer of latent states from one model to another. In practice, the idea is simple yet radical: if models 'think' internally in continuous representations, then forcing them to verbalize everything at every step could be a structural waste.
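What might a module of this kind look like at its very simplest? The sketch below is an assumption for illustration, not the paper's RecursiveLink: it uses a single linear projection with random (untrained) weights to map a hidden state from one agent's width to another's. The names `make_link` and `apply_link`, the dimensions, and the example vector are all hypothetical.

```python
import random

def make_link(dim_in, dim_out, seed=0):
    """Hypothetical adapter: a dim_out x dim_in weight matrix plus a bias.

    In a real system these weights would be learned; here they are random.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim_in)]
               for _ in range(dim_out)]
    bias = [0.0] * dim_out
    return weights, bias

def apply_link(link, hidden):
    """Map a hidden state from the sender's width to the receiver's width."""
    weights, bias = link
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]

# Two "heterogeneous" agents: different hidden sizes (made-up numbers).
SENDER_DIM, RECEIVER_DIM = 4, 6

link = make_link(SENDER_DIM, RECEIVER_DIM)
sender_hidden = [0.2, -0.1, 0.7, 0.05]  # stand-in for a model's latent state
receiver_input = apply_link(link, sender_hidden)

print(len(sender_hidden), "->", len(receiver_input))
# prints: 4 -> 6
```

The design point this illustrates is that the two agents never need to share a vocabulary or a hidden size: only the small adapter between them has to know both widths.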
This is the part that makes the work so interesting even outside the academic sphere. We are not just talking about a marginal improvement to an existing framework, but an attempt to rethink agent collaboration as a process natively internal to the model, minimizing dependence on tokens as an intermediate communication medium.
This is where the numbers that have attracted so much attention come in. According to the paper's abstract, RecursiveMAS achieves an average accuracy improvement of 8.3% over advanced single-agent, multi-agent, and recursive baselines, along with an end-to-end speedup of 1.2x to 2.4x and a reduction in token usage of 34.6% to 75.6%. In promotional materials and subsequent analyses, these results are linked to demanding benchmarks in mathematics, science, medicine, and code generation, contexts where collaboration between agents tends to be useful but also very expensive.
Most relevant in practice, the framework is not presented as a system that requires retraining the source models from scratch. The official repository describes RecursiveMAS as an architecture that connects heterogeneous agents through lightweight modules, allowing latent states to be exchanged, refined, and evolved over recursive rounds. This makes the concept far more interesting for those building real pipelines: not a new abstract theory, but a potentially more efficient orchestration layer for multi-agent systems in production.
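To get a feel for what those ranges mean in practice, here is a small back-of-the-envelope calculation. The percentages and speedup factors are the ones quoted from the abstract; the baseline of 10,000 tokens and 60 seconds per task is a hypothetical workload chosen only for illustration, and the helper names are made up. The low and high ends of each range need not come from the same benchmark, so the pairs below should not be read as matched results.

```python
# Ranges quoted from the paper's abstract.
TOKEN_REDUCTION = (0.346, 0.756)  # fraction of tokens saved
SPEEDUP = (1.2, 2.4)              # end-to-end speedup factor

# Hypothetical baseline workload, chosen only for illustration.
BASELINE_TOKENS = 10_000
BASELINE_SECONDS = 60.0

def remaining_tokens(baseline, reduction):
    """Tokens still spent after removing the given fraction."""
    return round(baseline * (1.0 - reduction))

def accelerated_seconds(baseline, speedup):
    """Wall-clock time after applying a speedup factor."""
    return round(baseline / speedup, 1)

tokens_range = [remaining_tokens(BASELINE_TOKENS, r) for r in TOKEN_REDUCTION]
seconds_range = [accelerated_seconds(BASELINE_SECONDS, s) for s in SPEEDUP]

print(tokens_range)   # prints: [6540, 2440]
print(seconds_range)  # prints: [50.0, 25.0]
```

Under these assumed numbers, a task that used to cost 10,000 tokens would cost between roughly 6,500 and 2,400, and a 60-second run would take between 50 and 25 seconds.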
Ultimately, that is the whole point. For months, the agent market has thought almost exclusively in terms of prompts, tool use, orchestration, and communication protocols; this paper suggests instead that a large part of the gain could come simply from changing the medium through which agents pass information. If text is the bottleneck, then the next step will not just be building better agents, but letting agents collaborate in a way that is closer to how models actually process information internally.
For those working on AI products, the message is very practical. In complex multi-agent systems, cost arises not only from the inference of each individual model, but also from the number of tokens spent making agents talk to each other when they already have enough internal structure to collaborate without verbalizing every step. If this approach holds up outside of benchmarks, it could change how agentic pipelines for reasoning, coding, technical research, and advanced automation are designed.
Naturally, it is too early to treat RecursiveMAS as the new industry standard. The work is still in the paper-plus-code phase, and as always, between research results and industrial robustness lies the most important test: the real world. Still, the signal is strong: progress in agents does not depend only on larger models or more tools, but also on a deeper and potentially more powerful question, namely how to make models collaborate without forcing them to translate everything into words every time.
