
NVIDIA Blackwell Ultra: GPUs for Autonomous Agents
NVIDIA has introduced Blackwell Ultra, its new GPU designed explicitly for agentic workloads. It is more than a FLOPS bump: the architecture adds hardware primitives for efficiently handling the 'reason → tool call → observe → reason' loop typical of modern AI agents.
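The loop itself can be sketched in a few lines of plain Python. This illustrates the software pattern only, not any Blackwell Ultra hardware primitive or NVIDIA API; `Step`, `Agent`, the toy policy and the `add` tool are hypothetical names invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    tool: Optional[str] = None   # hypothetical: tool to call next, or None when done
    argument: str = ""


@dataclass
class Agent:
    policy: Callable[[list[str]], Step]      # "reason": history -> next step
    tools: dict[str, Callable[[str], str]]   # available tools, keyed by name
    max_turns: int = 8                       # guard against endless loops

    def run(self, task: str) -> list[str]:
        history = [f"task: {task}"]
        for _ in range(self.max_turns):
            step = self.policy(history)                      # reason
            history.append(f"thought: {step.thought}")
            if step.tool is None:
                break                                        # final answer reached
            result = self.tools[step.tool](step.argument)    # tool call
            history.append(f"observation: {result}")         # observe, then reason again
        return history


# Hypothetical toy policy: call the "add" tool once, then answer from the observation.
def toy_policy(history: list[str]) -> Step:
    observations = [h for h in history if h.startswith("observation: ")]
    if not observations:
        return Step("I need to add the numbers", tool="add", argument="2+3")
    return Step(f"answer is {observations[-1].split(': ', 1)[1]}")


tools = {"add": lambda expr: str(sum(int(x) for x in expr.split("+")))}
history = Agent(toy_policy, tools).run("add 2 and 3")
print("\n".join(history))
```

Each pass through the `for` loop is one reason → tool call → observe cycle; in this sequential form the model sits idle while the tool runs, which is the latency that hardware support for agents aims to hide.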
Among the most interesting innovations is a dedicated 'speculative tool execution' unit that allows the model to start generating the next reasoning step while the previous tool call is still in flight, reducing end-to-end latency by up to 40%.
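The idea behind speculative tool execution can be approximated in software with `asyncio`: launch the tool call and a draft of the next reasoning step at the same time, then keep the draft only if it is still valid once the real observation arrives. This is an analogy under stated assumptions, not how the hardware unit works; `call_tool`, `draft_step`, the `accept` check and the latencies are all hypothetical, and the sketch does not reproduce the 40% figure.

```python
import asyncio
import time
from typing import Callable

# Hypothetical latencies in seconds for one agent turn; illustrative, not measured.
TOOL_LATENCY = 0.20
DRAFT_LATENCY = 0.15


async def call_tool(query: str) -> str:
    # Stands in for an external tool call (search, code execution, API request).
    await asyncio.sleep(TOOL_LATENCY)
    return f"result({query})"


async def draft_step(context: str) -> str:
    # Stands in for the model decoding its next reasoning step from `context`.
    await asyncio.sleep(DRAFT_LATENCY)
    return f"next step given {context}"


async def sequential_turn(query: str) -> str:
    observation = await call_tool(query)   # wait for the tool...
    return await draft_step(observation)   # ...then start reasoning


async def speculative_turn(
    query: str,
    accept: Callable[[str, str], bool] = lambda draft, obs: True,  # hypothetical validity check
) -> str:
    # Run the tool call and a speculative draft (made without the observation) concurrently.
    tool_task = asyncio.create_task(call_tool(query))
    draft_task = asyncio.create_task(draft_step(query))
    observation = await tool_task
    draft = await draft_task
    if accept(draft, observation):
        return draft                          # hit: drafting overlapped the tool call
    return await draft_step(observation)      # miss: redo the step with the real observation


async def timed(coro) -> float:
    start = time.perf_counter()
    await coro
    return time.perf_counter() - start


seq_time = asyncio.run(timed(sequential_turn("q")))
spec_time = asyncio.run(timed(speculative_turn("q")))
miss_time = asyncio.run(timed(speculative_turn("q", accept=lambda d, o: False)))
print(f"sequential {seq_time:.2f}s, speculative hit {spec_time:.2f}s, miss {miss_time:.2f}s")
```

On a hit the turn costs roughly max(tool, draft) instead of tool + draft; on a miss it costs at least as much as the sequential path, so the real-world gain depends on how often speculative drafts are accepted.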
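The chip also integrates 288 GB of HBM4 with 12 TB/s of bandwidth. That is still not enough to hold a 1T-parameter model on a single GPU (the weights alone take about 500 GB at 4 bits per parameter), but it lets such a model fit across just two to four GPUs instead of being aggressively sharded. For teams training specialized agents, this means much faster iteration cycles.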
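A back-of-the-envelope check shows how far 288 GB per GPU goes for a 1T-parameter model. The sketch below counts weights only (no KV cache, activations or optimizer state) and assumes decimal gigabytes; the helper names are hypothetical.

```python
# Weights-only memory estimate; ignores KV cache, activations and optimizer state.
HBM_PER_GPU_BYTES = 288 * 10**9            # 288 GB per GPU, decimal GB assumed
HBM_BANDWIDTH_BYTES_PER_S = 12 * 10**12    # 12 TB/s, as quoted above


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    return num_params * bits_per_param // 8


def min_gpus_for_weights(num_params: int, bits_per_param: int,
                         hbm_bytes: int = HBM_PER_GPU_BYTES) -> int:
    total = weight_bytes(num_params, bits_per_param)
    return -(-total // hbm_bytes)          # integer ceiling division


ONE_TRILLION = 10**12
for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_bytes(ONE_TRILLION, bits) / 10**9
    print(f"{name}: {gb:.0f} GB of weights -> at least "
          f"{min_gpus_for_weights(ONE_TRILLION, bits)} GPUs")

# Time to stream one GPU's full HBM once: a rough lower bound for a memory-bound decode step.
sweep_ms = HBM_PER_GPU_BYTES / HBM_BANDWIDTH_BYTES_PER_S * 1000
print(f"full HBM sweep: {sweep_ms:.0f} ms")
```

Even at 4 bits per parameter, a 1T-parameter model needs at least two GPUs for its weights, and every extra bit of precision or gigabyte of KV cache pushes that count up.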
On the cloud front, AWS, GCP, and Azure have already announced that Blackwell Ultra instances will be available by Q3 2026. Preliminary pricing points to around $12 per GPU-hour on demand, with significant discounts for reserved instances.
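To put the $12 per GPU-hour figure in perspective, the sketch below estimates a monthly bill. The 730-hour month is a common cloud-billing convention (8,760 hours / 12), and the 40% reserved discount is a hypothetical placeholder, since no discount figure has been published.

```python
ON_DEMAND_USD_PER_GPU_HOUR = 12.0   # preliminary on-demand price quoted above
HOURS_PER_MONTH = 730               # billing convention: 8,760 h / 12 months


def monthly_cost(num_gpus: int,
                 usd_per_gpu_hour: float = ON_DEMAND_USD_PER_GPU_HOUR,
                 discount: float = 0.0) -> float:
    # discount is a fraction, e.g. 0.40 for a hypothetical 40% reserved-instance discount
    return num_gpus * usd_per_gpu_hour * HOURS_PER_MONTH * (1 - discount)


print(f"1 GPU on demand:      ${monthly_cost(1):,.0f}/month")
print(f"8-GPU node on demand: ${monthly_cost(8):,.0f}/month")
print(f"8-GPU node, -40%:     ${monthly_cost(8, discount=0.40):,.0f}/month")
```

A single GPU running around the clock comes to roughly $8,760 per month at the quoted on-demand rate, which is why the size of the reserved-instance discount matters for sustained agent training.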
For those who do not want to manage infrastructure, inference providers such as Together, Fireworks, and Groq have already confirmed they will host models optimized for Blackwell Ultra by the end of the year.