06 Jun 2026
15:43
NVIDIA Releases Nemotron 3 Ultra 550B MoE for Long-Running Agentic Workflows
NVIDIA ships Nemotron 3 Ultra, a 550B-parameter open MoE model optimized for sustained agent operations, with up to 5x faster inference and up to 30% lower cost on complex tasks.
At a glance
NVIDIA has released Nemotron 3 Ultra, a 550-billion-parameter Mixture-of-Experts open model designed specifically for long-running autonomous agents that perform planning, reasoning, tool use and multi-step workflows in coding, research and enterprise environments.
What changed
According to NVIDIA, the new model delivers up to 5x faster inference and reduces the cost of complex agentic tasks by up to 30% compared with other open frontier models. It is positioned for sustained operation across extended sessions rather than single-turn interactions.
Why it matters
Operationally, teams can shorten execution time for multi-step agent workflows or handle greater volume within existing compute budgets. Commercially, lower inference cost improves margin profiles for agent-based services and internal tooling. From a compliance perspective, the open release enables governance teams to inspect, audit and deploy the model within regulated data environments where closed models are restricted.
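To make the operational and commercial points concrete, the short Python sketch below projects what the headline figures would mean for a single workload. It is a best-case estimate only: the 5x and 30% defaults are NVIDIA's "up to" figures, and the function name, the baseline task time, the cost per task and the budget are hypothetical values chosen for illustration, not numbers from the announcement.

```python
def best_case_projection(baseline_seconds, baseline_cost, budget,
                         speedup=5.0, cost_reduction=0.30):
    """Upper-bound projection for one agent workload.

    The defaults are the vendor's "up to" claims, so real results
    may be lower. All inputs are hypothetical placeholders.
    """
    new_seconds = baseline_seconds / speedup            # time per task at full speedup
    new_cost = baseline_cost * (1.0 - cost_reduction)   # cost per task at full saving
    return {
        "seconds_per_task": new_seconds,
        "cost_per_task": new_cost,
        "tasks_in_budget_before": int(budget // baseline_cost),
        "tasks_in_budget_after": int(budget // new_cost),
    }


# Hypothetical workload: 600 s and 2.00 per task, with a budget of 1,000.
result = best_case_projection(600.0, 2.00, 1000.0)
print(result)
```

With these placeholder inputs, the same budget covers 714 tasks instead of 500, and each task takes 120 seconds instead of 600. Replacing the defaults with measured figures from your own benchmarks would give a more realistic estimate.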
Key details
- 550B MoE architecture
- Optimized for long-running agents requiring persistent context and tool orchestration
- Targets coding, research and enterprise workflow domains
- Available under an open license
Sources
Notes for citation
Publication date reflects the 4 June 2026 announcements from NVIDIA and NVIDIA AI official accounts. Performance claims are taken directly from the issuing statements; independent verification is recommended prior to production deployment.
AI-assisted analysis by Skirr AI
