NVIDIA Releases Nemotron 3 Ultra 550B MoE for Long-Running Agentic Workflows

NVIDIA ships Nemotron 3 Ultra, a 550B-parameter open MoE model optimized for sustained agent operations, with up to 5x faster inference and up to 30% lower cost on complex agentic tasks.

NVIDIA, Nemotron, Agentic AI, Open Models, Enterprise AI

At a glance

NVIDIA has released Nemotron 3 Ultra, a 550-billion-parameter Mixture-of-Experts open model designed specifically for long-running autonomous agents that perform planning, reasoning, tool use and multi-step workflows in coding, research and enterprise environments.

What changed

According to NVIDIA, the new model delivers up to 5x faster inference and cuts the cost of complex agentic tasks by up to 30% compared with other open frontier models. It is positioned for sustained operation across extended sessions rather than single-turn interactions.

Why it matters

Operationally, teams can shorten execution time for multi-step agent workflows or handle greater volume within existing compute budgets. Commercially, lower inference cost improves margin profiles for agent-based services and internal tooling. From a compliance perspective, the open release enables governance teams to inspect, audit and deploy the model within regulated data environments where closed models are restricted.
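To make the operational and commercial arithmetic concrete, here is a minimal sketch that applies the quoted "up to" figures to a hypothetical workload. The baseline duration and cost are invented for illustration, the function name is not from NVIDIA, and the results are best-case ceilings rather than expected values. The sketch also assumes the 5x figure translates directly into wall-clock time for a workflow, which the announcement does not specify.

```python
def best_case_estimates(baseline_seconds, baseline_cost,
                        speedup=5.0, cost_reduction=0.30):
    """Apply the quoted 'up to' figures as if they were fully realized.

    baseline_seconds and baseline_cost describe a hypothetical agent
    workflow on a comparison model; they are illustrative inputs only.
    """
    # Best-case run time if the full 5x speedup applied end to end.
    best_seconds = baseline_seconds / speedup
    # Best-case cost per task after a 30% reduction.
    best_cost = baseline_cost * (1.0 - cost_reduction)
    # Tasks affordable on an unchanged budget scale by 1 / (1 - reduction).
    volume_multiplier = 1.0 / (1.0 - cost_reduction)
    return best_seconds, best_cost, volume_multiplier


# Hypothetical baseline: a 10-minute workflow costing 2.00 per run.
seconds, cost, volume = best_case_estimates(600.0, 2.00)
print(f"run time: {seconds:.0f} s, cost per run: {cost:.2f}, "
      f"volume on same budget: {volume:.2f}x")
```

Note that a 30% cost reduction corresponds to roughly 1.43x the task volume on a fixed budget, not 1.3x, because volume scales with the inverse of per-task cost.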

Key details

550B MoE architecture
Optimized for long-running agents requiring persistent context and tool orchestration
Targets coding, research and enterprise workflow domains
Available under an open license

Notes for citation

Publication date reflects the 4 June 2026 announcements from NVIDIA and NVIDIA AI official accounts. Performance claims are taken directly from the issuing statements; independent verification is recommended prior to production deployment.
