Only 55 billion of Nemotron 3 Ultra's 550 billion parameters are active at inference time — a mixture-of-experts design that lets NVIDIA claim 300-plus output tokens per second while keeping operational costs roughly 30% below what it calls leading alternatives (vendor-stated figures). Jensen Huang announced the model at his Computex 2026 keynote in Taipei on 1 June, positioning it as the flagship of NVIDIA's open-weights portfolio and setting a 4 June release date for public weights.

What Nemotron 3 Ultra Actually Is

NVIDIA's technical documentation describes the Nemotron 3 family, across both the Super and Ultra variants, as a hybrid that combines Mamba-2 layers, standard Transformer attention, and mixture-of-experts routing, although NVIDIA's Computex blog post does not break down the architecture in those terms. What the blog post does confirm is a one-million-token context window and a post-training regime built around reinforcement learning for multi-step task execution: coding, planning, and agentic workflows rather than single-turn chat. NVIDIA is also releasing training recipes alongside the weights, giving developers a clearer path to fine-tuning than most frontier-class releases allow.
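The mixture-of-experts idea behind the "55B of 550B active" figure can be illustrated with a toy top-k router, sketched here in Python: a router scores every expert, only the k best-scoring experts actually run, and their outputs are mixed by renormalised gate weights. Everything in it is hypothetical (20 scalar "experts", k=2, hand-picked router scores); the article gives no expert count or top-k for Nemotron 3 Ultra, and in a real model the active total also includes attention and other shared layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k routed experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical layer: 20 scalar "experts" that record when they are called.
calls = []

def make_expert(i):
    def expert(x):
        calls.append(i)
        return (i + 1) * x
    return expert

experts = [make_expert(i) for i in range(20)]
logits = [0.1 * i for i in range(20)]  # stand-in for a learned router's scores

y = moe_forward(1.0, experts, logits, k=2)
print(f"experts run: {sorted(calls)}, output: {y:.3f}")
print(f"fraction of experts used: {len(calls) / len(experts):.0%}")
```

Routing 2 of 20 experts gives a 10% expert-level active fraction, the same ratio as 55B of 550B, but that match is purely illustrative.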

Artificial Analysis, which partnered with NVIDIA on evaluations, placed Nemotron 3 Ultra at an Intelligence Index of 48 — an aggregate across ten benchmarks. That puts it ahead of the next-closest US open-weights models (scoring 33 to 39) but behind China's Kimi K2.6, which scores 54. Independent benchmarks beyond Artificial Analysis's evaluation are not yet available; the throughput and cost figures are vendor-stated.

550B: total parameters (55B active)
300+: output tokens per second (vendor-stated)
48: Intelligence Index, highest among US open-weights models
1M: token context window

The Competitive Landscape It Enters

Nemotron 3 Ultra lands in a market where Chinese open-weight models have moved fast. DeepSeek V4 Pro serves at 50 to 100 tokens per second, and Kimi K2.6 outscores Nemotron on intelligence metrics. According to OpenRouter data analysed in partnership with Andreessen Horowitz and reported by the South China Morning Post, Chinese open-source models grew from roughly 1.2% of global usage in late 2024 to nearly 30% by the end of 2025, the backdrop to NVIDIA's open-weights push.
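To make the gap between the quoted decode rates concrete, a back-of-the-envelope Python sketch converts tokens per second into wall-clock generation time. The 20,000-token output is a hypothetical length for a multi-step agent trace, the rates are the vendor- and provider-reported figures from this article, and the calculation ignores prefill, network latency and batching, so it is a rough illustration rather than a measurement.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a constant decode rate.

    Ignores prefill (prompt processing), network latency and batching effects.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second

# Decode rates as reported in the article (not independently measured).
RATES = {
    "Nemotron 3 Ultra (vendor-stated floor)": 300.0,
    "DeepSeek V4 Pro (low end)": 50.0,
    "DeepSeek V4 Pro (high end)": 100.0,
}

OUTPUT_TOKENS = 20_000  # hypothetical length of a multi-step agent trace

for name, rate in RATES.items():
    print(f"{name}: {generation_seconds(OUTPUT_TOKENS, rate):.0f} s")
```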

Against Meta's Llama and Google's Gemma families, Nemotron 3 Ultra competes on the combination of raw intelligence score and inference throughput rather than on parameter count alone. Because of the MoE sparsity, the compute cost per token is roughly that of a 55B dense model, although all 550 billion parameters still have to be held in memory; that combination changes the economics for enterprises evaluating self-hosted deployments.
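The compute-versus-memory trade-off can be estimated with two common rules of thumb, sketched below in Python: roughly 2 FLOPs per active parameter per generated token, and weight memory equal to total parameters times bytes per parameter. The precisions listed are assumptions for illustration only; the article does not say which numeric format NVIDIA ships the weights in, and the estimate leaves out KV cache, activations and serving overhead.

```python
def dense_equivalent_flops_per_token(active_params: float) -> float:
    # Rule of thumb: a forward pass costs about 2 FLOPs (one multiply,
    # one add) per parameter that participates in the token's computation.
    return 2.0 * active_params

def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    # Every expert must be resident (or paged in), so weight memory scales
    # with total parameters, not active ones. Excludes KV cache/activations.
    return total_params * bytes_per_param / 1e9

ULTRA_TOTAL = 550e9   # total parameters (from the article)
ULTRA_ACTIVE = 55e9   # active parameters per token (from the article)

print(f"Active fraction: {ULTRA_ACTIVE / ULTRA_TOTAL:.0%}")
print(f"~FLOPs per token: {dense_equivalent_flops_per_token(ULTRA_ACTIVE):.2e}")

# Hypothetical storage precisions, chosen only to show the scaling.
for label, nbytes in [("BF16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    print(f"{label} weights: ~{weight_memory_gb(ULTRA_TOTAL, nbytes):,.0f} GB")
```

The point of separating the two functions is that they scale with different numbers: per-token compute follows the 55B active parameters, while the hardware footprint follows the full 550B.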

Platform Strategy, Not Just a Model Drop

Huang framed the launch in terms that go beyond releasing weights. "We're dedicated to building open models for the world, so you can take all of it, add to it, make it even better, make it yours," he said during the keynote. Third-party reporting based on SEC filings cites a five-year, US$26 billion commitment to open-weight AI development — a figure that does not appear in NVIDIA's own Computex blog post or official press releases and should be treated as unconfirmed until NVIDIA discloses it directly.

Earlier, at GTC on 16 March 2026, NVIDIA launched the Nemotron Coalition: eight AI labs (Mistral AI, Perplexity, Black Forest Labs, Cursor, LangChain, Reflection AI, Sarvam, and Thinking Machines Lab) collaborating on frontier model development on NVIDIA DGX Cloud. The coalition's first shared base model is intended to underpin the forthcoming Nemotron 4 family. Nemotron 3 Ultra integrates with NVIDIA's Agent Toolkit and NeMo libraries and is accessible through NVIDIA Cloud Partners.

The Nano Omni Variant

The broader Nemotron 3 family includes Nano Omni, announced on 28 April 2026 — a 30-billion-parameter model with only 3 billion parameters active at inference, designed for edge deployment. It processes text, images, audio, video, documents, and graphical interfaces as inputs. NVIDIA says it achieves nine times higher throughput than comparable open multimodal models and can run on single-GPU hardware including the DGX Spark and Jetson platforms, without the multi-GPU clusters required by larger models.

What Comes Next

Weights ship 4 June. The key open questions are whether independent evaluators confirm the throughput and cost claims at scale, how the Nemotron Coalition labs contribute to subsequent releases, and whether NVIDIA can sustain model-quality momentum alongside its hardware roadmap. Kimi K2.6 scoring six points higher on the same index is not a detail NVIDIA can dismiss — it sets the benchmark Nemotron 4 will need to clear.