NVIDIA Nemotron 3 Ultra: 550B Open-Weights Model at Computex 2026


AI & ML · 1 Jun 2026

Only 55 billion of Nemotron 3 Ultra's 550 billion parameters are active for any given token at inference time. That mixture-of-experts design is what lets NVIDIA claim more than 300 output tokens per second at operating costs roughly 30% below what it calls leading alternatives (both figures are vendor-stated). Jensen Huang announced the model at his Computex 2026 keynote in Taipei on 1 June, positioning it as the flagship of NVIDIA's open-weights portfolio and setting 4 June as the release date for the public weights.

What Nemotron 3 Ultra Actually Is

NVIDIA's technical documentation describes the Nemotron 3 family as a hybrid architecture that combines Mamba-2 layers, standard Transformer attention, and mixture-of-experts routing. The company has confirmed that design for the Super and Ultra variants, although its Computex blog post does not break the architecture down in those terms. What the company does confirm is a one-million-token context window and a post-training regime built around reinforcement learning for multi-step task execution: coding, planning, and agentic workflows rather than single-turn chat. NVIDIA is also releasing training recipes alongside the weights, giving developers a clearer path to fine-tuning than most frontier-class releases allow.

Artificial Analysis, which partnered with NVIDIA on evaluations, placed Nemotron 3 Ultra at an Intelligence Index of 48 — an aggregate across ten benchmarks. That puts it ahead of the next-closest US open-weights models (scoring 33 to 39) but behind China's Kimi K2.6, which scores 54. Independent benchmarks beyond Artificial Analysis's evaluation are not yet available; the throughput and cost figures are vendor-stated.

550B: Total parameters (55B active)

300+: Output tokens per second (vendor-stated)

48: Intelligence Index (top US open-weights)

1M: Token context window

The Competitive Landscape It Enters

Nemotron 3 Ultra lands in a market where Chinese open-weight models have moved fast. DeepSeek V4 Pro serves at 50 to 100 tokens per second, well below NVIDIA's claimed rate, while Kimi K2.6 outscores Nemotron on intelligence metrics. According to OpenRouter data analysed in partnership with Andreessen Horowitz and reported by the South China Morning Post, Chinese open-source models grew from roughly 1.2% of global usage in late 2024 to nearly 30% by the end of 2025, a shift that forms the backdrop to NVIDIA's launch.
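To put the quoted serving rates in practical terms, the sketch below converts tokens per second into wall-clock time for a long output. It is a simplification under stated assumptions: the 10,000-token output length is hypothetical, the function name `decode_seconds` is illustrative, and the calculation assumes a steady decode rate and ignores time to first token, batching, and network latency.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a steady decode rate.

    Simplification: ignores time to first token and any rate variation.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical length of a long agentic trace (assumption, not from NVIDIA).
OUTPUT_TOKENS = 10_000

# Rates as quoted in the article; the Nemotron figure is vendor-stated.
rates = {
    "Nemotron 3 Ultra at 300 tok/s": 300,
    "DeepSeek V4 Pro at 100 tok/s (upper end)": 100,
    "DeepSeek V4 Pro at 50 tok/s (lower end)": 50,
}

for label, rate in rates.items():
    print(f"{label}: {decode_seconds(OUTPUT_TOKENS, rate):.1f} s")
```

On these assumptions, the same output takes about half a minute at the claimed Nemotron rate and between 100 and 200 seconds at the DeepSeek range, which is why throughput matters for multi-step agentic workloads.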

Against Meta's Llama and Google's Gemma families, Nemotron 3 Ultra competes on the combination of raw intelligence score and inference throughput rather than parameter count alone. The MoE sparsity means the effective compute cost per token is closer to that of a 55B dense model, which changes the economics for enterprises evaluating self-hosted deployments.
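The arithmetic behind the sparsity argument can be sketched in a few lines. This is a back-of-envelope illustration, not NVIDIA's methodology: it assumes the common rule of thumb of roughly 2 forward-pass FLOPs per active parameter per token, and the function name `forward_flops_per_token` is hypothetical. It ignores attention cost over long contexts, the Mamba-2 layers, and routing overhead.

```python
# Parameter counts as reported in the article.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token.

    Assumption: ~2 FLOPs per active parameter (one multiply, one add).
    """
    return 2.0 * active_params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical 550B dense model

print(f"Active fraction per token: {active_fraction:.0%}")
print(f"MoE forward FLOPs per token: {moe_flops:.2e}")
print(f"Hypothetical 550B dense FLOPs per token: {dense_flops:.2e}")
print(f"Compute ratio (dense / MoE): {dense_flops / moe_flops:.0f}x")
```

The saving applies to compute per token, not to memory. Because any expert may be selected for a given token, a self-hosted deployment generally still has to hold all 550B parameters, so hardware sizing is driven by the total count rather than the active count.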

Platform Strategy, Not Just a Model Drop

Huang framed the launch in terms that go beyond releasing weights. "We're dedicated to building open models for the world, so you can take all of it, add to it, make it even better, make it yours," he said during the keynote. Third-party reporting based on SEC filings cites a five-year, US$26 billion commitment to open-weight AI development — a figure that does not appear in NVIDIA's own Computex blog post or official press releases and should be treated as unconfirmed until NVIDIA discloses it directly.

NVIDIA had already launched the Nemotron Coalition at GTC on 16 March 2026. The group brings together eight AI labs (Mistral AI, Perplexity, Black Forest Labs, Cursor, LangChain, Reflection AI, Sarvam, and Thinking Machines Lab) to collaborate on frontier model development on NVIDIA DGX Cloud. The coalition's first shared base model is intended to underpin the forthcoming Nemotron 4 family. Nemotron 3 Ultra integrates with NVIDIA's Agent Toolkit and NeMo libraries and is accessible through NVIDIA Cloud Partners.

The Nano Omni Variant

The broader Nemotron 3 family includes Nano Omni, announced on 28 April 2026 — a 30-billion-parameter model with only 3 billion parameters active at inference, designed for edge deployment. It processes text, images, audio, video, documents, and graphical interfaces as inputs. NVIDIA says it achieves nine times higher throughput than comparable open multimodal models and can run on single-GPU hardware including the DGX Spark and Jetson platforms, without the multi-GPU clusters required by larger models.

What Comes Next

Weights ship 4 June. The key open questions are whether independent evaluators confirm the throughput and cost claims at scale, how the Nemotron Coalition labs contribute to subsequent releases, and whether NVIDIA can sustain model-quality momentum alongside its hardware roadmap. Kimi K2.6 scoring six points higher on the same index is not a detail NVIDIA can dismiss — it sets the benchmark Nemotron 4 will need to clear.
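For readers who want to check the throughput claim themselves once the weights are available, one approach is to time a streamed response. The sketch below is a minimal harness under stated assumptions: `measure_throughput` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in that sleeps between tokens. In practice it would be replaced by whatever streaming client is used to serve the model, and the numbers will depend heavily on hardware, batch size, and prompt length.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_throughput(
    token_stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream; report time to first token and decode rate."""
    start = clock()
    first_token_at = None
    last_token_at = start
    count = 0
    for _ in token_stream:
        now = clock()
        if first_token_at is None:
            first_token_at = now
        last_token_at = now
        count += 1
    if count == 0:
        raise ValueError("stream produced no tokens")
    decode_time = last_token_at - first_token_at
    return {
        "tokens": count,
        "time_to_first_token_s": first_token_at - start,
        # Decode phase only: tokens after the first, over time since the first.
        "decode_tokens_per_s": (
            (count - 1) / decode_time if decode_time > 0 else float("nan")
        ),
    }


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a real streaming client; sleeps to mimic token latency."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


result = measure_throughput(fake_stream(20, 0.001))
print(result)
```

Separating time to first token from the decode rate matters here because a headline tokens-per-second figure usually describes steady-state decoding, while long prompts can add substantial delay before the first token arrives.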

Sources & cross-checks

Primary: Artificial Analysis — Nemotron 3 Ultra launch announced
Corroborated: NVIDIA Blog — GTC Taipei at Computex 2026
Corroborated: Decrypt — Nvidia Releases Its Best Open AI Model Yet
Corroborated: NVIDIA Newsroom — Nemotron Coalition press release (16 March 2026)
Corroborated: NVIDIA Blog — Nemotron 3 Nano Omni (28 April 2026)
Verified:
550B total / 55B active parameters, Intelligence Index 48, 300+ tokens/sec (vendor-stated), and Kimi K2.6 score of 54: confirmed by Artificial Analysis and Decrypt.
Nemotron Coalition members and 16 March 2026 announcement: confirmed by NVIDIA Newsroom press release.
Nano Omni 30B/3B-active and 28 April 2026 date: confirmed by NVIDIA blog.
Jensen Huang quote: confirmed by NVIDIA Computex blog.
Chinese open-source model share figures (1.2% late 2024 → ~30% end 2025): sourced from OpenRouter/a16z data per South China Morning Post.
Hybrid Mamba-2/Transformer/MoE architecture: confirmed by NVIDIA technical documentation for the Nemotron 3 Super/Ultra family.
$26B figure: from third-party reports citing SEC filings; absent from NVIDIA's own press releases; unconfirmed and noted as such in body.

Tags: #NVIDIA #Agentic-AI #Large-Language-Models #Computex-2026 #Open-Weights-AI

AI Tools Desk

AI & Developer Productivity Desk

AI Tools Desk tracks AI products, coding agents, model releases, and developer productivity tools for RECATOOLS.


About this byline AI Tools Desk is a specialist RECATOOLS editorial desk focused on AI tools and developer productivity coverage. Articles are produced and reviewed under RECATOOLS editorial supervision.
