SubQ Launches First Commercial Subquadratic LLM — 12M Context

AI & ML · 18 May 2026

A new AI lab calling itself Subquadratic launched on 5 May 2026 with $29 million in seed funding and a product that challenges one of deep learning's most entrenched assumptions. Its first model, SubQ 1M-Preview, is not a transformer: in place of standard O(n²) transformer attention, it uses sparse subquadratic attention end-to-end, and it ships with a native context window of 12 million tokens, roughly twelve times larger than the current frontier.

Why subquadratic matters

Transformer attention scales quadratically with context length: every token is scored against every other token, so doubling the input roughly quadruples the attention compute per head. Practical models hit a wall around 1M–2M tokens because memory bandwidth and matrix-multiplication cost both blow up. Subquadratic architectures (Mamba, RWKV, RetNet, and now SubQ) replace the dense attention matrix with cheaper operations, such as state-space or linear recurrences, retention, or sparse attention patterns, whose compute grows much more slowly with context. The trade-off has historically been quality: subquadratic models have tended to trail transformers of similar parameter count on benchmarks. The interesting claim from SubQ is that it has closed that gap.
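To make the scaling concrete, here is a minimal Python sketch that counts the query-key scores a single head computes under dense causal attention versus a sliding-window pattern. The sliding window is only an assumed stand-in for "sparse subquadratic attention": SubQ has not published its actual sparsity scheme, and `WINDOW = 4_096` is a hypothetical value, not a SubQ parameter.

```python
# Counts query-key dot products per attention head for two patterns.
# Dense causal attention: token i attends to tokens 0..i -> n(n+1)/2 pairs, O(n^2).
# Sliding-window attention (a hypothetical stand-in for "sparse"): token i attends
# to at most `window` keys, itself included -> roughly n * window pairs, O(n).


def dense_causal_pairs(n: int) -> int:
    """Number of (query, key) pairs scored by full causal attention."""
    return n * (n + 1) // 2


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs scored when each query sees only its last `window` keys (itself included)."""
    if window >= n:
        return dense_causal_pairs(n)
    # The first `window` tokens see 1, 2, ..., window keys; every later token sees `window`.
    return window * (window + 1) // 2 + (n - window) * window


if __name__ == "__main__":
    WINDOW = 4_096  # hypothetical window size for illustration only
    for n in (1_000_000, 2_000_000, 12_000_000):
        dense = dense_causal_pairs(n)
        sparse = sliding_window_pairs(n, WINDOW)
        print(f"n={n:>11,}  dense={dense:.3e}  window={sparse:.3e}  ratio={dense / sparse:,.0f}x")
```

Doubling `n` roughly quadruples the dense count but only doubles the windowed one. The catch, and the reason the quality question matters, is that a restricted pattern like this can miss long-range dependencies that dense attention would see.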

The vendor claims

Per WhatLLM's reporting, SubQ claims:

Roughly 1/5 the cost of frontier transformer models on long-context tasks (think: full-book summarisation, multi-file code analysis, regulatory document review)
Up to 52× faster attention at scale when context exceeds a few million tokens
Native 12M-token context without resorting to retrieval-augmented sliding windows or context-compression hacks
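The two headline figures measure different things: one is an attention-only speedup (up to 52×), the other an end-to-end cost (about 1/5). Attention is only part of a model's work, so the two need not line up directly. A minimal Amdahl's-law sketch, assuming hypothetically that cost is proportional to compute and that only the attention share `p` of the work gets faster, shows how an attention speedup translates into overall savings. The attention shares below are illustrative; SubQ has not disclosed a cost breakdown, and its "1/5 cost" figure may reflect pricing rather than compute.

```python
def end_to_end_speedup(attention_fraction: float, attention_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `attention_fraction` of the work
    is accelerated by `attention_speedup` and the rest (MLPs, I/O, etc.) is unchanged."""
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be in [0, 1]")
    if attention_speedup <= 0:
        raise ValueError("attention_speedup must be positive")
    remaining = (1.0 - attention_fraction) + attention_fraction / attention_speedup
    return 1.0 / remaining


def fraction_needed(target_speedup: float, attention_speedup: float) -> float:
    """Invert Amdahl's law: the attention share of total work required for a
    given attention-only speedup to yield `target_speedup` overall."""
    if attention_speedup <= 1:
        raise ValueError("attention_speedup must exceed 1")
    # Solve 1/target = 1 - p * (1 - 1/s) for p.
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / attention_speedup)


if __name__ == "__main__":
    S = 52.0  # vendor-stated peak attention speedup
    for p in (0.5, 0.8, 0.95):  # hypothetical attention shares of total compute
        print(f"attention share {p:.0%}: overall {end_to_end_speedup(p, S):.1f}x")
    print(f"share needed for 5x overall: {fraction_needed(5.0, S):.1%}")
```

Under these assumptions, attention would need to account for roughly 82% of total work for a 52× attention speedup to produce a 5× end-to-end saving; at a 50% share the overall gain stays below 2×. That is one reason independent, end-to-end benchmarks matter more than the attention-only figure.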

These are the company's own numbers. Independent benchmarks have not yet been published, and as Air Street Press observed in its May state-of-AI roundup, the gap between vendor claims and HELM-Lite or LongBench scores on subquadratic models has historically been wide.

What 12M tokens unlocks

At 12M tokens you can fit several dozen full-length novels, the source code of a mid-sized open-source project, or a full year of board minutes and quarterly filings in a single prompt. Use cases that have been out of reach for retrieval-augmented pipelines because they require holistic reasoning over the whole corpus (for example, "Find every contradiction across these 12 months of board minutes") become viable if the attention scaling is genuinely subquadratic.
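Whether a particular corpus actually fits is easy to estimate before sending anything. The sketch below assumes a rough rule of thumb of about four characters per token (SubQ has not published its tokenizer, so real counts may differ noticeably), walks a directory, and checks the total against the stated 12M-token window minus a hypothetical 100,000-token reserve for instructions and the model's answer. The names `estimate_tokens`, `estimate_corpus_tokens`, and `fits_in_context` are illustrative, not part of any SubQ API.

```python
import math
import sys
from pathlib import Path

CHARS_PER_TOKEN = 4.0  # assumed rule of thumb for English text and code; real tokenizers vary
CONTEXT_WINDOW = 12_000_000  # SubQ's stated native context, in tokens


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Crude token estimate from character count; no real tokenizer is involved."""
    return math.ceil(len(text) / chars_per_token)


def estimate_corpus_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".md", ".txt")) -> int:
    """Sum estimated tokens over every file under `root` whose suffix is in `suffixes`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(corpus_tokens: int, window: int = CONTEXT_WINDOW,
                    reserve: int = 100_000) -> bool:
    """True if the corpus plus a hypothetical reserve for prompt and answer fits the window."""
    return corpus_tokens + reserve <= window


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    tokens = estimate_corpus_tokens(root)
    print(f"~{tokens:,} tokens; fits in a 12M window: {fits_in_context(tokens)}")
```

The character heuristic is a deliberate choice to avoid depending on a tokenizer nobody outside the company can inspect yet; swapping `estimate_tokens` for a real token count is the obvious refinement once one is documented.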

For anyone building LLM-backed finance or legal workflows, the practical signal is that long-context AI is moving from niche experimentation to a credible commercial offering. The next twelve months will tell whether SubQ's benchmarks survive contact with independent evaluation. If they do, the transformer-or-nothing assumption that has dominated the field since 2017 is about to be tested for the first time at production scale.

Sources and cross-checks: Primary: WhatLLM — New AI Models May 2026. Corroborated against: Air Street Press — State of AI: May 2026. Funding amount ($29M), context window (12M tokens), and "1/5 cost / 52× faster" claims verified as vendor-stated; awaiting independent benchmarks. Cross-checked 18 May 2026.

Tags: #LLM #AI #AI-Research

AI Tools Desk

AI & Developer Productivity Desk

AI Tools Desk tracks AI products, coding agents, model releases, and developer productivity tools for RECATOOLS.

About this byline AI Tools Desk is a specialist RECATOOLS editorial desk focused on AI tools and developer productivity coverage. Articles are produced and reviewed under RECATOOLS editorial supervision.
