A new AI lab calling itself Subquadratic launched on 5 May 2026 with $29 million in seed funding and a product that, if its claims hold, breaks one of the field's most entrenched assumptions. Its first model, SubQ 1M-Preview, is not a transformer. Instead of the standard O(n²) dense attention used by transformers, SubQ uses sparse subquadratic attention end-to-end. It ships with a native context window of 12 million tokens, roughly six to twelve times the 1M–2M-token windows of current frontier models.

Why subquadratic matters

Transformer attention scales quadratically with context length: every token is scored against every other token, so doubling the input quadruples the attention compute per head. Practical models hit a wall around 1M–2M tokens because both the matrix-multiplication cost and the memory-bandwidth demands grow rapidly with context. Subquadratic architectures (Mamba, RWKV, RetNet, and now SubQ) replace the dense attention matrix with state-space, recurrent, or sparse operations whose cost grows far more slowly with context. The trade-off has historically been quality: subquadratic models have lagged transformers of similar parameter count on benchmarks. SubQ's notable claim is that it has closed that gap.
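To make the scaling difference concrete, the sketch below counts query-key scores for dense attention versus a simple sparse pattern in which each query sees at most a fixed window of keys. The windowed pattern and the `WINDOW` size of 4,096 are hypothetical stand-ins chosen for illustration; the source does not describe SubQ's actual sparsity scheme.

```python
# Count query-key dot products as context length n grows.
# Dense attention: every query scores every key, so cost is n * n.
# Windowed attention: a HYPOTHETICAL sparse pattern where each query
# scores at most `window` keys, so cost grows as n * window.
# This is an illustration, not SubQ's published mechanism.

WINDOW = 4_096  # hypothetical window size


def dense_scores(n: int) -> int:
    """Query-key dot products in dense attention."""
    return n * n


def windowed_scores(n: int, window: int = WINDOW) -> int:
    """Query-key dot products if each query sees at most `window` keys."""
    return n * min(n, window)


def growth(cost_fn, n_small: int, n_large: int) -> float:
    """Ratio of cost at n_large to cost at n_small."""
    return cost_fn(n_large) / cost_fn(n_small)


if __name__ == "__main__":
    one_m, two_m, twelve_m = 1_000_000, 2_000_000, 12_000_000
    print(f"dense,    1M -> 2M:  x{growth(dense_scores, one_m, two_m):.0f}")
    print(f"windowed, 1M -> 2M:  x{growth(windowed_scores, one_m, two_m):.0f}")
    print(f"dense,    1M -> 12M: x{growth(dense_scores, one_m, twelve_m):.0f}")
    print(f"windowed, 1M -> 12M: x{growth(windowed_scores, one_m, twelve_m):.0f}")
```

Going from 1M to 12M tokens multiplies dense attention work by 144, while the windowed pattern grows only twelvefold. That gap is why a 12M-token window is impractical with dense attention but plausible with a genuinely subquadratic design.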

The vendor claims

Per WhatLLM's reporting, SubQ claims:

  • Roughly 1/5 the cost of frontier transformer models on long-context tasks (think: full-book summarisation, multi-file code analysis, regulatory document review)
  • Up to 52× faster attention at scale when context exceeds a few million tokens
  • Native 12M-token context without resorting to retrieval augmentation, sliding windows, or context-compression hacks
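The 52× figure refers to attention, not to the whole model, so end-to-end gains depend on how much of the runtime attention actually consumes. A minimal sketch using Amdahl's law, where the attention shares of runtime are hypothetical values chosen for illustration, not measurements of SubQ or any transformer:

```python
def end_to_end_speedup(attention_fraction: float, attention_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the attention share of runtime gets faster."""
    if not 0.0 <= attention_fraction <= 1.0:
        raise ValueError("attention_fraction must be between 0 and 1")
    rest = 1.0 - attention_fraction  # share of runtime that is NOT sped up
    return 1.0 / (rest + attention_fraction / attention_speedup)


CLAIMED_ATTENTION_SPEEDUP = 52.0  # vendor-stated figure, not independently verified

if __name__ == "__main__":
    # Hypothetical shares of total runtime spent in attention at multi-million-token context.
    for frac in (0.5, 0.8, 0.95):
        overall = end_to_end_speedup(frac, CLAIMED_ATTENTION_SPEEDUP)
        print(f"attention = {frac:.0%} of runtime -> ~{overall:.1f}x overall")
```

Even if attention dominates at very long context, the rest of the network caps the overall gain. That is one reason independent end-to-end benchmarks matter more than a headline attention speedup.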

These are the company's own numbers. Independent benchmarks have not yet been published, and as Air Street Press observed in its May state-of-AI roundup, the gap between vendor claims and HELM-Lite or LongBench scores on subquadratic models has historically been wide.

What 12M tokens unlocks

At 12M tokens you can fit roughly ninety full-length novels (about nine million words), the source code of a mid-sized open-source project, or a full year of board minutes and quarterly filings in a single prompt. Some use cases have been out of reach for retrieval-augmented systems because they require holistic reasoning over the whole corpus, for example "Find every contradiction across these 12 months of board minutes". These become viable if the attention scaling is genuinely subquadratic.
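Capacity figures like these are back-of-envelope estimates that depend on the tokenizer and on document length. Here is a small converter. Both constants are assumptions, not figures from the source: roughly 0.75 English words per token is a common rule of thumb that varies by tokenizer, and 100,000 words is taken as a typical novel.

```python
WORDS_PER_TOKEN = 0.75     # assumption: rough rule of thumb for English text
WORDS_PER_NOVEL = 100_000  # assumption: a typical full-length novel


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Approximate English word count for a token budget."""
    return tokens * words_per_token


def novels_that_fit(context_tokens: int, words_per_novel: int = WORDS_PER_NOVEL) -> float:
    """Approximate number of novels that fit in a context window."""
    return tokens_to_words(context_tokens) / words_per_novel


if __name__ == "__main__":
    for window in (1_000_000, 12_000_000):
        print(f"{window:>12,} tokens ~ {tokens_to_words(window):,.0f} words"
              f" ~ {novels_that_fit(window):.0f} novels")
```

Swapping in a different words-per-token ratio or novel length shifts the estimate proportionally. Treat the result as an order of magnitude, not a precise capacity.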

For anyone building LLM-backed finance or legal workflows, the practical signal is that long-context AI is moving from niche experimentation to a credible commercial offering. The next twelve months will tell whether SubQ's benchmarks survive contact with independent evaluation. If they do, the transformer-or-nothing assumption that has dominated the field since 2017 will face one of its most serious tests yet at production scale.


Sources and cross-checks: Primary: WhatLLM — New AI Models May 2026. Corroborated against: Air Street Press — State of AI: May 2026. Funding amount ($29M), context window (12M tokens), and "1/5 cost / 52× faster" claims verified as vendor-stated; awaiting independent benchmarks. Cross-checked 18 May 2026.