MPT
MosaicML's efficient transformer with ALiBi positional encoding — fast, commercially licensed, extendable context.
Overview
MPT (MosaicML Pretrained Transformer) is a family of open-source language models developed by MosaicML (now part of Databricks). The models are notable for architectural modifications that enable very long context windows and fast inference. Because MPT uses ALiBi (Attention with Linear Biases) instead of standard positional embeddings, its models can be fine-tuned on, and run at, context lengths far longer than those used in pretraining.
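As a rough illustration of how ALiBi replaces positional embeddings, the sketch below computes per-head slopes and the bias added to attention scores. It follows the scheme described in the ALiBi paper, where for n heads the slopes form a geometric sequence starting at 2^(-8/n). This is a plain-Python sketch, not MPT's implementation. The function names are hypothetical, and as a simplification it only handles power-of-two head counts.

```python
import math


def alibi_slopes(n_heads: int) -> list[float]:
    """Head-specific ALiBi slopes: 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Simplification: only power-of-two head counts are supported here.
    """
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only supports a power-of-two number of heads")
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal ALiBi bias for one head: bias[i][j] = slope * (j - i) for j <= i.

    The penalty grows linearly with the query-key distance (i - j).
    Future positions (j > i) get -inf so a softmax masks them out.
    """
    return [
        [slope * (j - i) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print(slopes)  # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625]
    for row in alibi_bias(4, slopes[0]):
        print(row)  # the last row is [-1.5, -1.0, -0.5, 0.0]
```

The bias is added to the scaled query-key scores before the softmax. Nothing is learned per position, so there is no position table that limits the sequence length.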
MPT-7B and MPT-30B were released in 2023. The base models use the Apache 2.0 licence, which allows commercial use, while some fine-tuned variants such as MPT-7B-Chat carried non-commercial terms. A key engineering choice was the integration of FlashAttention, which makes MPT models faster to train and run than comparable models using standard attention. The MPT-7B-Instruct and MPT-7B-Chat variants provided instruction-following capability out of the box.
MosaicML was acquired by Databricks in 2023, integrating MPT into the Databricks AI platform and making it a foundation for enterprise LLM deployments within the Databricks ecosystem. The architectural innovations in MPT influenced subsequent open-source model designs, particularly around efficient attention and flexible context handling.
Pricing
Pricing shown for reference only. These figures reflect RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.
ASEAN Perspective
MPT in Southeast Asia
ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).
MPT (MosaicML Pretrained Transformer) is an open foundation-model family from MosaicML, now part of Databricks, designed for efficient training, long context, and commercial use. At release it was a credible open alternative with permissive licensing and good fine-tuning ergonomics inside the Databricks/MosaicML stack.
It mainly suits teams already on Databricks who want to fine-tune or self-host an open model with enterprise support. Honest caveat: MPT has been largely superseded by Databricks' own DBRX and by stronger open model families such as Meta's Llama and Mistral AI's models. It is therefore more of a legacy option than a current first choice. Because it is a model family rather than a packaged product, its usability depends entirely on your MLOps maturity.
Notable facts
- Databricks announced its $1.3 billion acquisition of MosaicML in June 2023. This came about seven weeks after MPT-7B was released and only days after MPT-30B, an unusually quick acquisition following a major model release.
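- MPT-7B-StoryWriter-65k+ was fine-tuned from the MPT-7B base model (pretrained on 2,048-token sequences) using a 65k-token context. Thanks to ALiBi, it was demonstrated generating outputs of up to about 84,000 tokens at inference, a significant practical advantage for long-document analysis.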
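- The ALiBi attention mechanism used in MPT was introduced by researchers at the University of Washington, Facebook AI Research, and the Allen Institute for AI. Rather than adding position embeddings to the token inputs, it adds a fixed penalty to each attention score that grows linearly with the distance between query and key, using a different slope for each head.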
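Because the ALiBi bias depends only on the query-key distance, nothing in the attention computation is tied to a maximum length. The sketch below applies the bias to one causal query row and runs unchanged at 2,048 or 84,000 tokens. It is a plain-Python illustration rather than MPT's code, and the function name `alibi_attention_row` is hypothetical. The sketch only shows that the mechanism admits arbitrary lengths. How well a trained model actually performs beyond its training length is an empirical question that this code does not demonstrate.

```python
import math


def alibi_attention_row(scores: list[float], slope: float) -> list[float]:
    """Softmax attention weights for the LAST query position of a causal sequence.

    scores[j] is the (already scaled) query-key dot product for key j.
    ALiBi adds slope * (j - last) to each score, a penalty that grows
    linearly with distance from the query. No position table is used,
    so any sequence length works.
    """
    last = len(scores) - 1
    biased = [s + slope * (j - last) for j, s in enumerate(scores)]
    peak = max(biased)  # subtract the max before exp() for numerical stability
    exps = [math.exp(b - peak) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Equal content scores isolate the effect of the positional bias.
    for n in (2048, 84000):
        weights = alibi_attention_row([0.0] * n, slope=0.5)
        print(n, round(weights[-1], 4), round(weights[-2], 4))
```

With equal content scores, the most recent key receives the largest weight, and weights decay geometrically with distance. Keys thousands of positions back contribute essentially nothing, so the two lengths give practically identical weights for nearby keys.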
About this listing
This entry was compiled from publicly available data including MPT's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with MPT unless explicitly stated.
Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.