
Platypus

2023 leaderboard-topper trained on just 25k curated STEM examples

Benchmarks · Fine-Tuning · Llama · Open Source · Reasoning · STEM

LLMs & Chat · Open Source

Researched 8 May 2026, 20:44 SGT · Published 8 May 2026, 08:00 SGT · Reviewed 11 Jul 2026


RECATOOLS Score

4.8 / 10

Capability

Value for money

Ease of use

ASEAN readiness

API quality

Founded

2023

Boston, Massachusetts

Users

100k+ downloads

Launched

Aug 2023

Developer

Boston University

Overview

Platypus is a 2023 Boston University research project that fine-tuned Llama 2 models with LoRA on Open-Platypus, a curated set of about 25,000 STEM and logic problems; Platypus2-70B-instruct briefly topped Hugging Face's Open LLM Leaderboard that August.

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 11 Jul 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free

Fully free

Use cases

Building a math and science tutoring assistant that requires strong STEM reasoning
Researching the optimal size and quality of training data for domain-specific fine-tuning
Benchmarking STEM reasoning across open-source models

ASEAN Perspective

Platypus in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Platypus is a research project from Boston University: a family of LLaMA-based models fine-tuned on the curated Open-Platypus dataset using LoRA, a parameter-efficient fine-tuning (PEFT) technique. The models briefly topped the Hugging Face Open LLM Leaderboard while using a tiny fraction of the data and compute of rival fine-tunes (a 13B model trained in about 5 hours on one A100). Its lasting value is the dataset and the methodology demonstrating cheap, fast refinement of base models. It suits ML researchers and practitioners studying efficient fine-tuning, not end users looking for a chatbot or product. Caveats: it is an academic artefact from 2023 built on now-superseded LLaMA bases, with no product, support, SLA or commercial backing, and leaderboard relevance has long since moved on. ASEAN readiness is moot in a product sense: the weights and dataset are openly available on GitHub and Hugging Face, but there is no hosted API or commercial offering.

Independent AI-assisted assessment by RECATOOLS.

What people say

Five hours on one A100 — that was the entire training bill for the 13B Platypus, at a time when rival fine-tunes were burning through multi-node clusters. The Boston University trio behind it (Ariel Lee, Cole Hunter and Nataniel Ruiz) put LoRA adapters on Llama 2 over a ruthlessly filtered dataset to make a point about efficiency, and in August 2023 the point landed: Platypus2-70B-instruct took the top slot on Hugging Face's Open LLM Leaderboard.
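A rough sense of why LoRA keeps the training bill small comes from counting trainable parameters. The sketch below compares a full update of one square weight matrix with a low-rank update of the same matrix. The hidden size of 5120 matches the 13B Llama 2 configuration, but the rank of 16 and the choice of a single square matrix are assumptions for illustration, not the authors' exact setup.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for a LoRA update W + B @ A.

    B has shape d_out x rank and A has shape rank x d_in,
    so only rank * (d_out + d_in) values are trained.
    """
    return rank * (d_out + d_in)


hidden = 5120  # hidden size of the 13B Llama 2 configuration
rank = 16      # assumed LoRA rank, chosen only for illustration

full = full_update_params(hidden, hidden)
lora = lora_update_params(hidden, hidden, rank)

print(full, lora, full // lora)  # prints: 26214400 163840 160
```

Under these assumptions the adapter trains 160 times fewer values for that matrix than a full update would, which is the kind of saving that makes a single-GPU run plausible.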

The dataset is the real contribution. Open-Platypus condenses eleven public STEM and logic corpora into roughly 25,000 examples, around 90% human-written, deduplicated and scrubbed of anything resembling benchmark test questions — a contamination check many leaderboard climbers of that era skipped. The quality-over-quantity result echoed LIMA's findings and helped nudge the fine-tuning community away from bulk synthetic data toward curation.

Nothing has moved since. The project site and repo are frozen in 2023, the Llama 2 bases are two generations stale, and the leaderboard the models once topped has itself been retired and rebooted. There's no product, no API, no support — this was always a paper with weights attached, not a tool.

The 4.8 score fits what remains: researchers studying parameter-efficient fine-tuning still pull the dataset, and the paper stays a citable demonstration that a small, careful dataset beats a big sloppy one. Anyone arriving here looking for a chatbot should look literally anywhere else.

Summary of public user & expert reviews, compiled by RECATOOLS.

Notable facts

Platypus2-70B-instruct reached #1 on the Hugging Face Open LLM Leaderboard in August 2023 using only about 25,000 training examples, at least 20x fewer than competing models that used 500k+ examples.
The researchers removed training examples that matched or closely resembled benchmark test questions, reducing the risk that contamination inflated the models' scores.
The paper was accepted to a NeurIPS 2023 workshop, lending peer-reviewed support to its case for data quality over quantity.

Frequently asked questions

Is Platypus free?

Yes. The model weights and the Open-Platypus dataset are openly available on Hugging Face and GitHub.

What is Platypus optimised for?

STEM reasoning — mathematics, science, and logic problems.

How was the Open-Platypus dataset created?

By filtering and deduplicating STEM problems from multiple open sources, then removing any items appearing in benchmark test sets.
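The two filtering steps described in this answer can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it uses word-set (Jaccard) overlap as a simple stand-in for whatever similarity measure the project used, and the function names and the 0.8 threshold are hypothetical choices made for the example.

```python
import re


def tokens(text: str) -> frozenset[str]:
    """Lowercased word tokens; a crude stand-in for a learned text embedding."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Token-set overlap in [0, 1]; 1.0 means identical token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def curate(candidates: list[str], benchmark_questions: list[str],
           threshold: float = 0.8) -> list[str]:
    """Drop items that resemble a benchmark question or an already kept item."""
    bench = [tokens(q) for q in benchmark_questions]
    kept: list[str] = []
    kept_tokens: list[frozenset[str]] = []
    for text in candidates:
        t = tokens(text)
        if any(jaccard(t, b) >= threshold for b in bench):
            continue  # too close to a benchmark test question: contamination risk
        if any(jaccard(t, k) >= threshold for k in kept_tokens):
            continue  # near-duplicate of an example that was already kept
        kept.append(text)
        kept_tokens.append(t)
    return kept


candidates = [
    "What is the derivative of x squared?",
    "What is the derivative of x squared ?",
    "If a train travels 60 km in 1 hour, what is its average speed?",
    "Prove that the sum of two even integers is even.",
]
benchmark = ["If a train travels 60 km in 1 hour, what is its average speed"]

cleaned = curate(candidates, benchmark)
print(cleaned)
```

Here the second candidate is dropped as a duplicate and the third as a benchmark match, leaving two examples. A real pipeline would need a stronger similarity measure to catch paraphrases that share few exact words.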

Is Platypus available in different sizes?

13B and 70B variants are available.

Why did Platypus perform so well with so little data?

The project's explanation is data quality: a small, carefully curated set of domain-specific examples can lift performance on domain benchmarks more than a much larger volume of general data.

About this listing

Researched on Friday, 8 May 2026 at 20:44 SGT (UTC+8)

Published on Friday, 8 May 2026 at 08:00 SGT (UTC+8)

Last reviewed Saturday, 11 July 2026

This entry was compiled from publicly available data including Platypus's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Platypus unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Platypus directly.
