Dolly

Databricks' open-source instruction-following model — trained on human-generated data, commercial use permitted.

Tags: Commercial Licence, Databricks, Human Data, Instruction Tuned, Open Source, Research

Researched 8 May 2026, 20:44 SGT · Published 8 May 2026, 08:00 SGT · Reviewed 11 Jul 2026

RECATOOLS Score

3.5 / 10

Scoring dimensions: Capability, Value for money, Ease of use, ASEAN readiness, API quality

Founded: 2023, San Francisco, California
Users: 200k+ downloads
Launched: Apr 2023
Developer: Databricks

Overview

Dolly 2.0 is Databricks' April 2023 instruction-tuned Pythia-12B model, billed as the first open-source instruction LLM cleared for commercial use because all 15,000 training examples were written by Databricks employees. The databricks-dolly-15k dataset has outlived the model itself.

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 11 Jul 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free

Fully free

Use cases

Building a commercially deployable AI product without legal uncertainty about training data licensing
Training a custom instruction model starting from a clean human-generated dataset
Research into the minimum amount of human-generated data needed for useful instruction following

ASEAN Perspective

Dolly in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Dolly v2 was a landmark in 2023 — one of the first instruction-tuned LLMs released under a fully open, commercially usable licence, trained on a human-generated dataset (databricks-dolly-15k). It mattered as proof that open, non-restrictive instruction models were possible, and it remains a reasonable teaching artifact for understanding fine-tuning. As a tool to actually deploy in 2026 it is obsolete: a 12B model from early 2023 is far behind Llama 3.x, Qwen, Mistral and other modern open weights on every benchmark and on efficiency. There is no hosted product, no API and no ongoing development — it lives on Hugging Face as weights you self-host. Treat it as historically significant, not as a current option.

Independent AI-assisted assessment by RECATOOLS.

What people say

Nobody deploys Dolly in 2026, and even Databricks never claimed you should — the launch blog conceded the model wasn't state-of-the-art, and early users found it hallucinated freely and trailed GPT-3.5 by a wide margin. What earns Dolly its place in the record books is a licensing manoeuvre, not a benchmark.

In early 2023, every open instruction model worth using (Alpaca, Vicuna and friends) was tuned on GPT output, which OpenAI's terms made legally murky for commercial products. Databricks' answer was to turn dataset creation into an internal contest: over 5,000 employees wrote 15,000 instruction–response pairs across eight task categories. Fine-tune EleutherAI's Pythia-12B on that, release the dataset under CC BY-SA 3.0 and the weights under a permissive licence, and you get Dolly 2.0, shipped in April 2023 as the first instruction-tuned LLM anyone could legally build a business on.

The dataset outlived the model by a wide margin. databricks-dolly-15k is still cited, still downloaded, and still shows up in fine-tuning tutorials and as seed data for other corpora; the model itself sits dormant on Hugging Face, and the GitHub repo has been essentially untouched since mid-2023. Databricks moved on to MPT via the MosaicML acquisition, then to DBRX.

The 3.5 score is about right — arguably generous for the model, fair for the contribution. If you need a clean, human-written instruction dataset for a class project or a reproduction study, dolly-15k remains genuinely useful. If you want a model that answers questions well, any modern open-weight release does better.

Summary of public user & expert reviews, compiled by RECATOOLS.

Notable facts

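Dolly 2.0 was billed as the first open-source instruction-following LLM whose instruction-tuning data was written entirely by human employees, which removed the licensing doubts attached to model-generated training data. (The underlying Pythia base model was pretrained separately on web-scale text.)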
The 15,000 training examples were written by Databricks staff in their spare time across 3 weeks, making it one of the most unusual crowdsourced datasets in AI history.
Dolly was named after Dolly the sheep, the first mammal cloned from an adult cell, symbolising the replication of ChatGPT's instruction-following behaviour at low cost.

Frequently asked questions

Is Dolly free?

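Yes. The model weights are published on Hugging Face under the MIT licence and the training code on GitHub under Apache 2.0; both permit commercial use.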

Why is Dolly's training data special?

All 15,000 examples were written by humans with no AI generation, enabling commercial use without licence concerns about GPT-generated data.

Is Dolly competitive with GPT-3.5?

No. Dolly is significantly less capable. Its value is the licence clarity for commercial use, not performance leadership.

Can I use the Dolly training dataset for my own model?

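Yes. The databricks-dolly-15k dataset is released under CC BY-SA 3.0, which allows commercial use provided you give attribution and share adaptations of the dataset under the same licence.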
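
For readers who want to see what reusing the dataset involves, here is a minimal sketch of turning one record into a single training string. It assumes each record carries `instruction`, `context` and `response` text fields (check the dataset card before relying on that), and the "###" section markers are an illustrative template chosen for this example, not necessarily the format Databricks trained with. The two sample records are hypothetical.

```python
from typing import Dict


def format_record(record: Dict[str, str]) -> str:
    """Render one instruction record as a single training string.

    The "### ..." markers are an illustrative template for this sketch,
    not a confirmed reproduction of the original Dolly training format.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()  # empty for open-ended tasks
    response = record["response"].strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:
        # Only closed-book style tasks carry a reference passage.
        parts.append(f"### Context:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


# Hypothetical records written for this example (not taken from the dataset).
open_question = {
    "instruction": "Name a primary colour.",
    "context": "",
    "response": "Red.",
}
closed_question = {
    "instruction": "Which base model does the passage mention?",
    "context": "Dolly 2.0 is based on EleutherAI's pythia-12b model.",
    "response": "pythia-12b",
}

if __name__ == "__main__":
    print(format_record(open_question))
    print("-----")
    print(format_record(closed_question))
```

In practice the records would come from the published dataset rather than hand-written dictionaries; with the Hugging Face `datasets` library that is typically `load_dataset("databricks/databricks-dolly-15k")`, though the identifier and schema should be confirmed on the dataset page.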

What model is Dolly based on?

Dolly 2.0 is based on EleutherAI's pythia-12b model.
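
Because Dolly is distributed only as weights you self-host, the 12B size translates directly into a hardware requirement. The rough arithmetic below is a sketch under stated assumptions: it uses the nominal 12 billion parameter count implied by the model name (the exact count may differ slightly) and covers the weights only, not activations or the generation cache.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "12b" in the model name.
NOMINAL_PARAMS = 12e9

# Common storage widths: 4 bytes (float32), 2 bytes (float16/bfloat16), 1 byte (int8).
PRECISIONS = (("float32", 4), ("float16/bfloat16", 2), ("int8", 1))

if __name__ == "__main__":
    for label, width in PRECISIONS:
        gib = weight_memory_gib(NOMINAL_PARAMS, width)
        print(f"{label:>18}: ~{gib:.1f} GiB for weights alone")
```

At 16-bit precision the weights alone come to a little over 22 GiB, which is more than many single consumer GPUs provide.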

About this listing

Researched on Friday, 8 May 2026 at 20:44 SGT (UTC+8)

Published on Friday, 8 May 2026 at 08:00 SGT (UTC+8)

Last reviewed Saturday, 11 July 2026

This entry was compiled from publicly available data including Dolly's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Dolly unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Dolly directly.