Dolly

Databricks' open-source instruction-following model — trained on human-generated data, commercial use permitted.

LLMs & Chat · Open Source · Has API
Researched · Published · Reviewed
RECATOOLS Score: 3.5 / 10
Capability: 3
Value for money: 5
Ease of use: 4
ASEAN readiness: 5
API quality: (no score given)
Founded: 2023
HQ: San Francisco, California
Users: 200k+ downloads
Launched: Apr 2023
Developer: Databricks

Overview

Dolly is an open-source instruction-following language model from Databricks, fine-tuned on about 15,000 question-answer pairs written entirely by Databricks staff. Dolly 2.0, released in April 2023, was presented by Databricks as the first open-source instruction-following LLM fine-tuned on a human-generated dataset licensed for commercial use. Databricks created new human-written training data specifically to avoid the licence restrictions attached to GPT-generated data.

The training dataset (databricks-dolly-15k) is itself an important contribution. It contains about 15,000 instruction-following examples written by Databricks employees across 8 categories: brainstorming, classification, closed QA, general QA, information extraction, open QA, summarisation, and creative writing. It is widely used as a clean, human-generated instruction dataset.
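For readers who want to inspect the dataset's category mix, the following is a minimal sketch rather than an official loader. It assumes each record is one JSON object per line with `instruction`, `context`, `response` and `category` fields; the sample rows and the `count_categories` helper are invented for illustration and are not taken from the dataset.

```python
import json
from collections import Counter

# Hypothetical sample rows that mimic the assumed record layout
# (instruction, context, response, category). Not real dataset entries.
SAMPLE_JSONL = """\
{"instruction": "List three uses for a paperclip.", "context": "", "response": "Holding paper, resetting a router, marking a page.", "category": "brainstorming"}
{"instruction": "Is a tomato a fruit or a vegetable?", "context": "", "response": "Botanically, a fruit.", "category": "classification"}
{"instruction": "Summarise the passage.", "context": "Dolly 2.0 was released in April 2023.", "response": "Dolly 2.0 came out in April 2023.", "category": "summarization"}
{"instruction": "Suggest names for a bakery.", "context": "", "response": "Crumb & Co., Rise and Shine.", "category": "brainstorming"}
"""


def count_categories(jsonl_text: str) -> Counter:
    """Count records per category in JSON Lines text."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["category"]] += 1
    return counts


counts = count_categories(SAMPLE_JSONL)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

To run this against the real data, replace `SAMPLE_JSONL` with the contents of the downloaded dataset file, after checking the field names against the dataset card.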

While Dolly's performance was modest compared with GPT-3.5-level models, its significance lay in licence clarity: because the instruction data was written by Databricks employees rather than generated by another model, the model could be used commercially without the legal uncertainty surrounding GPT-generated training data. This represented a meaningful advance in the open-source LLM licensing landscape.

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free: fully free

Use cases

  • Building a commercially deployable AI product without legal uncertainty about training data licensing
  • Training a custom instruction model starting from a clean human-generated dataset
  • Research into the minimum amount of human-generated data needed for useful instruction following
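For the custom-training use case, each dataset record has to be turned into a single prompt string before fine-tuning. The sketch below shows one way to do that. The template wording (the intro sentence and the `### Instruction:`, `Input:` and `### Response:` markers) is an assumption modelled on the Alpaca-style format and should be checked against the Dolly training code before use; `build_prompt` is a hypothetical helper name.

```python
# Assumed template pieces; verify against the Dolly repository before relying on them.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, context: str = "") -> str:
    """Format one instruction (plus optional context) as a prompt string."""
    parts = [INTRO, f"### Instruction:\n{instruction}"]
    if context:
        # Reference text, e.g. the passage for closed QA or summarisation
        parts.append(f"Input:\n{context}")
    # The model's answer is expected to follow this marker
    parts.append("### Response:\n")
    return "\n\n".join(parts)


print(build_prompt("Summarise the passage.", "Dolly 2.0 was released in April 2023."))
```

During training the reference response would be appended after the final marker; at inference time the prompt is passed as-is and the model generates the continuation.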

ASEAN Perspective

Dolly in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Dolly v2 was a landmark in 2023 — one of the first instruction-tuned LLMs released under a fully open, commercially usable licence, trained on a human-generated dataset (databricks-dolly-15k). It mattered as proof that open, non-restrictive instruction models were possible, and it remains a reasonable teaching artifact for understanding fine-tuning.

As a tool to actually deploy in 2026 it is obsolete: a 12B model from early 2023 is far behind Llama 3.x, Qwen, Mistral and other modern open weights on every benchmark and on efficiency. There is no hosted product, no API and no ongoing development — it lives on Hugging Face as weights you self-host. Treat it as historically significant, not as a current option.
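Because self-hosting is the only route, it helps to estimate how much memory the weights alone need. The following is a rough back-of-the-envelope sketch: it assumes a nominal 12 billion parameters (the exact count may differ slightly) and ignores activations, the KV cache and framework overhead, so real requirements will be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 12e9  # assumption: nominal "12B" parameter count

for label, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, half-precision weights take a little over 22 GiB, which is more than most single consumer GPUs provide.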

Independent AI-assisted assessment by RECATOOLS.

Notable facts

  • Dolly 2.0 was among the first open-source LLMs whose instruction-tuning data was written entirely by humans (Databricks employees), which removed the licensing doubts that came with model-generated data and allowed commercial use.
  • The 15,000 training examples were written by Databricks staff in their spare time across 3 weeks, making it one of the most unusual crowdsourced datasets in AI history.
  • Dolly was named after Dolly the sheep, the first cloned mammal — symbolising the replication of ChatGPT's instruction-following behaviour at low cost.

Frequently asked questions

Is Dolly free?
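Yes. The model is free to download and licensed for commercial use (the Hugging Face model card lists the MIT licence; the training code repository is Apache 2.0).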
Why is Dolly's training data special?
All 15,000 examples were written by humans with no AI generation, enabling commercial use without licence concerns about GPT-generated data.
Is Dolly competitive with GPT-3.5?
No. Dolly is significantly less capable. Its value is the licence clarity for commercial use, not performance leadership.
Can I use the Dolly training dataset for my own model?
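Yes. The databricks-dolly-15k dataset is released under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) licence, which permits commercial use subject to attribution and share-alike terms.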
What model is Dolly based on?
Dolly 2.0 is based on EleutherAI's pythia-12b model.

About this listing

This entry was compiled from publicly available data including Dolly's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Dolly unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Dolly directly.
