Dolly
Databricks' open-source instruction-following model, fine-tuned on human-generated data and licensed for commercial use.
Overview
Dolly is an open-source instruction-following language model from Databricks. Dolly 2.0, released in April 2023, was fine-tuned on roughly 15,000 instruction-response pairs written entirely by Databricks employees. Databricks described it as the first open-source instruction-following model licensed for commercial use: the company created new human-written training data specifically to avoid the licence restrictions attached to GPT-generated data.
The training dataset (databricks-dolly-15k) is itself an important contribution: about 15,000 instruction-following examples written by Databricks employees across eight categories (brainstorming, classification, closed QA, general QA, information extraction, open QA, summarisation and creative writing). It is widely used as a clean, human-generated instruction dataset.
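To make the dataset's shape concrete, here is a minimal Python sketch of a single record. It assumes the published dataset exposes four fields (`instruction`, `context`, `response`, `category`) and uses the eight machine-readable `category` labels listed in the code; verify both against the dataset card before relying on them. `DollyRecord`, `category_counts` and the sample records are hypothetical illustrations, not Databricks code or real dataset rows.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed category labels for databricks-dolly-15k's `category` field.
CATEGORIES = frozenset({
    "brainstorming",
    "classification",
    "closed_qa",
    "creative_writing",
    "general_qa",
    "information_extraction",
    "open_qa",
    "summarization",
})


@dataclass(frozen=True)
class DollyRecord:
    """Hypothetical model of one example; `context` may be an empty string."""
    instruction: str
    context: str
    response: str
    category: str

    def __post_init__(self) -> None:
        # Reject labels outside the eight assumed categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def category_counts(records):
    """Tally how many records fall into each category."""
    return Counter(r.category for r in records)


# Hypothetical records written for illustration; not taken from the dataset.
samples = [
    DollyRecord("Name three primary colours.", "", "Red, yellow and blue.", "open_qa"),
    DollyRecord(
        "When was the company founded?",
        "Acme Corp was founded in 1990 in Ohio.",  # reference passage for closed QA
        "1990.",
        "closed_qa",
    ),
    DollyRecord("Suggest names for a pet goldfish.", "", "Bubbles, Finn, Goldie.", "brainstorming"),
]

print(category_counts(samples))
```

Closed QA records pair a question with a reference passage in `context`, while open QA and brainstorming records typically leave it empty.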
While Dolly's performance was modest compared with GPT-3.5-level models, its significance lay in licence clarity: because all of its instruction-tuning data was written by Databricks employees, the model could be used commercially without the legal uncertainty surrounding GPT-generated training data. This was a meaningful advance for open-source LLM licensing.
Pricing
Pricing shown for reference only. These figures reflect RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.
ASEAN Perspective
Dolly in Southeast Asia
ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).
Dolly 2.0 was a landmark in 2023: one of the first instruction-tuned LLMs released under a fully open, commercially usable licence, fine-tuned on a human-generated dataset (databricks-dolly-15k). It proved that open, non-restrictive instruction models were possible, and it remains a reasonable teaching artifact for understanding fine-tuning.
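Since Dolly is useful mainly as a teaching artifact for fine-tuning, a short sketch of how one instruction-response pair might be turned into a training string may help. It assumes an Alpaca-style template (an intro sentence, `### Instruction:`, an optional `Input:` block, `### Response:` and `### End`); these strings are our assumption of what Dolly's training code uses, so check the databricks/dolly repository for the exact wording. `format_example` is a hypothetical helper, not Databricks' implementation.

```python
# Assumed Alpaca-style template strings; verify against Dolly's training code.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
INPUT_KEY = "Input:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def format_example(instruction: str, response: str, context: str = "") -> str:
    """Join one instruction/response pair (plus optional context) into a training string."""
    parts = [INTRO, f"{INSTRUCTION_KEY}\n{instruction}"]
    if context:
        # Records such as closed QA typically carry a reference passage.
        parts.append(f"{INPUT_KEY}\n{context}")
    parts.append(f"{RESPONSE_KEY}\n{response}")
    # Explicit end marker the model can learn to emit when it is done.
    parts.append(END_KEY)
    return "\n\n".join(parts)


print(format_example("Name three primary colours.", "Red, yellow and blue."))
```

The explicit end marker gives the fine-tuned model a fixed sequence it can learn to produce when a response is complete, which makes it straightforward to stop generation at inference time.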
As a tool to actually deploy in 2026 it is obsolete: its largest variant, a 12B-parameter model from early 2023, trails Llama 3.x, Qwen, Mistral and other modern open-weight models on standard benchmarks and on efficiency. There is no hosted product, no API and no ongoing development; the weights are published on Hugging Face for you to self-host. Treat it as historically significant, not as a current option.
Notable facts
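- Databricks billed Dolly 2.0 as the first open-source instruction-following LLM fine-tuned entirely on human-written data (from its own employees), which gave it clear licensing for commercial use. The base model it was fine-tuned from, EleutherAI's Pythia, was pretrained separately on The Pile.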
- The 15,000 training examples were written by Databricks staff in their spare time across 3 weeks, an unusual example of a dataset crowdsourced inside a single company.
- Dolly was named after Dolly the sheep, the first mammal cloned from an adult cell, symbolising the replication of ChatGPT's instruction-following behaviour at low cost.
About this listing
This entry was compiled from publicly available data including Dolly's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Dolly unless explicitly stated.
Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.
For the latest details, please refer to Dolly directly.