Lilac

Dataset curation for LLM training

Code & Dev Tools · Open Source
RECATOOLS Score: 6.5 / 10
Capability: 7
Value for money: 7
Ease of use: 6
ASEAN readiness: 5
API quality: 6
Founded: 2023
HQ: San Francisco, California, USA
Users
Launched
Developer

Overview

Lilac (acquired by Databricks in 2024) is an open-source tool for inspecting, curating and labeling large datasets used to train and fine-tune LLMs. It is particularly useful for finding PII, duplicates and quality issues at scale.
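
To make "finding PII and duplicates" concrete, here is a minimal standard-library Python sketch of the kind of checks a curation tool automates. It does not use Lilac's API. The regular expressions, function names and sample rows are hypothetical illustrations, and real PII detection needs far more robust methods than these patterns.

```python
import hashlib
import re

# Hypothetical, deliberately simple patterns (assumption: illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_pii(text):
    """Return email-like and phone-like substrings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_exact_duplicates(rows):
    """Map each duplicate row index to the index of its first occurrence."""
    first_seen = {}
    duplicates = {}
    for i, text in enumerate(rows):
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[i] = first_seen[digest]
        else:
            first_seen[digest] = i
    return duplicates


# Hypothetical sample rows standing in for a training corpus.
rows = [
    "Contact me at jane@example.com for details.",
    "The weather is nice today.",
    "the  weather is NICE today.",
]

# Indices of rows containing at least one PII-like match.
flagged = [i for i, r in enumerate(rows) if any(find_pii(r).values())]
# Rows that repeat an earlier row after normalization.
dupes = find_exact_duplicates(rows)
```

In this sketch, `flagged` holds the rows to review for PII and `dupes` maps each repeated row to its first occurrence. Hashing normalized text only catches exact or near-exact repeats; fuzzy near-duplicate detection at scale is a separate and harder problem.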


Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 20 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free
Free tier with core features.

Use cases

Dataset inspection · Training-data curation · PII detection

ASEAN Perspective

Lilac in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Lilac is an open-source tool for exploring, clustering, searching and cleaning unstructured text datasets — useful for LLM evaluation and for preparing data for RAG, fine-tuning and pre-training. Built by ex-Google engineers, it suits data and ML teams who need to understand and curate large text corpora rather than treat them as a black box.

The critical caveat is status: Lilac was acquired by Databricks, and its capabilities are being folded into the Databricks platform, so the standalone open-source project is effectively in maintenance/archive mode. Evaluate it as either a Databricks feature or an OSS reference rather than an actively developed independent product. It can be self-hosted for free, is English-only, and has no provisions specific to Southeast Asia (SEA).

Independent AI-assisted assessment by RECATOOLS.

About this listing


This entry was compiled from publicly available data including Lilac's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Lilac unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Lilac directly.

