StarCoder
Open-source code LLM trained on 80+ languages from The Stack dataset, free for research and commercial use under the BigCode OpenRAIL-M licence.
Overview
StarCoder is an open-source large language model specialised for code, developed by BigCode, an open scientific collaboration led by Hugging Face and ServiceNow. The original 15.5B-parameter StarCoder was trained on The Stack, a curated dataset of permissively licensed code from GitHub. It generates code across 80+ programming languages and is released under the BigCode OpenRAIL-M licence, which permits commercial use subject to use-based restrictions.
StarCoder2 (released in 2024) is the second generation, with significantly improved quality, and comes in 3B, 7B, and 15B parameter sizes. The 15B model performs competitively with models more than twice its size on code generation benchmarks. Its training data was filtered for licence compliance and excludes copyleft-licensed code such as GPL, which addresses concerns about copyleft contamination in code generation models.
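The licence filtering described above, together with the repository opt-out used for The Stack, can be illustrated with a small sketch. The allowlist, the `SourceFile` record and `filter_corpus` are hypothetical names for illustration only. This is not BigCode's actual pipeline, which is considerably more involved.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical allowlist of permissive SPDX licence IDs (illustrative only,
# not the list BigCode actually used).
PERMISSIVE_LICENCES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


@dataclass(frozen=True)
class SourceFile:
    """Hypothetical record for one file in a raw code corpus."""
    repo: str               # "owner/name" of the source repository
    path: str               # file path within the repository
    licence: Optional[str]  # detected SPDX ID, or None if none was detected


def filter_corpus(files: Iterable[SourceFile],
                  opted_out_repos: Iterable[str]) -> List[SourceFile]:
    """Keep files under an allowlisted licence from repos that did not opt out.

    Files with no detected licence are dropped here; that is a simplifying
    choice for this sketch, not a description of any real dataset.
    """
    opted_out = set(opted_out_repos)
    return [
        f for f in files
        if f.licence in PERMISSIVE_LICENCES and f.repo not in opted_out
    ]


corpus = [
    SourceFile("alice/utils", "utils.py", "MIT"),
    SourceFile("bob/kernel", "sched.c", "GPL-2.0-only"),  # copyleft: excluded
    SourceFile("carol/web", "app.ts", "Apache-2.0"),      # permissive, but opted out
    SourceFile("dave/scripts", "run.sh", None),           # no licence detected
]
kept = filter_corpus(corpus, opted_out_repos=["carol/web"])
print([f.path for f in kept])  # ['utils.py']
```

Checking opt-outs per repository mirrors how developers could ask for whole GitHub repositories to be excluded from The Stack.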
For developers who want code generation without the data privacy concerns of commercial APIs, StarCoder is a deployable alternative. It is used particularly in research, by organisations that must self-host for compliance reasons, and as a base model for fine-tuning domain-specific code models.
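When planning a self-hosted deployment, a first question is how much accelerator memory the weights need. The back-of-the-envelope sketch below assumes the nominal 3B, 7B and 15B sizes (actual checkpoints differ slightly) and counts weights only. Real serving needs extra memory for the KV cache, activations and framework overhead.

```python
# Rule of thumb: weight memory ~= parameter count x bytes per parameter.
# Counts weights only; KV cache, activations and runtime overhead add more.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate GiB needed just to hold the model weights in `dtype`."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


for size in (3, 7, 15):  # nominal StarCoder2 sizes
    row = ", ".join(f"{dt}: {weight_memory_gib(size, dt):.1f} GiB"
                    for dt in ("bf16", "int8", "int4"))
    print(f"StarCoder2-{size}B -> {row}")
```

At bf16 the 15B model needs roughly 28 GiB for its weights alone. That is one reason teams often turn to 8-bit or 4-bit quantisation when self-hosting on a single GPU.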
Pricing
The model weights are free to download, so the main cost is the compute needed to host and run them. This reflects RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice; always confirm current terms on the provider's official website before making any decision.
ASEAN Perspective
StarCoder in Southeast Asia
StarCoder, from the BigCode project (Hugging Face and ServiceNow), is an openly licensed code-generation model family. It is notable for being trained on permissively licensed source code with clear data governance, which matters to enterprises wary of copyright exposure. It is a solid base for self-hosted coding assistants and fine-tuning.
On raw coding capability alone it trails frontier proprietary models (GPT, Claude, Gemini) and the strongest open coders such as Qwen-Coder and DeepSeek-Coder. Its appeal therefore rests more on licence clarity and self-hostability than on leaderboard performance. Open weights mean no regional access restrictions in ASEAN markets, although the OpenRAIL-M use restrictions still apply and you supply the infrastructure and tooling.
Notable facts
- StarCoder was trained on The Stack, which honoured an opt-out mechanism: developers could request that their GitHub repositories be excluded from training.
- StarCoder2-15B matches or outperforms CodeLlama-34B, a model more than twice its size, on many of the benchmarks reported at release, suggesting that careful data curation can partly offset raw scale.
- BigCode is an open scientific collaboration led by two companies (Hugging Face and ServiceNow) with contributors from the wider research community. It publishes both the model weights and The Stack training dataset.
About this listing
This entry was compiled from publicly available data including StarCoder's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with StarCoder unless explicitly stated.
Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.