Guanaco
An open-source chatbot fine-tuned with QLoRA that came close to ChatGPT quality on the Vicuna benchmark, with its largest model trained on a single 48GB GPU.
Overview
Guanaco is an open-source chatbot model created by Tim Dettmers and collaborators at the University of Washington using QLoRA (Quantized Low-Rank Adaptation), a technique that makes fine-tuning large language models possible on consumer hardware. The research paper and model were released in May 2023.
The key contribution of Guanaco is the QLoRA technique itself. The base model is quantised to 4-bit and frozen, and only small LoRA adapter layers are trained on top of it. This made it possible to fine-tune a 65B-parameter LLaMA model on a single 48GB GPU, and a 33B model on a consumer GPU with 24GB VRAM, which significantly lowered the hardware barrier to LLM fine-tuning.
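The idea can be illustrated with a toy sketch in plain Python. It is not the QLoRA implementation: it assumes simple absmax rounding to signed 4-bit integer codes instead of the NF4 data type used in the paper, and every name in it (`quantize_4bit`, `qlora_forward`, the tiny layer sizes) is hypothetical and chosen for illustration only.

```python
import random

def quantize_4bit(weights):
    """Absmax-quantise a row of floats to signed 4-bit codes in [-7, 7].
    Assumption: plain integer rounding, not the NF4 data type from the paper."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]

def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def qlora_forward(quantized_rows, lora_a, lora_b, x, alpha=16):
    """y = dequant(W) x + (alpha / rank) * B (A x).
    W stays frozen in 4-bit form; only A and B would receive gradients."""
    rank = len(lora_a)
    frozen_w = [dequantize(codes, scale) for codes, scale in quantized_rows]
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, rank = 8, 4, 2  # hypothetical toy sizes

# Frozen base weights, stored row by row as 4-bit codes plus one scale per row.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
quantized_rows = [quantize_4bit(row) for row in w]

# LoRA adapters: A starts small and random, B starts at zero,
# so the adapted layer initially matches the quantised base layer.
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

x = [1.0] * d_in
y = qlora_forward(quantized_rows, lora_a, lora_b, x)

trainable_params = rank * (d_in + d_out)  # adapter parameters only
frozen_params = d_in * d_out              # base weights, never updated
```

In this sketch the adapters hold `rank * (d_in + d_out)` parameters regardless of how the base weights are stored, which is why the trainable part stays small even as the frozen, quantised part grows.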
Guanaco 65B was rated close to ChatGPT on the Vicuna chatbot benchmark (the paper reports 99.3% of ChatGPT's performance level) after about 24 hours of fine-tuning on a single GPU. While newer models have surpassed it, the QLoRA technique developed for Guanaco became a standard approach for resource-efficient LLM fine-tuning and is supported by most open-source fine-tuning toolkits.
Pricing
Pricing shown for reference only. These figures reflect RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.
ASEAN Perspective
Guanaco in Southeast Asia
ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).
Guanaco is a family of open research models fine-tuned with QLoRA, the technique that made it possible to fine-tune large models on a single consumer GPU. Its historical importance is considerable: it demonstrated chat quality close to the best systems of its time from highly efficient training, and the QLoRA method it showcased is now standard practice. It suits researchers and ML engineers studying efficient fine-tuning, not end users.
This is a research artifact, not a product: there is no hosted service, no official API, no support, and the base models are now well behind current open weights. Capability and value should be read as historical and educational rather than as a tool you would deploy today. There is no localisation or ASEAN provisioning of any kind.
Notable facts
- Guanaco 65B was fine-tuned in about 24 hours on a single GPU that costs $2/hour to rent, demonstrating that fine-tuning a large model to near-ChatGPT chat quality was within reach of independent researchers.
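- The QLoRA technique developed for Guanaco cuts the GPU memory needed to fine-tune a 65B-parameter LLaMA model from more than 780GB for regular 16-bit fine-tuning to under 48GB. Storing the base weights in 4-bit rather than 16-bit accounts for a 75% reduction in weight memory on its own.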
- The QLoRA paper is one of the most cited machine learning papers of 2023, and the method is supported by consumer-friendly fine-tuning tools such as Axolotl and LLaMA Factory.
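As a rough check on the memory figures above, the storage needed for the weights alone can be estimated from the parameter count and the bits used per parameter. This is a minimal sketch with a hypothetical helper name. It deliberately ignores quantisation constants, adapter weights, gradients, optimizer state, and activations, so it shows the saving on weights only, not the full training footprint.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_65b = 65e9  # parameter count of the largest Guanaco base model

fp16_gb = weight_memory_gb(params_65b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_65b, 4)   # 4-bit quantised weights
saving = 1 - int4_gb / fp16_gb              # fraction of weight memory saved

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"saving:         {saving:.0%}")
```

Under these assumptions, the 4-bit weights of a 65B model take 32.5 GB, which is why the model fits on a single 48GB GPU with room left for adapters and activations.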
About this listing
This entry was compiled from publicly available data including Guanaco's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Guanaco unless explicitly stated.
Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.