Guanaco

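QLoRA-fine-tuned open-source chatbot family whose 65B model approached ChatGPT quality on the Vicuna benchmark after fine-tuning on a single 48GB GPU; the 33B model can be fine-tuned on a 24GB consumer GPU.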

LLMs & Chat · Open Source
RECATOOLS Score: 4 / 10
Capability: 4
Value for money: 5
Ease of use: 3
ASEAN readiness: 4
API quality
Founded: 2023
HQ: Seattle, Washington
Users: 100k+ downloads
Launched: May 2023
Developer: University of Washington

Overview

Guanaco is an open-source chatbot model created by Tim Dettmers and collaborators at the University of Washington using QLoRA (Quantized Low-Rank Adaptation), a technique that makes fine-tuning large language models possible on consumer hardware. The research paper and model were released in May 2023.

The key contribution of Guanaco is the QLoRA technique itself. The base model is quantised to 4-bit and frozen, and only small LoRA adapter layers are trained on top of it. That made it possible to fine-tune a 65B-parameter LLaMA model on a single 48GB GPU, and a 33B model on a consumer GPU with 24GB of VRAM, putting large-model fine-tuning within reach of teams without multi-GPU clusters.
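A rough arithmetic sketch of why 4-bit quantisation matters for these memory budgets. It estimates only the storage for the frozen base weights; the function name and the nominal parameter counts are illustrative assumptions, and real usage is higher once adapters, activations, optimiser state and quantisation constants are included.

```python
# Back-of-the-envelope estimate of base-model weight memory.
# Counts frozen weights only; adapters, activations, optimiser state and
# quantisation constants are NOT included, so real usage is higher.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Return memory in GB (10^9 bytes) needed to store n_params weights."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumed round figures, for illustration only).
MODEL_SIZES = {"7B": 7e9, "33B": 33e9, "65B": 65e9}


def summarise() -> dict:
    """Weight memory per model size at 16-bit and at 4-bit precision."""
    return {
        name: {
            "fp16_gb": weight_memory_gb(n, 16),
            "int4_gb": weight_memory_gb(n, 4),
        }
        for name, n in MODEL_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in summarise().items():
        print(f"{name}: 16-bit {row['fp16_gb']:.1f} GB, 4-bit {row['int4_gb']:.1f} GB")
```

Under these assumptions the 65B weights alone take about 130 GB at 16-bit but about 32.5 GB at 4-bit, which leaves headroom on a 48GB card; the 33B weights come to about 16.5 GB at 4-bit, under a 24GB budget.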

Guanaco 65B reached 99.3% of ChatGPT's score on the Vicuna benchmark in the authors' evaluation, after about 24 hours of fine-tuning on a single GPU. Newer models have since surpassed it, but QLoRA became one of the most widely used approaches to resource-efficient LLM fine-tuning and is supported by most open-source fine-tuning toolkits.

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 8 May 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Free: fully free

Use cases

  • Fine-tuning a large language model on a single consumer GPU using the QLoRA technique
  • Research into efficient LLM fine-tuning for academic papers
  • Creating domain-specific models from Llama with limited hardware resources

ASEAN Perspective

Guanaco in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Guanaco is a family of open research models fine-tuned with QLoRA, the technique that made it possible to fine-tune large models on a single GPU. Its historical importance is large: it demonstrated chat quality close to the best systems of its time from highly efficient training, and the QLoRA method it showcased is now standard practice. It suits researchers and ML engineers studying efficient fine-tuning, not end users.

This is a research artifact, not a product: there is no hosted service, no official API, no support, and the base models are now well behind current open weights. Capability and value should be read as historical and educational rather than as a tool you would deploy today. No localisation or ASEAN provisioning of any kind.

Independent AI-assisted assessment by RECATOOLS.

Notable facts

  • Guanaco 65B was fine-tuned in about 24 hours on a single 48GB GPU that costs roughly $2/hour to rent, showing that fine-tuning a then-competitive 65B chat model is within reach of independent researchers.
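  • According to the QLoRA paper, the technique cuts the memory needed to fine-tune a LLaMA 65B model from more than 780GB (regular 16-bit fine-tuning) to under 48GB. Storing the base weights in 4-bit instead of 16-bit accounts for a 75% reduction in weight memory on its own.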
  • The QLoRA paper is one of the most cited machine learning papers of 2023 and directly enabled consumer-friendly fine-tuning tools like Axolotl and LLaMA Factory.

Frequently asked questions

Is Guanaco free?
Yes. Open weights available on Hugging Face.
What is QLoRA?
Quantised Low-Rank Adaptation: a technique that fine-tunes a large model by keeping its base weights frozen in 4-bit precision and training only small LoRA adapters.
Can Guanaco be used commercially?
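The Guanaco models build on the original LLaMA weights, which Meta released under a non-commercial research licence, so treat the models as research-only. The QLoRA technique itself is openly published and can be applied to any base model whose licence permits your use.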
Is Guanaco still state-of-the-art?
As a model, no. As a demonstration of efficient fine-tuning methodology, it remains historically significant.
How does QLoRA differ from standard LoRA?
QLoRA quantises the frozen base model to 4-bit, dramatically reducing memory while training only the LoRA adapter weights at higher precision.
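
The difference described in these answers can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: it uses uniform blockwise absmax quantisation rather than the NF4 data type from the QLoRA paper, omits double quantisation and paged optimisers, and the names `quantize_4bit`, `dequantize_4bit` and `QLoRALinear` are hypothetical.

```python
import random


def quantize_4bit(weights, block_size=4):
    """Blockwise absmax quantisation to integer codes in [-7, 7].

    Simplification: uniform levels, not the NF4 data type from the paper.
    """
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) / 7 or 1.0  # avoid a zero scale
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales


def dequantize_4bit(codes, scales, block_size=4):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class QLoRALinear:
    """Frozen 4-bit base weight plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=2, alpha=4, block_size=4, seed=0):
        self.out_dim, self.in_dim = len(weight), len(weight[0])
        self.block_size = block_size
        flat = [w for row in weight for w in row]
        self.codes, self.scales = quantize_4bit(flat, block_size)  # frozen
        rng = random.Random(seed)
        # LoRA convention: A starts small and random, B starts at zero,
        # so the adapter initially leaves the base output unchanged.
        self.A = [[rng.gauss(0, 0.01) for _ in range(self.in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.out_dim)]
        self.scaling = alpha / rank

    def base_weight(self):
        """Dequantise the frozen weights back into a list-of-rows matrix."""
        flat = dequantize_4bit(self.codes, self.scales, self.block_size)
        return [flat[i * self.in_dim:(i + 1) * self.in_dim]
                for i in range(self.out_dim)]

    def forward(self, x):
        """Base output plus the scaled low-rank adapter output."""
        base = matvec(self.base_weight(), x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B would receive gradients; the 4-bit codes would not."""
        return len(self.A) * self.in_dim + self.out_dim * len(self.B[0])
```

In standard LoRA the frozen base weight would stay in 16-bit; here it is held as 4-bit codes plus one scale per block and dequantised only for the forward pass, while `A` and `B` stay in full precision. At this toy size the adapter is not smaller than the layer it wraps; the saving appears at realistic dimensions, where a rank-16 adapter on a 4096 × 4096 layer adds 16 × (4096 + 4096) = 131,072 trainable values next to roughly 16.8 million frozen ones.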

About this listing


This entry was compiled from publicly available data including Guanaco's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with Guanaco unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to Guanaco directly →

