DeepInfra

Open-model inference from $0.02 per million tokens.

API Inference Open Source Serverless

LLMs & Chat Paid Has API

Researched 3 Jun 2026, 23:48 SGT · Published 4 Jun 2026, 08:27 SGT · Reviewed 13 Jul 2026

Visit DeepInfra Compare alternatives

RECATOOLS Score

7.7 / 10

Capability

Value for money

Ease of use

ASEAN readiness

API quality

Founded

—

Users

—

Launched

—

Developer

—

Overview

A pay-per-token serverless cloud for open-weight LLMs, embeddings, speech and image models behind an OpenAI-compatible API, plus dedicated GPU instances. Built for cost-sensitive builders.

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 13 Jul 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

Serverless

From $0.02/M tokens

Pay-per-token across the model catalogue.

OpenAI-compatible API
$1 free signup credit
50+ open models
No monthly minimum

Batch

50% off per-token

Bulk jobs processed within 24 hours.

Up to 1,000 requests per call
Half the standard token price
Async delivery

Dedicated GPU

Custom

Reserved instances for high throughput.

Dedicated hardware
Predictable capacity
Volume pricing

What you can produce with DeepInfra

50+ open models via one OpenAI-compatible API
Per-token pricing from $0.02/M tokens
Batch endpoint: 1,000 requests, 50% off, within 24 hours
Dedicated GPU instances for high-throughput teams
Embeddings, speech-to-text and image generation endpoints
SOC 2, ISO 27001, GDPR and HIPAA compliance
$1 signup credit, no credit card required

ASEAN Perspective

DeepInfra in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

Cheap is the entire strategy, and DeepInfra commits to it: prices start around $0.02 per million tokens and rarely climb past $1.50, undercutting Together AI and most GPU clouds by roughly half on the same model. The catalogue is the other draw — 50+ open models across LLMs, embeddings, speech and image, which reviewers note is about triple what Groq or Cerebras offer. It's an OpenAI-compatible API, so integration is a base-URL swap, and a batch endpoint runs 1,000 requests at 50% off within 24 hours. The honest trade-off is speed: around 150 tokens/sec on a 70B model, fine for production but nowhere near the wafer-scale crowd. It carries SOC 2, ISO 27001, GDPR and HIPAA certifications, which matters for regulated teams. Best as a low-cost primary for throughput-tolerant workloads and research; look elsewhere if you need sub-100ms token latency.

Independent AI-assisted assessment by RECATOOLS.

What people say

Reviews and pricing trackers tell a consistent story: DeepInfra wins on cost and model breadth, loses on raw speed. Launched in 2022 with the stated goal of making open-source inference as cheap as possible, it now serves 50+ models — Llama 3.3 70B, DeepSeek V3/R1, Qwen2.5, plus embeddings, speech-to-text and image generation.

The headline is unit price. Trackers list Llama 3.3 70B at $0.35/M input on DeepInfra versus $0.59/M on Together AI, and DeepSeek V3 at $0.45/M versus $0.90/M. The floor runs to $0.02-$0.04/M for small models like Llama 3.2 3B. New accounts get $1 in free credits with no card required, and the batch endpoint discounts jobs 50% if you can wait up to 24 hours.

Model coverage is the second recurring praise point. One comparison pegged DeepInfra's catalogue at roughly 3x what Groq or Cerebras carry, framing it as the most versatile option for testing across model families without juggling providers.

The universal caveat is throughput. Reviewers measure around 150 tokens/sec on a 70B model — slower than Groq or Cerebras, but 5-10x cheaper. The verdict across sources is the same: for cost-sensitive production, batch pipelines, or research on a budget, it's hard to beat; for latency-critical chat or agents, pair it with a faster provider.

Compliance shows up as a differentiator for a low-cost shop: SOC 2, ISO 27001, GDPR and HIPAA certifications are documented, which is more than several cheaper competitors can claim. Beyond that, coverage is light on enterprise-scale reliability testimony — most write-ups are pricing comparisons rather than long-run production reviews, so teams should validate uptime for their own load. Documentation is described as solid if terse, and dedicated GPU instances are available for teams that outgrow serverless per-token billing.

Summary of public user & expert reviews, compiled by RECATOOLS.

About this listing

Researched on Wednesday, 3 June 2026 at 23:48 SGT (UTC+8)

Published on Thursday, 4 June 2026 at 08:27 SGT (UTC+8)

Last reviewed Monday, 13 July 2026 (1 week ago)

This entry was compiled from publicly available data including DeepInfra's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with DeepInfra unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to DeepInfra directly →

Spotted something out of date? Suggest an update →