OpenAI Realtime API

Speech-to-speech voice-agent API, billed per audio token

GPT-4o OpenAI Real-Time Voice

Video & Audio Paid Has API

Researched 20 May 2026, 08:00 SGT · Published 19 May 2026, 08:00 SGT · Reviewed 12 Jul 2026


RECATOOLS Score

7.8 / 10

Capability

Value for money

Ease of use

ASEAN readiness

API quality

Founded

2024

San Francisco, California, USA

Users

—

Launched

October 2024 (public beta)

Developer

OpenAI

Overview

The Realtime API streams audio directly in and out of GPT models over WebRTC or WebSocket, with native interruption handling and tool calls, so you skip stitching together separate STT, LLM and TTS services.
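Over a WebSocket, both directions of a Realtime session are JSON events rather than REST requests. The sketch below only builds three client events: a session configuration with one tool, a chunk of microphone audio, and a request for a response. It is a minimal sketch, assuming the GA event shapes (`session.update` with `session.type = "realtime"`, `input_audio_buffer.append`, `response.create`) as we understand them from OpenAI's reference; the `lookup_order` tool is hypothetical, the socket connection itself is not shown, and over WebRTC the microphone audio travels on a media track instead of through `input_audio_buffer.append`.

```python
import base64
import json

# Assumed GA-style Realtime client events; verify field names against
# OpenAI's current Realtime API reference before relying on them.

def session_update(instructions: str, tools: list[dict]) -> str:
    """Configure the live session: system instructions plus callable tools."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "type": "realtime",
            "instructions": instructions,
            "tools": tools,
        },
    })

def append_audio(pcm16_chunk: bytes) -> str:
    """Stream one chunk of raw audio (assumed 16-bit PCM, 24 kHz mono), base64-encoded."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })

def request_response() -> str:
    """Explicitly ask the model to respond (e.g. when server-side turn detection is off)."""
    return json.dumps({"type": "response.create"})

# Hypothetical tool, used only for illustration.
lookup_order = {
    "type": "function",
    "name": "lookup_order",
    "description": "Fetch an order's delivery status by order ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

if __name__ == "__main__":
    for message in (
        session_update("You are a concise support agent.", [lookup_order]),
        append_audio(b"\x00\x00" * 240),  # 240 silent samples = 10 ms at 24 kHz
        request_response(),
    ):
        print(message[:80])  # in a real client, send each string over the socket
```

Keeping event construction separate from the transport makes the same builders usable from any WebSocket library.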

Pricing

Pricing shown for reference only. These figures reflect RECATOOLS research as of 11 Jul 2026 and may be out of date or incomplete. This is not financial or purchasing advice — always confirm the current price on the provider’s official website before making any decision.

gpt-realtime-2.1-mini

$10/M audio in · $20/M audio out

Lower-cost model for simpler voice agents

$0.30/M cached audio input
Roughly 69% cheaper per audio token than the full model
Same WebRTC/WebSocket API

gpt-realtime-2.1

$32/M audio in · $64/M audio out

Full-quality model for production voice agents

$0.40/M cached audio input
Higher Big Bench Audio scores
Native tool calling + interruption handling
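Because both tiers bill the same three meters, a small calculator makes the listed rates comparable. This sketch hard-codes the per-million figures from the two pricing cards and treats `cached_tokens` as the portion of input tokens served from the prompt cache; the workload in the usage lines is made up, and text-token charges (instructions, tool outputs) are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioRates:
    """USD per million audio tokens, as listed on this page."""
    audio_in: float
    cached_in: float
    audio_out: float

MINI = AudioRates(audio_in=10.0, cached_in=0.30, audio_out=20.0)
FULL = AudioRates(audio_in=32.0, cached_in=0.40, audio_out=64.0)

def audio_cost(rates: AudioRates, in_tokens: int, cached_tokens: int, out_tokens: int) -> float:
    """Cost in USD; cached_tokens is the subset of in_tokens billed at the cached rate."""
    if not 0 <= cached_tokens <= in_tokens:
        raise ValueError("cached_tokens must be between 0 and in_tokens")
    uncached = in_tokens - cached_tokens
    return (uncached * rates.audio_in
            + cached_tokens * rates.cached_in
            + out_tokens * rates.audio_out) / 1_000_000

if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens (half of them cached) and 500k output tokens.
    for name, rates in (("mini", MINI), ("full", FULL)):
        print(name, round(audio_cost(rates, 1_000_000, 500_000, 500_000), 2))
```

At the listed rates, mini's uncached audio input and output cost 31.25% of the full model's, while the cached-input rates are much closer ($0.30 vs $0.40), so heavily cached workloads see a smaller gap.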

Use cases

Voice agents Real-time speech Voice apps

What you can produce with OpenAI Realtime API

Native speech-to-speech, no separate STT/TTS pipeline
Built-in interruption handling mid-response
Function/tool calling during a live voice session
WebRTC and WebSocket transport options
gpt-realtime-2.1-mini variant at roughly 31% of full audio-token pricing
Prompt caching cuts repeated audio-input cost by 97% to nearly 99% (from $32 to $0.40/M on the full model, from $10 to $0.30/M on mini)
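Interruption handling and mid-session tool calls both come down to reacting to server events. The class below sketches that loop, assuming the event names `input_audio_buffer.speech_started`, `response.output_item.done`, `conversation.item.truncate`, `conversation.item.create` and `response.create` as we understand them from OpenAI's Realtime reference. The tool registry, `playing_item_id` and `played_ms` are hypothetical client-side state that your playback code would keep updated. With server-side voice activity detection, the server is assumed to cancel the in-flight response itself, so this client only truncates the interrupted item to what the caller actually heard.

```python
import json

class VoiceSessionHandler:
    """Maps server events to client events for barge-in and tool calls (a sketch)."""

    def __init__(self, tools):
        self.tools = tools            # hypothetical: tool name -> callable(dict) -> dict
        self.playing_item_id = None   # assistant audio item currently playing (set by playback code)
        self.played_ms = 0            # milliseconds of that item the caller has heard

    def on_event(self, event: dict) -> list[dict]:
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started" and self.playing_item_id:
            # Caller barged in: trim the assistant item to what was actually heard,
            # so the model's context matches the conversation the caller experienced.
            out = [{
                "type": "conversation.item.truncate",
                "item_id": self.playing_item_id,
                "content_index": 0,
                "audio_end_ms": self.played_ms,
            }]
            self.playing_item_id = None
            return out
        if kind == "response.output_item.done":
            item = event.get("item", {})
            if item.get("type") == "function_call":
                args = json.loads(item.get("arguments") or "{}")
                result = self.tools[item["name"]](args)
                return [
                    {"type": "conversation.item.create",
                     "item": {"type": "function_call_output",
                              "call_id": item["call_id"],
                              "output": json.dumps(result)}},
                    {"type": "response.create"},  # let the model speak the tool result
                ]
        return []

if __name__ == "__main__":
    handler = VoiceSessionHandler({"lookup_order": lambda args: {"status": "shipped"}})
    event = {"type": "response.output_item.done",
             "item": {"type": "function_call", "name": "lookup_order",
                      "call_id": "call_1", "arguments": '{"order_id": "A17"}'}}
    for message in handler.on_event(event):
        print(json.dumps(message))  # a real client would send these over the socket
```

Returning events instead of sending them keeps the handler testable without a live connection.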

ASEAN Perspective

OpenAI Realtime API in Southeast Asia

ASEAN-region availability and pricing notes coming soon. Drop the editorial team a note via /contact/ if you can supply local context (Singapore/Malaysia/Indonesia/Thailand/Vietnam).

RECATOOLS Verdict

OpenAI's Realtime API handles voice-to-voice natively instead of chaining speech-to-text, a text model and text-to-speech together, the pipeline that killed the responsiveness of most 2023-era voice bots. gpt-realtime-2.1 scores meaningfully higher than its predecessor on audio-intelligence and instruction-following benchmarks, and OpenAI reports thousands of developers have shipped on it since the October 2024 beta.

The bill is the catch: audio tokens cost far more than text tokens, and depending on caching and tool-output size, real deployments land anywhere from $0.05 to $0.46 per minute, a range wide enough that you need to measure your own traffic before committing. The WebSocket/WebRTC programming model is heavier than a REST call, you're locked into OpenAI's voices, and as of May 2026 the audio modality still isn't covered under OpenAI's or Microsoft's HIPAA BAA, which rules out most healthcare voice agents. If full quality isn't required, the gpt-realtime-2.1-mini variant cuts audio-token rates by roughly 69% ($10/$20 versus $32/$64 per million tokens in and out).

Independent AI-assisted assessment by RECATOOLS.
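Much of the spread in per-minute cost comes from each new response re-reading the conversation so far as input, so uncached sessions get more expensive per turn as they run. The function below is a simplified model of that effect, not OpenAI's billing logic: it assumes roughly 10 audio tokens per second of caller speech and 20 per second of agent speech (hypothetical defaults; measure your own), re-bills the agent's earlier replies as input at the same token count, and ignores text tokens. Large tool outputs grow the re-read history in the same way.

```python
def session_cost(turns, user_sec, agent_sec, rate_in, rate_cached, rate_out,
                 in_tok_per_sec=10.0, out_tok_per_sec=20.0, cache_hit=0.0):
    """Rough USD cost of a voice session in which every response re-reads the history.

    Rates are USD per million audio tokens. cache_hit is the fraction of the
    previously seen history billed at the cached rate (0.0 = no caching).
    The tokens-per-second defaults are assumptions, not published figures.
    """
    history = 0.0  # audio tokens already in the conversation before this turn
    total = 0.0
    for _ in range(turns):
        new_user = user_sec * in_tok_per_sec    # caller's fresh audio this turn
        reply = agent_sec * out_tok_per_sec     # agent's spoken answer (output tokens)
        cached = history * cache_hit
        uncached = (history - cached) + new_user
        total += (uncached * rate_in + cached * rate_cached + reply * rate_out) / 1_000_000
        # Simplification: the reply re-enters the context at the same token count.
        history += new_user + reply
    return total

if __name__ == "__main__":
    turns, user_sec, agent_sec = 20, 6, 8       # hypothetical 20-turn call
    minutes = turns * (user_sec + agent_sec) / 60
    for hit in (0.0, 0.9):
        usd = session_cost(turns, user_sec, agent_sec, 32.0, 0.40, 64.0, cache_hit=hit)
        print(f"cache_hit={hit}: ${usd / minutes:.3f}/min")
```

With those made-up inputs at the full model's listed rates, the uncached session works out to roughly $0.34 per minute and the 90%-cached one to roughly $0.08, both within the $0.05 to $0.46 range quoted above. Input cost grows roughly quadratically with turn count when nothing is cached, which is why long calls benefit most from caching.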

What people say

OpenAI shipped the Realtime API into public beta in October 2024, and by its own account thousands of developers have built voice agents on it since. Its pitch, streaming audio straight in and out rather than bolting together a transcription model, an LLM and a TTS engine, was one of the first things that made real-time voice agents feel buildable outside a big speech-tech team. gpt-realtime-2.1 (July 2026) is the current model. OpenAI's own benchmarks put GPT-Realtime-2 (high) 15.2% ahead of GPT-Realtime-1.5 on Big Bench Audio for audio intelligence, with the xhigh variant scoring 13.8% higher on Audio MultiChallenge for instruction-following.

Cost is where most of the developer commentary lands. A HackerNoon analysis of roughly 4,000 measured sessions found uncached agents running $0.18-$0.46/minute, dropping to $0.05-$0.10/minute once prompt caching and trimmed tool outputs were in place. A real-estate tour assistant case study in the same piece landed around $0.069/minute with the AI talking 25-27 seconds per minute. That spread is wide enough that teams need to measure their own traffic pattern rather than trust a single published rate. Audio output tokens are priced at double the audio input rate ($64 vs $32 per million on the full model, $20 vs $10 on mini), so verbose agents cost more than terse ones by design.

Quality complaints exist alongside the benchmark gains. One widely shared build log (Towards AI, May 2026) described the stock voices as sounding "robotic" for a production use case and documented switching to a hybrid stack to fix it, a reminder that benchmark scores and how a voice actually sounds to a caller aren't the same thing. Reviewers also flag that the model is only one piece of a production voice agent: carrier/telephony integration, the tool layer, observability and compliance remain entirely the developer's problem. As of May 2026 the audio modality isn't covered by OpenAI's or Azure's standard HIPAA Business Associate Agreement, which closes off healthcare use cases without a workaround. For outbound calling specifically, where the person didn't ask for the call, reviewers note the bar for sounding natural is unforgiving in a way inbound assistants don't face.

Summary of public user & expert reviews, compiled by RECATOOLS.
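Measuring your own traffic can start from the usage totals the API reports per response. This sketch converts a list of sessions into per-minute audio costs and summarizes the spread. It assumes each usage dict mirrors the `usage` object on `response.done` events, with `input_token_details.audio_tokens` including the cached audio tokens reported under `cached_tokens_details`; those field names and that inclusion rule are assumptions to verify against the current API reference, and text tokens are ignored.

```python
from statistics import median

# Assumed shape of the `usage` object on `response.done` events (verify before use):
# {"input_token_details": {"audio_tokens": N, "cached_tokens_details": {"audio_tokens": C}},
#  "output_token_details": {"audio_tokens": M}}

def session_audio_cost(usages, rate_in, rate_cached, rate_out):
    """USD for one session's audio tokens; rates are USD per million tokens."""
    usd = 0.0
    for u in usages:
        inp = u.get("input_token_details", {})
        audio_in = inp.get("audio_tokens", 0)
        cached = inp.get("cached_tokens_details", {}).get("audio_tokens", 0)
        audio_out = u.get("output_token_details", {}).get("audio_tokens", 0)
        usd += ((audio_in - cached) * rate_in
                + cached * rate_cached
                + audio_out * rate_out) / 1_000_000
    return usd

def per_minute_costs(sessions, rate_in, rate_cached, rate_out):
    """sessions: iterable of (duration_seconds, list_of_usage_dicts)."""
    return [session_audio_cost(usages, rate_in, rate_cached, rate_out) / (seconds / 60)
            for seconds, usages in sessions]

if __name__ == "__main__":
    demo = [  # two made-up sessions, priced at the full model's listed rates
        (60, [{"input_token_details": {"audio_tokens": 1000},
               "output_token_details": {"audio_tokens": 500}}]),
        (120, [{"input_token_details": {"audio_tokens": 2000,
                                        "cached_tokens_details": {"audio_tokens": 1500}},
                "output_token_details": {"audio_tokens": 1000}}]),
    ]
    costs = per_minute_costs(demo, 32.0, 0.40, 64.0)
    print(f"min ${min(costs):.4f}  median ${median(costs):.4f}  max ${max(costs):.4f} per minute")
```

Reporting the minimum, median and maximum rather than a single average keeps the kind of spread described above visible in your own numbers.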

About this listing

Researched on Wednesday, 20 May 2026 at 08:00 SGT (UTC+8)

Published on Tuesday, 19 May 2026 at 08:00 SGT (UTC+8)

Last reviewed Sunday, 12 July 2026

This entry was compiled from publicly available data including OpenAI Realtime API's official website, press releases, documentation, and reputable third-party publications. RECATOOLS is not affiliated with OpenAI Realtime API unless explicitly stated.

Data accuracy

Third-party AI tools update their pricing, features, availability, and policies frequently. Information here may be outdated by the time you read this — we make reasonable efforts to keep listings current, but cannot guarantee absolute accuracy.

For the latest details, please refer to OpenAI Realtime API directly.
