What’s It About?
OpenAI has announced three new audio models for its Realtime API, each targeting a different aspect of speech processing. GPT-Realtime-2 enables natural live conversations with reasoning capabilities, GPT-Realtime-Translate translates speech from more than 70 input languages into 13 output languages, and GPT-Realtime-Whisper handles instant transcription of audio to text. All three models are now available to developers via the API.
Background & Context
With GPT-Realtime-2, OpenAI presents what it describes as the first speech model with GPT-5 reasoning capabilities. The model is optimized for live conversations and can process complex requests, communicate without interruptions, and respond to objections or follow-up questions. The technological foundation allows conversations to flow significantly more naturally than with previous voice assistants.
The translation model GPT-Realtime-Translate aims to overcome language barriers in real time while preserving the natural flow of speech during translation. GPT-Realtime-Whisper builds on the established Whisper technology and, thanks to its streaming architecture, offers particularly low transcription latencies. Notably for European developers, the Realtime API supports data processing within the EU.
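Since the new models plug into the existing Realtime API, connecting to one looks much like today's Realtime sessions: a WebSocket handshake followed by a `session.update` event. The sketch below shows how such a connection might be prepared; the endpoint and event shape follow OpenAI's current Realtime API, while the model identifier `gpt-realtime-2` and the exact session fields for the new models are assumptions based on the announcement.

```python
import json
import os

# Hypothetical setup for a GPT-Realtime-2 session. The WebSocket URL and
# the "session.update" event follow the existing Realtime API; the model
# id and session fields for the newly announced models are assumed.

REALTIME_URL = "wss://api.openai.com/v1/realtime"

def build_connection(model: str) -> dict:
    """Assemble the URL, auth headers, and initial session.update event."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        "OpenAI-Beta": "realtime=v1",
    }
    # session.update configures modalities and audio formats for the call
    session_event = {
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    }
    return {"url": url, "headers": headers, "event": json.dumps(session_event)}

conn = build_connection("gpt-realtime-2")
print(conn["url"])
```

Actually opening the socket would then use any WebSocket client (e.g. the `websockets` package) with these headers, sending the event as the first message.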
Pricing differs by model: GPT-Realtime-2 is billed per audio token at $32 per million input tokens and $64 per million output tokens. The other two models are billed on a time basis — GPT-Realtime-Translate at $0.034 per minute and GPT-Realtime-Whisper at $0.017 per minute.
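The split between token-based and time-based billing makes cost estimates model-specific. The small helper below works through both schemes using the published rates; the token counts and minutes in the example are illustrative inputs, not figures from the announcement.

```python
# Cost estimate using the published rates: GPT-Realtime-2 is billed per
# audio token, the other two models per minute. Example inputs are
# illustrative, not taken from the announcement.

RATES = {
    "gpt-realtime-2": {"input_per_m": 32.00, "output_per_m": 64.00},  # $ per 1M audio tokens
    "gpt-realtime-translate": {"per_minute": 0.034},
    "gpt-realtime-whisper": {"per_minute": 0.017},
}

def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-based billing: separate rates for input and output tokens."""
    r = RATES["gpt-realtime-2"]
    return input_tokens / 1e6 * r["input_per_m"] + output_tokens / 1e6 * r["output_per_m"]

def minute_cost(model: str, minutes: float) -> float:
    """Time-based billing for the translate and whisper models."""
    return minutes * RATES[model]["per_minute"]

# A 500k-input / 250k-output token conversation vs. an hour of translation
print(round(realtime2_cost(500_000, 250_000), 2))           # 0.5*32 + 0.25*64 = 32.0
print(round(minute_cost("gpt-realtime-translate", 60), 3))  # 60 * 0.034 = 2.04
```

The asymmetric input/output rates mean that talkative applications (long model responses) cost disproportionately more on GPT-Realtime-2 than listening-heavy ones.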
What Does This Mean?
- Developers gain tools for demanding voice applications in areas such as customer support, education, and healthcare, where real-time communication is critical.
- The combination of reasoning capabilities and natural language processing could significantly improve the quality of AI-powered hotlines and virtual assistants.
- With real-time translation across more than 70 languages, multilingual applications become more practical for international teams and projects.
- Live transcription enables new use cases such as automatic subtitling or the creation of meeting minutes during calls.
- EU data residency addresses data protection requirements of European companies and could increase acceptance in this market.
Sources
- OpenAI: New Audio Models for Real-Time AI Support (Heise)
- OpenAI Has New Voice Models That Reason, Translate, and Transcribe as You Speak (9to5Mac)
- GPT-Realtime-2: OpenAI’s New Voice Model (DataCamp)
- Advancing Voice Intelligence with New Models in the API (OpenAI)
- Audio API Documentation (OpenAI Developers)
This article was created with AI assistance and is based on the cited sources and the language model’s training data.
Further Reading: From Text Generator to Digital Employee: How AI Is Changing the World in Four Stages
