OpenAI has introduced three new audio models in its API that enable developers to build a new class of voice applications. These models are designed to make voice interactions more natural, context-aware, and capable of taking action in real time.
The three models—GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper—move voice systems beyond simple call-and-response into continuous, agent-like interactions that can listen, reason, translate, transcribe, and act during conversations.
New realtime audio models
GPT-Realtime-2
GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning, designed for live conversational use cases. It supports complex interactions in which the model can think, respond, and use tools while the conversation continues.
It is built for situations where responses, actions, and reasoning must happen together without interrupting the flow of speech.
Key capabilities
- Handles complex, multi-step voice requests in real time
- Maintains continuous conversational flow with contextual reasoning
- Uses tools during live conversations without breaking interaction
- Supports spoken preambles such as “let me check that” or “one moment while I look into it”
- Executes parallel tool calls with audible transparency (e.g., “checking your calendar”)
- Improves recovery behavior with natural fallback responses instead of silent failure
- Expands the context window from 32K to 128K tokens for longer sessions
- Better handling of domain-specific vocabulary, proper nouns, and technical terms
- Supports adjustable tone (calm, empathetic, or upbeat based on context)
- Adjustable reasoning levels (minimal, low, medium, high, xhigh), with low as the default; see the configuration sketch after this list
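The post does not publish the wire format for these controls, so the following is a minimal sketch that assumes GPT-Realtime-2 is reachable through the openai Python SDK's existing Realtime interface. The model id string, the check_calendar tool, and the "reasoning" session field are illustrative placeholders rather than confirmed parameter names.

```python
import asyncio

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    async with client.beta.realtime.connect(model="gpt-realtime-2") as conn:  # hypothetical id
        await conn.session.update(session={
            "modalities": ["audio", "text"],
            "instructions": "Say a short preamble like 'let me check that' before using a tool.",
            "tools": [{
                "type": "function",
                "name": "check_calendar",  # hypothetical tool
                "description": "Look up the user's calendar for a given day.",
                "parameters": {
                    "type": "object",
                    "properties": {"date": {"type": "string"}},
                    "required": ["date"],
                },
            }],
            "reasoning": {"effort": "low"},  # assumed field name; levels per the post, low is default
        })
        # ...stream microphone audio and consume server events here...


if __name__ == "__main__":
    asyncio.run(main())
```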
Performance improvements
- +15.2% on Big Bench Audio (audio intelligence) compared with GPT-Realtime-1.5, at the high reasoning setting
- +13.8% on Audio MultiChallenge (instruction following) at the xhigh reasoning setting
GPT-Realtime-Translate
GPT-Realtime-Translate enables real-time multilingual voice communication where speech is translated instantly while preserving meaning and pacing. It also supports live transcription alongside translation.
It is designed to maintain accuracy even in natural speech conditions such as interruptions, accent variations, or context switching.
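The announcement does not document the parameter names for translation sessions, so the snippet below is only a sketch of how a session might be configured in the Realtime API's session.update style; the "translation" block, the language code, and the model ids are assumptions.

```python
# Illustrative only: field names below are placeholders, not published parameters.
translate_session = {
    "modalities": ["audio", "text"],           # translated speech plus a live transcript
    "translation": {"target_language": "de"},  # assumed field; one of the 13 output languages
    "input_audio_transcription": {"model": "gpt-realtime-whisper"},  # hypothetical id, keeps a source transcript
}
# For example: await conn.session.update(session=translate_session) on a
# connection opened with model="gpt-realtime-translate" (also hypothetical).
```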
Key capabilities
- Supports 70+ input languages
- Outputs in 13 languages
- Real-time speech translation with preserved meaning and timing
- Live transcription alongside translated output
- Handles accents, regional pronunciation, and domain-specific language
- Maintains fluency during natural or interrupted speech
Use cases
- Customer support across languages
- Education and classrooms
- Cross-border communication and sales
- Media, events, and creator platforms
For example, Deutsche Telekom is testing real-time multilingual voice interactions in which users speak different languages and the system translates the conversation instantly, with low latency.
GPT-Realtime-Whisper
GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. It converts spoken audio into text as it is being spoken, enabling real-time understanding and interaction.
It supports continuous transcription, making voice data usable immediately in workflows.
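If GPT-Realtime-Whisper reuses the input-audio transcription events that the Realtime API already defines, consuming captions could look like the sketch below; whether this model emits these exact event names is an assumption.

```python
# Sketch: render live captions from a streaming transcription session.
# "conn" is an async-iterable Realtime connection (e.g. from the openai SDK).
async def print_captions(conn) -> None:
    async for event in conn:
        if event.type == "conversation.item.input_audio_transcription.delta":
            print(event.delta, end="", flush=True)  # low-latency partial text
        elif event.type == "conversation.item.input_audio_transcription.completed":
            print()                                 # finalize the caption line
```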
Key capabilities
- Real-time transcription during speech
- Low-latency streaming captions
- Continuous understanding of live conversations
- Designed for responsive voice applications
Use cases
- Meeting captions and notes
- Classrooms and education tools
- Live broadcasts and events
- Customer support workflows
- Healthcare, recruiting, and sales systems
- Real-time summaries and follow-up generation during conversations
Voice as a software interface
OpenAI highlights voice as one of the most natural ways to interact with software. It allows users to complete tasks without typing, such as getting help while driving, changing travel plans on the move, or receiving support in their preferred language.
However, effective voice systems require more than fast responses. A capable voice agent must:
- Understand intent and maintain context
- Adapt to changing requests during conversation
- Use tools while continuing dialogue
- Recover smoothly from interruptions or failures
- Respond appropriately based on tone and situation
Together, these models move voice AI beyond simple back-and-forth interaction toward systems that complete tasks in real time while the conversation is still ongoing.
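As an illustration of the tool-use and recovery points above, the sketch below keeps the dialogue moving while a tool runs and falls back to a spoken explanation instead of silence when the tool fails. It follows today's Realtime API conventions (function_call_output items plus response.create); the run_tool dispatcher and the exact payloads are assumptions, not the published format for these models.

```python
import json


async def run_tool(name: str, args: dict) -> dict:
    """Hypothetical dispatcher for your own backend tools."""
    if name == "check_calendar":
        return {"events": []}  # placeholder result
    raise ValueError(f"unknown tool: {name}")


async def handle_tool_call(conn, call_id: str, name: str, arguments: str) -> None:
    try:
        output = json.dumps(await run_tool(name, json.loads(arguments)))
    except Exception as exc:
        # Recovery behavior: give the model something natural to say instead of failing silently.
        output = json.dumps({"error": str(exc), "say": "I couldn't reach that system just now."})

    await conn.conversation.item.create(item={
        "type": "function_call_output",
        "call_id": call_id,
        "output": output,
    })
    # Ask for a new spoken response that reports the result or the fallback.
    await conn.response.create()
```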
Emerging voice AI patterns
OpenAI identifies three key patterns shaping voice applications:
Voice-to-action
Users describe tasks, and the system executes them using reasoning and tools.
- Example: Zillow-style assistants that can find homes, apply constraints, and schedule tours.
Systems-to-voice
Applications turn real-time context into spoken guidance.
- Example: Travel systems that provide updates on flight delays, gate changes, fastest routes, and baggage status.
Voice-to-voice
AI enables real-time multilingual conversations across users and contexts.
- Example: Deutsche Telekom-style systems that translate speech live during conversations.
These patterns can also combine. Priceline is working toward full trip management through voice, including flight search, hotel changes, delay handling, TSA updates, and translation during travel.
Safety and safeguards
The Realtime API includes multiple layers of safety and compliance protections:
- Active classifiers monitor live sessions and can stop conversations that violate safety rules
- Developers can add additional safeguards using the Agents SDK (a simple custom check is sketched after this list)
- Policies prohibit spam, deception, and harmful redistribution of outputs
- Developers must disclose when users are interacting with AI unless it is already obvious from context
- Supports EU Data Residency for regional compliance needs
- Covered under enterprise privacy commitments
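Independent of the Agents SDK, one simple extra safeguard is to run the text transcript of each user turn through the Moderations endpoint before acting on it; this is an application-level pattern of our own, not a feature of the Realtime API itself.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def turn_is_allowed(transcript: str) -> bool:
    """Return False if a user turn should end or redirect the voice session."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=transcript,
    )
    return not result.results[0].flagged
```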
Pricing and availability
All three models are available in the Realtime API (a quick cost sketch follows the price list):
GPT-Realtime-2
- $32 per 1M audio input tokens
- $0.40 per 1M cached input tokens
- $64 per 1M audio output tokens
GPT-Realtime-Translate
- $0.034 per minute
GPT-Realtime-Whisper
- $0.017 per minute
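Putting those prices into a back-of-the-envelope estimate for a 10-minute session: the per-minute models are straightforward, while GPT-Realtime-2 depends on audio token usage, so the tokens-per-minute figures below are made-up placeholders to be replaced with measured values.

```python
MINUTES = 10

translate_cost = MINUTES * 0.034   # GPT-Realtime-Translate: $0.34
whisper_cost = MINUTES * 0.017     # GPT-Realtime-Whisper: $0.17

INPUT_TOKENS_PER_MIN = 800         # hypothetical; measure your own sessions
OUTPUT_TOKENS_PER_MIN = 600        # hypothetical
realtime2_cost = MINUTES * (
    INPUT_TOKENS_PER_MIN / 1_000_000 * 32     # $32 per 1M audio input tokens
    + OUTPUT_TOKENS_PER_MIN / 1_000_000 * 64  # $64 per 1M audio output tokens
)

print(f"translate=${translate_cost:.2f} whisper=${whisper_cost:.2f} realtime2=${realtime2_cost:.2f}")
```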
Getting started
Developers can test the models in the OpenAI Playground. They can also integrate GPT-Realtime-2 into applications using Codex or start building new realtime voice applications from scratch.
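A minimal first exchange, assuming the new models are exposed through the same SDK surface as today's Realtime API (beta.realtime.connect, conversation.item.create, response.create), could look like this; the model id remains a placeholder.

```python
import asyncio

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI()
    async with client.beta.realtime.connect(model="gpt-realtime-2") as conn:  # hypothetical id
        await conn.conversation.item.create(item={
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Introduce yourself in one sentence."}],
        })
        await conn.response.create(response={"modalities": ["audio", "text"]})
        async for event in conn:
            if event.type == "response.done":
                break  # a real app would also play the streamed audio deltas


if __name__ == "__main__":
    asyncio.run(main())
```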