Tamil G in AI, Google, News

Google rolls out Gemini 3.1 Flash TTS with better speech quality and developer controls

Google has introduced Gemini 3.1 Flash TTS, a new text-to-speech model focused on improving speech quality, controllability, and scalability. The model is part of the Gemini 3.1 Flash Audio family and is designed for developers, enterprises, and general users building AI-based speech applications.

Gemini 3.1 Flash Audio belongs to the Gemini series of multimodal models, supporting audio alongside other modalities such as text, images, and video.

Gemini 3.1 Flash TTS

Gemini 3.1 Flash TTS is based on Gemini 3 Pro and is designed specifically for generating speech from text inputs.

Token limits for the Gemini 3.1 Flash Audio models:

Gemini 3.1 Flash TTS:
- Input: Text up to 16K tokens
- Output: Audio up to 32K tokens
Gemini 3.1 Flash Live:
- Inputs: Audio, images, video, and text up to 128K tokens
- Outputs: Audio and text up to 64K tokens

These configurations support standalone text-to-speech generation with Flash TTS and multimodal interactions with the Flash Live variant.

Key features

Improved speech quality: Generates more natural and expressive speech output; achieved an Elo score of 1,211 on the Artificial Analysis TTS leaderboard
Cost and performance balance: Positioned in Artificial Analysis’ “most attractive quadrant,” indicating a balance between speech quality and cost
Audio tags for control: Allows users to guide tone, pacing, and delivery using natural language instructions embedded within text
Multi-speaker support: Supports dialogue between multiple speakers with distinct voice characteristics
Scene direction: Lets users define the scene context and interaction style so that characters behave consistently
Speaker-level controls: Assigns audio profiles to individual speakers and adjusts their tone, accent, and pace
Inline expression changes: Allows voice style adjustments within a sentence using embedded tags
Developer control tools: Provides configurable controls in Google AI Studio for fine-grained direction of speech output
Seamless API export: Allows configured voice settings to be exported as Gemini API code for reuse across applications
Multilingual support: Supports speech generation across more than 70 languages with localized control
Global scalability: Designed for deployment across different regions and use cases

Early developers and enterprise users report improved controllability and more expressive output, particularly with audio tags.
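As a rough illustration of how scene direction, audio tags, and multi-speaker dialogue might come together in a single text prompt, the Python sketch below composes such a script. The "Name: text" speaker format, the bracketed [tag] syntax, and the helper names (Line, build_transcript, speakers) are assumptions made for illustration, not the documented Gemini 3.1 Flash TTS prompt format, and the actual Gemini API request is not shown.

```python
# Hypothetical sketch: compose a multi-speaker TTS prompt with scene direction
# and inline audio tags. ASSUMPTIONS (not confirmed by Google): speakers are
# written as "Name: text" lines and delivery cues as "[tag]" markers.
from dataclasses import dataclass, field


@dataclass
class Line:
    speaker: str  # speaker name, later mapped to a voice (hypothetical)
    text: str     # words to be spoken
    tags: list[str] = field(default_factory=list)  # delivery cues, e.g. ["whispers"]


def build_transcript(scene: str, lines: list[Line]) -> str:
    """Join an optional scene direction and one "Speaker: [tags] text" line per turn."""
    out = [f"Scene: {scene}"] if scene else []
    for line in lines:
        # Render each cue as a bracketed tag placed before the spoken text.
        prefix = " ".join(f"[{tag}]" for tag in line.tags)
        body = f"{prefix} {line.text}" if prefix else line.text
        out.append(f"{line.speaker}: {body}")
    return "\n".join(out)


def speakers(lines: list[Line]) -> list[str]:
    """Return distinct speaker names in order of first appearance."""
    seen: list[str] = []
    for line in lines:
        if line.speaker not in seen:
            seen.append(line.speaker)
    return seen
```

With script = [Line("Ana", "Did you hear that?", ["whispers"]), Line("Ben", "Just the wind.", ["calm"])], build_transcript("A dark hallway at night.", script) yields a scene line followed by "Ana: [whispers] Did you hear that?" and "Ben: [calm] Just the wind.", while speakers(script) returns ["Ana", "Ben"], the names a developer would then map to distinct voices.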

Safety

Audio generated using Gemini 3.1 Flash TTS includes SynthID watermarking, an embedded identifier that enables detection of AI-generated content and supports measures against misuse.

Availability

Gemini 3.1 Flash TTS is rolling out in preview:

Developers: Available via Gemini API and Google AI Studio
Enterprises: Available on Vertex AI
Workspace users: Available through Google Vids
