Google has released Gemini Embedding 2, a multimodal embedding model built on the Gemini architecture.
The model expands beyond earlier text-only embedding systems by mapping text, images, videos, audio, and documents into a single unified embedding space. It captures semantic meaning across more than 100 languages and supports AI tasks such as Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis, and data clustering.
Gemini Embedding 2
Gemini Embedding 2 uses the multimodal capabilities of the Gemini architecture to generate embeddings from different types of data.
The model supports interleaved multimodal inputs, allowing developers to combine inputs such as text and images in a single request. This enables the system to capture relationships between different media types and process datasets that contain multiple formats.
Key features
Multimodal input support
- Text: Supports up to 8,192 input tokens
- Images: Processes up to six images per request, supporting PNG and JPEG formats
- Videos: Supports video input of up to 120 seconds in MP4 and MOV formats
- Audio: Directly processes audio without requiring transcription
- Documents: Embeds PDF files of up to six pages
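As a rough illustration of the limits listed above, the following sketch checks a request description before it is sent. The `EmbeddingRequest` class and the `check_limits` function are hypothetical helpers, not part of any Google SDK. The sketch also assumes each limit applies independently per modality, which the announcement does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits copied from the feature list; the constant names are illustrative only.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_FORMATS = {"png", "jpeg", "jpg"}  # "jpg" is a common extension for JPEG
VIDEO_FORMATS = {"mp4", "mov"}


@dataclass
class EmbeddingRequest:
    """Hypothetical summary of what one request contains."""
    text_tokens: int = 0
    image_formats: List[str] = field(default_factory=list)  # one entry per image
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_limits(req: EmbeddingRequest) -> List[str]:
    """Return human-readable limit violations; an empty list means none found."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens (max {MAX_TEXT_TOKENS})")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"{len(req.image_formats)} images (max {MAX_IMAGES})")
    unsupported = sorted({f.lower() for f in req.image_formats} - IMAGE_FORMATS)
    if unsupported:
        problems.append(f"unsupported image formats: {', '.join(unsupported)}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video is {req.video_seconds}s (max {MAX_VIDEO_SECONDS}s)")
    if req.video_format is not None and req.video_format.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported video format: {req.video_format}")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages (max {MAX_PDF_PAGES})")
    return problems
```

Calling `check_limits(EmbeddingRequest(text_tokens=9000))` would return a single message about the token count, which a client could surface before making a network call.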
Interleaved multimodal inputs
The model accepts several media types in a single request, for example an image together with its caption, so the embedding can reflect how the inputs relate to one another.
Matryoshka Representation Learning (MRL)
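Gemini Embedding 2 incorporates Matryoshka Representation Learning, a training technique that concentrates the most important information in the leading dimensions of each vector, so an embedding can be shortened and still remain usable. The default output size is 3,072 dimensions, and developers can choose a smaller size to reduce storage and speed up search, typically at some cost in accuracy.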
Recommended output dimensions:
- 3,072
- 1,536
- 768
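To make the trade-off concrete, here is a minimal sketch of the general Matryoshka idea: keep the first `dim` values of a vector and rescale the result to unit length. It is a generic client-side technique, not a description of how the Gemini API shortens vectors internally. The function names are hypothetical, and the storage estimate assumes 4-byte float32 values.

```python
import math
from typing import List, Sequence

RECOMMENDED_DIMS = (3072, 1536, 768)  # sizes listed above


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the leading `dim` values and rescale them to unit L2 norm."""
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def storage_bytes(dim: int, n_vectors: int, bytes_per_value: int = 4) -> int:
    """Raw storage for n_vectors embeddings (assumes float32 by default)."""
    return dim * n_vectors * bytes_per_value


if __name__ == "__main__":
    for d in RECOMMENDED_DIMS:
        gb = storage_bytes(d, 1_000_000) / 1e9
        print(f"{d} dims x 1M vectors = {gb:.2f} GB")
```

Under these assumptions, one million vectors take roughly 12.29 GB at 3,072 dimensions and 3.07 GB at 768, a fourfold reduction. Renormalizing after truncation keeps cosine and dot-product scores comparable across vectors of the same reduced size.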
Model capabilities
According to Google, the model supports embedding tasks across text, image, video, and speech, and adds native audio processing.
Supported use cases
- Retrieval-Augmented Generation (RAG)
- Semantic search
- Sentiment analysis
- Data clustering
- Large-scale data management
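Several of these use cases, semantic search and the retrieval step of RAG in particular, come down to ranking stored embeddings by similarity to a query embedding. The sketch below shows that ranking with cosine similarity. The three-dimensional vectors and file names are made-up toy values chosen for readability, not real model output, and `top_k` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          corpus: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k corpus items most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in corpus.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Toy embeddings: because all media share one space, files of different
# types can sit in the same index and be ranked against the same query.
corpus = {
    "photo_of_cat.png": [0.9, 0.1, 0.0],
    "dog_bark.mp3": [0.1, 0.9, 0.1],
    "cat_care_guide.pdf": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a text query
results = top_k(query, corpus, k=2)
```

With these toy values, `results` lists the cat photo first and the cat care guide second. A production system would typically hand this ranking to a vector database rather than scanning every item.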
Availability
Gemini Embedding 2 is available in Public Preview through the Gemini API and Vertex AI. Developers can access the model through integrations with frameworks and vector database tools including:
- LangChain
- LlamaIndex
- Haystack
- Weaviate
- Qdrant
- ChromaDB
The model can also be used with vector search systems for multimodal data processing.