Google rolls out Gemini Embedding 2 for multimodal AI applications

Google has released Gemini Embedding 2, a multimodal embedding model built on the Gemini architecture.

The model expands beyond earlier text-only embedding systems by mapping text, images, videos, audio, and documents into a single unified embedding space. It captures semantic meaning across more than 100 languages and supports AI tasks such as Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis, and data clustering.

Gemini Embedding 2

Gemini Embedding 2 uses the multimodal capabilities of the Gemini architecture to generate embeddings from different types of data.

The model supports interleaved multimodal inputs, allowing developers to combine inputs such as text and images in a single request. This enables the system to capture relationships between different media types and process datasets that contain multiple formats.

Key features

Multimodal input support

  • Text: Supports up to 8,192 input tokens
  • Images: Processes up to six images per request, supporting PNG and JPEG formats
  • Videos: Supports video input of up to 120 seconds in MP4 and MOV formats
  • Audio: Directly processes audio without requiring transcription
  • Documents: Supports embedding PDF files up to six pages
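The per-modality limits above can be expressed as a small pre-flight check before sending a request. This is an illustrative sketch, not part of any official SDK; the limit values come from the list above, while the function name and argument shape are assumptions.

```python
# Illustrative validator for the per-modality limits listed above.
# The numeric limits come from the article; the helper itself is a
# sketch, not part of any official Gemini SDK.

LIMITS = {
    "text_tokens": 8192,    # max input tokens per text part
    "images": 6,            # max images per request (PNG/JPEG)
    "video_seconds": 120,   # max video length (MP4/MOV)
    "pdf_pages": 6,         # max pages per embedded PDF
}

def check_request(text_tokens=0, images=0, video_seconds=0, pdf_pages=0):
    """Return a list of limit violations for a planned embedding request."""
    errors = []
    if text_tokens > LIMITS["text_tokens"]:
        errors.append(f"text exceeds {LIMITS['text_tokens']} tokens")
    if images > LIMITS["images"]:
        errors.append(f"more than {LIMITS['images']} images")
    if video_seconds > LIMITS["video_seconds"]:
        errors.append(f"video longer than {LIMITS['video_seconds']} seconds")
    if pdf_pages > LIMITS["pdf_pages"]:
        errors.append(f"PDF longer than {LIMITS['pdf_pages']} pages")
    return errors

print(check_request(text_tokens=4000, images=2))   # []
print(check_request(images=8, video_seconds=300))  # two violations
```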

Interleaved multimodal inputs

The model can process multiple media types within a single request, enabling contextual understanding between inputs such as image and text.
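As a sketch of what an interleaved request might look like, the helper below packs text and image parts into one payload. The payload shape and the `gemini-embedding-2` model ID are assumptions for illustration only, not a confirmed API schema.

```python
import base64

# Sketch of a single interleaved multimodal request. The payload shape
# and model ID are assumptions for illustration, not a confirmed API.

def build_interleaved_request(model, parts):
    """Pack interleaved text and binary parts into one request payload."""
    contents = []
    for part in parts:
        if isinstance(part, str):
            contents.append({"text": part})
        else:
            mime_type, raw_bytes = part
            contents.append({
                "inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(raw_bytes).decode("ascii"),
                }
            })
    return {"model": model, "contents": contents}

request = build_interleaved_request(
    "gemini-embedding-2",  # assumed model ID
    [
        "Caption describing the photo:",
        ("image/png", b"\x89PNG..."),  # placeholder image bytes
    ],
)
print(len(request["contents"]))  # 2
```

Keeping the text and image in one request, rather than embedding them separately, is what lets the model capture the relationship between the caption and the picture it describes.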

Matryoshka Representation Learning (MRL)

Gemini Embedding 2 incorporates Matryoshka Representation Learning, which allows embedding vectors to scale across different dimensions. The default dimension is 3,072, and developers can reduce the size to manage storage and performance requirements.

Recommended output dimensions:

  • 3,072
  • 1,536
  • 768
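Because MRL-trained embeddings concentrate information in the leading coordinates, a smaller vector can be obtained by simply truncating the full 3,072-dimension output and re-normalizing. A minimal sketch (the embedding values here are synthetic stand-ins):

```python
import math

def truncate_embedding(vec, dim):
    """Truncate an MRL embedding to its first `dim` coordinates and
    re-normalize to unit length so cosine similarity still works."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.01] * 3072            # stand-in for a real 3,072-dim embedding
small = truncate_embedding(full, 768)
print(len(small))               # 768
print(round(sum(x * x for x in small), 6))  # 1.0 (unit length)
```

Dropping from 3,072 to 768 dimensions cuts vector storage by 4x, which is the storage/performance trade-off the recommended sizes are meant to expose.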

Model capabilities

According to Google, the model extends embedding support beyond text to image, video, and speech tasks, and processes audio natively rather than relying on transcription.

Supported use cases

  • Retrieval-Augmented Generation (RAG)
  • Semantic search
  • Sentiment analysis
  • Data clustering
  • Large-scale data management
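Use cases like semantic search and RAG retrieval share one mechanism: documents and queries are embedded into the same space and ranked by cosine similarity. A toy sketch with hand-made 3-dimensional vectors standing in for real model-generated embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "index": in practice these would be model-generated embeddings.
index = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy notice": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # stand-in for an embedded user question
ranked = sorted(index, key=lambda doc: cosine(query, index[doc]), reverse=True)
print(ranked[0])  # refund policy
```

In a RAG pipeline the top-ranked documents would then be passed to a generative model as grounding context; clustering uses the same distances to group nearby vectors instead of ranking them.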

Availability

Gemini Embedding 2 is available in Public Preview through the Gemini API and Vertex AI. Developers can access the model through integrations with frameworks and vector database tools including:

  • LangChain
  • LlamaIndex
  • Haystack
  • Weaviate
  • Qdrant
  • ChromaDB

The model can also be used with vector search systems for multimodal data processing.

