Google unveils TPU 8t and TPU 8i chips for agentic AI and reasoning workloads

At Google Cloud Next, Google announced its eighth-generation Tensor Processing Units (TPUs), introducing two purpose-built architectures: TPU 8t and TPU 8i. These chips are designed to support large-scale AI workloads, from model training and development to high-volume inference and agent-based systems.

The new generation builds on over a decade of TPU development and continues Google’s approach of co-designing silicon, networking, and software to improve efficiency and performance. TPUs have already powered foundation models such as Gemini.

Built for the agentic AI era

AI systems are moving toward agent-based workflows that require reasoning, multi-step execution, and continuous learning loops. These workloads demand infrastructure capable of handling long context windows, complex logic, and high concurrency.

To address this, TPU 8t and TPU 8i were developed in collaboration with Google DeepMind. The platforms are designed to support both large-scale training and real-time reasoning workloads, including emerging world models such as Genie 3 that simulate environments for agent learning.

Two architectures with different roles

Instead of a single general-purpose design, Google has introduced two specialized TPU systems:

TPU 8t for large-scale training and pre-training
TPU 8i for inference, sampling, and reasoning

Both chips can run a range of workloads, but this separation allows each to deliver better efficiency for its target use case.

TPU 8t: Designed for large-scale training

TPU 8t focuses on accelerating model training by combining high compute throughput, memory capacity, and interconnect bandwidth. It is built to shorten training cycles while maintaining high system utilization.

A single TPU 8t superpod scales to large configurations, allowing complex models to operate within a unified compute and memory environment.

Key capabilities

Massive scale: Up to 9,600 chips per superpod with 2PB shared high-bandwidth memory and 121 ExaFlops compute
Bandwidth improvements: 2x inter-chip bandwidth and up to 4x data center network bandwidth through Virgo Network
Faster data access: TPUDirect RDMA and TPU Direct Storage enable direct transfers, delivering up to 10x faster storage access

The top diagram shows the data transfer path without TPUDirect Storage. The bottom diagram shows TPU 8t data transfer with TPUDirect Storage between 2 TPU 8t chips and TPUDirect Storage with Managed 10T Lustre storage.

Near-linear scaling: Supports scaling to more than 1 million chips using JAX and Pathways
High utilization: Designed to achieve over 97% goodput through system-level reliability and optimization

The architecture also introduces SparseCore for embedding workloads and native FP4 precision to improve compute efficiency and reduce memory overhead.

TPU 8i: Built for inference and reasoning

TPU 8i is optimized for latency-sensitive workloads, particularly those involving multiple AI agents operating concurrently. It is designed to reduce delays in communication, synchronization, and memory access.

Key innovations

Higher memory bandwidth: 288GB HBM and 384MB on-chip SRAM, around 3x more SRAM than the previous generation
Lower latency: Collectives Acceleration Engine (CAE) reduces synchronization latency by up to 5x
Optimized networking: Boardfly topology reduces network diameter by over 50%, improving communication efficiency
Axion CPUs: Custom Arm-based CPUs with NUMA architecture improve system-level performance and data handling

These changes enable TPU 8i to deliver up to 80% better performance-per-dollar compared to the previous generation, especially for reasoning and Mixture-of-Experts (MoE) models.

Network and system design advancements

Google has introduced workload-specific networking architectures for both TPU variants.

Virgo Network (TPU 8t): A scale-out network with high-radix switches and a multi-planar design to reduce latency and increase bandwidth

Boardfly topology (TPU 8i): A hierarchical network that reduces communication hops from 16 in a 3D torus to 7, improving latency for distributed workloads

These changes are aimed at maintaining efficiency as clusters scale to large sizes.

Software and developer ecosystem

TPU 8t and TPU 8i support widely used frameworks, enabling developers to deploy models without major changes.

Supported stack

JAX, PyTorch, Keras
vLLM, SGLang, MaxText
XLA and Pathways for scaling and orchestration

Additional support includes native PyTorch (preview), Pallas and Mosaic for kernel-level optimization, and bare metal access without virtualization overhead.

Power efficiency and data center integration

Efficiency is a key focus for large-scale AI systems. TPU 8t and TPU 8i deliver up to 2x better performance-per-watt compared to the previous generation.

Google has optimized efficiency across the stack:

Integration of compute and networking to reduce data transfer overhead
Dynamic power management based on workload demand
Fourth-generation liquid cooling for higher density deployments

The company states its data centers now deliver six times more compute per unit of electricity compared to five years ago.

TPU 8t vs TPU 8i: Quick comparison

Feature	TPU 8t	TPU 8i
Primary use	Training / Pre-training	Inference / Reasoning
Network topology	3D Torus + Virgo	Boardfly
HBM capacity	216 GB	288 GB
On-chip SRAM	128 MB	384 MB
Peak FP4 performance	12.6 PFLOPs	10.1 PFLOPs
HBM bandwidth	6,528 GB/s	8,601 GB/s
Specialized unit	SparseCore	CAE
CPU host	Axion Arm-based	Axion Arm-based

AI Hypercomputer integration

Both TPU systems are part of Google Cloud’s AI Hypercomputer, which combines compute, storage, networking, and software into a unified platform. This setup is designed to support the full AI lifecycle, from training to deployment and large-scale inference.

Availability and industry adoption

Google said TPU 8t and TPU 8i will be generally available later this year through Google Cloud as part of its AI Hypercomputer platform.

The company also noted that organizations such as Citadel Securities are already using TPUs for advanced AI workloads, indicating early adoption in production environments.