Google Cloud unveils Trillium: The sixth-generation tensor processing unit

Pallavi Madhiraju May 15, 2024 5:12 pm

Google Cloud has introduced Trillium, the latest in its series of Tensor Processing Units (TPUs), marking a significant advancement in AI hardware technology. Announced at Google I/O, Trillium is poised to set new benchmarks for performance and energy efficiency, promising a 4.7X increase in peak compute performance per chip compared to its predecessor, TPU v5e.

Trillium: A Leap in AI Processing Power

Trillium, Google Cloud’s sixth-generation TPU, is designed to meet the growing demand for the compute, memory, and communication capacity needed to train and fine-tune the most capable AI models. Compared with TPU v5e, it doubles both High Bandwidth Memory (HBM) capacity and bandwidth and increases Interchip Interconnect (ICI) bandwidth. It also improves the processing of ultra-large embeddings used in advanced ranking and recommendation systems.

Amin Vahdat, VP/GM ML, Systems, and Cloud AI at Google, emphasized the strategic importance of this development: “For more than a decade, we at Google have been developing custom AI-specific hardware to push forward the frontier of what is possible in scale and efficiency.”

Enhancing AI Model Training and Serving

Google’s latest AI models, including Gemini 1.5 Flash, Imagen 3, and Gemma 2, are all trained and served on TPUs, which underlines the significance of a new TPU generation. Trillium is expected to change how foundation models are trained, enabling faster training, lower serving latency, and lower costs. It is also more than 67% more energy-efficient than TPU v5e.

The scalability of Trillium is another key feature. It can scale up to 256 TPUs in a single pod, with the capability to expand further to hundreds of pods, connecting tens of thousands of chips in a supercomputing network. This scalability is facilitated by multislice technology and Titanium Intelligence Processing Units (IPUs), supporting a multi-petabit-per-second datacenter network.
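
To make the scale concrete, the arithmetic behind "tens of thousands of chips" can be sketched in a few lines of Python. The 256-chip pod size comes from the announcement; the pod counts below are hypothetical, since the announcement says only "hundreds of pods", and the helper name `total_chips` is an illustrative choice rather than anything from Google's tooling.

```python
# Figure from the article: a single Trillium pod holds up to 256 TPU chips.
CHIPS_PER_POD = 256


def total_chips(num_pods: int, chips_per_pod: int = CHIPS_PER_POD) -> int:
    """Return the chip count for a deployment made of fully populated pods."""
    if num_pods < 0 or chips_per_pod < 0:
        raise ValueError("counts must be non-negative")
    return num_pods * chips_per_pod


if __name__ == "__main__":
    # Hypothetical pod counts (assumption): the article gives no exact number.
    for pods in (1, 100, 200, 300):
        print(f"{pods:>4} pods -> {total_chips(pods):,} chips")
```

Under these assumed pod counts, a deployment of a few hundred full pods lands in the range of roughly 25,000 to 77,000 chips, which is consistent with the "tens of thousands" figure quoted above.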

Transformative Partnerships and Applications

Trillium arrives as AI adoption grows across sectors. Companies such as Nuro in autonomous vehicle technology, Deep Genomics in drug discovery, and Deloitte, a Google Cloud Partner of the Year for AI, are poised to use Trillium in their operations.

Furthermore, Google Cloud’s AI Hypercomputer, an architecture specifically designed for cutting-edge AI workloads, incorporates Trillium TPUs. This integration supports a performance-optimized infrastructure that includes open-source software frameworks and flexible consumption models, empowering developers and enhancing AI/ML capabilities.

Jeff Boudier, Head of Product at Hugging Face, expressed excitement about the partnership with Google Cloud: “We are excited to further accelerate open source AI with the upcoming sixth-generation Trillium TPUs, and we expect open models to continue to deliver optimal performance thanks to the 4.7X increase in performance per chip compared to the previous generation.”

Looking Forward

As AI continues to evolve, Trillium TPUs represent a pivotal advancement in hardware that will power the next generation of AI models. Available later this year, Trillium TPUs will offer unprecedented opportunities for innovation and efficiency in AI applications.