Almost exactly a year ago, Google launched its Tensor Processing Unit (TPU) v4 chips at Google I/O 2021, promising twice the performance of TPU v3. At the time, Google CEO Sundar Pichai said Google’s data centers “will soon feature dozens of v4 TPU pods, many of which will run on or near 90% carbon-free power.” Now, at Google I/O 2022, Pichai has revealed the fruit of that work: a TPU v4-powered data center in Mayes County, Oklahoma, which Google says is the world’s largest publicly accessible machine learning hub.
“This machine learning hub features eight Cloud TPU v4 pods, custom-built on the same network infrastructure that powers Google’s largest neural models,” Pichai said. Each TPU v4 pod consists of 4,096 TPU v4 chips, and each chip delivers 275 teraflops of ML-targeted bfloat16 (“brain floating point”) performance. In total, this means that each pod provides approximately 1.13 exaflops of bfloat16 compute, and the eight pods in the Mayes County data center total approximately 9 exaflops. Google claims this makes it the largest such hub in the world “in terms of aggregate computing power”, at least among those available to the general public.
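The per-pod and hub-wide totals follow directly from the chip count and per-chip figure quoted above. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative only, and the numbers are peak bfloat16 figures as reported, not measured throughput.

```python
# Figures quoted in the article (peak bfloat16 performance).
CHIPS_PER_POD = 4096            # TPU v4 chips in one pod
TFLOPS_PER_CHIP = 275           # teraflops per chip
PODS_IN_HUB = 8                 # pods in the Mayes County hub

TFLOPS_PER_EXAFLOP = 1_000_000  # 1 exaflop = 10^6 teraflops


def pod_exaflops(chips: int = CHIPS_PER_POD,
                 tflops_per_chip: float = TFLOPS_PER_CHIP) -> float:
    """Peak exaflops of a single pod: chips times per-chip teraflops."""
    return chips * tflops_per_chip / TFLOPS_PER_EXAFLOP


def hub_exaflops(pods: int = PODS_IN_HUB) -> float:
    """Peak exaflops of the whole hub: pods times per-pod exaflops."""
    return pods * pod_exaflops()


if __name__ == "__main__":
    print(f"per pod: {pod_exaflops():.2f} exaflops")  # per pod: 1.13 exaflops
    print(f"hub:     {hub_exaflops():.2f} exaflops")  # hub:     9.01 exaflops
```

The unrounded values are 1.1264 exaflops per pod and 9.0112 exaflops for the hub, which matches the "approximately 9 exaflops" figure cited.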
“Based on our recent survey of 2,000 IT decision makers, we found that inadequate infrastructure capabilities are often the underlying cause of failed AI projects,” commented Matt Eastwood, senior vice president for research at IDC. “To address the growing importance of purpose-built AI infrastructure for businesses, Google launched its new machine learning cluster in Oklahoma with nine exaflops of aggregate compute. We believe this is the largest publicly accessible ML hub[.]”
Additionally, Google says this hub runs on 90% carbon-free power on an hourly basis, a feat that can be difficult given the intermittency of renewable energy sources. Not all of the efficiency gains are attributable to Google’s renewable energy efforts, however. The TPU v4 chip provides approximately three times the flops per watt of the TPU v3, and the entire data center operates at a power usage effectiveness (PUE) of 1.10, which according to Google makes it one of the most energy-efficient data centers in the world. “This helps us progress towards our goal of becoming the first major company to operate all of our data centers and campuses globally with carbon-free power 24/7 by 2030,” Pichai said at Google I/O.
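For context, PUE is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so a PUE of 1.10 implies roughly 10% overhead on top of the computing load. The Python sketch below illustrates that relationship. The 1,000 kW IT load is a made-up value chosen for easy arithmetic, not a figure from Google, and the function names are illustrative.

```python
def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0 by definition")
    return it_power_kw * pue


def overhead_fraction(pue: float) -> float:
    """Share of total facility power that does not reach IT equipment."""
    return (pue - 1.0) / pue


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical IT load, for illustration only
    pue = 1.10           # PUE figure quoted in the article

    total_kw = facility_power_kw(it_load_kw, pue)
    print(f"total facility power: {total_kw:.0f} kW")
    print(f"cooling and other overhead: {total_kw - it_load_kw:.0f} kW")
    print(f"overhead share of total: {overhead_fraction(pue):.1%}")
```

Under these assumptions, every 1,000 kW of compute would draw about 100 kW of additional power for cooling, power conversion, and other facility overhead.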
“We hope this will fuel innovation in many areas, from medicine to logistics to sustainability and more,” he said of the data center. To that end, Google has operated its TPU Research Cloud (TRC) program for several years, providing TPU access to “ML enthusiasts everywhere”.
“They’ve published hundreds of articles and open source github libraries on topics ranging from ‘Persian poetry writing with AI’ to ‘Distinguishing between sleep and exercise-induced fatigue using computer vision and behavioral genetics,’” said Jeff Dean, senior vice president of Google Research and AI. “The launch of Cloud TPU v4 is a major milestone for Google Research and our TRC program, and we’re very excited about our long-term collaboration with ML developers around the world to use AI for good.”
At I/O, Pichai suggested the hub was just one element of a deeper commitment to efficient and powerful data centers, citing the company’s planned $9.5 billion investment in data centers and offices across the United States in 2022 alone.