Brainstorm AI Singapore 2024: Cutting Data Center Energy Consumption By 50%

Presenter: Tim ROSENFIELD, Co-founder and Co-CEO, Sustainable Metal Cloud
Transcript
00:00I've got a short amount of time, about five minutes here,
00:03to describe a little bit about what we do
00:06and how we fit into the AI and energy ecosystem
00:12before we have a little chat on stage.
00:14So I am Tim, co-founder, co-CEO of Sustainable Metal Cloud.
00:18I was trying to think of a way to describe
00:22what we do in a very short amount of time.
00:25And I guess I'd characterize it as we
00:28seek to solve AI's energy problem.
00:32AI has a substantial energy problem,
00:37which we'll unpack over the course of the next few slides.
00:40We started life in Australia about seven years ago;
00:44as you can tell from my accent, I'm Australian.
00:47The vision my co-founders and I had,
00:51as outsiders to the data center and AI industry,
00:57or HPC as it was known then, was this:
01:00how could we apply engineering to turn energy into knowledge
01:08more efficiently?
01:09Now, what I mean by that is, for most folks,
01:13the data center is a place that data is stored.
01:16It's now becoming a place that data is processed.
01:20These AI factories are actually large engineering challenges.
01:23And our process was, could we start afresh?
01:28Could we rethink the best way
01:31to accomplish this?
01:32So we started in Australia, where
01:33we built a 20-megawatt engineering center.
01:37The problem that we're trying to solve for,
01:39how to turn energy into knowledge as cost-effectively
01:44and as efficiently as possible, starts with heat.
01:48Heat and cooling.
01:49And that is one of the drivers of AI's energy challenge.
01:53Roll it back one step to the graph I've got on screen here,
01:57and you can see a very clear trend.
02:00And what this is showing is that over the last decade or two,
02:04we've had real substantial improvements in chip density
02:08and chip efficiency.
02:10But as we've now come to two-nanometer and three-nanometer
02:13process nodes, the efficiency gains
02:16are getting smaller with every step.
02:19And this has coincided with the rise of AI,
02:22which unfortunately uses the most energy-intensive
02:27and hottest chips out there.
02:29So behind me on the graph here, you can see in green,
02:32we have the power limits of NVIDIA's GPU series.
02:36So this is ramping up at a very quick rate.
02:40And this is driving a massive challenge in the data center.
02:44So more heat in the data center means
02:46you need more cooling to remove that heat.
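For a rough sense of scale, here is a sketch using NVIDIA's published per-GPU power limits; the 8-GPU server layout and the per-server arithmetic are illustrative assumptions, not figures from the talk.

```python
# Rough heat-load arithmetic for dense GPU servers.
# TDP values are NVIDIA's published per-GPU power limits (SXM parts);
# the 8-GPU-per-server layout is an assumption mirroring common HGX boxes.
GPU_TDP_WATTS = {
    "V100 (2017)": 300,
    "A100 (2020)": 400,
    "H100 (2022)": 700,
}

GPUS_PER_SERVER = 8  # assumption: typical HGX/DGX-style baseboard

for gpu, tdp in GPU_TDP_WATTS.items():
    server_kw = GPUS_PER_SERVER * tdp / 1000
    print(f"{gpu}: {tdp} W per GPU -> ~{server_kw:.1f} kW of GPU heat per server")
```

Every one of those watts ends up as heat the facility has to remove, so the cooling load scales directly with the chips.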
02:49And that is why our solution is quite different.
02:54So we use liquid.
02:55We've developed a liquid platform:
02:57you take the servers and you put them in.
03:00It's called single-phase immersion, a full server
03:03submerged in oil. We use the properties of oil,
03:06whose heat capacity is roughly 1,000 times greater
03:10than that of air, to remove the heat from the servers.
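As a back-of-envelope check on that 1,000x figure, here is a minimal sketch comparing the coolant flow each fluid needs to carry the same heat; the fluid properties are rough textbook values, and the 10 kW load and 10-degree temperature rise are assumptions, not numbers from the talk.

```python
# Coolant flow needed to remove a heat load: Q = rho * flow * cp * dT,
# so flow = Q / (rho * cp * dT). Property values are rough textbook figures.
Q_WATTS = 10_000   # assumed server heat load: 10 kW
DELTA_T_K = 10.0   # assumed coolant temperature rise

COOLANTS = {
    # name: (density kg/m^3, specific heat J/(kg*K))
    "air":         (1.2,   1005.0),
    "mineral oil": (850.0, 1900.0),
}

flows = {}
for name, (rho, cp) in COOLANTS.items():
    flows[name] = Q_WATTS / (rho * cp * DELTA_T_K)  # m^3/s
    print(f"{name}: {flows[name]:.4f} m^3/s to move {Q_WATTS / 1000:.0f} kW")

print(f"air needs ~{flows['air'] / flows['mineral oil']:.0f}x the volumetric flow of oil")
```

On these rough numbers air needs over a thousand times the volumetric flow of oil to move the same heat, which is the gap the 1,000x claim is pointing at.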
03:13Our system was designed to be modular,
03:17to go into any data center, and to host commonly found
03:21workloads in the AI ecosystem, like generative AI training
03:25using H100, Omniverse using L40S,
03:29and other types of platforms.
03:32In a nutshell, we wanted to solve
03:35for taking any data center, like this diagram here,
03:40and coming up with a platform that
03:43would fit into that data center, whether that
03:46was built for liquid or not.
03:48So whether it was designed 15 years ago,
03:50or whether it was designed this year.
03:52Either way, we wanted something that
03:54was agnostic to the data center that it goes into.
03:58And then on top of that, we wanted
04:00to improve the efficiency of the servers themselves.
04:03So by integrating the data center
04:06with our new type of AI factory, with an NVIDIA GPU-based server,
04:13we could achieve breakthrough efficiency,
04:15which we'll get into in more detail in the next session.
04:18Long story short, by vertically integrating this solution,
04:22we cut around 45% to 50% of the energy
04:26to run NVIDIA GPU-based workloads today
04:29compared with standard air-cooled configurations.
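As a sanity check on how savings of that size can add up, here is illustrative arithmetic; the PUE values and the fan-power fraction below are assumptions chosen for the sketch, not figures quoted in the talk.

```python
# Whole-facility energy = IT energy * PUE (power usage effectiveness).
# Immersion helps twice: server fans disappear (a slice of IT power),
# and cooling overhead shrinks (lower PUE). All values are assumptions.
IT_POWER_AIR = 100.0   # normalized IT power for the air-cooled baseline
FAN_FRACTION = 0.12    # assumed share of server power drawn by fans
PUE_AIR = 1.7          # assumed PUE for air cooling in a tropical climate
PUE_IMMERSION = 1.05   # assumed PUE for single-phase immersion

total_air = IT_POWER_AIR * PUE_AIR
total_immersion = IT_POWER_AIR * (1 - FAN_FRACTION) * PUE_IMMERSION

savings = 1 - total_immersion / total_air
print(f"air-cooled total:   {total_air:.0f}")
print(f"immersion total:    {total_immersion:.1f}")
print(f"energy saved:       {savings:.0%}")  # ~46% with these inputs
```

With those assumed inputs the savings land around 46%, in the same ballpark as the 45% to 50% figure above; the exact number depends on the baseline facility.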
04:33We've designed this to be containerized
04:35so that we can deploy it in an existing data center,
04:38or we can build greenfield data centers.
04:41What we're doing here in Singapore
04:43is we've deployed seven of these systems.
04:47We're deploying into India, we've deployed into Australia,
04:50and we're deploying into Thailand.
04:51And we're now looking to other regions around the world.
04:54And the final slide before I leave you
04:56is how does this get used?
04:58So we're actually trying to take this
05:01to as many places as possible.
05:03And one of the ways we do that is
05:05installing the infrastructure in the data center,
05:08making it an AI factory, but then making it easy to use
05:12by allowing others to consume this service
05:14as a managed service,
05:16whether it's a cooling-as-a-service solution
05:17or a GPU-as-a-service solution
05:20with partners like NVIDIA.
05:22And customers around the world are able to access this today.
