TechTranscript
00:00 - In an unmarked office building in Austin, Texas...
00:03 - Come on in.
00:04 - There are two small rooms
00:05 where a handful of employees peer into microscopes,
00:07 soldering irons, or tiny tweezers in hand.
00:10 They're designing two types of microchips
00:12 made to power data centers,
00:13 and more recently, the AI boom.
00:15 - When we get this, what do we do?
00:17 We test it.
00:19 First thing that we do, we test it.
00:21 - But these chips aren't coming from NVIDIA,
00:23 AMD, or any of the chip companies
00:25 that have been hitting headlines and market milestones
00:27 since ChatGPT burst on the scene late last year.
00:31 I'm here inside Amazon's Austin, Texas chip lab,
00:33 where it makes its own custom microchips
00:35 to compete with those from Intel, NVIDIA, and other giants.
00:38 And it's actually a way for them to save money
00:40 and boost performance,
00:41 because it's one of the biggest buyers
00:43 of data center chips in the world.
00:45 AWS CEO Adam Selipsky told CNBC
00:48 that the chips that we saw here today
00:50 are powering large language models
00:52 and more for the AI boom.
00:53 - The entire world would like more chips
00:56 for doing generative AI,
00:58 and whether that's GPUs,
01:01 or whether that's Amazon's own chips that we're designing,
01:04 I think that we're in a better position
01:06 than anybody else on earth
01:07 to supply the capacity
01:09 that our customers collectively are gonna want.
01:11 - Amazon Web Services
01:12 is the world's biggest cloud computing provider
01:14 and the most profitable arm of the retail giant,
01:17 with an operating income of $5.4 billion in Q2.
01:21 Although that number has been down year over year
01:23 for three quarters in a row,
01:25 AWS still accounted for 70%
01:27 of Amazon's overall $7.7 billion Q2 operating profit,
01:31 giving it the cash it needs
01:32 for the huge undertaking that is custom silicon,
01:35 and a growing portfolio of developer tools
01:37 that could eventually propel Amazon
01:39 to the center of all the AI buzz.
01:41 - Many of our AWS customers have terabytes,
01:44 or petabytes, or exabytes of data
01:46 already stored on AWS.
01:48 And they know that that data
01:51 is gonna help them customize
01:53 the models that they're using
01:54 to power their generative AI applications.
01:57 - And yet others have acted faster
01:59 and invested more to capture business
02:01 from the generative AI boom.
02:03 Think Microsoft's reported $13 billion investment
02:06 in ChatGPT maker OpenAI,
02:08 and Google's chatbot, Bard,
02:09 followed by its $300 million investment
02:12 in OpenAI rival, Anthropic.
02:14 AWS's profit margins have historically been far higher
02:17 than those at Google Cloud,
02:19 but those margins have been narrowing.
02:21 And although AWS's growth is still impressive,
02:23 that's happening at a slower pace too.
02:25 - Amazon is not used to chasing markets.
02:30 Amazon is used to creating markets.
02:32 And I think for the first time in a long time,
02:34 they're finding themselves on the back foot,
02:36 and they are working to play catch up.
02:39 - CNBC sat down with top AWS executives and analysts
02:43 to ask about custom chips
02:44 and how it plans to make strides in generative AI
02:47 to catch Google and Microsoft,
02:49 and perhaps give a needed boost to AWS too.
02:51 - We end up with a package part like this, right?
03:09 And this is an actual machine learning accelerator
03:14 that was designed, and you can see Annapurna Labs on it.
03:17 - In 2015, Amazon bought Israeli startup, Annapurna Labs,
03:20 to accelerate its dive into the chip business.
03:23 In July, we went to AWS's Annapurna location in Austin
03:26 for an exclusive look at the chip design process
03:29 with lab director, Rami Sinno.
03:31 AWS also designs chips in Silicon Valley, Canada,
03:34 and at a larger lab in Israel,
03:36 then sends them off to be made by chip manufacturers
03:38 like TSMC in Taiwan.
03:40 AWS quietly started production of custom silicon
03:43 back in 2013 with a piece of specialized hardware
03:46 called Nitro, now the highest-volume AWS chip,
03:49 with more than 20 million in use and one in every AWS server.
03:53 Then at AWS's big annual customer conference,
03:55 re:Invent, in 2018,
03:57 Amazon launched its ARM-based server chip, Graviton,
04:00 to rival x86 CPUs from giants like AMD and Intel.
04:04 - It's probably high single digit
04:05 to maybe 10% of total server sales are ARM,
04:08 and a good chunk of those are gonna be Amazon.
04:11 So on the CPU side, they've done quite well.
04:13 - We're into our third generation of our Graviton chip.
04:16 That provides acceleration in terms of speed
04:19 and cost efficiency and power
04:21 for very general kind of web-based workloads.
04:24 - After announcing Graviton in 2018,
04:26 AWS announced its first AI-focused chips.
04:29 VP of product, Matt Wood,
04:31 showed us the two AI chips it has today.
04:33 - This big one here is called Trainium,
04:35 and this small one here is called Inferentia.
04:38 - Inferentia, Amazon's first AI chip, launched in 2019.
04:41 - Which we're on our second generation of,
04:43 which allows customers to deliver very, very low cost,
04:47 high throughput, low latency machine learning inference,
04:50 which is all the predictions of the,
04:52 when you type in a prompt into your generative AI model,
04:56 that's where all that gets processed
04:57 to give you the response.
04:58 With Inferentia, you can get about four times more throughput
05:03 and 10 times lower latency
05:06 than anything else available on AWS today.
05:09 - Trainium came on the market in 2021.
05:11 - All right, so this is a package part.
05:13 And then let me show you the other side.
05:15 What you see here is all the interfaces.
05:18 - Machine learning breaks down
05:19 into these two different stages.
05:21 So you train the machine learning models,
05:23 and then you run inference against those trained models.
05:26 And so we see a lot of customers that are interested
05:29 in training their own machine learning models
05:32 and their own generative AI models.
05:33 And so that's where Trainium really, really helps.
05:36 Trainium provides about 50% improvement
05:40 in terms of price performance,
05:42 relative to any other way
05:43 of training machine learning models on AWS.
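For context on the two stages Wood describes, here is a minimal, framework-level sketch in plain PyTorch (not AWS- or Trainium-specific): a model is first trained against labeled examples, then run in inference mode on new inputs.

```python
import torch
import torch.nn as nn

# A toy model: the specifics don't matter, only the two stages.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Stage 1: training -- repeatedly adjust weights against labeled examples.
# This is the compute-heavy stage that chips like Trainium target.
x, y = torch.randn(64, 4), torch.randn(64, 1)
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Stage 2: inference -- run the trained model on new inputs, no gradients.
# This is the per-prediction stage that chips like Inferentia target.
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
```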
05:46 - But for now, NVIDIA's GPUs are still king
05:48 when it comes to training LLMs.
05:50 AWS itself just launched new AI acceleration hardware
05:53 powered by NVIDIA H100s.
05:56 - Accelerating performance by up to 6X
05:59 and reducing training costs by up to 40%
06:02 as compared to EC2 P4 instances.
06:05 - NVIDIA chips have a massive software ecosystem
06:09 that's been built up around them over the last 15 years
06:11 that nobody else has.
06:12 The big winner from AI right now is NVIDIA.
06:15 That seems clear.
06:16 - Still, Amazon is not the only non-chip giant
06:18 getting into custom silicon.
06:20 Apple has its M-series of chips.
06:22 And a couple years before Amazon had AI chips,
06:24 Google launched its own cloud tensor processing units,
06:27 or TPUs.
06:28 - Nobody's at the same scale as Google.
06:30 Google's been deploying this stuff for like eight years.
06:32 My assumption is all of the hyperscalers,
06:34 whether they've announced it or not,
06:35 are all working on their own accelerators.
06:38 And by the way,
06:38 many are also working on their own CPUs as well.
06:41 - But when it comes to custom chips,
06:42 Microsoft is lagging behind Amazon and Google.
06:45 Microsoft has yet to announce the Athena AI chip
06:48 it's been working on,
06:48 reportedly in partnership with AMD.
06:51 - I think the true differentiation
06:52 is the technical capabilities that they're bringing to bear.
06:55 Because guess what?
06:56 Microsoft does not have Trainium or Inferentia.
06:58 - Generative AI is the current craze,
07:05 but Amazon was building out a broader AI infrastructure
07:08 for machine learning with dozens of services
07:10 long before it made chips or used them to train LLMs.
07:13 - Late 1990s, we were the first one
07:15 to actually leverage machine learning-based technologies
07:19 to reinvent our recommendation engines.
07:22 And we leveraged machine learning
07:23 to do things like a better product search
07:26 and then automating, leveraging robotics
07:29 and computer vision in our Amazon FCs,
07:32 our fulfillment centers,
07:33 to help products ship faster,
07:36 to actually reinventing
07:38 completely new customer experiences
07:40 with things like Amazon Alexa.
07:43 - But when OpenAI launched ChatGPT in November 2022,
07:46 Microsoft was suddenly dominating the AI headlines,
07:49 followed by Google's Bard in February.
07:52 Two months later,
07:52 Amazon announced its own large language models,
07:54 called Titan, along with Bedrock,
07:56 a cloud service to help developers enhance software
07:58 using generative AI.
08:00 - I think ChatGPT and Microsoft's
08:02 rollout of their initiatives was so fast,
08:04 so aggressive, so quick,
08:05 it caught a lot of the market participants flat-footed.
08:08 Amazon is trying to educate the market
08:11 in order to close the gap.
08:12 But frankly speaking,
08:14 it's going to take a couple of months.
08:15 - Let's rewind the clock even before ChatGPT.
08:17 It's not like after that happened,
08:19 suddenly we hurried and came up with a plan
08:21 because you can't engineer a chip in that quick a time.
08:26 And I think it actually accelerated
08:28 some of the customer conversation
08:29 and their keenness to actually move forward
08:32 with generative AI deployments.
08:34 - Meta also recently announced its own LLM, Llama 2.
08:38 The open-source ChatGPT rival
08:39 is available on Microsoft's Azure cloud platform.
08:42 Now, a leaked internal email shows Amazon CEO Andy Jassy
08:46 is directly overseeing a new central team
08:48 that's building out expansive large language models.
08:51 But so far, AWS has focused on tools
08:54 instead of building a ChatGPT competitor.
08:56 - So if you look at the Bedrock strategy
08:58 that they are trying to focus on,
08:59 they are betting the farm on the fact that enterprises
09:03 might not necessarily be building out their own GPT models.
09:07 - Bedrock gives AWS customers access to LLMs
09:10 made by Anthropic, Stability AI, AI21,
09:13 and Amazon's own Titan.
09:15 - Titan is actually a family of foundational models.
09:17 We have text-based models,
09:19 which are great for generative text,
09:21 so creating marketing copy and advertising,
09:24 chatbots, those sorts of things.
09:25 And then we have an embedding model,
09:27 which is great for personalization
09:29 and ranking those sorts of use cases.
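As a rough illustration of those two Titan model types, here is a hedged sketch using the boto3 Bedrock runtime client. The model IDs, request bodies, and response shapes are best-effort assumptions drawn from public AWS documentation, not from this video, and running it requires AWS credentials with Bedrock access.

```python
import json
import boto3

# Assumed client name and region; requires AWS credentials with Bedrock access.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Text generation, e.g. drafting marketing copy from a prompt.
text_response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed Titan text model ID
    body=json.dumps({"inputText": "Write a tagline for a reusable water bottle."}),
)
text_body = json.loads(text_response["body"].read())
print(text_body["results"][0]["outputText"])  # assumed response shape

# Embeddings: a numeric vector usable for personalization and ranking.
embed_response = client.invoke_model(
    modelId="amazon.titan-embed-text-v1",  # assumed Titan embeddings model ID
    body=json.dumps({"inputText": "trail running shoes"}),
)
vector = json.loads(embed_response["body"].read())["embedding"]  # assumed key
```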
09:31 (upbeat music)
09:34 - Amazon says its AI products are being used
09:41 by numerous customers, like Philips, 3M,
09:44 Old Mutual, and HSBC.
09:46 In the Q2 earnings call,
09:47 it said a very significant amount of AWS business
09:50 is now driven by AI
09:51 and the 20-plus machine learning services it offers.
09:54 - We don't believe that one model
09:56 is going to rule the world.
09:58 We understand we want our customers
10:01 to have the state-of-the-art models
10:04 from multiple providers
10:05 because they are going to pick the right tool
10:06 for the right job.
10:08 - One of Amazon's new AI offerings is AWS HealthScribe,
10:11 a service unveiled in July
10:12 to help doctors automatically draft
10:14 patient visit summaries and more.
10:16 Another big tool in the AWS AI stack is CodeWhisperer.
10:19 - CodeWhisperer generates code recommendations
10:22 from natural language prompts
10:24 based on contextual information,
10:26 and it's a tool that helps customers
10:29 understand the context of the task.
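To make that workflow concrete: with an assistant like CodeWhisperer, a developer typically writes a natural-language comment and the tool suggests an implementation in place. The snippet below is hand-written to illustrate that comment-as-prompt pattern; it is not actual CodeWhisperer output.

```python
from datetime import datetime

# Prompt, written as a comment:
# "Function to parse an ISO 8601 date string and return the day of the week."
# A completion in the spirit of what such a tool might suggest:
def day_of_week(iso_date: str) -> str:
    return datetime.fromisoformat(iso_date).strftime("%A")

print(day_of_week("2023-08-12"))  # -> "Saturday"
```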
10:31 - Participants who used CodeWhisperer
10:34 were 27% more likely to complete their task successfully,
10:39 and they did it 57% faster on average.
10:43 - Last year, Microsoft also reported productivity boosts
10:46 from its coding companion, GitHub Copilot.
10:48 And then there's SageMaker,
10:51 Amazon's machine learning hub
10:52 that offers algorithms, models, and more.
10:55 - It's a kind of an aircraft part.
10:56 In one example there,
10:58 it was 45% lighter for a particular carrier.
11:02 - In June, AWS also announced
11:04 a $100 million generative AI innovation center.
11:07 - We have so many customers who are saying,
11:09 "I want to do generative AI,"
11:10 but they don't all necessarily know what that means for them
11:13 in the context of their own businesses.
11:15 And so we're going to bring in solutions architects
11:17 and engineers and strategists and data scientists
11:22 to work with them one-on-one.
11:24 (upbeat music)
11:27 - When companies are choosing between Amazon,
11:28 Google, and Microsoft for their generative AI needs,
11:31 some may choose Bedrock
11:32 because they're already familiar with AWS,
11:34 where they run other applications and store a ton of data.
11:37 - If you took the data that we have in Amazon S3
11:40 that's stored on devices like this,
11:43 you stack them up, one on top of another,
11:47 it would take you all the way
11:49 to the International Space Station
11:51 and almost all the way back,
11:53 and that is a lot of data.
11:56 - At the end of the day,
11:57 Amazon does not need to win headlines.
11:58 Amazon already has a really strong cloud install base.
12:01 All they need to do is to figure out
12:02 how to enable their existing customers
12:04 to expand into value creation motions using generative AI.
12:09 - So how many AWS customers are actually using it
12:11 for machine learning?
12:12 - We have over a hundred thousand customers today
12:15 that are using machine learning on AWS,
12:18 many of which have standardized
12:19 on our machine learning service, which is called SageMaker,
12:22 to build, train, and deploy their own custom models.
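For a sense of what that build-train-deploy flow looks like, here is a hedged sketch with the SageMaker Python SDK. The container image, IAM role, and S3 paths are hypothetical placeholders, and the instance types are only plausible choices (trn1 is Trainium-backed, inf2 is Inferentia-backed).

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Build: point SageMaker at a training container (hypothetical image and role).
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.trn1.2xlarge",                # Trainium-backed training instance
    output_path="s3://my-bucket/model-artifacts/",  # hypothetical bucket
    sagemaker_session=session,
)

# Train: launch a managed training job against data already in S3.
estimator.fit({"train": "s3://my-bucket/training-data/"})

# Deploy: stand up a real-time inference endpoint for the trained model.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",  # Inferentia-backed inference instance
)
```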
12:25 - But in reality, that's not a big percentage
12:27 of AWS's millions of customers.
12:29 Although most aren't tapping into it for AI yet,
12:31 that could change.
12:33 - What we are not seeing is enterprises saying,
12:35 oh, wait a minute, Microsoft is so ahead in generative AI,
12:38 let's just go out and let's switch our infrastructure
12:41 strategies, migrate everything to Microsoft.
12:43 That is not happening, because at the end of the day,
12:45 even if you're trying to create a chatbot,
12:47 if you're already an Amazon customer,
12:48 chances are you're going to exploit
12:50 the Amazon ecosystem quite extensively.
12:52 - How quickly can these companies move
12:56 to develop these generative AI applications
12:58 is driven by starting first on the data they have in AWS
13:03 and using compute and machine learning tools
13:08 that we provide.
13:09 Imagine you're cooking dinner and you're using a new recipe.
13:15 It is a lot faster to start with ingredients
13:18 that you already know that have been cut
13:20 and prepared to go ahead and put together the recipe
13:23 than if you have to research all the ingredients,
13:26 get familiar with them, and then learn
13:28 how you're going to put them together
13:29 and cook with them, right?
13:31 That's what AWS customers are doing.
13:33 They have all the different ingredients
13:35 that they're familiar with and they know how to use,
13:37 and whether that's storage or compute,
13:40 or it's machine learning tools like Amazon SageMaker
13:44 and Amazon Bedrock, and they're putting it together
13:46 that much faster.
13:48 - And as generative AI continues to accelerate,
13:50 all the big players are scrambling to establish
13:52 how to use these tools responsibly and securely.
13:55 - Can't tell you how many Fortune 500 companies
13:57 I've talked to who have banned ChatGPT.
14:00 So with our approach of generative AI
14:04 and our Bedrock service, anything you do,
14:07 any model you use through Bedrock
14:09 will be in your own isolated,
14:12 virtual private cloud environment.
14:14 It'll be encrypted.
14:15 It'll have the same AWS access controls.
14:19 - Selipsky joined six other AI players
14:21 at the White House in July to sign pledges
14:23 to ensure that AI tools are secure.
14:25 - There are open problems that still need to be solved,
14:28 especially when you're trying to deal
14:29 with highly regulated industries
14:30 in financial services, healthcare, and beyond.
14:33 We still do not have any well-thought-out
14:36 regulatory guardrails around data protection,
14:40 private information protection,
14:41 and responsible AI capabilities in the space.
14:44 - There's also national security concerns.
14:46 The Biden administration has proposed new rules
14:49 that would require US cloud providers
14:50 like Amazon and Microsoft to seek government permission
14:53 before providing China with cloud computing services
14:56 using AI chips.
14:58 But for now, there's no slowdown in sight
15:00 for the development of new generative AI applications
15:02 or the chips needed to power them,
15:04 and that race is just beginning.
15:06 - So let's say that we're three steps into a race
15:09 and we start asking, "Well, who's ahead?
15:11 Who's behind? How do the runners look?"
15:13 But then you look up and you realize that it's a 10K race.
15:17 And then you realize it's the wrong question to ask.
15:20 Who's where three steps into the race?
15:23 The real question is what's gonna happen
15:24 at the end of the 10K race?
15:26 In this case, we're just at the very dawn of generative AI.
15:31 (upbeat music)