Presenter: Romain Huet, Head of Developer Experience, OpenAI
Transcript
00:00 Thank you. Good afternoon, everyone. Super thrilled to be here today.
00:04 It's an interesting time in AI. The pace of change is really accelerating,
00:09 and there are more AI applications going from prototype to production every single day.
00:14 I'm Romain, and I lead developer experience at OpenAI.
00:17 And you know, a lot of people know us for ChatGPT.
00:19 But in fact, our first product was the OpenAI API,
00:22 the ability for developers and for companies to take our AI models
00:27 and bring them into their own products and services.
00:30 And today, we're very fortunate to work with more than 3 million developers on the platform,
00:34 and 92% of the Fortune 500 actually use our products.
00:37 So it's an amazing time for all of us to be here together, building in AI.
00:42 You know, one of my favorite things in my role day to day
00:46 is to work with amazing founders and companies who are pushing the envelope of these models every day
00:51 and also showing them the art of the possible with what we put out there in the world.
00:56 So in that spirit, instead of having any presentation slides today,
01:00 let's jump to my laptop and take a look at some live demos.
01:05 So first off, I'd be remiss not to mention Sora.
01:08 We just launched Sora this morning, our video generation model,
01:13 and it's now available on Sora.com.
01:15 So what you're looking at here is the homepage with featured videos
01:19 already created in the last few hours by the community.
01:22 It's pretty amazing.
01:23 In fact, we can fire off maybe a first video generation here.
01:27 I'll say a golden retriever dressed as Santa on Christmas Day.
01:33 You know, sounds pretty cool.
01:36 There is an enormous amount of demand at the moment for Sora.
01:39 So it sounds like, you know, the service is already at capacity.
01:42 That's okay.
01:43 You can check it out later, you know.
01:46 And these are some of the beautiful videos that people are already creating.
01:50 So now, in fact, to show you something more concrete, but with a wink to
01:54 Sora, I'm going to bring up something about multimodality.
01:59 Multimodality is what we refer to as the ability for AI models to not
02:04 just work with text but also work with speech, with audio, with vision,
02:10 and also with video, as you've seen with Sora.
02:13 So I wanted to bring you a demo here of how we can combine all of
02:17 these modalities together to create something wholly unique.
02:21 So this is the only part of the demo that is not live.
02:24 This is a prompt, which you can read on the screen in great
02:27 detail, about the history of the German automotive industry.
02:33 And so if I click here, this is the video that came out of Sora.
02:45 So what's pretty amazing here is that all we did was stitch
02:48 together some scenes out of Sora and add a little bit of a
02:51 soundtrack on top; all of the images are untouched.
02:55 This is straight out of the model.
02:57 It's pretty magical.
02:59 So now let's say we want to use such a video as the basis for a
03:03 presentation, and what if we want to tell a story that these images
03:07 and this video can tell us?
03:09 So what I'll do next is I'll click this button, extract frames, and
03:13 I'll slice a few frames out of this video that Sora generated.
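To make that step concrete for builders: here is a minimal sketch of slicing evenly spaced frames out of a local video file with OpenCV. The file name and frame count are placeholder assumptions, not details from the demo itself.

```python
# Minimal sketch: slice evenly spaced frames out of a video with OpenCV.
# "sora_video.mp4" and the frame count are placeholder assumptions.
import cv2

def extract_frames(video_path: str, num_frames: int = 6) -> list:
    """Grab num_frames evenly spaced frames and save them as JPEGs."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        # Seek to an evenly spaced position in the video, then read one frame.
        capture.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = capture.read()
        if ok:
            cv2.imwrite(f"frame_{i}.jpg", frame)
            frames.append(frame)
    capture.release()
    return frames

extract_frames("sora_video.mp4")
```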
03:18 And now what I'll do next, I'll bring in another modality of GPT-4:
03:22 the ability to understand and extract insights from images or
03:26 pictures, and that's what we call vision.
03:28 And so what I'll do next here, I'll click this button, and I'll
03:31 take all six of these frames I sliced out of the Sora video, and
03:35 I'll send them to GPT-4 with vision to come up with a story.
03:39 So don't blink.
03:40 It goes fast.
03:41 There we go.
03:42 Every single time this is unique.
03:43 Again, this is all happening in real time as we speak.
03:46 So now out of these few frames, we were able to come up with that
03:50 story.
03:51 So this could be, for instance, the narration for a documentary,
03:55 or this could be something that you can adapt to your own use
03:57 cases and industries.
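Here is a minimal sketch of that vision step, sending the saved frames to a vision-capable GPT-4 model through the Chat Completions API. The model name and prompt are assumptions; check the current documentation for which vision models are available.

```python
# Minimal sketch: send the sliced frames to GPT-4 with vision and ask for
# a story. frame_0.jpg..frame_5.jpg come from the extraction sketch above.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# One text prompt followed by the six frames as base64 data URLs.
content = [{"type": "text",
            "text": "Write a short documentary-style narration telling the story of these frames."}]
for i in range(6):
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(f'frame_{i}.jpg')}"},
    })

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable GPT-4 model works here
    messages=[{"role": "user", "content": content}],
    max_tokens=500,
)
script = response.choices[0].message.content
print(script)
```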
04:00 But next, what if we were to bring this to life?
04:02 What if we were able to bring a voice onto that narration?
04:06 Now I'm sure some of you are already thinking ahead here.
04:09 So what I'll do next is I'll click here to create a voiceover,
04:12 and we now have amazing preset voices in our API,
04:17 and they are much more expressive.
04:19 We'll see that in a minute with the next demo.
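Those preset voices are exposed through the speech endpoint of the API. A minimal sketch, assuming `script` holds the narration generated in the vision step above:

```python
# Minimal sketch: synthesize the narration with a preset API voice.
# `script` is assumed to hold the story generated in the vision step.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",   # or "tts-1-hd" for higher quality
    voice="alloy",   # one of the preset voices: alloy, echo, fable, onyx, nova, shimmer
    input=script,
)
with open("narration.mp3", "wb") as f:
    f.write(speech.content)
```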
04:21 But here, I also wanted to show you a preview of our Voice
04:25 Engine model.
04:26 The Voice Engine model is the ability to take a short sample
04:29 of a voice and turn it into a voice you can use in your apps.
04:33 So this is only available to trusted partners for now, and
04:37 we're actively testing this with them.
04:39 We obviously take safety very seriously.
04:41 We wanted to show you a quick preview of what that could look
04:44 like.
04:45 So I'll go ahead, and here I'm going to record a few seconds of
04:48 my voice.
04:49 I'm extremely excited to be on stage at the Fortune Brainstorm
04:53 AI Conference today, and all I need is a few more seconds,
04:56 and I think we should be good to go for me to actually narrate
04:59 this beautiful video that Sora created.
05:02 All right, so now that this is...
05:04 I'll go ahead and record it right there.
05:06 I can just send the script that we created before.
05:09 Now I send this little sample to our Voice Engine, and we'll see
05:12 how all of these pieces can be combined together to create something
05:16 a little unique here.
05:18 So in a second, we should see that exact same video...
05:21 "In the heart of the German automotive industry..."
05:24 ...but now, narrated.
05:25 "...workers meticulously assemble the machines that..."
05:27 So, one more modality that you can play with, right?
05:30 "...Every bolt secures a legacy of engineering precision and
05:34 innovation."
05:35 Next, I could also do translation.
05:37 So as you can hear from my French accent, I can maybe also
05:41 narrate it in French.
05:42 So I'll go ahead and click French here.
05:44 "...unparalleled commitment to quality and efficiency..."
05:46 [narration switches to French] "...pour rejoindre le monde..." ("...to join the world...")
05:48 And now this is me narrating this video in French.
05:51 "...témoigne d'un engagement sans pareil envers la qualité..." ("...testifies to an unparalleled commitment to quality...")
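Voice Engine itself is limited to trusted partners, so here is a hedged sketch of the translation step using pieces that are generally available: a chat model to translate the script, and a preset voice to speak it in place of a cloned one.

```python
# Hedged sketch of the translation step. Voice Engine is not generally
# available, so this substitutes a preset voice; model and voice names
# are assumptions.
from openai import OpenAI

client = OpenAI()

translation = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's text into French."},
        {"role": "user", "content": script},  # `script` from the vision step
    ],
)
french_script = translation.choices[0].message.content

# Speak the French script with a preset voice instead of a cloned one.
french_audio = client.audio.speech.create(model="tts-1", voice="alloy", input=french_script)
with open("narration_fr.mp3", "wb") as f:
    f.write(french_audio.content)
```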
05:54 And last but not least, another modality that you can now tap
05:57 into from our multimodal AI models is the ability to bring in
06:01 speech to text as well.
06:03 So here, for instance, most of you don't speak French, but I
06:06 can actually turn on the captions, and all of a sudden,
06:09 we have this other modality of speech to text available.
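That captioning step maps to the transcription endpoint. A minimal sketch, assuming the French narration file from the previous sketch; requesting SRT output yields timestamped captions you can overlay on the video.

```python
# Minimal sketch: generate timestamped captions from the French narration
# with the whisper-1 transcription model.
from openai import OpenAI

client = OpenAI()

with open("narration_fr.mp3", "rb") as audio_file:
    captions = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt",  # SubRip format: numbered, timestamped caption blocks
    )
print(captions)
```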
06:13 So this is just one example of how you can now combine these
06:17 multimodal models like GPT-4 to create something very unique
06:21 that you can adapt to your own businesses and products and
06:25 services.
06:26 But next, I want to show you something else about real time
06:30 and our Realtime API.
06:32 First of all here, how many of you in the room have tried
06:34 ChatGPT with the advanced voice mode before?
06:37 Wow, quite a few of you.
06:38 That's awesome.
06:39 Well, you know, one question that we've heard so many times
06:42 over the past few months has been: how can I bring these
06:44 exact same experiences into my own products, into my own
06:48 services, whether it's to reinvent customer service at
06:51 scale or to create a new way to interface with
06:55 products?
06:56 Well, here, what I'm going to show you is a little sneak peek
06:59 behind the scenes.
07:00 So this is the OpenAI playground.
07:02 This is where developers and builders start their journey
07:06 with our models.
07:07 And so here, I can start a session at any time and actually
07:12 engage, to start to get a feel for how the Realtime API and
07:16 the speech-to-speech capabilities of GPT-4 come to life.
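Under the hood, the Realtime API is a WebSocket interface. Here is a minimal sketch of opening a session and requesting one streamed text response; the model name and event shapes follow the beta documentation and should be treated as illustrative. A real voice app would also stream microphone audio in and play back the audio deltas it receives.

```python
# Minimal sketch: open a Realtime API session over a WebSocket and stream
# back one text response. Model name and event shapes are illustrative.
import asyncio
import json
import os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer versions of the websockets library name this parameter
    # `additional_headers` instead of `extra_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a single response in text form.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Welcome the attendees with lots of energy.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```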
07:19 So let's try that.
07:22 Hi, there.
07:24 Hello, there.
07:25 How can I assist you today?
07:26 Yeah, I'm on stage at the Fortune Brainstorm AI conference.
07:29 Could you welcome the attendees for me?
07:32 Welcome, everyone, to the Fortune Brainstorm AI conference.
07:35 Okay.
07:36 So I like how you started, but I think I want you to bring more
07:38 energy, more enthusiasm, and don't forget the people watching on
07:41 the live stream.
07:43 Welcome, everyone, to the Fortune Brainstorm AI conference
07:47 at the fabulous St. Regis Hotel in San Francisco.
07:51 To our incredible attendees and viewers online, prepare for an
07:56 inspiring and groundbreaking event as we dive into the future
08:00 of AI.
08:01 Cool.
08:02 Now, could you whisper with a very low voice a secret about San
08:05 Francisco that maybe only the true locals could know, like
08:08 very, very low, like...
08:11 Did you know that underneath San Francisco's bustling streets,
08:16 there's a hidden network of tunnels?
08:19 I did not know that.
08:20 These tunnels were used during Prohibition to transport liquor
08:25 discreetly.
08:26 It's a little-known secret.
08:28 So as you can see, the ability to interrupt the model at any
08:31 time, the ability to jump back in and generate intonations,
08:35 all of that is at your disposal to create amazing ways to
08:38 interface between your brand, your product, and your own
08:41 customers.
08:42 Maybe one last example here.
08:44 I have a test for you here.
08:45 So here is the deal.
08:47 I'm going to speak to you in three different languages, and
08:50 I'd like you to speak back to me with those same three languages
08:53 mixed up.
08:54 Does that sound good?
08:55 Absolutely.
08:56 I'm ready for the challenge.
08:58 Let's hear it.
08:59 All right.
09:00 Ciao.
09:01 Parlo italiano. ("I speak Italian.")
09:02 Je suis français. ("I'm French.")
09:03 Très heureux d'être ici ("Very happy to be here") on stage, and I can switch back to
09:05 English at any time.
09:06 Your turn.
09:07 Ciao.
09:08 Je parle français aussi. ("I speak French too.")
09:10 And I'm switching to English now.
09:12 It's a pleasure to mix these languages on stage with you.
09:16 So this is the kind of power you now have with these
09:19 multimodal abilities for speech-to-speech.
09:22 Here is one example: if you want to imagine
09:24 briefly what that could look like, well, this is what
09:27 you can do.
09:29 You can now have voice as a modality to interface with, no
09:32 keyboard.
09:33 What if I were to say, hey, can you show me on the map where
09:36 we are?
09:38 Sure, Romain.
09:40 Showing your current location in San Francisco now.
09:42 Awesome.
09:43 I'm planning a trip to Paris.
09:44 Can you show that to me?
09:47 Here's Paris, a beautiful city with rich history.
09:50 Amazing.
09:51 I'd like to see amazing Christmas decorations when I'm there.
09:53 Can you show me a few examples?
09:58 Here are some spots in Paris with amazing Christmas
10:00 decorations:
10:02 Galeries Lafayette, the Champs-Élysées.
10:04 So this is one way you can imagine how we'll be able to
10:07 reinvent computer-to-human interfaces in the future with
10:11 multimodality.
10:12 And, of course, this is not the only paradigm shift that we're
10:15 excited about, because we've also launched the o1 series of
10:18 models that can now tackle very complex tasks with reasoning.
10:22 And we strongly believe that reasoning plus multimodality,
10:25 combined, will unleash AI agents next year.
10:29 And so we're extremely excited about these two paradigm shifts.
10:32 Thank you so much for your time.
10:34 And I can't wait to see what you build.
