Presenter: Romain Huet, Head of Developer Experience, OpenAI
Transcript
00:00 Thank you. Good afternoon, everyone. Super thrilled to be here today.
00:04 It's an interesting time in AI. The pace of change is really accelerating,
00:09 and there are more AI applications going from prototype to production every single day.
00:14 I'm Romain, and I lead developer experience at OpenAI.
00:17 And you know, a lot of people know us for ChatGPT.
00:19 But in fact, our first product was the OpenAI API,
00:22 the ability for developers and for companies to take our AI models
00:27 and bring them into their own products and services.
00:30 And today, we're very fortunate to work with more than 3 million developers on the platform,
00:34 and 92% of the Fortune 500 actually use our products.
00:37 So it's an amazing time for all of us to be here together, building in AI.
00:42 You know, one of my favorite things in my role day to day
00:46 is to work with amazing founders and companies who are pushing the envelope of these models every day
00:51 and also showing them the art of the possible with what we put out there in the world.
00:56 So in that spirit, instead of having any presentation slides today,
01:00 let's jump to my laptop and take a look at some live demos.
01:05 So first off, I'd be remiss not to mention Sora.
01:08 We just launched Sora this morning, our video generation model,
01:13 and it's now available on Sora.com.
01:15 So what you're looking at here is the homepage with featured videos
01:19 already created in the last few hours by the community.
01:22 It's pretty amazing.
01:23 In fact, we can fire off maybe a first video generation here.
01:27 I'll say a golden retriever dressed as Santa on Christmas Day.
01:33 You know, sounds pretty cool.
01:36 There is an enormous amount of demand at the moment for Sora.
01:39 So it sounds like, you know, the service is already at capacity.
01:42 That's okay.
01:43 You can check it out later, you know.
01:46 And these are some of the beautiful videos that people are already creating.
01:50 So now, in fact, to show you something more concrete, but with a wink to
01:54 Sora, I'm going to bring up something about multimodality.
01:59 Multimodality is what we refer to as the ability for AI models to not
02:04 just work with text but also work with speech, with audio, with vision,
02:10 and also with video, as you've seen with Sora.
02:13 So I wanted to bring you a demo here of how we can combine all of
02:17 these modalities together to create something wholly unique.
02:21 So this is the only part of the demo that is not live.
02:24 This is a prompt, which you can read on the screen in great
02:27 detail, about the history of the German automotive industry.
02:33 And so if I click here, this is the video that came out of Sora.
02:45 So what's pretty amazing here is that all we did was stitch
02:48 together some scenes out of Sora and add a little bit of a
02:51 soundtrack on top; all of the images are untouched.
02:55 This is straight out of the model.
02:57 It's pretty magical.
02:59 So now let's say we want to use such a video as the basis for a
03:03 presentation, and what if we want to tell a story that these images
03:07 and this video can tell us?
03:09 So what I'll do next is I'll click this button, extract frames, and
03:13 I'll slice a few frames out of this video that Sora generated.
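To make that step concrete for builders: here is a minimal sketch of slicing evenly spaced frames out of a local video file with OpenCV. The file name and frame count are placeholder assumptions, not details from the demo itself.

```python
# Minimal sketch: slice evenly spaced frames out of a video with OpenCV.
# "sora_video.mp4" and the frame count are placeholder assumptions.
import cv2

def extract_frames(video_path: str, num_frames: int = 6) -> list:
    """Grab num_frames evenly spaced frames and save them as JPEGs."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        # Seek to an evenly spaced position in the video, then read one frame.
        capture.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = capture.read()
        if ok:
            cv2.imwrite(f"frame_{i}.jpg", frame)
            frames.append(frame)
    capture.release()
    return frames

extract_frames("sora_video.mp4")
```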
03:18 And now what I'll do next, I'll bring in another modality of GPT-4:
03:22 the ability to understand and extract insights from images or
03:26 pictures, and that's what we call vision.
03:28 And so what I'll do next here, I'll click this button, and I'll
03:31 take all six of these frames I sliced out of the Sora video, and
03:35 I'll send them to GPT-4 with vision to come up with a story.
03:39 So don't blink.
03:40 It goes fast.
03:41 There we go.
03:42 Every single time this is unique.
03:43 Again, this is all happening in real time as we speak.
03:46 So now out of these few frames, we were able to come up with that
03:50 story.
03:51 So this could be, for instance, the narration for a documentary,
03:55 or this could be something that you can adapt to your own use
03:57 cases and industries.
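Here is a minimal sketch of that vision step, sending the saved frames to a vision-capable GPT-4 model through the Chat Completions API. The model name and prompt are assumptions; check the current documentation for which vision models are available.

```python
# Minimal sketch: send the sliced frames to GPT-4 with vision and ask for
# a story. frame_0.jpg..frame_5.jpg come from the extraction sketch above.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# One text prompt followed by the six frames as base64 data URLs.
content = [{"type": "text",
            "text": "Write a short documentary-style narration telling the story of these frames."}]
for i in range(6):
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(f'frame_{i}.jpg')}"},
    })

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable GPT-4 model works here
    messages=[{"role": "user", "content": content}],
    max_tokens=500,
)
script = response.choices[0].message.content
print(script)
```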
04:00 But next, what if we were to bring this to life?
04:02 What if we were able to bring a voice onto that narration?
04:06 Now I'm sure some of you are already thinking ahead here.
04:09 So what I'll do next is I'll click here to create a voiceover,
04:12 and we now have amazing preset voices in our API,
04:17 and they are much more expressive.
04:19 We'll see that in a minute with the next demo.
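Those preset voices are exposed through the speech endpoint of the API. A minimal sketch, assuming `script` holds the narration generated in the vision step above:

```python
# Minimal sketch: synthesize the narration with a preset API voice.
# `script` is assumed to hold the story generated in the vision step.
from openai import OpenAI

client = OpenAI()

speech = client.audio.speech.create(
    model="tts-1",   # or "tts-1-hd" for higher quality
    voice="alloy",   # one of the preset voices: alloy, echo, fable, onyx, nova, shimmer
    input=script,
)
with open("narration.mp3", "wb") as f:
    f.write(speech.content)
```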
04:21 But here, I also wanted to show you a preview of our Voice
04:25 Engine model.
04:26 The Voice Engine model is the ability to take a short sample
04:29 of a voice and turn it into a voice you can use in your apps.
04:33 So this is only available to trusted partners for now, and
04:37 we're actively testing this with them.
04:39 We obviously take safety very seriously.
04:41 We wanted to show you a quick preview of what that could look
04:44 like.
04:45 So I'll go ahead, and here I'm going to record a few seconds of
04:48 my voice.
04:49 I'm extremely excited to be on stage at the Fortune Brainstorm
04:53 AI Conference today, and all I need is a few more seconds,
04:56 and I think we should be good to go for me to actually narrate
04:59 this beautiful video that Sora created.
05:02 All right, so now that this is...
05:04 I'll go ahead and record it right there.
05:06 I can just send the script that we created before.
05:09 Now I send this little sample to our Voice Engine, and we'll see
05:12 how all of these pieces can be combined together to create something
05:16 a little unique here.
05:18 So in a second, we should see that exact same video...
05:21 "In the heart of the German automotive industry..."
05:24 ...but now, narrated.
05:25 "...workers meticulously assemble the machines that..."
05:27 So, one more modality that you can play with, right?
05:30 "...Every bolt secures a legacy of engineering precision and
05:34 innovation."
05:35 Next, I could also do translation.
05:37 So as you can hear from my French accent, I can maybe also
05:41 narrate it in French.
05:42 So I'll go ahead and click French here.
05:44 "...unparalleled commitment to quality and efficiency..."
05:46 [narration switches to French] "...pour rejoindre le monde..." ("...to join the world...")
05:48 And now this is me narrating this video in French.
05:51 "...témoigne d'un engagement sans pareil envers la qualité..." ("...testifies to an unparalleled commitment to quality...")
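Voice Engine itself is limited to trusted partners, so here is a hedged sketch of the translation step using pieces that are generally available: a chat model to translate the script, and a preset voice to speak it in place of a cloned one.

```python
# Hedged sketch of the translation step. Voice Engine is not generally
# available, so this substitutes a preset voice; model and voice names
# are assumptions.
from openai import OpenAI

client = OpenAI()

translation = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's text into French."},
        {"role": "user", "content": script},  # `script` from the vision step
    ],
)
french_script = translation.choices[0].message.content

# Speak the French script with a preset voice instead of a cloned one.
french_audio = client.audio.speech.create(model="tts-1", voice="alloy", input=french_script)
with open("narration_fr.mp3", "wb") as f:
    f.write(french_audio.content)
```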
05:54 And last but not least, another modality that you can now tap
05:57 into from our multimodal AI models is the ability to bring in
06:01 speech to text as well.
06:03 So here, for instance, most of you don't speak French, but I
06:06 can actually turn on the captions, and all of a sudden,
06:09 we have this other modality of speech to text available.
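That captioning step maps to the transcription endpoint. A minimal sketch, assuming the French narration file from the previous sketch; requesting SRT output yields timestamped captions you can overlay on the video.

```python
# Minimal sketch: generate timestamped captions from the French narration
# with the whisper-1 transcription model.
from openai import OpenAI

client = OpenAI()

with open("narration_fr.mp3", "rb") as audio_file:
    captions = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="srt",  # SubRip format: numbered, timestamped caption blocks
    )
print(captions)
```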
06:13 So this is just one example of how you can now combine these
06:17 multimodal models like GPT-4 to create something very unique
06:21 that you can adapt to your own businesses and products and
06:25 services.
06:26 But next, I want to show you something else about real time
06:30 and our Realtime API.
06:32 First of all here, how many of you in the room have tried
06:34 ChatGPT with the advanced voice mode before?
06:37 Wow, quite a few of you.
06:38 That's awesome.
06:39 Well, you know, one question that we've heard so many times
06:42 over the past few months has been: how can I bring these
06:44 exact same experiences into my own products, into my own
06:48 services, whether it's to reinvent customer service at
06:51 scale or to create a new way to interface with
06:55 products?
06:56 Well, here, what I'm going to show you is a little sneak peek
06:59 behind the scenes.
07:00 So this is the OpenAI playground.
07:02 This is where developers and builders start their journey
07:06 with our models.
07:07 And so here, I can start a session at any time and actually
07:12 engage, to start to get a feel for how the Realtime API and
07:16 the speech-to-speech capabilities of GPT-4 come to life.
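Under the hood, the Realtime API is a WebSocket interface. Here is a minimal sketch of opening a session and requesting one streamed text response; the model name and event shapes follow the beta documentation and should be treated as illustrative. A real voice app would also stream microphone audio in and play back the audio deltas it receives.

```python
# Minimal sketch: open a Realtime API session over a WebSocket and stream
# back one text response. Model name and event shapes are illustrative.
import asyncio
import json
import os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer versions of the websockets library name this parameter
    # `additional_headers` instead of `extra_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a single response in text form.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Welcome the attendees with lots of energy.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```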
07:19 So let's try that.
07:22 Hi, there.
07:24 Hello, there.
07:25 How can I assist you today?
07:26 Yeah, I'm on stage at the Fortune Brainstorm AI conference.
07:29 Could you welcome the attendees for me?
07:32 Welcome, everyone, to the Fortune Brainstorm AI conference.
07:35 Okay.
07:36 So I like how you started, but I think I want you to bring more
07:38 energy, more enthusiasm, and don't forget the people watching on
07:41 the live stream.
07:43 Welcome, everyone, to the Fortune Brainstorm AI conference
07:47 at the fabulous St. Regis Hotel in San Francisco.
07:51 To our incredible attendees and viewers online, prepare for an
07:56 inspiring and groundbreaking event as we dive into the future
08:00 of AI.
08:01 Cool.
08:02 Now, could you whisper with a very low voice a secret about San
08:05 Francisco that maybe only the true locals could know, like
08:08 very, very low, like...
08:11 Did you know that underneath San Francisco's bustling streets,
08:16 there's a hidden network of tunnels?
08:19 I did not know that.
08:20 These tunnels were used during Prohibition to transport liquor
08:25 discreetly.
08:26 It's a little-known secret.
08:28 So as you can see, the ability to interrupt the model at any
08:31 time, the ability to jump back in and generate intonations,
08:35 all of that is at your disposal to create amazing ways to
08:38 interface between your brand, your product, and your own
08:41 customers.
08:42 Maybe one last example here.
08:44 I have a test for you here.
08:45 So here is the deal.
08:47 I'm going to speak to you in three different languages, and
08:50 I'd like you to speak back to me with those same three languages
08:53 mixed up.
08:54 Does that sound good?
08:55 Absolutely.
08:56 I'm ready for the challenge.
08:58 Let's hear it.
08:59 All right.
09:00 Ciao.
09:01 Parlo italiano. ("I speak Italian.")
09:02 Je suis français. ("I'm French.")
09:03 Très heureux d'être ici ("Very happy to be here") on stage, and I can switch back to
09:05 English at any time.
09:06 Your turn.
09:07 Ciao.
09:08 Je parle français aussi. ("I speak French too.")
09:10 And I'm switching to English now.
09:12 It's a pleasure to mix these languages on stage with you.
09:16 So this is the kind of power you now have with these
09:19 multimodal abilities for speech-to-speech.
09:22 Here is one example: if you want to imagine
09:24 briefly what that could look like, well, this is what
09:27 you can do.
09:29 You can now have voice as a modality to interface with, no
09:32 keyboard.
09:33 What if I were to say, hey, can you show me on the map where
09:36 we are?
09:38 Sure, Romain.
09:40 Showing your current location in San Francisco now.
09:42 Awesome.
09:43 I'm planning a trip to Paris.
09:44 Can you show that to me?
09:47 Here's Paris, a beautiful city with rich history.
09:50 Amazing.
09:51 I'd like to see amazing Christmas decorations when I'm there.
09:53 Can you show me a few examples?
09:58 Here are some spots in Paris with amazing Christmas
10:00 decorations:
10:02 Galeries Lafayette, the Champs-Élysées.
10:04 So this is one way you can imagine how we'll be able to
10:07 reinvent computer-to-human interfaces in the future with
10:11 multimodality.
10:12 And, of course, this is not the only paradigm shift that we're
10:15 excited about, because we've also launched the o1 series of
10:18 models that can now tackle very complex tasks with reasoning.
10:22 And we strongly believe that reasoning plus multimodality,
10:25 combined, will unleash AI agents next year.
10:29 And so we're extremely excited about these two paradigm shifts.
10:32 Thank you so much for your time.
10:34 And I can't wait to see what you build.
