How will multimodal AI transform our technology and how we interact with AI? This conversation from Imagination In Action's 'Forging the Future of Business with AI' Summit in April 2024 explores how this advanced AI could unlock the next wave of technological advances.

Transcript
00:00 Welcome, welcome, welcome.
00:03 Let's get started.
00:06 This is the Multimodal AI Revolution panel.
00:09 We have a very exciting conversation for you here today.
00:13 We'll be discussing what's coming up next for Multimodal AI.
00:18 Quick definition for Multimodal AI is, it's a-
00:22 Shh.
00:27 I think that worked actually. All right.
00:31 So in this conversation, we'll be covering what Multimodal AI is,
00:36 whether it can augment human lives,
02:38 how wearables can leverage it,
00:40 and we'll also discuss some ethical and
00:42 technical challenges that surround the space.
00:45 A quick definition for Multimodal AI is,
00:49 it's a machine learning model that's capable of
00:51 processing images, videos, and text,
00:55 and it can do other forms of modality as well.
00:57 An example of this currently is using GPT Vision,
01:01 where you can give it a picture of the ingredients you have access to,
01:05 and it can create a recipe for you.
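As a rough illustration of the GPT Vision example above, here is a minimal sketch using the OpenAI Python SDK; the model name, prompt wording, and file path are illustrative assumptions, not something described on the panel.

```python
# Minimal sketch of the "photo of ingredients -> recipe" idea above.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable; model and prompt are illustrative.
import base64
from openai import OpenAI

client = OpenAI()

# Encode a local photo of the ingredients as a base64 data URL.
with open("ingredients.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "These are the ingredients I have. Suggest one recipe I can make with them."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```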
01:07 So let's get started. Let's do a quick round of introductions.
01:11 Talk about, introduce yourself,
01:13 what are you working on,
01:14 and what is one ability of Multimodal AI that you find most exciting?
01:19 Let's start with you, Tyson.
01:20 Yeah, absolutely.
01:22 Yeah, my name is Tyson.
01:23 I'm one of the co-founders of Avoca AI.
01:27 It was actually a company that my co-founder,
01:32 Apoorv, and I started a bit over two years ago.
01:34 We were students at MIT over seven,
01:39 eight years ago, class of 2017,
01:41 and actually did a lot of research at the Media Lab.
01:44 So great to be back today.
01:46 Avoca leverages Voice AI to build
01:50 the world's most advanced receptionist for a lot of antiquated industries,
01:56 including home services.
01:58 So these are electricians, plumbers, HVAC,
02:01 the people that you probably think would be the last people to be utilizing AI.
02:07 But yeah, that's what we're working on in a nutshell.
02:11 In terms of Multimodal,
02:13 I think the area that I'm most excited about is
02:17 the ability to actually incorporate
02:22 not just text, but the emotional aspect.
02:26 And understand, because in the world of HVAC and plumbing,
02:31 it's not just customer service you're dealing with.
02:34 You actually need to make a sale.
02:36 And in order to make a sale and be convincing,
02:40 we need AI that cannot just understand what people are saying,
02:43 but the nonverbal stuff and the stuff around what they actually mean.
02:48 Because that's actually a lot of where the interesting parts are.
02:53 Exciting. James?
02:55 Hi everyone, I'm James.
02:57 Currently running DevEx,
03:00 Developer Relations at 12 Labs.
03:02 And our company is building Multimodal AI for video understanding.
03:07 So back to Ayush's definition of Multimodal,
03:11 just like how a baby is trying to acquire knowledge.
03:16 They read text, they hear sounds, feel the emotions,
03:19 smell the odors; all these different senses and all that is coming in.
03:24 We're trying to build the kind of foundation models that are doing the same,
03:28 interpreting the visual element, the speech element,
03:32 as well as the text element inside a video.
03:34 And then come up with a comprehensive representation of that video.
03:39 And if you think about the total data in the world,
03:43 I'd say more than 80% is unstructured data,
03:46 and more than 80% of that is actually video data.
03:48 And unlike text and image, video is a very challenging thing to tackle
03:53 because of the temporal dimension, how things move over time,
03:56 the consistency between visual and speech and text.
04:01 So we're trying to build the type of AI that can tackle these challenging theoretical problems.
04:06 In terms of use cases, I think we have a lot of different verticals,
04:11 ranging from sports to media and entertainment,
04:15 to e-learning and even security, surveillance, healthcare.
04:19 People building video search tools to find interesting moments
04:23 in a football game or a baseball game.
04:25 They use our tool to quickly edit video to make a new TV show or movie trailer.
04:32 They even use our tool to find weapons or violence in police body cam footage.
04:40 So I think any industry that requires a lot of video data
04:44 can benefit from the type of multimodal understanding that we do.
04:49 Hi, can you hear me? Is that working?
04:52 My name is Lexie Mills. I'm a digital communication specialist.
04:56 We focus on emerging technologies.
04:58 So anything where there isn't a word or people aren't searching for a word,
05:02 it's our job to help people use the word, understand it.
05:06 On the other side of what we do, we have a foundation that looks a lot at digital ethics
05:12 and more recently in digital forensics.
05:15 So over the last three, four years, we've been using our skills in an inverse way
05:21 to mine data and information for different types of abuse cases,
05:26 which are typically quite hard to prosecute.
05:28 Whereas now we can get huge amounts of data using free off-the-shelf AI tools
05:34 to be able to prosecute cases that previously would have just slipped under the radar
05:38 due to lack of evidence.
05:40 Hey everyone, my name is Apoorv.
05:44 I'm the other co-founder of Avoka that Tyson previously mentioned.
05:48 So yeah, just to recap, we're like a receptionist for these home service businesses.
05:52 I think Tyson covered most of it, on what's exciting and what we're actually working on.
05:56 I would say the big thing to maybe emphasize around where AI is headed now:
06:01 we've always seen a lot of these customer support startups
06:04 working on AI, infiltrating different companies.
06:07 What we are working on at Avoka and I think what we're starting to see happen more
06:10 is how that can infiltrate sales.
06:12 Sales requires a lot more emotions, it requires a lot more understanding of the human nature.
06:17 I don't think AI can do all of sales, but it can significantly improve it.
06:21 And so yeah, that's essentially what we're working on.
06:24 It's quite exciting.
06:25 Very exciting.
06:26 James, so you work for a company called 12 Labs.
06:30 It basically helps understand video.
06:32 So right now if I go to YouTube and I do a search, it's semantic and it'll try to find
06:36 the exact transcript or find the keyword.
06:39 But what your company does is it finds snippets from the video.
06:42 So I can just ask and be like, where was that robot?
06:46 Which part of the movie did that robot come in?
06:48 And 12 Labs will help me identify it.
06:51 Is that correct?
06:52 Yeah, that's correct.
06:53 I think our hypothesis is video understanding has not evolved a lot over the past decade.
06:59 The way research tackled that is they build specific computer vision models optimized for a very
07:06 specific task like keypoint estimation, object detection, semantic segmentation, et cetera.
07:12 They generate metadata or text from the video and then when they say perform search, they
07:16 actually do keyword search or metadata search based on that text or transcript.
07:22 But that cannot capture the visual element of the video, and it can be totally disconnected
07:28 from what's actually happening.
07:29 And so with the rise of Transformer and the versatility of multi-modal data, we can create
07:37 basically embeddings from this video, which is like a vector representation of the video
07:42 content.
07:43 And when you perform search, you actually do semantic search on that video embedding space
07:46 and the result is much more holistic and native to the way models learn.
07:53 So I think that's the future.
07:56 Just like how Transformer transformed NLP, we're seeing the same thing happening with
08:01 video Transformer transforming video understanding.
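To make the embedding-based search James describes more concrete, here is a small hedged sketch. It assumes you already have clip-level embeddings for a video from some multimodal encoder; the `embed_text` helper in the usage comment is a hypothetical stand-in, not the 12 Labs API.

```python
# Hedged sketch: semantic search over precomputed video clip embeddings.
# The embeddings would come from a multimodal video encoder; nothing here
# is specific to 12 Labs' actual API.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_video(query_embedding: np.ndarray,
                 clip_embeddings: list[np.ndarray],
                 clip_timestamps: list[tuple[float, float]],
                 top_k: int = 3):
    """Rank clips by similarity to the query; return (start, end, score) tuples."""
    scores = [cosine_similarity(query_embedding, e) for e in clip_embeddings]
    ranked = sorted(zip(clip_timestamps, scores), key=lambda x: x[1], reverse=True)
    return [(start, end, score) for (start, end), score in ranked[:top_k]]

# Usage (hypothetical helpers): find where a robot appears in a movie.
# query = embed_text("a robot appears on screen")
# for start, end, score in search_video(query, clip_embeddings, clip_timestamps):
#     print(f"{start:.1f}s-{end:.1f}s  score={score:.3f}")
```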
08:04 Awesome.
08:05 And Tyson, so this ties to the work you guys are doing where you're trying to identify
08:09 non-verbal communication.
08:11 I read a stat somewhere that said 80% of communication is non-verbal.
08:15 So the way I'm moving my hands, the way you're looking, your facial features changing, et
08:21 cetera.
08:22 Are you capturing the video footage as well?
08:24 Because I know right now you're doing just voice calls, but at some point do you plan to
08:28 capture video footage as well, using something like 12 Labs to get that visual context for
08:33 emotional intelligence?
08:34 Yeah.
08:35 So right now we're primarily or almost exclusively working purely in the voice realm because
08:42 remember, most of our customers are antiquated industry folks like home services and they
08:50 unfortunately don't have the luxury of getting their customers to call in on Zoom.
08:55 And so it's all purely phone communication.
08:59 But even within phone communication, there's so much that is not captured just simply from
09:04 transcribing that and analyzing the words.
09:08 There's the tonality, whether the customer is angry, upset.
09:13 One thing that we're really keen on is measuring the customer sentiment at the beginning of the
09:18 call, then seeing what the customer sentiment is
09:23 at the end of the call, and what that delta is.
09:27 And that's a good metric for us to determine whether or not we did a good job with improving
09:32 the customer's day and talking to them.
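As a rough sketch of the sentiment-delta metric Tyson describes, the snippet below scores the opening and closing customer turns of a call with an off-the-shelf sentiment model and reports the difference; the model choice, window size, and score mapping are assumptions, not Avoca's actual pipeline.

```python
# Hedged sketch: compare customer sentiment at the start vs. end of a call.
# Uses the Hugging Face `transformers` sentiment pipeline as a stand-in
# for whatever production model is actually used.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

def signed_score(text: str) -> float:
    """Map the pipeline output to a signed score in [-1, 1]."""
    result = sentiment(text[:512])[0]  # crude truncation to stay under the model limit
    sign = 1.0 if result["label"] == "POSITIVE" else -1.0
    return sign * result["score"]

def sentiment_delta(customer_turns: list[str], window: int = 3) -> float:
    """Difference between sentiment in the last and first few customer turns."""
    opening = " ".join(customer_turns[:window])
    closing = " ".join(customer_turns[-window:])
    return signed_score(closing) - signed_score(opening)

# A positive delta suggests the call left the customer better off than it found them.
```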
09:36 So this is for you, Apoorv, and for you, Tyson, as well.
09:39 We've had sentiment analysis for a while, right?
09:43 Do you think the models now have just made it 10 times better?
09:47 What is the difference you're seeing with what's happening now versus sentiment analysis
09:51 and natural language stuff we had in 2015?
09:53 Yeah, I think maybe I can add to that.
09:55 I think there's two things.
09:56 I think one, I think the analysis of text has definitely 10xed in terms of our ability
10:01 to do that over the last several years.
10:02 But I think the more important and bigger thing that's going to be emerging is actually
10:06 models that do not even look at text.
10:08 They're focused more on the sound that the human is making on the other side.
10:12 And so I think there's another company that's actually exclusively focused on this,
10:15 Hume.ai, which is worth checking out.
10:17 But essentially, what you can do is you can actually train models now that go off of a
10:20 base layer where they can go and actually hear what the person is saying and be like,
10:24 "Okay, is this more likely to be high energy or is this more likely to be low energy?"
10:28 And I think understanding that gives us the next wave of unlocking voice applications.
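To illustrate the kind of non-text signal Apoorv mentions, here is a minimal sketch that ignores the words entirely and looks only at the audio: it uses librosa to compute frame-level loudness and labels a clip "high energy" or "low energy" with a crude threshold. The features and thresholds are illustrative assumptions; real systems learn these from labeled data rather than hand-set rules.

```python
# Hedged sketch: label a voice clip as high- or low-energy from audio alone.
# Thresholds are arbitrary illustrations, not tuned values.
import librosa
import numpy as np

def energy_label(wav_path: str,
                 loudness_threshold: float = 0.05,
                 variation_threshold: float = 0.01) -> str:
    y, sr = librosa.load(wav_path, sr=16000)          # mono audio at 16 kHz
    rms = librosa.feature.rms(y=y)[0]                 # frame-level loudness
    variation = float(np.mean(np.abs(np.diff(rms))))  # how much loudness fluctuates
    if np.mean(rms) > loudness_threshold or variation > variation_threshold:
        return "high energy"
    return "low energy"

# print(energy_label("caller.wav"))
```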
10:33 This is very interesting because Hume.ai and I, we had a conversation after a hackathon
10:37 because I was trying to build this thing which would give ChatGPT emotional intelligence.
10:41 And I'm using ChatGPT to teach me stuff.
10:46 And I teach, myself, so when I'm talking to a student, I can tell this person's losing
10:50 interest, or this is too hard for them, or this is too easy.
10:53 And I can change the content I'm sharing with them.
10:56 Similarly, I was trying to use my facial features as a modality which I can give to ChatGPT
11:02 through a GPT plugin.
11:04 And that's when I came across Hume.ai.
11:05 The work they're doing is around basically giving emotional intelligence, which is a
11:10 whole set of modalities to AI.
11:14 Do you think that has, would you be able to integrate that into work, into your startup?
11:20 And what are the implications of that going to be like for you?
11:23 Yeah, I mean, tremendous.
11:24 I think one of the things that, even obviously with a voice agent, I think one of the most
11:33 common problems is that the voice agent is not able to actually understand how the end
11:39 customer is feeling.
11:41 And so when it comes to elevating the level to actually sales, when someone has 10 options
11:48 for who they want to install a new HVAC unit, the cues around, are they actually interested
11:55 in buying?
11:56 Do they want to hear about all the upgrades?
11:59 Or are they just someone that just wants to get the cheapest option?
12:03 Being able to decipher that and then navigate the conversation from there is extremely important.
12:08 That is very interesting.
12:10 And Lexi, in your previous jobs, you've worked as a head of communications.
12:14 And on your LinkedIn, I saw this in your bio and I thought this was incredible.
12:18 It said, "Lexi combines technical search knowledge with psychology to create data-driven, measurable
12:23 communication strategies that maximize influence on human behavior."
12:27 You think tactics like this, where we're using more than what we naturally know, and we're
12:34 augmenting our lives through AI, is going to have significant impact on human communication?
12:38 Yes, definitely.
12:39 You know, there's almost no AI that isn't somewhat trained on internet data.
12:44 And the thing is, Google's objective is primarily to give us what we want and as fast as possible.
12:51 But what we want and what we need can often be very different.
12:56 And so I do a lot of work in debt management.
12:58 I'm bizarrely bonkers about how we communicate around debt.
13:02 And when someone types in "get rid of debt," they'll get different search results compared
13:07 to someone who uses good language or good grammar.
13:10 But that omits what level of fear they are in at that point in time.
13:15 And we could be regulating advertising based on someone's emotional state.
13:19 So they're making emotionally intelligent decisions, not emotionally deficient ones.
13:25 And if we take it a little bit further, if you think about something like lung cancer
13:30 survival statistics, you're either researching that because you're a researcher, or you're
13:36 most likely researching that because you know someone with it.
13:40 Now getting the statistics isn't super helpful unless you have the context.
13:44 You know, there are several tests you need to interpret that data.
13:48 Getting the information fast actually isn't even giving you accurate information.
13:52 It's not giving you the context to digest it.
13:55 And knowing who's online that you could speak to, that coming up first, giving you a warning,
14:01 actually this information won't be helpful unless you understand X and Y, means that
14:05 people's entire search journey becomes more intelligent.
14:09 And then we're going to be looking at how we optimize for that thereafter, because the
14:13 structure of the internet feeds directly into what we see being optimized in certain LLMs.
14:18 It's interesting.
14:20 From an ethics standpoint, do you think it is right to be analyzing someone in this
14:27 level of detail, where I'm getting, you know, micro changes in their facial features, and
14:32 I'm able to decipher what they might be thinking deep down?
14:36 You think that's ethical?
14:38 I think there are ethical challenges to it, but I think it's also unethical to not be
14:42 doing so.
14:43 Right?
14:44 Right now, a lot of the ad tech is coded to take advantage of emotional states that we
14:48 understand through language and time of search as well.
14:52 And so by not doing it, we have ethical concerns.
14:55 It's just that we're already in that flow, so we're not questioning it.
14:58 We tend to question new problems, new challenges, new technology.
15:02 But actually, a lot of the challenges we see with new technology have existed previously.
15:08 That's interesting.
15:11 One of the things that I've been...
15:12 I'm guessing you've been following the New York Times suing OpenAI for using the text
15:18 they've generated, their articles, as training data.
15:22 You anticipate issues with, you know, Marvel Studios coming or Universal coming and going
15:27 to Sora, OpenAI, and being like, "Listen, you used our training data to generate these
15:33 videos."
15:34 Do you think that could be an issue, starting with you?
15:37 I think there are going to be issues.
15:38 I mean, the New York Times has had quite an issue with bias over the years.
15:42 And so I think there's what we're seeing people convey as fears, and then what the
15:46 underlying fears are.
15:48 There's a great book called The Grey Lady Winked, which is about the historical bias
15:52 across the New York Times.
15:54 And we've seen it in all news.
15:57 Search is biased.
15:58 So the content it's drawing from is also biased.
16:02 And you've got double bias, and you've got double bias scaled.
16:05 And then you've got copyright issues thereafter.
16:09 From a business standpoint, if you want to be really crude and vicious, yeah, you should
16:14 probably stop people learning from your content just to protect anything else that could be
16:18 revealed within it, not just to protect your financial interests.
16:24 But that has ethical implications too.
16:27 I do think there is going to be significant change in how we process conversations and
16:32 how we make decisions.
16:34 James, are you seeing any interesting work coming out in this space with new models that
16:40 are being trained right now or that are being released?
16:43 Yeah, yeah, for sure.
16:45 Big companies, obviously, are doing it right now.
16:48 I think we have a couple of folks from Gemini at the summit.
16:53 And we can talk about GPT-4V, Sora, Anthropic; even Claude got vision capabilities.
17:00 And then within the startup ecosystem, competing with us, there's Adev, Redcar, and I think even Hugging
17:10 Face started releasing open-source vision-language models.
17:14 In the academic open-source community, I think the most popular one is LLaVA.
17:19 And there are multiple versions of that.
17:22 And I think there's new research coming out from academia all the time.
17:26 And if you're interested in learning more, just try to check out conferences like
17:31 CVPR or ICML.
17:34 Yeah, those are very powerful.
17:37 I think internally at 12 Labs, we're also in the process of building more and more
17:42 video foundation models, video-language models that can enable these interesting use cases.
17:46 And I think the best feeling is when developers and users are actually using our models for
17:52 real-world use cases.
17:54 And last year, we hosted a hackathon, actually with 11 Labs.
17:58 It's a funny name, 11 Labs and 12 Labs.
18:01 But we focused on multi-modal AI.
18:03 And a lot of the hackers were building interesting applications, from e-learning to
18:08 social impact use cases.
18:10 And that's actually how I got connected with Ayush, because he was on the winning team
18:15 of the hackathon.
18:16 And we got to stay in touch and see just how much the field has changed over
18:23 the past six months.
18:24 Yeah, it is.
18:26 So I'll go to Tyson and Apoorva next, and we'll talk about the challenges.
18:30 But just reflecting real quick, James and I met at that hackathon.
18:35 And we were trying to-- I realized I was watching lecture videos, and I would tend to zone out
18:40 a fair bit.
18:41 And I realized different people have different interests where they're focused completely
18:46 in certain areas where they zone out.
18:48 So we used an EEG headset to measure brainwaves and build this knowledge graph, and then use
18:54 transformers to literally make any part of the lecture that's not exciting, exciting
18:59 for the things you care about.
19:01 And then that was enabled by 12 Labs and 11 Labs, which made it easier for us to generate
19:06 voices.
19:07 So we had Steve Jobs coming out and asking us a question and being like, hey, are you
19:11 losing interest?
19:12 Come back in, and trying to bring us back in.
19:15 So Apoorva and Tyson, you guys are using this in production for your company.
19:20 What are some challenges you're seeing right now that are preventing you from making it
19:27 highly scalable where everyone else can use it?
19:30 I think-- maybe I can start there.
19:32 I think right now with how voice AI is and where we're at with the product in terms of
19:37 understanding human emotions, being able to be emotive back and sound human-like, I think
19:42 the first 20 to 30 seconds of a conversation can be very well done by an AI.
19:47 The AI can essentially understand the human's problem, understand what to do next.
19:52 Should you be closing this person on a sale?
19:54 Should you be answering some kind of question?
19:56 Should you be routing it to someone else?
19:58 That part, I think we're at a place where AI is actually better than a human, because
20:02 the AI will always pick up every call within one second.
20:05 Where I think we are seeing challenges with us, and I think generally in the industry,
20:09 is the part after that.
20:10 So for our use case, pretend that your sink just broke and it's just flooding with water
20:16 and you need to get that repaired.
20:17 If you go call and you hear something that's robotic answering you for a minute or two
20:21 minutes, you're going to start getting very agitated.
20:23 You're like, OK, please transfer me to a human.
20:25 This is a serious issue.
20:26 I don't want to waste time talking to a robot.
20:29 And I think that shift starts happening after that 20 to 30 second mark.
20:32 And so what we need to see right now is for the AI to be much smarter in terms of understanding
20:38 their services, understanding when technicians can come out, or how to actually solve the
20:43 end customer problem.
20:45 And for that change, I think you still need to be a little bit more understanding
20:50 of human emotions, being able to empathize with the customer, circling back with them.
20:54 And so we're not quite there yet.
20:56 Is there anything you'd like to add to this?
21:00 Yeah, I think that's the primary one.
21:03 I mean, the other one is just that, you know, before I started Avoca, I was working
21:09 at a self-driving car company, Nuro.
21:12 And one of the innate biases, or challenges, is that, you know, people have a fundamental
21:19 distrust of AI.
21:21 And so even, you know, at Nuro we saw, our miles
21:28 per critical disengagement, which is like the self-driving golden metric,
21:33 you know, there were times, in certain areas, where our
21:39 MPCD was better than the human average.
21:43 But people are still afraid, because as soon as an AI makes a mistake, you know, they're
21:48 upset.
21:50 So we're running into the same thing at Avoca where sometimes the AI may be better at solving
21:55 their problem, but because they've had so many bad experiences with, you know, AI, you
22:02 know, phone AI and IVRs and stuff in the past, they're just starting at a baseline where
22:09 they have a fundamental distrust.
22:11 And so it's, you know, you have to almost be much better in order to get people to change
22:17 their behavior.
22:18 That is interesting.
22:20 One of the things you touched on was the lack of large-context models right now,
22:25 models that can hold everything,
22:26 say a conversation that's been going on for 10 minutes, in memory.
22:31 And you know, I think GPT-4 is like 32K, Claude is a 200K token size.
22:37 And now we have Gemini 1.5, which is a million tokens.
22:40 Do you think, as these larger models come out, the space for AI wearables becomes big, because
22:46 we'll be able to hold all the conversations we're having throughout the day in one context,
22:52 maybe do multiple conversations back and forth?
22:55 Do you think that that is going to be the key solution for the problems?
23:00 Yeah, from my point of view, that's going to be huge.
23:02 I don't think it's going to be everything.
23:03 Like, so for example, even with Gemini, with how big the context length is, information
23:08 at the beginning of the prompt or the beginning of the context usually leads to higher
23:11 accuracy, and there's a lot of things like that.
23:13 So I think the model is generally going to need to be able to consume that information quite
23:16 well.
23:17 So context length is one, but then the depth with which you can actually consume that context
23:21 is a second.
23:22 But that will be totally a game changer.
23:24 I do think that there's other aspects to that.
23:26 Like I think with human conversation, it's not just something you can codify.
23:30 It's often that a lot of it is your brain understanding.
23:33 Like if you think about how the brain processes a human conversation, there's a lot
23:37 it draws on from, you know, however many years of conversations with humans,
23:41 the things you pick up on; that's like emotional intelligence.
23:43 That part, we need to figure out how to codify better.
23:46 And that's where multimodality can be huge, with video, hearing sounds, things
23:50 like that.
23:51 And that would be the next step, how to actually codify that properly into a context.
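To put the context-window numbers mentioned above (32K, 200K, roughly a million tokens) in perspective, here is a small hedged sketch that counts tokens in a batch of call transcripts with tiktoken; the encoding is an approximation and real counts vary by model.

```python
# Hedged sketch: estimate whether a day's worth of call transcripts fits in
# a given context window. cl100k_base is an approximation; actual token
# counts differ per model and provider.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(transcripts: list[str], context_window: int) -> bool:
    total = sum(len(enc.encode(t)) for t in transcripts)
    print(f"{total} tokens vs. a window of {context_window}")
    return total <= context_window

# Example: fifty 10-minute calls at roughly 1,500 words each would overflow a
# 32K window but fit comfortably in a 1M-token window.
# fits_in_context(day_of_transcripts, 32_000)
# fits_in_context(day_of_transcripts, 1_000_000)
```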
23:55 Awesome.
23:56 So we are coming up on time.
23:57 I'll do one final question.
23:58 We're seeing a lot of multimodal AI companies come up, and this is for everyone, we'll go down
24:04 the row.
24:07 As it becomes widely adopted, what do you think will differentiate companies
24:12 that really succeed in the space and stay around versus all the fluff that we're seeing?
24:18 You want to start off?
24:19 Sure.
24:20 I think, you know, many things, I think for us, one of the bets that we have at Avoca
24:25 is kind of a deep verticalization.
24:28 And so, you know, by being the company that is so ingrained in home services, we eventually
24:33 develop, you know, a moat on the types of data, but then also around the integrations
24:38 and how we are able to, you know, fit kind of every single one of these needs.
24:44 And then also, you know, the types of, you know, use cases and objections and paths that
24:51 we are able to find.
24:53 We're able to really fine tune our models and just, you know, service this one extremely
24:59 niche industry, you know, super well.
25:02 Yeah.
25:03 So my answer probably somewhat follows what Tyson just said.
25:08 We still see a lot of demos on social, but I think the applications that actually, you
25:13 know, generate revenue and transform enterprises are going to be embedded deeply in the workflow
25:18 of those organizations.
25:19 You know, so I think we get a lot of comparisons with companies like Runway and Pika and
25:26 other video generation companies, but we're actually doing video understanding, not video
25:31 generation.
25:32 And from perspective of like, you know, video editors, filmmakers, our tools actually augment,
25:38 you know, their workflow and help them, you know, do their job better, not actually replacing
25:43 their job.
25:44 Right.
25:45 And so coming up with that positioning and, you know, making sure that we augment, you know,
25:48 human capabilities, not replace them, is very important.
25:51 And the second part of, you know, the moat is also around having proprietary data
25:59 sets.
26:00 I think like for video, there's not a lot of open source or, you know, openly available
26:07 video compared to like text or images.
26:09 So I think getting access to them and more importantly, getting high quality label video
26:14 data is even more important.
26:16 How do we generate descriptions and label this video data, given the, you know, the
26:21 challenges of dealing with the temporal dimension, et cetera?
26:24 So we invested a lot of effort on, you know, video labeling, annotation, as well as the
26:29 infrastructure to process the video efficiently.
26:33 And, you know, we have already seen some very promising results in the type of performance
26:37 that our model was able to produce, given the, you know, higher quality video data that
26:43 we collect.
26:45 I think going back to your point about building up trust: we expect AI to perform perfectly,
26:52 and it never will in its early stages.
26:54 I think the firms that I see making good headway are the ones that are able to communicate
26:59 that this is a process, not an event, because they will garner trust based on truth.
27:04 And the beauty of multimodal is that we have so many ways to have that dialogue and the
27:09 people that choose to invest, not just in getting the technology to be more reliable,
27:13 but getting their communications and their dialogue with humans to be more reliable and
27:18 allowing them the context for where the technology sits and goes will give them runway because
27:25 we need that relationship with human beings and technology to continue.
27:29 And for that to happen, we need to have trust.
27:32 Yeah, I think I definitely agree with trust.
27:35 That's huge.
27:36 Also, verticalization.
27:37 I think maybe one more thing to add that does probably tie into verticalization a bit is
27:41 around data, like having very rich data that's important for your customer is, I think, essential.
27:46 So for example, for us, for Avoca, the way we view it is we've got a lot of data around sales,
27:50 how sales conversations are happening.
27:52 And there are so many nuances that are actually just different in sales than in customer
27:56 support, which is often what AI models were trained on in the past. That actually makes it such that
28:00 maybe the best companies will be the ones that can capture not only the nuance between
28:04 sales versus customer support conversations, but also between your company versus other
28:08 companies.
28:09 Like, how exactly do you handle this objection?
28:11 Like what is like the right steps to do after that and things like that.
28:14 And so that can only come from verticalization and having customers and trust.
28:18 Awesome.
28:19 Yeah.
28:20 I think one of the key points everyone's touched on here has been trust, infrastructure, all
28:25 these things have to be updated.
28:27 And as we see, this is just a start.
28:29 Like we are seeing a lot of wearables come out.
28:32 We just saw Humane release theirs, and we've had other wearables companies announce that their
28:38 own products are coming out.
28:39 This is just going to be more and more important.
28:41 And if something's recording me 24/7, that trust factor and that ability to like really
28:45 augment my life has to be present.
28:49 You've just had the chance to learn about multimodal AI from some of the experts in
28:53 the field.
28:54 And these people are on the ground, they're working and they're building stuff.
28:57 So they're very well up to date on what's happening.
29:00 So for that, I'd love you to just give them a huge round of applause and thank you all
29:04 for listening to us.
29:05 Thank you.
29:06 Thank you.