How To Teach AI To Understand What They’re Seeing

Forbes

Shawn Jain, a former OpenAI staff member, talks about his time at OpenAI and how they worked to give AI the ability to understand what it’s seeing and what impact this could have on technology such as autonomous driving.

Transcript

00:00 Tell us about your career in AI and how you found yourself at closed AI.

00:05 I mean open AI.

00:07 Thanks for having me here, John.

00:10 It's been a really great conference so far.

00:13 It's a summit like Mount Everest.

00:15 We're trying to go high, and we're forgoing conferences.

00:18 That's where you try to hit your quarter and sell something.

00:20 That ain't this.

00:21 Well, good. I guess hopefully we're nearing the summit.

00:25 Yeah. So I'll tell you a little bit about my background.

00:27 I come from a totally science nerd family.

00:32 My dad was a programmer.

00:33 He started a data analytics company.

00:35 My mom was an early employee at a startup that was acquired by SAP.

00:40 My sister studied computer science.

00:42 My grandfather was an amateur astronomer.

00:45 And to reduce the burden of all the calculations he had to do to understand the trajectory

00:52 of planets and everything else he was predicting, he actually wrote computer programs.

00:57 He was on punch cards.

00:59 And this is at a time in India when computers were not widely available.

01:04 And I think back then they were aligned with the Soviet Union to get the technology.

01:08 I don't know if you know this.

01:09 You're like eight years away from 30.

01:12 So you still have a bunch of years to be 30 under 30.

01:15 But yeah. So sorry.

01:18 So that's the kind of house I grew up in.

01:23 I grew up making Linux network servers, wiring my house with Cat5 Ethernet, fixing amplifiers

01:30 by changing out the capacitors and stuff like that.

01:32 And all I wanted to do was go to MIT because that's where the origin of all this amazing

01:36 technology was.

01:37 So I got the opportunity to go.

01:38 I studied Course 6-2 here.

01:41 While I was there -- while I was here, I got involved doing computer vision research.

01:46 Specifically, I got involved in doing scene understanding.

01:51 So this is making models that actually understand videos and create program representations of

01:56 what's going on.

01:57 And scene understanding is important because you can probably see a picture of me and John

02:02 up there.

02:03 These are objects.

02:04 But if there's multiple items in the photo, if there's multiple objects in the photo,

02:08 you need to understand their relationships to understand what's going on in the scene.

02:12 And so that's the project that we were working on at MIT.

02:16 After that, I went over to Uber ATG.

02:18 That's Uber's self-driving group, because that was the best place to apply computer

02:22 vision.

02:24 After I spent some time there working on using LIDAR to improve localization and perception,

02:31 I moved over to Microsoft Research, where I worked on multimodal models and efficient

02:36 models.

02:37 And because I worked on efficient models, I realized that deep learning models really

02:42 needed a lot of compute.

02:44 And that's how I ended up going over to OpenAI, because they had the most investment.

02:49 What year was that?

02:50 I moved over to OpenAI in January of '21.

02:56 So what does self-driving cars have to do with large language models?

03:02 And you kind of foreshadowed a little bit of your interest in this, but how did that

03:07 crystallize with your role at OpenAI?

03:12 Yeah, so I think that self-driving cars and language models are actually not that different

03:18 in a sense.

03:19 So they're both scaling laws problems, riding trends in deep learning.

03:23 And so they're both enjoying this virtuous cycle of more data, more compute leads to

03:27 better results, leads to more investment, and more data and more compute.

03:32 And also, my research work at MIT was about scene understanding.

03:36 And what I realized is that I think that scene understanding is actually a key enabling technology

03:41 for self-driving cars, because they are in scenes, they are in environments with potentially

03:45 hundreds of actors-- other cars, pedestrians, cyclists, bicyclists, and so on.

03:52 And so if you can actually describe a scene as a program, it's actually a form of compression,

03:57 which is actually a really, really good proxy for understanding.

04:01 If you think about large language models today, they actually create a compressed representation

04:06 of the language while they're training through this next token prediction objective function.

04:11 And they seem to demonstrate understanding.

04:13 So that's how I made this connection between scene understanding and self-driving cars.

04:19 Yeah.

04:20 So let's be voyeuristic.

04:21 Let's let these people know what your life was like at OpenAI.

04:25 So what was it like working there?

04:28 Did you realize the tools you were researching and the tools the organization was creating

04:34 were transformative?

04:35 Was it what you expected?

04:37 Were people surprised to the left and right of you?

04:43 And why did you leave?

04:45 It was such transformative technology.

04:49 What led to you deciding to say, you know what, I'm going to go it alone?

04:54 Well I went there because I thought it was interesting.

04:56 I wasn't sure if everybody else thought it was interesting.

05:00 And I think--

05:01 How many people worked there when you went there in '21?

05:03 I can't say.

05:04 OK.

05:05 So was it a little, a medium, or a lot?

05:08 Can you say that?

05:09 It was on the smaller side.

05:10 Smaller side.

05:11 OK.

05:12 All right.

05:13 So don't say anything that's going to have repercussions on your career.

05:16 But was it like-- were people looking at each other like, oh my god, this is really happening?

05:23 We're doing this?

05:24 Or everyone was like, yeah, this is what we expected.

05:26 Can you comment on that?

05:28 And say pass if you can't.

05:30 Yeah.

05:31 I think we were getting some amazing results.

05:34 Good enough results that I would double and triple check if they were real.

05:39 That's how surprising they were to me.

05:42 And were you-- so you did research.

05:43 Were you in charge of triple checking?

05:45 Or they're like, holy shit, we better triple check this?

05:48 When you're doing research, you want to be confident that you're presenting good results.

05:52 So I would definitely triple check my own work.

05:55 So you left.

05:58 You're not there anymore.

05:59 You don't have a badge to get in the building.

06:01 They probably took all your files.

06:04 Their files.

06:05 You're welcome.

06:06 How-- was that an easy decision?

06:09 Or were you like, I need to-- I have had enough experience, like dog years.

06:13 Three times seven.

06:14 You've been there 27 years.

06:18 What's your calculation to step out the door?

06:21 There's a dog right over there.

06:22 Here, can you hold the dog up?

06:25 Yeah.

06:27 So what was your calculation to leave?

06:29 Did they ask you to leave?

06:30 They're like, we've had enough of you.

06:31 We don't need any more research.

06:32 Or they didn't like you.

06:34 Or you left because you're like, wait a second.

06:36 I know too much.

06:38 And I want to take this knowledge and do something with it.

06:41 Give us some insight there.

06:44 And if I'm asking too personal questions, say pass, and I'll go somewhere else.

06:50 I love research freedom.

06:52 And so that's the main reason.

06:54 Yeah.

06:55 So do you plan to do research freedom in whatever you do next?

06:58 Or you've done that enough, and you'll hire someone else to do that part of it.

07:02 And you want to do the other stuff?

07:04 Oh, I'm basking in the glory of having my freedom and independence.

07:08 So that's what I'm really enjoying.

07:10 You could read between the lines.

07:11 There's a lot right there that he didn't say that you know.

07:14 I just want to telegraph what just happened there.

07:18 So talk about what are things you'd like to do in this post-AI career that you have right

07:25 now?

07:26 Well, it's not a post-AI career.

07:28 It's very much in the middle of AI, just not at a--

07:31 I meant in terms of your LinkedIn, your job, your post-AI.

07:36 What's going to be the thing next?

07:38 In this next chapter, what do you want to do?

07:40 Well, first of all, I took a little breather to recollect myself after a tremendous sprint

07:47 that I had.

07:50 So now I'm exploring a few different startup ideas in robotics foundation models and in

07:56 time series data generation.

07:58 I'm looking for people who are experts in these areas to collaborate with, to work with.

08:04 If you're that kind of person, definitely want to chat with you.

08:07 So considering a few startup ideas, I was also very heads down in my research.

08:12 So now I'm also starting to reconnect with the startup community.

08:16 And that's what I'm seeing here, a lot of interesting startups.

08:18 Has it been a good experience here so far?

08:20 It's been a great experience.

08:22 So how do you use AI?

08:25 Can you tell us what tools you're using?

08:27 Can you tell us what you're using them for?

08:30 Like, Shawn Jain, what does your day look like?

08:35 What are you accomplishing collaborating with these tools?

08:40 If I am writing code, I'm definitely using some kind of...

08:43 10%, 50%, 80%.

08:45 How much is AI helping you?

08:49 I don't think I can quantify it in terms of how much percentage of my code is AI written

08:54 or not.

08:55 I would say it saves me time and saves me from bugs, especially once I learned how to

09:02 use it.

09:03 So say you get connected with someone who's graduated from MIT and you have something

09:10 in common.

09:11 You're a part of the same extracurricular team or club and they say, "Hey, give me some

09:15 advice.

09:16 I'm about to enter the real world.

09:17 You've been out there and I want to do similar stuff to what you did."

09:21 Given what you know now, what would you tell them?

09:23 Do this or don't do this.

09:25 Like there's a guy who said, "Go West, young man."

09:27 That was a long time ago.

09:28 What would you say to yourself a long time ago who's just graduating, given the AI world

09:35 that's going on, what would you say?

09:38 I think you should work on foundational technologies, technologies that are enabling, that are a

09:43 platform for other people to create all kinds of different startups and other new technologies.

09:48 So I've always chosen to work on what I believe are foundational technologies like computer

09:52 vision, like language models, like better sensors in LIDARs.

09:56 And I think that each of these technologies is now, as you're seeing, creating whole new markets

10:02 for startups in robotics and code assistants or code generation and everywhere.

10:08 So if you are an expert in the foundational technology, you're really, really well placed

10:12 to create the startup that commercializes that technology as well.

10:17 So LIDAR has become cheaper.

10:20 Is that a fact?

10:21 Yeah, LIDAR has become cheaper.

10:23 Now that it's on every iPhone, new iPhone has LIDAR technology and as more cars have

10:29 it, it gets cheaper.

10:33 What can we do with LIDAR that we couldn't do before?

10:37 And then my next question is physical AI.

10:39 Is that a term that you are familiar with and do you think about that and does that

10:43 have anything to do with some of the things you're interested in?

10:46 Or that's sort of an area that's off adjacent to what you're thinking about.

10:52 So two questions there.

10:53 Got it.

10:54 So the first one about LIDAR, it's an interesting one.

10:57 So LIDAR is a laser scanner.

10:58 It shoots out a beam of light, non-visible light, and you measure the time of flight

11:05 until that beam of light reflects back to you.

11:08 And from that, you can calculate the distance using, like, a simple distance-rate-time calculation

11:14 with the speed of light.

11:16 So it allows you to create really dense 3D point clouds or 3D if you have a spinning

11:24 LIDAR and probably 2D if you have a 2D LIDAR.
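
The time-of-flight calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular sensor: the function names and the angle convention are assumptions made for the example. Distance is the speed of light multiplied by the round-trip time, halved because the pulse travels out and back; pairing each range with the beam's angles gives one point of a 3D point cloud.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact value by SI definition


def tof_to_distance_m(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface from a round-trip time of flight."""
    if round_trip_seconds < 0:
        raise ValueError("time of flight must be non-negative")
    # The pulse travels to the target and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def polar_to_xyz(distance_m: float, azimuth_rad: float, elevation_rad: float):
    """Convert one range reading plus beam angles into a 3D point.

    Assumed convention (hypothetical): azimuth is measured in the horizontal
    plane from the x axis, elevation is measured up from that plane.
    """
    horizontal = distance_m * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        distance_m * math.sin(elevation_rad),
    )


if __name__ == "__main__":
    # A pulse that returns after 100 nanoseconds hit something about 15 m away.
    d = tof_to_distance_m(100e-9)
    print(round(d, 2), polar_to_xyz(d, math.radians(30), math.radians(5)))
```

A spinning LIDAR repeats this for many azimuth and elevation angles per revolution, which is where the dense point cloud comes from.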

11:30 And I think the interesting thing about LIDAR is because it's a new kind of sensor, the

11:37 applications of it are still being discovered.

11:39 So because LIDAR is on the iPhone, I believe it helps the portrait mode feature in iPhone,

11:46 but it also helps self-driving cars.

11:48 So I'm actually looking forward to seeing what other new cool applications of LIDAR

11:53 actually come out.

11:54 I think they could be on mobile autonomous robots in factories in the homes very soon.

11:59 So physical AI, a term you're familiar with or it's not your thing?

12:06 Do you mean embodied AI?

12:07 Well, I mean like a lot of people interact with AI that's on a 2D screen, but now you

12:11 can throw AI beyond the screen.

12:14 So maybe pass on that one.

12:17 All right.

12:18 I have, you created a feature in ChatGPT and you wrote a white paper when you were

12:23 at OpenAI.

12:24 Can you talk about either of those, or is that top secret and I shouldn't tell people that?

12:28 No, we can talk about the paper.

12:29 Okay.

12:30 But people can't know about the feature you made?

12:32 No, you can use the feature.

12:33 It's out there.

12:34 Yeah.

12:35 Enlighten us.

12:36 So what is it?

12:37 Well, I think you guys can do your own research on it, but it's the advanced data analysis

12:42 feature in ChatGPT.

12:43 I was a contributor to it.

12:44 I didn't invent it by myself.

12:46 It allows a model to write code, execute it and observe the results and debug it.

12:53 So it's kind of, it's a step beyond just completing code.
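
The write, execute, observe, debug loop described above can be sketched roughly as follows. This is only an illustration of the general idea, not how the ChatGPT feature is implemented: `write_run_debug`, `run_python`, and `stub_model` are hypothetical names, and the "model" here is a hard-coded stand-in that returns a fixed program when it sees an error message.

```python
import contextlib
import io
import traceback
from typing import Callable, Optional, Tuple


def run_python(source: str) -> Tuple[bool, str]:
    """Execute source in-process and return (succeeded, captured output).

    Assumption for the sketch: a real system would run untrusted code in an
    isolated sandbox rather than calling exec() directly.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {"__name__": "__sandbox__"})
    except Exception:
        # Return the traceback so the next attempt can react to it.
        return False, buffer.getvalue() + traceback.format_exc()
    return True, buffer.getvalue()


def write_run_debug(
    propose: Callable[[Optional[str]], str], max_attempts: int = 3
) -> Optional[Tuple[str, str]]:
    """Ask for code, run it, and feed any error back until it succeeds."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = propose(feedback)        # write
        ok, output = run_python(source)   # execute
        if ok:
            return source, output         # observe a successful result
        feedback = output                 # debug: pass the error back
    return None


def stub_model(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a language model."""
    if feedback is None:
        return "print(total / count)"  # first draft fails with a NameError
    return "total, count = 10, 4\nprint(total / count)"


if __name__ == "__main__":
    print(write_run_debug(stub_model))
```

The difference from plain code completion is the feedback path: the second call to `propose` sees what went wrong in the first run.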

12:58 Thank you.

13:00 You're welcome.

13:02 Like I said, not just my thing.

13:04 I was a contributor.

13:05 Great.

13:06 You can't tell us how many people worked on it, I bet, right?

13:09 Sorry.

13:10 Yeah.

13:11 No, I saw that coming.

13:12 Yeah.

13:13 All right.

13:14 Did you want to talk, did you want to say about the research paper?

13:16 Yeah, I can talk about the paper.

13:18 That's public work.

13:19 So me and also other co-authors published a paper called Evolution Through Large Models

13:25 when I was at OpenAI.

13:27 And it's a really interesting paper because we tackled a very general, but very real problem

13:32 in this paper, which is how can you get a language model to generate code in a domain

13:38 in which it has little to no training data?

13:41 And in this paper, we actually showed how synthetic data can actually improve code generation

13:46 ability and that the synthetic data can be generated through an evolutionary algorithm.

13:53 And the evolutionary algorithm is actually powered by the language model.

13:57 So if you don't know what an evolutionary algorithm is, it's basically a biologically

14:01 inspired algorithm for optimization.

14:04 You have a set of candidate solutions, and these are called the population.

14:09 You evaluate each candidate's fitness.

14:11 So how good is this particular solution?

14:13 So in the robotics domain, it might be how far does this robot walk using this particular

14:20 control algorithm?

14:23 And then you select the fittest individuals from your population, and then you produce

14:26 offspring, children, from that solution via mutation.

14:31 The interesting thing we did in this paper is that we actually use a language model as

14:34 a mutation operator.

14:36 It's an intelligent mutation operator instead of a random mutation operator.
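
The loop described above (population, fitness, selection, mutation) can be sketched generically. This is a minimal illustration and not the algorithm from the paper: the fitness function is a toy stand-in for "how far does the robot walk", all names are hypothetical, and `random_mutation` is the conventional random operator. The point of the sketch is that `mutate` is a pluggable function, so a call to a language model that rewrites the parent could be swapped in at that one place.

```python
import random
from typing import Callable, List

Genome = List[float]
Mutator = Callable[[Genome, random.Random], Genome]


def evolve(
    population: List[Genome],
    fitness: Callable[[Genome], float],
    mutate: Mutator,
    generations: int,
    n_parents: int,
    rng: random.Random,
) -> Genome:
    """Run a simple select-and-mutate loop and return the fittest genome."""
    size = len(population)
    for _ in range(generations):
        # Evaluate every candidate and keep the fittest as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_parents]
        # Produce offspring by mutating randomly chosen parents.
        children = [mutate(rng.choice(parents), rng) for _ in range(size - n_parents)]
        # Parents survive unchanged, so the best fitness never gets worse.
        population = parents + children
    return max(population, key=fitness)


def random_mutation(parent: Genome, rng: random.Random) -> Genome:
    """Conventional operator: add small Gaussian noise to every gene."""
    return [gene + rng.gauss(0.0, 0.1) for gene in parent]


def toy_fitness(genome: Genome) -> float:
    """Toy stand-in for a simulator score; highest when every gene is 1.0."""
    return -sum((gene - 1.0) ** 2 for gene in genome)


if __name__ == "__main__":
    rng = random.Random(0)
    start = [[0.0, 0.0, 0.0] for _ in range(8)]
    best = evolve(start, toy_fitness, random_mutation, 50, 2, rng)
    print(best, toy_fitness(best))
```

In this framing, an intelligent mutation operator would have the same signature as `random_mutation` but would propose a purposeful edit to the parent instead of noise.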

14:41 And this process was able to generate synthetic data, which we were actually able to train

14:48 the language model on.

14:50 And by training the language model on this data, it actually improved its ability to

14:55 generate more data.

14:58 And this was a really, really interesting paper.

15:02 I think it's gotten, unfortunately, less attention out there than it should have.

15:08 It was situated in the robotics domain, so the code that we were writing was demonstrated

15:12 in a robotics simulator.

15:15 But the technique is widely applicable.

15:17 Okay, last question.

15:20 A lot of MIT students are getting involved in AI and are going to play probably a disproportionate

15:27 leadership role in what comes next.

15:30 How did your MIT experience shape you to be able to evaluate opportunities, to deal with

15:37 ethical challenges?

15:39 There's a bunch of Pandora's boxes that are opening here.

15:43 Did you feel well-served, and are you hopeful?

15:47 And just give us a little insight.

15:49 You're not at the pinnacle of your career on the way down.

15:51 You're just getting going.

15:53 And I want this audience to kind of understand who are these young kids that are in the room

16:00 when it happened, are contributing to the features, are writing the white papers, are

16:06 going it alone to create the next company that could be a major player, could be the

16:11 next trillion-dollar company.

16:13 Lex Fridman said on this stage, there's going to be a bunch of trillion-dollar companies

16:17 right now that are going to play a big role in our society.

16:21 And your profile is of that.

16:25 And you're not just going in there saying, I'm going to work for the man.

16:28 I'm going to go there, and I'm going to be the guy.

16:32 And I'm excited for what's next.

16:34 But how well are you prepared for that?

16:36 And where do you feel maybe you're lacking, and you need to complement that?

16:40 Maybe someone in here could complement that for you.

16:43 But yeah, give us a little insight into that.

16:45 Yeah.

16:46 I mean, I think going to MIT, I learned a lot of different things.

16:50 I think one of the best parts about an MIT education is that there is no black box that

16:57 you don't open.

16:58 So if you're creating a high-performance ML system, you need to have a full stack understanding

17:05 of what's going on.

17:06 So you need to understand the chips, the memory, the assembly code, the primitives that that

17:11 chip executes well.

17:13 You might need to know about the kernels, the compiler, the front end, and then finally

17:16 the model, too.

17:19 So even though I, at the moment, am not an expert in every one of these things, I do

17:25 feel that because of my MIT education, I can actually go in and learn about each of these

17:31 topics to a level of working understanding that will actually complement all my other

17:35 knowledge.

17:36 So I think that's what I learned from the technical side at MIT.

17:40 And in terms of how I would lead, I would say if I was to create a research group right

17:46 now, the number one thing that I would do is make sure that it's a high-trust environment.

17:51 Because in research, oftentimes you're iterating and running experiments for weeks and weeks

17:55 and weeks and have no concrete results other than to say that I think I understand the

18:00 problem better, but I don't have a result that you can deliver, that you can productionize,

18:05 that you can release.

18:07 So creating a high-trust environment in which people are not afraid to ask questions, in

18:15 which people are not afraid to go out there and experiment, I think is really essential.

18:19 And I think you see that culture at the Media Lab, you see that culture at CSAIL, and that's

18:24 the culture I'd like to propagate.

18:26 Class of 2016, thank you very much.

18:29 [END]
