### Title: What You Need to Know About MIT's Deep Learning Course

### Description:
Curious about deep learning and where to start? Dive into this overview of MIT's Introduction to Deep Learning course (6.S191)! In this video, we explore the key topics, structure, and unique insights offered by one of the most renowned introductory courses on deep learning. Whether you're a beginner or looking to deepen your understanding of neural networks, this course is your gateway to mastering the essentials of AI and machine learning. Join us as we break down what makes MIT 6.S191 a standout choice for aspiring data scientists and AI enthusiasts. Don't forget to like, comment, and subscribe for more in-depth content on deep learning and AI!
### Transcript:
00:00:00Good afternoon, everyone, and welcome to MIT 6.S191.
00:00:15My name is Alexander Amini, and I'll
00:00:17be one of your instructors for the course this year,
00:00:19along with Ava.
00:00:21And together, we're really excited to welcome you
00:00:23to this really incredible course.
00:00:25This is a very fast-paced and very intense one week
00:00:31that we're about to go through together.
00:00:33So we're going to cover the foundations of a field that is
00:00:36itself also moving very fast, a field that
00:00:39has been rapidly changing over the past eight years
00:00:42that we have taught this course at MIT.
00:00:45Now, over the past decade, in fact,
00:00:49even before we started teaching this course,
00:00:51AI and deep learning has really been
00:00:53revolutionizing so many different advances
00:00:57and so many different areas of science, mathematics, physics,
00:01:01and so on.
00:01:02And not that long ago, we were facing new types of challenges
00:01:08and problems that we did not think
00:01:11were necessarily solvable in our lifetimes,
00:01:13and AI is now actually solving them beyond human performance today.
00:01:20And each year that we teach this course,
00:01:23this lecture in particular is getting harder and harder
00:01:27to teach, because for an introductory level course,
00:01:30this lecture, lecture number one,
00:01:33is the lecture that's supposed to cover the foundations.
00:01:35And if you think of any other introductory course,
00:01:38like an introductory 101 course on mathematics or biology,
00:01:42those first lectures don't really change that much over time.
00:01:46But we're in a rapidly changing field of AI and deep learning
00:01:50where even these types of lectures are rapidly changing.
00:01:55So let me give you an example of how we introduced this course
00:01:58only a few years ago.
00:02:01Hi, everybody, and welcome to MIT 6.S191,
00:02:07the official introductory course on deep learning
00:02:11taught here at MIT.
00:02:14Deep learning is revolutionizing so many fields,
00:02:18from robotics to medicine and everything in between.
00:02:24You'll learn the fundamentals of this field
00:02:27and how you can build some of these incredible algorithms.
00:02:33In fact, this entire speech and video are not real
00:02:39and were created using deep learning
00:02:42and artificial intelligence.
00:02:45And in this class, you'll learn how.
00:02:49It has been an honor to speak with you today,
00:02:52and I hope you enjoy the course.
00:02:58The really surprising thing about that video to me
00:03:02when we first did it was how viral it went a few years ago.
00:03:07So just in a couple months of us teaching this course
00:03:09a few years ago, that video went very viral.
00:03:12It got over a million views within only a few months.
00:03:16People were shocked with a few things,
00:03:18but the main one was the realism of AI
00:03:22to be able to generate content that looks and sounds
00:03:27extremely hyper-realistic.
00:03:30And when we did this video, when we created this for the class
00:03:34only a few years ago, this video took us about $10,000
00:03:38in compute to generate, just about a minute-long video,
00:03:41I mean, if you think about it,
00:03:43that is extremely expensive to compute for something that looks like that.
00:03:47And maybe a lot of you are not really even impressed
00:03:49by the technology today,
00:03:51because you see all of the amazing things
00:03:53that AI and deep learning are producing.
00:03:56Now, fast forward to today and the progress in deep learning.
00:04:00People were making all kinds of, you know,
00:04:02exciting remarks about that video when it came out a few years ago.
00:04:05Now this is common stuff,
00:04:06because AI is really doing much more powerful things
00:04:10than this fun little introductory video.
00:04:14So fast forward
00:04:17about four years to today, right?
00:04:20Now, where are we?
00:04:21AI is now generating content
00:04:23with deep learning being so commoditized, right?
00:04:27Deep learning is at all of our fingertips now,
00:04:30online, in our smartphones, and so on.
00:04:34In fact, we can use deep learning
00:04:36to generate these types of hyper-realistic pieces of media
00:04:41and content entirely from English language
00:04:44without even coding anymore, right?
00:04:46So before, we had to actually go in,
00:04:48train these models, and really code them
00:04:50to be able to create that one-minute long video.
00:04:54Today, we have models that will do that for us,
00:04:57end-to-end, directly from English language.
00:04:59So we can prompt these models to create something
00:05:01that the world has never seen before,
00:05:03a photo of an astronaut riding a horse,
00:05:05and these models can imagine those pieces of content
00:05:09entirely from scratch.
00:05:11My personal favorite is actually how we can now ask
00:05:13these deep learning models to create new types of software,
00:05:19even though they are themselves software, and to ask them,
00:05:22for example, to write a piece of TensorFlow code
00:05:26to train a neural network, right?
00:05:28We're asking a neural network to write TensorFlow code
00:05:31to train another neural network,
00:05:33and our model can produce examples of functional
00:05:36and usable pieces of code that satisfy this English prompt
00:05:42while walking through each part of the code independently.
00:05:44So not even just producing it, but actually educating
00:05:47and teaching the user on what each part
00:05:49of these code blocks are actually doing.
00:05:53You can see an example here.
00:05:55And really, what I'm trying to show you with all of this
00:05:57is that this is just highlighting how far deep learning
00:06:01has gone, even in a couple years
00:06:04since we've started teaching this course.
00:06:07I mean, going back even from before that to eight years ago.
00:06:10And the most amazing thing that you'll see in this course,
00:06:14in my opinion, is that what we try to do here
00:06:17is to teach you the foundations of all of this,
00:06:20how all of these different types of models are created
00:06:23from the ground up, and how we can make
00:06:26all of these amazing advances possible
00:06:28so that you can also do it on your own as well.
00:06:32And like I mentioned in the beginning,
00:06:33this introduction course is getting harder and harder
00:06:36to do and to make every year.
00:06:38I don't know where the field is going to be next year.
00:06:41And I mean, that's my honest truth.
00:06:44Or even, honestly, in one or two months' time from now,
00:06:49just because it's moving so incredibly fast.
00:06:51But what I do know is that what we will share with you
00:06:54in the course as part of this one week
00:06:57is going to be the foundations of all of the technologies
00:07:00that we have seen up until this point
00:07:02that will allow you to create that future for yourselves
00:07:05and to design brand new types of deep learning models
00:07:09using those fundamentals and those foundations.
00:07:13So let's get started with all of that
00:07:16and start to figure out how we can actually achieve
00:07:19all of these different pieces
00:07:20and learn all of these different components.
00:07:23And we should start this by really tackling the foundations
00:07:27from the very beginning and asking ourselves:
00:07:30we've all heard this term.
00:07:31I think all of you, obviously,
00:07:33before you've come to this class today,
00:07:34have heard the term deep learning.
00:07:36But it's important for you to really understand
00:07:38how this concept of deep learning relates
00:07:41to all of the other pieces of science
00:07:43that you've learned about so far.
00:07:46So to do that, we have to start from the very beginning
00:07:48and start by thinking about
00:07:49what is intelligence at its core,
00:07:52not even artificial intelligence, but just intelligence.
00:07:55So the way I like to think about this
00:07:58is that I like to think that intelligence
00:08:00is the ability to process information
00:08:04which will inform your future decision-making abilities.
00:08:08Now, that's something that we as humans do every single day.
00:08:12Now, artificial intelligence is simply the ability
00:08:15for us to give computers that same ability,
00:08:18to process information and inform future decisions.
00:08:23Now, machine learning is simply a subset
00:08:27of artificial intelligence.
00:08:28The way you should think of machine learning
00:08:30is just as the programming ability,
00:08:33or let's say even simpler than that,
00:08:35machine learning is the science
00:08:39of trying to teach computers
00:08:41how to do that processing of information
00:08:44and decision-making from data.
00:08:47So instead of hard-coding some of these rules
00:08:49into machines and programming them
00:08:51like we used to do in software engineering classes,
00:08:54now we're going to try and do that processing of information
00:08:58and informing of future decision-making abilities
00:09:00directly from data.
00:09:02And then going one step deeper,
00:09:04deep learning is simply the subset of machine learning
00:09:07which uses neural networks to do that.
00:09:10It uses neural networks to process raw pieces of data now,
00:09:14unprocessed data, and allows them to ingest
00:09:17all of those very large data sets
00:09:19and inform future decisions.
00:09:21Now, that's exactly what this class is really all about.
00:09:25If you think of, if I had to summarize this class
00:09:27in just one line, it's all about teaching machines
00:09:30how to process data, process information,
00:09:34and inform decision-making abilities from that data
00:09:37and learn it from that data.
00:09:40Now, this program is split
00:09:42between really two different parts.
00:09:45So you should think of this class
00:09:47as being captured with both technical lectures,
00:09:50which, for example, this is one part of,
00:09:53as well as software labs.
00:09:55We'll have several new updates this year,
00:09:57as I mentioned earlier,
00:09:58just covering the rapid changing of advances in AI,
00:10:01and especially in some of the later lectures,
00:10:03you're going to see those.
00:10:05The first lecture today is going to cover
00:10:07the foundations of neural networks themselves,
00:10:11starting with really the building blocks
00:10:13of every single neural network,
00:10:14which is called the perceptron.
00:10:15And finally, we'll go through the week
00:10:18and we'll conclude with a series of exciting guest lectures
00:10:22from industry-leading sponsors of the course.
00:10:25And finally, on the software side,
00:10:29after every lecture, you'll also get software experience
00:10:33and project-building experience
00:10:34to be able to take what we teach in lectures
00:10:37and actually deploy them in real code
00:10:39and actually produce based on the learnings
00:10:43that you find in this lecture.
00:10:44And at the very end of the class, from the software side,
00:10:47you'll have the ability to participate
00:10:49in a really fun day at the very end,
00:10:51which is the project pitch competition.
00:10:53It's kind of like a Shark Tank style competition
00:10:56of all of the different projects from all of you
00:10:59where you can win some really awesome prizes.
00:11:01So let's step through that a little bit briefly.
00:11:03This is the syllabus part of the lecture.
00:11:06So each day we'll have dedicated software labs
00:11:09that will basically mirror all of the technical lectures
00:11:12that we go through,
00:11:13just helping you reinforce your learnings.
00:11:15And these are coupled, each day, again,
00:11:18with prizes for the top-performing software solutions
00:11:21that are coming up in the class.
00:11:23This is going to start with today, with lab one,
00:11:26and it's going to be on music generation.
00:11:28So you're going to learn how to build a neural network
00:11:30that can learn from a bunch of musical songs,
00:11:34listen to them, and then learn to compose
00:11:37brand new songs in that same genre.
00:11:41Tomorrow, lab two, on computer vision,
00:11:43you're going to learn about facial detection systems.
00:11:47You'll build a facial detection system from scratch
00:11:50using convolutional neural networks.
00:11:52You'll learn what that means tomorrow.
00:11:54And you'll also learn how to actually de-bias,
00:11:57remove the biases that exist
00:11:59in some of these facial detection systems,
00:12:02which is a huge problem for the state-of-the-art solutions
00:12:05that exist today.
00:12:06And finally, a brand new lab at the end of the course
00:12:10will focus on large language models,
00:12:12where you're actually going to take
00:12:14a multi-billion-parameter large language model
00:12:18and fine-tune it to build an assistive chatbot
00:12:22and evaluate a set of cognitive abilities,
00:12:24ranging from mathematics abilities to scientific reasoning
00:12:27to logical abilities and so on.
00:12:31And finally, at the very, very end,
00:12:33there will be a final project pitch competition
00:12:36for up to five minutes per team,
00:12:39and all of these are accompanied with great prizes.
00:12:41So definitely there will be a lot of fun
00:12:43to be had throughout the week.
00:12:45There are many resources to help with this class.
00:12:48You'll see them posted here.
00:12:49You don't need to write them down
00:12:50because all of the slides are already posted online.
00:12:53Please post to Piazza if you have any questions.
00:12:56And of course, we have an amazing team
00:12:59that is helping teach this course this year,
00:13:01and you can reach out to any of us if you have any questions.
00:13:04The Piazza is a great place to start.
00:13:06Myself and Ava will be the two main lecturers
00:13:09for this course, Monday through Wednesday especially,
00:13:12and we'll also be hearing some amazing guest lecturers
00:13:15on the second half of the course,
00:13:17which you definitely want to attend,
00:13:19because they really cover the state-of-the-art side
00:13:22of deep learning that's going on in industry
00:13:26outside of academia.
00:13:28And very briefly, just wanted to give a huge thanks
00:13:31to all of our sponsors; without their support,
00:13:33this course, like every year, would not be possible.
00:13:37Okay, so now let's start with the fun stuff
00:13:40and my favorite part of the course,
00:13:41which is the technical parts.
00:13:43And let's start by just asking ourselves a question,
00:13:47which is why do we care about all of this?
00:13:50Why do we care about deep learning?
00:13:51Why did you all come here today to learn
00:13:54and to listen to this course?
00:13:56So to understand, I think we, again,
00:13:59need to go back a little bit to understand
00:14:01how machine learning used to be performed.
00:14:05So machine learning typically would define
00:14:08a set of features, or you can think of these
00:14:10as kind of a set of things to look for in an image
00:14:14or in a piece of data.
00:14:16Usually these are hand-engineered,
00:14:17so humans would have to define these themselves.
00:14:21And the problem with these is that they tend
00:14:23to be very brittle in practice,
00:14:25just by nature of a human defining them.
00:14:27So the key idea of deep learning,
00:14:29and what you're going to learn throughout this entire week,
00:14:32is this paradigm shift of trying to move away
00:14:35from hand-engineering features and rules
00:14:37that computers should look for,
00:14:39and instead trying to learn them directly
00:14:42from raw pieces of data.
00:14:43So what are the patterns that we need to look at
00:14:47in data sets such that if we look at those patterns,
00:14:50we can make some interesting decisions
00:14:52and interesting actions can come out.
00:14:54So for example, if we wanted to learn how to detect faces,
00:14:58we might, if you think even how you would detect faces,
00:15:01right, if you look at a picture,
00:15:02what are you looking for to detect a face?
00:15:05You're looking for some particular patterns.
00:15:06You're looking for eyes and noses and ears,
00:15:09and when those things are all composed in a certain way,
00:15:12you would probably deduce that that's a face, right?
00:15:15Computers do something very similar.
00:15:16So they have to understand what are the patterns
00:15:19that they look for, what are the eyes and noses and ears
00:15:22of those pieces of data, and then from there,
00:15:25actually detect and predict from them.
00:15:31So the really interesting thing, I think,
00:15:34about deep learning is that these foundations
00:15:37for doing exactly what I just mentioned,
00:15:40picking out the building blocks,
00:15:41picking out the features from raw pieces of data
00:15:44and the underlying algorithms themselves
00:15:46have existed for many, many decades.
00:15:50Now, the question I would ask at this point is,
00:15:54so why are we studying this now
00:15:56and why is all of this really blowing up right now
00:15:58and exploding with so many great advances?
00:16:01Well, there are three things, right?
00:16:03One is that the data that is available to us today
00:16:06is significantly more pervasive.
00:16:09These models are hungry for data.
00:16:10You're going to learn about this more in detail,
00:16:12but these models are extremely hungry for data,
00:16:15and we're living in a world right now, quite frankly,
00:16:19where data is more abundant
00:16:21than it has ever been in our history.
00:16:23Now, secondly, these algorithms are massively
00:16:27compute-hungry and they're massively parallelizable,
00:16:30which means that they have greatly benefited
00:16:33from compute hardware,
00:16:34which is also capable of being parallelized.
00:16:37The particular name of that hardware is called a GPU, right?
00:16:42GPUs can run parallel processing streams of information
00:16:46and are particularly amenable to deep learning algorithms
00:16:49and the abundance of GPUs and that compute hardware
00:16:52has also pushed forward what we can do in deep learning.
00:16:56And finally, the last piece is the software, right?
00:16:59It's the open-source tools that are really used
00:17:03as the foundational building blocks of deploying
00:17:06and building all of these underlying models
00:17:09that you're going to learn about in this course,
00:17:10and those open-source tools
00:17:12have just become extremely streamlined,
00:17:14making this extremely easy for all of us
00:17:16to learn about these technologies
00:17:18within an amazing one-week course like this.
00:17:22So let's start now with understanding,
00:17:24now that we have some of the background,
00:17:26let's start with understanding exactly
00:17:28what is the fundamental building block of a neural network.
00:17:32Now, that building block is called a perceptron, right?
00:17:35Every single neural network is built up
00:17:39of multiple perceptrons,
00:17:42and you're going to learn how those perceptrons,
00:17:44number one, compute information themselves
00:17:46and how they connect to these much larger
00:17:48billion-parameter neural networks.
00:17:52So the key idea of a perceptron, or even simpler,
00:17:55think of this as a single neuron, right?
00:17:57So a neural network is composed of many, many neurons,
00:18:00and a perceptron is just one neuron.
00:18:03So that idea of a perceptron is actually extremely simple,
00:18:06and I hope that by the end of today,
00:18:08this idea and this processing of a perceptron
00:18:12becomes extremely clear to you.
00:18:14So let's start by talking about
00:18:16just the forward propagation of information
00:18:18through a single neuron.
00:18:21Now, single neurons ingest information.
00:18:23They can actually ingest multiple pieces of information.
00:18:27So here you can see this neuron taking as input
00:18:30m pieces of information: X1, X2, up through Xm, right?
00:18:34So we define this set of inputs called X, one through m,
00:18:39and each of these inputs, each of these numbers
00:18:41is going to be element-wise multiplied
00:18:44by a particular weight.
00:18:45So this is going to be denoted here by W1 through Wm.
00:18:49So this is a corresponding weight for every single input,
00:18:52and you should think of this as really, you know,
00:18:54being assigned to that input, right?
00:18:57The weights are part of the neuron itself.
00:19:00Now, you multiply all of these inputs
00:19:03with their weights together, and then you add them up.
00:19:05We take this single number after that addition,
00:19:08and you pass it through what's called
00:19:10a nonlinear activation function
00:19:12to produce your final output, which here we're calling Y.
00:19:19Now, what I just said is not entirely correct, right?
00:19:22So I missed out one critical piece of information.
00:19:25That piece of information is that we also have
00:19:27what you can see here is called this bias term.
00:19:30That bias term is actually what allows your neuron
00:19:34to shift its activation function horizontally
00:19:38on that X axis, if you think of it, right?
00:19:41So on the right side, you can now see this diagram
00:19:44illustrating mathematically that single equation
00:19:47that I talked through kind of conceptually, right?
00:19:49Now you can see it mathematically written down
00:19:51as one single equation, and we can actually rewrite this
00:19:55using linear algebra using vectors and dot products.
00:19:58So let's do that, right?
00:20:00So now our inputs are going to be described by a capital X,
00:20:03which is simply a vector of all of our inputs,
00:20:06X1 through Xm, and then our weights are going to be
00:20:10described by a capital W, which is going to be W1 through Wm.
00:20:16The input is obtained by taking the dot product
00:20:19of X and W, right?
00:20:21That dot product does that element-wise multiplication,
00:20:24and then adds, sums all of the element-wise multiplications.
00:20:28And then, here's the missing piece,
00:20:30is that we're now going to add that bias term.
00:20:33Here we're calling the bias term W0, right?
00:20:36And then we're going to apply the nonlinearity,
00:20:38which here is denoted as G.
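Putting those pieces together, the perceptron's forward pass described above can be written as a single equation:

$$
\hat{y} = g\left(w_0 + \sum_{i=1}^{m} x_i w_i\right) = g\left(w_0 + X^{\top} W\right)
$$

where $X = [x_1, \dots, x_m]^{\top}$ is the input vector, $W = [w_1, \dots, w_m]^{\top}$ the weight vector, $w_0$ the bias term, and $g$ the nonlinear activation function.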
00:20:42So I've mentioned this nonlinearity a few times,
00:20:44this activation function.
00:20:46Let's dig into it a little bit more so we can understand
00:20:48what is actually this activation function doing.
00:20:51Well, I said a couple things about it.
00:20:53I said it's a nonlinear function, right?
00:20:55Here you can see one example of an activation function.
00:20:59One commonly used activation function
00:21:04is called the sigmoid function,
00:21:05which you can actually see here
00:21:06on the bottom right-hand side of the screen.
00:21:09The sigmoid function is very commonly used
00:21:11because of its outputs, right?
00:21:13So it takes as input any real number,
00:21:15the x-axis is infinite, plus or minus.
00:21:18But on the y-axis, it basically squashes every input, x,
00:21:24into a number between zero and one.
00:21:26So it's actually a very common choice
00:21:27for things like probability distributions,
00:21:30if you want to convert your answers into probabilities,
00:21:32or learn or teach a neuron
00:21:34to learn a probability distribution.
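For reference, the sigmoid function mentioned here is

$$
g(z) = \sigma(z) = \frac{1}{1 + e^{-z}},
$$

which takes any real number $z$ and squashes it into a value between zero and one.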
00:21:37But in fact, there are actually many different types
00:21:39of nonlinear activation functions
00:21:42that are used in neural networks.
00:21:43And here are some common ones.
00:21:45And again, throughout this presentation,
00:21:47you'll see these little TensorFlow icons,
00:21:49actually throughout the entire course.
00:21:50You'll see these TensorFlow icons on the bottom,
00:21:53which basically just allow you to relate
00:21:56some of the foundational knowledge
00:21:58that we're teaching in the lectures
00:22:00to some of the software labs.
00:22:01And this might provide a good starting point
00:22:03for a lot of the pieces that you have to do later on
00:22:06in the software parts of the class.
00:22:08So the sigmoid activation,
00:22:10which we talked about in the last slide,
00:22:11here it's shown on the left-hand side, right?
00:22:13This is very popular
00:22:14because of the probability distributions, right?
00:22:16It squashes everything between zero and one.
00:22:18But you see two other very common types
00:22:21of activation functions in the middle
00:22:24and the right-hand side as well.
00:22:26So the other very, very common one,
00:22:27probably this is the one now
00:22:29that's the most popular activation function,
00:22:31is now on the far right-hand side.
00:22:33It's called the ReLU activation function,
00:22:35or also called the rectified linear unit.
00:22:37So basically it's linear everywhere,
00:22:39except there's a nonlinearity at x equals zero.
00:22:42So there's kind of a kink, a discontinuity in the gradient, right?
00:22:46So benefit of this, very easy to compute.
00:22:49It still has the nonlinearity, which we kind of need,
00:22:52and we'll talk about why we need it in one second.
00:22:54But it's very fast, right?
00:22:56Just two linear functions piecewise combined
00:22:58with each other.
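For reference, the ReLU (rectified linear unit) activation is simply

$$
g(z) = \max(0, z),
$$

linear on either side of zero, with the kink at $z = 0$ supplying the nonlinearity, and very cheap to compute.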
00:23:00Okay, so now let's talk about
00:23:01why we need a nonlinearity in the first place.
00:23:04Why not just deal with a linear function
00:23:07that we pass all of these inputs through?
00:23:09So the point of the activation function,
00:23:11the reason why we have this at all,
00:23:13is to introduce nonlinearities into the network.
00:23:17So what we want to do is to allow our neural network
00:23:21to deal with nonlinear data, right?
00:23:25Our neural networks need the ability
00:23:27to deal with nonlinear data
00:23:28because the world is extremely nonlinear, right?
00:23:33This is important because if you think of the real world,
00:23:36real data sets, this is just the way they are, right?
00:23:39If you look at data sets like this one,
00:23:41green and red points, right?
00:23:42And I ask you to build a neural network
00:23:45that can separate the green and the red points.
00:23:48This means that we actually need
00:23:51a nonlinear function to do that.
00:23:52We cannot solve this problem with a single line, right?
00:23:56In fact, if we used linear functions
00:24:00as your activation function,
00:24:02no matter how big your neural network is,
00:24:04it's still a linear function
00:24:06because linear functions combined with linear functions
00:24:08are still linear.
00:24:09So no matter how deep or how many parameters
00:24:11your neural network has,
00:24:13the best it would be able to do
00:24:14to separate these green and red points would look like this.
00:24:17But adding nonlinearities allows our neural networks
00:24:20to be smaller by allowing them to be more expressive
00:24:24and capture more complexities in the data sets.
00:24:27And this allows them to be much more powerful in the end.
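One quick way to see why stacking purely linear layers doesn't help: composing two linear transformations just gives another linear transformation,

$$
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2),
$$

so without nonlinear activations in between, a deep network collapses to a single linear function, which is exactly the single-line decision boundary described here.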
00:24:31So let's understand this with a simple example.
00:24:34Imagine I give you now this trained neural network.
00:24:36So what does it mean, trained neural network?
00:24:38It means now I'm giving you the weights, right?
00:24:40Not only the inputs, but I'm going to tell you
00:24:42what the weights of this neural network are.
00:24:44So here, let's say the bias term, W0, is going to be one,
00:24:48and our W vector is going to be three and negative two, right?
00:24:53These are just the weights of your trained neural network.
00:24:55Let's worry about how we got those weights in a second.
00:24:58But this network has two inputs, X1 and X2.
00:25:03Now, if we want to get the output of this neural network,
00:25:06all we have to do, simply, is to do the same story
00:25:09that we talked about before, right?
00:25:11It's dot product, inputs with weights,
00:25:16add the bias, and apply the nonlinearity, right?
00:25:19And those are the three components
00:25:20that you really have to remember as part of this class,
00:25:23right, dot product, add the bias, and apply a nonlinearity.
00:25:28That's going to be the process that keeps repeating
00:25:30over and over and over again for every single neuron.
00:25:33After that happens, that neuron
00:25:35is going to output a single number, right?
00:25:38Now, let's take a look at what's inside of that nonlinearity.
00:25:42It's simply a weighted combination
00:25:44of those inputs with those weights, right?
00:25:49So if we look at what's inside of G, right,
00:25:52inside of G is a weighted combination of X and W, right,
00:25:57added with a bias, right?
00:25:59That's going to produce a single number, right?
00:26:02But in reality, for any input that this model could see,
00:26:06what this really is is a two-dimensional line,
00:26:08because we have two parameters in this model.
00:26:12So we can actually plot that line.
00:26:14We can see exactly how this neuron separates points
00:26:19on these axes between X1 and X2, right?
00:26:23These are the two inputs of this model.
00:26:24We can see exactly and interpret
00:26:27exactly what this neuron is doing, right?
00:26:29We can visualize its entire space,
00:26:31because we can plot the line that defines this neuron,
00:26:34right, so here we're plotting when that line equals zero.
00:26:38And in fact, if I give you, if I give that neuron,
00:26:42in fact, a new data point, here the new data point
00:26:44is X1 equals negative one and X2 equals two,
00:26:47just an arbitrary point in this two-dimensional space.
00:26:50We can plot that point in the two-dimensional space,
00:26:53and depending on which side of the line it falls on,
00:26:56it tells us, you know, what the answer is going to be,
00:26:59what the sign of the answer is going to be,
00:27:01and also what the answer itself is going to be, right?
00:27:04So if we follow that equation written on the top here
00:27:07and plug in negative one and two,
00:27:10we're going to get one minus three minus four,
00:27:13which equals minus six, right?
00:27:15And when I put that into my non-linearity, G,
00:27:19I'm going to get a final output of 0.002, right?
00:27:24So that, don't worry about the final output,
00:27:26that's just going to be the output
00:27:27for that sigmoid function.
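As a quick check of that arithmetic, here is a minimal sketch of the same computation (NumPy is assumed purely for illustration; the weights and input point are the ones from the example above):

```python
import numpy as np

# Trained weights from the example: bias w0 = 1, weight vector W = [3, -2]
w0 = 1.0
W = np.array([3.0, -2.0])

# New input point: x1 = -1, x2 = 2
x = np.array([-1.0, 2.0])

# Dot product of inputs with weights, then add the bias
z = w0 + np.dot(x, W)          # 1 + (3)(-1) + (-2)(2) = -6

# Apply the sigmoid nonlinearity
y = 1.0 / (1.0 + np.exp(-z))   # sigmoid(-6) ≈ 0.002

print(z, y)
```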
00:27:30But the important point to remember here
00:27:31is that the sigmoid function actually divides the space
00:27:35into these two parts, right?
00:27:37It squashes everything between zero and one,
00:27:39but it divides it implicitly by everything less than 0.5
00:27:44and greater than 0.5, depending on
00:27:47whether its input is less than zero or greater than zero.
00:27:51So depending on which side of the line that you fall on,
00:27:54remember the line is when X equals zero,
00:27:56the input to the sigmoid is zero.
00:27:58If you fall on the left side of the line,
00:28:01your output will be less than 0.5
00:28:04because you're falling on the negative side of the line.
00:28:07If your input is on the right side of the line,
00:28:10now your output is going to be greater than 0.5, right?
00:28:15So here we can actually visualize the space.
00:28:18This is called the feature space of a neural network.
00:28:21We can visualize it in its completion, right?
00:28:24We can totally visualize and interpret this neural network.
00:28:27We can understand exactly what it's going to do
00:28:29for any input that it sees, right?
00:28:31But of course, this is a very simple neuron, right?
00:28:34It's not a neural network, it's just one neuron.
00:28:36And even more than that, it's even a very simple neuron.
00:28:39It only has two inputs, right?
00:28:42So in reality, the types of neurons
00:28:44that you're going to be dealing with in this course
00:28:47are going to be neurons and neural networks
00:28:49with millions or even billions of these parameters,
00:28:53right, of these inputs, right?
00:28:55So here we only have two weights, W1, W2,
00:28:58but today's neural networks
00:28:59have billions of these parameters.
00:29:02So drawing these types of plots that you see here
00:29:05obviously becomes a lot more challenging.
00:29:07It's actually not possible.
00:29:11But now that we have some of the intuition
00:29:12behind a perceptron, let's start now
00:29:15by building neural networks
00:29:17and seeing how all of this comes together.
00:29:20So let's revisit that previous diagram of a perceptron.
00:29:23Now, again, if there's only one thing
00:29:26to take away from this lecture right now,
00:29:28it's to remember how a perceptron works.
00:29:31That equation of a perceptron is extremely important
00:29:34for every single class that comes after today,
00:29:36and there's only three steps.
00:29:38It's dot product with the inputs,
00:29:40add a bias, and apply your nonlinearity.
00:29:44Let's simplify the diagram a little bit.
00:29:46I'll remove the weight labels from this picture,
00:29:49and now you can assume that if I show a line,
00:29:52every single line has an associated weight
00:29:55that comes with that line.
00:29:57I'll also remove the bias term for simplicity.
00:30:00Assume that every neuron has that bias term.
00:30:02I don't need to show it.
00:30:04And now note that the result here, now calling it z,
00:30:08which is just the dot product plus bias
00:30:11before the nonlinearity,
00:30:14is going to be linear, first of all.
00:30:17It's just a weighted sum of all those pieces.
00:30:19We have not applied the nonlinearity yet,
00:30:21but our final output is just going to be g of z.
00:30:25It's the activation function,
00:30:27our nonlinear activation function applied to z.
00:30:31Now, if we want to step this up a little bit more
00:30:35and say, what if we had a multi-output function?
00:30:39Now, we don't just have one output,
00:30:40but let's say we want to have two outputs.
00:30:42Well, now we can just have two neurons in this network.
00:30:46Every neuron sees all of the inputs that came before it,
00:30:51but now you see the top neuron
00:30:53is going to be predicting an answer,
00:30:55and the bottom neuron will predict its own answer.
00:30:57Now, importantly, one thing you should really notice here
00:30:59is that each neuron has its own weights, right?
00:31:04Each neuron has its own lines
00:31:05that are coming into just that neuron, right?
00:31:08So they're acting independently,
00:31:09but they can later on communicate
00:31:11if you have another layer, right?
00:31:15So, let's now take this process
00:31:22a bit further and think about it more programmatically,
00:31:25right, what if we wanted to program
00:31:27this neural network ourselves from scratch, right?
00:31:31Remember that equation I told you,
00:31:32it didn't sound very complex.
00:31:33Let's take a dot product, add a bias,
00:31:36which is a single number, and apply a nonlinearity.
00:31:38Let's see how we would actually implement
00:31:40something like that.
00:31:41So to define the layer, right,
00:31:44we're now going to call this a layer,
00:31:46which is a collection of neurons, right?
00:31:50We have to first define how that information
00:31:53propagates through the network.
00:31:55So we can do that by creating a call function.
00:31:57Here, first, we're going to actually define
00:31:59the weights for that network, right?
00:32:02So remember, every network, every neuron, I should say,
00:32:05every neuron has weights and a bias, right?
00:32:07So let's define those first.
00:32:10We're going to create the call function
00:32:12to actually see how we can pass information
00:32:15through that layer, right?
00:32:17So this is going to take as input the inputs, right?
00:32:21This is like what we previously called x,
00:32:23and it's the same story that we've been seeing
00:32:26this whole class, right?
00:32:27We're going to matrix multiply,
00:32:29or take a dot product of our inputs with our weights.
00:32:33We're going to add a bias,
00:32:35and then we're going to apply a nonlinearity.
00:32:37It's really that simple, right?
00:32:39We've now created a single-layer neural network, right?
00:32:45So this line in particular, this is the part
00:32:47that allows this to be a powerful neural network,
00:32:52maintaining that nonlinearity.
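The lab code itself is not reproduced in this transcript, but a minimal sketch of the kind of from-scratch layer being described might look like this (assuming TensorFlow/Keras; the class name, initializers, and the choice of sigmoid are illustrative):

```python
import tensorflow as tf

class MyDenseLayer(tf.keras.layers.Layer):
    """A single fully connected layer built from scratch."""

    def __init__(self, input_dim, output_dim):
        super().__init__()
        # Every neuron has its own weights and a bias
        self.W = self.add_weight(shape=(input_dim, output_dim),
                                 initializer="random_normal")
        self.b = self.add_weight(shape=(1, output_dim),
                                 initializer="zeros")

    def call(self, inputs):
        # Dot product of inputs with weights, add the bias
        z = tf.matmul(inputs, self.W) + self.b
        # Apply the nonlinearity
        return tf.math.sigmoid(z)
```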
00:32:54And the important thing here is to note
00:32:57that modern deep learning toolboxes and libraries
00:33:02already implement a lot of these for you, right?
00:33:04So it's important for you to understand the foundations,
00:33:07but in practice, all of that layer architecture
00:33:10and all of that layer logic is actually implemented
00:33:14in tools like TensorFlow and PyTorch
00:33:16through a dense layer, right?
00:33:18So here you can see an example of calling,
00:33:21or creating, initializing a dense layer with two neurons,
00:33:27allowing it to feed in an arbitrary set of inputs.
00:33:30Here we're seeing these two neurons in a layer
00:33:33being fed three inputs, right?
00:33:35And in code, it's only reduced down
00:33:37to this one line of TensorFlow code,
00:33:39making it extremely easy and convenient
00:33:42for us to use these functions and call them.
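Concretely, that one line might look something like this (a sketch; the argument 2 is the number of neurons in the layer):

```python
import tensorflow as tf

# A fully connected (dense) layer with 2 output neurons
layer = tf.keras.layers.Dense(units=2)
```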
00:33:45So now let's look at our single-layered neural network.
00:33:48This is where we have now one layer
00:33:51between our input and our outputs, right?
00:33:53So we're slowly and progressively increasing
00:33:56the complexity of our neural network
00:33:58so that we can build up all of these building blocks, right?
00:34:02This layer in the middle is called a hidden layer, right?
00:34:06Obviously, because you don't directly observe it,
00:34:08you don't directly supervise it, right?
00:34:11You do observe the two input and output layers,
00:34:13but your hidden layer is just kind of a neuron layer
00:34:18that you don't directly observe, right?
00:34:19It just gives your network more capacity,
00:34:22more learning complexity.
00:34:24And since we now have a transformation function
00:34:26from inputs to hidden layers and hidden layers to output,
00:34:31we now have a two-layered neural network, right?
00:34:34Which means that we also have two weight matrices, right?
00:34:38We don't have just the W1,
00:34:40which we previously had to create this hidden layer,
00:34:42but now we also have W2,
00:34:44which does the transformation
00:34:45from hidden layer to output layer.
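In symbols, writing the hidden-layer pre-activations as $z_i$, the two transformations might be sketched as

$$
z_i = w_{0,i}^{(1)} + \sum_{j=1}^{m} x_j\, w_{j,i}^{(1)}, \qquad
\hat{y}_k = g\!\left(w_{0,k}^{(2)} + \sum_{i} g(z_i)\, w_{i,k}^{(2)}\right),
$$

with $W^{(1)}$ and $W^{(2)}$ being the two weight matrices just mentioned.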
00:34:47Yes?
00:34:48What happens to the non-linearity in hidden?
00:34:50You have just linear, so there's no,
00:34:53is it a perceptron or not?
00:34:55Yes, so every hidden layer also has
00:34:57a non-linearity accompanied with it, right?
00:35:00And that's a very important point
00:35:01because if you don't have that perceptron,
00:35:03then it's just a very large linear function
00:35:05followed by a final non-linearity at the very end, right?
00:35:08So you need that cascading and overlapping application
00:35:14of non-linearities that occur throughout the network.
00:35:18Awesome.
00:35:20Okay, so now let's zoom in,
00:35:22look at a single unit in the hidden layer.
00:35:25Take this one, for example, let's call it Z2, right?
00:35:27It's the second neuron in the first layer, right?
00:35:31It's the same perceptron that we saw before.
00:35:33We compute its answer by taking a dot product
00:35:36of its weights with its inputs,
00:35:39adding a bias, and then applying a non-linearity.
00:35:42If we took a different hidden node, like Z3,
00:35:45the one right below it, we would compute its answer
00:35:48exactly the same way that we computed Z2,
00:35:50except its weights would be different
00:35:52than the weights of Z2.
00:35:53Everything else stays exactly the same.
00:35:55It sees the same inputs, but of course,
00:35:57I'm not going to actually show Z3 in this picture,
00:36:00and now this picture is getting a little bit messy,
00:36:02so let's clean things up a little bit more.
00:36:04I'm gonna remove all the lines now
00:36:05and replace them just with these boxes,
00:36:08these symbols that will denote
00:36:10what we call a fully connected layer, right?
00:36:13So these layers now denote that everything in our input
00:36:16is connected to everything in our output,
00:36:18and the transformation is exactly as we saw before,
00:36:20dot product, bias, and non-linearity.
00:36:24And again, in code, to do this is extremely straightforward
00:36:28with the foundations that we've built up
00:36:30from the beginning of the class.
00:36:32We can now just define two of these dense layers, right?
00:36:35Our hidden layer on line one with n hidden units,
00:36:39and then our output layer with two output units.
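A sketch of what those two lines might look like in TensorFlow (the number of hidden units here is just a placeholder, and the choice of activation is illustrative):

```python
import tensorflow as tf

n_hidden = 32  # illustrative number of hidden units

model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation="relu"),  # hidden layer
    tf.keras.layers.Dense(2)                             # output layer with two units
])
```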
00:36:43Does that mean the non-linearity function
00:36:44must be the same between layers?
00:36:47Non-linearity function does not need to be the same
00:36:49through each layer.
00:36:50Oftentimes it is because of convenience.
00:36:54There are some cases where you would want it
00:36:56to be different as well, especially in lecture two,
00:36:59you're going to see non-linearities be different
00:37:02even within the same layer, let alone different layers.
00:37:06But unless for a particular reason,
00:37:10generally convention is there's no need
00:37:11to keep them differently.
00:37:15Now, let's keep expanding our knowledge a little bit more.
00:37:18If we now want to make a deep neural network,
00:37:20not just a neural network like we saw on the previous slide,
00:37:23now it's deep, all that means is that we're now going
00:37:25to stack these layers on top of each other, one by one,
00:37:29more and more creating a hierarchical model, right?
00:37:32The ones where the final output is now going to be computed
00:37:36by going deeper and deeper and deeper
00:37:38into the neural network.
00:37:40And again, doing this in code, again,
00:37:43follows the exact same story as before,
00:37:45just cascading these TensorFlow layers
00:37:48on top of each other,
00:37:49and just going deeper into the network.
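Again, just as a sketch (the layer sizes and activations here are illustrative), going deeper in code is more of the same stacking:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layer 2
    tf.keras.layers.Dense(64, activation="relu"),  # hidden layer 3
    tf.keras.layers.Dense(2)                       # output layer
])
```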
00:37:53Okay, so now this is great because now we have
00:37:56at least a solid foundational understanding
00:37:58of how to not only define a single neuron,
00:38:00but how to define an entire neural network.
00:38:02And you should be able to actually explain
00:38:04at this point or understand how information goes
00:38:07from input through an entire neural network
00:38:11to compute an output.
00:38:13So now let's look at how we can apply these neural networks
00:38:16to solve a very real problem
00:38:18that I'm sure all of you care about.
00:38:21So here's a problem on how we want to build an AI system
00:38:24to learn to answer the following question,
00:38:26which is, will I pass this class, right?
00:38:29I'm sure all of you are really worried about this question.
00:38:34So to do this, let's start with a simple input feature model.
00:38:38The two features that let's concern ourselves with
00:38:41are going to be number one, how many lectures you attend,
00:38:45and number two, how many hours you spend
00:38:49on your final project.
00:38:51So let's look at some of the past years of this class.
00:38:55We can actually observe how different people have
