the final introductory part is about the types of machine learning problems, and you will see very often that people divide machine learning problems into three big groups.

let’s introduce this terminology here. The first group is supervised learning problems. In supervised learning you have some data, for example photos of cats and dogs (the same example as before), and they are labeled: there’s a photo and it says it’s a cat, there’s another photo and it says it’s a dog, and so on. The task is to distinguish one class from another; maybe you have more than two classes. It’s still supervised, because the learning is supervised by the labels.

Unsupervised learning: imagine the same data, but you don’t have labels. You just have a bunch of images, and that’s all; you don’t tell the algorithm where the cats and dogs are. An example task here would be that you want the program to figure out that there are two (or more) different kinds of animals, and essentially cluster your images by what animal is shown. That’s the unsupervised learning setting.

Reinforcement learning is the third group, where the program is more like an agent: it has to make decisions in some sort of game or exploration environment, for example play chess, play Go, drive a car. Those are all reinforcement learning tasks. In this course we are not going to talk about reinforcement learning, which has a somewhat different set of ideas behind it; you would need to take a separate course to learn about that. We will be talking about the first two, mostly about supervised learning and less so about unsupervised learning, which is the same balance as in any machine learning textbook you will pick up, and that is because supervised learning is easier, as we will see throughout this course.

I want to finish with one famous quote from Yann LeCun, a deep learning researcher, from a few years ago; let me show this cake picture here. He said that most of human and animal learning is unsupervised learning, and then compared it with a cake: the bulk of the cake is unsupervised learning, supervised learning is only the icing, and reinforcement learning is the cherry on top, referring to how actual animal intelligence works. That’s contentious, I think; some people might disagree. But the idea is that if you’re a baby learning about the world, most of the input you get doesn’t have explicit labels, and yet you learn a lot from it: you just look at the world and somehow learn to categorize things. At least that’s the idea people have about how babies learn, and it’s largely unsupervised. But that’s a very difficult problem, and until very recently there was much more progress in supervised learning than in unsupervised learning, so supervised learning will also be our focus here in this introductory course.

okay, so with this we can jump in, and we will start with linear regression. That might sound a bit underwhelming to some of you, because we’re talking about machine learning and I’m starting with a very classical statistical method, linear regression, that is two hundred years old or more. However, I think it’s actually very useful for demonstrating a lot of concepts in machine learning, so we’ll in fact spend about four lectures on linear regression; I will use it as a vehicle to introduce a lot of useful concepts that then appear everywhere in machine learning.

Today we’re really talking about a very, very simple situation. “Simple linear regression” is actually a technical term; it does not just mean a linear regression that happens to be simple. It means that you only have one predictor. Let’s choose an example: you want to predict the height of a person from, let’s say, the age of the person. So x in this case will be the age and y will be the height, and if you collect the data you will see that they’re positively correlated: the larger the age, the taller the person, let’s say that’s how the data look. We then want to fit a linear function here. A linear function has two parameters: one is called the intercept, that’s beta_0 here, and it’s where the line crosses the y axis; the second parameter is the slope, beta_1. That’s just some terminology.

This is a supervised learning problem, of course, because we have the y. It is not a classification problem, so we don’t want to separate two groups of objects from one another; it’s called a regression problem because we’re predicting a continuous variable y, the height. We have some training data, which is a set of pairs (x_i, y_i) where i goes from 1 to n, and n is the sample size, and we want to fit this model, right? We want to fit the linear function to the data. So this means we want to choose beta_0 and
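To make the age/height example concrete, here is a minimal sketch of fitting a line y = beta_0 + beta_1 * x with least squares. The numbers are made up for illustration (they are not from the lecture), and NumPy's polyfit is just one convenient way to get the fit:

```python
import numpy as np

# Hypothetical age/height data, invented for illustration only
rng = np.random.default_rng(0)
age = rng.uniform(2, 18, size=50)                 # x: age in years
height = 75 + 6 * age + rng.normal(0, 5, size=50) # y: height in cm, roughly linear in age

# Least-squares fit of y = beta_0 + beta_1 * x.
# np.polyfit returns coefficients in order of decreasing degree: [slope, intercept].
beta_1, beta_0 = np.polyfit(age, height, deg=1)
print(beta_0, beta_1)  # intercept should land near 75, slope near 6
```

Because the synthetic data were generated from a line plus noise, the fitted intercept and slope come out close to the true values 75 and 6.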
beta_1: we want to find the values of these coefficients that yield a line that describes our data well. As I said, our prediction, the value f(x_i), should be close to y_i. But how would we do that? How can we even formulate this problem mathematically, as opposed to just saying it should be a good fit? How can we formalize the notion of a good fit? A central object here in machine learning is the loss function: we write down some function that describes how well a given model with given coefficients describes the data, and we tinker with the model until it fits well enough, or until it’s the best possible fit.

The loss function (also sometimes called the cost function) of linear regression, let me just give it to you directly, is the mean of squared deviations: we sum, over all training examples i from 1 to n, the squared deviation between the actual value y_i and our prediction f(x_i), and divide by n, so it’s the average squared deviation:

L(beta_0, beta_1) = (1/n) * sum_{i=1}^{n} (y_i - f(x_i))^2,

where our prediction is of course just f(x_i) = beta_0 + beta_1 * x_i. This is called the mean squared error.

A good question here is: why are we using the mean squared error as the loss function for linear regression? I’m actually not going to answer this now; I suggest that you spend a few moments thinking about what possible loss functions one could have chosen instead. It’s not obvious that this is the best, or in some sense optimal, loss function. I could have used the third power instead of the second; I could have used the absolute value, so the first power, because of course I want to penalize the positive errors and also the negative errors, so I can just put the absolute value and sum that up; or maybe the fourth power, or maybe
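The mean squared error, and one of the alternatives mentioned above (the absolute value, i.e. mean absolute error), can be sketched in a few lines. The data points here are made up purely to show that a good line gives a small loss and a bad line gives a large one:

```python
import numpy as np

def mse_loss(beta0, beta1, x, y):
    """Mean squared error of the line beta0 + beta1*x on data (x, y)."""
    residuals = y - (beta0 + beta1 * x)
    return np.mean(residuals ** 2)

def mae_loss(beta0, beta1, x, y):
    """Mean absolute error -- one of the alternative losses discussed above."""
    residuals = y - (beta0 + beta1 * x)
    return np.mean(np.abs(residuals))

# Tiny made-up dataset lying roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

print(mse_loss(0.0, 2.0, x, y))   # small: the line y = 2x fits these points well
print(mse_loss(0.0, -2.0, x, y))  # large: a badly chosen slope gives a big loss
```

Both functions are small when the fit is good and large when it is bad, which is the property any reasonable loss shares; the lecture’s point is that among all such candidates, the squared one turns out to be especially convenient.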
some other function of this difference, or maybe it should not be a sum but a product; you can come up with many functions that will be small when the fit is good and large when the fit is bad, which is what we want. So why are we using this particular function? There can be several answers. One of them is that it is just computationally very easy to work with, and you will see some other answers appearing in later lectures. For now we’ll just try it out with this loss function and see how it goes. Another term that you might hear when you read on this topic is ordinary least squares: this is called an ordinary least squares regression, or estimation, problem, because you want to find the f that minimizes this squared error, hence “least squares”.

Okay, and now we will take another step: instead of simple linear regression we will consider what I will call “baby” linear regression. That’s not a standard term but my own term: I forget about the intercept and try to fit an even simpler, ridiculously simple model which has just one parameter, and that’s the slope beta.
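For this baby model f(x) = beta * x, minimizing the mean squared error over the single parameter can be done in closed form: setting the derivative of (1/n) * sum (y_i - beta*x_i)^2 with respect to beta to zero gives beta = sum(x_i * y_i) / sum(x_i^2). A sketch, with a tiny invented dataset (the lecture itself has not derived this yet, so treat it as a preview):

```python
import numpy as np

def fit_baby_regression(x, y):
    """Least-squares fit of the one-parameter model f(x) = beta * x.
    Setting d/dbeta of (1/n) * sum (y_i - beta*x_i)^2 to zero yields
    beta = sum(x_i * y_i) / sum(x_i**2)."""
    return np.sum(x * y) / np.sum(x ** 2)

# Made-up points lying roughly on the line through the origin y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])

beta = fit_baby_regression(x, y)
print(beta)  # close to 2
```

Because there is only one parameter and no intercept, the loss is a simple parabola in beta, which is exactly what makes this model a convenient warm-up for the concepts that follow.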