CS 229 Machine Learning (Problems and Answers), Stanford University

Contents

(1) Assignment 1 (Supervised Learning)
(2) Assignment 1 Solutions (Supervised Learning)
(3) Assignment 2 (Kernels, SVMs, and Theory)
(4) Assignment 2 Solutions (Kernels, SVMs, and Theory)
(5) Assignment 3 (Learning Theory and Unsupervised Learning)
(6) Assignment 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Assignment 4 (Unsupervised Learning and Reinforcement Learning)
(8) Assignment 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answer
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answer

CS 229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function

$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2.$$

(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^T X)^{-1}X^T\vec{y}$, the solution to our least squares problem.
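For part (b), it helps to recall the update rule that defines Newton's method (the same rule is used in the solutions later in this document):

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}\nabla_\theta J(\theta^{(t)}).$$

Because $J(\theta)$ is quadratic in $\theta$, its second-order Taylor expansion is exact, which is why a single step can reach the minimizer.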

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^{m} w^{(i)}\left[ y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right) \right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is a regularization term, controlled by the regularization parameter $\lambda$, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta,$$

where $z \in \mathbb{R}^m$ is defined by

$$z_i = w^{(i)}\left(y^{(i)} - h_\theta(x^{(i)})\right),$$

and the Hessian is given by

$$H = X^T D X - \lambda I,$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with

$$D_{ii} = -w^{(i)}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right).$$

For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well (a sketch of the derivation is given below).

Given a query point $x$, we compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.
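Since the text encourages deriving these formulas, here is a sketch of that derivation. It uses only the definition of $\ell(\theta)$ above and the standard logistic identity $\nabla_\theta h_\theta(x) = h_\theta(x)(1 - h_\theta(x))\,x$ for $h_\theta(x) = 1/(1 + e^{-\theta^T x})$:

$$\nabla_\theta \ell(\theta) = -\lambda\theta + \sum_{i=1}^{m} w^{(i)}\left(\frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})}\right) h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x^{(i)} = -\lambda\theta + \sum_{i=1}^{m} w^{(i)}\left(y^{(i)} - h_\theta(x^{(i)})\right) x^{(i)} = X^T z - \lambda\theta.$$

Differentiating once more gives

$$\nabla_\theta^2 \ell(\theta) = -\lambda I - \sum_{i=1}^{m} w^{(i)}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x^{(i)}(x^{(i)})^T = X^T D X - \lambda I,$$

which matches the Hessian stated above.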

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute the weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction.

We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting $y = 0$) or red (predicting $y = 1$). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increasing it to at least 200 to get a better idea of the decision boundary.
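One possible shape of the lwlr.m function described above, written as a sketch: the matrix conventions (X_train is m-by-n with rows $(x^{(i)})^T$, y_train is m-by-1 with 0/1 labels, x is an n-by-1 column vector), the zero initialization, and the fixed iteration cap are assumptions made here for illustration, not requirements stated in the problem.

    % lwlr.m (sketch): locally-weighted logistic regression prediction.
    % Assumes the standard logistic hypothesis h(x) = 1/(1 + exp(-theta'*x)).
    function y = lwlr(X_train, y_train, x, tau)
      [m, n] = size(X_train);
      lambda = 0.0001;

      % 1) weights w(i) = exp(-||x - x(i)||^2 / (2*tau^2))
      d = X_train - repmat(x', m, 1);
      w = exp(-sum(d.^2, 2) / (2 * tau^2));

      % 2) maximize ell(theta) by Newton's method, using the gradient and
      %    Hessian formulas given in the problem statement
      theta = zeros(n, 1);              % assumed starting point
      for iter = 1:20                   % assumed fixed cap; a tolerance test also works
        h = 1 ./ (1 + exp(-X_train * theta));
        z = w .* (y_train - h);
        g = X_train' * z - lambda * theta;            % gradient of ell
        D = diag(-w .* h .* (1 - h));
        H = X_train' * D * X_train - lambda * eye(n); % Hessian of ell
        theta = theta - H \ g;          % Newton step
      end

      % 3) predict y = 1{h_theta(x) > 0.5}
      y = double(1 / (1 + exp(-theta' * x)) > 0.5);
    end

For large m it is cheaper to scale the rows of X_train than to form the m-by-m matrix D explicitly, but the version above stays closest to the formulas as written.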

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?

3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:

$$\left\{ \left(x^{(i)}, y^{(i)}\right),\; i = 1, \ldots, m,\; x^{(i)} \in \mathbb{R}^n,\; y^{(i)} \in \mathbb{R}^p \right\}.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in

$$y = \Theta^T x,$$

where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{p}\left( \left(\Theta^T x^{(i)}\right)_j - y_j^{(i)} \right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations).

Hint: Start with the $m \times n$ design matrix

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

and the $m \times p$ target matrix

$$Y = \begin{bmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(m)})^T \end{bmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.

(b) Find the closed form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y_j^{(i)}$ separately for each $j = 1, \ldots, p$. In this case, we have $p$ individual linear models, of the form

$$y_j^{(i)} = \theta_j^T x^{(i)}, \qquad j = 1, \ldots, p.$$

(So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multivariate solution?
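Part (c) is easy to probe numerically before proving anything. The sketch below fits the $p$ outputs jointly and one at a time and compares the fitted parameters; the problem sizes and random data are illustrative assumptions, and MATLAB's backslash operator is used as an off-the-shelf least-squares solver.

    % Sketch: compare the joint multivariate fit with p separate fits.
    m = 50; n = 5; p = 3;                % assumed sizes
    X = randn(m, n);                     % design matrix, rows are (x^(i))'
    Y = randn(m, p);                     % target matrix, rows are (y^(i))'

    Theta_joint = X \ Y;                 % least-squares fit of all p outputs at once

    Theta_sep = zeros(n, p);
    for j = 1:p
      Theta_sep(:, j) = X \ Y(:, j);     % j-th individual model theta_j
    end

    disp(norm(Theta_joint - Theta_sep, 'fro'))   % expect a value near machine precision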

4. Naive Bayes

In this problem, we look at maximum likelihood parameter estimation using the naive Bayes assumption. Here, the input features $x_j$, $j = 1, \ldots, n$, to our model are discrete, binary-valued variables, so $x_j \in \{0, 1\}$. We call $x = [x_1\; x_2\; \cdots\; x_n]^T$ the input vector. For each training example, our output target is a single binary value $y \in \{0, 1\}$. Our model is then parameterized by $\phi_{j|y=0} = p(x_j = 1 \mid y = 0)$, $\phi_{j|y=1} = p(x_j = 1 \mid y = 1)$, and $\phi_y = p(y = 1)$. We model the joint distribution of $(x, y)$ according to

$$p(y) = (\phi_y)^y (1 - \phi_y)^{1-y}$$

$$p(x \mid y = 0) = \prod_{j=1}^{n} p(x_j \mid y = 0) = \prod_{j=1}^{n} (\phi_{j|y=0})^{x_j}(1 - \phi_{j|y=0})^{1-x_j}$$

$$p(x \mid y = 1) = \prod_{j=1}^{n} p(x_j \mid y = 1) = \prod_{j=1}^{n} (\phi_{j|y=1})^{x_j}(1 - \phi_{j|y=1})^{1-x_j}$$

(a) Find the joint log-likelihood function $\ell(\varphi) = \log\prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \varphi)$ in terms of the model parameters given above. Here, $\varphi$ represents the entire set of parameters $\{\phi_y,\ \phi_{j|y=0},\ \phi_{j|y=1},\ j = 1, \ldots, n\}$.

(b) Show that the parameters which maximize the likelihood function are the same as those given in the lecture notes; i.e., that

$$\phi_{j|y=0} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}}, \qquad \phi_{j|y=1} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}, \qquad \phi_y = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}{m}.$$
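The estimates in part (b) are conditional frequency counts, which is easy to see in code. A sketch, assuming the training examples are stacked into an m-by-n binary matrix X (row $i$ is $(x^{(i)})^T$) and an m-by-1 label vector y:

    % Sketch: the maximum likelihood estimates above as frequency counts.
    phi_j_y0 = sum(X(y == 0, :), 1) / sum(y == 0);   % 1-by-n vector of phi_{j|y=0}
    phi_j_y1 = sum(X(y == 1, :), 1) / sum(y == 1);   % 1-by-n vector of phi_{j|y=1}
    phi_y    = sum(y == 1) / length(y);              % phi_y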

(c) Consider making a prediction on some new data point $x$ using the most likely class estimate generated by the naive Bayes algorithm. Show that the hypothesis returned by naive Bayes is a linear classifier; i.e., if $p(y = 0 \mid x)$ and $p(y = 1 \mid x)$ are the class probabilities returned by naive Bayes, show that there exists some $\theta \in \mathbb{R}^{n+1}$ such that

$$p(y = 1 \mid x) \geq p(y = 0 \mid x) \quad \text{if and only if} \quad \theta^T \begin{bmatrix} 1 \\ x \end{bmatrix} \geq 0.$$

(Assume $\theta_0$ is an intercept term.)

5. Exponential family and the geometric distribution

(a) Consider the geometric distribution parameterized by $\phi$:

$$p(y; \phi) = (1 - \phi)^{y-1}\phi, \qquad y = 1, 2, 3, \ldots$$

Show that the geometric distribution is in the exponential family, and give $b(y)$, $\eta$, $T(y)$, and $a(\eta)$.

(b) Consider performing regression using a GLM model with a geometric response variable. What is the canonical response function for the family? You may use the fact that the mean of a geometric distribution is given by $1/\phi$.

(c) For a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, let the log-likelihood of an example be $\log p(y^{(i)} \mid x^{(i)}; \theta)$. By taking the derivative of the log-likelihood with respect to $\theta_j$, derive the stochastic gradient ascent rule for learning using a GLM model with geometric responses $y$ and the canonical response function.

CS 229, Public Course
Problem Set #1 Solutions: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^*$.

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(\theta^T x^{(i)} - y^{(i)})^2$.

Answer: As shown in the class notes,

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)}.$$

So

$$\frac{\partial^2 J(\theta)}{\partial \theta_j \partial \theta_k} = \sum_{i=1}^{m} \frac{\partial}{\partial \theta_k}\left(\theta^T x^{(i)} - y^{(i)}\right)x_j^{(i)} = \sum_{i=1}^{m} x_j^{(i)} x_k^{(i)} = (X^T X)_{jk}.$$

Therefore, the Hessian of $J(\theta)$ is $H = X^T X$. This can also be derived by simply applying rules from the lecture notes on Linear Algebra.

(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^T X)^{-1}X^T\vec{y}$, the solution to our least squares problem.

Answer: Given any $\theta^{(0)}$, Newton's method finds $\theta^{(1)}$ according to

$$\theta^{(1)} = \theta^{(0)} - H^{-1}\nabla_\theta J(\theta^{(0)}) = \theta^{(0)} - (X^T X)^{-1}\left(X^T X\theta^{(0)} - X^T\vec{y}\right) = \theta^{(0)} - \theta^{(0)} + (X^T X)^{-1}X^T\vec{y} = (X^T X)^{-1}X^T\vec{y}.$$

Therefore, no matter what $\theta^{(0)}$ we pick, Newton's method always finds $\theta^*$ after one iteration.
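The one-iteration claim is easy to verify numerically. A sketch (the problem sizes and random data are illustrative assumptions):

    % Sketch: one Newton step from a random theta0 reaches the
    % least-squares solution (X'X)^{-1} X' y.
    m = 100; n = 4;
    X = randn(m, n);
    y = randn(m, 1);

    theta0 = randn(n, 1);                 % arbitrary starting point
    g = X' * (X * theta0 - y);            % gradient of J at theta0
    H = X' * X;                           % Hessian of J (constant in theta)
    theta1 = theta0 - H \ g;              % one Newton step

    theta_star = (X' * X) \ (X' * y);     % normal-equations solution
    disp(norm(theta1 - theta_star))       % expect a value near machine precision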

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^{m} w^{(i)}\left[ y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right) \right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is a regularization term, controlled by the regularization parameter $\lambda$, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$. Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta,$$

where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)}(y^{(i)} - h_\theta(x^{(i)}))$, and the Hessian is given by

$$H = X^T D X - \lambda I,$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)} h_\theta(x^{(i)})(1 - h_\theta(x^{(i)}))$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well. Given a query point $x$, we compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$.

The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X_train, y_train, x, tau) function in the lwlr.m file. This function takes as input the training set (the X_train and y_train matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute the weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction. We provide two additional functions that might help. The [X_train, y_train] = load_data; function will load the matrices from files in the data/ folder. The function plot_lwlr(X_train, y_train, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m).
