Stanford University Machine Learning: Complete Collection of Problems and Answers (PDF)


CS229 Machine Learning (Problems and Answers), Stanford University

Contents

(1) Assignment 1 (Supervised Learning)
(2) Assignment 1 Solutions (Supervised Learning)
(3) Assignment 2 (Kernels, SVMs, and Theory)
(4) Assignment 2 Solutions (Kernels, SVMs, and Theory)
(5) Assignment 3 (Learning Theory and Unsupervised Learning)
(6) Assignment 3 Solutions (Learning Theory and Unsupervised Learning)
(7) Assignment 4 (Unsupervised Learning and Reinforcement Learning)
(8) Assignment 4 Solutions (Unsupervised Learning and Reinforcement Learning)
(9) Problem Set #1: Supervised Learning
(10) Problem Set #1 Answers
(11) Problem Set #2: Naive Bayes, SVMs, and Theory
(12) Problem Set #2 Answers

CS229, Public Course
Problem Set #1: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^\star$.

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.

(b) Show that the first iteration of Newton's method gives us $\theta^\star = (X^T X)^{-1} X^T y$, the solution to our least squares problem.

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^m w^{(i)}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is what is known as a regularization parameter, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$.

Using this definition, the gradient of $\ell(\theta)$ is given by

$$\nabla_\theta \ell(\theta) = X^T z - \lambda\theta$$

where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)}\left(y^{(i)} - h_\theta(x^{(i)})\right)$, and the Hessian is given by

$$H = X^T D X - \lambda I$$

where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right)$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well.

Given a query point $x$, we choose to compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$. The q2/ directory contains data and code for this problem. You should implement the y = lwlr(Xtrain, ytrain, x, tau) function in the lwlr.m file. This function takes as input the training set (the Xtrain and ytrain matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction. We provide two additional functions that might help. The [Xtrain, ytrain] = loaddata; function will load the matrices from files in the data/ folder. The function plotlwlr(Xtrain, ytrain, tau, resolution) will plot the resulting classifier (assuming you have properly implemented lwlr.m). This function evaluates the locally weighted logistic regression classifier over a large grid of points and plots the resulting prediction as blue (predicting y = 0) or red (predicting y = 1). Depending on how fast your lwlr function is, creating the plot might take some time, so we recommend debugging your code with resolution = 50; and later increase it to at least 200 to get a better idea of the decision boundary. (A minimal Python sketch of this procedure appears after part (b) below.)

(b) Evaluate the system with a variety of different bandwidth parameters $\tau$. In particular, try $\tau = 0.01, 0.05, 0.1, 0.5, 1.0, 5.0$. How does the classification boundary change when varying this parameter? Can you predict what the decision boundary of ordinary (unweighted) logistic regression would look like?
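The starter code for this problem is MATLAB (lwlr.m); the following is only a minimal NumPy sketch of the same Newton-Raphson update, not the official solution. It assumes Xtrain is an m×n array in the form described in the class notes, ytrain is a 0/1 vector, x is a length-n query point, and that initializing theta at zero and running a fixed number of iterations is sufficient; all names and defaults are illustrative.

```python
import numpy as np

def lwlr(Xtrain, ytrain, x, tau, lam=1e-4, n_iters=20):
    """Predict the 0/1 label of query point x via locally weighted
    logistic regression, using the gradient and Hessian given above."""
    m, n = Xtrain.shape
    # weights w^{(i)} = exp(-||x - x^{(i)}||^2 / (2 tau^2))
    w = np.exp(-np.sum((Xtrain - x) ** 2, axis=1) / (2.0 * tau ** 2))
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(Xtrain @ theta)))   # h_theta(x^{(i)})
        grad = Xtrain.T @ (w * (ytrain - h)) - lam * theta
        # H = X^T D X - lam*I, with D_ii = -w_i h_i (1 - h_i)
        H = -(Xtrain.T * (w * h * (1.0 - h))) @ Xtrain - lam * np.eye(n)
        theta -= np.linalg.solve(H, grad)             # Newton-Raphson step
    return int(1.0 / (1.0 + np.exp(-(x @ theta))) > 0.5)
```

Scaling the rows of $X$ by broadcasting, as above, avoids materializing the $m \times m$ diagonal matrix $D$ explicitly; the dense formulas in the problem statement are otherwise followed directly.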

3. Multivariate least squares

So far in class, we have only considered cases where our target variable $y$ is a scalar value. Suppose that instead of trying to predict a single output, we have a training set with multiple outputs for each example:

$$\{(x^{(i)}, y^{(i)}),\ i = 1, \ldots, m\}, \qquad x^{(i)} \in \mathbb{R}^n,\ y^{(i)} \in \mathbb{R}^p.$$

Thus for each training example, $y^{(i)}$ is vector-valued, with $p$ entries. We wish to use a linear model to predict the outputs, as in least squares, by specifying the parameter matrix $\Theta$ in $y = \Theta^T x$, where $\Theta \in \mathbb{R}^{n \times p}$.

(a) The cost function for this case is

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^m \sum_{j=1}^p \left((\Theta^T x^{(i)})_j - y_j^{(i)}\right)^2.$$

Write $J(\Theta)$ in matrix-vector notation (i.e., without using any summations). Hint: start with the $m \times n$ design matrix

$$X = \begin{pmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{pmatrix}$$

and the $m \times p$ target matrix

$$Y = \begin{pmatrix} (y^{(1)})^T \\ (y^{(2)})^T \\ \vdots \\ (y^{(m)})^T \end{pmatrix}$$

and then work out how to express $J(\Theta)$ in terms of these matrices.

(b) Find the closed-form solution for $\Theta$ which minimizes $J(\Theta)$. This is the equivalent of the normal equations for the multivariate case.

(c) Suppose instead of considering the multivariate vectors $y^{(i)}$ all at once, we instead compute each variable $y_j^{(i)}$ separately for each $j = 1, \ldots, p$. In this case, we have $p$ individual linear models, of the form $y_j^{(i)} = \theta_j^T x^{(i)}$, $j = 1, \ldots, p$. (So here, each $\theta_j \in \mathbb{R}^n$.) How do the parameters from these $p$ independent least squares problems compare to the multivariate solution? (A small numerical check appears after this problem.)
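As a sanity check for parts (b) and (c): the multivariate closed form $\Theta = (X^T X)^{-1} X^T Y$ (which you should verify is the minimizer in part (b)) coincides column-by-column with the $p$ separate scalar fits. A minimal sketch on synthetic data (all names and the random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 50, 4, 3                      # examples, input dim, output dim
X = rng.normal(size=(m, n))             # design matrix, rows (x^{(i)})^T
Y = rng.normal(size=(m, p))             # target matrix, rows (y^{(i)})^T

# Multivariate closed form: Theta = (X^T X)^{-1} X^T Y, shape (n, p)
Theta = np.linalg.solve(X.T @ X, X.T @ Y)

# p independent least squares problems, one per output column
Theta_cols = np.column_stack(
    [np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(p)]
)

# The j-th column of Theta equals theta_j from the j-th scalar problem
assert np.allclose(Theta, Theta_cols)
```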

4. Naive Bayes

In this problem, we look at maximum likelihood parameter estimation using the naive Bayes assumption. Here, the input features $x_j$, $j = 1, \ldots, n$, to our model are discrete, binary-valued variables, so $x_j \in \{0, 1\}$. We call $x = [x_1\ x_2\ \cdots\ x_n]^T$ the input vector. For each training example, our output targets are a single binary value $y \in \{0, 1\}$. Our model is then parameterized by $\phi_{j|y=0} = p(x_j = 1 \mid y = 0)$, $\phi_{j|y=1} = p(x_j = 1 \mid y = 1)$, and $\phi_y = p(y = 1)$. We model the joint distribution of $(x, y)$ according to

$$p(y) = (\phi_y)^y (1 - \phi_y)^{1-y}$$

$$p(x \mid y = 0) = \prod_{j=1}^n p(x_j \mid y = 0) = \prod_{j=1}^n (\phi_{j|y=0})^{x_j} (1 - \phi_{j|y=0})^{1-x_j}$$

$$p(x \mid y = 1) = \prod_{j=1}^n p(x_j \mid y = 1) = \prod_{j=1}^n (\phi_{j|y=1})^{x_j} (1 - \phi_{j|y=1})^{1-x_j}$$

(a) Find the joint likelihood function $\ell(\varphi) = \log \prod_{i=1}^m p(x^{(i)}, y^{(i)}; \varphi)$ in terms of the model parameters given above. Here, $\varphi$ represents the entire set of parameters $\{\phi_y, \phi_{j|y=0}, \phi_{j|y=1},\ j = 1, \ldots, n\}$.

(b) Show that the parameters which maximize the likelihood function are the same as those given in the lecture notes; i.e., that

$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}, \qquad \phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}, \qquad \phi_y = \frac{\sum_{i=1}^m 1\{y^{(i)} = 1\}}{m}.$$

(A sketch computing these estimates appears after part (c).)

(c) Consider making a prediction on some new data point $x$ using the most likely class estimate generated by the naive Bayes algorithm. Show that the hypothesis returned by naive Bayes is a linear classifier; i.e., if $p(y = 0 \mid x)$ and $p(y = 1 \mid x)$ are the class probabilities returned by naive Bayes, show that there exists some $\theta \in \mathbb{R}^{n+1}$ such that

$$p(y = 1 \mid x) \geq p(y = 0 \mid x) \quad \text{if and only if} \quad \theta^T \begin{bmatrix} 1 \\ x \end{bmatrix} \geq 0.$$

(Assume $\theta_0$ is an intercept term.)
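The estimates in part (b) are simple frequency counts, and the $\theta$ of part (c) can be read off from the log-odds, which are linear in $x$ under the model above. A minimal sketch on synthetic data (the data and all variable names are illustrative, not from the course materials):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 5
X = rng.integers(0, 2, size=(m, n))   # binary features x_j^{(i)}
y = rng.integers(0, 2, size=m)        # binary labels y^{(i)}

# Part (b): MLE = per-class frequency of x_j = 1, plus the class prior
phi_j_y0 = X[y == 0].mean(axis=0)     # phi_{j|y=0}, shape (n,)
phi_j_y1 = X[y == 1].mean(axis=0)     # phi_{j|y=1}, shape (n,)
phi_y = y.mean()                      # phi_y = p(y = 1)

# Part (c): log p(y=1|x) - log p(y=0|x) = theta0 + theta^T x, with
theta = (np.log(phi_j_y1 / phi_j_y0)
         - np.log((1 - phi_j_y1) / (1 - phi_j_y0)))
theta0 = (np.log(phi_y / (1 - phi_y))
          + np.sum(np.log((1 - phi_j_y1) / (1 - phi_j_y0))))
```

In practice one would add Laplace smoothing to avoid zero counts, but the raw MLE above matches the formulas in part (b) exactly.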

5. Exponential family and the geometric distribution

(a) Consider the geometric distribution parameterized by $\phi$:

$$p(y; \phi) = (1 - \phi)^{y-1}\,\phi, \qquad y = 1, 2, 3, \ldots$$

Show that the geometric distribution is in the exponential family, and give $b(y)$, $T(y)$, and $a(\eta)$. (A worked rearrangement appears after part (c).)

(b) Consider performing regression using a GLM model with a geometric response variable. What is the canonical response function for the family? You may use the fact that the mean of a geometric distribution is given by $1/\phi$.

(c) For a training set $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$, let the log-likelihood of an example be $\log p(y^{(i)} \mid x^{(i)}; \theta)$. By taking the derivative of the log-likelihood with respect to $\theta_j$, derive the stochastic gradient ascent rule for learning using a GLM model with geometric responses $y$ and the canonical response function.
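As a worked hint for part (a), in the exponential-family notation $p(y; \eta) = b(y)\exp(\eta\,T(y) - a(\eta))$ from the lecture notes (which this excerpt does not restate), the geometric pmf rearranges as

$$\begin{aligned} p(y;\phi) &= (1-\phi)^{y-1}\,\phi \\ &= \exp\big((y-1)\log(1-\phi) + \log\phi\big) \\ &= b(y)\exp\big(\eta\,T(y) - a(\eta)\big), \end{aligned} \qquad \text{with } \eta = \log(1-\phi),\ T(y) = y,\ a(\eta) = \eta - \log(1 - e^{\eta}),\ b(y) = 1.$$

From $\mathrm{E}[y] = 1/\phi = 1/(1 - e^{\eta})$, the canonical response function asked for in part (b) follows directly.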

CS229, Public Course
Problem Set #1 Solutions: Supervised Learning

1. Newton's method for computing least squares

In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to $\theta^\star$.

(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.

Answer: As shown in the class notes,

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^m \left(\theta^T x^{(i)} - y^{(i)}\right) x_j^{(i)}.$$

So

$$\frac{\partial^2 J(\theta)}{\partial \theta_j \partial \theta_k} = \sum_{i=1}^m \frac{\partial}{\partial \theta_k}\left(\theta^T x^{(i)} - y^{(i)}\right) x_j^{(i)} = \sum_{i=1}^m x_j^{(i)} x_k^{(i)} = (X^T X)_{jk}.$$

Therefore, the Hessian of $J(\theta)$ is $H = X^T X$. This can also be derived by simply applying rules from the lecture notes on Linear Algebra.

(b) Show that the first iteration of Newton's method gives us $\theta^\star = (X^T X)^{-1} X^T y$, the solution to our least squares problem.

Answer: Given any $\theta^{(0)}$, Newton's method finds $\theta^{(1)}$ according to

$$\theta^{(1)} = \theta^{(0)} - H^{-1} \nabla_\theta J(\theta^{(0)}) = \theta^{(0)} - (X^T X)^{-1}\left(X^T X \theta^{(0)} - X^T y\right) = \theta^{(0)} - \theta^{(0)} + (X^T X)^{-1} X^T y = (X^T X)^{-1} X^T y.$$

Therefore, no matter what $\theta^{(0)}$ we pick, Newton's method always finds $\theta^\star$ after one iteration. (A quick numerical check follows.)
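The one-iteration claim is easy to confirm numerically; a minimal sketch on synthetic data (all names and the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

theta0 = rng.normal(size=n)                  # arbitrary starting point
grad = X.T @ (X @ theta0 - y)                # gradient of J at theta0
H = X.T @ X                                  # Hessian from part (a)
theta1 = theta0 - np.linalg.solve(H, grad)   # one Newton step

# theta1 matches the least squares solution regardless of theta0
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta1, theta_star)
```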

2. Locally-weighted logistic regression

In this problem you will implement a locally-weighted version of logistic regression, where we weight different training examples differently according to the query point. The locally-weighted logistic regression problem is to maximize

$$\ell(\theta) = -\frac{\lambda}{2}\theta^T\theta + \sum_{i=1}^m w^{(i)}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right].$$

The $-\frac{\lambda}{2}\theta^T\theta$ here is what is known as a regularization parameter, which will be discussed in a future lecture, but which we include here because it is needed for Newton's method to perform well on this task. For the entirety of this problem you can use the value $\lambda = 0.0001$. Using this definition, the gradient of $\ell(\theta)$ is given by $\nabla_\theta \ell(\theta) = X^T z - \lambda\theta$, where $z \in \mathbb{R}^m$ is defined by $z_i = w^{(i)}\left(y^{(i)} - h_\theta(x^{(i)})\right)$, and the Hessian is given by $H = X^T D X - \lambda I$, where $D \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $D_{ii} = -w^{(i)}\, h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right)$. For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well. Given a query point $x$, we choose to compute the weights

$$w^{(i)} = \exp\left(-\frac{\|x - x^{(i)}\|^2}{2\tau^2}\right).$$

Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more weight to the "nearby" points when predicting the class of a new example.

(a) Implement the Newton-Raphson algorithm for optimizing $\ell(\theta)$ for a new query point $x$, and use this to predict the class of $x$. The q2/ directory contains data and code for this problem. You should implement the y = lwlr(Xtrain, ytrain, x, tau) function in the lwlr.m file. This function takes as input the training set (the Xtrain and ytrain matrices, in the form described in the class notes), a new query point x, and the weight bandwidth tau. Given this input, the function should 1) compute weights $w^{(i)}$ for each training example, using the formula above, 2) maximize $\ell(\theta)$ using Newton's method, and finally 3) output $y = 1\{h_\theta(x) > 0.5\}$ as the prediction. We provide two additional functions that might help. The [Xtrain, ytrain] = loaddata; function will load the matrices from files in the data/ folder. The function plotlwlr […]
