R Regression Self-Test Exercises with Code Answers

################## Part 1: Linear Regression Concepts #######################

## These questions do not require coding but will explore some important concepts.
## "Regression" refers to the simple linear regression equation:
##   y = b0 + b1 * x
## This homework will not discuss other models.

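## 1. (1 pt)
## What is the interpretation of the coefficient B1?
## (What meaning does it represent?)
## Your answer here
# B1 is the slope: the expected change in the dependent variable y
# when the independent variable x increases by one unit.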
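
# A minimal sketch of this interpretation, using made-up data that lie exactly on
# the line y = 3 + 2 * x (the numbers are assumptions chosen for illustration).
# The fitted slope and the change in the prediction between x = 5 and x = 6 coincide.

```r
# Made-up data on an exact line with intercept 3 and slope 2.
x <- 1:10
y <- 3 + 2 * x
fit <- lm(y ~ x)
b1 <- unname(coef(fit)["x"])

# Predicted y at x = 5 and x = 6; their difference is the one-unit effect.
pred <- predict(fit, newdata = data.frame(x = c(5, 6)))
step <- unname(diff(pred))
print(c(slope = b1, change_per_unit = step))
```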

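## 2. (1 pt)
## Outliers are problems for many statistical methods, but are particularly problematic
## for linear regression. Why is that? It may help to define what outlier means in this case.
## (Hint: Think of how residuals are calculated)
## Your answer here
# Least squares minimizes the sum of squared residuals, so a single anomalous
# observation has a large effect on the means of x and y and on the estimate of
# beta, and the fitted model can change substantially.
# Points with a standardized residual greater than 2 or less than -2 may be outliers.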
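
# A small simulated illustration of the point above (all data here are made up):
# shifting one high-leverage observation changes the slope noticeably, and
# rstandard() flags that observation by the |standardized residual| > 2 rule.

```r
set.seed(1)
x <- 1:20
y <- 1 + 0.5 * x + rnorm(20, sd = 0.3)   # simulated clean data
fit_clean <- lm(y ~ x)

y_out <- y
y_out[20] <- y_out[20] + 15              # contaminate the last observation
fit_out <- lm(y_out ~ x)

# Compare the slope before and after contamination.
slopes <- c(clean = unname(coef(fit_clean)["x"]),
            with_outlier = unname(coef(fit_out)["x"]))
print(slopes)

# Indices whose standardized residual exceeds 2 in absolute value.
flagged <- which(abs(rstandard(fit_out)) > 2)
print(flagged)
```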

## 3. (1 pt)
## How could you deal with outliers in order to improve the accuracy of your model?
## Your answer here
# Delete the outliers, or replace them with the mean.

################## Part 2: Sampling and Point Estimation #####################

## The following problems will use the cats dataset and explore
## the average body weight of female cats.
## Load the data by running the following code
# install.packages("MASS")
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
data(cats)

## 4. (2 pts)
## Subset the data frame to ONLY include female cats.
## Your answer here
cats <- cats[cats$Sex == "F", ]

## Use the sample function to generate a vector of 1s and 2s that is the same
## length as the subsetted data frame you just created. Use this vector to split
## the 'Bwt' variable into two vectors, Bwt1 and Bwt2.
## IMPORTANT: Make sure to run the following seed function before you run your sample
## function. Run them back to back each time you want to run the sample function to ensure
## the same seed is used every time.
## Check: If you did this properly, you will have 24 elements in Bwt1 and 23 elements
## in Bwt2.
set.seed(676)
## Your answer here
set.seed(676)
# Draw 24 row indices once and reuse them, so that Bwt1 and Bwt2 do not overlap.
s1 <- sample(length(cats$Bwt), 24)
Bwt1 <- cats$Bwt[s1]
Bwt2 <- cats$Bwt[-s1]
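
# The question text asks for a vector of 1s and 2s. A minimal sketch of that
# variant follows. The names female, grp, Bwt1_alt and Bwt2_alt are introduced
# here only for illustration, and the resulting group sizes depend on the R
# version's sampling algorithm, so they are printed rather than assumed.

```r
library(MASS)                                     # provides the cats data set
female <- cats[cats$Sex == "F", ]

set.seed(676)
grp <- sample(1:2, nrow(female), replace = TRUE)  # one label (1 or 2) per cat
Bwt1_alt <- female$Bwt[grp == 1]
Bwt2_alt <- female$Bwt[grp == 2]
print(c(length(Bwt1_alt), length(Bwt2_alt)))
```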

## 5. (3 pts)
## Calculate the mean and the standard deviation for each of the two
## vectors, Bwt1 and Bwt2. Use this information to create a 95%
## confidence interval for your sample means (you can use the following formula
## for a confidence interval: mean +/- 2 * standard deviation).
## Compare the confidence intervals -- do they seem to agree or disagree?
## Your answer here
mean(Bwt1)
## [1] 2.3375
mean(Bwt2)
## [1] 2.395652
sd(Bwt1)
## [1] 0.2617873
sd(Bwt2)
## [1] 0.2754802
# confidence interval
mean(Bwt1) + 2 * sd(Bwt1)
## [1] 2.861075
mean(Bwt1) - 2 * sd(Bwt1)
## [1] 1.813925
mean(Bwt2) + 2 * sd(Bwt2)
## [1] 2.946613
mean(Bwt2) - 2 * sd(Bwt2)
## [1] 1.844692
# The two intervals, (1.81, 2.86) and (1.84, 2.95), overlap almost entirely, so they agree.
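
# The exercise's formula uses the standard deviation itself. For comparison, a
# common textbook interval for a mean uses the standard error sd / sqrt(n) and a
# t quantile. The sketch below applies that alternative to the recorded values
# for Bwt1 (mean 2.3375, sd 0.2617873, n = 24). It is an illustration, not the
# method the exercise prescribes.

```r
m <- 2.3375                        # recorded mean of Bwt1
s <- 0.2617873                     # recorded standard deviation of Bwt1
n <- 24                            # number of elements in Bwt1

se <- s / sqrt(n)                  # standard error of the mean
tcrit <- qt(0.975, df = n - 1)     # two-sided 95% t quantile
ci <- m + c(-1, 1) * tcrit * se
print(ci)
```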

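## 6.
## Draw 1000 observations from a standard normal distribution. Calculate the sample mean.
## Repeat this 500 times, storing each sample mean in a vector called mean_dist.
## Plot a histogram of mean_dist to display the distribution of your sample mean.
## How closely does your histogram resemble this normal distribution? Explain.
## Your answer here
mean_dist <- numeric(500)   # one slot per repetition
for (i in 1:500) {          # 500 repetitions, as the question asks
  x <- rnorm(1000)          # 1000 standard normal draws
  mean_dist[i] <- mean(x)
}
hist(mean_dist)
# The histogram of the sample means is bell-shaped and centered near 0,
# consistent with a normal distribution.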
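
# One way to check the resemblance numerically, sketched under the assumption
# that a seed of 1 is acceptable: the sample means should have standard
# deviation close to 1 / sqrt(1000), the standard error of a mean of 1000
# standard normal draws.

```r
set.seed(1)
means <- replicate(500, mean(rnorm(1000)))   # 500 sample means
theory_sd <- 1 / sqrt(1000)                  # theoretical standard error
print(c(observed = sd(means), theoretical = theory_sd))
```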

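## 7. (3 pts)
## Write a function that implements Q5.
HW.Bootstrap <- function(distn, n, reps) {
  set.seed(666)
  ### Your answer here
  # Simulate `reps` sample means, each from `n` draws, and plot their histogram.
  mean_dist <- numeric(reps)
  if (distn == "rexp") {
    for (i in 1:reps) {
      x <- rexp(n, 1)       # exponential with rate 1, i.e. mean 1
      mean_dist[i] <- mean(x)
    }
    hist(mean_dist)
  }
}
## Use the function you write to repeat the experiment in Q5 but instead of the
## normal distribution as we used above, use an exponential distribution with mean 1.
## Check your histogram and write out your findings.
## (Hint: HW.Bootstrap(rexp, n, reps))
## Your answer here
HW.Bootstrap(distn = "rexp", n = 1000, reps = 1000)
# The histogram of the sample means of the exponential distribution is also
# approximately normal in shape.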
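
# The hint passes rexp itself rather than the string "rexp". A possible
# generalization is sketched below. The name sample_means and the use of
# match.fun() are choices made here for illustration, not part of the original
# answer. It accepts any random generator whose first argument is the sample size.

```r
sample_means <- function(distn, n, reps, ...) {
  gen <- match.fun(distn)                    # accepts rexp or "rexp"
  replicate(reps, mean(gen(n, ...)))         # one sample mean per repetition
}

set.seed(666)
m_exp <- sample_means(rexp, n = 1000, reps = 1000, rate = 1)
hist(m_exp)
```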

################### Part 3: More Linear Regression ######################

## This problem will use the Prestige dataset.
## Load the data by running code below
# install.packages("car")
library(car)
## Warning: package 'car' was built under R version 3.3.3
data(Prestige)
head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof

## We will focus on these two variables:
## income: Average income of incumbents, dollars, in 1971.
## education: Average education of occupational incumbents, years, in 1971
## Before starting this problem, we will declare a null hypothesis that
## education has no effect on income.
## That is: H0: B1 = 0
##          HA: B1 != 0
## We will attempt to reject this hypothesis by using a linear regression

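## 8. (2 pt)
## Fit a linear regression on the Prestige data using education to predict
## income, using lm(). Examine the model diagnostics using plot(). Would you
## consider this a good model or not? Explain.
## Your answer here
# income ~ . regresses income on every other column, not on education alone.
mm <- lm(income ~ ., data = Prestige)
plot(mm)
# The diagnostic plots show outlying points, and the points in the Q-Q plot do
# not follow the reference line, so the residuals are not normally distributed.
# The model fit is therefore only mediocre.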
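
# The question asks for education as the only predictor. A minimal sketch of
# that simple regression follows. The name fit_edu is introduced here for
# illustration, and no output is reproduced because the original document
# records none for this model.

```r
library(car)                                   # attaches carData, which holds Prestige
fit_edu <- lm(income ~ education, data = Prestige)
print(summary(fit_edu))                        # slope B1, its standard error and p-value
print(confint(fit_edu, "education", level = 0.95))
# par(mfrow = c(2, 2)); plot(fit_edu)          # the four diagnostic plots on one page
```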

## 9. (2 pts)
## Using the information from summary() on your model (the output from the lm() command), create a
## 95% confidence interval for the coefficient of education variable
## Your answer here
summary(mm)
## Call:
## lm(formula = income ~ ., data = Prestige)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7752.4  -954.6  -331.2   742.6 14301.3
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.32053 3037.27048   0.002  0.99808
## education    131.18372  288.74961   0.454  0.65068
## women        -53.23480    9.83107  -5.415 4.96e-07 ***
## prestige     139.20912   36.40239   3.824  0.00024 ***
## census         0.04209    0.23568   0.179  0.85865
## typeprof     509.15150 1798.87914   0.283  0.77779
## typewc       347.99010 1173.89384   0.296  0.76757
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2633 on 91 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6123
## F-statistic: 26.54 on 6 and 91 DF,  p-value: < 2.2e-16

# 95% confidence interval
confint(mm)
##                     2.5 %       97.5 %
## (Intercept) -6025.8441606 6040.4852295
## education    -442.3818984  704.7493459
## women         -72.7630052  -33.7065943
## prestige       66.9002455  211.5179870
## census         -0.4260509    0.5102307
## typeprof    -3064.1009370 4082.4039336
## typewc      -1983.8057989 2679.7860021
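
# The same interval for education can be rebuilt by hand from the summary()
# output, as the question asks: estimate +/- t quantile * standard error, with
# the 91 residual degrees of freedom reported above. The variable names below
# are introduced here for illustration.

```r
est <- 131.18372          # Estimate for education, from summary(mm)
se <- 288.74961           # Std. Error for education, from summary(mm)
resid_df <- 91            # residual degrees of freedom, from summary(mm)

ci_edu <- est + c(-1, 1) * qt(0.975, resid_df) * se
print(ci_edu)             # should agree with the education row of confint(mm)
```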

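## 10. (2 pts)
## Based on the result from question 9, would you reject the null hypothesis or not?
## (Assume a significance level of 0.05). Explain.
## Your answer here
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  7.32053 3037.27048   0.002  0.99808
# education  131.18372  288.74961   0.454  0.65068
# The p-value for education (0.65) is greater than 0.05, and its 95% confidence
# interval (-442.4, 704.7) contains 0, so we do not reject the null hypothesis:
# in this model, education has no significant effect on income.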

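## 11. (1 pt)
## Assuming that the null hypothesis is true.
## Based on your decision in the previous question, would you be committing a decision error?
## If so, which type of error?
## Your answer here
# No. We did not reject the null hypothesis, so if it is true the decision is correct.
# A Type II error ("false negative") is incorrectly retaining a false null hypothesis,
# which would apply only if the null hypothesis were false.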

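## 12. (1 pt)
## Discuss what your regression results mean in the context of the data.
## (Hint: Think back to Question 1)
## Your answer here
# The percentage of women in an occupation and its prestige both have a significant
# effect on income. Holding the other variables fixed, each additional prestige
# point is associated with about 139 dollars more income, and each additional
# percentage point of women with about 53 dollars less.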
