R Regression Self-Test Exercises with Code Answers (r语言回归自测习题附代码答案.docx)

Uploaded by: b****4 | Document ID: 6656929 | Upload date: 2023-05-10 | Format: DOCX | Pages: 10 | Size: 39 KB

################## Part 1: Linear Regression Concepts #######################

## These questions do not require coding but will explore some important concepts.
## "Regression" refers to the simple linear regression equation:
## y = b0 + b1*x
## This homework will not discuss other models.

## 1. (1 pt)
## What is the interpretation of the coefficient B1?
## (What meaning does it represent?)
## Your answer here
# B1 is the slope: the expected change in the dependent variable y when the
# independent variable x increases by one unit.

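## 2. (1 pt)
## Outliers are problems for many statistical methods, but are particularly problematic
## for linear regression. Why is that? It may help to define what outlier means in this case.
## (Hint: Think of how residuals are calculated)
## Your answer here
# Least squares minimizes the sum of squared residuals, so a single abnormal
# observation pulls the fitted line toward itself: it shifts the means of x and y,
# changes the beta estimates substantially, and can alter the whole model.
# Here an outlier is a point with a large residual; points whose standardized
# residual is greater than 2 or less than -2 are possible outliers.

## 3. (1 pt)
## How could you deal with outliers in order to improve the accuracy of your model?
## Your answer here
# Delete the outliers, or replace them with the mean.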
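
The standardized-residual rule from question 2 and the deletion strategy from question 3 can be sketched as follows. This is only an illustration on simulated data: the variables `d`, `fit_all`, `fit_clean`, the seed, and the artificial outlier are all assumptions made for the example and are not part of the homework.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
d <- data.frame(x = 1:30)
d$y <- 2 + 0.5 * d$x + rnorm(30, sd = 0.5)   # true slope is 0.5
d$y[30] <- d$y[30] + 15                      # inject one artificial outlier

fit_all <- lm(y ~ x, data = d)

# Rule of thumb: keep points whose standardized residual lies within [-2, 2]
keep <- abs(rstandard(fit_all)) <= 2
which(!keep)                                 # indices of the flagged points

# Refit without the flagged points and compare the slopes
fit_clean <- lm(y ~ x, data = d[keep, ])
c(slope_all = unname(coef(fit_all)["x"]),
  slope_clean = unname(coef(fit_clean)["x"]))
```

Deleting points is a blunt tool; it is reasonable only when the flagged observations are known to be errors rather than genuine extreme values.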

################## Part 2: Sampling and Point Estimation #####################

## The following problems will use the cats dataset and explore
## the average body weight of female cats.
## Load the data by running the following code
# install.packages("MASS")
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
data(cats)

## 4. (2 pts)
## Subset the data frame to ONLY include female cats.
## Your answer here
cats = cats[cats$Sex == "F", ]

## Use the sample function to generate a vector of 1s and 2s that is the same
## length as the subsetted data frame you just created. Use this vector to split
## the 'Bwt' variable into two vectors, Bwt1 and Bwt2.
## IMPORTANT: Make sure to run the following seed function before you run your sample
## function. Run them back to back each time you want to run the sample function to ensure
## the same seed is used every time.
## Check: If you did this properly, you will have 24 elements in Bwt1 and 23 elements
## in Bwt2.
set.seed(676)
## Your answer here
set.seed(676)
# Draw 24 row indices once and reuse them, so that Bwt1 and Bwt2 do not overlap.
# (The original answer called sample() a second time for Bwt1; the Bwt1 values
# printed under question 5 come from that original run.)
s1 = sample(length(cats$Bwt), 24)
Bwt1 = cats$Bwt[s1]
Bwt2 = cats$Bwt[-s1]

## 5. (3 pts)
## Calculate the mean and the standard deviation for each of the two
## vectors, Bwt1 and Bwt2. Use this information to create a 95%
## confidence interval for your sample means (you can use the following formula
## for a confidence interval: mean +/- 2 * standard deviation).
## Compare the confidence intervals -- do they seem to agree or disagree?
## Your answer here
mean(Bwt1)
## [1] 2.3375
mean(Bwt2)
## [1] 2.395652
sd(Bwt1)
## [1] 0.2617873
sd(Bwt2)
## [1] 0.2754802

# confidence interval
mean(Bwt1) + 2 * sd(Bwt1)
## [1] 2.861075
mean(Bwt1) - 2 * sd(Bwt1)
## [1] 1.813925
mean(Bwt2) + 2 * sd(Bwt2)
## [1] 2.946613
mean(Bwt2) - 2 * sd(Bwt2)
## [1] 1.844692
# The two intervals, (1.81, 2.86) for Bwt1 and (1.84, 2.95) for Bwt2, overlap
# almost entirely, so they agree.
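
The formula given in the assignment uses the standard deviation itself, which describes the spread of individual weights rather than the uncertainty of the mean. As a hedged comparison, the more usual interval for a mean is mean +/- t * sd / sqrt(n). The helper name `ci_mean` and the vector `w` of made-up weights below are assumptions for illustration only.

```r
# Sketch of a t-based confidence interval for a sample mean
ci_mean <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  tq <- qt(1 - (1 - level) / 2, df = n - 1)    # two-sided critical value
  c(lower = mean(x) - tq * se, upper = mean(x) + tq * se)
}

w <- c(2.0, 2.1, 2.3, 2.4, 2.2, 2.9, 2.6, 2.3)  # made-up body weights in kg
ci_mean(w)
```

Applied to Bwt1 and Bwt2, this would give much narrower intervals than the mean +/- 2 * sd ranges, because the standard error shrinks with the square root of the sample size.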

## 6.
## Draw 1000 observations from a standard normal distribution. Calculate the sample mean.
## Repeat this 500 times, storing each sample mean in a vector called mean_dist.
## Plot a histogram of mean_dist to display the distribution of your sample mean.
## How closely does your histogram resemble this normal distribution? Explain.
## Your answer here
mean_dist = numeric(500)
for (i in 1:500) {   # 500 repetitions, as the question asks
  x = rnorm(1000)
  mean_dist[i] = mean(x)
}
hist(mean_dist)
# The histogram of the sample means is bell-shaped and centered near 0,
# consistent with a normal distribution.
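
A numerical check can complement the histogram. In theory, the mean of n independent standard normal draws is normal with mean 0 and standard deviation 1 / sqrt(n), so for n = 1000 the spread of the sample means should be close to 0.0316. The sketch below assumes an arbitrary seed and uses the hypothetical name `means` so as not to overwrite `mean_dist`.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n    <- 1000                       # observations per sample
reps <- 500                        # number of repeated samples

# replicate() evaluates the expression reps times and collects the results
means <- replicate(reps, mean(rnorm(n)))

# Compare the observed spread of the sample means with the theoretical value
c(observed_sd = sd(means), theoretical_sd = 1 / sqrt(n))
```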

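## 7. (3 pts)
## Write a function that implements Q5.
HW.Bootstrap = function(distn, n, reps) {
  set.seed(666)
  ### Your answer here
  # match.fun() accepts either the function itself (rexp) or its name ("rexp")
  distn = match.fun(distn)
  mean_dist = numeric(reps)
  for (i in 1:reps) {
    x <- distn(n)   # rexp(n) uses rate = 1 by default, i.e. mean 1
    mean_dist[i] = mean(x)
  }
  hist(mean_dist)
  invisible(mean_dist)   # return the sample means without printing them
}

## Use the function you write to repeat the experiment in Q5 but instead of the
## normal distribution as we used above, use an exponential distribution with mean 1.
## Check your histogram and write out your findings.
## (Hint: HW.Bootstrap(rexp, n, reps))
## Your answer here
HW.Bootstrap(distn = "rexp", n = 1000, reps = 1000)
# Although the exponential distribution is skewed, the histogram of its sample
# means is approximately normal and centered near 1, as the central limit
# theorem predicts.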

################### Part 3: More Linear Regression ######################

## This problem will use the Prestige dataset.
## Load the data by running code below
# install.packages("car")
library(car)
## Warning: package 'car' was built under R version 3.3.3
data(Prestige)
head(Prestige)
##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof

## We will focus on these two variables:
## income: Average income of incumbents, dollars, in 1971.
## education: Average education of occupational incumbents, years, in 1971
## Before starting this problem, we will declare a null hypothesis that
## education has no effect on income.
## That is: H0: B1 = 0
##          HA: B1 != 0
## We will attempt to reject this hypothesis by using a linear regression

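## 8. (2 pt)
## Fit a linear regression on the Prestige data using education to predict
## income, using lm(). Examine the model diagnostics using plot(). Would you
## consider this a good model or not? Explain.
## Your answer here
# Note: income ~ . regresses income on every other column, not on education alone
# as the question asks; the education-only model would be
# lm(income ~ education, data = Prestige).
# All output recorded below comes from the full model.
mm <- lm(income ~ ., data = Prestige)
plot(mm)
# The diagnostic plots show outlying points, and the points in the Q-Q plot do not
# follow the reference line, so the residuals are not normally distributed.
# The model fit is therefore mediocre.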
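
Question 8 asks for education as the only predictor. A minimal sketch of that single-predictor fit is shown below for comparison. It assumes the Prestige data is available from the carData package (which recent versions of car attach); the name `fit_edu` is hypothetical. No output is reproduced here because this model was not run in the original answer.

```r
# Guarded so the snippet does nothing if carData is not installed
if (requireNamespace("carData", quietly = TRUE)) {
  data(Prestige, package = "carData")

  # Simple linear regression: income = b0 + b1 * education
  fit_edu <- lm(income ~ education, data = Prestige)

  # Coefficient table and 95% confidence interval for the slope b1
  print(summary(fit_edu)$coefficients)
  print(confint(fit_edu, "education", level = 0.95))
}
```

Testing H0: B1 = 0 with this model addresses the hypothesis as stated, whereas the education coefficient in the full model measures the effect of education after adjusting for the other variables.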

## 9. (2 pts)
## Using the information from summary() on your model (the output from the lm() command), create a
## 95% confidence interval for the coefficient of education variable
## Your answer here
summary(mm)
##
## Call:
## lm(formula = income ~ ., data = Prestige)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7752.4  -954.6  -331.2   742.6 14301.3
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.32053 3037.27048   0.002  0.99808
## education    131.18372  288.74961   0.454  0.65068
## women        -53.23480    9.83107  -5.415 4.96e-07 ***
## prestige     139.20912   36.40239   3.824  0.00024 ***
## census         0.04209    0.23568   0.179  0.85865
## typeprof     509.15150 1798.87914   0.283  0.77779
## typewc       347.99010 1173.89384   0.296  0.76757
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2633 on 91 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.6363, Adjusted R-squared:  0.6123
## F-statistic: 26.54 on 6 and 91 DF,  p-value: < 2.2e-16

# 95% confidence interval
confint(mm)
##                     2.5 %       97.5 %
## (Intercept) -6025.8441606 6040.4852295
## education    -442.3818984  704.7493459
## women         -72.7630052  -33.7065943
## prestige       66.9002455  211.5179870
## census         -0.4260509    0.5102307
## typeprof    -3064.1009370 4082.4039336
## typewc      -1983.8057989 2679.7860021
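
Since question 9 asks for the interval to be built from the summary() information, the education interval can also be computed by hand as estimate +/- t * standard error. The sketch below copies the estimate, standard error, and residual degrees of freedom from the summary output above; the variable names are chosen here for illustration.

```r
est <- 131.18372            # Estimate for education, from summary(mm)
se  <- 288.74961            # Std. Error for education, from summary(mm)
df  <- 91                   # residual degrees of freedom, from summary(mm)

t_crit <- qt(0.975, df)     # two-sided 95% critical value of the t distribution

# Confidence interval: estimate +/- critical value * standard error
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```

Up to rounding of the inputs, this should match the education row reported by the confidence interval function.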

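## 10. (2 pts)
## Based on the result from question 9, would you reject the null hypothesis or not?
## (Assume a significance level of 0.05). Explain.
## Your answer here
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)   7.32053 3037.27048   0.002  0.99808
# education   131.18372  288.74961   0.454  0.65068
# The p-value for education (0.65) is greater than 0.05, and its 95% confidence
# interval (-442.38, 704.75) contains 0, so we do not reject the null hypothesis:
# in this model education has no significant effect on income.

## 11. (1 pt)
## Assuming that the null hypothesis is true.
## Based on your decision in the previous question, would you be committing a decision error?
## If so, which type of error?
## Your answer here
# No. We did not reject the null hypothesis, so if it is in fact true the decision
# is correct. A Type II error (a "false negative") means incorrectly retaining a
# false null hypothesis, which could occur only if the null hypothesis were false.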

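## 12. (1 pt)
## Discuss what your regression results mean in the context of the data.
## (Hint: Think back to Question 1)
## Your answer here
# The women and prestige coefficients are significant. Higher prestige goes with
# higher income (about 139 dollars per prestige point), while a higher percentage
# of women in an occupation goes with lower income (about 53 dollars less per
# percentage point), holding the other variables fixed.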
