Homework 5 (R)

1. Download the data. Use R to answer the following questions.

> data = read.csv("D:/R/data/", header = T)   # the reader function and the file name were lost; read.csv is an assumption
> attach(data)

Hypothesis Test:

a) Read data into R. Conduct a test of hypothesis to determine if there is a difference in the mean selling price of homes with an attached garage and homes without a garage. Use the 0.05 significance level.

> garage <- Price[Garage == 1]
> nogarage <- Price[Garage == 0]
> a = t.test(garage, nogarage); a

	Welch Two Sample t-test

data:  garage and nogarage
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y

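At the 95% confidence level the p-value is well below the significance level, so the null hypothesis is rejected: the mean selling prices of homes with and without a garage differ significantly.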
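The decision rule in a) can be sketched end to end. The data file is not available here, so the block below uses made-up prices and a made-up garage indicator; `price`, `garage`, and `alpha` are illustrative names, not the homework's objects, and only the call pattern mirrors the transcript.

```r
# Synthetic stand-in for the homework data (all values are made up).
set.seed(1)
price  <- c(rnorm(40, mean = 240, sd = 40), rnorm(30, mean = 200, sd = 30))
garage <- rep(c(1, 0), times = c(40, 30))

# Welch two-sample t-test: R's default, does not assume equal variances.
res <- t.test(price[garage == 1], price[garage == 0])

alpha  <- 0.05                   # significance level assumed for part a)
reject <- res$p.value < alpha    # TRUE means reject H0: equal means

print(res)
print(reject)
```

Comparing `res$p.value` with `alpha` gives the same decision as checking whether the 95 percent confidence interval in `res$conf.int` excludes 0.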

b) Conduct a test of hypothesis to determine if there is a difference in the variability of the selling prices of homes that have a swimming pool versus those that do not have a swimming pool. Use the 0.02 significance level.

> pool <- Price[Pool == 1]
> nopool <- Price[Pool == 0]
> b = t.test(pool, nopool, conf.level = 0.98); b

	Welch Two Sample t-test

data:  pool and nopool
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
98 percent confidence interval:
sample estimates:
mean of x mean of y

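At the 98% confidence level the p-value is well below the significance level, so the null hypothesis is rejected: the mean selling prices of homes with and without a pool differ significantly. Note that t.test compares means; the question asks about variability, which calls for an F test of the two variances (var.test in R).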
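A test aimed at variability rather than means could look like the sketch below. It uses `var.test` from base R on made-up samples (the vectors `pool` and `nopool` here are simulated, not the homework data), with the 98% level taken from the transcript.

```r
# Simulated prices for homes with and without a pool (made-up values).
set.seed(2)
pool   <- rnorm(35, mean = 230, sd = 30)
nopool <- rnorm(60, mean = 215, sd = 50)

# F test of H0: var(pool) / var(nopool) = 1.
vt <- var.test(pool, nopool, conf.level = 0.98)

# The test statistic is simply the ratio of the two sample variances.
f_manual <- var(pool) / var(nopool)

print(vt)
print(f_manual)
```

The F test is sensitive to non-normal data, so a quick look at a histogram or Q-Q plot of each group is a sensible companion check.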

Regression:

a) Write out the regression equation. Give some interpretations to this model.

> y = Price; x1 = Bedrooms; x2 = Size; x3 = Pool; x4 = Distance; x5 = Township; x6 = Garage; x7 = Baths
> z = data.frame(y, x1, x2, x3, x4, x5, x6, x7)
> lmz = lm(y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z); lmz

Call:
lm(formula = y ~ ., data = z)

Coefficients:
(Intercept)  x1  x2  x3  x4
x5  x6  x7
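Since the coefficient values did not survive in the output above, the following sketch only illustrates how a fitted equation can be written out from `coef()`. The data frame `d` and its coefficients are simulated, and `eq` is an illustrative name.

```r
# Simulated data with known coefficients (made-up values).
set.seed(3)
n <- 50
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 10 + 2 * d$x1 - 3 * d$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)
b   <- coef(fit)

# Assemble "y = b0 +b1*x1 -b2*x2" from the estimated coefficients.
terms_txt <- paste0(" ", sprintf("%+.2f", b[-1]), "*", names(b)[-1], collapse = "")
eq <- paste0("y = ", round(b[1], 2), terms_txt)
print(eq)
```

Each slope is read as the expected change in y for a one-unit increase in that predictor with the others held fixed; for a 0/1 indicator such as Pool or Garage it is the expected price difference between the two groups.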

b) Determine and interpret the R-squared value.

> anova(lmz)

Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  50409   50409                ***
x2         1   9955    9955                **
x3         1  14754   14754                ***
x4         1  12753   12753                **
x5         1   1283    1283
x6         1  26771   26771                ***
x7         1   7211    7211                *
Residuals 97 107631    1110
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the analysis of variance, x5 has no significant effect on the dependent variable.
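The same table also supports an overall (global) F test of all seven predictors. The sketch below recomputes it from the sums of squares shown above; because those figures are rounded to integers, the result is approximate and is not a copy of R's printed F-statistic.

```r
# Sequential sums of squares for x1..x7, taken from the ANOVA table above.
ss_reg <- 50409 + 9955 + 14754 + 12753 + 1283 + 26771 + 7211
rss    <- 107631   # residual sum of squares
k      <- 7        # number of predictors
df_res <- 97       # residual degrees of freedom

# Global F test of H0: all seven slopes are zero.
f_stat <- (ss_reg / k) / (rss / df_res)
p_val  <- pf(f_stat, k, df_res, lower.tail = FALSE)

print(f_stat)
print(p_val)
```

A very small `p_val` means that at least one slope differs from zero; it says nothing about which one, which is what the individual t-tests address.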

> summary(lmz)

Call:
lm(formula = y ~ ., data = z)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                           **
x2                                           *
x3                                           **
x4
x5
x6                                           ***
x7                                           *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:  on 97 degrees of freedom
Multiple R-squared:  ,	Adjusted R-squared:
F-statistic:  on 7 and 97 DF,  p-value:

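In the summary, x4 and x5 fail the t-test, so later models can consider dropping them.

Multiple R-squared is the share of the variation in price explained by the seven predictors; adjusted R-squared corrects that share for the number of predictors.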
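The two R-squared figures can be recovered approximately from the ANOVA table printed earlier. This is a sketch that assumes the integer sums of squares in that table are correct; the values it yields are rounded reconstructions, not the numbers R originally printed.

```r
# Sums of squares from the anova(lmz) table (rounded to integers there).
ss_terms <- c(x1 = 50409, x2 = 9955, x3 = 14754, x4 = 12753,
              x5 = 1283, x6 = 26771, x7 = 7211)
rss    <- 107631            # residual sum of squares
df_res <- 97                # residual degrees of freedom
p      <- length(ss_terms)  # 7 predictors
n      <- df_res + p + 1    # observations implied by the degrees of freedom

tss    <- sum(ss_terms) + rss                 # total sum of squares
r2     <- 1 - rss / tss                       # multiple R-squared
adj_r2 <- 1 - (rss / df_res) / (tss / (n - 1))  # adjusted R-squared

print(round(c(R2 = r2, adjR2 = adj_r2), 3))
```

Under these assumptions the model explains a little over half of the variation in price, and the adjustment for seven predictors lowers that figure by a few percentage points.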

c) Develop a correlation matrix. Summarize your findings. Check the independent variables for multicollinearity.

> cor(z)

[correlation matrix output: 8 x 8 matrix over y, x1, x2, x3, x4, x5, x6, x7; the numeric entries were not preserved]

There is no obvious multicollinearity.

> library(car)
> vif(lmz)
x1 x2 x3 x4 x5 x6 x7

All VIF values are below 2, which further indicates that there is no multicollinearity.
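For reference, a VIF is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below computes it with base R only, on simulated predictors; `X` and `vif_manual` are illustrative names and the data are not the homework's.

```r
# Simulated predictors; x2 is deliberately correlated with x1.
set.seed(5)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the others.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

A value of 1 means the predictor is uncorrelated with the rest; common rules of thumb start to worry at 5 or 10, so values below 2 are comfortably low.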

d) Conduct a global test on the set of independent variables.

e) Test each of the independent variables to determine if they differ from zero.

f) Give the ANOVA table for this regression model. Give some explanations to this ANOVA table.

g) Would you consider deleting any of the independent variables? If so, rerun the regression analysis and report the new equation.

> install.packages("leaps")
> library(leaps)
> vselect = regsubsets(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z)
> s = summary(vselect)
> l = data.frame(s$outmat, RSS = s$rss, R2 = s$rsq, cp = s$cp, BIC = s$bic); l

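[regsubsets summary output: one row per model size, 1 ( 1 ) through 7 ( 1 ), with columns x1 to x7, RSS, R2, cp, and BIC; the positions of the selection marks and the numeric values were not preserved]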

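The best model can be chosen by comparing RSS, Cp, and BIC across the subset sizes. A smaller RSS (equivalently, a larger R2) means a better fit, but RSS always falls as variables are added, so Cp and BIC, which penalize model size, are the deciding criteria: smaller is better for both. The one-variable model drops too many predictors to be economically meaningful, so it is not considered.

On balance, x4 and x5 are dropped.

Next, the model is adjusted using stepwise regression.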

> step(lmz, direction = "forward")
Start:  AIC=
y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7

Call:
lm(formula = y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z)

Coefficients:
(Intercept)  x1  x2  x3  x4
x5  x6  x7

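With forward selection the model is unchanged: step() starts from the full model lmz and, with no scope given, has no further terms to add.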
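For forward selection to do any work, it is usually started from an intercept-only model with the candidate terms supplied through `scope`. The following is a minimal sketch on simulated data (the data frame `d` and the names `null_fit` and `fwd` are illustrative), not a rerun of the homework model.

```r
# Simulated data: y depends on x1 and x2; x3 is pure noise.
set.seed(7)
n <- 80
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 5 + 3 * d$x1 - 2 * d$x2 + rnorm(n)

# Start from the intercept-only model and let step() add terms by AIC.
null_fit <- lm(y ~ 1, data = d)
fwd <- step(null_fit, scope = ~ x1 + x2 + x3,
            direction = "forward", trace = 0)

print(formula(fwd))
```

Setting `trace = 0` only silences the step-by-step printout; dropping it shows the same kind of Df / Sum of Sq / RSS / AIC tables as in the transcript.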

> step(lmz, direction = "both")
Start:  AIC=
y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7

       Df Sum of Sq    RSS AIC
- x5    1           108092
<none>              107631
- x4    1           109701
- x7    1           114842
- x2    1           115236
- x3    1           115611
- x1    1           116629
- x6    1           131362

Step:  AIC=
y ~ x1 + x2 + x3 + x4 + x6 + x7

       Df Sum of Sq    RSS AIC
- x4    1           109890
<none>              108092
+ x5    1           107631
- x7    1           115453
- x2    1           115485
- x3    1           115649
- x1    1           116677
- x6    1           132338

Step:  AIC=
y ~ x1 + x2 + x3 + x6 + x7

       Df Sum of Sq    RSS AIC
<none>              109890
+ x4    1           108092
+ x5    1           109701
- x2    1           117783
- x3    1           118177
- x7    1           118209
- x1    1           118601
- x6    1           141489

Call:
lm(formula = y ~ x1 + x2 + x3 + x6 + x7, data = z)

Coefficients:
(Intercept)  x1  x2  x3  x6  x7

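Stepwise regression gives a result consistent with the earlier analysis: x4 and x5 are dropped, and the resulting model has the lowest AIC of the models step() considered.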
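The AIC values themselves were lost from the output, but for a linear model step() ranks candidates by n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients. The sketch below applies that formula to the RSS values that did survive, assuming n = 105 (97 residual degrees of freedom plus 8 coefficients); the results are reconstructions, not the printed figures.

```r
n <- 105   # observations, inferred from the residual degrees of freedom

# RSS of the three models on the path step() followed.
rss <- c(full = 107631, drop_x5 = 108092, drop_x4_x5 = 109890)
edf <- c(8, 7, 6)   # intercept plus number of predictors in each model

# AIC in the form step() uses for lm fits (up to an additive constant).
aic <- n * log(rss / n) + 2 * edf

print(round(aic, 2))
print(names(which.min(aic)))
```

Each removal raises RSS slightly but saves 2 units of penalty, which is why the criterion keeps falling along this path.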

> lmz1 = lm(y ~ +x1 + x2 + x3 + x6 + x7, data = z); lmz1

Call:
lm(formula = y ~ +x1 + x2 + x3 + x6 + x7, data = z)

Coefficients:
(Intercept)  x1  x2  x3  x6  x7

> anova(lmz1)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  50409   50409                ***
x2         1   9955    9955                **
x3         1  14754   14754                ***
x6         1  37441   37441                ***
x7         1   8318    8318                **
Residuals 99 109890    1110
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> summary(lmz1)

Call:
lm(formula = y ~ +x1 + x2 + x3 + x6 + x7, data = z)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                           **
x2                                           **
x3                                           **
x6                                           ***
x7                                           **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:  on 99 degrees of freedom
Multiple R-squared:  ,	Adjusted R-squared:
F-statistic:  on 5 and 99 DF,  p-value:

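In the revised model every term passes both the ANOVA F-test and the t-test, a clear improvement over the full model.

The final model is

y = b0 + b1*x1 + b2*x2 + b3*x3 + b6*x6 + b7*x7

where b0, ..., b7 are the estimates in the Coefficients table of summary(lmz1), and x1 = Bedrooms, x2 = Size, x3 = Pool, x6 = Garage, x7 = Baths.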

h) Give some diagnostic plots for the regression model. Summarize your findings.

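> y.res <- round(residuals(lmz1), 2)   # object names were lost in extraction; y.res and y.fit are placeholders
> y.fit <- round(predict(lmz1), 2)
> result <- data.frame(y, y.res, y.fit)
> result
[output omitted: one row per observation, 1 to 105, with the observed price, residual, and fitted value; the values were not preserved]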

> fit1 <- fitted(lmz1)
> residuals1 <- resid(lmz1)
> plot(fit1, residuals1)
> qqnorm(residuals1)
> qqline(residuals1)
> library(car)
> qqPlot(lmz1)
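The plots above can be complemented by R's built-in diagnostic panel and two numeric checks. This is a sketch on a simulated model, since the homework data are not available; `fit`, `sw`, and `cd` are illustrative names, and the 4/n cutoff for Cook's distance is a common rule of thumb rather than a fixed standard.

```r
# Simulated simple regression (made-up data).
set.seed(9)
n <- 60
d <- data.frame(x = runif(n, 0, 10))
d$y <- 2 + 1.5 * d$x + rnorm(n)
fit <- lm(y ~ x, data = d)

# Four standard panels: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Numeric companions to the plots.
sw <- shapiro.test(resid(fit))   # H0: residuals are normally distributed
cd <- cooks.distance(fit)        # influence of each observation

print(sw$p.value)
print(which(cd > 4 / n))         # observations worth a closer look
```

A patternless residuals-vs-fitted panel supports constant variance and linearity, points close to the Q-Q line support normality, and large Cook's distances flag observations that may be driving the fit.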
