Homework 5R
1. Download data. Use R to answer the following questions.
>data=("D:/R/data/",header=T)
>attach(data)
Hypothesis Test:
a) Read data into R. Conduct a test of hypothesis to determine if there is a difference in the mean selling price of homes with an attached garage and homes without a garage. Use the significance level.
>garage<-Price[Garage==1]
>nogarage<-Price[Garage==0]
>a=t.test(garage,nogarage);a
Welch Two Sample t-test
data: garage and nogarage
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x mean of y
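At the 95% confidence level, the p-value is clearly below the significance level, so the null hypothesis is rejected: selling prices of homes with and without a garage differ significantly.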
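As a cross-check on what t.test reports, the following sketch runs the same Welch test on simulated data and recomputes the statistic and the Welch-Satterthwaite degrees of freedom by hand. The Garage and Price values are invented for illustration (they are not the homework data), and alpha = 0.05 is an assumed significance level.

```r
# Simulated stand-in for the homework data (all values are made up)
set.seed(1)
Garage <- rep(c(1, 0), times = c(60, 45))
Price  <- rnorm(105, mean = ifelse(Garage == 1, 235, 200), sd = 40)

garage   <- Price[Garage == 1]
nogarage <- Price[Garage == 0]

# Welch two-sample t-test (unequal variances is the t.test default)
a <- t.test(garage, nogarage)

# The same statistic and degrees of freedom, computed by hand
v1 <- var(garage) / length(garage)
v2 <- var(nogarage) / length(nogarage)
t_manual  <- (mean(garage) - mean(nogarage)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(garage) - 1) + v2^2 / (length(nogarage) - 1))

alpha  <- 0.05                 # assumed significance level
reject <- a$p.value < alpha    # TRUE means H0 (equal means) is rejected
```

The decision rule is the last line: the null hypothesis of equal means is rejected when the p-value falls below the chosen significance level.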
b) Conduct a test of hypothesis to determine if there is a difference in the variability of the selling prices of homes that have a swimming pool versus those that do not have a swimming pool. Use the significance level.
>pool<-Price[Pool==1]
>nopool<-Price[Pool==0]
>b=t.test(pool,nopool,conf.level=0.98);b
Welch Two Sample t-test
data: pool and nopool
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
98 percent confidence interval:
sample estimates:
mean of x mean of y
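At the 98% confidence level, the p-value is clearly below the significance level, so the null hypothesis is rejected: selling prices of homes with and without a swimming pool differ significantly.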
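Part b) asks about variability, whereas the output above is a Welch t-test on means. A difference in variability is more commonly tested with an F test of the variance ratio, which base R provides as var.test. The sketch below applies it to simulated data (the Pool and Price values are invented, not the homework data) and recomputes the F statistic and two-sided p-value by hand. The F test assumes roughly normal prices in each group.

```r
# Simulated stand-in for the homework data (all values are made up)
set.seed(2)
Pool  <- rep(c(1, 0), times = c(40, 65))
Price <- rnorm(105, mean = 220, sd = ifelse(Pool == 1, 30, 50))

pool   <- Price[Pool == 1]
nopool <- Price[Pool == 0]

# F test of H0: var(pool) / var(nopool) = 1, two-sided, 98% interval
b <- var.test(pool, nopool, conf.level = 0.98)

# The same statistic and p-value, computed by hand
f_manual <- var(pool) / var(nopool)
df1 <- length(pool) - 1
df2 <- length(nopool) - 1
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)
```

With a 0.02 significance level (matching the 98 percent interval used above), equal variability would be rejected when b$p.value is below 0.02.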
Regression:
a) Write out the regression equation. Give some interpretations to this model.
>y=Price;x1=Bedrooms;x2=Size;x3=Pool;x4=Distance;x5=Township;x6=Garage;x7=Baths
>z=data.frame(y,x1,x2,x3,x4,x5,x6,x7)
>lmz=lm(y~1+x1+x2+x3+x4+x5+x6+x7,data=z);lmz
Call:
lm(formula=y~.,data=z)
Coefficients:
(Intercept) x1 x2 x3 x4
x5 x6 x7
b) Determine and interpret the R2 value.
>anova(lmz)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  50409   50409                ***
x2         1   9955    9955                **
x3         1  14754   14754                ***
x4         1  12753   12753                **
x5         1   1283    1283
x6         1  26771   26771                ***
x7         1   7211    7211                *
Residuals 97 107631    1110
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In the analysis of variance, x5 does not have a significant effect on the dependent variable.
>summary(lmz)
Call:
lm(formula=y~.,data=z)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                               **
x2                                               *
x3                                               **
x4
x5
x6                                               ***
x7                                               *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 97 degrees of freedom
Multiple R-squared:
Adjusted R-squared:
F-statistic: on 7 and 97 DF, p-value:
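In the summary, x4 and x5 fail the t-test, so they are candidates for removal in later models.
R-squared is about 0.53 and adjusted R-squared about 0.50, as recomputed from the ANOVA sums of squares (123136 explained out of 230767 total).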
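R-squared, adjusted R-squared and the global F statistic can all be recovered from the ANOVA table, since they depend only on the sums of squares and degrees of freedom. The sketch below uses the integer sums of squares printed above, so the results are approximate to that rounding. The variable names are illustrative.

```r
# Sequential sums of squares and residual line from the ANOVA table above
ss_terms <- c(x1 = 50409, x2 = 9955, x3 = 14754, x4 = 12753,
              x5 = 1283, x6 = 26771, x7 = 7211)
rss    <- 107631   # residual sum of squares
df_res <- 97       # residual degrees of freedom

k <- length(ss_terms)   # number of predictors (7)
n <- k + df_res + 1     # number of observations (105)

ssr <- sum(ss_terms)    # explained sum of squares (123136)
sst <- ssr + rss        # total sum of squares (230767)

r2     <- ssr / sst
adj_r2 <- 1 - (rss / df_res) / (sst / (n - 1))

# Global F test of H0: all seven slopes are zero
f_stat <- (ssr / k) / (rss / df_res)
p_val  <- pf(f_stat, k, df_res, lower.tail = FALSE)

round(c(R2 = r2, adjR2 = adj_r2, F = f_stat), 4)
```

Read this way, the seven predictors together explain a little over half of the variation in price, and the global F test on 7 and 97 degrees of freedom answers whether the set of predictors as a whole is useful.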
c) Develop a correlation matrix. Summarize your findings. Check the independent variables for multicollinearity.
>cor(z)
   y x1 x2 x3 x4
y  0.
x1 0.
x2
x3 -0.
x4 -0.
x5 0.
x6
x7
   x5 x6 x7
y  0. 0. 0.
x1 0. 0. 0.
x2 0.
x3 -0. -0.
x4 -0. -0. -0.
x5
x6 0.
x7 0.
There is no obvious multicollinearity.
>library(car)
>vif(lmz)
x1 x2 x3 x4 x5 x6 x7
All VIF values are below 2, which further indicates that there is no multicollinearity.
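For numeric predictors, the variance inflation factor of x_j is 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors. The sketch below computes it by hand on simulated predictors (invented values, not the homework data) and checks it against the diagonal of the inverse correlation matrix, which gives the same numbers.

```r
# Simulated predictors (made up); x2 is deliberately correlated with x1
set.seed(3)
n  <- 105
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the rest
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_inv <- diag(solve(cor(X)))
```

A VIF of 1 means a predictor is uncorrelated with the others. Values below 2, as reported above, mean that less than half of each predictor's variance is explained by the remaining predictors.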
d) Conduct a global test on the set of independent variables.
e) Test each of the independent variables to determine if they differ from zero.
f) Give the ANOVA table for this regression model. Give some explanations to this ANOVA table.
g) Would you consider deleting any of the independent variables? If so, rerun the regression analysis and report the new equation.
>install.packages("leaps")
>library(leaps)
>vselect=regsubsets(y~x1+x2+x3+x4+x5+x6+x7,data=z)
>s=summary(vselect)
>l=data.frame(s$outmat,RSS=s$rss,R2=s$rsq,cp=s$cp,BIC=s$bic);l
x1 x2 x3 x4 x5 x6 x7 RSS R2 cp BIC
1 ( 1 ) *
2 ( 1 ) **
3 ( 1 ) ***
4 ( 1 ) ****
5 ( 1 ) *****
6 ( 1 ) ******
7 ( 1 ) *******
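The best model can be identified from the RSS, Cp and BIC values: a smaller RSS is better (equivalently, a larger R2), and a smaller BIC is better. The first model drops too many variables to be economically meaningful, so it is not considered.
Taking everything into account, x4 and x5 are dropped.
Stepwise regression is then used to adjust the model.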
>step(lmz,direction="forward")
Start: AIC=
y~1+x1+x2+x3+x4+x5+x6+x7
Call:
lm(formula=y~1+x1+x2+x3+x4+x5+x6+x7,data=z)
Coefficients:
(Intercept) x1 x2 x3 x4
x5 x6 x7
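Forward selection returns the same model with the same AIC: step() was started from the full model, so there is nothing left to add.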
>step(lmz,direction="both")
Start: AIC=
y~1+x1+x2+x3+x4+x5+x6+x7
       Df Sum of Sq    RSS AIC
- x5    1           108092
<none>              107631
- x4    1           109701
- x7    1           114842
- x2    1           115236
- x3    1           115611
- x1    1           116629
- x6    1           131362
Step: AIC=
y~x1+x2+x3+x4+x6+x7
       Df Sum of Sq    RSS AIC
- x4    1           109890
<none>              108092
+ x5    1           107631
- x7    1           115453
- x2    1           115485
- x3    1           115649
- x1    1           116677
- x6    1           132338
Step: AIC=
y~x1+x2+x3+x6+x7
       Df Sum of Sq    RSS AIC
<none>              109890
+ x4    1           108092
+ x5    1           109701
- x2    1           117783
- x3    1           118177
- x7    1           118209
- x1    1           118601
- x6    1           141489
Call:
lm(formula=y~x1+x2+x3+x6+x7,data=z)
Coefficients:
(Intercept) x1 x2 x3 x6
x7
Stepwise regression gives a result consistent with the earlier choice: x4 and x5 are dropped, which lowers the AIC.
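For a linear model, step() ranks candidates with extractAIC(), which computes n*log(RSS/n) + 2*p, where p is the number of estimated coefficients including the intercept. The sketch below applies that formula to the RSS values in the stepwise output above, assuming n = 105 (97 residual degrees of freedom plus 8 coefficients in the full model). The helper name aic_from_rss is illustrative.

```r
n <- 105  # observations: 97 residual df + 8 coefficients in the full model

# AIC as used by step() for lm fits: n * log(RSS / n) + 2 * p
aic_from_rss <- function(rss, p, n) n * log(rss / n) + 2 * p

models <- data.frame(
  model = c("full (x1..x7)", "drop x5", "drop x4 and x5"),
  rss   = c(107631, 108092, 109890),  # RSS values from the step() output
  p     = c(8, 7, 6)                  # coefficients including the intercept
)
models$aic <- aic_from_rss(models$rss, models$p, n)
models
```

Each removal raises the RSS slightly but saves one parameter, and the AIC falls at both steps (from roughly 743.9 to roughly 742.1). That is why the procedure stops at the five-predictor model.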
>lmz1=lm(y~+x1+x2+x3+x6+x7,data=z);lmz1
Call:
lm(formula=y~+x1+x2+x3+x6+x7,data=z)
Coefficients:
(Intercept) x1 x2 x3 x6
x7
>anova(lmz1)
Analysis of Variance Table
Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1  50409   50409                ***
x2         1   9955    9955                **
x3         1  14754   14754                ***
x6         1  37441   37441                ***
x7         1   8318    8318                **
Residuals 99 109890    1110
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>summary(lmz1)
Call:
lm(formula=y~+x1+x2+x3+x6+x7,data=z)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                               **
x2                                               **
x3                                               **
x6                                               ***
x7                                               **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 99 degrees of freedom
Multiple R-squared:
Adjusted R-squared:
F-statistic: on 5 and 99 DF, p-value:
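In the revised model, every term is significant in both the ANOVA F-tests and the t-tests, a clear improvement over the full model.
The final model is
y = b0 + b1*x1 + b2*x2 + b3*x3 + b6*x6 + b7*x7, where y is Price, x1 Bedrooms, x2 Size, x3 Pool, x6 Garage and x7 Baths, and the b's are the estimates reported by summary(lmz1).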
h) Give some diagnostic plots for the regression model. Summarize your findings.
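>y.res<-round(residuals(lmz1),2)   # y.res is a placeholder; the original variable name is missing
>y.fit<-round(predict(lmz1),2)     # y.fit is a placeholder; the original variable name is missing
>result<-data.frame(y,y.fit,y.res)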
>result
[105 rows: observed y with the fitted value and residual for each observation; the numeric values are not recoverable]
>fit1<-fitted(lmz1)
>residuals1<-resid(lmz1)
>plot(fit1,residuals1)
>residuals1<-resid(lmz1)
>qqnorm(residuals1)
>qqline(residuals1)
>library(car)
>qqPlot(lmz1)