Chapter 7  Multiple Regression and Polynomial Regression
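When more than two variables are involved, the analysis of the correlations among the variables x1, x2, …, xp is called multiple correlation (partial correlation) analysis.
When there is more than one independent variable, the regression of the dependent variable y on the independent variables x1, x2, …, xp is called multiple regression analysis.
When there is a single independent variable x and its 1st, 2nd, …, p-th powers are taken as x1, x2, …, xp, the regression of y on x1, x2, …, xp is called polynomial regression analysis. Similarly, when there are several independent variables, the regression of y on their powers up to degree p and on their cross-product terms is called multivariate polynomial regression.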
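The definition of polynomial regression above can be illustrated with a minimal sketch. The function names `polynomial_terms` and `design_row` are illustrative assumptions and do not come from the text; the point is only that the powers of a single x are generated first and then treated as ordinary regressors x1, …, xp.

```python
def polynomial_terms(x, degree):
    """Return [x**1, x**2, ..., x**degree], i.e. the derived regressors x1..xp."""
    return [x ** k for k in range(1, degree + 1)]


def design_row(x, degree):
    """One row of the design matrix: an intercept column followed by the powers of x."""
    return [1.0] + polynomial_terms(x, degree)


# A cubic polynomial in x uses three derived regressors.
print(polynomial_terms(2.0, 3))
print(design_row(3.0, 2))
```

Once the rows are built this way, fitting proceeds exactly as in multiple linear regression, which is why the two topics are treated together.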
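The problem that multiple linear regression analysis addresses is how to build a multiple regression equation that can be used for prediction and control.
The problem that multiple correlation analysis addresses is how to use the correlations among the variables to discard spurious associations and keep genuine ones, so that the actual closeness of the quantitative relationships among the variables is revealed.
Building on these two methods, one can go further and study how important each independent variable is in its effect on the dependent variable; this is path analysis.
The problem that polynomial regression analysis addresses is this: when the curvilinear relationship between the independent variable and the dependent variable is hard to determine, build a suitable polynomial regression equation that approximates or fits the curve as closely as possible.
The SAS procedures introduced in Chapter 6 can all be used to analyze multiple correlation and regression data.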
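7.1 Multiple linear regression analysis
Example 7-1. In an industrial high-fluorine area, the fluorine content of cattle hair and of forage, air, and drinking water was measured. The data (unit: mg/kg, i.e. ppm) are listed in the DATA step of the program.
Carry out a multiple linear regression analysis.
1. Analysis by programming
(1) Program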
options nodate nonumber;
/* x1 = forage, x2 = air, x3 = drinking water, y = cattle hair (fluorine content) */
data xu7a;
  input x1 x2 x3 y;
  cards;
48.47 21.80 0.85 70.00
40.66 14.15 0.25 51.20
49.87 20.00 0.83 70.00
33.53 18.00 0.49 60.00
40.58 5.31 0.32 51.20
39.36 5.31 0.35 54.10
35.26 5.31 0.25 52.71
24.59 8.71 0.40 54.14
19.12 5.45 0.25 52.72
15.84 7.69 0.25 40.32
10.87 3.27 0.23 40.39
11.59 3.27 0.28 41.36
10.76 3.15 0.23 40.00
11.89 3.21 0.25 42.91
11.80 3.21 0.25 42.90
;
/* CORR prints the correlation matrix; each MODEL statement below uses a different selection method */
proc reg corr;
  title '1. backward elimination';
  model y = x1-x3 / selection=backward sls=.05 stb;
run;
  title '2. forward selection';
  model y = x1-x3 / selection=forward sle=.05 stb;
run;
  title '3. stepwise regression';
  model y = x1-x3 / selection=stepwise sls=.05 sle=.05 stb;
run;
  title '4. maximum R-square improvement';
  model y = x1-x3 / selection=maxr;
run;
  title '5. minimum R-square improvement';
  model y = x1-x3 / selection=minr;
run;
  title '6. R-square method';
  model y = x1-x3 / selection=rsquare;
run;
  title '7. adjusted R-square method';
  model y = x1-x3 / selection=adjrsq;
run;
  title '8. Cp method';
  model y = x1-x3 / selection=cp;
run;
  title '9. multivariate regression';
  model y = x1-x3 / selection=none stb;
run;
quit;  /* PROC REG is interactive; QUIT ends it */
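For readers who want to check the full-model fit outside SAS, the following is a rough pure-Python sketch. It is not the algorithm PROC REG uses: it simply forms the normal equations and solves them by Gaussian elimination. The names `ROWS`, `solve`, and `fit_ols` are assumptions made for this illustration, and the data are transcribed from the DATA step above.

```python
# Columns: x1 (forage), x2 (air), x3 (drinking water), y (cattle hair).
ROWS = [
    (48.47, 21.80, 0.85, 70.00), (40.66, 14.15, 0.25, 51.20),
    (49.87, 20.00, 0.83, 70.00), (33.53, 18.00, 0.49, 60.00),
    (40.58, 5.31, 0.32, 51.20), (39.36, 5.31, 0.35, 54.10),
    (35.26, 5.31, 0.25, 52.71), (24.59, 8.71, 0.40, 54.14),
    (19.12, 5.45, 0.25, 52.72), (15.84, 7.69, 0.25, 40.32),
    (10.87, 3.27, 0.23, 40.39), (11.59, 3.27, 0.28, 41.36),
    (10.76, 3.15, 0.23, 40.00), (11.89, 3.21, 0.25, 42.91),
    (11.80, 3.21, 0.25, 42.90),
]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Regress the last column on the others plus an intercept; return (coefficients, R-square)."""
    design = [[1.0] + list(r[:-1]) for r in rows]
    y = [r[-1] for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - sse / sst


beta, r_square = fit_ols(ROWS)
print([round(b, 5) for b in beta], round(r_square, 4))
```

If the data are transcribed correctly, the printed intercept, slopes, and R-square should be comparable to the full-model (Step 0) values in the output that follows.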
(2) Main output
1. backward elimination
Model: MODEL1

Correlation

Variable          x1          x2          x3           y
x1            1.0000      0.7519      0.7029      0.8741
x2            0.7519      1.0000      0.8629      0.8609
x3            0.7029      0.8629      1.0000      0.8894
y             0.8741      0.8609      0.8894      1.0000
Step 0: All Variables Entered: R-Square = 0.9158 and C(p) = 4.0000

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        1282.10135      427.36712      39.88    <.0001
Error              11         117.87945       10.71631
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              32.69611           2.05883    2702.68075     252.20    <.0001
x1                      0.31430           0.09139     126.75356      11.83    0.0055
x2                      0.15544           0.28648       3.15495       0.29    0.5982
x3                     23.10223           8.52044      78.78250       7.35    0.0202
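The Step 0 analysis-of-variance entries are linked by simple arithmetic, which the short sketch below reproduces from the two sums of squares and their degrees of freedom. The function name `anova_summary` is an illustrative assumption.

```python
def anova_summary(ss_model, ss_error, df_model, df_error):
    """Derive mean squares, the overall F value, and R-square from the ANOVA sums of squares."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": ms_model / ms_error,
        "r_square": ss_model / (ss_model + ss_error),
    }


# Step 0 (all three variables in the model), values copied from the listing.
step0 = anova_summary(1282.10135, 117.87945, 3, 11)
for key, value in step0.items():
    print(key, round(value, 5))
```

The same relations hold for every analysis-of-variance table in this output, so any one of F, the mean squares, or R-square can be cross-checked from the others.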
Step 1: Variable x2 Removed: R-Square = 0.9135 and C(p) = 2.2944

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1278.94639      639.47320      63.40    <.0001
Error              12         121.03441       10.08620
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              32.27624           1.85094    3066.96363     304.08    <.0001
x1                      0.33435           0.08109     171.47843      17.00    0.0014
x3                     26.39890           5.79523     209.29427      20.75    0.0007

All variables left in the model are significant at the 0.05 level.
Summary of Backward Elimination

Step    Variable Removed    Number Vars In    Partial R-Square    Model R-Square       C(p)    F Value    Pr > F
   1    x2                               2              0.0023            0.9135     2.2944       0.29    0.5982
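The backward elimination summarized above can be sketched in a few lines. This is a simplified illustration, not PROC REG itself: the error sums of squares of every subset are copied from the analysis-of-variance tables in this output, and the decision compares each partial F with a tabulated 5% critical value of F(1, df), whereas SAS compares the p-value with SLS. The names `SSE`, `F_CRIT_05`, `partial_f`, and `backward_eliminate` are assumptions of this sketch.

```python
N_OBS = 15

# Error sum of squares for each subset of predictors (empty set = corrected total).
SSE = {
    frozenset(): 1399.98080,
    frozenset({"x1"}): 330.32868,
    frozenset({"x2"}): 362.30805,
    frozenset({"x3"}): 292.51284,
    frozenset({"x1", "x2"}): 196.66196,
    frozenset({"x1", "x3"}): 121.03441,
    frozenset({"x2", "x3"}): 244.63301,
    frozenset({"x1", "x2", "x3"}): 117.87945,
}

# Upper 5% points of F(1, df) from standard tables, keyed by error degrees of freedom.
F_CRIT_05 = {11: 4.84, 12: 4.75, 13: 4.67}


def partial_f(model, var):
    """F statistic for dropping `var` from `model` (the Type II SS divided by the model's MSE)."""
    current = frozenset(model)
    mse = SSE[current] / (N_OBS - len(current) - 1)
    return (SSE[current - {var}] - SSE[current]) / mse


def backward_eliminate(start):
    """Repeatedly drop the variable with the smallest non-significant partial F."""
    model, removed = set(start), []
    while model:
        df_error = N_OBS - len(model) - 1
        weakest = min(sorted(model), key=lambda v: partial_f(model, v))
        if partial_f(model, weakest) >= F_CRIT_05[df_error]:
            break  # every remaining variable is significant at the 0.05 level
        model.remove(weakest)
        removed.append(weakest)
    return sorted(model), removed


print(backward_eliminate(["x1", "x2", "x3"]))
```

Because a removed variable is never put back in this loop, the sketch also shows the main limitation of the method: it cannot recover a variable that becomes useful again after other variables have left the model.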
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1278.94639      639.47320      63.40    <.0001
Error              12         121.03441       10.08620
Corrected Total    14        1399.98080

Root MSE            3.17588    R-Square    0.9135
Dependent Mean     50.93000    Adj R-Sq    0.8991
Coeff Var           6.23577

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Standardized Estimate
Intercept     1              32.27624           1.85094      17.44      <.0001                        0
x1            1               0.33435           0.08109       4.12      0.0014                  0.49203
x3            1              26.39890           5.79523       4.56      0.0007                  0.54358
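The t values in the Parameter Estimates table and the F values in the Step 1 table describe the same tests: each t is the estimate divided by its standard error, and its square equals the corresponding single-degree-of-freedom F. A small sketch (the names `ESTIMATES` and `t_value` are illustrative assumptions):

```python
def t_value(estimate, std_error):
    """t statistic for testing that a regression coefficient is zero."""
    return estimate / std_error


# (estimate, standard error) pairs copied from the Parameter Estimates table.
ESTIMATES = {
    "Intercept": (32.27624, 1.85094),
    "x1": (0.33435, 0.08109),
    "x3": (26.39890, 5.79523),
}

for name, (estimate, std_error) in ESTIMATES.items():
    t = t_value(estimate, std_error)
    print(name, round(t, 2), round(t * t, 2))  # t and t squared (compare with F in Step 1)
```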
2. forward selection
Model: MODEL2

Step 1: Variable x3 Entered: R-Square = 0.7911 and C(p) = 16.2960

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        1107.46796     1107.46796      49.22    <.0001
Error              13         292.51284       22.50099
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              35.14961           2.56116    4238.07786     188.35    <.0001
x3                     43.19449           6.15692    1107.46796      49.22    <.0001

Step 2: Variable x1 Entered: R-Square = 0.9135 and C(p) = 2.2944

Analysis of Variance (same as MODEL1 Step 1)

No other variable met the 0.05 significance level for entry into the model.

Summary of Forward Selection

Step    Variable Entered    Number Vars In    Partial R-Square    Model R-Square       C(p)    F Value    Pr > F
   1    x3                               1              0.7911            0.7911    16.2960      49.22    <.0001
   2    x1                               2              0.1225            0.9135     2.2944      17.00    0.0014

Analysis of Variance (same as MODEL1)
Parameter Estimates (same as MODEL1)
3. stepwise regression
Model: MODEL3

Step 1: (same as MODEL2 Step 1)
Step 2: (same as MODEL2 Step 2)

No other variable met the 0.05 significance level for entry into the model.

Summary of Stepwise Selection (same as MODEL2)
Analysis of Variance (same as MODEL1)
Parameter Estimates (same as MODEL1)
4. maximum R-square improvement
Model: MODEL4

Step 1: (same as MODEL2 Step 1)
The above model is the best 1-variable model found.

Step 2: (same as MODEL2 Step 2)
The above model is the best 2-variable model found.

Step 3: Variable x2 Entered: R-Square = 0.9158 and C(p) = 4.0000
Analysis of Variance (same as MODEL1 Step 0)
Maximum R-Square Improvement: Step 3
The above model is the best 3-variable model found.
No further improvement in R-Square is possible.
5. minimum R-square improvement
Model: MODEL5

Step 1: Variable x2 Entered: R-Square = 0.7412 and C(p) = 22.8090

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        1037.67275     1037.67275      37.23    <.0001
Error              13         362.30805       27.86985
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              39.82450           2.27386    8548.83863     306.74    <.0001
x2                      1.30305           0.21355    1037.67275      37.23    <.0001
Step 2: Variable x2 Removed: R-Square = 0.7640 and C(p) = 19.8248
        Variable x1 Entered

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        1069.65212     1069.65212      42.10    <.0001
Error              13         330.32868       25.40990
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              34.92461           2.78917    3983.98261     156.79    <.0001
x1                      0.59398           0.09155    1069.65212      42.10    <.0001
Step 3: Variable x1 Removed: R-Square = 0.7911 and C(p) = 16.2960
        Variable x3 Entered

Analysis of Variance (same as MODEL2 Step 1)

Step 4: Variable x2 Entered: R-Square = 0.8253 and C(p) = 13.8281

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1155.34779      577.67389      28.34    <.0001
Error              12         244.63301       20.38608
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              36.03164           2.50484    4218.32708     206.92    <.0001
x2                      0.55384           0.36139      47.87983       2.35    0.1513
x3                     27.85995          11.59591     117.67504       5.77    0.0334
Step 5: Variable x3 Removed: R-Square = 0.8595 and C(p) = 9.3516
        Variable x1 Entered

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1203.31884      601.65942      36.71    <.0001
Error              12         196.66196       16.38850
Corrected Total    14        1399.98080

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept              35.33139           2.24449    4060.90824     247.79    <.0001
x1                      0.35453           0.11151     165.64610      10.11    0.0079
x2                      0.70934           0.24838     133.66672       8.16    0.0145

Step 6: Variable x2 Removed: R-Square = 0.9135 and C(p) = 2.2944
        Variable x3 Entered

Analysis of Variance (same as MODEL1 Step 1)
The above model is the best 2-variable model found.

Step 7: Variable x2 Entered: R-Square = 0.9158 and C(p) = 4.0000

Analysis of Variance (same as MODEL1 Step 0)
The above model is the best 3-variable model found.
No further improvement in R-Square is possible.
6. R-square method
Model: MODEL6

R-Square Selection Method

Number in Model    R-Square    Variables in Model
              1      0.7911    x3
              1      0.7640    x1
              1      0.7412    x2
-------------------------------------------------
              2      0.9135    x1 x3
              2      0.8595    x1 x2
              2      0.8253    x2 x3
-------------------------------------------------
              3      0.9158    x1 x2 x3
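The R-square method simply evaluates every subset of the candidate variables and lists the subsets of each size from best to worst. The sketch below mimics that table; it does not refit any model, but takes each subset's error sum of squares from the analysis-of-variance tables printed earlier in this output. The names `SST`, `SSE`, and `all_subsets_by_r_square` are assumptions of this illustration.

```python
from itertools import combinations

SST = 1399.98080  # corrected total sum of squares

# Error sum of squares of each subset, copied from the listing.
SSE = {
    ("x1",): 330.32868,
    ("x2",): 362.30805,
    ("x3",): 292.51284,
    ("x1", "x2"): 196.66196,
    ("x1", "x3"): 121.03441,
    ("x2", "x3"): 244.63301,
    ("x1", "x2", "x3"): 117.87945,
}


def all_subsets_by_r_square(variables):
    """Return {size: [(r_square, subset), ...]} with each list sorted from best to worst."""
    table = {}
    for size in range(1, len(variables) + 1):
        rows = [(1.0 - SSE[s] / SST, s) for s in combinations(variables, size)]
        table[size] = sorted(rows, reverse=True)
    return table


for size, rows in all_subsets_by_r_square(("x1", "x2", "x3")).items():
    for r_square, subset in rows:
        print(size, f"{r_square:.4f}", " ".join(subset))
```

With p candidate variables there are 2^p - 1 non-empty subsets, so exhaustive listing of this kind is practical only when p is small, as it is here.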
7. adjusted R-square method
Model: MODEL7

Adjusted R-Square Selection Method

Number in Model    Adjusted R-Square    R-Square    Variables in Model
              2               0.8991      0.9135    x1 x3
              3               0.8928      0.9158    x1 x2 x3
              2               0.8361      0.8595    x1 x2
              2               0.7961      0.8253    x2 x3
              1               0.7750      0.7911    x3
              1               0.7459      0.7640    x1
              1               0.7213      0.7412    x2
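The adjusted R-square column penalizes R-square for the number of predictors, which is why the two-variable model x1 x3 ranks above the full model here. A brief sketch of the usual formula for a model with an intercept (the function name `adjusted_r_square` is an illustrative assumption; the inputs are the rounded R-square values from the table, so the last digit may differ slightly):

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R-square for a model with an intercept and n_predictors regressors."""
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_OBS = 15
# (R-square, number of predictors, variables) taken from the table above.
MODELS = [
    (0.9135, 2, "x1 x3"),
    (0.9158, 3, "x1 x2 x3"),
    (0.8595, 2, "x1 x2"),
    (0.7911, 1, "x3"),
]

for r_square, k, names in MODELS:
    print(names, round(adjusted_r_square(r_square, N_OBS, k), 4))
```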
8. Cp method
Model: MODEL8

C(p) Selection Method

Number in Model       C(p)    R-Square    Variables in Model
              2     2.2944      0.9135    x1 x3
              3     4.0000      0.9158    x1 x2 x3
              2     9.3516      0.8595    x1 x2
              2    13.8281      0.8253    x2 x3
              1    16.2960      0.7911    x3
              1    19.8248      0.7640    x1
              1    22.8090      0.7412    x2
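The C(p) values in this table follow from Mallows' formula C(p) = SSE_p / MSE_full - (n - 2p), where p counts the parameters in the subset model including the intercept. A minimal sketch, using sums of squares copied from the analysis-of-variance tables in this output (the function name `mallows_cp` is an illustrative assumption):

```python
def mallows_cp(sse_subset, mse_full, n_obs, n_params):
    """Mallows' C(p); n_params counts the intercept as well as the predictors."""
    return sse_subset / mse_full - (n_obs - 2 * n_params)


N_OBS = 15
MSE_FULL = 10.71631  # error mean square of the full three-variable model

# (error sum of squares, number of parameters, variables)
SUBSETS = [
    (121.03441, 3, "x1 x3"),
    (117.87945, 4, "x1 x2 x3"),
    (196.66196, 3, "x1 x2"),
    (292.51284, 2, "x3"),
]

for sse, p, names in SUBSETS:
    print(names, round(mallows_cp(sse, MSE_FULL, N_OBS, p), 4))
```

For the full model C(p) always equals p, so the value 4.0000 in the table carries no information about fit; subsets are judged by how small C(p) is and how close it lies to their own p.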
9. multivariate regression
Model: MODEL9

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        1282.10135      427.36712      39.88    <.0001
Error              11         117.87945       10.71631
Corrected Total    14        1399.98080

Root MSE            3.27358    R-Square    0.9158
Dependent Mean     50.93000    Adj R-Sq    0.8928
Coeff Var           6.42760
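The three fit statistics printed under the table are derived from quantities already shown: Root MSE is the square root of the error mean square, and Coeff Var expresses it as a percentage of the dependent mean. A short sketch (the function name `fit_statistics` is an illustrative assumption):

```python
import math


def fit_statistics(mse, dependent_mean):
    """Return (Root MSE, coefficient of variation in percent)."""
    root_mse = math.sqrt(mse)
    return root_mse, 100.0 * root_mse / dependent_mean


# Full model (MODEL9): error mean square and mean of y from the listing.
root_mse, coeff_var = fit_statistics(10.71631, 50.93)
print(round(root_mse, 5), round(coeff_var, 5))
```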
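[Program notes]
In the DATA step, columns 1 to 4 hold the fluorine content of forage (x1), air (x2), and drinking water (x3), and the fluorine content of cattle hair (y).
The PROC step calls the regression (REG) procedure.
The MODEL statements use nine different methods of building the regression equation.
For backward elimination, forward selection, stepwise regression, and the full model, the STB option additionally requests the standardized partial regression coefficients, that is, the path coefficients (Py.x).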
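A standardized partial regression coefficient rescales the raw coefficient by the ratio of the sample standard deviations of the regressor and of y. The sketch below applies this to the final two-variable model; the raw coefficients 0.33435 and 26.39890 are taken from the Parameter Estimates table, the data from the DATA step, and the name `standardized` is an illustrative assumption.

```python
import statistics

X1 = [48.47, 40.66, 49.87, 33.53, 40.58, 39.36, 35.26, 24.59,
      19.12, 15.84, 10.87, 11.59, 10.76, 11.89, 11.80]
X3 = [0.85, 0.25, 0.83, 0.49, 0.32, 0.35, 0.25, 0.40,
      0.25, 0.25, 0.23, 0.28, 0.23, 0.25, 0.25]
Y = [70.00, 51.20, 70.00, 60.00, 51.20, 54.10, 52.71, 54.14,
     52.72, 40.32, 40.39, 41.36, 40.00, 42.91, 42.90]


def standardized(coefficient, x, y):
    """Standardized coefficient: b * s_x / s_y, using sample standard deviations."""
    return coefficient * statistics.stdev(x) / statistics.stdev(y)


print(round(standardized(0.33435, X1, Y), 5))   # compare with the x1 entry under Standardized Estimate
print(round(standardized(26.39890, X3, Y), 5))  # compare with the x3 entry
```

Because the standardized coefficients are unit-free, they can be compared directly with each other, which the raw coefficients of x1 and x3 (measured on very different scales) cannot.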
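[Analysis of results]
Of the nine methods, the more commonly used are MODEL 1, 3, 4, and 6 above.
The final results of the nine methods, however, differ very little.
Although this example runs all nine methods, in practice it is enough to choose one according to the needs of the analysis.
Because the output of the nine methods takes up a great deal of space, the intermediate results of some analyses and any parts that duplicate earlier output have been omitted.
Model 1:
In backward elimination, the CORR option prints the matrix of simple correlation coefficients among the four variables (x1, x2, x3 and y).
From it, the correlation between each pair of the four variables can be examined.
In the regression analysis, the full model is fitted first (Step 0); then the independent variable whose effect on the dependent variable y is smallest and not significant is removed, one variable at a time, until every variable remaining in the model is significant.
Because this method, once a variable has been removed, does not reconsider whether a previously removed variable has again become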