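Data Analysis
Solution:
(1) Fit a linear regression model of y on x1, x2, x3
Using the observed data on y and x1, x2, x3, fit the linear regression model with the SAS REG procedure (proc reg), and obtain the fitted values of y, the residuals, and the studentized residuals.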
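For reference, the quantities requested in part (1) can be written out as below. The symbols are notation assumed here, because the original formula images did not survive extraction. The studentized residual shown is the internally studentized form (residual divided by its standard error), which is what the student= keyword of proc reg returns.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,
      \qquad i = 1, \dots, 23, \\
\hat{y}_i &= \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \hat\beta_3 x_{i3}, \\
e_i &= y_i - \hat{y}_i, \\
r_i &= \frac{e_i}{s\sqrt{1 - h_{ii}}},
      \qquad s^2 = \frac{1}{23 - 4}\sum_{i=1}^{23} e_i^2 ,
\end{aligned}
```

Here h_ii denotes the i-th diagonal element of the hat matrix X(X'X)^{-1}X', and 23 - 4 = 19 is the error degrees of freedom of the three-predictor model with intercept.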
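Program:
Fit the regression model and output the fitted values of the dependent variable, the residuals, and the studentized residuals.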
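/* Read the 23 observations: three predictors and the response */
data exercise2_9;
  input x1-x3 y;
  cards;
50 51 2.3 48
36 46 2.3 57
40 48 2.2 66
41 44 1.8 70
28 43 1.8 89
49 54 2.9 36
42 50 2.2 46
45 48 2.4 54
52 62 2.9 26
29 50 2.1 77
29 48 2.4 89
43 53 2.4 67
38 55 2.2 47
34 51 2.3 51
53 54 2.2 57
36 49 2.0 66
33 56 2.5 79
29 46 1.9 88
33 49 2.1 60
55 51 2.4 49
29 52 2.3 77
44 58 2.9 52
43 50 2.3 60
;
run;
/* p=, r= and student= name the fitted values, residuals and studentized residuals */
proc reg data=exercise2_9;
  model y=x1-x3;
  output out=a p=precdit r=resid student=student;
run;
proc print data=a;
run;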
The SAS System          18:26 Sunday, October 10, 2004   1
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square   F Value   Pr > F
Model               3     4133.63322    1377.87774     13.01   <.0001
Error              19     2011.58417     105.87285
Corrected Total    22     6145.21739
Root MSE          10.28945    R-Square   0.6727
Dependent Mean    61.34783    Adj R-Sq   0.6210
Coeff Var         16.77232
Parameter Estimates
                   Parameter     Standard
Variable    DF      Estimate        Error   t Value   Pr > |t|
Intercept    1     162.87590     25.77565      6.32     <.0001
x1           1      -1.21032      0.30145     -4.01     0.0007
x2           1      -0.66591      0.82100     -0.81     0.4274
x3           1      -8.61303     12.24125     -0.70
Obs   x1   x2    x3    y    precdit      resid     student
  1   50   51   2.3   48    48.5888    -0.5888    -0.06150
  2   36   46   2.3   57    68.8628   -11.8628    -1.28303
  3   40   48   2.2   66    63.5510     2.4490     0.24682
  4   41   44   1.8   70    68.4496     1.5504     0.17231
  5   28   43   1.8   89    84.8496     4.1504     0.45204
  6   49   54   2.9   36    42.6336    -6.6336    -0.78114
  7   42   50   2.2   46    59.7986   -13.7986    -1.38301
  8   45   48   2.4   54    55.7768    -1.7768    -0.18998
  9   52   62   2.9   26    33.6754    -7.6754    -0.91730
 10   29   50   2.1   77    76.3940     0.6060     0.06341
 11   29   48   2.4   89    75.1419    13.8581     1.54979
 12   43   53   2.4   67    54.8679    12.1321     1.21437
 13   38   55   2.2   47    61.3103   -14.3103    -1.58470
 14   34   51   2.3   51    67.9539   -16.9539    -1.71028
 15   53   54   2.2   57    43.8215    13.1785     1.54520
 16   36   49   2.0   66    69.4490    -3.4490    -0.35404
 17   33   56   2.5   79    64.1121    14.8879     1.62692
 18   29   46   1.9   88    80.7803     7.2197     0.75782
 19   33   49   2.1   60    72.2187   -12.2187    -1.23677
 20   55   51   2.4   49    41.6759     7.3241     0.81159
 21   29   52   2.3   77    73.3396     3.6604     0.38762
 22   44   58   2.9   52    46.0216     5.9784     0.66555
 23   43   50   2.3   60    57.7270     2.2730     0.22775
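The summary statistics printed with the Analysis of Variance table of the full model follow from the three sums of squares. As an arithmetic check, here is a small sketch in Python rather than SAS (the variable names are illustrative, not part of the original program) that recomputes them from the values in the listing:

```python
import math

# Values copied from the Analysis of Variance table of the full model.
n = 23                  # number of observations
k = 3                   # number of predictors (x1, x2, x3)
ss_model = 4133.63322   # model sum of squares
ss_error = 2011.58417   # error sum of squares
ss_total = 6145.21739   # corrected total sum of squares
mean_y = 61.34783       # "Dependent Mean"

df_model = k
df_error = n - k - 1
ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_value = ms_model / ms_error                       # overall F test
r_square = ss_model / ss_total                      # R-Square
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_error
root_mse = math.sqrt(ms_error)                      # Root MSE
coeff_var = 100 * root_mse / mean_y                 # Coeff Var, in percent

print(f"F Value   = {f_value:.2f}")
print(f"R-Square  = {r_square:.4f}")
print(f"Adj R-Sq  = {adj_r_square:.4f}")
print(f"Root MSE  = {root_mse:.5f}")
print(f"Coeff Var = {coeff_var:.5f}")
```

The printed values can be compared line by line with the F Value, R-Square, Adj R-Sq, Root MSE and Coeff Var entries of the listing.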
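Draw the normal Q-Q plot of the studentized residuals and the plots of the residuals against the fitted values of y and against x1, x2, x3, and compute the correlation coefficient between the ordered studentized residuals and the corresponding normal quantiles.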
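/* Normal Q-Q plot of the studentized residuals */
proc capability graphics noprint data=a;
  qqplot student / normal(mu=0 sigma=1);
run;
/* Sort the studentized residuals, then pair them with normal quantiles */
proc sort data=a;
  by student;
run;
proc iml;
  use a;
  read all var{student} into rr;
  do i=1 to 23;
    qi=probit((i-0.375)/23.25);
    q=q//qi;
  end;
  rq=rr||q;
  create correl var{r q};
  append from rq;
quit;
proc print data=correl;
run;
proc corr data=correl;
run;
/* Refit the model and plot the residuals against the fitted values and each predictor */
proc reg data=exercise2_9;
  model y=x1-x3;
  output out=a p=fittedy r=residual;
run;
proc print data=a;
run;
proc gplot data=a;
  plot residual*fittedy residual*x1 residual*x2 residual*x3;
  symbol v=dot i=none;
run;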
The CORR Procedure
2 Variables: R Q
Simple Statistics
Variable     N       Mean    Std Dev        Sum     Minimum    Maximum
R           23    0.00954    1.02752    0.21944    -1.71028    1.62692
Q           23          0    0.96691          0    -1.92874    1.92874
Pearson Correlation Coefficients, N = 23
Prob > |r| under H0: Rho=0
              R          Q
R       1.00000    0.98357
                    <.0001
Q       0.98357    1.00000
         <.0001
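The points on the Q-Q plot lie approximately on a straight line, and the CORR procedure output gives an estimated correlation coefficient of 0.98357 between the ordered studentized residuals and the normal quantiles, which is close to 1. The assumption that the error term of this linear regression model is normally distributed is therefore reasonable.
The residual plots show no evident trend, which is a satisfactory pattern. Together with the normality checks on the error term, this indicates that the linear regression model and the assumption of independent, identically normally distributed errors are reasonable and workable for the given data.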
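The correlation used in the normality check can be reproduced outside SAS. The sketch below is an illustration in Python, not part of the original program. It takes the studentized residuals from the listing of part (1), sorts them, pairs them with the same normal scores as the IML step, probit((i - 0.375)/(n + 0.25)), and computes the Pearson correlation. The helper name pearson is an assumption of this sketch.

```python
import math
from statistics import NormalDist

# Studentized residuals copied from the "student" column of the listing.
student = [
    -0.06150, -1.28303, 0.24682, 0.17231, 0.45204, -0.78114,
    -1.38301, -0.18998, -0.91730, 0.06341, 1.54979, 1.21437,
    -1.58470, -1.71028, 1.54520, -0.35404, 1.62692, 0.75782,
    -1.23677, 0.81159, 0.38762, 0.66555, 0.22775,
]

n = len(student)
r_sorted = sorted(student)  # ordered studentized residuals

# Normal scores, the Python counterpart of probit((i - 0.375) / 23.25).
std_normal = NormalDist()
q = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


rho = pearson(r_sorted, q)
print(f"correlation between ordered residuals and normal scores: {rho:.5f}")
```

Because the residuals in the listing are rounded to five decimals, the result may differ from the value reported by proc corr in the last digit.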
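(2) Model selection by the adjusted multiple correlation coefficient (adjusted R-square) criterion and the C(p) criterion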
proc reg data=exercise2_9;
  model y=x1-x3 / selection=adjrsq;
run;
proc reg data=exercise2_9;
  model y=x1-x3 / selection=cp;
run;
The SAS System          19:19 Sunday, October 10, 2004  14
The REG Procedure
Model: MODEL1
Dependent Variable: y
C(p) Selection Method
Number in
  Model        C(p)    R-Square    Variables in Model
      2      2.4951      0.6641    x1 x2
      2      2.6579      0.6613    x1 x3
      3      4.0000      0.6727    x1 x2 x3
      1      4.2995      0.5986    x1
      1     17.9865      0.3628    x3
      2     18.1200      0.3949    x2 x3
      1     19.0131      0.3451    x2
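By the adjusted R-square criterion, the best model is the one containing x1 and x2: its adjusted R-square, obtained from the R-square values above as 1 - (1 - R-square)(n - 1)/(n - k - 1) with n = 23 and k predictors, is the largest among the seven candidate models.
By the C(p) criterion, the best model is also the one containing x1 and x2, which has the smallest C(p) = 2.4951.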
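Both criteria can be recomputed from the R-square column of the C(p) table together with the corrected total sum of squares and the error mean square of the full model from part (1). The following Python sketch is illustrative only, and the function and variable names are assumptions. It uses SSE = (1 - R-square) * SST, so the C(p) values agree with the table only up to the rounding of R-square to four decimals.

```python
n = 23
sst = 6145.21739        # corrected total sum of squares
mse_full = 105.87285    # error mean square of the full model (x1, x2, x3)

# R-square values copied from the C(p) selection table.
r_square = {
    ("x1", "x2"): 0.6641,
    ("x1", "x3"): 0.6613,
    ("x1", "x2", "x3"): 0.6727,
    ("x1",): 0.5986,
    ("x3",): 0.3628,
    ("x2", "x3"): 0.3949,
    ("x2",): 0.3451,
}


def adj_r_square(r2, k):
    """Adjusted R-square for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def mallows_cp(r2, k):
    """Mallows C(p) = SSE_p / MSE_full - (n - 2p), with p = k + 1 parameters."""
    sse = (1 - r2) * sst
    return sse / mse_full - (n - 2 * (k + 1))


adj = {m: adj_r_square(r2, len(m)) for m, r2 in r_square.items()}
cp = {m: mallows_cp(r2, len(m)) for m, r2 in r_square.items()}

for model in r_square:
    print(f"{' '.join(model):<9} adj R-sq = {adj[model]:.4f}   C(p) = {cp[model]:.4f}")

best_adj = max(adj, key=adj.get)
best_cp = min(cp, key=cp.get)
print("largest adjusted R-square:", " ".join(best_adj))
print("smallest C(p):            ", " ".join(best_cp))
```

Under both criteria the sketch picks the model containing x1 and x2.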
Selecting the best regression equation by the prediction sum of squares (PRESSp) criterion
proc reg data=exercise2_9;
  model y=x1 / noprint;
  output out=a1 press=press;
run;
proc means uss data=a1;
  var press;
run;
proc reg data=exercise2_9;
  model y=x2 / noprint;
  output out=a2 press=press;
run;
proc means uss data=a2;
  var press;
run;
proc reg data=exercise2_9;
  model y=x3 / noprint;
  output out=a3 press=press;
run;
proc means uss data=a3;
  var press;
run;
proc reg data=exercise2_9;
  model y=x1 x2 / noprint;
  output out=a4 press=press;
run;
proc means uss data=a4;
  var press;
run;
proc reg data=exercise2_9;
  model y=x1 x3 / noprint;
  output out=a5 press=press;
run;
proc means uss data=a5;
  var press;
run;
proc reg data=exercise2_9;
  model y=x2 x3 / noprint;
  output out=a6 press=press;
run;
proc means uss data=a6;
  var press;
run;
proc reg data=exercise2_9;
  model y=x1 x2 x3 / noprint;
  output out=a7 press=press;
run;
proc means uss data=a7;
  var press;
run;
The MEANS Procedure
Analysis Variable: press  Residual without Current Observation
(one USS value per proc means step, listed in program order)
Model            USS
x1           3024.21
x2           4853.28
x3           4652.84
x1 x2        2714.10
x1 x3        2693.43
x2 x3        4966.43
x1 x2 x3     3046.29
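From the prediction sums of squares above, the model containing x1 and x3 has the smallest value, PRESSp = 2693.43, so it is the model finally selected under this criterion.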
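PRESSp is the sum of squared leave-one-out prediction errors: each observation is removed in turn, the model is refitted on the remaining 22 observations, and the removed y is predicted. The Python sketch below follows that definition directly using only the standard library. It is an illustration and not how proc reg computes the statistic internally, and the helper names solve, fit and press are assumptions of this sketch.

```python
from itertools import combinations

# (x1, x2, x3, y) for the 23 observations read by the data step.
DATA = [
    (50, 51, 2.3, 48), (36, 46, 2.3, 57), (40, 48, 2.2, 66), (41, 44, 1.8, 70),
    (28, 43, 1.8, 89), (49, 54, 2.9, 36), (42, 50, 2.2, 46), (45, 48, 2.4, 54),
    (52, 62, 2.9, 26), (29, 50, 2.1, 77), (29, 48, 2.4, 89), (43, 53, 2.4, 67),
    (38, 55, 2.2, 47), (34, 51, 2.3, 51), (53, 54, 2.2, 57), (36, 49, 2.0, 66),
    (33, 56, 2.5, 79), (29, 46, 1.9, 88), (33, 49, 2.1, 60), (55, 51, 2.4, 49),
    (29, 52, 2.3, 77), (44, 58, 2.9, 52), (43, 50, 2.3, 60),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x


def fit(rows, cols):
    """Least-squares coefficients (intercept first) from the normal equations."""
    xs = [[1.0] + [row[c] for c in cols] for row in rows]
    ys = [row[3] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def press(cols):
    """Leave each observation out, refit, and sum the squared prediction errors."""
    total = 0.0
    for i, row in enumerate(DATA):
        beta = fit(DATA[:i] + DATA[i + 1:], cols)
        pred = beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols))
        total += (row[3] - pred) ** 2
    return total


results = {}
for size in (1, 2, 3):
    for cols in combinations(range(3), size):
        name = " ".join(f"x{c + 1}" for c in cols)
        results[name] = press(cols)
        print(f"{name:<9} PRESS = {results[name]:.2f}")
```

The loop visits the seven models in the same order as the SAS program, so its output can be read against the USS values above.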
(3) Stepwise regression
proc reg data=exercise2_9;
  model y=x1-x3 / selection=stepwise slentry=0.10 slstay=0.10 details;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: y
Stepwise Selection: Step 1
Statistics for Entry
DF = 1,21
                          Model
Variable    Tolerance    R-Square    F Value    Pr > F
x1           1.000000      0.5986      31.31    <.0001
x2           1.000000      0.3451      11.07    0.0032
x3           1.000000      0.3628      11.96    0.0024
Variable x1 Entered: R-Square = 0.5986 and C(p) = 4.2995
Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square   F Value   Pr > F
Model               1     3678.43585    3678.43585     31.31   <.0001
Error              21     2466.78154     117.46579
Corrected Total    22     6145.21739
             Parameter     Standard
Variable      Estimate        Error    Type II SS   F Value   Pr > F
Intercept    121.83182     11.04221         14299    121.73   <.0001
x1            -1.52704      0.27288    3678.43585     31.31   <.0001
Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Statistics for Entry
DF = 1,20
                          Model
Variable    Tolerance    R-Square    F Value    Pr > F
x2           0.782276      0.6641       3.90    0.0622
x3           0.752300      0.6613       3.70    0.0686
Variable x2 Entered: R-Square = 0.6641 and C(p) = 2.4951
Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square   F Value   Pr > F
Model               2     4081.21949    2040.60975     19.77   <.0001
Error              20     2063.99790     103.19989
Corrected Total    22     6145.21739
             Parameter     Standard
Variable      Estimate        Error    Type II SS   F Value   Pr > F
Intercept    166.59133     24.90844    4616.26752     44.73   <.0001
x1            -1.26046      0.28919    1960.56092     19.00   0.0003
x2            -1.08932      0.55139     402.78365      3.90   0.0622
Bounds on condition number: 1.2783, 5.1133
Stepwise Selection: Step 3
Statistics for Removal
DF = 1,20
             Partial       Model
Variable    R-Square    R-Square    F Value    Pr > F
x1            0.3190      0.3451      19.00    0.0003
x2            0.0655      0.5986       3.90    0.0622
Statistics for Entry
DF = 1,19
                          Model
Variable    Tolerance    R-Square    F Value    Pr > F
x3           0.348121      0.6727       0.50    0.4902
All variables left in the model are significant at the 0.1000 level.
No other variable met the 0.1000 significance level for entry into the model.
Summary of Stepwise Selection
        Variable    Variable    Number      Partial       Model
Step    Entered     Removed     Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
   1    x1                            1      0.5986      0.5986    4.2995      31.31    <.0001
   2    x2                            2      0.0655      0.6641    2.4951       3.90    0.0622
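The best model is the one containing x1 and x2, with fitted equation y = 166.59133 - 1.26046 x1 - 1.08932 x2.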
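The F values that drive the entry decisions are partial F statistics: the drop in the error sum of squares when a variable is added, divided by the error mean square of the larger model. The Python sketch below is illustrative, and the function name partial_f is an assumption. It recomputes the F values for x1 in step 1, x2 in step 2 and x3 in step 3 from the error sums of squares in the listings. The p-values are not recomputed because the F distribution is not in the Python standard library.

```python
def partial_f(sse_reduced, sse_larger, df_error_larger, df_added=1):
    """Partial F statistic for adding df_added variables to the reduced model."""
    return ((sse_reduced - sse_larger) / df_added) / (sse_larger / df_error_larger)


# Error sums of squares copied from the listings.
sst = 6145.21739         # intercept-only model (corrected total)
sse_x1 = 2466.78154      # model with x1
sse_x1_x2 = 2063.99790   # model with x1 and x2
sse_full = 2011.58417    # model with x1, x2 and x3

f_step1 = partial_f(sst, sse_x1, 21)         # x1 enters the empty model
f_step2 = partial_f(sse_x1, sse_x1_x2, 20)   # x2 enters after x1
f_step3 = partial_f(sse_x1_x2, sse_full, 19) # x3 considered after x1 and x2

print(f"step 1, x1: F = {f_step1:.2f}")
print(f"step 2, x2: F = {f_step2:.2f}")
print(f"step 3, x3: F = {f_step3:.2f}")
```

The step 3 value is far below the others, which matches the listing: x3 does not meet the 0.10 entry level and the procedure stops with x1 and x2.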
(4) Goodness-of-fit check of the best model
The REG Procedure
Model: MODEL1
Dependent Variable: y
Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square   F Value   Pr > F
Model               2     4081.21949    2040.60975     19.77   <.0001
Error              20     2063.99790     103.19989
Corrected Total    22     6145.21739
Root MSE          10.15873    R-Square   0.6641
Dependent Mean    61.34783    Adj R-Sq   0.6305
Coeff Var         16.55924
Parameter Estimates
                   Parameter     Standard
Variable    DF      Estimate        Error   t Value   Pr > |t|
Intercept    1     166.59133     24.90844      6.69     <.0001
x1           1      -1.26046      0.28919     -4.36     0.0003
x2           1      -1.08932      0.55139     -1.98     0.0622
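The squared multiple correlation coefficient is 0.6641. Compared with the earlier value of 0.6727 for the full model, the residual mean square, the regression coefficient estimates, and the goodness-of-fit measure all change very little; that is, when x1 and x2 are in the model, x3 has very little effect on y. The best regression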