Data Analysis 83 (Word format).docx
394 = n − k − 1 = 398 − 3 − 1
398
395
2
3
Question Explanation
This question refers to the following learning objective(s):
Note that the p-values associated with each predictor are conditional on other variables being included in the model, so they can be used to assess if a given predictor is significant, given that all others are in the model. These p-values are calculated based on a t distribution with n − k − 1 degrees of freedom.
Question 2
A random sample of 200 women who were at least 21 years old, of Pima Indian heritage and living near Phoenix, Arizona, were tested for diabetes according to World Health Organization criteria. The model below is used for predicting their plasma glucose concentration based on their diastolic blood pressure (bp, in mmHg), age (age, in years), and whether or not they are diabetic (type, Yes and No). What is the predicted blood glucose level of a 30-year-old woman who has a diastolic blood pressure of 72 mmHg and is not diabetic?
38.1
140.67
117.46
114.1
Define the multiple linear regression model as
$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$
where there are k predictors (explanatory variables).
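As a minimal sketch of how such a model produces a prediction, the snippet below evaluates the linear equation for the woman described in Question 2. The quiz's regression output is not reproduced in this text, so the intercept and coefficients here are hypothetical placeholders, and the result will not match any of the answer options. The name `type_yes` is an assumed 0/1 indicator for the diabetic level of `type`.

```python
def predict(intercept, coefficients, values):
    """Return y-hat = b0 + sum of b_j * x_j for a single observation."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


# Hypothetical estimates for illustration only, NOT the quiz's fitted model.
intercept = 80.0
coefficients = {"bp": 0.25, "age": 0.5, "type_yes": 20.0}

# A 30-year-old, non-diabetic woman with diastolic blood pressure 72 mmHg.
# type_yes is an assumed indicator: 1 if diabetic, 0 if not.
woman = {"bp": 72, "age": 30, "type_yes": 0}

y_hat = predict(intercept, coefficients, woman)
print(y_hat)  # 113.0 with the placeholder coefficients above
```

Because she is not diabetic, the indicator is 0 and the coefficient on `type_yes` contributes nothing to the prediction.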
Question 3
Body fat percentage can be complicated to estimate, while variables such as Age, Height, Weight, and measurements of various body parts are easy to measure. Based on data¹ on body fat percentage and other various measurements, a linear regression model was developed to predict body fat percentage, based on easy to obtain measurements. The model output is shown below.
Based on this output, what is the correct interpretation of the coefficient for wrist?
¹ Penrose, K., Nelson, A., and Fisher, A. (1985), Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques, Medicine and Science in Sports and Exercise, 17(2), 189.
For every 1 inch increase in wrist circumference, we’d expect a decrease in body fat percentage of about 1.14% plus or minus 1.96 ∗ 0.47%, on average.
For every 1 inch decrease in wrist circumference, we’d expect a decrease in body fat percentage of 1.14%, on average.
For every 1 inch decrease in wrist circumference, we’d expect a decrease in body fat percentage of about (13.57 + 1.14)%, on average.
For every 1 inch increase in wrist circumference, we would expect body fat percentage to be lower by 1.14%, on average.
- Interpret the estimate for the intercept (b0) as the expected value of y when all predictors are equal to 0, on average.
- Interpret the estimate for a slope (say b1) as “All else held constant, for each unit increase in x1, we would expect y to be higher/lower on average by b1.”
Question 4
We modeled the prices of 93 cars (in $1,000s) using each car's city MPG (miles per gallon) and its manufacturing site (foreign or domestic). The regression output is provided below. Note that domestic is the reference level for manufacturing site. The data are outdated, so the prices may seem low.
Which of the following is false?
City MPG is a significant predictor of car price, given information on the manufacturing site of the car.
Manufacturing site is a significant predictor of car price, given information on the city MPG of the car.
Incorrect. The p-value for site is low, hence it’s a significant predictor in this model.
The 95% confidence interval for the slope of city MPG can be calculated as −1.14 ± (−8.03 ∗ 0.14).
If we add another variable to the model, say highway MPG, the p-values associated with city MPG and manufacturing site may change.
- The significance of the model as a whole is assessed using an F-test.
- H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
  HA: At least one $\beta_i \neq 0$.
- df = n − k − 1 degrees of freedom.
- Usually reported at the bottom of the regression output.
- Note that the p-values associated with each predictor are conditional on other variables being included in the model, so they can be used to assess if a given predictor is significant, given that all others are in the model.
- These p-values are calculated based on a t distribution with n − k − 1 degrees of freedom.
- The same degrees of freedom can be used to construct a confidence interval for the slope parameter of each predictor:
$b_i \pm t^\star_{n-k-1} SE_{b_i}$
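The interval formula can be sketched in a few lines of Python. The example borrows the numbers that appear in Question 4: 93 cars, 2 predictors, and a city MPG slope of −1.14. Reading 0.14 as the standard error of that slope is an assumption based on the answer option, and the critical value 1.99 is an assumed approximate table value of t⋆ for 95% confidence with 90 degrees of freedom, not a value taken from the quiz output.

```python
def slope_confidence_interval(estimate, std_error, t_star):
    """Return (lower, upper) for estimate ± t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


n, k = 93, 2          # 93 cars, 2 predictors (city MPG, manufacturing site)
df = n - k - 1        # degrees of freedom for the t distribution

# Assumed values: slope and SE as read from Question 4's answer option,
# t_star as an approximate table value for 95% confidence and df = 90.
t_star = 1.99
lower, upper = slope_confidence_interval(-1.14, 0.14, t_star)

print(df)                              # 90
print(round(lower, 2), round(upper, 2))  # -1.42 -0.86
```

Note that the multiplier is the critical value t⋆, not the t statistic of the slope (−8.03 in the answer option), which is what makes that option false.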
Question 5
R² will never decrease when a predictor is added to a linear model.
False
True
Note that R² will increase with each explanatory variable added to the model, regardless of whether or not the added variable is a meaningful predictor of the response variable. Therefore we use adjusted R², which applies a penalty for the number of predictors included in the model, to better assess the strength of a multiple linear regression model:
$R^2_{adj} = 1 - \frac{SSE / (n - k - 1)}{SST / (n - 1)}$
where n is the number of cases and k is the number of predictors.
- Note that adjusted R² will only increase if the added variable has a meaningful contribution to the amount of explained variability in y, i.e. if the gains from adding the variable exceed the penalty.
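A short sketch of the penalty at work, using made-up sums of squares: the values of SSE, SST, n, and k below are hypothetical and chosen only to show a weak predictor raising R² while lowering adjusted R².

```python
def r_squared(sse, sst):
    """Plain R^2 = 1 - SSE / SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical numbers: 100 cases, total sum of squares 1000.
n, sst = 100, 1000.0

# Model with 3 predictors, then the same model plus one weak predictor
# that lowers SSE only slightly (400 -> 398).
r2_before, adj_before = r_squared(400.0, sst), adjusted_r_squared(400.0, sst, n, 3)
r2_after, adj_after = r_squared(398.0, sst), adjusted_r_squared(398.0, sst, n, 4)

print(r2_before, r2_after)    # R^2 goes up
print(adj_before, adj_after)  # adjusted R^2 goes down
```

Here the small drop in SSE does not offset the loss of one degree of freedom, so the adjusted value falls even though R² rises.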
Question 6
Consider the following output from a multiple linear regression model with 10 predictors.
If you were doing backwards selection on this model using p-value as the criterion, which of the following would be an acceptable next step?
Remove the variable “dollar” because it has the highest p-value.
Remove one of the variables “resubj” or “attach” because they both have the lowest p-values.
In backwards selection using p-value as the criterion, we want to keep variables with low p-values and remove the variable with the highest p-value.
Remove any one of the variables with a high p-value, as long as removing the variable does not cause the adjusted R² to decrease in the re-fitted model.
Remove the variables “password” and “dollar” because their high p-values indicate collinearity with other variables.
The general idea behind backward selection is to start with the full model and eliminate one variable at a time until the ideal model is reached.
- p-value method:
(i) Start with the full model.
(ii) Drop the variable with the highest p-value and refit the model.
(iii) Repeat until all remaining variables are significant.
- adjusted R² method:
(i) Start with the full model.
(ii) Refit all possible models omitting one variable at a time, and choose the model with the highest adjusted R².
(iii) Repeat until the maximum possible adjusted R² is reached.
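The p-value method can be sketched as a small loop. This is an illustration under stated assumptions rather than a full implementation: `fit_p_values` is a hypothetical callable standing in for whatever routine refits the model and returns one p-value per variable, the 0.05 cutoff is an assumed significance level, and the p-values in `toy_fit` are invented (only the variable names come from Question 6).

```python
def backward_select(variables, fit_p_values, alpha=0.05):
    """Drop the highest-p-value variable and refit until all are significant."""
    remaining = list(variables)
    while remaining:
        p_values = fit_p_values(remaining)  # refit on the current variable set
        worst = max(remaining, key=lambda v: p_values[v])
        if p_values[worst] <= alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)  # drop exactly one variable per refit
    return remaining


def toy_fit(variables):
    """Stand-in for a real model fit; returns made-up p-values."""
    base = {"resubj": 0.001, "attach": 0.002, "password": 0.30, "dollar": 0.60}
    p_values = {v: base[v] for v in variables}
    # Mimic p-values shifting after a refit: with "dollar" removed,
    # "password" becomes significant in this invented scenario.
    if "dollar" not in variables and "password" in variables:
        p_values["password"] = 0.03
    return p_values


selected = backward_select(["resubj", "attach", "password", "dollar"], toy_fit)
print(selected)  # ['resubj', 'attach', 'password']
```

The toy scenario shows why only one variable is dropped per step: p-values are conditional on the other variables in the model, so a variable that looks insignificant in the full model may become significant after a refit.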
Question 7
The Second International Mathematics Study included 8th graders from randomly sampled classrooms in the US, who completed mathematics achievement tests at the beginning and at the end of the academic year. Students also answered questionnaires regarding their attitudes toward mathematics. The linear model output predicts the gain score in this test (post-test score − pre-test score) using the following explanatory variables:
• pretest: score on the exam taken at the beginning of the semester
• gender: male or female
• more_ed: expected number of years for continued education (up to 2 years, 2 to 5 years, 5 to 6 years, 8 or more years)
• useful: Math is useful in everyday life (strongly disagree, disagree, undecided, agree, strongly agree)
• ethnic: ethnicity of student (African American, Anglo, Other)
The following is the residuals plot for this model. Which of the following conditions can this plot be used to check?
Independent residuals
Nearly normal residuals
Constant variability of residuals
Non-collinear explanatory variables.
Requires pairwise scatterplot for each combination of explanatory variables.
List the conditions for multiple linear regression as
(1) linear relationship between each (numerical) explanatory variable and the response - checked using scatterplots of y vs. each x, and residuals plots of residuals vs. each x
(2) nearly normal residuals with mean 0 - checked using a normal probability plot and histogram of residuals
(3) constant variability of residuals - checked using residuals plots of residuals vs. ŷ, and residuals vs. each x
(4) independence of residuals (and hence observations) - checked using a scatterplot of residuals vs. order of data collection (will reveal non-independence if da