Running head: Evaluating goodness-of-fit
Evaluating Goodness-of-Fit in Comparison of Models to Data
Christian D. Schunn
University of Pittsburgh
Dieter Wallach
University of Applied Sciences Kaiserslautern
Contact information:
Learning Research and Development Center
Room 715
University of Pittsburgh
3939 O’Hara St.
Pittsburgh, PA 15260
USA
Email: schunn@pitt.edu
Office: +1 412 624 8807
Fax: +1 412 624 7439
Abstract
Computational and mathematical models, in addition to providing a method for demonstrating qualitative predictions resulting from interacting mechanisms, provide quantitative predictions that can be used to discriminate between alternative models and uncover which aspects of a given theoretical framework require further elaboration. Unfortunately, there are no formal standards for how to evaluate the quantitative goodness-of-fit of models to data, either visually or numerically. As a result, there is considerable variability in methods used, with frequent selection of choices that misinform the reader. While there are some subtle and perhaps controversial issues involved in the evaluation of goodness-of-fit, there are many simple conventions that are quite uncontroversial and should be adopted now. In this paper, we review various kinds of visual display techniques and numerical measures of goodness-of-fit, setting new standards for the selection and use of such displays and measures.
Evaluating Goodness-of-Fit in Comparison of Models to Data
As theorizing in science becomes more complex, with the addition of multiple, interacting mechanisms potentially being applied to complex, possibly reactive input, it is increasingly necessary to have mathematical or computational instantiations of the theories to be able to determine whether the intuitive predictions derived from verbal theories actually hold. In other words, the instantiated models can serve as a sufficiency demonstration.
Executable models serve another important function, however, and that is one of providing precise quantitative predictions. Verbal theories provide qualitative predictions about the effects of certain variables; executable models (in addition to formally specifying underlying constructs) can be used to predict the size of the effects of variables, the relative size of the effects of different variables, the relative effects of the same variable across different dependent measures, and perhaps the precise absolute value of outcomes on particular dimensions. These quantitative predictions provide the researcher with another method for determining which model among alternative models provides the best account of the available data. They also provide the researcher with a method for determining which aspects of the data are not accounted for with a given model.
There are many subtle and controversial issues involved in how to use goodness-of-fit to evaluate models, which have led some researchers to question whether goodness-of-fit measures should be used at all (Roberts & Pashler, 2000). However, quantitative predictions remain an important aspect of executable models, and goodness-of-fit measures in one form or another remain the via regia to evaluating these quantitative predictions. Moreover, the common complaints against goodness-of-fit measures focus on some poor (although common) practices in the use of goodness-of-fit, and thus do not invalidate the principle of using goodness-of-fit measures in general.
One central problem with the current use of goodness-of-fit measures is that there are no formal standards for their selection and use. In some research areas within psychology, there are a number of conventions for the selection of particular methods. However, these conventions are typically more sociological and historical than logical in origin. Moreover, many of these conventions have fundamental shortcomings (Roberts & Pashler, 2000), resulting in goodness-of-fit arguments that often range from uninformative to somewhat misleading to just plain wrong. The goal of this paper is to review alternative methods for evaluating goodness-of-fit and to recommend new standards for their selection and use. While there are some subtle and perhaps controversial issues involved in the evaluation of goodness-of-fit, there are many simple conventions that should be quite uncontroversial and should thus be adopted now in research.
The goodness-of-fit of a model to data is evaluated in two different ways: 1) through the use of visual presentation methods, which allow for visual comparison of similarities and differences between model predictions and observed data; and 2) through the use of numerical measures, which provide summary measures of the overall accuracy of the predictions. Correspondingly, this paper addresses visual presentation and numerical measures of goodness-of-fit.
The paper is divided into three sections. The first section contains a brief discussion of the common problems in goodness-of-fit issues. These problems are taken from a recent summary by Roberts and Pashler (2000). We briefly mention these problems as they motivate some of the issues in selecting visual and numerical measures of goodness-of-fit. Moreover, we also briefly mention simple methods for addressing these problems. The second section reviews and evaluates the advantages and disadvantages of different kinds of visual displays. The third section finally reviews and evaluates the advantages and disadvantages of different kinds of numerical measures of goodness-of-fit.
Common Problems in Goodness-of-Fit Measures
Free Parameters
The primary problem with using goodness-of-fit measures is that usually they do not take into account the number of free parameters in a model: with enough free parameters, any model can precisely match any data set. The first solution is that one must always be very open about the number of free parameters. There are, however, some complex issues surrounding what counts as a free parameter: just quantitative parameters, symbolic elements like the number of production rules underlying a model’s behavior (Simon, 1992), only parameters that are systematically varied in a fit, or only parameters that were not kept constant over a broad range of data sets. In most cases scientists refer to a model parameter as “free” when its estimation is based on the data set that is being modeled. Nevertheless, it is uncontroversial to say that the free parameters in a model (however defined) should be openly discussed and that they play a clear role in evaluating the fit of a model, or the relative fit between two models (for examples see Anderson, Bothell, Lebiere, & Matessa, 1998; Taatgen & Wallach, in press).
Roberts and Pashler (2000) provide some additional suggestions for dealing with the free parameter issue. In particular, one can conduct sensitivity analyses to show how much the fit depends on the particular parameter values. Conducting such a sensitivity analysis also allows for a precise analysis of the implications of a model’s underlying theoretical principles and their dependence upon specific parameter settings.
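To make the idea of a sensitivity analysis concrete, the following is a minimal sketch rather than a prescribed procedure. It assumes a hypothetical power-law practice model with a fixed scale parameter a and a free exponent b, invented response times, and root mean squared deviation (RMSD) as the fit measure; none of these choices come from Roberts and Pashler (2000). The sketch simply recomputes the fit while b is stepped across a grid.

```python
import math

# Hypothetical observed mean response times (ms) for practice blocks 1..6.
# These values are invented for illustration only.
observed = [1000.0, 720.0, 590.0, 510.0, 460.0, 420.0]

def predict(a, b, n_blocks):
    """Assumed power-law practice curve: RT = a * block ** (-b)."""
    return [a * (block ** -b) for block in range(1, n_blocks + 1)]

def rmsd(pred, obs):
    """Root mean squared deviation between predictions and data."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def sensitivity(a, b_values, obs):
    """Return (b, RMSD) pairs showing how the fit changes as b is varied."""
    return [(b, rmsd(predict(a, b, len(obs)), obs)) for b in b_values]

# Step the free parameter b across a grid while holding a fixed.
b_grid = [0.30, 0.40, 0.50, 0.60, 0.70]
results = sensitivity(1000.0, b_grid, observed)
for b, err in results:
    print(f"b = {b:.2f}  RMSD = {err:6.1f} ms")

# The grid value with the smallest RMSD.
best_b = min(results, key=lambda pair: pair[1])[0]
```

If the RMSD rises steeply as b moves away from its best value, the quality of the fit depends heavily on that particular setting; a flat profile would suggest that the fit follows from the structure of the model rather than from parameter tuning.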
There are several methods for modifying goodness-of-fit measures by computing a penalty against more complex models (Grünwald, 2001; Myung, 2000; Wasserman, 2000). These methods also help mitigate the free parameter problem. Many of these solutions are relatively complex, are not universally applicable, and are beyond the scope of this paper. They will be discussed further in the general discussion.
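As a rough illustration of what a complexity penalty does, the sketch below uses the Bayesian Information Criterion (BIC) in its common least-squares form, n ln(RSS/n) + k ln(n). This is a familiar stand-in chosen for brevity, not necessarily the method advocated in the works cited above, and the residual sums of squares, sample size, and parameter counts are hypothetical.

```python
import math

def bic(rss, n, k):
    """BIC for a least-squares fit assuming Gaussian errors (up to a constant).

    rss: residual sum of squares, n: number of data points,
    k: number of free parameters. Lower values are preferred.
    """
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fits to the same 20 data points (values invented).
n_points = 20
bic_simple = bic(rss=400.0, n=n_points, k=2)   # fewer parameters, larger error
bic_complex = bic(rss=360.0, n=n_points, k=5)  # more parameters, smaller error

print(f"simple model:  BIC = {bic_simple:.2f}")
print(f"complex model: BIC = {bic_complex:.2f}")
```

In this invented case the more complex model has the smaller raw error, yet the penalty term for its three additional parameters outweighs that improvement, so the simpler model receives the lower (better) score.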
Noise in Data
The differences in various model fits can be meaningless if the predictions of both models lie within the noise limits of the data. For example, if data points being fit have 95% Confidence Intervals of 300 ms and two models are both always within 50 ms of the data points, then differential goodness-of-fits to the data between the models are not very meaningful. However, it is easy to determine whether this is the case in any given model fit. One should examine (and report) the variance in the data to make sure the fidelity of the fit to the data is not exceeding the fidelity of the data itself (Roberts & Pashler, 2000). This assessment is easily done by comparing measures of model goodness-of-fit to measures of data variability, and will be discussed in a later section.
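The check described above can be sketched in a few lines. The example assumes hypothetical condition means, standard deviations, and sample sizes chosen so that the 95% confidence interval half-width is close to 300 ms, and it uses the normal approximation (z = 1.96) for simplicity; a t-based interval would be more appropriate for small samples. The two sets of model predictions are likewise invented.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Approximate 95% CI half-width of a mean (normal approximation)."""
    return z * sd / math.sqrt(n)

def within_noise(predictions, means, sds, ns):
    """For each data point, is the prediction inside the data's 95% CI?"""
    return [abs(p - m) <= ci_half_width(sd, n)
            for p, m, sd, n in zip(predictions, means, sds, ns)]

# Hypothetical data: three condition means (ms), their SDs, and sample sizes.
means = [900.0, 1100.0, 1300.0]
sds = [480.0, 500.0, 470.0]
ns = [10, 10, 10]

# Hypothetical predictions from two competing models.
model_a = [870.0, 1140.0, 1290.0]
model_b = [940.0, 1060.0, 1340.0]

a_ok = within_noise(model_a, means, sds, ns)
b_ok = within_noise(model_b, means, sds, ns)
print("Model A within noise:", a_ok)
print("Model B within noise:", b_ok)
```

When every prediction from both models falls inside the confidence intervals, as in this invented case, any remaining difference in fit between the models is smaller than the resolution of the data and should not be used to prefer one model over the other.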
Overfitting
Because data are often noisy, a model that fits a given data set too well may generalize to other data sets less well than a model that fits this particular data set less perfectly (Myung, 2000). In other words, the free parameters of the model are sometimes adjusted to account not only for the generalizable effects in the data but also the noise or nongeneralizable effects in the data. Generally, model overfitting is detected when the model is applied to other data sets or is tested on related phenomena (e.g., Richman, Staszewski, & Simon, 1995; Busemeyer & Wang, 2000). We make recommendations for goodness-of-fit measures that reduce overfitting problems. Most importantly, one should examine the variance in the data, as will be discussed in a later section.
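A small, entirely hypothetical example may help show how overfitting reveals itself when a model is applied to a second data set. The sketch assumes two noisy samples drawn from the same underlying linear trend (the values are invented), a two-parameter linear model fit by ordinary least squares, and a deliberately extreme "lookup" model with one free parameter per data point that simply memorizes the first sample.

```python
import math

def rmsd(pred, obs):
    """Root mean squared deviation between predictions and data."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (two free parameters)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Two hypothetical samples sharing the same trend but with different noise.
xs = [1, 2, 3, 4, 5, 6]
sample_a = [170.0, 185.0, 260.0, 275.0, 365.0, 395.0]  # used for fitting
sample_b = [140.0, 220.0, 230.0, 315.0, 335.0, 410.0]  # used for testing

# Model 1: linear model with two free parameters estimated from sample A.
slope, intercept = fit_line(xs, sample_a)
linear_pred = [intercept + slope * x for x in xs]

# Model 2: lookup model with six free parameters that reproduces sample A.
lookup_pred = list(sample_a)

linear_fit, linear_test = rmsd(linear_pred, sample_a), rmsd(linear_pred, sample_b)
lookup_fit, lookup_test = rmsd(lookup_pred, sample_a), rmsd(lookup_pred, sample_b)
print(f"linear: fit RMSD = {linear_fit:.1f}, new-data RMSD = {linear_test:.1f}")
print(f"lookup: fit RMSD = {lookup_fit:.1f}, new-data RMSD = {lookup_test:.1f}")
```

The lookup model fits the first sample perfectly because it has absorbed that sample's noise, yet it does worse than the linear model on the second sample, which is the pattern described above.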
Uninteresting Inflations of Goodness-of-Fit Values
A general rule-of-thumb in evaluating the fit of a model to data is that there should be significantly more data than free parameters (e.g., 10:1 or 5:1 depending on the domain). As the ratio of data points to free parameters approaches 1, it is obvious that overfitting is likely to occur. Yet, the number of data points being fit is not always the best factor to consider; some data are easy to fit quantitatively because of simplifying features in the data. For example, if all the data points lie exactly on a straight line, it is easy to obtain a perfect fit for a hundred thousand data points with a simple linear function with two degrees of freedom. One can easily imagine other factors inflating the goodness-of-fit. For example, i