Evaluating Goodness-of-Fit in Comparison of Models to Data

Running head: Evaluating goodness-of-fit

Christian D. Schunn

University of Pittsburgh

Dieter Wallach

University of Applied Sciences Kaiserslautern

Contact information:

Learning Research and Development Center

Room 715

University of Pittsburgh

3939 O’Hara St.

Pittsburgh, PA 15260

USA

Email: schunn@pitt.edu

Office: +1 412 624 8807

Fax: +1 412 624 7439

Abstract

Computational and mathematical models, in addition to providing a method for demonstrating qualitative predictions resulting from interacting mechanisms, provide quantitative predictions that can be used to discriminate between alternative models and uncover which aspects of a given theoretical framework require further elaboration. Unfortunately, there are no formal standards for how to evaluate the quantitative goodness-of-fit of models to data, either visually or numerically. As a result, there is considerable variability in methods used, with frequent selection of choices that misinform the reader. While there are some subtle and perhaps controversial issues involved in the evaluation of goodness-of-fit, there are many simple conventions that are quite uncontroversial and should be adopted now. In this paper, we review various kinds of visual display techniques and numerical measures of goodness-of-fit, setting new standards for the selection and use of such displays and measures.

Evaluating Goodness-of-Fit in Comparison of Models to Data

As theorizing in science becomes more complex, with the addition of multiple, interacting mechanisms potentially being applied to complex, possibly reactive input, it is increasingly necessary to have mathematical or computational instantiations of the theories to be able to determine whether the intuitive predictions derived from verbal theories actually hold. In other words, the instantiated models can serve as a sufficiency demonstration.

Executable models serve another important function, however, and that is one of providing precise quantitative predictions. Verbal theories provide qualitative predictions about the effects of certain variables; executable models (in addition to formally specifying underlying constructs) can be used to predict the size of the effects of variables, the relative size of the effects of different variables, the relative effects of the same variable across different dependent measures, and perhaps the precise absolute value of outcomes on particular dimensions. These quantitative predictions provide the researcher with another method for determining which model among alternative models provides the best account of the available data. They also provide the researcher with a method for determining which aspects of the data are not accounted for with a given model.

There are many subtle and controversial issues involved in how to use goodness-of-fit to evaluate models, which have led some researchers to question whether goodness-of-fit measures should be used at all (Roberts & Pashler, 2000). However, quantitative predictions remain an important aspect of executable models, and goodness-of-fit measures in one form or another remain the via regia to evaluating these quantitative predictions. Moreover, the common complaints against goodness-of-fit measures focus on some poor (although common) practices in the use of goodness-of-fit, and thus do not invalidate the principle of using goodness-of-fit measures in general.

One central problem with the current use of goodness-of-fit measures is that there are no formal standards for their selection and use. In some research areas within psychology, there are a number of conventions for the selection of particular methods. However, these conventions are typically more sociological and historical than logical in origin. Moreover, many of these conventions have fundamental shortcomings (Roberts & Pashler, 2000), resulting in goodness-of-fit arguments that often range from uninformative to somewhat misleading to just plain wrong. The goal of this paper is to review alternative methods for evaluating goodness-of-fit and to recommend new standards for their selection and use. While there are some subtle and perhaps controversial issues involved in the evaluation of goodness-of-fit, there are many simple conventions that should be quite uncontroversial and should thus be adopted now in research.

The goodness-of-fit of a model to data is evaluated in two different ways: 1) through the use of visual presentation methods, which allow for visual comparison of similarities and differences between model predictions and observed data; and 2) through the use of numerical measures, which provide summary measures of the overall accuracy of the predictions. Correspondingly, this paper addresses visual presentation and numerical measures of goodness-of-fit.

The paper is divided into three sections. The first section contains a brief discussion of the common problems in goodness-of-fit issues. These problems are taken from a recent summary by Roberts and Pashler (2000). We briefly mention these problems as they motivate some of the issues in selecting visual and numerical measures of goodness-of-fit. Moreover, we also briefly mention simple methods for addressing these problems. The second section reviews and evaluates the advantages and disadvantages of different kinds of visual displays. The third section finally reviews and evaluates the advantages and disadvantages of different kinds of numerical measures of goodness-of-fit.

Common Problems in Goodness-of-Fit Measures

Free Parameters

The primary problem with using goodness-of-fit measures is that usually they do not take into account the number of free parameters in a model: with enough free parameters, any model can precisely match any dataset. The first solution is that one must always be very open about the number of free parameters. There are, however, some complex issues surrounding what counts as a free parameter: just quantitative parameters, symbolic elements like the number of production rules underlying a model’s behavior (Simon, 1992), only parameters that are systematically varied in a fit, or only parameters that were not kept constant over a broad range of datasets. In most cases scientists refer to a model parameter as “free” when its estimation is based on the dataset that is being modeled. Nevertheless, it is uncontroversial to say that the free parameters in a model (however defined) should be openly discussed and that they play a clear role in evaluating the fit of a model, or the relative fit between two models (for examples see Anderson, Bothell, Lebiere, & Matessa, 1998; Taatgen & Wallach, in press).
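The claim that enough free parameters allow a model to match any dataset can be illustrated with a minimal sketch. The data values below are invented, and the interpolating polynomial is only a stand-in for an over-parameterized model; it is not a model discussed in this paper. With five data points and five free coefficients, the fit is exact even though the data have no systematic pattern.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through n points at x.

    The n observed values act as n free parameters of the polynomial.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical observations (ms): arbitrary values with no underlying pattern.
conditions = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [512.0, 430.0, 655.0, 390.0, 601.0]

# "Fit" the five-parameter model and compare its predictions with the data.
predicted = [lagrange_predict(conditions, observed, x) for x in conditions]
max_abs_error = max(abs(p - o) for p, o in zip(predicted, observed))

print(f"free parameters: {len(observed)}, max |error|: {max_abs_error:.2e}")
```

A perfect fit of this kind says nothing about the quality of the model, which is why the number of free parameters has to be reported alongside any fit statistic.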

Roberts and Pashler (2000) provide some additional suggestions for dealing with the free parameter issue. In particular, one can conduct sensitivity analyses to show how much the fit depends on the particular parameter values. Conducting such a sensitivity analysis also allows for a precise analysis of the implications of a model’s underlying theoretical principles and their dependence upon specific parameter settings.
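A sensitivity analysis of this kind might be sketched as follows. Everything in the sketch is an assumption made for illustration: the power-law practice model, its fixed `base` and `gain` values, the invented response times, and the choice of root mean squared deviation as the fit measure. The point is only the procedure: vary one parameter over a range and report how the fit changes.

```python
import math


def model_rt(trial, rate, base=300.0, gain=700.0):
    """Hypothetical power-law practice model: predicted RT (ms) on a trial."""
    return base + gain * trial ** (-rate)


def rmsd(predicted, observed):
    """Root mean squared deviation between predictions and data."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


trials = [1, 2, 4, 8, 16, 32]
# Invented response times (ms), used only to exercise the procedure.
observed = [1005.0, 790.0, 655.0, 540.0, 480.0, 420.0]

# Vary the learning-rate parameter and record the fit at each setting.
sensitivity = {}
for rate in [0.3, 0.4, 0.5, 0.6, 0.7]:
    predicted = [model_rt(t, rate) for t in trials]
    sensitivity[rate] = rmsd(predicted, observed)

for rate, error in sensitivity.items():
    print(f"rate={rate:.1f}  RMSD={error:6.1f} ms")

best_rate = min(sensitivity, key=sensitivity.get)
print(f"best-fitting rate in the grid: {best_rate}")
```

If the fit degrades sharply away from one setting, the reported fit depends heavily on that parameter value; if it stays roughly flat, the fit is driven more by the model’s structure than by the parameter.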

There are several methods for modifying goodness-of-fit measures by computing a penalty against more complex models (Grünwald, 2001; Myung, 2000; Wasserman, 2000). These methods also help mitigate the free parameter problem. Many of these solutions are relatively complex, are not universally applicable, and are beyond the scope of this paper. They will be discussed further in the general discussion.

Noise in Data

The differences in various model fits can be meaningless if the predictions of both models lie within the noise limits of the data. For example, if data points being fit have 95% confidence intervals of 300 ms and two models are both always within 50 ms of the data points, then differential goodness-of-fits to the data between the models are not very meaningful. However, it is easy to determine whether this is the case in any given model fit. One should examine (and report) the variance in the data to make sure the fidelity of the fit to the data is not exceeding the fidelity of the data itself (Roberts & Pashler, 2000). This assessment is easily done by comparing measures of model goodness-of-fit to measures of data variability, and will be discussed in a later section.
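One way such a check could look is sketched below. The condition means, standard deviations, sample size, and model predictions are all invented, and the confidence interval uses a normal approximation (a multiplier of 1.96) rather than a t-based interval; both are simplifying assumptions. The sketch compares each model’s deviation from the data with the half-width of the 95% confidence interval around each data point.

```python
import math


def ci95_half_width(sd, n):
    """Approximate 95% CI half-width for a mean (normal approximation)."""
    return 1.96 * sd / math.sqrt(n)


def mean_abs_dev(predicted, observed):
    """Mean absolute deviation between predictions and data."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical condition means (ms), standard deviations, and sample size.
observed_means = [900.0, 1100.0, 1400.0]
observed_sds = [480.0, 510.0, 495.0]
n_per_condition = 10

# Hypothetical predictions from two competing models.
model_a = [930.0, 1080.0, 1440.0]
model_b = [890.0, 1130.0, 1380.0]

half_widths = [ci95_half_width(sd, n_per_condition) for sd in observed_sds]


def within_noise(predicted):
    """True if every prediction falls inside the data's 95% CI."""
    return all(
        abs(p - o) <= hw
        for p, o, hw in zip(predicted, observed_means, half_widths)
    )


for name, predicted in [("A", model_a), ("B", model_b)]:
    print(
        f"model {name}: mean |deviation| = "
        f"{mean_abs_dev(predicted, observed_means):.1f} ms, "
        f"within noise: {within_noise(predicted)}"
    )
print("CI half-widths (ms):", [round(hw, 1) for hw in half_widths])
```

When both models fall well inside the intervals, as in this invented case, the difference between their fit values is smaller than the data can resolve.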

Overfitting

Because data are often noisy, a model that fits a given dataset too well may generalize to other datasets less well than a model that fits this particular dataset less perfectly (Myung, 2000). In other words, the free parameters of the model are sometimes adjusted to account not only for the generalizable effects in the data but also the noise or nongeneralizable effects in the data. Generally, model overfitting is detected when the model is applied to other datasets or is tested on related phenomena (e.g., Richman, Staszewski, & Simon, 1995; Busemeyer & Wang, 2000). We make recommendations for goodness-of-fit measures that reduce overfitting problems. Most importantly, one should examine the variance in the data, as will be discussed in a later section.
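The detection of overfitting by applying a model to a second dataset can be sketched with invented numbers. Two hypothetical datasets are assumed to come from the same linear process with different noise. A two-parameter line and a "saturated" model with one free parameter per condition are both fit to dataset A and then evaluated on dataset B. The saturated model is represented here simply by its predictions, which reproduce dataset A exactly; both models and all values are illustrative assumptions.

```python
import math


def rmsd(predicted, observed):
    """Root mean squared deviation between predictions and data."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Two invented datasets: the same linear trend (100 + 50x) plus different noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dataset_a = [170.0, 175.0, 265.0, 280.0, 375.0, 385.0]
dataset_b = [135.0, 220.0, 230.0, 325.0, 340.0, 415.0]

# Model 1: a line with two free parameters, estimated from dataset A only.
slope, intercept = fit_line(x, dataset_a)
line_pred = [intercept + slope * xi for xi in x]

# Model 2: a saturated model with six free parameters, which by construction
# reproduces every value of dataset A, noise included.
saturated_pred = list(dataset_a)

results = {
    "line (2 parameters)": (rmsd(line_pred, dataset_a), rmsd(line_pred, dataset_b)),
    "saturated (6 parameters)": (
        rmsd(saturated_pred, dataset_a),
        rmsd(saturated_pred, dataset_b),
    ),
}

for name, (fit_a, fit_b) in results.items():
    print(f"{name}: RMSD on A = {fit_a:.1f}, RMSD on B = {fit_b:.1f}")
```

In this constructed case the saturated model fits dataset A perfectly but does worse than the line on dataset B, because it has absorbed the noise in A along with the trend.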

Uninteresting Inflations of Goodness-of-Fit Values

A general rule-of-thumb in evaluating the fit of a model to data is that there should be significantly more data than free parameters (e.g., 10:1 or 5:1 depending on the domain). As the ratio of data points to free parameters approaches 1, it is obvious that overfitting is likely to occur. Yet, the number of data points being fit is not always the best factor to consider: some data are easy to fit quantitatively because of simplifying features in the data. For example, if all the data points lie exactly on a straight line, it is easy to obtain a perfect fit for a hundred thousand data points with a simple linear function with two degrees of freedom. One can easily imagine other factors inflating the goodness-of-fit. For example, i
