Mini-Apps: Chinese-English Foreign Literature Translation (Word Format).docx

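Foreign Literature Translation: Original Text and Translation

Title: Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

Authors: Gary Lawson, Michael Poteat, Masha Sosonkina, Robert Baurle

Journal: Computer Science

Year: 2016

Original Text

ENHANCING APPLICATION PERFORMANCE USING MINI-APPS: COMPARISON OF HYBRID PARALLEL PROGRAMMING PARADIGMS

Gary Lawson, Michael Poteat, Masha Sosonkina, Robert Baurle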

ABSTRACT

In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 10 was measured for MPI+OpenMP.

Keywords: Mini-apps, Performance, VULCAN, Shared Memory, MPI, OpenMP

1 INTRODUCTION

In many fields, real-world applications have already been developed. For established applications to stay up-to-date, new parallel strategies must be explored to determine which may yield the best performance, especially with advances in computing hardware. However, restructuring or modifying a real-world application incurs increased cost depending on the size of the code and changes to be made. A mini-app may be created to quickly explore such options without modifying the entire code. Mini-apps reduce the overhead of applying new strategies, thus various strategies may be implemented and compared. This work presents the authors' experiences when following this strategy for a real-world application developed by NASA.

VULCAN (Viscous Upwind Algorithm for Complex Flow Analysis) is a turbulent, nonequilibrium, finite-rate chemical kinetics, Navier-Stokes flow solver for structured, cell-centered, multiblock grids that is maintained and distributed by the Hypersonic Air Breathing Propulsion Branch of the NASA Langley Research Center (NASA 2016). The mini-app developed in this work uses the Householder Reflector kernel for solving systems of linear equations. This kernel is used often by different workloads, and is a good candidate to decide what strategy type to apply to VULCAN. VULCAN is built on a single layer of MPI and the code has been optimized to obtain perfect vectorization; therefore, two levels of parallelism are currently used. This work investigates two flavors of shared-memory parallelism, OpenMP and Shared MPI, which will provide the third level of parallelism for the application. A third level of parallelism increases performance, which decreases the time-to-solution.

MPI has extended the standard to MPI version 3.0, which includes the Shared Memory (SHM) model (Mikhail B. (Intel) 2015, Message Passing Interface Forum 2012), known in this work as Shared MPI (SMPI). This extension allows MPI to create memory windows that are shared between MPI tasks on the same physical node. In this way, MPI tasks are equivalent to threads, except Shared MPI is more difficult for a programmer to implement. OpenMP is the most common shared-memory library used to date because of its ease-of-use (OpenMP 2016). In most cases, only a few OpenMP pragmas are required to parallelize a loop; however, OpenMP is subject to increased overhead, which may decrease performance if not properly tuned.

As early as the year 2000, the authors in (Cappello and Etiemble 2000) found that latency sensitive codes seem to benefit from pure MPI implementations whereas bandwidth sensitive codes benefit from hybrid MPI+OpenMP. Also, the authors found that faster processors will benefit hybrid MPI+OpenMP codes if data movement is not an overwhelming bottleneck (Cappello and Etiemble 2000). Since this time, hybrid MPI+OpenMP implementations have improved, but not without difficulties. In (Drosinos and Koziris 2004, Chorley and Walker 2010), it was found that OpenMP incurs many performance reductions, including: overhead (fork/join, atomics, etc.), false sharing, imbalanced message passing, and a sensitivity to processor mapping. However, OpenMP overhead may be hidden when using more threads. In (Rabenseifner, Hager, and Jost 2009), the authors found that simply using OpenMP could incur performance penalties because the compiler avoids optimizing OpenMP loops (verified up to version 10.1). Although compilers have advanced considerably since this time, application users that still compile using older versions may be at risk if using OpenMP. In (Drosinos and Koziris 2004, Chorley and Walker 2010) the authors found that the hybrid MPI+OpenMP approach outperforms the pure MPI approach because the hybrid strategy diversifies the path to parallel execution.

More recently, MPI extended its standard to include the SHM model (Mikhail B. (Intel) 2015). The authors in (Hoefler, Dinan, Thakur, Barrett, Balaji, Gropp, and Underwood 2015) present MPI RMA theory and examples, which are the basis of the SHM model. In (Gerstenberger, Besta, and Hoefler 2013), the authors conduct a thorough performance evaluation of MPI RMA, including an investigation of different synchronization techniques for memory windows. In (Hoefler, Dinan, Buntinas, Balaji, Barrett, Brightwell, Gropp, Kale, and Thakur 2013), the authors investigate the viability of MPI+SMPI execution, as well as compare it to MPI+OpenMP execution. It was found that an underlying limitation of OpenMP is the shared-by-default model for memory, which does not couple well with MPI since the memory model is private-by-default. For this reason, MPI+SMPI codes are expected to perform better, since shared memory is explicit and the memory model for the entire code is private-by-default. Most recently, a new MPI communication model has been introduced in (Gropp, Olson, and Samfass 2016), which better captures multinode communication performance, and offers an open-source benchmarking tool to capture the model parameters for a given system. Independent of the shared memory layer, MPI is the de facto standard in data movement between nodes and such a model can help any MPI program.

The remainder of this paper is organized into the following sections: Section 2 introduces the Householder mini-apps, Section 3 presents the performance testing results for the mini-apps considered, and Section 4 concludes this paper.

2 HOUSEHOLDER MINI-APP

The mini-apps use the Householder computation kernel from VULCAN, which is used in solving systems of linear equations. The Householder routine is an algorithm that is used to transform a square matrix into triangular form, without increasing the magnitude of each element significantly (Hansen 1992). The Householder routine is numerically stable, in that it does not lose a significant amount of accuracy due to very small or very large intermediate values used in the computation.
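To make the triangularization concrete, the following is a minimal sketch of a textbook Householder solve in Python. It is not the VULCAN kernel: the function name `householder_solve`, the dense list-of-lists storage, and the sign convention for the reflector are assumptions chosen for illustration only.

```python
import math

def householder_solve(A, b):
    """Solve A x = b for square A using Householder reflectors (illustrative sketch)."""
    n = len(A)
    R = [row[:] for row in A]  # working copy, reduced to upper-triangular form
    y = b[:]                   # right-hand side, transformed alongside R
    for k in range(n - 1):
        # Norm of the part of column k on and below the diagonal.
        norm_x = math.sqrt(sum(R[i][k] ** 2 for i in range(k, n)))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Choose the sign that avoids cancellation when forming v.
        alpha = -math.copysign(norm_x, R[k][k])
        v = [0.0] * n
        for i in range(k, n):
            v[i] = R[i][k]
        v[k] -= alpha
        vtv = sum(v[i] ** 2 for i in range(k, n))
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns of R.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[i][j] for i in range(k, n)) / vtv
            for i in range(k, n):
                R[i][j] -= s * v[i]
        # Apply the same reflector to the right-hand side.
        s = 2.0 * sum(v[i] * y[i] for i in range(k, n)) / vtv
        for i in range(k, n):
            y[i] -= s * v[i]
    # Back substitution on the upper-triangular system R x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x
```

Because each reflector is orthogonal, applying it does not change the norm of any column it touches, which is one way to read the statement above that element magnitudes do not grow significantly.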

Mini-apps are designed to perform specific functions. In this work, the important features are as follows:

Accept generic input.
Validate the numerical result of the optimized routine.
Measure performance of the original and optimized routines.
Tune optimizations.

The generic input is read in from a file, where the file must contain at least one matrix A and resulting vector b. Should only one matrix and vector be supplied, the input will be duplicated for all instances of m. Validation of the optimized routine is performed by taking the difference of the output from the original and optimized routines. The mini-app will first compute the solution of the input using the original routine, and then the optimized routine. This way the output may be compared directly, and relative performance may also be measured using execution time. Should the optimized routine feature one or more parameters that may be varied, they are to be investigated such that the optimization may be tuned to the hardware. In this work, there is always at least one tunable parameter.

One feature that should have been factored into the mini-app design was modularizing the different versions of the Householder routine. In this work, two mini-apps were designed because each implements a different version of the parallel Householder routine; however, it would have been better to design a single mini-app that uses modules to include other versions of the parallel Householder kernel. With this functionality, it would be less cumbersome to work on each version of the kernel.

To parallelize the Householder routine, m is decomposed into separate, but equal chunks that are then solved by each thread (shared MPI tasks are equivalent to threads in this work for brevity). However, the original routine varies over m inside the inner-most computational loop (an optimization that benefits vectorization and caching), but the parallel loop must be the outer-most loop for best performance. Therefore, loop blocking has been invoked for the parallel sections of the code. Loop blocking is a technique commonly used to reduce the memory footprint of a computation such that it fits inside the cache for the given hardware. Therefore, the parallel Householder routine has at least one tunable parameter, block size.
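The loop restructuring and the validation step described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the multiply-accumulate kernel is a hypothetical stand-in for the Householder kernel, and the names `kernel_original`, `kernel_blocked`, `max_abs_diff`, and `workers` are invented for the example. Python threads are used only to show the loop structure; they are not expected to reproduce the speedups reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel_original(a, b):
    """Reference layout: the loop over the m instances is innermost."""
    n, m = len(a), len(a[0])
    out = [0.0] * m
    for k in range(n):          # algorithm steps
        for i in range(m):      # all m instances (vector-friendly order)
            out[i] += a[k][i] * b[k][i]
    return out

def kernel_blocked(a, b, block_size, workers=4):
    """Blocked layout: the outer loop runs over blocks of instances, one block per task."""
    n, m = len(a), len(a[0])
    out = [0.0] * m             # shared output; each task writes a disjoint slice

    def run_block(start):
        stop = min(start + block_size, m)
        for k in range(n):
            for i in range(start, stop):  # still innermost, but limited to one block
                out[i] += a[k][i] * b[k][i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a task.
        list(pool.map(run_block, range(0, m, block_size)))
    return out

def max_abs_diff(x, y):
    """Validation metric: largest element-wise difference between two outputs."""
    return max(abs(p - q) for p, q in zip(x, y))
```

A mini-app in this style would run `kernel_original` once, then sweep `block_size` over candidate values, checking `max_abs_diff` against the reference and recording execution time for each setting. In this sketch each instance accumulates its terms in the same order in both layouts, so the difference is exactly zero.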

In this work, two flavors of the shared memory model are investigated: OpenMP and SMPI. The difference between OpenMP and SMPI lies in how memory is managed. OpenMP uses a public-memory model where all data is available to all thr
