Design and Implementation of a Full-Text Search Engine: Foreign-Literature Translation

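Jianghan University Graduation Thesis (Design)

Foreign-Literature Translation

Original source: The Hadoop Distributed File System: Architecture and Design

Chinese translation: Hadoop Distributed File System: Architecture and Design

Name: XXXX

Student ID: 200708202137

April 8, 2013

English Original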

The Hadoop Distributed File System: Architecture and Design

Source: http://hadoop.apache.org/docs/r0.18.3/hdfs_design.html

Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is http://hadoop.apache.org/core/.

Assumptions and Goals

Hardware Failure

Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

Streaming Data Access

Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas have been traded to increase data throughput rates.

Large Data Sets

Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.

Simple Coherency Model

HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.

“Moving Computation is Cheaper than Moving Data”

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

Portability Across Heterogeneous Hardware and Software Platforms

HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.

NameNode and DataNodes

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
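The division of labor described above, with the NameNode holding metadata and the DataNodes holding block contents, can be illustrated with a small sketch. It is not the HDFS implementation: the class `NameNodeSketch`, its method names, and the DataNode names used below are hypothetical, chosen only to show that a namespace operation such as a rename touches metadata and leaves the block-to-DataNode mapping unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Hypothetical metadata-only model: no file contents are stored here."""
    files: dict = field(default_factory=dict)            # path -> list of block ids
    block_locations: dict = field(default_factory=dict)  # block id -> DataNode names
    next_block_id: int = 0

    def add_block(self, path, datanodes):
        """Record a new block for `path` and the DataNodes that hold it."""
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = list(datanodes)
        return block_id

    def rename(self, old_path, new_path):
        """Namespace operation: only the path changes, the blocks stay put."""
        self.files[new_path] = self.files.pop(old_path)

    def locate(self, path):
        """Return (block id, DataNode names) pairs a client would read from."""
        return [(b, self.block_locations[b]) for b in self.files[path]]


nn = NameNodeSketch()
nn.add_block("/logs/a", ["dn1", "dn2"])
nn.add_block("/logs/a", ["dn2", "dn3"])
nn.rename("/logs/a", "/logs/b")
print(nn.locate("/logs/b"))
```

In this model a client asks the metadata holder where the blocks are and then contacts the listed DataNodes directly for the data.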

The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.

The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.

The File System Namespace

HDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.

The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.

Data Replication

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.
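The block layout described above reduces to simple arithmetic, sketched below. The function names `block_lengths` and `stored_bytes` are hypothetical, and the sizes in the usage lines are arbitrary illustrative numbers rather than HDFS defaults.

```python
def block_lengths(file_size, block_size):
    """Split a file of `file_size` bytes into per-block lengths.

    Every block has `block_size` bytes except possibly the last one.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)
    return lengths


def stored_bytes(file_size, replication_factor):
    """Raw bytes stored across the cluster when every block is replicated."""
    return file_size * replication_factor


print(block_lengths(150, 64))  # two full blocks and one shorter last block
print(stored_bytes(150, 3))    # three copies of every byte
```

Because both the block size and the replication factor are configurable per file, each would be passed in per call in a model like this rather than fixed globally.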

The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
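A minimal sketch of how Heartbeats and Blockreports could feed replication decisions follows. It rests on assumptions the text above does not state: the class `HeartbeatMonitor`, its method names, and in particular the rule that a DataNode counts as live only if its last Heartbeat arrived within a fixed timeout are all hypothetical.

```python
class HeartbeatMonitor:
    """Hypothetical bookkeeping for Heartbeats and Blockreports."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds  # assumed liveness window
        self.last_seen = {}             # DataNode name -> time of last Heartbeat
        self.block_reports = {}         # DataNode name -> set of block ids

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def block_report(self, datanode, block_ids):
        # A Blockreport lists all blocks on the DataNode, so it replaces
        # whatever was recorded for that node before.
        self.block_reports[datanode] = set(block_ids)

    def live_nodes(self, now):
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen <= self.timeout)

    def replica_count(self, block_id, now):
        """Count replicas of `block_id` held by DataNodes considered live."""
        return sum(1 for dn in self.live_nodes(now)
                   if block_id in self.block_reports.get(dn, set()))


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.block_report("dn1", [1, 2])
monitor.block_report("dn2", [2])
monitor.heartbeat("dn1", now=40)            # dn2 has gone silent
print(monitor.live_nodes(now=50))
print(monitor.replica_count(2, now=50))
```

Comparing the count for each block against the file's replication factor is one way such a model could decide which blocks need another replica.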

Replica Placement: The First Baby Steps

The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.

Large HDFS instances run on a cluster of computers that is commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.

The NameNode determines the rack id each DataNode belongs to via the process outlined in Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster, which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.

For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
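The three-replica placement described above can be sketched as follows. This is an illustration under stated assumptions, not the HDFS implementation: the function name `choose_targets`, the `racks` dictionary (rack id to node names), and the use of a random choice among eligible nodes are all hypothetical.

```python
import random


def choose_targets(racks, local_rack, rng=None):
    """Pick three nodes: two distinct nodes in the local rack, one in another rack.

    racks      -- dict mapping rack id to a list of node names (assumed layout)
    local_rack -- rack id treated as the local rack
    rng        -- optional random.Random instance, for reproducible choices
    """
    rng = rng or random.Random()
    local_nodes = racks[local_rack]
    remote_nodes = [node
                    for rack, nodes in racks.items() if rack != local_rack
                    for node in nodes]
    if len(local_nodes) < 2 or not remote_nodes:
        raise ValueError("need two local-rack nodes and one node in another rack")
    first, second = rng.sample(local_nodes, 2)  # two different local-rack nodes
    third = rng.choice(remote_nodes)            # one node in a different rack
    return [first, second, third]


racks = {"rack1": ["a", "b", "c"], "rack2": ["d", "e"], "rack3": ["f"]}
print(choose_targets(racks, "rack1", random.Random(0)))
```

Only the third replica leaves the local rack, which is how the policy cuts inter-rack write traffic, while the block still occupies two racks and so survives the loss of either one.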

The current, default replica placement policy described here is a work in progress.

Replica Selection

To minimize global bandwidt
