Vol. 5 No. 4    J. of Comput. Sci. & Technol.    1990

Feature Point Method of Chinese Character Recognition and Its Application

Zhang Xinzhong, Yan Changde and Liu Xiuying
Chinese Information Processing & Research Center, Beijing Information Technology Institute
Received December 3, 1988; revised March 27, 1989.

Abstract
A new method for recognizing Chinese characters is proposed. It is based on the so-called feature points of Chinese characters. The feature points we use include those on the strokes of a character, i.e., end points, turning points, fork points and cross points, and the key points on the background of the character. This method differs from previous ones in that it combines the feature points on strokes with those on the background, and it uses feature points to recognize Chinese characters directly. A Chinese character recognition system based on top-down dynamic matching of feature points has been developed. The system can recognize not only 6763 printed sample Song font Chinese characters of size 5.62 with a high recognition rate, but also general printed books, magazines and documents with a satisfactory recognition rate and speed.

1. Introduction
With the development of Chinese information processing techniques, the contradiction between the manual input of Chinese information and its automatic processing and output becomes sharper day by day. In fact, Chinese information input has become the bottleneck of the whole processing system. This contradiction can be resolved with Chinese character recognition techniques based on the principles of pattern recognition and artificial intelligence.

Recognition of printed Chinese characters has been studied extensively [1-6], and several experimental systems have been completed in recent years. With the development of Chinese information libraries and office automation, we are now in the period of developing a practical recognition system for printed Chinese characters, i.e., a system that can recognize 3000-7000 printed Chinese characters with high performance. The recognition rate need not be extremely high, but great attention must be paid to practicality. In other words, the system should run on microcomputers with little additional hardware, recognize the commonly used No. 5 Song font Chinese characters with sufficient tolerance of noise and distortion, and connect easily to Chinese information processing systems.

The statistical and the structural methods used in Chinese character recognition have different properties (see Fig. 1). The statistical method is suitable for recognizing printed Chinese characters, because the deformation of printed Chinese characters is very small. If we combine it with the structural method, extracting high-information-density features according to the structural properties of Chinese characters, we can not only reduce the memory needed and run the recognition system on microcomputers, but also improve its suitability for multi-font printed characters, or even use it to recognize handprinted characters. Following these principles, a new method for recognizing Chinese characters based on the so-called feature points of Chinese characters is proposed. This method builds on our research on limited handprinted Chinese character recognition [7].

[Fig. 1. Properties of statistical and structural methods.]

2. Feature Points of Chinese Character
The kernel of Chinese character recognition is feature selection. The principles of feature selection are as follows.
a. The features should reflect the essential properties of Chinese character structure; that is, they should be independent of changes in character font, stroke width, position and even writing order.
b. The features should be simple and need little memory.
c. The features should be easy to extract and learn.
d. Different characters should have different features.

Chinese characters are a kind of straight-line character, consisting basically of straight-line strokes. Most of the information in a binarized Chinese character matrix is concentrated on the skeleton of the character. Furthermore, the skeleton information of a character is concentrated on a few feature points, i.e., the stroke feature points (see Fig. 2). Once the stroke feature points are determined, the strokes and structure of the Chinese character can be determined according to certain connecting rules.

[Fig. 2. Chinese character skeleton and stroke feature points. Panels: skeleton; stroke feature points.]

[Fig. 3. Chinese character feature points. Legend: end point, cross point, fork point, turning point, key background point.]

The background of a Chinese character also carries much information that can distinguish one character from another. So, if we select some points on the background (called key background points), we can distinguish characters more efficiently. In fact, selecting key background points is very important for characters with few strokes, because the main information distinguishing such a character from other characters lies in its background.

Definition 1. The stroke feature point set $T_s$ of a Chinese character is the set of points consisting of end points D, turning points Z, fork points Q and cross points J:
$$T_s = \{D, Z, Q, J\}.$$
End points are the end or start points of strokes that do not connect with other strokes. Turning points are points on a stroke at which the direction of the stroke changes noticeably. Fork points are crossing points of two strokes that lie at the end or start of one stroke and in the middle of the other. Cross points are points where two strokes cross each other in their middles.

Definition 2. The key background feature points B are the points on the background that can further distinguish characters on top of the stroke feature points $T_s$.

Definition 3. The Chinese character feature point set T consists of the stroke feature points $T_s$ and the key background feature points B:
$$T = \{D, Z, Q, J, B\}.$$
Chinese character feature points are shown in Fig. 3.

According to our research on limited handprinted Chinese character recognition [7,8], we consider the stroke types and number, the relative positions of components, and the relative positions and connecting relations of the strokes in each component to be the essential features of Chinese character pattern structure. Using feature points to express Chinese character patterns inherits and develops that research. In fact, the stroke feature points reflect the essential features of a Chinese character and concentrate the main information of its structure. End points and turning points determine the position and shape of the strokes of a Chinese character; fork points and cross points determine the connecting relations between different strokes. Key background points can distinguish characters with similar strokes that cannot be distinguished by stroke feature points alone.

Because feature points are determined by the essential structure of a Chinese character, the feature points of printed characters in various fonts (Fangsong, Kai, Hei, etc.), or even of limited handprinted characters, rarely change. In fact, fork points, cross points and key background points do not change. In principle, therefore, feature points can be used to recognize multi-font printed or even limited handprinted Chinese characters; that is, one method can recognize both printed and handprinted Chinese characters.

The memory needed for feature points is only one to ten percent of that needed by the binarized Chinese character matrix. In other words, if we use feature points to express Chinese characters, little structural information is lost while the memory needed is reduced at least tenfold. In fact, feature points are the best structural expression of a Chinese character graph. With the feature point method, the recognition rate may be increased, the memory needed may be reduced considerably, and the recognition system may be run on microcomputers.

The feature points of a Chinese character reflect its structural features. Non-structural information (stroke width, character position, small-angle rotation, etc.) affects feature points less than it affects statistical features. Hence both the noise tolerance and the recognition rate can be increased.

The general method of using feature points to recognize Chinese characters is: first, thin the character; second, detect the stroke feature points; third, connect the feature points to create lines, sub-strokes and strokes; and finally, recognize the character according to stroke direction, length and other features. Another method recognizes Chinese characters according to sub-stroke direction, number and other features extracted from the character background. We combine stroke feature points with key background points and recognize Chinese characters according to the information carried by the feature points themselves (point type, number, position, etc.).

If T is the feature expression of a Chinese character, $T_k$ is one of its feature points, K is the number of feature points, $S_k$ is the type of feature point $T_k$ (end point D, turning point Z, fork point Q, cross point J or key background point B), $x_k$ and $y_k$ are the coordinates of $T_k$ in the character matrix, and $P_k$ is the set of other attributes of $T_k$, then we have
$$T = \{T_k\}, \quad k = 1, 2, \ldots, K, \qquad T_k = (S_k, x_k, y_k, P_k). \tag{1}$$

3. Two Kinds of Match Method
Because feature points need little memory, we can use a top-down matching method. That is, not only can we use the general bottom-up method, which first extracts the feature points of an unknown character and then matches them against the dictionary, but we can also use a top-down method, which first stores the feature points of all Chinese characters in the dictionary and then matches them dynamically against the unknown character.

Different methods have different properties. The advantage of the bottom-up match method is that it is widely applicable to printed or even handprinted Chinese characters; its drawback is that feature points cannot be extracted at high speed with high accuracy. The advantage of the top-down match method is that it is not necessary to extract featur