Research on Printed Chinese Character Recognition Technology: English-Language Literature.pdf

J. of Comput. Sci. & Technol., Vol.5 No.4, 1990

Feature Point Method of Chinese Character Recognition and Its Application

Zhang Xinzhong, Yan Changde and Liu Xiuying
Chinese Information Processing & Research Center, Beijing Information Technology Institute

Received December 3, 1988; revised March 27, 1989.

Abstract

A new method for recognizing Chinese characters is proposed. It is based on the so-called feature points of Chinese characters. The feature points we use include those on the strokes of a character, i.e., end points, turning points, fork points and cross points, and the key points on the background of the character. This method differs from previous ones in that it combines the feature points on the strokes with those on the background, and it uses feature points to recognize Chinese characters directly. A Chinese character recognition system based on top-down dynamic matching of feature points has been developed. The system can recognize not only 6763 printed sample Song font Chinese characters of size 5.62 with a high recognition rate, but also general printed books, magazines and documents with a satisfactory recognition rate and speed.

1. Introduction

With the development of Chinese information processing techniques, the contradiction between manual input of Chinese information and the automatic processing and output of Chinese information becomes sharper day by day. In fact, Chinese information input has become the bottleneck of the whole processing system. The contradiction can be resolved by Chinese character recognition techniques based on the principles of pattern recognition and artificial intelligence.

Recognition of printed Chinese characters has been studied extensively [1-6], and several experimental systems have been completed in recent years. With the development of Chinese information libraries and office automation, we are in the period of developing a practical recognition system for printed Chinese characters, a system that can recognize 3000-7000 printed Chinese characters with high performance. The recognition rate is not required to be very high, but great attention must be paid to practicality. In other words, realized on microcomputers with a little extra hardware, the system should recognize the commonly used No.5 Song font Chinese characters with enough tolerance to disturbance, and should connect easily to a Chinese information processing system.

The statistical and structural methods used in Chinese character recognition have different properties (see Fig.1). The statistical method is suitable for recognizing printed Chinese characters, because the deformation of printed Chinese characters is very small. If we combine it with the structural method to extract features of high information density according to the structural properties of Chinese characters, we can not only reduce the memory needed and run the recognition system on microcomputers, but also increase its suitability for multi-font printed characters, or even use it to recognize handprinted characters.

According to the principles above, a new method for recognizing Chinese characters, based on the so-called feature points of Chinese characters, is proposed. This method builds on our research on limited handprinted Chinese character recognition [7].

Fig.1. Properties of statistical and structural method.

2. Feature Points of Chinese Character

The kernel of Chinese character recognition is feature selection. The principles of feature selection are as follows.

a. The feature should reflect the essential properties of Chinese character structure, that is, the feature should be independent of changes in character font, stroke width, position and even writing order.
b. The feature should be simple and need little memory.
c. The feature should be easy to extract and learn.
d. Different characters should have different features.

A Chinese character is a kind of straight-line character, consisting basically of straight-line strokes. Most of the information in a binarized Chinese character matrix is concentrated on the skeleton of the character. Furthermore, the skeleton information of a character is concentrated on some feature points, i.e., the stroke feature points (see Fig.2). Once the stroke feature points are determined, the strokes and structure of the Chinese character can be decided according to some connecting rules.

Fig.2. Chinese character skeleton and stroke feature points.

Fig.3. Chinese character feature points (legend: end point, cross point, fork point, turning point, key background point).

The background of a Chinese character also carries much information that can distinguish one character from another. So, if we select some points on the background (which are called key background points), we can distinguish each character more efficiently. In fact, it is very important to select some key background points for stroke-less characters, because the main distinctive information between a stroke-less character and the other characters lies in their background.

Definition 1. The stroke feature point set $T_s$ of a Chinese character is a set of points including end points D, turning points Z, fork points Q and cross points J: $T_s = \{D, Z, Q, J\}$.

End points are the end or start points of a stroke that do not connect with other strokes. Turning points are points on a stroke at which the direction of the stroke changes obviously. Fork points are the intersections of two strokes that lie at the end or the start of one stroke and in the middle of the other. Cross points are points where two strokes cross each other in the middle.

Definition 2. The key background feature points B are the points that can distinguish characters based on the stroke feature points $T_s$.

Definition 3. The Chinese character feature point set T consists of the stroke feature points $T_s$ and the key background feature points B: $T = \{D, Z, Q, J, B\}$.

Chinese character feature points are shown in Fig.3.

According to the research we did on limited handprinted Chinese character recognition [7,8], we think that the stroke type and number, the relative position of components, and the relative position and connecting relations of each stroke within a component are the essential features of Chinese character pattern structure. Using feature points to express Chinese character patterns inherits and develops that research. In fact, the stroke feature points reflect the essential features of a Chinese character and concentrate the main information of its structure. End and turning points determine the stroke position and shape of a Chinese character. Fork and cross points determine the connecting relation between different strokes. Key background points can distinguish stroke-like characters that cannot be distinguished by stroke feature points.

Because feature points are determined by the essential structure of a Chinese character, the feature points of printed characters of various fonts (Fangsong, Kai, Hei, etc.) or even of limited handprinted characters rarely change. In fact, fork points, cross points and key background points will not change. In principle, we can use feature points to recognize multi-font printed or even limited handprinted Chinese characters, that is, use one method to recognize both printed and handprinted Chinese characters.

The memory needed for feature points is only one to ten percent of that needed by the binarized Chinese character matrix. In other words, if we use feature points to express a Chinese character, little structure information is lost but the memory needed is reduced by a factor of ten or more. In fact, feature points are the best structural expression of the Chinese character graph. With the feature point method, the recognition rate may be increased, the memory needed may be reduced much more, and the recognition system may be run on microcomputers.

The feature points of a Chinese character reflect the structural features of the character. The non-structural information of a Chinese character (stroke width, character position, small-angle rotation, etc.) affects feature points less than it affects statistical features. So the tolerance to disturbance and the recognition rate can be increased.

The general method of using feature points to recognize Chinese characters is, first, thinning the character; second, detecting stroke feature points; third, connecting feature points to create lines, sub-strokes and strokes; and then recognizing characters according to the stroke direction, length and other features. Another method is recognizing Chinese characters according to sub-stroke direction, number and other features extracted from the character background. We combine stroke feature points with key background points to recognize Chinese characters according to the information of the feature points themselves (point type, number, position, etc.).

If $T$ is the feature expression of a Chinese character, $T_k$ is one of the feature points, $K$ is the number of feature points, $S_k$ is the type of feature point $T_k$ (end point D, turning point Z, fork point Q, cross point J or key background point B), $X_k$, $Y_k$ are the coordinates of feature point $T_k$ in the character matrix and $P_k$ is the set of other attributes of feature point $T_k$, then we have

$$T = \{T_k \mid k = 1, 2, \ldots, K\}, \qquad T_k = (S_k, X_k, Y_k, P_k). \tag{1}$$

3. Two Kinds of Match Method

Because the memory needed by feature points is small, we can use a top-down matching method. That is to say, not only can we use the general bottom-up method, which first extracts the feature points of the unknown character and then matches them with the dictionary, but we can also use the top-down method, which first stores the feature points of all the Chinese characters in the dictionary and then matches them with unknown characters dynamically. The two methods have different properties. The advantage of the bottom-up match method is that it has wide suitability for printed or even handprinted Chinese characters, but feature points cannot be extracted with high speed and accuracy. The advantage of top-down match method is that it is not necessary to extract featur
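
The feature expression of Eq. (1) and the stroke feature points of Definition 1 can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes the character has already been thinned to a one-pixel-wide skeleton stored as rows of 0/1 values, and it labels end, fork and cross points with the crossing-number heuristic, a common thinning-based technique that the paper itself does not describe. Turning points and key background points are left out because they need stroke direction and dictionary information. All names (`PointType`, `FeaturePoint`, `crossing_number`, `extract_stroke_points`, `PLUS`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PointType(Enum):
    """Feature point types S_k; the letters follow Definitions 1-3."""
    END = "D"
    TURNING = "Z"
    FORK = "Q"
    CROSS = "J"
    BACKGROUND = "B"


@dataclass(frozen=True)
class FeaturePoint:
    """One T_k = (S_k, X_k, Y_k, P_k) from Eq. (1)."""
    kind: PointType      # S_k
    x: int               # X_k, column in the character matrix
    y: int               # Y_k, row in the character matrix
    attrs: tuple = ()    # P_k, other attributes (unused in this sketch)


# 8-neighbour offsets (dx, dy) in circular order: N, NE, E, SE, S, SW, W, NW.
RING = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]

# Assumed heuristic: crossing number -> stroke feature point type.
BY_CROSSINGS = {1: PointType.END, 3: PointType.FORK, 4: PointType.CROSS}


def crossing_number(skel, x, y):
    """Count 0 -> 1 transitions while walking once around pixel (x, y)."""
    h, w = len(skel), len(skel[0])

    def pixel(dx, dy):
        px, py = x + dx, y + dy
        # Pixels outside the matrix are treated as background.
        return skel[py][px] if 0 <= px < w and 0 <= py < h else 0

    ring = [pixel(dx, dy) for dx, dy in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def extract_stroke_points(skel):
    """Return the end, fork and cross points of a skeleton, in row-major order."""
    points = []
    for y, row in enumerate(skel):
        for x, value in enumerate(row):
            if value:
                kind = BY_CROSSINGS.get(crossing_number(skel, x, y))
                if kind is not None:
                    points.append(FeaturePoint(kind, x, y))
    return points


# A plus-shaped toy skeleton: one horizontal and one vertical stroke.
PLUS = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

if __name__ == "__main__":
    for p in extract_stroke_points(PLUS):
        print(p.kind.value, p.x, p.y)
```

For the toy skeleton the sketch finds four end points at the stroke tips and one cross point at (2, 2), so the whole 5 x 5 matrix is summarized by five tuples. Real skeletons contain thinning noise and diagonal steps, so a practical detector would need additional filtering that this sketch does not attempt.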
