Vol. 5 No. 4 J. of Comput. Sci. & Technol. 1990

Feature Point Method of Chinese Character Recognition and Its Application

Zhang Xinzhong, Yan Changde and Liu Xiuying
Chinese Information Processing & Research Center, Beijing Information Technology Institute
Received December 3, 1988; revised March 27, 1989.

Abstract
A new method for recognizing Chinese characters is proposed. It is based on the so-called feature points of Chinese characters. The feature points we use include those on the strokes of a character, i.e., end points, turning points, fork points and cross points, and the key points on the background of the character. This method differs from previous ones in that it combines the feature points on strokes with those on the background, and it uses the feature points to recognize Chinese characters directly. A Chinese character recognition system based on top-down dynamic matching of feature points has been developed. The system can recognize not only the 6763 printed sample Song-font Chinese characters of size 5.6 2 with a high recognition rate, but also general printed books, magazines and documents with satisfactory recognition rate and speed.

1. Introduction
With the development of Chinese information processing technology, the contradiction between manual input of Chinese information and its automatic processing and output has become sharper day by day. In fact, Chinese information input has become the bottleneck of the whole processing system. This contradiction can be resolved by Chinese character recognition techniques based on the principles of pattern recognition and artificial intelligence. Recognition of printed Chinese characters has been studied extensively [1-6], and several experimental systems have been completed in recent years. With the development of Chinese information libraries and office automation, we are now in the period of developing a practical recognition system for printed Chinese characters, that is, a system that can recognize 3000-7000 printed Chinese characters with high performance. The recognition rate need not be extremely high, but great attention must be paid to practicality. In other words, the system should run on microcomputers with little additional hardware, recognize the commonly used No. 5 Song-font Chinese characters with sufficient disturbance-absorbing ability, and be easy to connect to a Chinese information processing system.

The statistical and structural methods used in Chinese character recognition have different properties (see Fig. 1). The statistical method is suitable for recognizing printed Chinese characters, because the deformation of printed Chinese characters is very small. If we combine it with the structural method to extract high-information-density features
for recognition according to the structural properties of Chinese characters, we can not only reduce the memory required and run the recognition system on microcomputers, but also improve its suitability for multi-font printed characters, or even use it to recognize handprinted characters. Following the principles above, we propose a new method for recognizing Chinese characters based on their so-called feature points. This method builds on our research on limited handprinted Chinese character recognition [7].
Fig. 1. Properties of statistical and structural methods.

2. Feature Points of Chinese Characters
The kernel of Chinese character recognition is feature selection. The principles of feature selection are as follows.
a. The features should reflect the essential properties of Chinese character structure; that is, they should be independent of changes in character font, stroke width, position and even writing order.
b. The features should be simple and need little memory.
c. The features should be easy to extract and learn.
d. Different characters should have different features.

Chinese characters are basically straight-line characters, consisting essentially of straight-line strokes. Most of the information in a binarized Chinese character matrix is concentrated on the skeleton of the character. Furthermore, the skeleton information of a character is concentrated on a few feature points, namely the stroke feature points (see Fig. 2). Once the stroke feature points are determined, the strokes and structure of the Chinese character can be decided according to certain connecting rules.
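The text does not say how stroke feature points are located on the thinned skeleton. As an illustration only, the sketch below uses the crossing-number test, a common skeleton-analysis technique that is an assumption here and not necessarily the detector used by the authors. For each skeleton pixel, it counts the 0/1 changes around the pixel's 8-neighborhood; half that count is 1 at an end point, 2 on an ordinary stroke pixel, 3 at a fork point and 4 at a cross point. Turning points (which need stroke-direction tracking) and key background points are not handled. The names `crossing_number`, `stroke_feature_points`, `NEIGHBORS` and `POINT_TYPE` are hypothetical.

```python
from typing import List, Tuple

# The 8 neighbors of a pixel in circular (clockwise) order, starting at north.
# Offsets are (row, column).
NEIGHBORS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

# Hypothetical mapping from crossing number to the paper's point-type letters.
# A crossing number of 2 is an ordinary stroke pixel; turning points (Z) are not detected.
POINT_TYPE = {1: "D", 3: "Q", 4: "J"}  # end, fork, cross


def crossing_number(img: List[List[int]], r: int, c: int) -> int:
    """Half the number of 0/1 changes around the 8-neighborhood of pixel (r, c)."""
    ring = [img[r + dr][c + dc] for dr, dc in NEIGHBORS]
    return sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8)) // 2


def stroke_feature_points(img: List[List[int]]) -> List[Tuple[str, int, int]]:
    """Return (type, x, y) for each end, fork or cross point of a 1-pixel-wide skeleton.

    Assumes img is a 0/1 matrix whose outermost rows and columns are all 0,
    so every scanned pixel has a full 8-neighborhood. x is the column, y the row.
    """
    points = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            if img[r][c] == 1:
                cn = crossing_number(img, r, c)
                if cn in POINT_TYPE:
                    points.append((POINT_TYPE[cn], c, r))
    return points


if __name__ == "__main__":
    # A 7x7 "+" skeleton: a horizontal and a vertical stroke crossing at row 3, column 3.
    plus = [[0] * 7 for _ in range(7)]
    for i in range(1, 6):
        plus[3][i] = 1
        plus[i][3] = 1
    print(stroke_feature_points(plus))
    # → [('D', 3, 1), ('D', 1, 3), ('J', 3, 3), ('D', 5, 3), ('D', 3, 5)]
```

On a real thinned character, pixels adjacent to a junction can also yield crossing numbers of 3 or more, so a practical detector would likely need to merge nearby candidates; the paper does not say how its system handles such cases.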
Fig. 2. Chinese character skeleton and stroke feature points.
Fig. 3. Chinese character feature points (legend: end point, turning point, fork point, cross point, key background point).

The background of a Chinese character also carries much information that can distinguish one character from another. So if we select some points on the background (called key background points), we can distinguish characters more efficiently. In fact, selecting key background points is very important for characters with few strokes, because the main distinguishing information between such characters and other characters lies in their background.

Definition 1. The stroke feature point set $T_s$ of a Chinese character is the set of points comprising end points $D$, turning points $Z$, fork points $Q$ and cross points $J$: $T_s = \{D, Z, Q, J\}$.
End points are the start or end points of a stroke that do not connect with other strokes.
Turning points are points on a stroke at which the stroke direction changes markedly.
Fork points are junctions of two strokes that lie at the start or end of one stroke and in the middle of the other.
Cross points are points where two strokes cross each other in their middles.

Definition 2. The key background feature points $B$ are the points that can further distinguish characters beyond the stroke feature points $T_s$.

Definition 3. The Chinese character feature point set $T$ consists of the stroke feature points $T_s$ and the key background feature points $B$: $T = \{D, Z, Q, J, B\}$. The Chinese character feature points are shown in Fig. 3.

According to our research on limited handprinted Chinese character recognition [7, 8], the stroke types and number of strokes, the relative positions of components, and the relative positions and connecting relations of the strokes within each component are the essential features of Chinese character pattern structure. It is as a continuation and development of
that research that we use feature points to represent Chinese character patterns. In fact, the stroke feature points of a Chinese character reflect its essential features and concentrate the main information about its structure. End points and turning points determine the position and shape of the strokes. Fork points and cross points determine the connecting relations between different strokes. Key background points can distinguish characters with similar strokes that cannot be distinguished by stroke feature points alone. Because feature points are determined by the essential structure of a Chinese character, the feature points of printed characters in various fonts (Fangsong, Kai, Hei, etc.) or even of limited handprinted characters rarely change; in particular, fork points, cross points and key background points do not change. In principle, we can use feature points to recognize multi-font printed or
even limited handprinted Chinese characters, that is, use one method to recognize both printed and handprinted Chinese characters. The memory needed for feature points is only 1% to 10% of that needed for the binarized Chinese character matrix. In other words, if we use feature points to represent Chinese characters, little structural information is lost, while the memory needed is reduced by a factor of ten or more. In fact, feature points are the best structural representation of the Chinese character graph. With the feature point method, the recognition rate may be increased, the memory needed may be reduced much further, and the recognition system may be run on microcomputers.

The feature points of a Chinese character reflect its structural features. Non-structural information (stroke width, character position, small-angle rotation, etc.) affects feature points less than it affects statistical features, so the disturbance-absorbing ability and the recognition rate can be increased.

The general method of recognizing Chinese characters with feature points is: first, thin the character; second, detect the stroke feature points; third, connect the feature points to form lines, sub-strokes and strokes; and then recognize the character according to stroke direction, length and other features. Another method recognizes Chinese characters according to sub-stroke direction, number and other features extracted from the character background. We combine stroke feature points with key background points and recognize Chinese characters according to the information carried by the feature points themselves (point type, number, position, etc.).

Let $T$ be the feature representation of a Chinese character, $T_k$ one of its feature points, $K$ the number of feature points, $S_k$ the type of feature point $T_k$ (end point $D$, turning point $Z$, fork point $Q$, cross point $J$ or key background point $B$), $(x_k, y_k)$ the coordinates of $T_k$ in the character matrix, and $P_k$ the set of other attributes of $T_k$. Then

$$T = \{T_k\},\ k = 1, 2, \ldots, K, \qquad T_k = (S_k, x_k, y_k, P_k) \tag{1}$$

3. Two Kinds of Matching Method
Because feature points need little memory, we can use a top-down matching method. That is, not only can we use the general bottom-up method, which first extracts the feature points of the unknown character and then matches them against the dictionary, but we can also use the top-down method to store all the Chinese character feature points
in the dictionary first and then match them against unknown characters dynamically. Different methods have different properties. The advantage of the bottom-up matching method is its wide applicability to printed and even handprinted Chinese characters, but feature points cannot be extracted with high speed and accuracy. The advantage of the top-down matching method is that it is not necessary to extract featur