Pursuant to 35 U.S.C. § 119(a), this application claims the benefit of earlier filing date and right of priority to Korean Patent Application Nos. 10-2004-0069320 and 10-2004-0069843, filed on Aug. 31, 2004 and Sep. 2, 2004, respectively, the contents of which are hereby incorporated by reference herein in their entirety.
1. Field of the Invention
The present invention relates to a method and apparatus for recognizing characters on a document image captured by a camera and saving the recognized characters. Particularly, the present invention relates to a method and apparatus for recognizing characters on a name card image captured by a mobile camera phone with an internal or external camera and automatically saving the recognized characters in corresponding fields of a predetermined form such as a telephone directory database.
2. Description of the Related Art
An optical character recognition (OCR) system or a scanner-based character recognition system has been widely used to recognize characters on a document image. However, since these are dedicated systems for recognizing characters on a document image, massive applications and hardware resources are required to process and recognize the document image. Therefore, it is difficult to simply apply the character recognition method used in the OCR system or the scanner-based recognition system to a device having limited processing power and memory. A mobile camera phone may be designed to recognize the characters. That is, the camera phone is used to take a picture of a small name card, recognize the characters on the captured image, and automatically save the recognized characters in a phone number database. However, since the mobile camera phone has a limited processor and memory, it is difficult to accurately process the image and recognize the characters on the image.
Describing a method for recognizing a name card using the mobile camera phone in more detail, a name card image is first captured by a camera of the mobile camera phone and the characters on the captured name card image are recognized by fields using a character recognition algorithm. The recognized characters are displayed by fields such as a name, a telephone number, an e-mail address, and the like. Then, the characters displayed by fields are corrected and edited. The corrected and edited characters are saved in a predetermined form of a phone number database.
However, when the focus of the name card image is not accurately adjusted or the name card image is not correctly positioned, the recognition rate is lowered. Particularly, when the camera is not provided with an automatic focusing function, the focus adjustment and the correct disposition of the name card image must be determined by the user's eyes. This makes it difficult to capture a clear name card image that allows for correct recognition.
Generally, when a user receives name cards from customers, friends and the like, the user opens a phone number editor of his/her mobile phone and inputs the information on the name card by himself/herself using a keypad of the mobile phone. This is troublesome for the user. Therefore, a mobile camera phone having a character recognizing function has been developed to take a picture of the name card and automatically save the information on the name card in the phone number database. That is, a document/name card image is captured by an internal or external camera of a mobile camera phone and characters on the captured image are recognized according to a character recognition algorithm. The recognized characters are automatically saved in the phone number database.
However, when a relatively large number of characters exist on an image captured by the camera or scanner, a relatively long processing time is required even when the recognition process is optimized, since the mobile phone has limited processing and memory resources. Furthermore, when the characters are composed in a variety of languages, the recognition rate may deteriorate as compared with when they are composed in a single language.
A mobile phone includes a control unit 5, a keypad 1, a display unit 3, a memory unit 9, an audio converting unit 7c, a camera module unit 7b, and a radio circuit unit 7a.
The control unit 5 processes data of a document (name card) image read by the camera module unit 7b, outputs the processed data to the display unit 3, processes editing commands of the displayed data, which are inputted by a user, and saves the data edited by the user in the memory unit 9. The keypad 1 functions as a user interface for selecting and manipulating the functions of the mobile phone. The display unit 3 displays a variety of menu screens, a run screen and a result screen. The display unit 3 further displays interface screens such as a document image data screen, a data editing screen and an edited data storage screen so that the user can edit the data and save the edited data. The memory unit 9 is generally comprised of a flash memory, a random access memory, and a read-only memory. The memory unit 9 saves a real-time operating system and software for operating the mobile phone, together with information on parameters and states of the software and the operating system, and performs the data input/output in accordance with commands of the control unit 5. Particularly, the memory unit 9 saves a phone number database in which the information corresponding to the recognized characters is stored through a mapping process.
The audio converting unit 7c processes a voice signal inputted through a microphone by a user and transmits the processed signal to the control unit 5 or outputs the processed signal through a speaker. The camera module unit 7b processes the data of the name card image captured by the camera and transmits the processed data to the control unit 5. The camera, which is a digital camera, may be internal or external to the mobile phone. The radio circuit unit 7a functions to connect to a mobile communication network and processes signal transmission and reception.
A prior name card recognition engine includes a still image capture block 11, a character-line recognition block 12, and application software 13 for a name card recognition editor.
The still image capture block 11 converts the image captured by a digital camera 10 into a still image. The character-line recognition block 12 recognizes the characters on the still image, converts the recognized characters into a character line, and transmits the character line to the application software. The application software 13 performs the name card recognition according to a flowchart depicted in
A photographing menu is first selected using a keypad 1 (S31) and the name card image photographed by the camera is displayed on the display unit (S32). A name card recognition menu for reading the name card is then selected (S33). Since the recognized data is not accurate in an initial step, the data cannot be directly transmitted to the database (a personal information managing database such as a phone number database) saved in the memory unit. Therefore, the name card recognition engine recognizes the name card, converts the same into the character line, and transmits the character line to the application software. The application software supports the mapping function so that the character line matches an input form saved in the database.
The recognized name card data and the editing screen are displayed on the display unit so that the user can edit the name card data and perform the mapping process (S34 and S35). The user corrects or deletes the characters when there is an error in the character line. Then, the user selects a character line that he/she wishes to save and saves the selected character line. That is, when the mapping process is completed, the user selects a menu "save in a personal information box" to save the recognized character information of the photographed name card image in the memory unit (S36).
In order to improve the recognition rate of the mobile phone, clear, correctly positioned document image data (photographed name card image data) must be provided to the input device of the character recognition system.
The clearness of the document image closely relates to the focus. The focus strongly affects the separation of the characters from the background and the recognition of the separated characters. The twist of the image also affects the accurate character recognition, as the characters are twisted when the overall image is twisted. Although a high performance camera or a camcorder has an automatic focusing function, when a camera without the automatic focusing function is associated with a mobile phone, the focusing and twist states of the image captured by the camera must be identified by the naked eyes of the user. This causes the character recognition rate to be lowered.
Accordingly, the present invention is directed to a document image processing method and apparatus, which substantially obviate one or more problems due to limitations and disadvantages of the related art.
It is an object of the present invention to provide a method and apparatus for processing a document image, which can detect the focusing and/or twist states of the document image captured by a camera and provide the detected results to a user through a pre-view screen, thereby allowing a clear, correct document image to be obtained.
It is another object of the present invention to provide a method and apparatus for processing a document image, which can obtain a clear, correct document image by displaying the focusing and twist states of the document image captured by a camera through a pre-view screen before the characters of the document image are recognized.
It is still another object of the present invention to provide a method and apparatus for processing a document image, which can obtain a clear, correct document image even using a mobile phone camera that has no automatic focusing function.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, there is provided a document image processing apparatus, comprising: an image capturing unit for capturing an image of a document; a detecting unit for detecting focusing and twisting states of the captured image; a display unit for displaying the detected focusing and twisting states; a character recognition unit for recognizing characters written on the captured image; and a storing unit for storing the recognized characters by fields.
The focusing and twisting states are displayed on a pre-view screen so as to let a user adjust the focusing and twist of the image.
According to another aspect of the present invention, there is provided a mobile phone with a name card recognition function, comprising: a detecting unit for detecting focusing and twisting states of a name card image captured by a camera; a display unit for displaying the focusing and twisting states of the name card image; a character recognition unit for recognizing characters written on the name card image; and a storing unit for storing the recognized characters in a personal information-managing database by fields.
The focusing and twisting states of the name card are detected by extracting an interesting area from the name card image, calculating a twisting level from a brightness component obtained from the interesting area, and calculating a focusing level by extracting a high frequency component from the brightness component.
According to another aspect of the present invention, there is provided a document image processing method of a mobile phone, comprising: capturing an image of a document using a camera; detecting focusing and/or twisting states of the captured image; displaying the detected focusing and twisting states; and guiding a user to finally capture the document image based on the displayed focusing and/or twist states.
According to still another aspect of the present invention, there is provided a name card image processing method of a mobile phone, comprising: capturing a name card image; detecting focusing and/or twisting states of the captured name card image; displaying the detected focusing and twisting states; guiding a user to finally capture the name card image based on the displayed focusing and/or twist states; recognizing characters written on the captured image; and storing the recognized characters by fields.
It is to be understood that both the foregoing general description and the following detailed description of the present invention are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principle of the invention. In the drawings:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
As shown in
The operation of the name card recognition apparatus will be described hereinafter.
The name card image captured by the camera and camera sensor 100 and 110 is pre-processed by the photographing support unit 200. The photographing support unit 200 displays the focusing and leveling states of the name card image through a pre-view screen so that the user can identify whether the name card image is clear. The better the focusing and leveling, the higher the recognition rate of the image. Therefore, it is important to adjust the focusing of the image when the image is photographed. In the present invention, the photographing support unit displays the focusing and leveling states of the name card image to let the user know whether the camera 100 is in a state where it can accurately recognize the characters on the name card image.
Generally, it is assumed that the user takes a picture of the image within a twist angle range of −20 to +20 degrees, provided that the image is not upside down. In this case, by letting the user know the twist of the image through the pre-view screen, it becomes possible to adjust the image to a twist angle close to 0 degrees. This will be described in more detail later.
The recognition field selection unit 300 allows the user to select the fields from the clear image, so that the recognition process is performed only for the selected fields. That is, the recognition engine unit 400 performs the recognition process only for the fields selected by the user. The fields recognized by the recognition engine unit 400 are stored in corresponding selected fields such as a name field, a telephone number field, a facsimile number field, a mobile phone number field, an e-mail address field, a company name field, a title field, an address field, and the like by the recognition result editing unit 500. Among the fields, only the six major fields, i.e., the name field, the telephone number field, the facsimile number field, the mobile phone number field, the e-mail address field, and the memo field, are displayed. The remaining fields are displayed in an additional memo field.
The recognition result editing unit 500 stores the recognition results in the data storing unit 600 in a database format and allows for data search, data editing, SMS data transmission, phone calls, and group designation. The recognition result editing unit 500 also determines whether an additional photographing of the name card is required. When the additional photographing is performed, the current image data is stored in a temporary buffer.
As shown in
As shown in
As shown in
The fields are selected by the user and the recognition results for the selected fields are illustrated in
As shown in
A sensor 103 formed of a charge coupled device or a complementary metal oxide semiconductor may be provided between the image capturing unit 100 and the camera lens 101.
Using the camera lens 101, the sensor 103 and the camera control unit 104 of the image capturing unit 100, the characters written on the name card are photographed. At this point, the detecting unit of the image processing unit detects whether the focusing and leveling states of the photographed image are in a state where the characters written on the name card can be accurately recognized.
When it is determined that the focusing is not accurately adjusted, the location of the mobile phone is changed until a signal indicating the accurate focusing adjustment is generated. The leveling is adjusted in the same manner.
As shown in
A brightness signal of the captured name card image may be used to detect the focusing and/or leveling states of the desired fields. That is, the detecting unit receives only the brightness components of the image inputted from the image capturing unit. The size of the image inputted from the image capturing unit is less than QVGA (320×240). Typically, the size is QCIF (176×144), so that all frames of a 15 fps image can be processed in real time, thereby displaying the focusing and leveling values on the display unit (S504).
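For illustration, the following is a minimal sketch of this pre-processing step in Python. The BT.601 luma weights are an assumption, since the text states only that the brightness component of a frame no larger than QVGA, typically QCIF, is used.

```python
import numpy as np

QCIF_W, QCIF_H = 176, 144  # frame size small enough to process 15 fps in real time

def brightness_component(rgb_frame: np.ndarray) -> np.ndarray:
    """Return only the brightness (Y) component of an (H, W, 3) RGB frame.

    The BT.601 luma weights below are an assumption; the source only says
    the detecting unit receives the brightness components of the frame.
    """
    r = rgb_frame[..., 0].astype(np.float32)
    g = rgb_frame[..., 1].astype(np.float32)
    b = rgb_frame[..., 2].astype(np.float32)
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
```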
As shown in
That is, the size of the local area can be 10 pixels × 1 pixel, and the brightness can be quantized to reduce the amount of calculation of the histogram. In the present invention, the description is based on 8 brightness steps.
Histogram_Y[Y(i, j+k)/32]++ (Equation 1)
Here, Y(i,j) is the brightness value at the location (i,j), and k has values from 0 to 9. In addition, i indicates the longitudinal coordinate and j indicates the vertical coordinate.
The overall image is binary-coded from the histogram information calculated for each local area (S602). In this binary-coding process, a difference between the maximum value max{Histogram_Y[k]} and the minimum value min{Histogram_Y[k]} of the 10-pixel histogram Histogram_Y[k] is calculated. When the difference is greater than a critical value T1, the local area is regarded as an interesting area and a value "1" is inputted into Y(i,j). When the difference is not greater than the critical value T1, the local area is regarded as an uninteresting area and a value "0" is inputted into Y(i,j). In the present invention, although the critical value T1 is set as "4," other proper values can be used within the scope of the present invention.
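The histogram-based binary coding of steps S601 and S602 can be sketched as follows. The max-minus-min histogram statistic follows the text as written, and treating array rows as the vertical coordinate j (columns as the longitudinal coordinate i) is an assumption.

```python
import numpy as np

T1 = 4    # critical value used in the text; other proper values are possible
WIN = 10  # 10(pixel) x 1(pixel) local area; 256 / 32 = 8 brightness steps

def binarize_interesting_areas(y: np.ndarray) -> np.ndarray:
    """Binary-code a brightness image: 1 marks an interesting local area.

    An 8-bin histogram is built over each 10-pixel window (Equation 1);
    the window is interesting when max{Histogram_Y} - min{Histogram_Y}
    exceeds T1, following the text as written.
    """
    h, w = y.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(w):
        for j in range(h - WIN + 1):
            hist = np.zeros(8, dtype=np.int32)
            for k in range(WIN):
                hist[y[j + k, i] // 32] += 1  # Equation 1
            if hist.max() - hist.min() > T1:
                out[j, i] = 1
    return out
```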
After the overall image is binary-coded, the binary-coded image is projected in a longitudinal direction and the interesting area is separated in a vertical direction from the image data projected in the longitudinal direction (S603 and S604).
In the process for projecting the binary-coded image in the longitudinal direction, the result value of projecting the mth line in the longitudinal direction is stored in Vert[m], which can be expressed by the following equation 2.

Vert[m] = ΣY(i, m), summed over all i (Equation 2)
When the value obtained by subtracting 20 pixels from the Vert[m] value is less than 0, Vert[m] is set to "0." When Vert[m−1] is identical to Vert[m+1], Vert[m] is set to "0" unless the run of non-zero values in the longitudinal direction is above 2 pixels. When the interesting area is separated as described above, the sum total and mean values of the widths in the vertical direction of the interesting area are calculated (S605).
In the process for separating the interesting area in the vertical direction, blanks are found and used as boundaries between the divided areas while scanning the projected values in the vertical direction. That is, when it is assumed that the starting and ending points of the interesting areas in the vertical direction are stored in ROI[m] in order, the process can be described as follows.
First, the values Vert[0] to Vert[143] are scanned in order. An area having a Vert[m] value that is not "0" is recognized as an interesting area. When a run of non-zero Vert[m] values starts, the location value m is stored in the even-numbered locations starting from ROI[0], and when the run ends, the location value m is stored in the odd-numbered locations starting from ROI[1]. Then, the size of the interesting area is determined according to the sum total and mean values of the widths in the vertical direction (S606).
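A sketch of the projection and separation of steps S603 and S604 follows. The 20-pixel floor and the 2-line minimum run width follow one reading of the noise-suppression step described above and should be treated as assumptions.

```python
import numpy as np

def separate_interesting_areas(binary: np.ndarray) -> list:
    """Project the binary-coded image in the longitudinal direction
    (Equation 2) and separate interesting areas in the vertical direction.

    Returns (start, end) pairs, i.e. the ROI[0], ROI[1], ... values of
    the text grouped two by two. Rows are taken as the vertical
    coordinate m, so Vert[m] is the sum of row m.
    """
    vert = binary.astype(int).sum(axis=1)  # Equation 2: Vert[m]
    vert = np.where(vert >= 20, vert, 0)   # drop lines below the 20-pixel floor

    rois = []
    start = None
    for m, v in enumerate(vert):
        if v != 0 and start is None:
            start = m                      # run begins -> ROI[2n]
        elif v == 0 and start is not None:
            if m - start > 2:              # keep only runs above 2 lines wide
                rois.append((start, m - 1))  # run ends -> ROI[2n+1]
            start = None
    if start is not None and len(vert) - start > 2:
        rois.append((start, len(vert) - 1))
    return rois
```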
In the process for calculating the sum total and mean values of the widths in the vertical direction, the sum total value is first calculated by adding the widths of the areas divided by the borders, and the mean value is calculated by dividing the sum total value by the number of the areas. That is, the sum total value ROI_SUM and the mean value ROI_Mean can be expressed by the following equations 3 and 4.

ROI_SUM = Σ(ROI[2n+1] − ROI[2n]), summed over all areas n (Equation 3)

ROI_Mean = ROI_SUM / N, where N is the number of the areas (Equation 4)
In the process for determining the size of the interesting area according to the sum total and mean values of the widths in the vertical direction, a critical value by which the interesting area is classified into large and small areas is compared with the sum total value in the vertical direction.
In the equations 3 and 4, ROI_SUM is the value used by the focus detecting unit and ROI_Mean is the value used by the twist detecting unit. This will be described in more detail later.
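Equations 3 and 4 then reduce to a few lines; measuring each width as the difference between the stored end and start points is an assumption about the ROI[] convention.

```python
def roi_sum_and_mean(rois):
    """ROI_SUM and ROI_Mean (Equations 3 and 4): the summed and averaged
    vertical widths of the separated areas. ROI_SUM feeds the focus
    detecting unit and ROI_Mean feeds the twist detecting unit."""
    widths = [end - start for start, end in rois]  # width per Equation 3
    roi_sum = sum(widths)
    roi_mean = roi_sum / len(widths) if widths else 0.0
    return roi_sum, roi_mean
```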
The detecting unit extracts high frequency components from the image inputted from the image capturing unit (S701). Noise is eliminated from the high frequency components by filtering them, thereby providing pure high frequency components (S702). When the high frequency components are extracted from the inputted image, the brightness component is first extracted from the inputted image and then the high frequency components are extracted from it.
In order to eliminate the noise, a critical value is preset. The components that are higher than the critical value are determined to be noise, and the components that are lower than the critical value are determined to be the pure high frequency components.
A method for extracting the high frequency components is based on the following determinants 5 and 6. The determinant 5 is a mask determinant and the determinant 6 represents the local image brightness values.
h1 h2 h3
h4 h5 h6
h7 h8 h9 (Determinant 5)
Y(0,0) Y(0,1) Y(0,2)
Y(1,0) Y(1,1) Y(1,2)
Y(2,0) Y(2,1) Y(2,2) (Determinant 6)
The high frequency components can be obtained by the following equation 5 based on the determinants 5 and 6.
high = h1×Y(0,0) + h2×Y(0,1) + h3×Y(0,2) + h4×Y(1,0) + h5×Y(1,1) + h6×Y(1,2) + h7×Y(2,0) + h8×Y(2,1) + h9×Y(2,2) (Equation 5)
In the process for obtaining the pure high frequency components without the noise, it is assumed that the critical value is T2 and that high_count is the number of pixels determined to be high frequency components with respect to the total number of pixels of the inputted image. The pure high frequency components are then obtained according to the following description.
While scanning the overall area of the inputted image, when the absolute value |high| calculated by the equation 5 satisfies the condition |high|<T2 at a pixel location, the pixel count high_count is increased by 1. In the present invention, the critical value T2 is set as 40. However, the critical value T2 may vary according to the type of the image.
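A sketch of steps S701 and S702 follows. The text leaves the mask coefficients h1 to h9 symbolic, so the Laplacian kernel below is an assumed stand-in high-pass mask, and the counting rule |high| < T2 follows the text as written.

```python
import numpy as np

T2 = 40  # critical value used in the text; may vary with the type of image

# Assumed stand-in for the symbolic mask h1..h9: a 3x3 Laplacian kernel,
# a common high-pass choice.
H_MASK = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.int32)

def pure_high_frequency_count(y: np.ndarray) -> int:
    """Count pixels whose high-frequency response passes the noise test.

    Following the text as written, responses with |high| >= T2 are treated
    as noise and responses with |high| < T2 are counted in high_count.
    """
    y = y.astype(np.int32)
    h, w = y.shape
    high_count = 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Equation 5: weighted sum over the 3x3 brightness neighbourhood
            high = int((H_MASK * y[i - 1:i + 2, j - 1:j + 2]).sum())
            if abs(high) < T2:
                high_count += 1
    return high_count
```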
In the process for calculating the focusing level value from the high frequency components according to the size of the interesting area, a critical value T3 by which the size of the interesting area is classified into large and small cases is preset. In addition, according to the number of the focusing level values, the focusing level value is calculated by allowing the high frequency component value to correspond to a focusing level value. That is, when the critical value is T3 and the focusing level is Focus_level, the focusing level is obtained from the high_count value accordingly.
As described above, when the size of the interesting area is obtained by extracting the interesting area (S703) and the focusing level value is calculated from the high frequency components according to the size of the interesting area and displayed on the pre-view screen (S704), it becomes possible for the user to accurately adjust the focus.
That is, the focusing level value is calculated from the total sum value of the widths in the vertical direction.
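Since the actual mapping from high_count to Focus_level is not spelled out here, the following sketch is purely illustrative: the value of T3, the scale factors, and the number of levels are all assumed, not the patent's values.

```python
T3 = 1000  # hypothetical critical value; the text does not disclose a number

def focus_level(high_count: int, roi_sum: int, num_levels: int = 8) -> int:
    """Illustrative mapping of high_count to a focusing level Focus_level.

    The text states only that the mapping depends on whether ROI_SUM
    classifies the interesting area as large or small via T3; the
    normalization below is an assumption made for the sketch.
    """
    scale = 2000 if roi_sum >= T3 else 1000  # assumed per-size normalization
    level = (high_count * num_levels) // scale
    return max(0, min(level, num_levels - 1))
```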
As shown in
An angle level value (angle_level) is first calculated from the ROI_Mean with reference to the equation 4. It is first determined whether the ROI_Mean is greater than or equal to 4 and less than 16 (S901). When the ROI_Mean is greater than or equal to 4 and less than 16, the twist angle value is set as 2 (S903). Otherwise, it is determined whether the ROI_Mean is greater than or equal to 16 and less than 30 (S902). When the ROI_Mean is greater than or equal to 16 and less than 30, the twist angle value is set as 1 (S904); otherwise, the twist angle value is set as 0 (S905). That is, the mean value of the widths in the vertical direction, mapped according to the number of twist levels, is the twist level value.
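The twist-level decision of steps S901 to S905 maps directly to code; only the function and parameter names below are assumptions.

```python
def twist_level(roi_mean: float) -> int:
    """Twist (angle) level from ROI_Mean, using the thresholds in the text."""
    if 4 <= roi_mean < 16:
        return 2  # S903
    if 16 <= roi_mean < 30:
        return 1  # S904
    return 0      # S905
```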
According to the present invention, since the focusing and twisting states of the photographed image are displayed on the pre-view screen, the user can adjust the focusing and twist states to capture a clearer image.
Therefore, even when no focusing control unit is provided in the camera, a clearer image can be obtained by calculating the focusing and twisting level values, thereby making it possible to accurately recognize the characters written on the photographed image.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention. Thus, it is intended that the present invention covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.