Claims
- 1. Method for removing unwanted information, lines or printed characters from documents prior to character recognition of written information, comprising the steps of:
- 1) segmentation of an image into image elements;
- searching each image element to determine if it comprises more than one image element by scanning a pixel array in a horizontal and a vertical direction, and identifying a common border between two parallel pixel runs, said common border having a length below a threshold value;
- cutting a connection between said two parallel runs at said common border to break an image element having said common border into several image elements;
- 2) extraction of feature information from each image element;
- 3) classification of each of the image elements;
- 4) removal of those image elements which are classified as unwanted information, lines and printed characters; and
- 5) processing remaining image elements for writing recognition.
- 2. Method as in claim 1, wherein those image elements that are below a required minimum size are discarded in step 1.
- 3. Method as in claim 1, wherein said feature extraction from each image element is performed during the segmentation process.
- 4. Method as in claim 3, wherein neighborhood and local features are calculated, said neighborhood feature values describing the relationship between the single image element and its neighboring image elements, said local feature values describing properties of the image element itself.
- 5. Method as in claim 4, wherein as a neighborhood feature value the number of neighboring image elements in a specific direction is calculated, in combination with counts of only those image elements having nearly the same size properties.
- 6. Method as in claim 4, wherein as a local feature value a density feature is calculated, said density feature being the ratio between the number of foreground pixels and the number of background pixels in a rectangular area described by the maximum horizontal and vertical extensions of the image element.
- 7. Method as in claim 4, wherein each local feature value has a corresponding neighborhood feature value equivalent, said equivalent being calculated as the average of the local feature values from each image element inside a region given by a fixed radius, said calculated feature values being weighted by their specific distances.
- 8. Method as in claim 1, wherein in said classification step the feature values of each image element are fed into an artificial neural net, weighted internally, and an output is calculated giving a value indicative of the probability that the image element for that feature set belongs to a specific class.
- 9. Method as in claim 1, wherein said classification step comprises calculating, for each image element, using an artificial neural network having multiple outputs, probability values for each image element class presented to said neural network during training of said neural network, and storing said probability values of the class membership of each image element together with the image element for further processing, whereby recognized and stored classes are document parts.
- 10. Method as in claim 8, wherein said classification step is repeated until a stable result is achieved.
- 11. Method as in claim 8, wherein feedback is incorporated by using a known probability value of a specific class membership for each image element as an additional feature value, by calculating the average of the probability values for a specific class from each image element inside a region given by a fixed radius, these feature values also being fed into said neural network.
- 12. Method as in claim 8, wherein classified image elements are grouped together into clusters of corresponding image elements, said grouping being based on information regarding size, position or associated feature values.
- 13. Method as in claim 1, wherein before removing unwanted image elements, those elements are checked for intersections with other image elements not to be removed.
- 14. Method as in claim 13, wherein a pair of intersecting image elements is replaced by a number of new image elements having no intersection, and the intersecting area itself is made part of one of the pair of original image elements.
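The run-cutting step of claim 1 can be illustrated with a minimal sketch. It assumes a binary document image held as a 2-D NumPy array with foreground pixels set to 1; the function names (`runs_in_row`, `cut_short_borders`) and the default threshold are illustrative assumptions, not taken from the patent, and only the horizontal scanning direction is shown (the vertical pass is symmetric).

```python
import numpy as np

def runs_in_row(row):
    """Return (start, end) index pairs of foreground runs in a 1-D row.

    `end` is exclusive; foreground pixels are assumed to be 1.
    """
    edges = np.diff(np.concatenate(([0], row, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts, ends))

def cut_short_borders(image, threshold=3):
    """Break weak vertical connections between horizontal pixel runs.

    For every pair of runs in vertically adjacent rows, the length of
    their horizontal overlap is taken as the 'common border'.  If that
    border is shorter than `threshold` pixels, the overlapping pixels of
    the lower run are cleared, so a later connected-component labelling
    splits the element into several image elements.  This is a naive cut;
    a production implementation would preserve thin strokes more carefully.
    """
    out = image.copy()
    for y in range(image.shape[0] - 1):
        upper = runs_in_row(out[y])
        lower = runs_in_row(out[y + 1])
        for us, ue in upper:
            for ls, le in lower:
                overlap = min(ue, le) - max(us, ls)
                if 0 < overlap < threshold:
                    out[y + 1, max(us, ls):min(ue, le)] = 0
    return out
```

On a small test array, a one- or two-pixel bridge between two blobs is erased, so a subsequent connected-component labelling yields two image elements instead of one.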
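Claims 6 and 7 describe concrete feature computations that are easy to sketch: the density feature as the foreground-to-background pixel ratio inside the element's bounding box, and the neighborhood equivalent of a local feature as a distance-weighted average over all elements within a fixed radius. The helper names and the 1/(1+d) weighting are assumptions; the patent does not specify the weighting function.

```python
import numpy as np

def density(element_mask):
    """Density feature (claim 6): foreground pixels divided by background
    pixels inside the bounding box of the image element."""
    ys, xs = np.nonzero(element_mask)
    box = element_mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    foreground = int(np.count_nonzero(box))
    background = box.size - foreground
    return foreground / max(background, 1)   # guard against an all-foreground box

def neighborhood_average(values, centers, radius):
    """Neighborhood equivalent of a local feature (claim 7): for every
    element, the distance-weighted average of the local feature values of
    all elements whose centre lies within `radius`."""
    values = np.asarray(values, dtype=float)
    centers = np.asarray(centers, dtype=float)
    result = np.empty_like(values)
    for i, c in enumerate(centers):
        d = np.linalg.norm(centers - c, axis=1)
        inside = d <= radius
        weights = 1.0 / (1.0 + d[inside])    # closer elements weigh more (assumed weighting)
        result[i] = np.average(values[inside], weights=weights)
    return result
```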
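Claims 8, 10 and 11 describe classification with an artificial neural network whose class probabilities are fed back as additional neighborhood features until a stable result is achieved. The sketch below assumes a model already trained on feature vectors extended by one probability column per class (for example a scikit-learn `MLPClassifier`); the function names, the 1/(1+d) weighting and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def _radius_average(values, centers, radius):
    """Distance-weighted average of `values` over all elements within `radius`."""
    out = np.empty(len(values))
    for i, c in enumerate(centers):
        d = np.linalg.norm(centers - c, axis=1)
        inside = d <= radius
        out[i] = np.average(values[inside], weights=1.0 / (1.0 + d[inside]))
    return out

def classify_with_feedback(features, centers, radius, net, n_classes,
                           max_rounds=5, tol=1e-3):
    """Iterative element classification with probability feedback.

    `net` is any trained model exposing `predict_proba` that was trained
    on feature vectors extended by one neighbourhood-probability column
    per class (zeros in the first round).  Each round feeds the
    radius-averaged class probabilities of the previous round back in as
    extra features (claim 11) and stops once the probabilities no longer
    change noticeably (claim 10).
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(centers, dtype=float)
    probs = np.zeros((len(features), n_classes))
    for _ in range(max_rounds):
        extra = np.column_stack([_radius_average(probs[:, k], centers, radius)
                                 for k in range(n_classes)])
        new_probs = net.predict_proba(np.hstack([features, extra]))
        if np.abs(new_probs - probs).max() < tol:    # stable result
            return new_probs
        probs = new_probs
    return probs
```

Elements whose highest-probability class is an unwanted one (lines, printed characters, noise) would then be removed before the remaining elements are passed to writing recognition, as in steps 4) and 5) of claim 1.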
Priority Claims (1)
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 93110476 | Jun 1993 | EPX | |
Parent Case Info
The application is a continuation of application Ser. No. 08/263,326, filed Jun. 21, 1994, now abandoned.
US Referenced Citations (18)
Foreign Referenced Citations (1)
| Number | Date | Country |
| --- | --- | --- |
| 6001928 | Jan 1985 | EPX |
Continuations (1)
| Number | Date | Country | Parent |
| --- | --- | --- | --- |
| 263326 | Jun 1994 | | |