The foregoing and other features of the present invention will become apparent to one skilled in the art to which the present invention relates upon consideration of the following description of the invention with reference to the accompanying drawings, wherein:
The present invention relates to systems and methods for the real-time recognition of postal indicia.
It will be appreciated that the characteristics of various postal indicia can vary significantly, and that different methods of analysis may be desirable for envelopes containing various types of indicia. For example, an algorithm utilized to identify a particular type of stamp and determine its value can be expected to differ significantly from an algorithm to determine the value of a metermark on a metered envelope. Given the time constraints present in a mail handling system, applying these methods sequentially would be unacceptably inefficient.
It is equally problematic to apply the various analysis methodologies associated with the indicia types in parallel. At any given time, a CPU associated with a mail sorting system will be processing data associated with a number of envelopes. Conducting the analysis for multiple indicia types would be an unnecessary use of processing resources at the expense of other classification tasks. Further, knowledge of the position of the postal indicia on an envelope provides an indication of the orientation and facing of the envelope. Reliable knowledge of the orientation and facing of the envelope allows for simplification of future analysis of the envelope image (e.g., optical character recognition of all or a portion of the address, postage verification, etc.). In addition, once the envelope is oriented and faced, it is canceled and sprayed with an identification tag. In order to process the mail appropriately, the cancellation and the ID tag need to be placed in the correct location on the envelope.
To this end, the illustrated system 10 is designed to detect indicia within one or more regions of interest of an envelope image and identify a general category for the detected indicia in an extremely short period of time, generally on the order of a few milliseconds. During this time, the system scans each region of interest for candidate objects and classifies each candidate object into one of a plurality of indicia classes. It is necessary that the indicia recognition system 10 operate with great efficiency to retain time and processing resources for the downstream analysis of the envelope that the system is intended to facilitate.
One or more images representing regions of interest are acquired for analysis at an image acquisition element 12. The image acquisition element 12 acquires at least one image of an envelope and attempts to isolate at least one region of interest from at least one predetermined location on the acquired at least one image. For example, in one implementation, respective lead and trail cameras on either side of a conveyor belt associated with the mail sorting system are used to take an image of each side of the envelope, such that a first image represents a front side of the envelope and a second image represents a back side of the envelope. It will be appreciated that these images can comprise grayscale, color, and ultraviolet fluorescence images of various resolutions that can be binarized to produce one or more binarized images, in which each pixel is represented by a single bit as “dark” or “light”.
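By way of illustration only, the binarization mentioned above can be sketched as follows. The description does not specify how the threshold is chosen, so the fixed cutoff of 128 and the function name `binarize` are assumptions made for this sketch, not the disclosed method.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 image in which 1 marks a "dark" pixel.

    `gray` is a list of rows of 8-bit grayscale values. The fixed
    threshold is an assumption; the description does not name one.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    sample = [[0, 200], [127, 128]]
    print(binarize(sample))
```

A fixed cutoff is the simplest choice; an adaptive threshold could be substituted without changing the later projection steps, which only consume the single-bit image.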
In an exemplary embodiment, one or more predetermined regions of interest are selected within the front and back images of the envelope to represent positions in which indicia are expected to appear. Generally speaking, the classes of postal indicia of interest for the system 10 are found in a specific corner of the front side of the envelope. Assuming that the envelope is maintained in a vertical position (i.e., longest edge vertical), but that the orientation of the envelope is otherwise unknown, the corner of the envelope traditionally associated with the postal indicia classes of interest can only appear in one of four positions. Specifically, the indicia will be in the upper left corner of the front of the envelope in a “normal” orientation, but the envelope can be rotated one hundred eighty degrees, flipped to where the back of the envelope faces the lead camera, or both flipped to the back side and rotated one hundred eighty degrees.
To take advantage of this, the regions of interest can include the upper left corner and the lower right corner of the output of the lead camera, and the upper right corner and the lower left corner of the output of the trail camera. Accordingly, four regions of interest, representing the most likely locations for postal indicia, can be isolated from the first and second images for further analysis.
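The isolation of the four corner regions can be sketched as below. This is a minimal illustration under stated assumptions: the images are lists of pixel rows, and the corner dimensions `height` and `width`, the helper `corner`, and the dictionary keys are hypothetical names not found in the description.

```python
def corner(image, vertical, horizontal, height, width):
    """Crop a height-by-width corner from an image given as a list of rows.

    `vertical` is "upper" or "lower"; `horizontal` is "left" or "right".
    `height` and `width` must be positive.
    """
    rows = image[:height] if vertical == "upper" else image[-height:]
    return [row[:width] if horizontal == "left" else row[-width:] for row in rows]


def regions_of_interest(lead_image, trail_image, height, width):
    """Return the four corner regions named in the description."""
    return {
        "lead_upper_left": corner(lead_image, "upper", "left", height, width),
        "lead_lower_right": corner(lead_image, "lower", "right", height, width),
        "trail_upper_right": corner(trail_image, "upper", "right", height, width),
        "trail_lower_left": corner(trail_image, "lower", "left", height, width),
    }


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    for name, roi in regions_of_interest(image, image, 2, 2).items():
        print(name, roi)
```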
The isolated regions of interest are provided to a candidate locator 14 that locates possible indicia within each region of interest. Specifically, the candidate locator 14 can scan each region of interest for dense regions of dark pixels that may be indicative of the presence of postal indicia within the region.
In one implementation, the image can be binarized, such that every pixel is represented by a single bit as a “white” pixel or a “dark” pixel. For each region of interest, a horizontal projection can be performed to obtain a count of the number of dark pixels in each row of pixels comprising the region of interest. Once the total count for each row of pixels has been determined, the count for each row of pixels is compared to a horizontal count threshold value. The first row of pixels having a count greater than the threshold marks the beginning of a dense region.
Once the beginning of a dense region has been found, the candidate locator 14 continues to iterate through the rows comparing the total count to a horizontal count threshold value. Each time the count for a row is below the horizontal count threshold value, the value of a whitespace counter for the dense region is incremented. This continues until the whitespace counter value exceeds a tolerance threshold. At this point, the dense region is considered to end, and a length is calculated for the dense region as the number of rows between the beginning row of the dense region and the final row of the region less the value of the whitespace counter. If the calculated length exceeds a region length threshold, the location of the dense region is stored in memory. Either way, the horizontal projection continues until all of the dark pixel counts for the plurality of rows comprising the region have been compared to the threshold.
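The row scan described above can be sketched as follows. This is one possible reading, not a definitive implementation: the whitespace counter is treated as cumulative over the dense region (as the text literally states), a row whose count merely equals the threshold is treated as whitespace, and the stored location is taken to be the first and last dense rows. All function and parameter names are hypothetical.

```python
def horizontal_projection(region):
    """Count dark pixels (value 1) in each row of a binarized region."""
    return [sum(row) for row in region]


def find_dense_runs(counts, count_threshold, whitespace_tolerance, min_length):
    """Return (first, last) index pairs of dense runs found in a projection."""
    runs = []
    start = last_dense = None
    whitespace = 0
    for i, count in enumerate(counts):
        if count > count_threshold:
            if start is None:          # first row above threshold opens a run
                start, whitespace = i, 0
            last_dense = i
        elif start is not None:
            whitespace += 1            # sparse row inside an open run
            if whitespace > whitespace_tolerance:
                # rows from start through i, less the sparse rows counted
                length = (i - start + 1) - whitespace
                if length > min_length:
                    runs.append((start, last_dense))
                start = None           # keep scanning for further runs
    if start is not None:              # run still open at the end of the region
        length = (len(counts) - start) - whitespace
        if length > min_length:
            runs.append((start, last_dense))
    return runs


if __name__ == "__main__":
    counts = [0, 5, 6, 0, 7, 0, 0, 0, 9, 0]
    print(find_dense_runs(counts, count_threshold=3,
                          whitespace_tolerance=1, min_length=2))
```

In the usage example, the run beginning at row 1 is kept because it contains three dense rows, while the isolated dense row at index 8 falls short of the length threshold and is dropped.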
If no dense regions are discovered in the horizontal projection, the candidate locator 14 discards the region of interest without further analysis. For each potential dense region that is found from the horizontal projection analysis, a vertical projection is performed over each potential dense region area. Accordingly, the number of dark pixels in each column of the dense region is determined and compared to a vertical count threshold. The first column of pixels having a count greater than the vertical count threshold marks the beginning of a candidate object.
Once the beginning of a candidate object has been found, the candidate locator 14 continues to iterate through the columns comparing the total count to the vertical count threshold value. Each time the count for a column is below the vertical count threshold value, the value of a whitespace counter for the candidate object is incremented. This continues until the whitespace counter value exceeds a tolerance threshold. At this point, the candidate object is considered to end, and a width is calculated for the candidate object as the number of columns between the beginning column of the candidate object and the final column of the candidate object less the value of the whitespace counter.
If the calculated width exceeds an object width threshold, the location of the candidate object, including the vertical and horizontal extent of the object, is stored in memory. Either way, the vertical projection continues until all of the dark pixel counts for the plurality of columns comprising the dense region have been compared to the threshold. If no candidate objects are found, the dense region is discarded. Once all of the regions of interest have been analyzed via the horizontal and vertical projection, any located candidate objects are provided to a feature extractor 16.
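The vertical projection over a located dense region can be sketched as below, assuming the dense region is identified by its first and last row indices within the binarized region of interest. The names `vertical_projection` and `first_dense_column` are hypothetical. The resulting column counts would then be scanned with the same whitespace-tolerant procedure the description applies to columns.

```python
def vertical_projection(image, top, bottom):
    """Count dark pixels per column, using only rows top..bottom inclusive."""
    band = image[top:bottom + 1]
    return [sum(column) for column in zip(*band)]


def first_dense_column(counts, count_threshold):
    """Return the index of the first column whose count exceeds the
    threshold, or None when no column qualifies."""
    for index, count in enumerate(counts):
        if count > count_threshold:
            return index
    return None


if __name__ == "__main__":
    image = [
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 0],
    ]
    counts = vertical_projection(image, top=0, bottom=1)
    print(counts, first_dense_column(counts, count_threshold=1))
```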
The feature extractor 16 extracts numerical features from the identified candidate objects. The feature extractor 16 derives a vector of numerical measurements, referred to as a feature vector, from a given candidate object. Thus, the feature vector represents the candidate object in a modified format that attempts to represent as many aspects of the image portion associated with the candidate object as is possible.
The features used to generate the feature vector are selected both for their effectiveness in distinguishing among a plurality of possible postal indicia categories and for their ability to be quickly extracted from the image sample. For example, in an exemplary implementation, the candidate object can be superimposed on a white space of standard size, and the white space can be divided into one hundred forty-four regions. A pixel count can be calculated for each of the regions and divided by an area of the region (in pixels) to obtain a pixel density for the region. The pixel densities for the one hundred forty-four regions can each be utilized as a numerical feature value within the feature vector.
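A minimal sketch of the density features follows. The description states only that the white space has a standard size and is divided into one hundred forty-four regions; the 96-by-96 canvas, the 12-by-12 grid layout, the top-left placement of the object, and the function name are all assumptions made for illustration.

```python
def density_features(obj, canvas_height=96, canvas_width=96,
                     grid_rows=12, grid_cols=12):
    """Return one dark-pixel density per grid cell (144 by default).

    `obj` is a binarized candidate object (rows of 0/1 values). Canvas
    dimensions are assumed to be exact multiples of the grid dimensions.
    """
    # Superimpose the object on a blank (all-light) canvas, cropping overflow.
    canvas = [[0] * canvas_width for _ in range(canvas_height)]
    for r, row in enumerate(obj[:canvas_height]):
        for c, px in enumerate(row[:canvas_width]):
            canvas[r][c] = px

    cell_h = canvas_height // grid_rows
    cell_w = canvas_width // grid_cols
    features = []
    for gr in range(grid_rows):
        for gc in range(grid_cols):
            dark = sum(
                canvas[r][c]
                for r in range(gr * cell_h, (gr + 1) * cell_h)
                for c in range(gc * cell_w, (gc + 1) * cell_w)
            )
            features.append(dark / (cell_h * cell_w))  # count / area
    return features


if __name__ == "__main__":
    candidate = [[1] * 8 for _ in range(4)]   # 4 rows by 8 columns, all dark
    densities = density_features(candidate)
    print(len(densities), densities[0])
```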
A second feature set can be derived by counting horizontal pixel runs within the candidate object. The feature extractor 16 starts at a first row of the candidate object and begins counting consecutive dark pixels each time a run of consecutive dark pixels is encountered. The length of each run of pixels is recorded as part of a histogram. The histogram comprises one hundred bins, representing counts of all pixel run lengths between one and one hundred. To limit the size of the histogram, pixel runs exceeding one hundred pixels are counted as one hundred pixel runs in the histogram. Once the lengths of all the pixel runs have been counted, the histogram, comprising the counts of all pixel run lengths from one to one hundred, can be utilized as one hundred more entries in a feature vector. The width and the height of the candidate object can also be used as features.
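The run-length histogram can be sketched as follows, assuming a binarized candidate object in which 1 denotes a dark pixel. The function name and the `max_run` parameter are hypothetical; the default of one hundred bins and the clamping of longer runs into the final bin follow the description.

```python
def run_length_histogram(obj, max_run=100):
    """Histogram of horizontal dark-pixel run lengths.

    Bin k-1 counts runs of length k; runs longer than max_run are
    counted in the final bin.
    """
    histogram = [0] * max_run
    for row in obj:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:                      # a run just ended
                histogram[min(run, max_run) - 1] += 1
                run = 0
        if run:                            # run reaching the end of the row
            histogram[min(run, max_run) - 1] += 1
    return histogram


if __name__ == "__main__":
    candidate = [[1, 1, 0, 1], [0, 0, 0, 0], [1] * 150]
    hist = run_length_histogram(candidate)
    print(hist[0], hist[1], hist[99])
```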
The extracted feature vector is then provided to a classification system 18. The classification system 18 classifies each candidate object into one of a plurality of output classes representing different types of postal indicia. For example, the plurality of output classes can include classes representing metermarks, business reply mail markings, information based indicia (e.g., bar codes), and stamps, as well as a generic other class. The classification system 18 can include one or more classifiers of various types including statistical classifiers, neural network classifiers, and self-organizing maps that have been designed or adapted to distinguish among the various postal indicia according to the features associated with the feature extractor 16.
For example, the classification system 18 can include an artificial neural network trained to distinguish among various classes of postal indicia according to the extracted features. A neural network is composed of a large number of highly interconnected processing elements that have weighted connections. It will be appreciated that these processing elements can be implemented in hardware or simulated in software. The organization and weights of the connections determine the output of the network, and are optimized via a training process to reduce error and generate the best output classification.
The values comprising the feature vector are provided to the inputs of the neural network, and a set of output values corresponding to the plurality of output classes is produced at the neural network output. Each of the set of output values represents the likelihood that the candidate image falls within the output class associated with the output value. The output class having the optimal output value is selected. What constitutes an optimal value will depend on the design of the neural network. In one example, the output class having the largest output value is selected.
The output of the classification system 18 can then be provided to one or more downstream analysis systems 20 that provide further analysis of the envelope image, or alternate representations thereof, according to the output of the classification system 18 and at least one additional input representing the envelope. For example, the downstream analysis systems 20 can include an orientation element that determines an associated orientation of the envelope at least in part from the determined type and position of the indicia on the envelope. The downstream analysis systems 20 can also include one or more specialized classifiers, each of which identifies specific postal indicia within one of the broader categories of postal indicia recognized by the system 10.
In the illustrated example, an input layer 52 comprises five input nodes, A-E. A node, or neuron, is a processing unit of a neural network. A node may receive multiple inputs from prior layers which it processes according to an internal formula. The output of this processing may be provided to multiple other nodes in subsequent layers. The functioning of nodes within a neural network is designed to mimic the function of neurons within a human brain.
Each of the five input nodes A-E receives input signals with values relating to features of an input pattern. Preferably, a large number of input nodes will be used, receiving signal values derived from a variety of pattern features. Each input node sends a signal to each of three intermediate nodes F-H in a hidden layer 54. The value represented by each signal will be based upon the value of the signal received at the input node. It will be appreciated, of course, that in practice, a classification neural network can have a number of hidden layers, depending on the nature of the classification task.
Each connection between nodes of different layers is characterized by an individual weight. These weights are established during the training of the neural network. The value of the signal provided to the hidden layer 54 by the input nodes A-E is derived by multiplying the value of the original input signal at the input node by the weight of the connection between the input node and the intermediate node (e.g., G). Thus, each intermediate node F-H receives a signal from each of the input nodes A-E, but due to the individualized weight of each connection, each intermediate node receives a signal of different value from each input node. For example, assume that the input signal at node A is of a value of 5 and the weights of the connections between node A and nodes F-H are 0.6, 0.2, and 0.4 respectively. The signals passed from node A to the intermediate nodes F-H will have values of 3, 1, and 2.
Each intermediate node F-H sums the weighted input signals it receives. This input sum may include a constant bias input at each node. The sum of the inputs is provided into a transfer function within the node to compute an output. A number of transfer functions can be used within a neural network of this type. By way of example, a threshold function may be used, where the node outputs a constant value when the summed inputs exceed a predetermined threshold. Alternatively, a linear or sigmoidal function may be used, passing the summed input signals or a sigmoidal transform of the value of the input sum to the nodes of the next layer.
Regardless of the transfer function used, the intermediate nodes F-H pass a signal with the computed output value to each of the nodes I-M of the output layer 56. An individual intermediate node (e.g., G) will send the same output signal to each of the output nodes I-M, but like the input values described above, the output signal value will be weighted differently at each individual connection. The weighted output signals from the intermediate nodes are summed to produce an output signal. Again, this sum may include a constant bias input.
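The signal flow through the illustrated network can be sketched as below. This is an illustrative forward pass only: the sigmoidal transfer function is one of the options named in the description, applying it at the output layer as well as the hidden layer is an assumption, and the function names and weight layout (one list of incoming weights per node) are hypothetical.

```python
import math


def sigmoid(x):
    """Sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases, transfer):
    """Weighted sum plus bias at each node, passed through `transfer`.

    `weights[j]` holds the weights of the connections into node j.
    """
    return [
        transfer(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, output_w, output_b):
    """Input nodes A-E -> hidden nodes F-H -> output nodes I-M."""
    hidden = layer_output(features, hidden_w, hidden_b, sigmoid)
    return layer_output(hidden, output_w, output_b, sigmoid)


if __name__ == "__main__":
    # Weighted signals from node A (value 5) to nodes F, G and H, using the
    # example weights 0.6, 0.2 and 0.4 and a linear transfer function.
    weights_from_a = [[0.6, 0, 0, 0, 0], [0.2, 0, 0, 0, 0], [0.4, 0, 0, 0, 0]]
    print(layer_output([5.0, 0, 0, 0, 0], weights_from_a, [0, 0, 0], lambda s: s))
```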
Each output node represents an output class of the classifier. The value of the output signal produced at each output node is intended to represent the probability that a given input sample belongs to the associated class. In the exemplary system, the class with the highest associated probability is selected, so long as the probability exceeds a predetermined threshold value. The value represented by the output signal is retained as a confidence value of the classification.
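The selection rule can be sketched as follows. The class labels, the function name, and the choice to return `None` when no output exceeds the threshold are assumptions; the description does not state what happens to a candidate that fails the threshold test.

```python
CLASSES = ["stamp", "metermark", "business_reply", "ibi", "other"]  # hypothetical labels


def select_class(outputs, class_names, threshold):
    """Pick the class with the largest output value.

    Returns (class_name, confidence); class_name is None when the
    largest output does not exceed the threshold.
    """
    best = max(range(len(outputs)), key=outputs.__getitem__)
    confidence = outputs[best]
    if confidence > threshold:
        return class_names[best], confidence
    return None, confidence


if __name__ == "__main__":
    print(select_class([0.1, 0.7, 0.05, 0.1, 0.05], CLASSES, threshold=0.5))
    print(select_class([0.3, 0.3, 0.2, 0.1, 0.1], CLASSES, threshold=0.5))
```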
In view of the foregoing structural and functional features described above, methodology in accordance with various aspects of the present invention will be better appreciated with reference to
At step 150, numerical feature values are extracted from the one or more candidate objects. For each candidate object, a feature vector derived from a plurality of associated quantifiable characteristics is produced. At step 170, the candidate objects are classified according to the extracted feature values. In an exemplary implementation, the output classes can include at least a class representing stamps, a class representing information based indicia, a class representing metermarks, a class representing business reply mail markings, and a generic other class.
At step 128, it is determined if any dense regions have been located within the region of interest. If not (N), the region of interest is rejected at step 130, and the methodology terminates. If any dense regions have been located (Y), the methodology advances to step 132, where a vertical projection is performed over the columns comprising each dense region to generate a total number of dark pixels for each column. At step 134, the values associated with the plurality of columns are compared, in sequence, with a threshold value to find columns having a large number of dark pixels. At step 136, candidate objects are identified wherever a series of spatially proximate columns having a number of dark pixels greater than the threshold is located.
At step 138, it is determined if any candidate objects have been located within a given dense region. If so (Y), the methodology advances to step 140, where the location of the candidate object is stored in memory. If no candidate objects are located within a given dense region (N), the dense region is rejected at step 142.
At step 156, a pixel density is calculated for each region. The number of dark pixels in each region is determined and divided by the total number of pixels comprising the region (i.e., the area of the region) to provide a density value for each region. At step 158, the lengths of horizontal runs of consecutive dark pixels are counted within the candidate object. In an exemplary implementation, the total counts of the number of horizontal runs having a given length are recorded as a one hundred element histogram, with a first element representing the number of single pixel runs, a second element representing the number of two pixel runs, and so on until a hundredth element representing the number of consecutive horizontal pixel runs having a length greater than or equal to one hundred.
At step 160, a feature vector representing the candidate object is compiled from the density values, the histogram, and a measured height and width for the candidate object. Accordingly, in an exemplary implementation, the feature vector includes the one hundred forty-four density values, the one hundred element histogram, a height value, and a width value for a total of two hundred forty-six numerical feature values comprising the feature vector. This feature vector can then be provided as an input to an associated classifier.
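The compilation of the feature vector can be sketched as below. The ordering of the components within the vector and the function name are assumptions; only the component counts (144 + 100 + 2 = 246) come from the description.

```python
def build_feature_vector(densities, histogram, height, width):
    """Concatenate the feature sets into a single 246-element vector."""
    if len(densities) != 144:
        raise ValueError("expected 144 density values")
    if len(histogram) != 100:
        raise ValueError("expected a 100-element run-length histogram")
    # Assumed order: densities, then histogram, then height and width.
    return list(densities) + list(histogram) + [float(height), float(width)]


if __name__ == "__main__":
    vector = build_feature_vector([0.0] * 144, [0] * 100, height=80, width=120)
    print(len(vector))
```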
A singulation stage 210 includes a feeder pickoff 212 and a fine cull 214. The feeder pickoff 212 would generally follow a mail stacker (not shown) and would attempt to feed one mailpiece at a time from the mail stacker to the fine cull 214, with a consistent gap between mailpieces. The fine cull 214 would remove mailpieces that were too tall, too long, or perhaps too stiff. When mailpieces left the fine cull 214, they would be fed vertically (e.g., longest edge parallel to the direction of motion) to assume one of four possible orientations.
The image lifting stage 220 can comprise a pair of camera assemblies 222 and 224. As shown, the image lifting stage 220 is located between the singulation stage 210 and the facing inversion stage 230 of the system 200, but the image lifting stage 220 may be incorporated into the system 200 in any suitable location.
In operation, each of the camera assemblies 222 and 224 acquires both a low-resolution UV image and a high-resolution grayscale image of a respective one of the two faces of each passing mailpiece. Because the UV images are of the entire face of the mailpiece, rather than just the lower one inch edge, there is no need to invert the mailpiece when making a facing determination.
Each of the camera assemblies illustrated in
Further, it should be appreciated that UV and grayscale are representative of the types of image information that may be acquired rather than a limitation on the invention. For example, a color image may be acquired. Consequently, any suitable imaging components may be included in the system 200.
As shown, the system 200 may further include an item presence detector 225, a belt encoder 226, an image server 227, and a machine control computer 228. The item presence detector 225 (exemplary implementations of an item presence detector can include a “photo eye” or a “light barrier”) may be located, for example, five inches upstream of the camera assembly 222, to indicate when a mailpiece is approaching. The belt encoder 226 may output pulses (or “ticks”) at a rate determined by the travel speed of the belt. For example, the belt encoder 226 may output two hundred fifty-six pulses per inch of belt travel. The combination of the item presence detector 225 and belt encoder 226 thus enables a relatively precise determination of the location of each passing mailpiece at any given time. Such location and timing information may be used, for example, to control the strobing of light sources in the camera assemblies 222 and 224 to ensure optimal performance independent of variations in belt speed.
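The position bookkeeping implied by the item presence detector and belt encoder can be sketched as follows, using the example figures given above (two hundred fifty-six pulses per inch, detector five inches upstream of the camera assembly). The constant and function names are hypothetical, and the sketch assumes no slippage between the mailpiece and the belt.

```python
PULSES_PER_INCH = 256            # example encoder resolution from the text
DETECTOR_TO_CAMERA_INCHES = 5.0  # example detector placement from the text


def inches_travelled(ticks_since_detection):
    """Distance the mailpiece has moved since it tripped the detector."""
    return ticks_since_detection / PULSES_PER_INCH


def ticks_until_camera(ticks_since_detection):
    """Encoder ticks remaining before the leading edge reaches the camera."""
    total = round(DETECTOR_TO_CAMERA_INCHES * PULSES_PER_INCH)
    return max(0, total - ticks_since_detection)


if __name__ == "__main__":
    print(inches_travelled(512))     # distance after 512 ticks
    print(ticks_until_camera(300))   # ticks left before the camera
```

Because the arithmetic is driven by encoder ticks rather than elapsed time, the result does not depend on belt speed, which is the property the description relies on for strobe control.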
Image information acquired with the camera assemblies 222 and 224 or other imaging components may be processed for control of the mail sorting system or for use in routing mailpieces passing through the system 200. Processing may be performed in any suitable way with one or more processors. In the illustrated embodiment, processing is performed by image server 227. It will be appreciated that, in one implementation, an indicia detection and recognition system in accordance with an aspect of the present invention, could be implemented as a software program in the image server 227.
The image server 227 may receive image data from the camera assemblies 222 and 224, and process and analyze such data to extract certain information about the orientation of and various markings on each mailpiece. In some embodiments, for example, images may be analyzed using one or more neural network classifiers, various pattern analysis algorithms, rule based logic, or a combination thereof. Either or both of the grayscale images and the UV images may be so processed and analyzed, and the results of such analysis may be used by other components in the system 200, or perhaps by components outside the system, for sorting or any other purpose.
In the embodiment shown, information obtained from processing images is used for control of components in the system 200 by providing that information to a separate processor that controls the system. The information obtained from the images, however, may additionally or alternatively be used in any other suitable way for any of a number of other purposes. In the pictured embodiment, control for the system 200 is provided by a machine control computer 228. Though not expressly shown, the machine control computer 228 may be connected to any or all of the components in the system 200 that may output status information or receive control inputs. The machine control computer 228 may, for example, access information extracted by the image server 227, as well as information from other components in the system, and use such information to control the various system components based thereupon.
In the example shown, the camera assembly 222 is called the “lead” assembly because it is positioned so that, for mailpieces in an upright orientation, the indicia (in the upper right hand corner) is on the leading edge of the mailpiece with respect to its direction of travel. Likewise, the camera assembly 224 is called the “trail” assembly because it is positioned so that, for mailpieces in an upright orientation, the indicia is on the trailing edge of the mailpiece with respect to its direction of travel. Upright mailpieces themselves are also conventionally labeled as either “lead” or “trail” depending on whether their indicia is on the leading or trailing edge with respect to the direction of travel.
Following the last scan line of the lead camera assembly 222, the image server 227 may determine an orientation of “flip” or “no-flip” for the facing inverter 230. In particular, the inverter 230 is controlled so that each mailpiece has its top edge down when it reaches the cancellation stage 235, thus enabling one of the cancellers 237 and 239 to spray a cancellation mark on any indicia properly affixed to a mailpiece by spraying only the bottom edge of the path (top edge of the mailpiece). The image server 227 may also make a facing decision that determines which canceller (lead 237 or trail 239) should be used to spray the cancellation mark. Other information recognized by the image server 227, such as information based indicia (IBI), may also be used, for example, to disable cancellation of IBI postage since IBI would otherwise be illegible downstream.
After cancellation, all mailpieces may be inverted by the inverter 242, thus placing each mailpiece in its upright orientation. Immediately thereafter, an ID tag may be sprayed at the ID spraying stage 244 using one of the ID tag sprayers 245 and 246 that is selected based on the facing decision made by the image server 227. In some embodiments, all mailpieces with a known orientation may be sprayed with an ID tag. In other embodiments, ID tag spraying may be limited to only those mailpieces without an existing ID tag (forward, return, foreign).
Following application of ID tags, the mailpieces may ride on extended belts for drying before being placed in output bins or otherwise routed for further processing at the stacking stage 248. Except for rejects, the output bins can be placed in pairs to separate lead mailpieces from trail mailpieces. It is desirable for the mailpieces in each output bin to face identically. The operator may thus rotate trays properly so as to orient lead and trail mailpieces the same way. The mail may be separated into four broad categories: (1) facing identification marks (FIM) used with a postal numeric encoding technique, (2) outgoing (destination is a different sectional center facility (SCF)), (3) local (destination is within this SCF), and (4) reject (detected double feeds, not possible to sort into other categories). The decision of outgoing vs. local, for example, may be based on the image analysis performed by the image server 227.
One or more images can be provided to the orientation determination element 260 as part of the first processing stage. A plurality of neural network classifiers 262, 264, and 266 within the orientation determination element 260 are operative to analyze various aspects of the input images to determine an orientation and facing of the envelope. A first neural network classifier 262 determines an appropriate orientation for the envelope according to the distribution of dark pixels across each side of the envelope. A second neural network classifier 264 can comprise an indicia detection and recognition system in accordance with an aspect of the present invention. A third neural network classifier 266 can review information related to four different corners (two front and two back) to determine the presence and type, if present, of postal indicia within these regions.
The outputs of all three neural network classifiers 262, 264, and 266 are provided to an orientation arbitrator 268. The orientation arbitrator 268 determines an associated orientation and facing for the envelope according to the neural network outputs. In the illustrated implementation, the orientation arbitrator 268 is a neural network classifier that receives the outputs of the three neural network classifiers 262, 264, and 266 and classifies the envelope into one of four possible orientations.
Once an orientation for the envelope has been determined, a second stage of processing can begin. During the second stage of processing, one or more primary image analysis elements 270, various secondary analysis elements 280, and a ranking element 290 can initiate to provide more detailed information as to the contents of the envelope. In accordance with an aspect of the present invention, the second stage is operative to run in approximately two thousand two hundred milliseconds. It will be appreciated that during this time, processor resources can be shared among a plurality of envelopes.
The primary image analysis elements 270 are operative to determine one or more of indicia type, indicia value, and routing information for the envelope. Accordingly, a given primary image analysis element 270 can include a plurality of segmentation routines and pattern recognition classifiers that are operative to recognize postal indicia, extract value information, isolate address data, and read the characters comprising at least a portion of the address. It will be appreciated that multiple primary analysis elements 270 can analyze the envelope content, with the results of the multiple analyses being arbitrated at the ranking element 290.
The secondary analysis elements 280 can include a plurality of classification algorithms that review specific aspects of the envelope. In the illustrated implementation, the plurality of classification algorithms can include a stamp recognition classifier 282 that identifies stamps on an envelope via template matching, a metermark recognition system 283, a metermark value recognition system 284 that locates and reads value information within metermarks, one or more classifiers 285 that analyze an ultraviolet fluorescence image, and a classifier 286 that identifies and reads information based indicia (IBI).
It will be appreciated that the secondary analysis elements 280 can be active or inactive for a given envelope according to the results at the second and third neural networks 264 and 266. For example, if it is determined with high confidence that the envelope contains only a stamp, the metermark recognition element 283, metermark value recognition element 284, and the IBI based recognition element 286 can remain inactive to conserve processor resources.
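The selective activation of secondary analysis elements can be sketched as below. Only the stamp rule comes from the description; the element labels, the 0.9 confidence cutoff standing in for "high confidence", and the function name are assumptions for illustration.

```python
ALL_ELEMENTS = frozenset({
    "stamp_recognition_282",
    "metermark_recognition_283",
    "metermark_value_284",
    "uv_classifiers_285",
    "ibi_reader_286",
})


def active_secondary_elements(indicia_class, confidence, high_confidence=0.9):
    """Return the set of secondary elements to run for one envelope."""
    if indicia_class == "stamp" and confidence >= high_confidence:
        # Stamp-only envelope: skip the metermark and IBI analyses.
        return ALL_ELEMENTS - {
            "metermark_recognition_283",
            "metermark_value_284",
            "ibi_reader_286",
        }
    return ALL_ELEMENTS  # otherwise leave every element active


if __name__ == "__main__":
    print(sorted(active_secondary_elements("stamp", 0.97)))
    print(len(active_secondary_elements("stamp", 0.55)))
```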
The outputs of the orientation determination element 260, the primary image analysis elements 270, and the secondary analysis elements 280 are provided to a ranking element 290 that determines a final output for the system 250. In the illustrated implementation, the ranking element 290 is a rule based arbitrator that determines at least the type, location, value, and identity of any indicia on the envelope according to a set of predetermined logical rules. These rules can be based on known error rates for the various analysis elements 260, 270, and 280. The output of the ranking element 290 can be used for decision making throughout the mail handling system.
The computer system 300 includes a processor 302 and a system memory 304. Dual microprocessors and other multi-processor architectures can also be utilized as the processor 302. The processor 302 and system memory 304 can be coupled by any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory 304 includes read only memory (ROM) 308 and random access memory (RAM) 310. A basic input/output system (BIOS) can reside in the ROM 308, generally containing the basic routines that help to transfer information between elements within the computer system 300, such as during a reset or power-up.
The computer system 300 can include one or more types of long-term data storage 314, including a hard disk drive, a magnetic disk drive, (e.g., to read from or write to a removable disk), and an optical disk drive, (e.g., for reading a CD-ROM or DVD disk or to read from or write to other optical media). The long-term data storage can be connected to the processor 302 by a drive interface 316. The long-term storage components 314 provide nonvolatile storage of data, data structures, and computer-executable instructions for the computer system 300. A number of program modules may also be stored in one or more of the drives as well as in the RAM 310, including an operating system, one or more application programs, other program modules, and program data.
A user may enter commands and information into the computer system 300 through one or more input devices 320, such as a keyboard or a pointing device (e.g., a mouse). These and other input devices are often connected to the processor 302 through a device interface 322. For example, the input devices can be connected to the system bus 306 by one or more of a parallel port, a serial port, or a universal serial bus (USB). One or more output device(s) 324, such as a visual display device or printer, can also be connected to the processor 302 via the device interface 322.
The computer system 300 may operate in a networked environment using logical connections (e.g., a local area network (LAN) or wide area network (WAN)) to one or more remote computers 330. The remote computer 330 may be a workstation, a computer system, a router, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer system 300. The computer system 300 can communicate with the remote computers 330 via a network interface 332, such as a wired or wireless network interface card or modem. In a networked environment, application programs and program data depicted relative to the computer system 300, or portions thereof, may be stored in memory associated with the remote computers 330.
It will be understood that the above description of the present invention is susceptible to various modifications, changes and adaptations, and the same are intended to be comprehended within the meaning and range of equivalents of the appended claims. The presently disclosed embodiments are considered in all respects to be illustrative, and not restrictive. The scope of the invention is indicated by the appended claims, rather than the foregoing description, and all changes that come within the meaning and range of equivalence thereof are intended to be embraced therein.