The present invention relates generally to an accelerated method and apparatus for recognizing alphanumeric characters, including those used to read mailpieces. In particular, the present invention relates to an accelerated method and apparatus for recognizing alphanumeric characters for sensing by optical character readers (“OCRs”).
Alphanumeric recognition systems are used in a variety of applications to read information with a machine rather than a pair of human eyes, thereby automating the process and increasing the efficiency of the throughput. Examples of machine-readable information include names or addresses written on mailpieces such as envelopes, information written on forms, such as tax forms and the census, product codes or part numbers inscribed on objects, and automobile VIN numbers.
OCR systems typically are used to sense, or “read,” information under severe time constraints, such as when information must be read, interpreted, validated, updated, and/or processed within a short period of time. Working against this time constraint is the interpretation process, performed by character recognition schemes that resolve ambiguous patterns sensed by the OCR. For example, the character “D” could be read as an “O” or a “C,” while a “G” could be read as a “C” or an “E.” As a result, hundreds of permutations or combinations of characters may be cross-referenced against a known word or number using a character recognition scheme before the correct permutation is recognized.
A known character recognition scheme used with OCRs in the United States Postal Service (“USPS”) to read names and addresses on mailpieces, such as letters, magazines, and parcels, is the DiGram scheme (Di=twice, Gram=letter). Using the DiGram scheme, the OCR interprets each character in the mail name (or mail address) as a letter (or number) and two alternate possibilities, creates a matrix of the possibilities, and generates permutations from the matrix. For a five-letter word, the matrix would be 3×5, and the number of permutations would be 3^5, or 243, possible combinations of letters. The DiGram scheme determines the “valid” name or address, using a data recognition logic algorithm, by comparing the permutations to a known name or address with the same or similar character string.
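The permutation count described above can be illustrated with a minimal sketch. The candidate letters below are hypothetical OCR readings chosen only for illustration, and the function name `digram_permutations` is an assumption, not part of any USPS system.

```python
from itertools import product

# Hypothetical OCR output for a five-letter mail name: each position holds
# a primary reading plus two alternates (all letters are illustrative).
candidates = [
    ("S", "C", "B"),
    ("M", "W", "V"),
    ("I", "L", "J"),
    ("T", "I", "F"),
    ("H", "N", "M"),
]

def digram_permutations(candidates):
    """Return every string formed by picking one candidate per position."""
    return ["".join(choice) for choice in product(*candidates)]

permutations = digram_permutations(candidates)
print(len(permutations))        # 3**5 candidate strings
print("SMITH" in permutations)  # the intended name is one of them
```

Each of these strings would have to be compared against the known name, which is the cost the method described later is intended to reduce.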
For example, the DiGram scheme has been used to identify mailpieces requiring a change of address (“CoA”). About 500 to 700 of the approximately 40,000 pieces of mail per hour that pass through an OCR system require the application of a new address based on information from CoA forms. In the context of a mailpiece undergoing a match with a CoA form submitted by a postal patron, the mailpiece is read by the OCR, the mail name is split from the mail address, and the characters of the mail name are evaluated and matched against a known string of characters. This process must be completed within the time it takes to send the mail address to a ZIP+4 address matching engine for “standardization.” Such standardization involves confirming that the city, state, and zip code in the mail address correspond to one another by checking them against a ZIP+4 database containing all corresponding city, state, and zip code information.
After the OCR reads and separates the mail name and mail address, it creates permutations of the mail name based on the DiGram scheme. A CoA form, which is the “known” data, contains the postal patron's name, old address, and new address. The permutations of the mail name are checked against the known CoA name or names associated with the mail address.
The DiGram scheme determines the correct mail name through a data recognition logic algorithm by comparing the permutations, which are generated from the three possible characters for each letter in the mail name, to the known CoA names for that particular mail address. If there is a match between the known name, a mail name permutation, and the old mail address, the USPS can assume that the mail belongs to a particular postal patron and forward it to the new address as requested in the CoA. The new address and corresponding barcode are then “sprayed,” i.e., printed, on the mailpiece. If there is no match, the USPS can assume that the mail address written on the mailpiece by the sender is correct, and a barcode corresponding to the mail address written on the mailpiece is sprayed on the mailpiece.
The DiGram scheme requires a comparison of the known name against hundreds of permutations created by the DiGram scheme to recognize the correct name. Alphanumeric recognition could be more efficient if the method were accelerated by reducing the number of comparisons made.
In the alphanumeric character recognition method, a set of unknown characters is received from an imaging system and a set of known characters is received from a memory device. A first set of pairs of characters is then created from the set of unknown characters and a second set of pairs of characters is created from the set of known characters. A matrix is generated having a plurality of cells, wherein each cell of the plurality of cells contains a pair of characters. The plurality of cells of the matrix are interrogated with the first set of pairs of characters to generate a first result, and the plurality of cells of the matrix are interrogated with the second set of pairs of characters to generate a second result. The first result is compared with the second result. A first predetermined action is taken if the first result matches the second result, while a second predetermined action is taken if the first result does not match the second result.
The apparatus for recognizing alphanumeric characters comprises a scanner for imaging a set of unknown characters, a memory device comprising a database including a set of known characters, and a processor. The processor generates a first set of pairs of characters from the set of unknown characters and a second set of pairs of characters from the set of known characters. The processor also generates a matrix having a plurality of cells, wherein each cell of the plurality of cells contains a pair of characters. The processor interrogates the plurality of cells of the matrix with the first set of pairs of characters to generate a first result and interrogates the plurality of cells of the matrix with the second set of pairs of characters to generate a second result. The processor then compares the first result with the second result, and takes a first predetermined action if the first result matches the second result and a second predetermined action if the first result does not match the second result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the various features and aspects of the method and apparatus for alphanumeric recognition and, together with the description, serve to explain its advantages and principles.
In the drawings:
FIGS. 3a and 3b are respective exemplary alphabetic and alphanumeric matrices used in the alphanumeric recognition method;
a and 4b illustrate exemplary comparisons of a matrix interrogation by unknown characters and matrix interrogations by known characters, showing where a match is found between the unknown mail matrix and known CoA matrix in
c illustrates an exemplary result generated by a matrix interrogation of the known CoA name “SMITH”;
d illustrates a comparison of the results from comparing
e illustrates a comparison of
Reference will now be made in detail to an implementation of the present invention as illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Although the method is described with respect to mail sorting process 100, the present method of alphanumeric recognition can be implemented in any process in which alphanumeric characters are recognized, and is not limited to postal applications. Provided that each character is recognized as one of a limited number of possibilities and there is a known set of characters with which to compare unknown characters, the possibilities for the unknown characters may be compared against the known set of characters and the unknown characters may be determined.
As shown in
Following imaging stage 102, the data assumed to represent a mail name enters interpretation stage 104, in which the OCR interprets each letter of the mail name as either a specific character or two alternate possible characters. A method utilizing more than three possible characters may be implemented, but as the number of possibilities increases, the number of iterations of pairs of characters used in interrogating a matrix also will increase, as will the time it takes to determine whether a match exists and to verify the correct mail name. In the described embodiment, it has been determined that identifying three possible characters for each letter of a mail name is suitable for the OCRs used in mail processing system 100, as the accuracy of the OCR identification of three possibilities is adequate. Any benefit in the described application of using more than three possibilities is outweighed by the time lost in performing additional matrix interrogations. In other applications, however, a greater or lesser number of possible characters might be used.
In stage 106, the possibilities for each character in the mail name are sent to Name Recognition System 300. Simultaneously, in stage 108, data assumed to represent the mail address from imaging stage 102 is sent to a ZIP+4 engine. The ZIP+4 engine takes a mail address and matches it against a ZIP+4 database. This ZIP+4 database contains ranges of addresses on streets within a city, state, and zip code. The ZIP+4 database attempts to standardize the elements of an address and ensure that the address exists in that city, state, and five-digit zip code. The database also provides a “plus 4” add-on zip code, a carrier identification number, and a two-digit delivery point code that are useful for the subsequent printing, coding, and sorting stages (114).
Mail address information from the ZIP+4 engine is processed by CoA database 200 to determine if the mail address matches CoA address information stored in CoA database 200 (stage 202). Stored information may include first and last names, old and new addresses, and dates of moves provided on CoA forms and entered into CoA database 200. If the mail address matches address information from a CoA form in CoA database 200 as determined at stage 202, then the name or names in CoA database 200 associated with the mail address are sent to Name Recognition System 300, as shown in stage 204. If there is no CoA address in CoA database 200 that matches the mail address in stage 202, presumably the mailpiece is addressed correctly and the mailpiece is sent downstream for printing, coding, sorting, and further processing (stage 114).
If there is a match in stage 202, Name Recognition System 300 uses the mail name information from stage 106 and the CoA name information from stage 204 to create pairs of characters and interrogate an alphanumeric matrix (stage 303), a process described in more detail with respect to
An example, for demonstrative purposes only, is useful for describing the alphanumeric recognition method utilized by Name Recognition System 300. Assume postal patrons John Smith and Jane Doe have submitted CoA forms and the information has been entered into the CoA database. John Smith moved from an address in Beverly Hills, Calif. to an address in Long Beach, Calif., while Jane Doe moved from the same address in Beverly Hills, Calif. to an address in San Diego, Calif.
Following the method shown in
The ZIP+4 engine standardizes the Beverly Hills address by ensuring that the city, state, and zip code correspond to one another, and sends the Beverly Hills address to CoA database 200. CoA database 200 contains CoA information from forms submitted by postal patrons such as John Smith. In stage 202, the database is queried for the Beverly Hills address. If a match is found, the name or names in the CoA database 200 corresponding to the Beverly Hills address, in the present example SMITH and DOE, are sent in stage 204 to Name Recognition System 300.
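The lookup in stages 202 and 204 can be pictured with the following minimal sketch. It assumes a simple in-memory dictionary standing in for CoA database 200; the record layout, the placeholder address strings, and the function name `coa_names_for` are all hypothetical.

```python
# Hypothetical in-memory stand-in for CoA database 200, keyed by the
# standardized old address. Street details are placeholders only.
COA_RECORDS = {
    "OLD ADDRESS, BEVERLY HILLS CA": [
        {"name": "SMITH", "new_address": "NEW ADDRESS, LONG BEACH CA"},
        {"name": "DOE", "new_address": "NEW ADDRESS, SAN DIEGO CA"},
    ],
}

def coa_names_for(address):
    """Stages 202/204: return the CoA names on file for an address.

    An empty list means no CoA record matches, so the mailpiece would go
    straight to printing, coding, and sorting (stage 114).
    """
    return [record["name"] for record in COA_RECORDS.get(address, [])]

print(coa_names_for("OLD ADDRESS, BEVERLY HILLS CA"))
print(coa_names_for("SOME OTHER ADDRESS"))
```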
Utilizing the alphanumeric recognition method, Name Recognition System 300 accelerates the time required to determine if a mailpiece requires a change of address to be applied. As noted, it is contemplated that the alphanumeric recognition method may be used in any process in which alphanumeric characters are recognized. Provided that each character is recognized as one of a limited number of possibilities and there is a known set of characters with which to compare the unknown characters, the method disclosed may be used to compare these possibilities against the known set of characters, and the unknown characters may be determined.
Continuing with the SMITH example for demonstrative purposes, the alphanumeric recognition method is described in detail with respect to
After sensing each of the letters in “SMITH” as a “primary” letter and two alternate possibilities in stage 104, the mail name information is sent to Name Recognition System 300, shown in
From the possible characters, Name Recognition System 300 employs algorithms to create pairs of characters from the mail name (stage 106) and interrogate a matrix, shown in
At this point in the alphanumeric recognition method, the stage of matrix interrogation begins (stage 303 in
As shown using the SMITH example and consistent with the present alphanumeric character recognition method,
a is the result generated by a matrix interrogation of the unknown character pairs from the mail name (stage 302). Cells of the 27×27 matrix that correspond to each of the 44 character pairs #C, #S, #B, CW, CV, CM, SW, SV, SM, BW, BV, BM, WL, WJ, WI, VL, VJ, VI, ML, MJ, MI, etc. from the unknown mail name are checked off on one matrix, indicated in
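As a minimal sketch of how the unknown character pairs might be produced, the code below pairs every candidate at one position with every candidate at the next, using “#” as the word-boundary symbol. The candidates for the first three letters follow the pairs listed above; the alternates for “T” and “H” are hypothetical, since the listing above ends in “etc.”, and the function name `unknown_pairs` is an assumption.

```python
def unknown_pairs(candidates, boundary="#"):
    """Build the set of character pairs from per-position OCR candidates.

    A boundary column is placed before the first and after the last
    position, so pairs such as '#S' and 'H#' are produced.
    """
    columns = [(boundary,)] + [tuple(c) for c in candidates] + [(boundary,)]
    pairs = set()
    for left, right in zip(columns, columns[1:]):
        for a in left:
            for b in right:
                pairs.add(a + b)
    return pairs

mail_candidates = [
    ("C", "S", "B"),
    ("W", "V", "M"),
    ("L", "J", "I"),
    ("T", "I", "F"),  # hypothetical alternates for T
    ("H", "N", "M"),  # hypothetical alternates for H
]
mail_pairs = unknown_pairs(mail_candidates)
print(sorted(mail_pairs))
```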
b is the result generated by a matrix interrogation of the known CoA name “SMITH.” The cells of the 27×27 matrix that correspond to each of the character pairs from the known CoA name SMITH (#S, SM, MI, IT, TH, H#) are checked off this matrix, indicated in
c is the result generated by a matrix interrogation of the second, known CoA name “DOE.” Recall that John Smith and Jane Doe shared a Main Street, Beverly Hills address before each moved to a separate address. The cells of the 27×27 matrix that correspond to each of the character pairs from the known CoA name DOE (#D, DO, OE, E#) are checked off this matrix, indicated in
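The character pairs for a known name can be derived by sliding a two-character window over the name padded with the boundary symbol “#”. The following is a minimal sketch; the function name `known_pairs` is an assumption.

```python
def known_pairs(name, boundary="#"):
    """Return the consecutive character pairs of a known name.

    The name is padded with the boundary symbol on both sides so that the
    first and last letters each contribute a boundary pair.
    """
    padded = boundary + name.upper() + boundary
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

print(known_pairs("SMITH"))  # ['#S', 'SM', 'MI', 'IT', 'TH', 'H#']
print(known_pairs("DOE"))    # ['#D', 'DO', 'OE', 'E#']
```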
d is a comparison of the results from the matrix interrogation of character pairs from the unknown mail name possibilities against the results from the matrix interrogation of character pairs from the known name “SMITH.” In other words,
Similarly,
A match occurs when some of the cells “checked off” during the matrix interrogation by the unknown mail name character pairs (stage 302) are the same as the cells “checked off” during the matrix interrogation by the CoA name character pairs (stage 304). Matches are indicated in
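One way to picture the matrix interrogation and comparison is the sketch below. It models the 27×27 matrix as a grid of booleans indexed by “#” plus the 26 letters. The match rule used here, that every cell checked off by the known name must also be checked off by the unknown mail name, is an assumption about how the comparison is decided, and the abbreviated `mail_pairs` set is illustrative rather than the full set of pairs.

```python
import string

SYMBOLS = "#" + string.ascii_uppercase  # 27 symbols give a 27x27 matrix
INDEX = {ch: i for i, ch in enumerate(SYMBOLS)}

def interrogate(pairs):
    """Check off the cell (first character, second character) for each pair."""
    size = len(SYMBOLS)
    grid = [[False] * size for _ in range(size)]
    for first, second in pairs:
        grid[INDEX[first]][INDEX[second]] = True
    return grid

def is_match(unknown_grid, known_grid):
    """Assumed rule: every cell checked in known_grid is checked in unknown_grid."""
    return all(
        unknown_grid[r][c]
        for r, row in enumerate(known_grid)
        for c, checked in enumerate(row)
        if checked
    )

# Abbreviated, illustrative subset of pairs from the unknown mail name.
mail_pairs = {"#C", "#S", "#B", "CW", "SM", "BV", "WL", "MI", "VJ",
              "IT", "LT", "TH", "TN", "H#", "N#"}
mail_grid = interrogate(mail_pairs)

print(is_match(mail_grid, interrogate(["#S", "SM", "MI", "IT", "TH", "H#"])))  # True
print(is_match(mail_grid, interrogate(["#D", "DO", "OE", "E#"])))              # False
```

Under this rule, each known name costs one pass over its own handful of cells rather than a comparison against every permutation of the unknown name.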
If there is a match when the “unknown” matrix is compared against the “known” matrices using the alphanumeric character recognition method in stage 308, the new (Long Beach) address for the postal customer (John Smith) from CoA database 200 is applied, as discussed previously with respect to stage 112 in
At stage 502, a set of unknown characters, such as the primary characters in a mail name written on a mailpiece and the two possible alternate characters for each primary character, is received from an imaging or scanning system, such as an OCR. At stage 504, a set of known characters, such as a name submitted on a CoA form, is received from a memory device, such as a hard drive, read only memory (“ROM”) or other medium storing records, such as the CoA database.
A first set of pairs of characters is then created in stage 506, as previously described with respect to
A matrix is generated in stage 510 that has a plurality of cells. The matrix may be alphabetic (
At stage 512, the plurality of cells of the matrix are interrogated with the first set of pairs of characters to generate a first result (
The first result is compared with the second result at stage 516.
A first predetermined action is taken if the first result matches the second result, as shown at stage 518. For example, if the matrices for the known CoA name and the unknown mail name match, a new address is applied to the mailpiece and it is ultimately sorted and delivered to the new address. Meanwhile, a second predetermined action is taken at stage 520 if the first result does not match the second result. For example, if the matrices for known CoA names and an unknown mail name do not match, a new address is not sprayed on the mailpiece because the USPS assumes there was no CoA form submitted, the mailpiece is sent for further sorting, and ultimately it is delivered to the address written on its face.
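The overall flow, from receiving the characters through taking one of the two predetermined actions, might be sketched as follows. The sketch represents each interrogation result as a set of checked-off cells rather than a full matrix, treats a match as every known-name cell also being present in the unknown result (an assumption), and uses hypothetical names, record layout, candidate letters, and placeholder addresses throughout.

```python
def pairs_from_candidates(candidates, boundary="#"):
    """First set of pairs: all adjacent-position candidate combinations."""
    cols = [(boundary,)] + [tuple(c) for c in candidates] + [(boundary,)]
    return {a + b for left, right in zip(cols, cols[1:]) for a in left for b in right}

def pairs_from_name(name, boundary="#"):
    """Second set of pairs: consecutive characters of the known name."""
    padded = boundary + name.upper() + boundary
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def address_to_apply(candidates, coa_records, written_address):
    """Return the address that would be sprayed on the mailpiece."""
    mail_cells = pairs_from_candidates(candidates)      # stages 502, 506, 512
    for record in coa_records:                          # stage 504
        known_cells = pairs_from_name(record["name"])   # second result
        if known_cells <= mail_cells:                   # stage 516 (assumed rule)
            return record["new_address"]                # stage 518
    return written_address                              # stage 520

# Hypothetical inputs: OCR candidates for "SMITH" and two CoA records.
candidates = [("C", "S", "B"), ("W", "V", "M"), ("L", "J", "I"),
              ("T", "I", "F"), ("H", "N", "M")]
records = [
    {"name": "DOE", "new_address": "NEW ADDRESS, SAN DIEGO CA"},
    {"name": "SMITH", "new_address": "NEW ADDRESS, LONG BEACH CA"},
]
print(address_to_apply(candidates, records, "OLD ADDRESS, BEVERLY HILLS CA"))
```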
ZIP+4 engine 606 takes a mail address and matches it against a ZIP+4 database. As previously discussed, the ZIP+4 database attempts to standardize the elements of an address to ensure that the particular address exists in the particular city, state, and five digit zip code. The ZIP+4 database also provides a “plus 4” add-on zip code, a carrier identification number, and a two-digit delivery point code that are needed for subsequent sorting stages.
CoA database 200 processes mail address information from ZIP+4 engine 606 to determine if the mail address matches CoA information stored in CoA database 200. As discussed with respect to
If there is a match, processor 610 generates one set of pairs of characters from the mail name imaged by OCR 604 and generates another set of pairs of characters from the CoA name or names sent by CoA database 200. Processor 610 also generates an alphanumeric matrix of character pairs, as previously described with respect to
If the result of the mail matrix interrogation matches one of the results from the CoA matrix interrogation, then a printer 612 sprays a new address on mailpiece 602. The new address, which is obtained from CoA database 200, is the new address corresponding to the CoA name that matches the mail name and the old mail address. Once the new address is applied, mailpiece 602 continues downstream to subsequent components of the sorting system for further sorting to its final destination, the new address.
If the result of the mail matrix interrogation does not match any of the results from the CoA matrix interrogation, printer 612 does not spray a new address, and mailpiece 602 continues downstream to subsequent components of the sorting system for further sorting to its final destination, the old mail address that the sender wrote on the face of mailpiece 602.
Additional exemplary predetermined actions involve other types of imaged information. For example, imaging systems, such as cameras, are used at traffic signals and toll booths to photograph license plates of vehicles involved in traffic violations. The image of the license plate is read by a scanner. Using character pair creation and matrix interrogation, the license plate number is matched against a database of known license plates, names of vehicle owners, and their addresses. If a match exists, a citation could be sent to the vehicle owner. Similarly, the method of alphanumeric recognition can be used for any imaged information that can be compared against a known standard, including but not limited to addresses, part numbers, product and bar codes, and VIN numbers.
Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the method disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the method being indicated by the following claims.
The present application is related to and claims the priority of U.S. Provisional Application No. 60/255,435, filed Dec. 15, 2000, in the name of Robert Snapp, and entitled Method for Name Recognition for Optical Character Readers, the entire contents of which are fully incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US01/47944 | 12/13/2001 | WO | 00 | 6/13/2003 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO02/48953 | 6/20/2002 | WO | A |
Number | Date | Country | |
---|---|---|---|
20040032986 A1 | Feb 2004 | US |
Number | Date | Country | |
---|---|---|---|
60255435 | Dec 2000 | US |