There are a number of reasons for wanting to ensure that mailing lists are as accurate as possible. First, a mailer wishes to make sure that the mail reaches the intended recipient so that the intended communication can be delivered. The mailer's expense of preparing a mail piece and the postage costs are wasted when a faulty address prevents delivery. Further, the Postal Service incurs additional expenses in processing and returning undeliverable mail. Thus, it is in the interest of mailers and the Postal Service (or other delivery service) to ensure that mailing lists are as accurate as possible.
There are several steps that can be taken to ensure that mailing lists are accurate and up-to-date. Mailers can apply address hygiene software to their lists to ensure that individual addresses are in the proper, postal-approved format. If non-standard abbreviations or address components are used, then postal automation devices may not be able to interpret the information for sorting. Hygiene software can also add four-digit ZIP code extensions to facilitate postal processing. Data is available to validate that a particular address is actually on the master list of addresses to which the Postal Service can deliver. Other data and software are available to incorporate the latest recipient move updates, as provided to the Postal Service, and to incorporate the latest information on undeliverable mail from previous mailings.
Data and application software for updating and correcting mailing lists are typically copied onto CDs and sent to mailers under a software subscription business model. In some cases, it is also known to upload mailing lists to a remote computer that provides address list correction under a service-based model.
The present invention enhances the service based model of providing remote address cleansing. In this model, mailers are able to upload their address lists to a remote computer and to select what services they want performed on the list. The remote computer processes the lists, and a corrected list is downloaded back to the mailer.
One difficulty with this model is that the format and content of the data sent by mailers can vary greatly. The remote computer needs to be able to recognize what it is receiving in order to perform the correct processing. Mailers may be required to identify or verify the nature of the data that they are sending. The present invention simplifies that process and adds intelligence to assist the mailer in verifying the profile of the data that they are sending. An alternative approach, not contemplated within the scope of the invention, would require mailers to pre-process their lists to conform to a uniform format. The pre-formatting approach does not allow the flexibility and convenience achieved using the present invention.
A plurality of address file hash values are stored and associated with a plurality of known address data file profiles. An uploaded address file is received at the processing site from a sender who wishes to have his address list processed. A received address data file profile is identified for the uploaded address file. A first hash value is calculated based on the identified received address data file profile. The first hash value is compared with the stored plurality of address file hash values. If the first hash value matches one of the stored plurality of hash values, then the known address data profile of the matching stored hash value is associated with the uploaded address file. If the first hash value does not match any of the stored plurality of hash values, then a new address file profile is prepared, a new hash is generated of the new profile, and the new profile is stored along with the associated new hash.
Address data profiles may comprise address data file formats and data field structure. The “format” of the data file refers to the type of database and tables that the sender uses, and the overall structure in which the data is stored. “Data field structure” refers to the particular characteristics of data stored in the various columns of the database. For example, a first column that is an integer with a maximum length of 6 characters and a second column that is text with a maximum length of 20 characters are examples of data field structures. The step of identifying the received address data file profile may include identifying a received address data file format and received data field structure. The step of calculating the first hash value may include calculating based on the received address data file format and received data field structure.
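As an illustration of what a data field structure might look like in practice, the following minimal sketch derives a type and maximum length for each column of an uploaded table. The function names `describe_column` and `describe_table`, the two-way "integer"/"text" classification, and the dictionary layout are assumptions made for this example; they are not taken from the specification.

```python
import re

INTEGER_PATTERN = re.compile(r"-?\d+")

def describe_column(values):
    """Summarize one column as a data field structure (type and max length)."""
    texts = [str(v) for v in values]
    # A column counts as "integer" only if every value is a whole number.
    is_integer = bool(texts) and all(INTEGER_PATTERN.fullmatch(t) for t in texts)
    return {
        "type": "integer" if is_integer else "text",
        "max_length": max((len(t) for t in texts), default=0),
    }

def describe_table(rows):
    """Describe every column of a table given as a list of equal-length rows."""
    columns = zip(*rows)  # transpose rows into columns
    return [describe_column(column) for column in columns]

rows = [("123456", "Main Street"), ("42", "Elm Rd")]
structure = describe_table(rows)
```

Here the first column is reported as an integer with a maximum length of 6 and the second as text with a maximum length of 11, which mirrors the kind of characteristics described above.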
In some embodiments of the invention, the sender can be queried to confirm that certain data fields are being properly interpreted. Such embodiments may also include the ability to automatically analyze characteristics of data in data fields to determine if the data can be recognized as pertaining to a known type of address data field. The data fields are automatically identified based on the analyzed characteristics. The sender may then be queried as to whether they agree with the automatically identified data fields.
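The automatic analysis could, for instance, compare the values in a field against small reference lists and patterns and propose a field type when most values match. The sketch below is only one possible approach: the reference sets, the 80% threshold, the ZIP code pattern, and the function name `guess_field_type` are all illustrative assumptions, and the result is a tentative guess that the sender would still be asked to confirm.

```python
import re

# Small illustrative reference data; a real service would use complete lists.
STATE_ABBREVIATIONS = {"CT", "NY", "CA", "TX", "MA"}
STREET_WORDS = {"road", "street", "drive", "avenue", "rd", "st", "dr", "ave"}
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def looks_like_street(value):
    # True if any word in the value is a known street word ("Rd." counts as "rd").
    return any(word.strip(".").lower() in STREET_WORDS for word in value.split())

# Checks are tried in order; the first one that most values satisfy wins.
CHECKS = [
    ("zip", lambda v: ZIP_PATTERN.fullmatch(v) is not None),
    ("state", lambda v: v.upper() in STATE_ABBREVIATIONS),
    ("street", looks_like_street),
]

def guess_field_type(values, threshold=0.8):
    """Return a tentative field type, or None if nothing matches well enough."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for name, predicate in CHECKS:
        share = sum(1 for v in cleaned if predicate(v)) / len(cleaned)
        if share >= threshold:
            return name
    return None
```

A field that returns `None` would simply be left for the sender to identify manually.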
Once all of the data fields are properly identified using the invention, the system can proceed with providing services such as address verification and cleansing on the uploaded address file. The calculated hash for a particular data file may also incorporate the type of service to be performed, since the ability to reuse previously identified profiles might depend on whether those profiles are applicable to different services.
When a sender decides that a previously defined address data file profile needs to be changed, the updated information can be entered and a new hash value can be calculated and stored for future use.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate presently preferred embodiments of the invention, and together with the general description given above and the detailed description of the preferred embodiments given below, serve to explain the principles of the invention.
The field matching step 11 addresses the problem of varying data types and formats of different mailers, as described above. In this step, the various fields in the data tables are identified, so that the appropriate processing can be applied to those fields for address correction. The enhancement described herein allows a variety of formats to be submitted to the address correction service, and relieves the sender of the data of some of the burden of making sure that fields are properly identified by the service for processing.
At step 12, the processing job is performed on the uploaded data and a corrected data file is generated. The results can be reviewed by the mailer at step 13. At the checkout step 14, the corrected data file is downloaded back to the mailer, and the transaction is finalized by providing a job detail report (step 15).
The uploading process may also include steps for analyzing the mailer's file data to try to make an educated guess as to what category of information is in a given field. This process is referred to as automatic field identification. For example, a field can be compared against a list of cities, a list of states and state abbreviations, or a list of words like “road,” “street,” or “drive,” to determine whether the information in that field appears to match one of the required fields. If the data field appears to match one of the required categories, then it can be tentatively identified as such, pending user verification, as depicted in
As seen in
The functionality of the “Browse” button 23 is further depicted in
In operation, the process begins with uploading a file for processing at step 50. A hash is calculated at step 51 based on the profile of the uploaded file. The input for the hash algorithm may be the database format of the file, field identifications, number of fields, and field properties of the fields. Any known hash algorithm can be applied, the only criterion being that there should be a very low probability that any two different address file profiles will result in the same hash. The more data that is input into the hash algorithm, the less likely it will be that there will be a false match. Accordingly, mail file profiles should include as many details about the data fields as possible. An advantage of hash algorithms is that any difference in the input profile will, with very high probability, result in a completely different hash value being output. The calculated hash is stored in a stored file 52 with the uploaded file.
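The hash calculation of step 51 can be sketched as follows. The specification permits any known hash algorithm; SHA-256 and the JSON serialization used here are choices made for this example only, and the function name `profile_hash` and the field dictionary keys are hypothetical. The profile is serialized in a canonical form (sorted keys, fixed separators) so that two identical profiles always produce the same input bytes.

```python
import hashlib
import json

def profile_hash(database_format, fields):
    """Hash a file profile built from its format and field descriptions."""
    profile = {
        "database_format": database_format,
        "field_count": len(fields),
        "fields": fields,  # field order is part of the profile
    }
    # Canonical serialization: the same profile always yields the same bytes.
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fields = [
    {"name": "cust_id", "type": "integer", "max_length": 6},
    {"name": "street", "type": "text", "max_length": 20},
]
original = profile_hash("csv", fields)

# Changing a single field property produces a different hash.
changed_fields = [dict(fields[0]), dict(fields[1], max_length=25)]
changed = profile_hash("csv", changed_fields)
```

Because the hash depends on every detail supplied, including more field properties in the profile reduces the chance that two genuinely different files are treated as the same.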
At step 53, it is determined whether the calculated hash from step 51 matches any hashes that have been calculated and stored from previous jobs. Hashes from previous jobs are stored in association with their corresponding data file profiles. If there is no match, then the new hash and the profile of the new uploaded file are stored in the system (step 58) for future comparison. If an existing match is found for the calculated hash, then the profile for the preexisting match can be applied to the new file, and the mailer's fields corresponding to the system required fields are automatically identified, with little or no input from the mailer.
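The comparison at step 53 amounts to a lookup keyed by the hash value. The following minimal sketch assumes an in-memory dictionary as the store and a hypothetical function name `apply_or_store`; a real service would presumably use persistent storage, which the specification does not detail.

```python
def apply_or_store(digest, profile, store):
    """Look up a calculated hash among hashes stored from previous jobs.

    Returns (profile_to_use, matched). On a match, the previously stored
    profile is reused; otherwise the new hash and profile are stored for
    future comparison (step 58).
    """
    existing = store.get(digest)
    if existing is not None:
        return existing, True
    store[digest] = profile
    return profile, False

store = {}
first_profile = {"fields": {"col1": "street", "col2": "city"}}

# First upload: no match, so the hash and profile are stored.
profile, matched = apply_or_store("abc123", first_profile, store)

# Later upload with the same hash: the stored profile is applied.
reused, matched_again = apply_or_store("abc123", {"fields": {}}, store)
```

In the second call the field mapping from the earlier job is returned, so the mailer's fields can be identified with little or no further input.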
The system also provides that modified hashes can be calculated based on additional mapping done by the mailer to further refine and correct the identification of fields. At step 54, if it is determined that the preexisting hash is a modified hash, then it is known that the mailer has provided the additional mapping, and no further action needs to be taken. If the matching hash is an original hash, then step 55 checks to see if there is any additional mapping by the mailer to modify the file. If there is no additional mapping, then the process is done. If additional mapping is done, then a modified hash is calculated at step 56, using the same hashing algorithm, and the modified hash is stored with the associated mapping profile (step 57), before the process is finished.
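The decision logic of steps 54 through 57 can be sketched as shown below. The specification does not state exactly what is fed into the modified hash, so hashing the profile together with the merged mapping is an assumption of this example, as are SHA-256, the `StoredEntry` class, and the function names.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class StoredEntry:
    mapping: dict              # mailer field name -> required field name
    is_modified: bool = False  # True once the mailer has refined the mapping

def digest_of(profile, mapping):
    # Assumption: the modified hash covers both the profile and the mapping.
    payload = json.dumps({"profile": profile, "mapping": mapping}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def finish_job(store, profile, matched_hash, extra_mapping):
    """Return the hash that should represent this job after field matching."""
    entry = store[matched_hash]
    if entry.is_modified:
        return matched_hash   # step 54: additional mapping already recorded
    if not extra_mapping:
        return matched_hash   # step 55: nothing new from the mailer
    merged = {**entry.mapping, **extra_mapping}
    new_hash = digest_of(profile, merged)                     # step 56
    store[new_hash] = StoredEntry(merged, is_modified=True)   # step 57
    return new_hash
```

The original entry is left in place in this sketch, so a later job that matches only the original hash still finds the unrefined mapping.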
Another exemplary profile component could be an identification of the address correction services to be done on the file. For example, different services might have different required fields. If a mapping for a previous job did not require matching of a particular field, it may be desired to do a more intensive manual matching before relying on an automated one.
The profile components 60-63 are input into a hash algorithm, which outputs a value that is, for practical purposes, unique to that combination of components. That value is stored as a stored hash value 65 in association with the mapping of the data fields to the required fields for successful address correction processing.
While the present invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiment, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.