1. Field of the Invention
The present invention relates to a process of collecting and enhancing commercial data and, more particularly, to quality assurance and five quality drivers.
2. Description of the Related Art
To be successful, businesses need to make informed decisions. In risk management, businesses need to understand and manage total risk exposure. They need to identify and aggressively collect on high-risk accounts. In addition, they need to approve or grant credit quickly and consistently. They also need to verify prospect, customer and supplier data to ensure compliance with government regulations. In sales and marketing, businesses need to determine the most profitable customers and prospects to target, as well as incremental opportunity in an existing customer base. They need to understand who and how big their most important customers are, acquire new high-growth customers that look like their best customers and reallocate their sales force based on growth and opportunity. In supply management, businesses need to understand the total amount being spent with suppliers to negotiate better. They also need to uncover risks and dependencies on suppliers to reduce exposure to supplier failure.
The success of these business decisions depends largely on the quality of the information behind them. Quality is determined by whether the information is accurate, complete, timely, and Cross-Border Consistent. Accuracy is defined as having the right information on the right business. Completeness is defined as providing breadth and depth of data. Timeliness is making frequent updates to keep the information fresh. Cross-Border Consistency is providing consistent data across the globe. With thousands of sources of data available, it is a challenge to determine which is the quality information a business should rely on to make decisions. This is particularly true when businesses change so frequently. In the next 60 minutes in the U.S., 251 businesses will have a suit, lien, or judgment filed against them, 58 business addresses will change, 246 business telephone numbers will change or be disconnected, 81 directorship (CEO, CFO, etc.) changes will occur, 41 new businesses will open their doors, 7 corporations will file for bankruptcy, and 11 companies will change their name.
Conventional methods of providing business data are incomplete. Some providers collect incomplete data, fail to completely match entities, have incomplete numbering systems that recycle numbers, fail to provide corporate family information or provide incomplete corporate family information, and merely provide incomplete value-added predictive data. It is an object of the present invention to provide more complete, timely, accurate, and consistent business data. This includes data collection, entity matching, identification number assignment, corporate linkage, and predictive indicators. This produces high quality business information that provides insights so businesses can trust and decide with confidence.
One aspect of the present invention is a method of data integration comprising collecting information comprising primary data. The primary data is tested for accuracy and processed to produce secondary data and enhanced information comprising the primary data and the secondary data is provided. In some embodiments, primary and/or secondary data is sampled periodically thereby generating sample data. The sample data is evaluated against at least one predetermined condition. Based upon this evaluation, testing and/or processing steps are adjusted.
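The sampling and adjustment steps described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the invention: the record layout (a dictionary with a "correct" flag set by a reviewer), the 5% error tolerance, the integer threshold scale, and all function names are hypothetical.

```python
import random


def draw_sample(records, k, rng):
    """Draw up to k records at random for periodic review."""
    return rng.sample(records, min(k, len(records)))


def error_rate(sample):
    """Fraction of sampled records that a reviewer marked as incorrect."""
    if not sample:
        return 0.0
    return sum(1 for record in sample if not record["correct"]) / len(sample)


def adjust_threshold(current, observed_error, max_error=0.05, step=1, ceiling=10):
    """Evaluate the sample against a predetermined condition (max_error).

    If the observed error rate exceeds the tolerance, the acceptance
    threshold used by the testing step is raised by one step, up to a
    ceiling. Otherwise the threshold is left unchanged.
    All numeric values here are assumed for illustration.
    """
    if observed_error > max_error:
        return min(current + step, ceiling)
    return current
```

In this sketch a periodic job would call draw_sample on recently processed records, compute error_rate on the reviewed sample, and feed the result to adjust_threshold so that later testing is stricter when quality slips.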
In some embodiments, testing comprises at least one of the following steps: (a) determining if the primary data matches stored data and (b) assigning an identification number to the primary data. If the primary data does not match the stored data in step (a), it is determined whether the primary data meets a first threshold condition before an identification number is assigned in step (b). The first threshold condition is that multiple sources confirm that a business associated with the primary data exists. The identification number is an entity identifier. If the primary data does not meet the first threshold condition, it is stored in a separate repository and assigned an identification number. When additional primary data is received, it is determined whether the primary data and the additional primary data together meet the first threshold condition; if so, the entity is moved into the multi-source repository.
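One way to picture the threshold condition and the two repositories is the following minimal Python sketch. It makes several assumptions that are not in the specification: "multiple sources" is taken to mean at least two distinct sources, matching is reduced to a normalized name key, and the class, field, and source names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

MIN_SOURCES = 2  # assumption: "multiple sources" means at least two distinct sources


@dataclass
class Entity:
    id_number: int
    name: str
    sources: set = field(default_factory=set)


class Repositories:
    """Toy model of a separate (single-source) and a multi-source repository."""

    def __init__(self):
        self._next_id = count(1)   # monotonically increasing, so numbers are never reused
        self.single_source = {}    # entities not yet confirmed by multiple sources
        self.multi_source = {}     # entities confirmed by multiple sources

    def ingest(self, name, source):
        key = " ".join(name.lower().split())  # stand-in for real entity matching
        if key in self.multi_source:
            entity = self.multi_source[key]
            entity.sources.add(source)
            return entity
        if key in self.single_source:
            entity = self.single_source[key]
            entity.sources.add(source)
            # Additional data arrived: re-check the threshold and promote if met.
            if len(entity.sources) >= MIN_SOURCES:
                self.multi_source[key] = self.single_source.pop(key)
            return entity
        # No match in stored data: assign a new number and hold the record
        # in the separate repository until more sources confirm it.
        entity = Entity(next(self._next_id), name, {source})
        self.single_source[key] = entity
        return entity
```

In this sketch, a business first seen in one source keeps the same identification number when a second source later confirms it and the record is moved to the multi-source repository.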
Another aspect of the present invention is a system for data integration. The system includes a data generator, a testing unit, a first processing unit, and a second processing unit. The data generator is capable of gathering primary data from at least one data source. The testing unit is capable of testing the primary data for accuracy. The first processing unit is capable of analyzing the primary data and generating secondary data from the result of the analysis. The second processing unit is capable of merging the primary data and the secondary data to form enhanced information. The testing unit, first processing unit, and the second processing unit may be the same or independent of one another. In some embodiments, the testing unit comprises at least one of a data matching unit and entity identifier unit. The first processing unit comprises at least one of a corporate linkage unit and a predictive indicator unit.
Another aspect of the present invention is a machine-readable medium for storing executable instructions for data integration. The instructions include collecting information comprising primary data, testing the primary data for accuracy, processing the primary data to produce secondary data, and providing enhanced information comprising the primary data and the secondary data.
In some embodiments, the primary and/or secondary data is sampled periodically, thereby generating sample data. The sample data is evaluated against at least one predetermined condition. The testing and/or processing is adjusted based upon the evaluation.
These and other features, aspects, and advantages of the present invention will become better understood with reference to the drawings, description, and claims.
In the following detailed description, reference is made to the accompanying drawings. These drawings form a part of this specification and show, by way of example, specific preferred embodiments in which the present invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present invention. Other embodiments may be used and structural, logical, and electrical changes may be made without departing from the spirit and scope of the present invention. Therefore, the following detailed description is not to be taken in a limiting sense and the scope of the present invention is defined only by the appended claims.
Data collection driver 108 brings together data from a variety of sources worldwide. Then, the data is integrated into database 118 through entity matching driver 110, resulting in a single, more accurate picture of each business entity. Next, identification number driver 112 applies an identification number as a unique means of identifying and tracking a business globally through any changes it goes through. Corporate linkage driver 114 then builds corporate families to enable a view of total corporate risk and opportunity. Finally, predictive indicators driver 116 uses statistical analysis to rate a business' past performance and indicate the likelihood of a business to perform in a specific way in the future.
For data collection 108, a very large amount of global data is collected from a variety of sources for increased accuracy. Quality assurance 400 is performed for data collection 108 to verify legal name and ownership to identify potential fraud, to update contact information, to update and make changes based on events, to verify and enhance third party information, and to ensure accuracy, completeness, timeliness and cross-border consistency. Quality assurance 400 continually refines and enhances data collection 108.
For entity matching 110, incoming data is matched to data in database 118. Quality assurance 402 is performed for entity matching with manual and automated quality checks to ensure accurate matches and eliminate duplicates. Based on customer feedback and matching learnings, quality assurance 402 for entity matching 110 is continually refined and enhanced.
For identification number 112, businesses are uniquely identified and tracked. Quality assurance 404 is performed for identification number 112 by retaining an identification number for the life of a business and by being recognized as an industry standard. The identification number allows verification of information in each of the five drivers. For data collection 108, if data is not linked to an identification number, it indicates the possibility of a new business. For entity matching 110, the identification number allows new data to be accurately matched to existing businesses. For corporate linkage 114, corporate families are assembled based on each business' identification number. For predictive indicators 116, numbered data is used to build predictive tools. A verification process assigns an identification number when commercial activity is confirmed. Quality assurance 404 for identification number 112 includes validating and protecting against duplication. The identification number assignment process is continually refined and enhanced.
For corporate linkage 114, corporate families are built to provide a view of total risk and opportunity. Quality assurance 406 for corporate linkage 114 includes building corporate families globally and updating them after mergers, acquisitions, and other events. Quality assurance 406 for corporate linkage 114 includes increasing completeness and accuracy of corporate families by having a dedicated team review corporate families and by matching corporate families. Based on customer feedback, the corporate linkage 114 is continually refined and enhanced.
For predictive indicators 116, statistical analysis is used to indicate the likelihood of a business to perform in a specific way in the future. Quality assurance 408 for predictive indicators 116 includes continually monitoring and adjusting predictive indicators 116 to reflect new information. Based on customer feedback, the predictive indicators 116 are continually refined and enhanced.
Thus, the five main components or drivers work together to integrate the data collected into quality information 106 that is useful for making business decisions. The process is continually enhanced to improve quality based on feedback, learnings and experience spanning the past 160 years. Each of the five drivers is examined in more detail below, starting with data collection driver 108.
Global Data Collection
In an example database 118, top news providers are monitored every day to uncover changes and updates that affect the risk level and/or marketing attributes of the user's customers, prospects, and suppliers. This data is focused on publicly traded companies with additional coverage devoted to mergers and/or acquisitions and high risk or business deterioration. News is posted within 24 hours of release. The types of events include mergers and acquisitions, control changes, purchase or sale of assets, officer, name or location changes, earnings updates, and business closings. The benefit is updated information that affects the risk level of companies the user does business with and indications of key changes that can be used for marketing purposes.
In an example database 118, payment experiences from companies are collected to help the user predict future payment habits of prospects and customers. Accounts receivable data on U.S.-based companies provides an overall evaluation of how quickly and completely a company made payment to each vendor. Many reports created using database 118 include payment data. This payment experiences data has many benefits. Users get a picture of how a company is paying their vendors, bank loans, and other financial obligations. It enables showing payment trends over time. It enables creation of predictive scores for use in applications such as automated credit approvals. It helps pre-screen potential customers based on their ability to pay on time. Payment experiences are summarized to show the user how different industries are paid and credit limits.
An example database 118 has public records from U.S. courts and legal filing offices to provide critical insights into the risk of a company. This data includes U.S.-based company information on suits, liens, judgments, bankruptcies, and U.C.C. filings (collectively called public records), information obtained from courts and recording offices, company filing for bankruptcy protection under Chapter 11 (re-organization) or Chapter 7 (liquidation). This data captures a majority of the U.S. public filings and has many benefits. Over 10 years of historical coverage enable predictive credit ratings and scores. Users understand legal actions that could affect a company's ability to continue as an ongoing concern. A company's rating is negatively impacted when a bankruptcy takes place. Users are notified about all companies affected in a corporate family when a bankruptcy occurs within the corporate family.
In an example database 118, complete coverage of public company financial statements and many privately held company financial statements help the user to understand financial strength. This data includes balance sheet and income statements and private company financial statements collected from certified public accountants (CPAs) or from corporate officers. In the US, for example, public company financial information is obtained from the Securities and Exchange Commission (SEC) or annual reports, 10K's and 10Q's. The database 118 has complete coverage on public companies. Most financial statements are on privately-held companies. This data has many benefits. Users understand financial strength, ability to pay on time and ability to continue as an ongoing concern. This data helps target prospects by size or financial strength.
An example database 118 has data from telephone calls that verify and enhance the third party information, leading to over one and one-half million updates to the database 118 every day. This data includes interviews with business principals to verify and enhance information from other sources. Every public company is monitored daily. There is a focus on collecting value-added data (e.g., business name, address, telephone number, SIC, employee number, sales, CEO/owner name). This has many benefits. It serves as an additional check on the accuracy of the data, helps validate third party data, builds content on small businesses, and makes the data consistent across the globe. Consistency of data enables customers to rely on the same high quality of information country to country, creating opportunity for growth, consistency in credit and marketing policies globally, understanding risk exposure, marketing opportunity and reliance on suppliers globally.
The URL file is collected from external and internal sources. Each URL is mined several times a year to confirm its status (live, parked, under construction, redirect, inactive) and verify that it belongs to the company it has been assigned to, using the name, address or telephone number from the existing database. Besides verification several times a year, additional data elements such as security data, certificate data, strength of encryption and other data are collected from the URL. The verified URLs are populated in the database using one-down linkage to expand coverage across family tree members.
In an example, telephone company data is collected to identify new businesses, changes in existing records and to provide updated contact information. Businesses request new listings when initiating phone service. The benefits of this data include indication of a new business or change in phone number and enabling creation of new records or enhancing existing ones, providing the most recent address, phone number, and line of business (SIC) information.
In an example, database 118 includes business registrations from state government registries to verify legal name and ownership to identify potential frauds. Database 118 has information on business registrations filed at the time a company is incorporated. This has many benefits. It enables verification of the existence of registered businesses, confirms information such as a company's organizational structure, date, and state of incorporation (or organization), aids fraud investigation through review of names and principals and business standing within a state, and identifies all changed records and new-to-file records.
Quality assurance 102 of database 118 ensures accuracy, completeness, timeliness, and cross-border consistency of global data. Quality assurance includes standardizing data, correcting and updating data, ensuring phone numbers connect and mailing addresses deliver to the intended recipient, and conducting manual reviews.
Quality assurance 102 includes standardizing data. Numerous quality edits and validations are made at the time of data entry. Data is validated to ensure consistency between branch and headquarter names, ensure reasonability between number of employees, sales volume and line of business, prevent duplication of records, validate out-of-business status changes and more. Global cleansing software is used to standardize marketable records and ensure consistency in presentation of records. Addresses are standardized before inclusion in the database.
Quality assurance 102 includes correcting and updating data. In an example, the status of suits, liens, judgments and bankruptcy filings is reviewed and updated. Data flows between internal teams to ensure information is consistently updated between areas of news, risk, ratings and delivery. Constantly updating and refreshing the data leads to high response rates on customer acquisition promotions, high match rates between files and high quality data in the database 118.
Quality assurance 102 includes manual reviews. Third party data is validated with manual reasonability reviews. Payment re-checks are manually performed on trade references appearing abnormal or exaggerated. Financial statements are reviewed to identify high risk businesses, ensure accuracy and apply capital strength ratings consistently across the universe of records. Comparisons of merger/acquisition update volumes are done with externally published numbers to ensure complete coverage.
Data is continually refined and enhanced through quality assurance 102 and global data collection 108.
Entity Matching
There are many benefits from entity matching driver 110. Entity matching driver 110 detects similarities in incoming data and combines it into a single business. Queries are more likely to be accurate, customer, supplier, and prospect information is consolidated to provide more complete and accurate profiles, and there are fewer duplicate records. In addition, the customer can receive information about the quality of their matched records via D&B's matching feedback mechanisms, allowing the customer to decide how to use the matched information in their business processes. Another benefit is that the customer receives a consistent answer as the matching process is repeatable and defined.
To ensure quality assurance 102 of entity matching 110, manual and automated checks are performed. Samples of matched records are manually reviewed. Based on experience, customer feedback and learnings, entity matching 110 is recalibrated. Entity matching 110 allows and corrects for variations in spelling, formats, trade names, addresses, and the like. Entity matching 110 uses a match grade and confidence code to determine if the match passes the quality threshold. Entity matching 110 provides a consistent, repeatable process that is not based on human judgment. The benefits are more accurate matches and fewer duplicates.
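The idea of a match grade and confidence code that gate a match against a quality threshold can be illustrated with a minimal Python sketch. The specification does not define how grades or codes are computed, so everything here is an assumption for illustration: the per-field letter grades, the 0 to 9 confidence scale, the cut-off values, and the use of the standard library's difflib.SequenceMatcher as the similarity measure.

```python
import re
from difflib import SequenceMatcher

FIELDS = ("name", "address", "phone")  # assumed fields compared per record


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace to absorb format variations."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def field_grade(a, b):
    """Grade one field: A = same or nearly so, B = similar, F = different."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    if ratio >= 0.9:
        return "A"
    if ratio >= 0.6:
        return "B"
    return "F"


def match_grade(inquiry, candidate):
    """Concatenate the per-field grades, e.g. 'AAB'."""
    return "".join(field_grade(inquiry[f], candidate[f]) for f in FIELDS)


def confidence_code(grade):
    """Collapse a match grade into one number (0 to 9 in this sketch)."""
    points = {"A": 3, "B": 2, "F": 0}
    return sum(points[g] for g in grade)


def passes_quality_threshold(inquiry, candidate, min_confidence=7):
    """A match is accepted only if its confidence code meets the threshold."""
    return confidence_code(match_grade(inquiry, candidate)) >= min_confidence
```

Because the decision depends only on the computed grade and code, the same inquiry and candidate always produce the same answer, which mirrors the repeatable, judgment-free process described above.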
Quality assurance 102 of entity matching 110 includes continually refining and enhancing entity matching 110 based on customer feedback. Samples of matched records are manually reviewed, and technology allows for corrections in spelling, formats, trade names, and addresses. Technology also interprets the context of key parts of the inquiry to better find difficult matches (e.g., interpreting parts of the sound, geographic position, implied line of business, and acronyms). Quality is also ensured by using a customized retrieval approach for each inquiry that looks at the best way to find a match to optimize the result for each unique inquiry (e.g., some matches are better made by using sound algorithms, while other matches are better made by using exact name matches). As enhancements are made, they become available both online and in batch systems to ensure consistency. The benefits of these improvements are increased search candidates, additional functionality and increased throughput. In other words, more hits, better hits, and better hits delivered faster. Matching capabilities include matches to a proprietary database containing multiple names and addresses per record, the ability to identify matches that don't look exactly like each other, and the ability to select by the quality of the match.
DUNS Number
Identification (ID) number driver 112 appends a unique identification number to every business location so it can be easily and accurately identified. This identification number is non-indicative. One example of the unique identification number is the D-U-N-S® Number available from Dun & Bradstreet headquartered in Short Hills, N.J., which is a nine-digit number that allows business locations to be easily tracked through changes and updates. The identification number is retained for the life of a business. No two business locations ever receive the same identification number and the identification numbers are never recycled. The identification number acts as an industry standard for business identification. It is endorsed by the United Nations, the European Commission, and over fifty industry groups.
The identification number is a central concept in the data processing method according to the present invention. For quality assurance, the identification number allows verification of information at every stage of the process. For data collection driver 108, if data is not linked to an existing identification number, it indicates the possibility of a new business. For entity matching driver 110, the identification number allows new data to be accurately matched to existing businesses. For corporate linkage driver 114, corporate families are assembled based on each business' identification number. For predictive indicators driver 116, the identification number is used to build predictive tools.
Additionally, the identification number opens new areas of opportunity to a user's business by helping to verify that a business exists and validating the business location. Users are provided a complete view of prospects, customers, and suppliers. Existing data is clarified, duplication is identified, and related businesses are shown to be related. Users can more easily manage large groups of customers or suppliers when the identification number is appended to the user's information. The identification number enables fast and easy data updates when appended to the user's information. The identification number provides a complete view of prospects and customers by placing businesses, where applicable, within their domestic and global corporate ‘families’, identifying penetration and opportunities for up-sell and cross-sell. The identification number also helps aggregate data from multiple and disparate systems to gain better insight with one complete view of prospects, customers and suppliers.
The identification number not only helps identify duplication in files within the database, but also enables customers with a unique key that can be used to identify duplication in the customer's existing portfolio of accounts.
Quality assurance 102 includes how identification numbers are managed. In an example, an identification number is retained for the life of a business. No two businesses ever receive the same identification number. Identification numbers are never recycled. The identification number is retained when a company moves anywhere within the same country. The identification number is preferably an industry standard for business identification.
Quality assurance 102 of identification number driver 112 includes validation and protection against duplication. Rigorous processing is done to identify duplicate identification numbers, including using duplicate scoring systems, implementing controls around bulk file building and undergoing validations prior to entering the database. In an example, every business is validated before it is included in database 118 so that the address is based on postal standards, incoming records are validated in relation to a town file (e.g., address, city, ZIP, state, and telephone number), and phone number and line of business are verified. Multiple-source validation is also performed because, for example, a business registration alone sometimes does not indicate whether a business has begun operations.
Quality assurance 102 of identification number driver 112 includes refining and enhancing the identification number assignment process.
Corporate Linkage
As shown in
Corporate linkage driver 114 opens up profitable opportunities in risk management, sales and marketing, and supply management for a user. It allows the user to understand the total risk exposure and regulatory and statutory compliance implications across a corporate family. The user recognizes the relationship between bankruptcy or financial stress in one company and the rest of its corporate family. The user increases sales by up-selling and cross-selling with a corporate family. The user reduces expenses by reducing research time. The user can maximize the opportunity based on revenues from an entire corporate family. The user can understand where purchase decisions are made. The user can identify possible conflicts of interest. The user can determine its total spend with a corporate family to better negotiate.
Members of a corporate family are identified by their relationship to other members. In an example, members include a global ultimate, a domestic ultimate, parent corporations, subsidiaries, headquarters, and branches. A global ultimate is the highest ranking member of a corporate family globally. A domestic ultimate is the highest ranking member of a corporate family within a specific country. A parent corporation is a company that owns more than half of another company. A subsidiary is a company that is more than half owned by a parent company. A headquarters is a company with reporting branches or divisions. A branch is a secondary location or operation, not a separate entity.
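The family roles defined above can be illustrated with a minimal Python sketch that stores each business with a link to its parent, keyed by identification number, and walks upward to find the global and domestic ultimates. The data layout, the function names, and the placeholder identification numbers are hypothetical, and the sketch assumes each business has at most one parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Business:
    id_number: str
    country: str
    parent_id: Optional[str] = None  # None marks the top of the family


def chain_upward(businesses, id_number):
    """Yield the business and then each ancestor, guarding against cyclic links."""
    seen = set()
    current = businesses[id_number]
    while True:
        if current.id_number in seen:
            raise ValueError("cycle in corporate linkage")
        seen.add(current.id_number)
        yield current
        if current.parent_id is None:
            return
        current = businesses[current.parent_id]


def global_ultimate(businesses, id_number):
    """Highest ranking member of the family anywhere in the world."""
    *_, top = chain_upward(businesses, id_number)
    return top


def domestic_ultimate(businesses, id_number):
    """Highest ranking member of the family in the starting business's country."""
    country = businesses[id_number].country
    result = businesses[id_number]
    for member in chain_upward(businesses, id_number):
        if member.country == country:
            result = member
    return result
```

With links stored this way, totals such as exposure or spend for a family could be aggregated by grouping businesses on the identification number of their global ultimate.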
Quality assurance 102 during corporate linkage 114 increases the completeness and accuracy of corporate families. In an example, a dedicated team reviews corporate families. This ensures business names, tradestyles, and SICs are consistent within a corporate family. Quality assurance 102 includes checking for duplicates. There are central review and updates for the largest global family trees. Changes are monitored to identify and track mergers and acquisitions and other major events. Quality assurance 102 includes matching of corporate families. There are quality programs to ensure business entities are linked properly and to handle linkage breaks within a corporate tree. Corporate linkage is done through legal ownership. Quality assurance 102 of corporate linkage 114 includes continually refining and enhancing corporate linkage based on customer feedback. Corporate linkage 114 capabilities include global cross-border linkage, U.S. linkage, public company linkage, private company linkage, and linkage defined by legal ownership versus business name. Quality assurance processes include using a validation tool to identify erroneously unlinked records or ‘look-a-likes’. The quality assurance processes are continually refined and enhanced based on learnings, feedback and reviews.
Predictive Indicators
Predictive indicator driver 116 summarizes the information collected on a business and uses it to predict future performance. Predictive indicators use statistical analysis to indicate the likelihood of a business to perform in a specific way in the future. There are many benefits to predictive indicators. Users can make faster, more consistent decisions by allowing automated decisions for increased efficiency. Users can free up resources to look at time-intensive borderline decisions. Users can make more consistent decisions across the entire organization. Users can allow faster processing of large volumes of transactions. Users can apply scores across an entire portfolio to quickly identify risk and opportunity. Users can help estimate demand to target the right prospects and reduce acquisition costs.
There are three types of predictive indicators: descriptive ratings, predictive scores, and demand estimators. Descriptive ratings summarize how a customer has historically been paying bills. Predictive scores are a prediction of how likely it is for a business to pay promptly or continue as an ongoing concern. Demand estimators estimate how much of a product a business is likely to buy in total (response, approval, look-a-like models).
Predictive indicators help a user to accelerate and impact profitability in all areas of its business. In risk management, descriptive ratings and predictive scores help the user grant or approve credit. A rating indicates creditworthiness of a company based on past financial performance. A score indicates likelihood of a business to continue as an ongoing concern or pay on time. Predictive scores can be applied across the user's whole portfolio to quickly identify high-risk accounts and begin aggressive collection immediately or to evaluate the credit worthiness of each applicant. A commercial credit score predicts the likelihood of a business paying slow over the next twelve months. A financial stress score predicts the likelihood of a business failing over the next twelve months. In sales and marketing, look-a-like models, response models and demand estimators let a user: identify prospects that look like their best customers, identify who is likely to respond to an offer, and/or how much product they will buy so that it can prioritize opportunities among customers or prospects. Examples of demand estimators include number of personal computers and local or long distance spending. In supply management, predictive scores can be applied to all of a user's suppliers to quickly understand their risk of failing in the future.
In addition, predictive scores may be customized according to a user's specific need and criteria. For example, criteria may be used, such as (1) what behavior does the user want to predict; (2) what is the size of the business the user wants to assess; and (3) what are the decision rules based on the user's risk tolerance to translate risk assessment into a credit decision or risk management or marketing action.
Predictive indicators are enabled by analytic capability and data capability. For example, a dedicated team of experienced business-to-business (B2B) expert PhDs may build the underlying predictive models and have access to industry-specific knowledge, financial and payment information, and extensive historical information for analysis.
A development sample is selected from a business universe 1814, a demographic profile is created of the business universe 1816, and exploratory data analysis is performed 1818 (univariate analysis of all variables). Tasks are performed such as determining the relationships between each variable and what is being predicted, the range of a variable, the type of variable, including or not including variables, and other functions related to understanding what to put in the model. Variables may be selected in accordance with the observation period and the performance period and weights may be assigned to indicate accuracy or representativeness. Trends are factored in. Quality assurance includes periodically checking to see if anything in the business universe affects the initial model and taking a score and running it against a prior period to check that it is still indicative or predictive.
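The check that a score is still predictive when run against a prior period can be illustrated with a minimal Python sketch. It is one simple possibility, not the validation method of the specification: it assumes that a higher score means lower risk, groups prior-period businesses into score bands, and tests whether the observed failure rate falls as the score rises. The band edges and function names are hypothetical.

```python
from bisect import bisect_right


def failure_rate_by_band(observations, band_edges):
    """Observed failure rate per score band.

    observations: iterable of (score, failed) pairs from a prior period.
    band_edges: ascending cut points; a score equal to an edge falls in the higher band.
    Returns one rate per band, or None for a band with no observations.
    """
    totals = [0] * (len(band_edges) + 1)
    failures = [0] * (len(band_edges) + 1)
    for score, failed in observations:
        band = bisect_right(band_edges, score)
        totals[band] += 1
        failures[band] += 1 if failed else 0
    return [f / t if t else None for f, t in zip(failures, totals)]


def still_rank_orders(rates):
    """True if the failure rate never increases from the lowest to the highest band."""
    observed = [r for r in rates if r is not None]
    return all(earlier >= later for earlier, later in zip(observed, observed[1:]))
```

If still_rank_orders returns False for a recent period, that would be a signal in this sketch to re-examine or recalibrate the model.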
Continuing on
Quality assurance 102 of predictive indicators 116 includes continually monitoring and adjusting predictive indicators to reflect new information. In an example, this includes periodic testing of predictiveness, continuous manual refinement and recalibration, automated changes, monthly audits and annual validation, and analyzing data for each model with respect to its predictive qualities and importance whenever models are created or updated. Also, predictive indicators are continually refined and enhanced based on customer feedback. Predictive indicators 116 have data depth, including demographic data, payment information, detailed public record information, such as suits, liens, judgments, bankruptcies, and UCC filings, public and private company financial information, and linkage data used to assign risk to the responsible entity (e.g., scoring branches with HQ data). An independent group of reviewers checks and validates the results of the scores, from which continual refinement and enhancement is realized. Customer needs and industry trends are also considered when quality assurance processes are done to continually improve the models and scores.
The present invention has many advantages. Preferably, a global database used to perform a method of data integration encompasses millions of records and is updated daily. Users gain a fresher, more complete picture of each of their customers, prospects, and suppliers, because of the large number of daily updates to the database. Users are able to assess the risk of non-U.S. companies, because the database has global data. Users can more completely identify the risk from small business customers. Users make more informed risk decisions. Users identify new prospects from data drawn from multiple sources. Users gain access to international customers, suppliers and prospects. Users receive enhanced prospect lists with value-added information, such as line of business and contact name. Users can assess risk from foreign suppliers. Users can more completely identify the risk from suppliers.
It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. Various embodiments for performing data collection, performing entity matching, applying an identification number, performing corporate linkage and providing predictive indicators are described. The present invention has applicability to applications outside the business information industry. Therefore, the scope of the present invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The present application is a continuation-in-part of and claims the benefit of application Ser. No. 10/368072, filed Feb. 18, 2003, entitled “Data Integration Method,” which is currently pending.
 | Number | Date | Country |
---|---|---|---|
Parent | 10368072 | Feb 2003 | US |
Child | 11137821 | May 2005 | US |