1. Field of the Disclosure
The present disclosure relates to electronic transaction processing. More specifically, the present disclosure is directed to a method and system for compiling the transactional volume of aggregate merchants to merchant locations.
2. Brief Discussion of Related Art
The use of payment devices for a broad spectrum of cashless transactions has become ubiquitous in the current economy, according to some estimates accounting for hundreds of billions or even trillions of dollars in transaction volume annually. The process and parties typically involved in consummating a cashless payment transaction can be visualized for example as presented in
In cases where the merchant 16 has an established merchant account with an acquiring bank (also called the acquirer) 20, the merchant communicates with the acquirer to secure payment on the transaction. An acquirer 20 is a party or entity, typically a bank, which is authorized by the network operator 22 to acquire network transactions on behalf of customers of the acquirer 20 (e.g., merchant 16). Occasionally, the merchant 16 does not have an established merchant account with an acquirer 20, but may secure payment on a transaction through a third-party payment provider 18. The third-party payment provider 18 does have a merchant account with an acquirer 20, and is further authorized by the acquirer 20 and the network operator 22 to acquire payments on network transactions on behalf of sub-merchants. In this way, the merchant 16 can be authorized and able to accept the payment device 14 from a device holder 12, despite not having a merchant account with an acquirer 20.
The acquirer 20 routes the transaction request to the network operator 22. The data included in the transaction request will identify the source of funds for the transaction. With this information, the network operator 22 routes the transaction to the issuer 24. An issuer 24 is a party or entity, typically a bank, which is authorized by the network operator 22 to issue payment devices 14 on behalf of its customers (e.g., device holder 12) for use in transactions to be completed on the network. The issuer 24 also provides the funding of the transaction to the network operator 22 for transactions that it approves in the process described. The issuer 24 may approve or authorize the transaction request based on criteria such as a device holder's credit limit, account balance, or in certain instances more detailed and particularized criteria including transaction amount, merchant classification, etc., which may optionally be determined in advance in consultation with the device holder and/or a party having financial ownership or responsibility for the account(s) funding the payment device 14, if not solely the device holder 12.
The decision made by the issuer 24 to authorize or decline the transaction is routed through the network operator 22 and acquirer 20, ultimately to the merchant 16 at the point of sale. This entire process is typically carried out by electronic communication, and under routine circumstances (i.e., valid device, adequate funds, etc.) can be completed in a matter of seconds. It permits the merchant 16 to engage in transactions with a device holder 12, and the device holder 12 to partake of the benefits of cashless payment, while the merchant 16 can be assured that payment is secured. This is enabled without the need for a preexisting one-to-one relationship between the merchant 16 and every device holder 12 with whom they may engage in a transaction.
The issuer 24 may then look to its customer, e.g., device holder 12 or other party having financial ownership or responsibility for the account(s) funding the payment device 14, for payment on approved transactions, for example and without limitation, through an existing line of credit where the payment device 14 is a credit card, or from funds on deposit where the payment device 14 is a debit card. Generally, a statement document 26 provides information on the account of a device holder 12, including merchant data as provided by the acquirer 20 via the network operator 22.
The network operator 22 can further build and maintain a data warehouse that stores and augments transaction data, for use in marketing, macroeconomic reporting, etc. To this end, transaction data from multiple transactions is aggregated for reporting purposes according to a location of the merchant 16. Additionally, one merchant 16 may operate plural card acceptance locations. Consider, for example, a chain or franchise having multiple business locations. These merchant locations are beneficially aggregated and assigned an aggregate merchant location identifier for reporting purposes.
Of all the data handled in the transaction process, the merchant's data tends to be the least stable and the most difficult to work with. One of the challenges with merchant data is the fact that there is no universal merchant location identifier. Rather, the network operator 22 must build and maintain the data warehouse itself, derived from merchant data included in the transaction data delivered via the acquirer 20. Similarly, there is no reliable location identifier in the data received that indicates if a merchant location belongs to a chain or not, for example for aggregation purposes. Again, the network operator 22 augments transactions with this information, based on the received merchant name, the acquiring bank, and several other fields. The process of grouping merchant locations into sets of chain merchants is called “merchant aggregation” and maintaining the integrity of these aggregations is a challenge. Ultimately, the network operator 22 must rely on imperfect inference from the transaction data to perform its merchant aggregation.
Merchants 16 and acquirers 20 do not consistently submit their data in the same way, thus creating the need to monitor the integrity of this data. Merchants 16 can change acquirers 20; they open and close locations; they rebrand themselves—just to name a few of the challenges. When any of these or other changes to merchant data occur, the rules used to assign an identifier to a merchant location and/or associate that merchant location with an aggregate merchant location identifier often fail. Even cursory human oversight of each and every merchant location would be prohibitively expensive considering the total number of merchants 16 accepting authorized payment devices 14, or even that subset of aggregate merchants whom the network operator 22 wishes to monitor.
Merchant identification data, in particular DBA name and address, are notoriously inaccurate and unstable. The data is inaccurate in the sense that it is often provided in a non-standardized form, and certain data may be cross-polluted among the various fields making up the merchant location entry. There is also the possibility that the data is intentionally made fraudulent or misleading by bad actors.
Existing merchant aggregation efforts rely on text matching, address recognition, or even feedback from the merchant to properly group and/or classify merchants in the aggregate. However, no method to date can assure that every eligible merchant location is contained within the aggregate. Furthermore, a merchant point-of-sale (POS) terminal can be resold or transferred among merchants. If the POS terminal is not rebuilt properly before being redistributed to a different merchant, techniques that look to the POS terminal identification data to aid in the aggregation may result in inaccuracies. Likewise, a disreputable merchant who intentionally selected their name so as to be mistaken for a different entity would be prone to misaggregation. A solution to this aggregate merchant data quality deficit problem remains wanting.
The instant application describes a solution to the problem of aggregate merchant data quality deficit.
Among the problems influencing the merchant data quality deficit is that, in the example of the largest merchants, they may use more than one acquirer 20 to process all of their transaction volume across their entire chain of stores. This may or may not be divided by merchant subsidiary, and may be without regard to plural payment device 14 acceptance terminals at a given location. Each acquirer 20 may have a different data format for merchant name and location. In some cases, multiple terminals, even those processed through the same acquirer 20 and in the same location of a given merchant 16, may have variations in data presentation. Franchise chain data can be particularly troublesome, as the merchant is generally an independent entity, although the value in data reporting is to be found in aggregating transactions under the franchise umbrella.
A related application by the present inventive entity is entitled MERCHANT CONTINUITY CORRECTION USING CARDHOLDER LOYALTY INFORMATION, filed 2 May 2013 and assigned U.S. patent application Ser. No. 13/875,803 (Applicant Ref. No P00915-US-UTIL; Attorney Docket No. 1788-100), the complete disclosure of which is hereby incorporated by this reference for all purposes. Therein, the present inventors addressed the problem of merchant continuity correction. Changes in merchant data that are not reflective of changes in ownership, e.g., where a merchant had a change in acquirer 20, or installed a new Point-Of-Sale (POS) terminal that introduces a perturbation in merchant identification data such as address or name, induce the network operator 22 to create a new merchant location entry corresponding to the new data. However, such changes are not always indicative of new ownership, and in fact the previous merchant remains open without interruption.
The above-referenced application leverages the characteristic of cardholder loyalty to a particular location. More colloquially, a certain percentage of shoppers/clients/customers of a particular merchant tend to remain loyal to that merchant. On the other hand, it would be unusual to see a cohort of cardholders switch loyalty from one merchant to another virtually simultaneously en masse. Therefore, where a disrupted merchant location and a new merchant location exhibit a threshold level of cardholder loyalty, it can be inferred that what are two merchant locations from the perspective of the network operator 22 are in fact one and the same continuous operation. The foregoing analysis and solution also lend themselves to an automated implementation.
As applied to merchant aggregation, it has been observed that a certain cohort of cardholders exhibit a degree of loyalty to a particular merchant brand across multiple locations. This brand loyalty can be leveraged as an indicator of relationship between separate merchant locations.
Therefore, provided according to the present disclosure is a method of aggregating merchant data from transaction data. The method includes retrieving a transaction data set from a data warehouse, the transaction data set including a merchant location identifier and the corresponding merchant's Doing Business As (DBA) name and address data. A data set is then formed from the transaction data, having merchant locations exhibiting at least a threshold level of common cardholder patronage. A metric related to the textual similarity between the merchant locations' DBA names is calculated for each pair of merchant locations within the data set. Each pair of merchant locations having a metric related to the textual similarity between the merchant locations' DBA names that exceeds a predetermined threshold is aggregated with each other, where the merchant locations making up the pair do not share an address. The aggregation between merchant locations is recorded in the data warehouse.
In a further embodiment of the presently disclosed method, each merchant DBA name is pre-processed to remove common, generic or descriptive terms, including without limitation those related to the goods or services sold by the merchant, or to the geographic location of the merchant.
In a further embodiment of the presently disclosed method, the transaction data set comprises transactions occurring within at least one of a predetermined time period, a predetermined geographical location, and involving predetermined merchant characteristics.
Optionally, the merchant pair data in the transaction data set may be graphically represented as an interconnected network, with the merchant locations corresponding to nodes of the network, and where the nodes are connected by edges that correspond to at least one cardholder patronizing both merchant location nodes on either side of the edge. Optionally or additionally, merchant locations that are constituents of a fully connected subgraph are further identified for aggregation.
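By way of illustration only, the network representation described above can be sketched in Python as an adjacency mapping. The function name `build_copatronage_graph`, the `(cardholder_id, merchant_id)` row layout, and the sample identifiers are assumptions made for this sketch and are not taken from the disclosure.

```python
from itertools import combinations


def build_copatronage_graph(transactions):
    """Return {merchant_id: set(neighbor_ids)} from (cardholder_id, merchant_id) rows.

    Two merchant locations share an edge when at least one cardholder
    patronized both of them.
    """
    # Collect the set of merchant locations visited by each cardholder.
    merchants_by_cardholder = {}
    for cardholder_id, merchant_id in transactions:
        merchants_by_cardholder.setdefault(cardholder_id, set()).add(merchant_id)

    graph = {}
    for merchants in merchants_by_cardholder.values():
        # Every merchant location is a node, even if it ends up with no edges.
        for merchant_id in merchants:
            graph.setdefault(merchant_id, set())
        # Each pair visited by the same cardholder becomes an undirected edge.
        for a, b in combinations(sorted(merchants), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical sample rows: (cardholder_id, merchant_location_id).
sample = [
    ("card-1", "A"), ("card-1", "B"),
    ("card-2", "B"), ("card-2", "C"),
    ("card-3", "D"),
]
graph = build_copatronage_graph(sample)
print(graph)
```

In this sample, location B is connected to both A and C, while D remains an isolated node because its only cardholder visited no other location.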
In a further embodiment of the presently disclosed method, the threshold level of common cardholder patronage is related to at least one of a number of cardholders patronizing both merchant locations of a pair, a number of transactions with both merchants each cardholder makes, a percentage of common cardholders as a portion of the client base for each connected merchant location independently, the product of the percentage of cardholders overlapping from each location independently, or some combination of these.
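As a non-limiting sketch, the patronage measures listed above can be computed from the sets of cardholders seen at each location of a pair. The function name `overlap_metrics`, the dictionary keys, and the sample cardholder identifiers are hypothetical and chosen only for this example; per-cardholder transaction counts are omitted for brevity.

```python
def overlap_metrics(cardholders_a, cardholders_b):
    """Compute common-patronage measures for one pair of merchant locations.

    Each argument is the set of cardholder identifiers seen at that location.
    """
    shared = cardholders_a & cardholders_b
    # Share of each location's own client base that also visits the other one.
    share_a = len(shared) / len(cardholders_a) if cardholders_a else 0.0
    share_b = len(shared) / len(cardholders_b) if cardholders_b else 0.0
    return {
        "shared_cardholders": len(shared),
        "share_of_a": share_a,
        "share_of_b": share_b,
        "product": share_a * share_b,
    }


# Hypothetical cardholder sets for two locations.
location_a = {"c1", "c2", "c3", "c4"}
location_b = {"c3", "c4"}
print(overlap_metrics(location_a, location_b))
```

The product of the two shares is small unless the overlap is significant for both locations, which may make it less sensitive to a large location that simply shares a few cardholders with everyone.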
In a further embodiment of the presently disclosed method, the metric related to the textual similarity between the merchant locations' DBA names comprises at least one of inverse document frequency measurement, Levenshtein Distance, a Soundex comparison, and a value related to each common substring of any length between the respective merchant locations' DBA names.
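For illustration, one of the named measures, the Levenshtein Distance, can be sketched with the standard dynamic-programming formulation. The function name `levenshtein` and the sample DBA names are assumptions of this sketch; the disclosure does not prescribe a particular implementation.

```python
def levenshtein(s, t):
    """Return the minimum number of single-character insertions, deletions,
    and substitutions needed to turn string s into string t."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(s) < len(t):
        s, t = t, s
    previous = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical DBA names; a smaller distance suggests more similar names.
print(levenshtein("BURGER NOW", "BURGER NOW"))
print(levenshtein("BURGER NOW", "BURGR NOW"))
```

Because a distance is smaller for more similar names, a similarity threshold of the kind described above would be applied to an inverted or normalized form of this value.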
Further provided according to the present disclosure is a system for aggregating merchant data from transaction data. The disclosed system includes a processor and a non-transitory machine-readable storage medium storing thereon a program of instruction that, when executed by the processor, causes the processor to carry out a method including any of the above-recited features. Further provided according to the present disclosure is a non-transitory machine-readable storage medium described with reference to the system.
These and other purposes, goals and advantages of the present disclosure will become apparent from the following detailed description of example embodiments read in connection with the accompanying drawings.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like reference numerals refer to like structures across the several views, and wherein:
In the present inventors' work on merchant continuity correction, brand loyalty was, in fact, a problem. That is to say, where a cardholder was loyal to a brand, perhaps patronizing many locations of the same brand, the purchase record would include merchants of similar name text. If such a brand-loyal customer responded to the closing of one location by patronizing another location of the same brand, this might be erroneously perceived as an indication of continuity between what were two separate merchant locations, when in fact the separation of locations was appropriate, and one of these locations simply discontinued operation. The presently disclosed method seeks to leverage the customer's brand loyalty among merchant locations having similar DBA names.
Consider, for example, Table 1 below. Table 1 shows a list of merchant locations with many overlapping cardholders. More particularly, each row in Table 1 represents a pair of merchant locations having at least one (or some other threshold number) cardholder who patronized both merchant locations within the data set. Each merchant location is assigned an arbitrary ID. The data in Table 1 was anonymized for the purpose of the present disclosure, however it is representative of actual merchant location data.
The data in Table 1 demonstrates that a cardholder who eats regularly at one BURGER NOW (a fictitious, archetypal chain of fast food restaurants) location is very likely to patronize other BURGER NOW locations. Therefore, this cardholder brand preference for BURGER NOW could be used as an indicator for merchant aggregation. More specifically, merchant aggregation may be indicated by aggregating two locations with overlapping cardholder patrons, provided their addresses are different, the DBA names are typographically similar, and they have an abnormal number of cardholders shopping at both locations. The BURGER NOW ID H merchant location is particularly demonstrative. The merchant location that served as the basis for BURGER NOW ID H is in fact located in a regional shopping center. That regional shopping center attracts cardholders who normally or regularly visit at least one of 17 surrounding BURGER NOW locations. In this fashion, the noted 18 BURGER NOW locations can be aggregated automatically by relying on the cardholder brand preference.
The disclosed method also allows merchant aggregation with reduced emphasis on text matching. The method is therefore well suited to automated aggregate merchant matching in foreign countries that rely on transliteration for payment network merchant data.
Referring now to
Certain threshold criteria 105 can be applied in order to cull this data set at the selection phase 104. Threshold criteria 105 may include a temporal criterion 105a. Under the temporal criterion 105a, the data under consideration may be time-limited, looking only to transactions within a predetermined period of time, e.g., day, week, month, or any other arbitrary timeframe. It will be appreciated that longer timeframes will allow more merchant pairs to emerge among cardholders. On the other hand, larger sample sizes are more computationally intensive. Alternately or additionally, the merchant data set can be culled by applying a location criterion 105b. The location criterion 105b would limit the geographic location of the merchant (e.g., city, state, region, country, etc.). Moreover, the temporal criterion 105a and the location criterion 105b may be interrelated and combined, upon consideration of a typical cardholder's ability to patronize two merchants within the prescribed geographic area and the determined time span.
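As a minimal sketch of the selection phase, the temporal and location limits can be applied as a simple filter over transaction records. The record layout (dictionaries with `merchant`, `date`, and `region` keys), the function name `select_transactions`, and the sample values are assumptions made for illustration.

```python
from datetime import date


def select_transactions(transactions, start, end, regions):
    """Keep only transactions dated within [start, end] whose merchant
    location lies in one of the given regions."""
    return [
        txn for txn in transactions
        if start <= txn["date"] <= end and txn["region"] in regions
    ]


# Hypothetical transaction records.
records = [
    {"merchant": "A", "date": date(2013, 5, 3), "region": "NY"},
    {"merchant": "B", "date": date(2013, 6, 20), "region": "NY"},
    {"merchant": "C", "date": date(2013, 5, 10), "region": "CA"},
]
selected = select_transactions(records, date(2013, 5, 1), date(2013, 5, 31), {"NY"})
print([txn["merchant"] for txn in selected])
```

Widening the date range or the set of regions admits more merchant pairs at the cost of a larger data set to process.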
Alternately or additionally, merchant characteristic criteria 105c may be applied. Merchant aggregation in general is typically, though not exclusively, applied to brick-and-mortar business locations. Therefore, and when brick-and-mortar locations are the focus of inquiry, merchant locations of a given class, for example those that are, as example only and without limitation, known to be partially, exclusively, or predominantly e-commerce merchants, mail-order merchants, telephone-order merchants, or centrally-billed merchants (i.e., those where the address of the merchant location billing the customer and/or customer's payment device 14 is remote from the location of the customer or where the product or service is delivered to the customer), can be removed from consideration. The reverse will also be seen as applicable, for example by eliminating brick-and-mortar merchant locations to aggregate related e-commerce, etc., merchants. Moreover, the method may be performed without regard to merchant class where the intention is to aggregate an online retailer with a corresponding brick-and-mortar presence.
Optionally, certain data pre-processing 106 can be performed to improve the power of a textual match between respective merchant location DBA names. For example, a blacklist of common, generic, or descriptive terms may be excised from the merchant DBA name before performing a comparison. The blacklist terms at issue include those that relate to the goods or services provided, including without limitation, “pizza,” “restaurant,” “salon,” etc. Geographic indicators, e.g., a city name or street name included in a given merchant location entry, can likewise be removed from the merchant location DBA name field for the purpose of subsequent textual similarity comparison. Such generic terms will have little distinguishing or predictive power to indicate related merchants that are amenable to aggregation.
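The pre-processing described above might be sketched as follows. The three blacklist terms are the examples given in the text; the function name `preprocess_dba_name`, the tokenization rule, and the sample names are assumptions of this sketch, and only single-word geographic terms are handled.

```python
import re

# Example terms taken from the text above; a production blacklist is assumed
# to be considerably larger.
BLACKLIST = {"PIZZA", "RESTAURANT", "SALON"}


def preprocess_dba_name(name, geographic_terms=()):
    """Upper-case a DBA name and drop blacklisted and geographic tokens.

    geographic_terms holds single-word indicators (e.g., a city name)
    known for the merchant location entry.
    """
    tokens = re.findall(r"[A-Z0-9']+", name.upper())
    drop = BLACKLIST | {term.upper() for term in geographic_terms}
    return " ".join(token for token in tokens if token not in drop)


# Hypothetical DBA names.
print(preprocess_dba_name("Tony's Pizza Springfield", ["Springfield"]))
print(preprocess_dba_name("Java Joe Restaurant"))
```

After this step, only the more distinctive portion of each name takes part in the subsequent textual comparison.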
The selected data is structured 108 for aggregate matching. The simplest implementation of the pre-processed data is a list of all merchant pairs, e.g., Table 1. A merchant pair in Table 1 represents that one cardholder patronized both merchant locations represented in each line of the table. Alternately, an undirected graph structure can be prepared, with reference to
Referring again to
Merchant location pairs meeting an edge weight criterion 110 can then be further pruned, or consolidated 112, based upon a substring matching metric on the respective DBA names. DBA name matching may further include the optional pre-processing 106 described above. Textual similarity can be determined by a variety of methods. Known methods of measuring textual similarity include term frequency-inverse document frequency (tf-idf) measurement; Levenshtein Distance; or Soundex comparison, among others. At least one method of determining textual similarity is disclosed in U.S. Pat. No. 8,219,550, issued 10 Jul. 2012 to Merz, et al., (“Merz '550”), which is a continuation application of U.S. Pat. No. 7,925,652 issued 12 Apr. 2011 (“Merz '652”), both patents being commonly assigned with the instant application. The disclosures of both Merz '550 and Merz '652 are hereby incorporated by this reference in their entirety for all purposes.
At least one substring match metric between the processed DBA name fields, for example, is computed according to the following method. Comparing the respective DBA names for two merchant locations, each character in common increments the metric by 1; for each consecutive pair of letters in common, the metric is incremented by an additional 1; for each substring of three letters in common, the metric is incremented by an additional 1; for each substring of four letters in common, the metric is incremented by an additional 1; and so on, with every length of match recursively sought in the compared strings until the length of the shorter string has been reached.
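The text above leaves some room for interpretation as to how repeated characters and overlapping matches are counted. The following sketch adopts one possible reading, in which, for each substring length from 1 up to the length of the shorter name, the number of substrings of that length common to both names (counted with multiplicity) is added to the metric. The function names `ngram_counts` and `substring_match_metric` are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every contiguous substring of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def substring_match_metric(name_a, name_b):
    """Sum, over every substring length up to the shorter name's length,
    the number of substrings the two names have in common."""
    score = 0
    for n in range(1, min(len(name_a), len(name_b)) + 1):
        # Counter intersection keeps the minimum count of each shared substring.
        shared = ngram_counts(name_a, n) & ngram_counts(name_b, n)
        score += sum(shared.values())
    return score


# Hypothetical names: identical three-letter names share 3 single letters,
# 2 pairs, and 1 triple.
print(substring_match_metric("ABC", "ABC"))
print(substring_match_metric("ABC", "XYZ"))
```

Under this reading the score grows quickly with the length of the shared run, so any threshold applied to it would likely need to account for name length.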
Whichever metric is employed, the results of the round-robin comparison represented in the edge weighting 110 are consolidated at 112 by comparing the calculated metric against a user-defined threshold. Those edges having a metric that exceeds the threshold are aggregated at 114, with appropriate updates made to the data warehouse 102. Any edges that do not have sufficiently similar DBA names to exceed the metric threshold are discarded. Generally, though not exclusively, address similarity is not considered in this edge weighting 110 metric. However, merchant locations are screened against aggregation when the two merchant locations have a sufficient textual similarity in their addresses. This is generally the reverse of the prior-mentioned work concerning merchant continuity correction, where address similarity is a positive indicator of association.
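A minimal sketch of the consolidation step follows. It is not the disclosed metric: as a stand-in for the name similarity measure it uses the ratio from Python's standard `difflib.SequenceMatcher`, and as a stand-in for the address screen it uses a normalized exact match. The function names, the record layout, the threshold of 0.8, and the sample locations are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(name_a, name_b):
    """Stand-in similarity in [0, 1]; any of the named metrics could be used."""
    return SequenceMatcher(None, name_a, name_b).ratio()


def same_address(address_a, address_b):
    """Stand-in address screen: case- and whitespace-insensitive equality."""
    def normalize(address):
        return " ".join(address.upper().split())
    return normalize(address_a) == normalize(address_b)


def consolidate(edges, threshold):
    """Keep pairs whose names are similar enough and whose addresses differ."""
    kept = []
    for loc_a, loc_b in edges:
        if same_address(loc_a["address"], loc_b["address"]):
            continue  # screened out: matching addresses block aggregation
        if name_similarity(loc_a["dba"], loc_b["dba"]) > threshold:
            kept.append((loc_a["id"], loc_b["id"]))
    return kept


# Hypothetical merchant location entries and co-patronage edges.
a = {"id": "A", "dba": "BURGER NOW", "address": "1 Main St"}
b = {"id": "B", "dba": "BURGER NOW 22", "address": "9 Oak Ave"}
c = {"id": "C", "dba": "BURGER NOW", "address": "1 MAIN ST"}
d = {"id": "D", "dba": "CITY HARDWARE", "address": "5 Elm Rd"}
print(consolidate([(a, b), (a, c), (a, d)], threshold=0.8))
```

In this sample only the pair A and B survives: the pair A and C is screened out by the address check, and the pair A and D fails the name threshold.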
A breadth-first search can then be conducted to add all connected merchant location nodes 302 with edge weights exceeding the user defined threshold to that merchant aggregate. Alternatively or additionally, from the compiled list of all connected merchant locations, sets of fully connected subgraphs may be formed. An example of a fully-connected subgraph is shown below in Table 2, and depicted graphically, generally 400, in
A fully connected subgraph 400 is a subset of merchant locations where each member of the subset is connected to each other member of the subset with an edge weight that exceeds the threshold criteria. Consider for example Table 2, which represents a group of commonly-branded coffee shops located on a large university campus. The locations are each represented as a node 402 in
All of these Java Joe locations are frequented by the same cardholders, an indication of brand loyalty, and present an edge weighting that exceeds the threshold for aggregation. Further, the DBA names are similar textually. They will therefore be included in a breadth-first search. Further, the example subgroup 400 is fully-connected. This may be referred to as a ‘clique’ in the relevant professional networking literature. The likelihood that these locations are related increases proportionally with the number of fully-connected locations. While a fully connected subgraph represents a high-accuracy method for merchant aggregation from this data structure, it is suggested with the understanding that such an algorithm would be less efficient than the proposed edge-weighting.
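The two grouping strategies discussed above, a breadth-first search over qualifying edges and a check for a fully connected subgraph, can be sketched as follows. The adjacency mapping is assumed to contain only edges that already exceed the threshold; the function names and the node identifiers `J1` through `J4` are hypothetical.

```python
from collections import deque
from itertools import combinations


def connected_component(graph, start):
    """Breadth-first search: return every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def is_fully_connected(graph, nodes):
    """Return True when every pair of the given nodes shares an edge."""
    return all(b in graph.get(a, ()) for a, b in combinations(nodes, 2))


# Hypothetical undirected graph of qualifying edges between locations.
graph = {
    "J1": {"J2", "J3"},
    "J2": {"J1", "J3"},
    "J3": {"J1", "J2", "J4"},
    "J4": {"J3"},
}
print(sorted(connected_component(graph, "J1")))
print(is_fully_connected(graph, {"J1", "J2", "J3"}))
print(is_fully_connected(graph, {"J1", "J2", "J3", "J4"}))
```

Here the breadth-first search gathers all four locations into one aggregate, whereas only J1, J2, and J3 form a fully connected subgraph, since J4 is linked to J3 alone. The pairwise check is quadratic in the size of the candidate subset, which is consistent with the efficiency trade-off noted above.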
It will be appreciated by those skilled in the art that the method described above may be operated by a machine operator having a suitable interface mechanism, and/or more typically in an automated manner, for example by operation of a network-enabled computer system including a processor executing a system of instructions stored on a machine-readable medium, RAM, hard disk drive, or the like. The instructions will cause the processor to operate in accordance with the present disclosure.
Turning then to
Variants of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.