The present invention generally relates to customer insight systems, customer list generation, advertising targeting, business reputation management, and generation of automated campaign messages.
Customer Relationship Management (CRM) systems and/or Customer Insight (CI) systems track and measure marketing campaigns over multiple networks. CRM and/or CI systems can perform customer analysis using gathered customer information. CRM and/or CI systems are used by many types of businesses to track customers. Such businesses can include merchants, call centers, social media, direct mail, data storage files, banks, and customer data queries. The goals of CRM and/or CI systems typically include providing insight into the nature of customers, providing a platform for communicating with customers, and sometimes providing a platform for payment processing and query management. Often, CRM and/or CI systems are used by businesses in order to generate leads or maximize sales to customers. CRM and/or CI systems can also be used to identify and reward customers over a period of time.
Customer Insight (CI) systems in accordance with various embodiments of the invention gather information sets from multiple remote information sources and can merge the information sets to identify authoritative information describing named entities. In several embodiments, the information sets and/or the authoritative information are identified using geographic location information associated with the information sets. In many embodiments, the CI systems identify relationship information within the merged information sets and use the relationship information to identify customers of businesses. Once identified, merged and/or authoritative information sets describing customers can be used to build customer lists, typical customer profiles, and best customer profiles. In addition, the CI system can utilize information describing customers to automatically generate advertising targeting data and online advertising campaigns.
One embodiment of the method of the invention includes: merging information sets gathered by a customer insight system from multiple remote information sources to create merged information sets for given named entities corresponding to specific businesses, where the merged information sets for a given named entity that corresponds to a specific business are stored in a feeds database maintained by the customer insight system; identifying, using the customer insight system, relationships between given named entities in the feeds database and various related named entities based upon references in data within the merged information sets for a given named entity to a specific related named entity; identifying specific related named entities that correspond to customers of given named entities in the feeds database using the customer insight system; adding the specific related named entities identified as corresponding to customers of given named entities in the feeds database to a customer database using the customer insight system; identifying named entities in the customer database that correspond to customers of a specific named entity in the feeds database that corresponds to a particular business using the customer insight system; retrieving characteristic data describing named entities from the customer database that correspond to customers of a specific named entity using the customer insight system; generating a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business from the characteristic data retrieved from the customer database that describes named entities that correspond to customers of the particular business using the customer insight system; and generating advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile using the customer insight system.
In a further embodiment, identifying specific related named entities that correspond to customers of given named entities in the feeds database includes identifying transaction data within the merged information sets for given named entities in the feeds database that describe transactions that only occur between the given named entities and related named entities that are customers of the given named entities.
In another embodiment, identifying specific related named entities that correspond to customers of given named entities in the feeds database includes identifying matching content in the merged information sets for the given entities and named entities known to correspond to customers.
In a still further embodiment, matching content includes content selected from the group consisting of: the presence of an entity name in the merged information sets of both named entities; the presence of the same geographic location information in the merged information sets of both named entities; and the presence of the same uniquely identifying information in the merged information sets of both named entities.
In still another embodiment, identifying specific related named entities that correspond to customers of given named entities in the feeds database includes identifying relationship information in merged information sets including at least one piece of relationship information selected from the group consisting of: a name of the related entity in any record in the merged information sets for a given named entity in the feeds database; a phone number associated with a related named entity listed in a phone log in the merged information sets for a given named entity in the feeds database; email address associated with a related named entity on an email message in a set of emails in the merged information sets for a given named entity in the feeds database; an IP address or a MAC address associated with a specific related entity in a server log or an email message in the merged information sets for a given named entity in the feeds database; a name, or mailing address associated with a specific related named entity in loyalty program records in the merged information sets for a given named entity in the feeds database; and a name, credit card number, or billing address associated with a specific related named entity in credit card records in the merged information sets for a given named entity in the feeds database.
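As an illustration only, the kinds of relationship information listed above could be matched against known entities roughly as follows. This is a minimal sketch, not the claimed implementation: the `MergedInfo` container, its field names, the `find_related_entities` function, and the normalization rules are all hypothetical, and only three of the listed evidence types (phone log, email, loyalty program name) are shown.

```python
from dataclasses import dataclass, field


@dataclass
class MergedInfo:
    """Hypothetical container for a business's merged information sets."""
    phone_log: list[str] = field(default_factory=list)
    email_addresses: list[str] = field(default_factory=list)
    loyalty_names: list[str] = field(default_factory=list)


def normalize_phone(number: str) -> str:
    # Keep digits only so "(555) 010-2000" and "555-010-2000" compare equal.
    return "".join(ch for ch in number if ch.isdigit())


def find_related_entities(business: MergedInfo, known_entities: dict) -> dict:
    """Return {entity_id: [evidence types]} for entities referenced in the records."""
    phones = {normalize_phone(p) for p in business.phone_log}
    emails = {e.strip().lower() for e in business.email_addresses}
    names = {n.strip().lower() for n in business.loyalty_names}

    related = {}
    for entity_id, attrs in known_entities.items():
        evidence = []
        phone = normalize_phone(attrs.get("phone", ""))
        if phone and phone in phones:
            evidence.append("phone_log")
        email = attrs.get("email", "").strip().lower()
        if email and email in emails:
            evidence.append("email")
        name = attrs.get("name", "").strip().lower()
        if name and name in names:
            evidence.append("loyalty_program")
        if evidence:
            related[entity_id] = evidence
    return related
```

Recording which record type produced each match, rather than a bare yes/no, leaves room for later steps to treat some evidence (for example, loyalty program records) as stronger indicators of a customer relationship than others.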
In a yet further embodiment, generating a typical customer profile includes filtering named entities identified in the customer database as corresponding to customers of a specific named entity in the feeds database in accordance with at least one filtering criterion.
In yet another embodiment, the at least one filtering criterion is selected from the group consisting of average transaction value, transaction frequency, and residential address.
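The filtering criteria named above can be sketched as simple predicates over customer records. The example below is an assumption-laden illustration: the `CustomerRecord` type, its fields, and the use of a raw transaction count as a stand-in for transaction frequency are hypothetical choices, and residential address is reduced to a city match for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical customer database entry."""
    customer_id: str
    transaction_values: list[float]
    city: str


def average_transaction_value(record: CustomerRecord) -> float:
    values = record.transaction_values
    return sum(values) / len(values) if values else 0.0


def filter_customers(
    records: list[CustomerRecord],
    min_avg_value: float = 0.0,
    min_transactions: int = 0,
    city: Optional[str] = None,
) -> list[CustomerRecord]:
    """Keep records that satisfy every supplied criterion."""
    kept = []
    for record in records:
        if average_transaction_value(record) < min_avg_value:
            continue
        # Assumption: frequency is approximated by the transaction count.
        if len(record.transaction_values) < min_transactions:
            continue
        if city is not None and record.city.lower() != city.lower():
            continue
        kept.append(record)
    return kept
```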
In a further embodiment again, the advertising targeting data includes at least one piece of advertising targeting data selected from the group consisting of demographic targeting data, location targeting data, user targeting data, and keyword targeting data.
In another embodiment again, generating a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business includes: determining at least one piece of demographic information from the retrieved characteristic data describing the named entities from the customer database that correspond to customers of the particular business using the customer insight system. In addition, generating advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile includes generating demographic targeting data for an online advertising campaign using the at least one piece of demographic information using the customer insight system.
In a further additional embodiment, generating a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business includes determining at least one piece of geographic location information from the retrieved characteristic data describing the named entities from the customer database that correspond to customers of the particular business using the customer insight system. In addition, generating advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile includes generating location targeting data for an online advertising campaign using the at least one piece of geographic location information using the customer insight system.
In another additional embodiment, generating advertising targeting data for an online advertising campaign based at least in part upon a customer list includes: identifying named entities within the customer database matching the typical customer profile using the customer insight system; retrieving characteristic data from the customer database describing the named entities within the customer database matching the typical customer profile using the customer insight system; determining at least one user identifier for a specific service from the retrieved characteristic data describing the named entities from the customer database matching the typical customer profile using the customer insight system; and generating user targeting data for an online advertising campaign on the specific service using the at least one user identifier for the specific service using the customer insight system.
In a still yet further embodiment, generating a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business includes determining at least one keyword from the retrieved characteristic data describing the named entities from the customer database that correspond to customers of the particular business using the customer insight system. In addition, generating advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile includes generating keyword targeting data for an online advertising campaign using the at least one keyword using the customer insight system.
In still yet another embodiment, generating a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business further includes: retrieving characteristic data describing named entities from the customer database that correspond to customers of the particular business using the customer insight system; and generating a plurality of customer profile segments using the retrieved characteristic data. In addition, generating advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile includes generating advertising targeting data using at least one of the plurality of customer profile segments.
A still further embodiment again also includes outputting the advertising targeting data as part of an online advertising campaign built using the customer insight system to at least one advertising network selected from the group consisting of a display advertising network, a search advertising network, a social media service advertising network, and a location based advertising network using the customer insight system.
In still another embodiment again, identifying named entities in the customer database that correspond to customers of a specific named entity in the feeds database that corresponds to a particular business further includes filtering named entities related to the specific named entity in accordance with at least one filtering criterion.
In a yet further embodiment again, the at least one filtering criterion is selected from the group consisting of average transaction value, transaction frequency, and residential address.
In yet another embodiment again, merging information sets gathered by a customer insight system from multiple remote information sources to create merged information sets for given named entities that correspond to specific businesses includes: obtaining at least one initial piece of identifying information for a given named entity using the customer insight system; and building an identifying information set based on the at least one initial piece of identifying information using the customer insight system by gathering additional identifying information that describes characteristics of the given named entity from a plurality of information sources, where the identifying information set includes geographic location information. In addition, merging information sets includes repeatedly: querying a plurality of remote information sources using the customer insight system, where queries provided to the plurality of remote information sources contain at least the geographic location information included in the identifying information set; receiving at least one information set from the plurality of remote information sources using the customer insight system, where the at least one received information set includes characteristic data describing at least the given named entity and the characteristic data includes geographic location information; and merging at least a subset of the received at least one information set with the identifying information set for the given named entity to create merged information sets for the given named entity that are stored in the feeds database using the customer insight system, where the customer insight system merges at least one given information set with the identifying information set for the given named entity based upon a comparison of geographic location information included in the at least one given information set and geographic location information included within the identifying information set.
In a yet further additional embodiment, merging information sets gathered by a customer insight system from multiple remote information sources to create merged information sets for given named entities corresponding to specific businesses further includes: selecting characteristic data from the merged information sets to be used in an authoritative information set for the given named entity using the customer insight system, where the customer insight system selects at least one piece of characteristic data as part of an authoritative information set based upon at least one factor including a comparison of geographic location information associated with each of a plurality of different pieces of characteristic data that provide conflicting descriptions of a specific characteristic of the given named entity; and storing the authoritative information set in a production database using the customer insight system, wherein the production database stores authoritative information sets for a plurality of named entities generated using the merged information sets for the plurality of named entities maintained in the feeds database.
In yet another additional embodiment, the customer insight system selects at least one piece of characteristic data as part of the authoritative information set based upon at least one factor including: counting the number of times a characteristic data value is repeated within the merged information sets for the given named entity using the customer insight system; and weighting the counts of the number of times a characteristic data value is repeated within the merged information sets for the given named entity based upon scores of the relative reliability of remote information sources of the characteristic data within the merged information sets using the customer insight system, where the customer insight system maintains and updates the scores of the relative reliability of remote information sources over successive query operations.
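One way to picture the weighted counting described above is as a reliability-weighted vote over conflicting values. The sketch below is illustrative only: the function names, the default score of 0.5, and in particular the score update rule (a small step toward 1 for sources that agree with the selected value and toward 0 for those that do not) are assumptions, since the text does not specify how the scores are maintained.

```python
from collections import defaultdict


def select_authoritative_value(observations, reliability, default_score=0.5):
    """Pick the value with the highest reliability-weighted count.

    observations: list of (source_name, value) pairs for one characteristic.
    reliability:  dict mapping source_name to a score between 0 and 1.
    """
    weighted = defaultdict(float)
    for source, value in observations:
        # Each repetition of a value counts in proportion to its source's score.
        weighted[value] += reliability.get(source, default_score)
    if not weighted:
        return None
    return max(weighted, key=weighted.get)


def update_reliability(reliability, observations, chosen, rate=0.1, default_score=0.5):
    """Hypothetical update rule applied after each query operation."""
    for source, value in observations:
        old = reliability.get(source, default_score)
        target = 1.0 if value == chosen else 0.0
        reliability[source] = old + rate * (target - old)


observations = [
    ("directory_a", "555-0100"),
    ("directory_b", "555-0100"),
    ("official_site", "555-0199"),
]
reliability = {"directory_a": 0.3, "directory_b": 0.3, "official_site": 0.9}
chosen = select_authoritative_value(observations, reliability)
update_reliability(reliability, observations, chosen)
```

In this example the value reported once by the highly rated source outweighs the value repeated by two weakly rated sources, which is the balance between repetition and source reliability that the weighting is meant to capture.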
In a further additional embodiment again, the customer insight system merges at least one given information set with the identifying information set for the given named entity based upon a comparison of geographic location information included in the at least one given information set and geographic location information included within the identifying information set by: determining at least one distance between the geographic location information included in the at least one given information set and the geographic location information included within the identifying information set using the customer insight system; and comparing the determined at least one distance to a threshold for merging information sets using the customer insight system.
In another additional embodiment again, determining at least one distance using the customer insight system includes generating geographic coordinates from the geographic location information included in the at least one given information set and the geographic location information included within the identifying information set.
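The distance comparison can be sketched as follows, assuming the geographic location information has already been converted to latitude and longitude pairs. The haversine great-circle formula, the 0.1 km default threshold, and the function names are illustrative assumptions; the text does not prescribe a particular distance measure or threshold value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def should_merge(candidate_geocode, identifying_geocodes, threshold_km=0.1):
    """Merge when the candidate lies within the threshold of any known location."""
    return any(
        haversine_km(candidate_geocode, known) <= threshold_km
        for known in identifying_geocodes
    )
```

Comparing against any of several known geocodes, rather than a single one, accommodates a named entity whose characteristic data describes more than one location address.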
In another further embodiment, the identifying information set includes at least one name, at least one address, and at least one phone number for the given named entity.
In still another further embodiment, the geographic location information included in the identifying information set includes at least one piece of information selected from the group consisting of an address, a geographic coordinate, a latitude and longitude coordinate pair, and relative location information.
In yet another further embodiment, selecting characteristic data from the merged information sets to be used in the authoritative information set further includes selecting a first piece of characteristic data from a first information set received from a first remote information source and a second piece of characteristic data describing a different characteristic of the given named entity from a second remote information source using the customer insight system.
In another further embodiment again, the given named entity is a named entity with a name attribute value that is non-unique and where the given named entity has characteristic data describing a plurality of location addresses.
In another further additional embodiment, the remote information sources include at least one remote information source selected from the group consisting of a search engine service, an online directory, a review website, a website, a server log, an email service, a messaging service, and a social media service.
In still yet another further embodiment, the merged information sets of a given named entity in the feeds database include at least one piece of information selected from the group consisting of: scrapes of web pages containing descriptions of a named entity; email messages obtained from email accounts associated with a named entity; phone logs for telephone accounts associated with a named entity; reviews associated with a named entity; checkins via location based social media services; likes, follows, and/or followers of user identities on social media services associated with a named entity; mentions of a named entity in posts to social media services; mobile application data from mobile devices associated with a named entity; and server logs of servers associated with a named entity.
In still another further embodiment again, the characteristic data describing characteristics of the identified named entities in the customer database are expressed as attribute value pairs.
A customer insight system in accordance with one embodiment of the invention includes at least one processing unit, and a memory storing a customer insight application. In addition, the customer insight application directs the at least one processing unit to: merge information sets gathered from multiple remote information sources to create merged information sets for given named entities corresponding to specific businesses, where the merged information sets for a given named entity that corresponds to a specific business are stored in a feeds database maintained by the customer insight system; identify relationships between given named entities in the feeds database and various related named entities based upon references in data within the merged information sets for a given named entity to a specific related named entity; identify specific related named entities that correspond to customers of given named entities in the feeds database; add the specific related named entities identified as corresponding to customers of given named entities in the feeds database to a customer database; identify named entities in the customer database that correspond to customers of a specific named entity in the feeds database that corresponds to a particular business; retrieve characteristic data describing named entities from the customer database that correspond to customers of a specific named entity; generate a typical customer profile for the specific named entity in the feeds database that corresponds to the particular business from the characteristic data retrieved from the customer database that describes named entities that correspond to customers of the particular business; and generate advertising targeting data for an online advertising campaign based at least in part upon the typical customer profile.
Turning now to the drawings, customer relationship management (CRM) systems and/or customer insight (CI) systems in accordance with embodiments of the invention are illustrated. The CI systems of several embodiments gather consumer and business information and identify relationships between consumers and businesses. The CI systems use these relationships to provide several functionalities that are useful in managing customer relationships with businesses. These functionalities can include (but are not limited to) the automated generation of customer lists, building profiles for typical customers of businesses, building profiles of customers identified as the best customers, generation of advertising targeting data, and/or automated generation of campaign messages for identified customers of businesses. In order to enable these and additional functionalities, CI systems in accordance with several embodiments of the invention gather information from information sources, merge the gathered information, and relate merged information sets for businesses and consumers.
In many embodiments, the CI systems gather information from information sources on named entities including (but not limited to) consumers, businesses, transactions, locations, and things. The information sources can include (but are not limited to) websites, consumer devices, public directories, domain registrations, public records, merchant terminals, merchant-run or third-party loyalty programs, and additional sources that are discussed below. The gathered information can include (but is not limited to) attribute values for names, addresses, phone numbers, reviews, connections, dates, purchases, sales, and/or prices associated with entities. The gathered information can also include social media postings by consumers. In addition, CI systems in accordance with several embodiments of the invention can collect information regarding the transactions between consumers and businesses. CI systems can also perform further operations on the gathered consumer, business, and/or transaction information to produce insights into customers of businesses.
In several embodiments the CI systems merge gathered information sets according to the sets' similarity to particular consumers or businesses. When several sets of information gathered from information sources are similar enough that they can be said to refer to the same person or business, the CI systems can merge the several sets of information. For example, the CI systems can merge information sets from several social media profiles where they pass certain thresholds of similarity. The CI systems can also merge information sets from several online directories that contain listings that are determined to refer to the same business.
In a number of embodiments, CI systems can use geographic coordinates (referred to herein as “geocodes”) to assist in merging information sets, relating information sets, and/or other information management operations. For instance, where an information source provides an address, or where the information includes location metadata, a CI system can convert these addresses or location metadata into geographic coordinates. The geocodes can be used to assess whether information sets refer to the same location, or whether a consumer is interacting with a business. In some embodiments, geocodes correspond to latitude and longitude. In other embodiments, any of a variety of representations of geographic location information appropriate to the requirements of specific applications can be utilized for the encoding of geocodes.
In various embodiments, CI systems generate authoritative information sets from merged sets of information. An authoritative information set is a CI system's most accurate description of a named entity (e.g., the correct name, address, and phone number of a business or a consumer). The authoritative information set can also be the information set with the most complete description of the named entity including data aggregated from all of the merged information sets. CI systems can generate authoritative information sets using several techniques. In multiple embodiments, CI systems rate information sources for accuracy and size. A CI system can also consider how many times pieces of information are repeated across information sources (e.g., when multiple information sources provide the same name, address, or phone number for a consumer or business). In addition, CI systems can balance the ratings of sources against the repetition of information across sources. For instance, a CI system may select a piece of information that is repeated relatively infrequently, where the piece of information comes from a particularly trustworthy source. Based on at least one of the above described techniques, a CI system can identify an authoritative information set for a given named entity.
In several embodiments, CI systems dynamically update merged and authoritative information sets in response to queries for information. When a CI system receives a query for information, the CI system can respond by presenting the most up to date information from the merged and/or authoritative information sets. CI systems in accordance with certain embodiments of the invention also update scheduled gathering of information to include specific crawling operations for information associated with received user queries. The crawls themselves can be performed using any of a variety of techniques including (but not limited to) populating a set of URL templates with appropriate keywords drawn from a user query and/or additional characteristic data discovered by crawls executed in response to a user query. For instance, when users query for listings of businesses, a CI system can present listings from merged information sets associated with the queried business and also update the continuous gathering of information to include specific crawls for any updated information related to the businesses identified by the queries.
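By way of a hedged example, populating URL templates with keywords from a user query might look like the sketch below. The template strings, the `example.com` and `example.org` hosts, the placeholder names, and the choice to skip templates whose keywords are not yet known are all hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical templates; placeholders are filled from query keywords.
URL_TEMPLATES = [
    "https://directory.example.com/search?q={name}&near={city}",
    "https://reviews.example.org/find?city={city}&biz={name}&phone={phone}",
]


def build_crawl_urls(keywords, templates=URL_TEMPLATES):
    """Fill each template with URL-encoded keywords; skip templates that need more."""
    encoded = {key: quote_plus(value) for key, value in keywords.items()}
    urls = []
    for template in templates:
        try:
            urls.append(template.format(**encoded))
        except KeyError:
            # A placeholder has no keyword yet (e.g. no phone number discovered).
            continue
    return urls


urls = build_crawl_urls({"name": "Joe's Pizza", "city": "Pasadena"})
```

If a later crawl discovers additional characteristic data, such as a phone number, calling `build_crawl_urls` again with the enlarged keyword set yields URLs for templates that were previously skipped.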
When used herein, the term “information set” can include structured data and/or unstructured data as required for varying embodiments of the invention. Information sets that include structured data can include elements that are tagged and/or parsed into specific fields. As an example, an authoritative information set for a person can include data parsed into specific fields, such as (but not limited to) name, address, phone number, and/or various status flags. Unstructured data can include freeform text and/or keywords. For instance, a merged information set for a business can include several keywords that trigger search hits for the business's name but are stored in an unstructured manner. Some embodiments include parsing operations to convert free form information into a structured information set. Such parsing operations may be performed as part of information gathering and/or crawling operations. Moreover, information sets can be the output of processes in accordance with embodiments of the invention, such as during parsing operations. Alternatively, information sets can be the input to processes in accordance with embodiments of the invention, such as during merging and/or authoritative information set generation. Different embodiments can use unstructured and/or structured information sets as inputs and/or outputs to varying processes and/or operations.
In multiple embodiments, CI systems relate merged and authoritative information sets for businesses and consumers. By relating the information sets, the CI systems can identify customers of businesses. In many embodiments, gathered transaction information can also be used to identify that a consumer has become a customer of a business. Additional sources of relationship data can include loyalty programs, point of sale systems, advertising network data, call tracking lines, phone records, emails between entities, and electronic contacts. Alternatively, or in addition to using transaction information, CI systems can compare geocodes generated from consumer and business information to identify when a consumer has become a customer of a business. For example, CI systems can use metadata within social media postings by consumers to identify when consumers have visited and/or transacted within the premises of businesses. Additionally, techniques similar to those used to merge information sets belonging to the same named entity can be used to relate information sets belonging to different entities. Establishing relationships between consumers and businesses can enable CI systems in accordance with a number of embodiments of the invention to provide powerful analytic functionalities.
In various embodiments, CI systems can use related information sets, merged information sets, and/or authoritative information sets to generate customer lists for businesses. The CI systems can present customer lists to users through web interface(s) or a phone or tablet “mobile” app. Customer lists can also be used in conjunction with other functionalities provided by the CI systems, such as linking customer lists with customer profiles generated from information sets.
In many embodiments, CI systems can produce customer profiles for consumers based on their interactions with businesses. The customer profiles can contain information including (but not limited to) transaction histories, various spending ratings, and/or demographic details regarding a customer. When used in conjunction with the automated customer lists, the customer profiles can enable a CI system to generate targeted advertising information for use in advertising networks. In addition, CI systems can analyze the profiles of customers of a specific business in order to generate one or more typical customer profiles for the business. A typical customer profile can include information such as (but not limited to) geographic location, demographic information, and/or financial and economic data for the typical customer of the business and/or the profile of the typical best customer of the business. In some embodiments, the typical customer profile can also include (but is not limited to) one or more of the following pieces of information: gender balance, home ownership rates, education levels, annual household income, relationship status, number of children for the typical customer of the business, interests, and/or proximity to business from either home or work. The typical customer profiles can further be used in performing look-alike advertising targeting and/or 1:1 advertising targeting that leverages the known information about customers of a business to find potential or existing customers. By targeting a business's best customers, the CI system can increase the frequency with which the best customers patronize the business.
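A typical customer profile of the kind described above could, under simple assumptions, be derived by aggregating attribute value pairs across a business's customers. In the sketch below the attribute names, the use of proportions for categorical attributes (such as gender balance or home ownership rate), and the use of the median for numeric attributes are illustrative choices rather than anything stated in the text.

```python
from collections import Counter
from statistics import median


def typical_customer_profile(
    customers,
    categorical=("gender", "home_owner"),
    numeric=("household_income",),
):
    """Summarize customer attribute dicts into a single profile dict."""
    profile = {}
    for attr in categorical:
        values = [c[attr] for c in customers if attr in c]
        if values:
            counts = Counter(values)
            # Share of customers holding each value, e.g. the gender balance.
            profile[attr] = {value: n / len(values) for value, n in counts.items()}
    for attr in numeric:
        values = [c[attr] for c in customers if attr in c]
        if values:
            profile[attr] = median(values)
    return profile


customers = [
    {"gender": "F", "home_owner": True, "household_income": 40000},
    {"gender": "F", "home_owner": False, "household_income": 60000},
    {"gender": "M", "home_owner": True, "household_income": 80000},
    {"gender": "F", "home_owner": True},
]
profile = typical_customer_profile(customers)
```

Customers missing an attribute are simply left out of that attribute's summary, so incomplete information sets still contribute wherever they have data. A best customer profile could be produced the same way after first filtering the customer list.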
In various embodiments, CI systems use customer profile data to generate maps indicating geographic concentrations of a business's customers. A CI system can use the association between a customer list and the underlying geographic data from the merged and/or authoritative information sets for consumers to identify geographic concentrations of customers. In several embodiments, CI systems can generate automated campaign messages for use in marketing campaigns to customers identified in the automated customer lists based on triggers or characteristics of the customers. These automated campaign messages can be targeted toward customers that, for example, have not transacted with a business for a period of time (a trigger) or to customers that fit certain segmentation rules (characteristics). The automated campaign messages are directed to customers using interfaces provided by the CI systems. The automated campaign messages are transmitted through the interfaces of the CI systems to various channels including (but not limited to) social media sites, Internet messengers, and/or emails. However, customers often do not wish to be sent messages on channels on which they have not interacted with a business. The CI systems of many embodiments restrict the transmission of automated campaign messages based on the interactions customers have had with businesses. The CI systems of these embodiments can limit transmission of automated campaign messages to channels on which customers have interacted with businesses (e.g., only sending a message over a social media website when a customer has interacted with a business on the social media website). Additional types of automated campaign messages and conditions for their generation are discussed below.
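By way of non-limiting illustration, the trigger-based selection and channel restriction described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the Customer record, its field names, and the plan_messages function are hypothetical and do not appear in the embodiments themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    # All field names here are hypothetical, chosen only for illustration.
    name: str
    days_since_last_transaction: int
    # Channels on which the customer has previously interacted with the business.
    interacted_channels: set = field(default_factory=set)


def plan_messages(customers, inactivity_days, candidate_channels):
    """Return (customer name, channel) pairs for customers that meet the
    inactivity trigger, limited to channels they have already used."""
    plan = []
    for customer in customers:
        if customer.days_since_last_transaction < inactivity_days:
            continue  # trigger not met: customer transacted recently
        for channel in candidate_channels:
            if channel in customer.interacted_channels:
                plan.append((customer.name, channel))
    return plan


customers = [
    Customer("A", 120, {"email"}),
    Customer("B", 10, {"email", "social"}),
    Customer("C", 200, set()),
]
plan = plan_messages(customers, 90, ["email", "social"])
print(plan)
```

In this sketch, customer A receives a message by email only, customer B is skipped because the inactivity trigger is not met, and customer C receives nothing because no prior channel interaction exists.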
CI systems in accordance with a number of embodiments of the invention may not expose all of the information the CI systems have gathered. CI systems can gather more information than the users of the CI systems have rights to access. Often, the users of the CI systems are merchants seeking information on customers associated with businesses. Merchant users often do not have rights to access certain otherwise public information gathered by the CI systems. For instance, minors may post information to social media websites, but sharing of information associated with minors is restricted in many legal jurisdictions. Accordingly, CI systems can restrict access to certain gathered information in order to comply with legal requirements and to respect other privacy considerations. In addition, the CI systems can comply with any legal requirements placed upon the gathering and storing of information in the legal jurisdictions in which they are implemented.
Having provided a brief overview of the operations and functionalities of CI systems in accordance with many embodiments of the invention, a more detailed discussion of systems and methods for CI systems in accordance with embodiments of the invention follows below.
A network architecture for a customer insight system for gathering, relating, and presenting business and consumer information in accordance with an embodiment of the invention is illustrated in
As illustrated in
In the embodiment illustrated in
In many embodiments, CI management system 102 can gather information from information sources over network 104. These information sources include web, file, and/or email servers 106, computing devices 108, and/or mobile devices 112. Web, file, and/or email servers 106 can include numerous source types, such as (but not limited to) newspaper websites, social media websites, social network websites, blogs, vertical information sites, travel guides, local search sites, internet yellow pages, entertainment guides, city guides, radio websites, television station websites, best of websites, business databases, consumer databases, consumer directory sites, marketing sites, deal and offer websites, coupon sites, coupon applications, general search engines, online encyclopedias, events sites, community sites, specialty websites, corporate websites, magazines, shopping sites, ecommerce sites, classifieds, phone number directories, domain directories, specially marked up sites, opt-in single sign on sites, social aggregation sites, music websites, TV websites, movie sites, social bookmarking sites, discussion sites, APIs, photo sharing sites, social sharing sites, review sites, app directories, app review sites, job listings, business card sites, personal websites, business websites, voicemail recordings converted to text, reverse picture lookups and matching services and websites, instant messaging lookup and/or directory sites, real estate information sites, Q&A sites, digital content stores, political and/or campaign information sites, check-in sites and/or apps, and/or mobile apps. Web, file, and/or email servers 106 can also include any addressable IP location or URL that contains consumer or business information.
Computing devices 108 include end machines (e.g., desktop computers, laptop computers, and/or virtual machines) that contain or provide consumer or business information. CI management system 102 may receive information from these machines via an email or may request this information directly where a consumer agrees to provide the information. Computing devices 108 can also serve as an information source in a similar manner to those listed above with respect to web, file, and/or email servers 106.
Mobile devices 112 are devices (e.g., cellular phones, laptop computers, smart phones, and/or tablet computers) that can contain or provide information. Mobile devices 112 typically provide richer geographic location information than computing devices 108 or web, file, and/or email servers 106 as many mobile devices 112 include Global Positioning System (GPS) hardware (e.g., a GPS receiver and/or a GPS antenna). In addition, information gathered from mobile devices often has metadata tags with geocodes that reveal, for instance, where a picture received from a mobile device was taken. In several embodiments, CI management system 102 can take advantage of the rich information provided by mobile devices 112 in order to relate consumer information to business information. For instance, the CI management system 102 can use the GPS data provided by the mobile devices to identify that a consumer has transacted with a business.
Although a specific architecture is shown in
The process 200 can gather (220) information based on received queries and/or scheduled crawls. The queries typically contain information that suggests certain relevant entities. For instance, the query may contain an attribute value (e.g., a name, an address, or a phone number) associated with a named entity. When the query includes such attribute values, the process 200 can gather information based on the included attribute values. In numerous embodiments, the gathering may be performed via crawler processes, which are discussed further below. For instance, process 200 may gather the information via web crawling operations using attribute values of queries as search terms.
The gathered information can then optionally be used in a series of information management operations. These information management operations can be used in order to identify named entities and characteristic data for said named entities from the gathered information. In some embodiments, an initial identifying information set is gathered prior to further operations. This initial identifying information set can include basic identification information, such as (but not limited to) name, address, and/or phone numbers. The initial identifying information set can be used in querying remote information sources for information utilizing characteristic data in the initial identifying information. Typically, the identifying information set will include characteristic data likely to uniquely identify a particular named entity. One example of such uniquely identifying information is a cellular phone number. Cellular phone numbers often are used by only a single named entity, whereas characteristic data such as landline phone numbers could overlap with multiple named entities. Embodiments of the invention can utilize various combinations of identifying information to assist in gathering information sets as required based on available information for particular named entities.
The process 200 optionally generates (230) merged information sets for at least one entity from gathered information. In several embodiments, the merging of information may be continuously performed as a background process. A merged information set contains information from multiple sources that refers to the same named entity (e.g., a consumer, a business, a transaction, a thing, a customer, and/or a location). The generation (230) of merged information sets includes gathering sets of information from information sources and merging gathered information sets when they are of sufficient similarity according to certain thresholds of similarity. The gathered information can include (but is not limited to) standard identity information, such as names, addresses, and phone numbers for various entities. The information sources can include numerous types of sources as discussed above. Similarity thresholds can serve to verify that gathered information sets refer to the same named entity (e.g., a same person or business). In numerous embodiments, the similarity is assessed by comparing the attribute values (e.g., names, addresses, and/or phone numbers) of sets of gathered information. In other embodiments a variety of pieces of identifying information can be used in determining whether to merge information sets from different sources of data in accordance with embodiments of the invention.
As an example of merging gathered information sets, assume that a directory website and a social media website both yield information sets indicating that a person named “Jon D. Doe” lives at “555 Smith St. in California”. The data points of “Jon D. Doe” and “555 Smith St. in California” from the directory website comprise a first information set. The data points of “Jon D. Doe” and “555 Smith St. in California” from the social media website comprise a second information set. Because the directory website information set and the social media website information set are sufficiently similar, the process 200 can merge the two information sets. Once merged, a CI system can identify that the first information set from the directory website and the second information set from the social media website refer to the same person. In that case, the CI system can assign a common unique identifier to the merged information sets.
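The merging example above can be sketched in code as follows. This is an illustrative sketch only: the embodiments do not specify a particular similarity measure, so the use of Python's difflib.SequenceMatcher, the averaging of name and address scores, the 0.9 threshold, and the function names are all assumptions introduced here.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Character-level similarity in [0, 1]; an assumed stand-in measure.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def sufficiently_similar(set_a, set_b, threshold=0.9):
    # Average the per-attribute scores and compare to the assumed threshold.
    scores = [similarity(set_a[key], set_b[key]) for key in ("name", "address")]
    return sum(scores) / len(scores) >= threshold


def assign_entity_ids(info_sets, threshold=0.9):
    """Greedy clustering: each information set joins the first earlier
    cluster it resembles, otherwise it starts a new cluster."""
    representatives = []  # list of (entity_id, representative information set)
    ids = []
    for info_set in info_sets:
        for entity_id, representative in representatives:
            if sufficiently_similar(info_set, representative, threshold):
                ids.append(entity_id)
                break
        else:
            entity_id = len(representatives) + 1
            representatives.append((entity_id, info_set))
            ids.append(entity_id)
    return ids


directory = {"name": "Jon D. Doe", "address": "555 Smith St. in California"}
social = {"name": "Jon D. Doe", "address": "555 Smith St. in California"}
other = {"name": "Jane Roe", "address": "1 Oak Ave in Nevada"}  # hypothetical record
entity_ids = assign_entity_ids([directory, social, other])
print(entity_ids)
```

The first two information sets receive the same common unique identifier, while the unrelated third set receives its own.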
The process 200 optionally generates (240) authoritative information sets from merged information sets and/or gathered information. The generation (240) of authoritative information sets can be continuously performed as a background process. In numerous embodiments, the CI systems can use authoritative information sets as the most reliable sets of information for a named entity. CI systems in accordance with several embodiments of the invention may use measures of reliability to determine what information to use for authoritative sets when gathered information does not match (e.g., when two merged information sets contain different information, such as different phone numbers). In various embodiments, the CI systems may maintain various ratings or scores for information sources, such as (but not limited to) accuracy and size ratings. In combination with these ratings, a CI system can select the most commonly listed information as influenced by the size and accuracy ratings for the information sources.
As an example of comparing source ratings, assume that a CI system according to an embodiment of the invention is retrieving information from a high size, high accuracy rating directory website and a low size, low accuracy rating advertising website. If the directory website lists Jon D. Doe's phone number as (555) 123-4567 and the advertising website lists Jon D. Doe's phone number as (555) 321-4567, then the CI system in this example will have higher accuracy and size ratings for the directory website listing and use (555) 123-4567 as the phone number for an authoritative information set for Jon D. Doe.
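One way the comparison of source ratings might be sketched is shown below. The weighting scheme (the product of accuracy and size ratings, summed over every source that lists a value) is an assumption made for illustration, as are the numeric ratings and the function name pick_authoritative; the embodiments do not prescribe a specific formula.

```python
from collections import defaultdict


def pick_authoritative(listings):
    """listings: iterable of (value, accuracy_rating, size_rating) tuples,
    with ratings assumed to lie in [0, 1]. Returns the value with the
    greatest total weight across all sources that list it."""
    weights = defaultdict(float)
    for value, accuracy, size in listings:
        # Repetition across sources accumulates weight; ratings scale it.
        weights[value] += accuracy * size
    return max(weights, key=weights.get)


listings = [
    ("(555) 123-4567", 0.9, 0.9),  # high size, high accuracy directory website
    ("(555) 321-4567", 0.3, 0.2),  # low size, low accuracy advertising website
]
phone = pick_authoritative(listings)
print(phone)
```

Because weights accumulate per value, the same sketch also lets a value repeated across several weak sources compete with a value listed once by a strong source.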
The process 200 optionally updates (250) merged and/or authoritative information sets based on gathered information. Previous crawls could have resulted in stored merged and/or authoritative information sets for entities. The continuous gathering and crawling process can result in a need to update previously stored information. Publicly available information, particularly that available via the Internet, has a tendency to degrade in quality over time. Due to people moving, businesses closing, and erroneous data entry, information tends to become less reliable with time. Accordingly, the information gathered in connection with received queries is used to update merged and/or authoritative information sets. For instance, when a query involves a particular business, information gathered for that business can be used to update the merged information sets for that business. In embodiments where authoritative and merged information sets are maintained, updating a merged information set may result in a recalculation of an associated authoritative information set. As an example, if a person's account name on a highly reputable search website has changed, the authoritative and merged information sets may both be updated due to the weight of the highly reputable search website as an information source.
The process 200 can decide (255) whether to continue crawling. Process 200 may stop crawling once crawling operations cease returning information that is different from previous crawls. For instance, once queries on a certain set of attributes for a named entity cease returning different results for the named entity, the process will cease crawling for a time. In addition, process 200 may stop crawling upon an indication that a user of the CI system has stopped looking at a particular named entity for which crawls and/or gathering operations are being performed.
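The stopping decision can be illustrated with the following sketch, which assumes that crawl results are lists of dictionaries and that "no different information" is detected by comparing a hash of the results against the hash from the previous crawl. The function names and the hashing approach are hypothetical choices, not taken from the embodiments.

```python
import hashlib
import json


def fingerprint(results):
    # Serialize each result deterministically, then sort so that the order in
    # which a crawl returned the results does not affect the fingerprint.
    canonical = sorted(json.dumps(result, sort_keys=True) for result in results)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def should_continue_crawling(previous_fingerprint, current_results,
                             user_still_viewing=True):
    if not user_still_viewing:
        return False  # the user has stopped looking at this named entity
    # Continue only while crawls keep returning something new.
    return fingerprint(current_results) != previous_fingerprint


first_crawl = [{"name": "Jon D. Doe", "phone": "(555) 123-4567"}]
changed_crawl = [{"name": "Jon D. Doe", "phone": "(555) 321-4567"}]
previous = fingerprint(first_crawl)
print(should_continue_crawling(previous, first_crawl))
print(should_continue_crawling(previous, changed_crawl))
```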
The process 200 optionally identifies (260) relationships between different named entities. In some embodiments, the process 200 may use the information in the merged and/or authoritative information sets in order to relate the entities represented by the information sets. By relating the entities, the CI systems can identify customers of businesses. CI systems can use gathered transaction information to identify that a consumer has become a customer of a business. Alternatively, or in addition to using transaction information, CI systems can compare geocodes generated and/or gathered from entity information to identify when consumers have become customers of businesses. For example, CI systems may use metadata within social media postings by consumers to identify social media postings made within premises of businesses (e.g., when a consumer checks in at a restaurant). Establishing relationships between consumers and businesses enables a CI system to identify customers of businesses. Customer identification enables numerous powerful customer insight functionalities that will be discussed in detail below.
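As a hedged illustration of comparing geocodes, the sketch below treats a posting as made on the premises of a business when the great-circle (haversine) distance between the posting's geocode and the business's geocode falls within a radius. The 50 meter radius, the coordinates, and the function names are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude)
    points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def posted_on_premises(post_geocode, business_geocode, radius_m=50.0):
    # radius_m is an assumed tolerance, not a value from the embodiments.
    return haversine_m(*post_geocode, *business_geocode) <= radius_m


business = (34.0522, -118.2437)  # hypothetical business location
nearby_post = (34.0523, -118.2437)  # roughly 11 meters north
distant_post = (34.0622, -118.2437)  # roughly 1.1 kilometers north
print(posted_on_premises(nearby_post, business))
print(posted_on_premises(distant_post, business))
```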
The process 200 returns (270) information in response to queries. The information returned can take many forms. In several embodiments, the returned information can include (but is not limited to) data from merged information sets, data from authoritative information sets, and/or customer lists for businesses identified from the relationships between consumers and businesses. Alternatively, or in addition to the returned information discussed above, CI systems can return information describing relationships between gathered information (e.g., information that identifies customers of businesses). In many embodiments, CI systems can produce customer profiles for consumers based on their interactions with businesses. The customer profiles can contain transaction histories, various spending ratings, and/or details regarding a customer. CI systems can analyze the customer profiles in order to generate typical customer profiles for a given business. Typical customer profiles can indicate ranges of demographic, financial, and economic data for typical customers of businesses. The process 200 can return the customer profiles or typical customer profiles in response to queries. The process 200 of numerous embodiments can also return maps indicating geographic concentrations of customers for businesses generated from the customer profiles. Further capabilities of the CI systems of multiple embodiments are discussed in more detail below.
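The derivation of a typical customer profile from individual customer profiles might be sketched as follows. The profile fields (household_income, city), the choice of range, median, and most common value as summary statistics, and the sample data are all hypothetical and serve only to illustrate how ranges of demographic and financial data could be computed.

```python
from collections import Counter
from statistics import median


def typical_profile(profiles):
    """Summarize a business's customer profiles into an illustrative
    typical customer profile. Expects a non-empty list of dictionaries."""
    incomes = [profile["household_income"] for profile in profiles]
    cities = Counter(profile["city"] for profile in profiles)
    return {
        "income_range": (min(incomes), max(incomes)),
        "median_income": median(incomes),
        "most_common_city": cities.most_common(1)[0][0],
    }


profiles = [
    {"household_income": 50000, "city": "Springfield"},
    {"household_income": 70000, "city": "Springfield"},
    {"household_income": 90000, "city": "Riverton"},
]
summary = typical_profile(profiles)
print(summary)
```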
While the operations described as part of process 200 were presented in the order as they appeared in the embodiment illustrated in
A customer insight (CI) system in accordance with an embodiment of the invention is illustrated in
The scheduler process server 305 controls how the crawler process server 310 queries information sources 320 over network 315. The scheduler process server 305 can prioritize different searches based on several factors. Higher priority can be given to real-time information requests received from web server 355. Real-time information requests can occur when a user queries information about a named entity directly. A real-time information request can also be inferred from attributes contained within a query. Attributes within a query can include (but are not limited to) a name, an address, and/or a phone number associated with an entity. When a real-time information request is received, the scheduler process server 305 can instruct the crawler process server 310 to update the priority of scheduled information crawling based on attributes within or inferred from the real-time information request. The scheduler process server 305 can also instruct the crawler process server 310 to perform lower priority, batch gathering of information. Batch gathering can relate to old information that is in need of updating, or simply to lower priority crawls. In several embodiments, the CI system 300 only stores gathered information for a particular period of time (e.g., between six and twelve months) and deletes information that has been stored for a time exceeding the particular period of time.
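A minimal sketch of the prioritization and retention behavior described above is given below, assuming a two-level priority scheme backed by a heap and a twelve-month retention period. The class name CrawlScheduler, the priority constants, and the record layout are hypothetical and are not drawn from the embodiments.

```python
import heapq
import itertools
from datetime import datetime, timedelta

REAL_TIME, BATCH = 0, 1  # a lower number is served first


class CrawlScheduler:
    def __init__(self):
        self._heap = []
        # Monotonic counter: keeps first-in, first-out order within a
        # priority level and prevents comparing the attribute dictionaries.
        self._order = itertools.count()

    def submit(self, attributes, priority=BATCH):
        heapq.heappush(self._heap, (priority, next(self._order), attributes))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def purge_expired(records, now, retention=timedelta(days=365)):
    # Keep only records gathered within the assumed retention period.
    return [r for r in records if now - r["gathered_at"] <= retention]


scheduler = CrawlScheduler()
scheduler.submit({"name": "Batch Refresh One"})
scheduler.submit({"phone": "(555) 123-4567"}, priority=REAL_TIME)
scheduler.submit({"name": "Batch Refresh Two"})
jobs = [scheduler.next_job() for _ in range(4)]
print(jobs)
```

The real-time request is served ahead of both batch crawls even though it was submitted second, and an empty queue yields None.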
The crawler process server 310 can gather information from information sources 320. As discussed above, the crawler process server 310 can receive instructions from the scheduler process server 305 concerning information for which to search and the priority in which to execute searches. The crawler process server 310 interacts with network 315 to reach information sources 320. In the embodiment illustrated in
Information sources 320 can be any network addressable source of information. Information sources 320 can include web, file, and/or email servers, computing devices, and/or mobile devices. Examples of information sources 320 include (but are not limited to) newspaper websites, social media websites, social network websites, blogs, vertical information sites, travel guides, local search sites, internet yellow pages, entertainment guides, city guides, radio websites, television station websites, best of websites, business databases, consumer databases, consumer directory sites, marketing sites, deal and offer websites, coupon sites, coupon applications, general search engines, online encyclopedias, events sites, community sites, specialty websites, corporate websites, magazines, shopping sites, ecommerce sites, classifieds, phone number directories, domain directories, specially marked up sites, opt-in single sign on sites, social aggregation sites, music websites, TV websites, movie sites, social bookmarking sites, discussion sites, APIs, photo sharing sites, social sharing sites, review sites, app directories, app review sites, job listings, business card sites, personal websites, business websites, voicemail recordings converted to text, reverse picture lookups and matching services and websites, instant messaging lookup and/or directory sites, real estate information sites, Q&A sites, digital content stores, political and/or campaign information sites, check-in sites and/or apps, and/or mobile apps. The gathered information can include (but is not limited to) standard identity information, such as the name, address, and/or phone numbers of various businesses and consumers along with transaction information such as purchases and prices. The crawler process server 310 can gather different types of information (e.g., consumer, business, and/or transaction information) from the information sources 320 according to different processes.
The information gathered by the crawler process server 310 is stored in at least one crawler database 325. The crawler database 325 can store raw crawled data before it is parsed or merged into other forms of data. An application server 330 can perform initial parsing of the raw data in the crawler database 325. Parsed data can be stored in the feeds database 335. In a number of embodiments, the application server 330 additionally stores the parsed information in container files according to the information types collected. For instance, a container file for business information categorizes gathered information as belonging to certain attribute values, such as a name, an address, and/or a phone number. Various embodiments of the invention provide for many different ways to containerize gathered information.
The merge process server 340 merges information sets stored in order to build merged information sets for entities. Merged information sets are clusters of information from different information sources that are sufficiently similar to be considered to be referring to the same entity (e.g., two profiles for the same person from two different social media websites). Information sets in the feeds database 335 are merged where they are sufficiently similar according to certain thresholds of similarity. The thresholds of similarity can serve to verify that gathered information sets refer to the same entity. In multiple embodiments, the similarity is assessed by comparing the names, addresses, and phone numbers of sets of gathered information. In some embodiments, the merge process server 340 scores information sets for similarity to other information sets based on the attribute values stored in the information sets. The merge process server 340 may also merge information sets where the names, addresses, or phone numbers within evaluated information sets vary by limited permutations or small values. In some embodiments, the merge process server 340 assigns a same common unique identifier to merged information sets. For example, all merged information sets for a person named “Jon D. Doe” could be assigned a common unique identifier.
The production process server 345 can generate authoritative information sets for entities using information from the merged information sets stored in the feeds database 335. An authoritative information set is the most accurate description of a named entity available to the CI system 300. The production process server 345 generates authoritative information sets using several techniques. In numerous embodiments, the production process server 345 rates information sources for accuracy and size. The production process server 345 can also consider how many times a piece of information is repeated across information sources. The production process server 345 can assess the ratings of the sources of information and also measure how often information is repeated across a merged information set. In addition, the production process server 345 of some embodiments balances the ratings of sources against the repetition of information across sources. For instance, the production process server 345 may select a piece of information for use in an authoritative information set, where the piece of information is repeated relatively infrequently but the piece of information comes from a particularly trustworthy source. Based on at least one of these techniques, the production process server 345 identifies authoritative information sets for named entities. Once authoritative information sets are created, they are stored in the production database 350.
The relation process server 365 can generate and/or infer relationships between entities. The relation process server 365 can use merged and authoritative information sets for different named entities such as (but not limited to) businesses, locations, events, and consumers in order to generate relationship information. Through generating relationship information, the relation process server 365 provides many of the customer insight functionalities of the CI system 300. In many embodiments, the relation process server 365 uses gathered transaction information to identify relationships between entities. Alternatively, or in addition to using transaction information, the relation process server 365 of several embodiments compares geocodes generated from consumer and business information to identify relationships between entities. For example, the relation process server 365 of a number of embodiments uses metadata within social media postings by consumers to identify when consumers have transacted within the premises of businesses. In addition, the relation process server 365 can use reviews posted by consumers and social media check-ins as the basis of relating consumers to businesses. Further, the relation process server 365 can identify relationships between other types of entities, such as (but not limited to) relationships between consumers (e.g., who a person's friends are), relationships between businesses (e.g., business to business transactions), and relationships between locations and consumers (e.g., where a person frequents or lives). In many embodiments, the relation process server 365 stores the generated relationship information with merged and authoritative information sets in the feeds database 335 and/or the production database 350.
Once the relation process server 365 has generated relationship information, the relationship information can be used to provide customer insight functionalities. In many embodiments, the customer process server 370 can utilize relationship information, transaction information, merged information sets, and/or authoritative information sets to automatically identify current and/or potential customers of businesses. Typically, the customer process server 370 stores the identified customers in the customer database 375. In addition, the customer process server 370 of many embodiments can generate customer lists from the identified customers. The customer lists in several embodiments are presented to users through the user interface 360.
The targeting process server 380 of many embodiments produces advertising targeting data from previously identified customers. The advertising targeting data can be the basis for advertising campaigns that leverage the known information about customers in CI systems in accordance with embodiments of the invention. In addition, the targeting process server 380 of many embodiments can produce customer profiles for consumers based on their interactions with certain businesses. The customer profiles contain transaction histories, various spending ratings, and details regarding a customer. The targeting process server 380 can analyze the customer profiles for businesses in order to generate typical customer profiles for the businesses. The targeting process server 380 can further leverage known customer information to perform look-alike targeting and 1:1 targeting to allow discovery of new customers for targeting based on what is known about existing customers. Thereby, the targeting process server 380 can identify potential customers for a business for which advertising should be targeted. From the various identified and generated customer information, the targeting process server 380 can generate advertising targeting data. In some embodiments, the targeting process server 380 can further segment the generated advertising targeting data into narrower categories of targeting.
Many of the functionalities of the targeting process server 380 and the other servers can be accessed through the web server 355. The web server 355 can use the merged information sets stored in the feeds database 335, the authoritative information sets stored in the production database 350, and the relationships established by the relation process server 365 to provide customer insight functionalities through the user interface 360. For instance, the web server 355 can return information from the feeds database 335 and the production database 350 in response to user queries. The web server 355 can also provide users access to the relationship information established by the relation process server 365.
The user interface 360 of various embodiments is the channel by which users can access the customer insight functions provided by the web server 355. For instance, automated campaign message services are run through the user interface 360 (as opposed to through private emails of the users of CI system 300). The web server 355 of some embodiments also updates the scheduler process server 305 in response to user queries received from the user interface 360 so that queried information is as current as possible and that future information gathering reflects queried information.
While the servers and databases of CI system 300 are shown as separate entities in the embodiment illustrated in
In many embodiments, the CI systems gather information from information sources describing named entities. The gathered information can include (but is not limited to) attribute values for names, addresses, phone numbers, reviews, connections, dates, prices, transactions, and interests. The gathered information can also include social media and other website postings and the associated metadata of the social media postings of consumers. The gathering process is typically performed by a crawler process server. The following discussion details the various gathering processes that can be performed by crawler process servers in accordance with embodiments of the invention.
A process performed by a CI system to gather business information in accordance with an embodiment of the invention is illustrated in
Gathered business information can be parsed (420) to identify attribute values for businesses within the gathered information (e.g., names, addresses, phone numbers, and/or hours of operations). In some embodiments, the parsing process additionally involves storing the parsed information in container files according to the information types collected. For instance, a container file for business information can categorize information relating to the business such as the business's name, address, phone numbers, hours, prices, and/or reviews. In many embodiments, parsed and containerized business information is stored in a crawler database. As can be appreciated, any of a variety of information can be gathered and parsed as appropriate to the requirements of specific applications in accordance with embodiments of the invention.
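The parsing and containerization step can be illustrated with the following sketch. It assumes, purely for the purposes of the example, that the raw gathered text consists of "Label: value" lines and that phone numbers follow the (555) 123-4567 pattern; the BusinessContainer class and parse_business function are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Assumed phone format for illustration, e.g. "(555) 123-4567".
PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


@dataclass
class BusinessContainer:
    # A hypothetical container grouping parsed values by attribute type.
    name: str = ""
    address: str = ""
    phone_numbers: list = field(default_factory=list)


def parse_business(raw_text):
    container = BusinessContainer()
    for line in raw_text.splitlines():
        label, separator, value = line.partition(":")
        if not separator:
            continue  # skip lines that are not in "Label: value" form
        label, value = label.strip().lower(), value.strip()
        if label == "name":
            container.name = value
        elif label == "address":
            container.address = value
        elif label == "phone":
            container.phone_numbers.extend(PHONE_RE.findall(value))
    return container


raw = (
    "Name: Example Cafe\n"
    "Address: 555 Smith St.\n"
    "Phone: (555) 123-4567, (555) 765-4321\n"
    "Unlabeled trailing text"
)
parsed = parse_business(raw)
print(parsed)
```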
Sets of parsed business information can optionally be associated (430) with source identifiers. Association with source identifiers is not necessary, where the information already includes a source identifier provided by the information source. However, when information sources do not provide source identifiers, associations between parsed information sets and source identifiers can be generated. In multiple embodiments, the generated source identifier for an information set is the URL from which the information set was gathered.
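The optional association with source identifiers can be sketched as shown below, under the assumption that an information set is represented as a dictionary with an optional source_id key. The key name, the function name, and the example URL are hypothetical.

```python
def associate_source_id(info_set, source_url):
    """Return the information set with a source identifier. An identifier
    supplied by the information source is kept; otherwise the URL from
    which the set was gathered is used as the generated identifier."""
    if info_set.get("source_id"):
        return info_set
    return {**info_set, "source_id": source_url}


with_id = {"name": "Example Cafe", "source_id": "listing-42"}
without_id = {"name": "Example Cafe"}
url = "https://directory.example/example-cafe"  # hypothetical URL

tagged_existing = associate_source_id(with_id, url)
tagged_generated = associate_source_id(without_id, url)
print(tagged_existing["source_id"], tagged_generated["source_id"])
```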
In numerous embodiments, the business information sets are stored (440) in a feeds database. The business information sets are initially stored unmerged (e.g., not associated with other business information sets). Further merge processing can be performed in order to identify clusters of business information sets that describe the same businesses. In some embodiments, gathered information includes information that is relevant to both consumer entities and business entities. For instance, reviews of businesses posted by consumers are stored as both business information and as consumer information (e.g., a review for the business and a review by the consumer).
A process performed by a CI system to gather consumer information in accordance with an embodiment of the invention is illustrated in
Gathered consumer information can be parsed (520) to identify attribute values for consumers within the gathered information. In numerous embodiments, the parsing process additionally involves storing the parsed information in container files according to the information types collected. For instance, a container file for consumer information can categorize information relating to the consumer such as the consumer's names, addresses, phone numbers, relatives, friends, owned properties, and/or income. In multiple embodiments, parsed and containerized consumer information can be stored in a crawler database. As can readily be appreciated, any of a variety of information can be gathered as appropriate to the requirements of the invention.
Sets of parsed consumer information can optionally be associated (530) with source identifiers. Association with source identifiers is not necessary, where the information already includes a source identifier provided by the information source. For instance, information gathered from major social media websites will include social media website specific source identifiers. However, when information sources do not provide source identifiers, associations between parsed information sets and source identifiers can be generated. In some embodiments, the generated source identifier for an information set is the URL from which the information set was gathered.
In numerous embodiments, the consumer information sets are stored (540) in a feeds database. The consumer information sets are initially stored unmerged (e.g., not associated with other consumer information sets). Further merge processing can be performed in order to identify clusters of consumer information sets that describe the same consumers. In some embodiments, gathered information includes information that is relevant to both business entities and consumer entities. For instance, reviews of businesses posted by consumers can be stored as both consumer information and as business information (e.g., a review for the business and a review by the consumer).
A process performed by a CI system in gathering transaction information between entities (e.g., purchases by consumers from businesses) in accordance with an embodiment of the invention is illustrated in
Furthermore, merchants can be a source of transaction information. For instance, a merchant can provide credit card records utilizing login credentials supplied by the merchant to access the data through crawling. In addition, some embodiments provide for the use of optical character recognition (OCR) scans of paper records or pictures of paper records from businesses and merchants. The scans can indicate transactions, such as credit card purchases. Moreover, where merchants have applications or systems that customers interact with during transactions, several embodiments provide for application programming interface (API) integrations with these applications or systems.
Gathered transaction information can be parsed (620) to identify attribute values for transactions within the gathered information. The identified attribute values for transactions can include (but are not limited to) times, dates, amounts, parties, any deals present, what was purchased, related recommendations, and/or frequencies of transactions. In many embodiments, the parsing process additionally involves storing the parsed information in container files according to the information types collected. For instance, a container file for transaction information can categorize information relating to the transaction such as the transaction's time, date, amount, and/or parties to the transaction. In numerous embodiments, parsed and containerized transaction information can be stored in a crawler database. As can readily be appreciated, any of a variety of information can be gathered as appropriate to the requirements of the invention.
Sets of parsed transaction information can be associated with source identifiers. Association with source identifiers is not necessary, where the information already includes a source identifier provided by the information source. For instance, information gathered from credit bureaus can often include source identifiers provided by the credit bureaus. However, when information sources do not provide source identifiers, associations between parsed information sets and source identifiers can be generated. In various embodiments, the generated source identifier for an information set is the URL from which the transaction information set was gathered.
In several embodiments, transaction information sets are stored (640) in a feeds database. The transaction information sets are initially stored unmerged (e.g., not associated with other information sets). Further merge and relationship processing can be performed in order to identify previously stored information sets with which to associate the transaction information sets. For instance, merge and relationship processing may be necessary to associate collected transaction information sets with particular consumers or businesses. In numerous embodiments, each transaction information set is merged and related to at least two other information sets (e.g., related to a consumer information set and a business information set where the consumer transacted with the business).
A process performed by a CI system in gathering information on things, events, and/or locations in accordance with an embodiment of the invention is illustrated in
Gathered information on things, events, and/or locations can be parsed (720) to identify attribute values for information on things, events, and/or locations within the gathered information. The identified attribute values can include (but are not limited to) times, dates, addresses, viewers, ratings, and/or sizes. In many embodiments, the parsing process additionally involves storing the parsed information in container files according to the information types collected. For instance, a container file for information on things, events, and/or locations can categorize information relating to the information on things, events, and/or locations. In a number of embodiments, parsed and containerized information on things, events, and/or locations can be stored in a crawler database. As can readily be appreciated, any of a variety of information can be gathered as appropriate to the requirements of the invention.
Sets of parsed information on things, events, and/or locations can optionally be associated (730) with source identifiers. Association with source identifiers is not necessary, where the information already includes a source identifier provided by the information source. However, when information sources do not provide source identifiers, associations between parsed information sets and source identifiers can be generated. In numerous embodiments, the generated source identifier for an information set is the URL from which the information on things, events, and/or locations was gathered. In some embodiments, the information sets for things, events, and/or locations are stored (740) in a feeds database. The information sets for things, events, and/or locations are initially stored unmerged (e.g., not associated with other information sets). Further merge and relationship processing can be performed in order to identify previously stored information sets with which to associate the information sets for things, events, and/or locations.
While the operations described as part of processes 400, 500, 600, and 700 were presented in the order as they appeared in the embodiments illustrated in
CI systems in accordance with many embodiments of the invention merge gathered information sets according to the sets' similarity to particular named entities. When several sets of information gathered from information sources are similar enough that they can be said to refer to a same named entity (e.g., a person or a business), the CI systems can merge the sets of information to create a merged information set that describes the named entity. As discussed above, information sets can include clusters of information gathered from information sources, such as (but not limited to) profiles of persons from social media websites, listings of businesses from directory websites, and/or reviews of a business (submitted from a mobile device). The CI systems can use several measures of similarity to determine when gathered information sets refer to the same entity. The CI systems can assess differences (or lack thereof) between attribute values in the gathered information sets. For example, CI systems in accordance with many embodiments of the invention merge information sets where the differences between the information sets are merely a permutation in a name, a minor numerical difference in addresses, and/or where the information sets are gathered from similar geocodes. Merging information sets can be an important process for CI systems in accordance with embodiments of the invention as information about a single person or business can come from many different information sources.
A process performed by a CI system to merge gathered information sets in accordance with an embodiment of the invention is illustrated in
In several embodiments, at least two gathered sets of information are selected (820) from gathered information sets for comparison. Information sets can be selected from a feeds database as part of a continuous selection process or as information sets are added to the feeds database. For instance, in several embodiments of the invention, the CI system may select and assess gathered information sets as they are added to a feeds database. This ensures that newly gathered information sets are compared and assessed before storage with the remainder of the gathered information sets. Other embodiments may use a continuous crawling process to assess previously stored information sets from the feeds database. In a continuous crawling process, a CI system can continuously compare stored information sets for their relative similarity to each other. Numerous embodiments of the invention select sets of information to compare for merger based on a shortened comparison scheme that compares basic information from sets in order to identify information sets to select for a fuller assessment.
Similarity of attribute values within two or more sets of information can be scored (830). Different embodiments of the invention can use various methods to score the similarity of attributes within the selected sets of information. For instance, the attribute values can be assessed for matching percentages (e.g., where all the attributes are the same between two information sets, the matching percentage would be 100%). Alternatively, or in addition to matching comparisons, embodiments of the invention can use location information within the information sets to identify geocodes for the information sets. For instance, where the gathered information sets have attribute values for addresses, or where the gathered information sets have geographic metadata (such as when the gathered information sets are gathered from mobile devices with GPS technology), the merge processes of a number of embodiments convert the addresses and/or location metadata into geographic coordinates (i.e., geocodes). These geocodes can be used to assess whether the selected information sets should be merged.
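The matching-percentage approach can be illustrated with a short sketch. The function names, the normalization rule, and the choice to compare only attribute types present in both sets are assumptions made for this example; the source does not specify how matches are counted.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def matching_percentage(a: dict, b: dict) -> float:
    """Percentage of shared attribute types whose normalized values match."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if normalize(a[key]) == normalize(b[key]))
    return 100.0 * matches / len(shared)


set_a = {"name": "Jon Doe", "phone": "(555) 123-4567"}
set_b = {"name": "jon  doe", "phone": "(555) 321-4567"}

# The names match after normalization and the phone numbers differ,
# so one of the two shared attributes matches.
half_match = matching_percentage(set_a, set_b)
# Identical sets match on every attribute.
full_match = matching_percentage(set_a, set_a)
```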
Selected information can (optionally) be merged (840) based upon the similarity of the selected information sets. For instance, two selected information sets can be merged when the differences in their attribute values fall within a threshold percentage. Two selected information sets can also be merged, where the differences between their geocodes satisfy certain geometric and statistical requirements. An election not to merge the selected information sets can occur when the selected information sets fail to satisfy any assessment of similarity. In such circumstances, CI systems in accordance with several embodiments of the invention judge the dissimilar sets to refer to different entities. For instance, the selected sets may refer to different individuals.
Some embodiments can merge information sets based on only sub-portions of information being similar between the selected information sets. For instance, information sets can be merged where only a single common attribute, such as a name, address, or phone number, is found between the two selected information sets. Such mergers may be performed where the selected information sets are gathered from different types of sources. For instance, a first selected information set can be a phone record and a second information set can be a web page, yet both information sets include at least one sufficiently similar attribute. The merger of disparate types of information sets based on limited points of similarity allows for the binding of information of entities from diverse sources. The merged information sets can be stored in a merge database of a CI system.
Several example information sets to be selected, assessed, and optionally merged are conceptually illustrated in
Information sets 910, 920, and 930 each include several attribute value pairs. The attribute value pairs include names, addresses, and/or phone numbers. In addition, a source identifier field and a common ID field are present in each of the information sets. The attribute value pairs and fields shown in example 900 are pairs and fields for one embodiment of the invention. Other embodiments may include additional attribute value pairs and fields to store additional information (such as time gathered, time stored, source ratings, and/or file sizes) or may include fewer attribute pairs and fields (e.g., some embodiments do not include a Common ID field). In addition, other embodiments of the invention may include additional attribute value pairs for multiple names, multiple addresses, and/or multiple phone numbers. For instance, the attribute value pairs can include cell phone number, home phone number, and work phone number. In many embodiments, the information sets are stored in databases and/or in container files that categorize and organize attribute values for gathered information to enable more efficient comparison of values between information sets.
As indicated in
A CI system in accordance with many embodiments of the invention scores the similarity of attribute values within several selected information sets (in this case, information sets 910, 920, and 930). This scoring can be accomplished by comparing the various attribute values and field values of the selected information sets. Information set 910 includes a name value (Jon D. Doe) similar to that of information set 920 (Jon Doe), but the name value for information set 930 (John Dough) is significantly different from both. The similarity score for information set 930 in comparison to information sets 910 and 920 with regards to the name attribute value would be fairly low in several embodiments.
Information sets 910 and 920 include similar addresses, “555 Smith St. Evale” and “555 Smith St. Evale, Calif.”, respectively. However, Information set 930 has a significantly different address of “222 Smith St”. Multiple embodiments of the CI systems compare geocodes from address attribute values and perform geometric and geographic analyses on addresses in order to assess their similarity. In the example shown in
Information sets 910 and 920 include the same phone number “(555) 123-4567”. However, information set 930 has a different phone number of “(555) 321-4567”. CI systems in accordance with a number of embodiments of the invention compare phone numbers via a permutation computation that calculates a similarity score based on how many permutations exist between phone numbers. For instance, where two compared phone numbers are only one permutation apart, then the phone numbers can be considered to be similar. In example 900, the phone number of information set 930 is separated by two permutations from the phone number of information sets 910 and 920. Accordingly, information set 930 can be scored as dissimilar to information sets 910 and 920. In embodiments where multiple phone number types are included in the information sets, differences in phone number types can be the basis for generating similarity and/or dissimilarity scores between information sets. For instance, two information sets can have a matching phone number where the phone number is for a cell phone in the first information set and a work phone in the second information set.
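One way to read the permutation computation is as a count of digit positions at which two phone numbers differ. The sketch below adopts that reading as an assumption; the source does not define the computation precisely, and the function names and the default limit of one difference are hypothetical.

```python
import re


def digits_only(phone: str) -> str:
    """Strip punctuation and spacing, keeping only the digits."""
    return re.sub(r"\D", "", phone)


def phone_differences(a: str, b: str) -> int:
    """Count digit positions that differ (assumed meaning of 'permutations')."""
    da, db = digits_only(a), digits_only(b)
    if len(da) != len(db):
        # Numbers of different length are treated as entirely different.
        return max(len(da), len(db))
    return sum(1 for x, y in zip(da, db) if x != y)


def phones_similar(a: str, b: str, max_differences: int = 1) -> bool:
    """Treat numbers as similar when they differ in at most max_differences digits."""
    return phone_differences(a, b) <= max_differences


# Phone numbers from example 900: two digit positions differ.
diff_930 = phone_differences("(555) 123-4567", "(555) 321-4567")
similar_930 = phones_similar("(555) 123-4567", "(555) 321-4567")
```

Under this reading, the phone number of information set 930 is two differences away from that of information sets 910 and 920, which exceeds the one-difference limit and yields a dissimilar result.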
CI systems in accordance with several embodiments of the invention can generate a composite score for assessed information sets that combines the similarity scores generated from the comparisons of the attribute value pairs and the field values. In example 900, information set 910 can be scored as having a high composite similarity score with regards to information set 920 on the basis that the attribute value pairs between information set 910 and information set 920 have (1) a high similarity score in the name attribute, (2) a high similarity score in the address attribute, and (3) a high (matching) similarity score in the phone attribute. However, information set 930 would have a low composite similarity score to information sets 910 and 920 as the attribute value pairs between information set 930 and information sets 910 and 920 include (1) a low similarity score in the name attribute, (2) a low similarity score in the address attribute, and (3) a low similarity score in the phone attribute. CI systems in accordance with several embodiments of the invention use the composite scores for information sets 910, 920, and 930 as the basis for making a decision as to whether to merge the information sets.
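A composite score of this kind can be sketched as a weighted average of per-attribute similarity scores. The weighted-average form, the weights, the score values, and the 0.8 merge threshold below are all illustrative assumptions; the source does not state how the individual scores are combined.

```python
def composite_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarity scores in [0, 1]."""
    total_weight = sum(weights[attr] for attr in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[attr] * weights[attr] for attr in scores) / total_weight


def should_merge(scores: dict, weights: dict, threshold: float = 0.8) -> bool:
    """Merge when the composite score reaches the (hypothetical) threshold."""
    return composite_score(scores, weights) >= threshold


# Hypothetical weights and scores loosely modeled on example 900.
weights = {"name": 1.0, "address": 1.0, "phone": 1.0}
scores_910_920 = {"name": 0.9, "address": 0.9, "phone": 1.0}
scores_910_930 = {"name": 0.3, "address": 0.4, "phone": 0.2}

merge_910_920 = should_merge(scores_910_920, weights)
merge_910_930 = should_merge(scores_910_930, weights)
```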
In example 900, information set 910 and information set 920 are sufficiently similar to be merged. By merging information set 910 and information set 920, the CI system identifies the two information sets as referring to a same person, Jon D. Doe. In many embodiments, a CI system can assign common unique identifiers to merged information sets in order to identify the sets as being merged. The common unique identifier is common to merged information sets, and each collection of merged information has a unique identifier (e.g., the information sets that are merged with respect to Jon D. Doe get a unique identifier that is common to all of the merged information sets for Jon D. Doe). As shown in example 900, the common unique identifier 1234-55555 is assigned to information sets 910 and 920 for Jon D. Doe. No common unique identifier (or a different unique identifier) is assigned to information set 930 due to its low similarity score with information sets 910 and 920. Other embodiments of the invention may use different numerical conventions for common unique identifiers, including (but not limited to) hexadecimal and/or additional digits as appropriate to the requirements of specific applications. Additional techniques for merging information sets using geographic location information are discussed below.
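The assignment of common unique identifiers can be sketched with a union-find (disjoint-set) structure, which groups every information set connected by a merge decision. Union-find is a technique chosen for this illustration and is not named in the source; the `CID-` identifier format and the function name are likewise hypothetical. In this sketch an unmerged set receives its own distinct identifier, which corresponds to the "different unique identifier" alternative.

```python
from itertools import count


def assign_common_ids(set_ids, merge_pairs):
    """Give every group of merged information sets one shared identifier."""
    parent = {s: s for s in set_ids}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in merge_pairs:
        parent[find(a)] = find(b)

    group_ids = {}
    counter = count(1)
    result = {}
    for s in set_ids:
        root = find(s)
        if root not in group_ids:
            group_ids[root] = f"CID-{next(counter):05d}"
        result[s] = group_ids[root]
    return result


# Information sets 910 and 920 were judged similar enough to merge; 930 was not.
common_ids = assign_common_ids([910, 920, 930], [(910, 920)])
```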
CI systems in accordance with many embodiments of the invention use location information within information sets to identify and/or generate geocodes for the information sets. The geocodes can be used to identify relationships between different information sets based on whether they were gathered from a same or different location. The geocodes can also be used to identify when information sets are related to a same or different location. For instance, the geocodes can be used to identify when a consumer has checked in at a business, or to identify when two information sets refer to the same location. The geocodes can be generated from address attribute values in selected information sets, or from geographic metadata connected to the selected information sets. For instance, mobile devices with GPS technology often tag the information with metadata describing a geographic location. In many embodiments, the geocodes are latitude and longitude; however other embodiments may employ different types of geocodes. Multiple embodiments employ geocodes as part of a merge process. The merge processes of several embodiments convert these addresses or location metadata into geographic coordinates (i.e., geocodes) and evaluate information sets for merger.
As a part of, or in addition to the merge processes previously discussed, a process performed by a CI system to generate and compare geocodes of selected information sets in accordance with an embodiment of the invention is illustrated in
Geocodes can optionally be generated (1020) from the selected information sets. Various embodiments employ public and/or private geocode generation systems to generate geocodes from location information. Such geocode generation systems include (but are not limited to) the MapQuest Geocoder service provided by AOL, the Geocoding API of Google Maps provided by Google, and/or the TIGER (Topologically Integrated Geographic Encoding and Referencing) services provided by the United States Census Bureau. In other embodiments, CI systems can use any of a variety of processes and/or services to generate geocodes from location data as appropriate to the requirements of specific applications. In addition, previously generated and/or stored location information can be used in combination with the location information from the selected information sets to infer the geocodes from the previously generated and/or stored information. Generated geocodes can be used to score the similarity between the selected information sets. Different embodiments may use different operations on the generated geocodes. Accordingly, process 1000 includes operations that may or may not be performed in different embodiments of the invention.
Distances between generated geocodes can optionally be calculated (1030) and evaluated. These distances can be computed according to “as the crow flies” distances on a map, or based on road-wise distances that account for travelling between the compared geocodes. Numerous embodiments take advantage of GPS data and/or geographic location information in order to determine distances between geocodes for different information sets. The calculated distances can be compared to thresholds of similarity and/or used to generate scores of similarity for the selected information sets. In addition, distances can be computed based on the latitude and longitude values associated with the geocodes.
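An "as the crow flies" distance between two latitude and longitude geocodes can be computed with the haversine formula. The source does not name a specific formula, so the haversine formula, the spherical Earth model, and the function name below are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical model)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator is roughly 111 km.
one_degree_m = haversine_m(0.0, 0.0, 0.0, 1.0)
```

The resulting distance can then be compared against a similarity threshold or converted into a similarity score.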
Geometric analysis of the generated geocodes can also optionally be performed (1040). Geometric analysis can comprise defining areas on maps that encompass the generated geocodes and assessing the relative positions of the geocodes within the defined areas. For instance, CI systems in accordance with some embodiments of the invention define circles around clusters of geocodes for several selected information sets and evaluate the relative density and positions of the geocodes within and/or outside the defined area (e.g., a circle or a circumference). Other embodiments can use alternative geometric shapes to analyze the geocodes, such as rectangular, linear, or graphical objects. In several embodiments, CI systems can determine a center position between the geocodes prior to defining any geometric shapes. For instance, a CI system can identify a center point between several geocodes, and then draw a circle of a particular radius around that center point. The radius and/or size of the geometric object(s) used can vary depending on the type of assessment performed by the CI systems. When assessing whether social media posts from mobile devices (i.e., check-ins) refer to a named entity encompassing a large area, such as (but not limited to) a park, an outdoor arena, and/or a shopping center, a moderate radius of several hundred feet may be used. By contrast, when assessing whether two reviews refer to a named entity having a smaller geographic footprint, such as (but not limited to) a small office, a home, and/or a street corner, a shorter radius in the tens of feet can be used.
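The circle-based analysis can be sketched as follows: a center point is computed for a cluster of geocodes, and a candidate geocode is tested against a circle of a chosen radius around that center. Averaging latitudes and longitudes and using an equirectangular distance are simplifications assumed here, reasonable only over small areas; the coordinates, radius, and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def center_point(geocodes):
    """Arithmetic mean of (lat, lon) pairs; adequate only for small areas."""
    lats = [lat for lat, _ in geocodes]
    lons = [lon for _, lon in geocodes]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def approx_distance_m(p, q):
    """Equirectangular approximation of the distance between two geocodes."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    x = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    y = math.radians(q[0] - p[0])
    return EARTH_RADIUS_M * math.hypot(x, y)


def within_circle(geocode, center, radius_m):
    """True when the geocode lies inside the circle around the center."""
    return approx_distance_m(geocode, center) <= radius_m


# Two nearby geocodes define the center; a third geocode is tested against it.
cluster = [(34.0000, -118.0000), (34.0002, -118.0000)]
center = center_point(cluster)
near_inside = within_circle(cluster[0], center, radius_m=30.0)
far_inside = within_circle((34.0100, -118.0000), center, radius_m=30.0)
```

A larger `radius_m` would suit named entities covering a large area, while a radius of a few meters would suit a small office or a street corner.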
The similarity of the selected information sets can be assessed (1050) based on the previous analysis or analyses of geocodes. Different embodiments of the CI systems can assess the selected information sets differently. For instance, some embodiments can generate similarity and/or distance scores based on the previous analysis. Similarity and/or distance scores may be compared to thresholds to determine when geocodes are close enough to refer to a same location. Geometric proximity scores may be generated which can yield either relative distances or binary “close enough/not close enough” results. In a number of embodiments, the thresholds can adapt based upon factors including (but not limited to) the similarity of other pieces of characteristic data, knowledge of the existence of multiple locations sharing the same name and/or the density of the multiple locations. As can readily be appreciated, thresholds can be adapted using any of a variety of criteria appropriate to the requirements of specific applications in accordance with embodiments of the invention.
The at least two information sets can be optionally merged (1060). In an independent merge process, the decision to merge the information sets can be based on the generated scores. Where the scores pass certain thresholds, the information sets can be merged. In a number of embodiments, process 1000 is a sub-process of a larger merge process that is optionally performed when information sets include address, geographic, and/or location data. In other embodiments, process 1000 can be performed as a singular merge operation that merges information only based on the geocodes without any other merge analyses. In yet other embodiments, process 1000 can be a part of a relationship establishing process that establishes relationships between information sets for different named entities based on the similarity of generated geocodes between the information sets for the different named entities. Specific examples of the geometric analysis of geocodes in accordance with embodiments of the invention are discussed further below.
The circle 1140 around the center point 1145 has been drawn in order to assist in analysis of information sets 1110, 1120, and 1130 and geocodes 1115, 1125, and 1135. CI systems in accordance with many embodiments of the invention can use the circle to identify whether geocodes are close enough to be regarded as the same location. In the geographic area 1100, the radius of the circle is set to a neighborhood setting (e.g., the length of one or several houses). When assessing other information sets, such as those within large areas such as parks or stadiums, different length radii may be used. Numerous embodiments can take advantage of WiFi, radio, and/or other cellular technology from information sources to analyze the relative distances between geocodes in conjunction with the geometric analysis. Other embodiments may use different radii for circles or different lengths of polygons used in geometric analysis.
As shown in geographic area 1100, geocodes 1115 and 1125 from information sets 1110 and 1120 are within the circle 1140 whereas geocode 1135 from information set 1130 is outside of circle 1140. Where information set 1130 is being analyzed in connection with the application of a merge process to information sets 1110 and 1120, information set 1130 can be scored as having a low similarity score with information sets 1110 and 1120. As a result, the merge process of some embodiments may not merge information set 1130 with information sets 1110 and 1120. However, as the geocodes of information sets 1110 and 1120 are both within circle 1140, a merge process can score information sets 1110 and 1120 as having a high similarity score. As a result, the merge process of some embodiments can merge information sets 1110 and 1120.
During a merge process, various embodiments of the invention perform a word or character permutation based comparison of attribute values to assess the similarity of attribute values between information sets. In the example illustrated in geographic area 1100, such a permutation based strategy can yield misleading results. The geocoding reveals that information set 1110 at geocode 1115 is much closer to information set 1120 at geocode 1125 than to information set 1130 at geocode 1135. However, according to simple permutations, information set 1110 includes substantially fewer differences from information set 1130 (e.g., only the “3” is different).
While geometric analysis has been discussed in terms of merge processes, other embodiments of the invention can utilize geometric analysis techniques similar to those described above with respect to
In various embodiments, CI systems can generate authoritative information sets from merged sets of information. An authoritative information set is a CI system's most accurate description of a named entity (e.g., the correct name, address, and phone number of a business or a consumer). A CI system can select different attribute values from different merged information sets in order to generate an authoritative information set. In several embodiments, generation of an authoritative information set can involve using attribute values from information sets that have been merged by a merge process. In other embodiments, attribute values from any gathered information sets whether merged or not merged can be utilized. In several embodiments, the process of generating the authoritative information sets involves assessing the reliability of the attribute values in order to generate authoritative information sets for named entities.
As an example, CI systems in accordance with embodiments of the invention can merge several information sets for a person “Jane Doe”. The several merged information sets can include an information set gathered from a directory site and an information set gathered from a social media site. In this example, a CI system can select the phone number from the directory site information set and the name from the social media information set to include in an authoritative information set for “Jane Doe”. A process for generating authoritative information sets in accordance with this example is illustrated in
Process 1200 includes identifying (1210) at least two information sets. The at least two identified information sets can be merged information sets. Where the at least two identified information sets are merged information sets, then the CI system has previously identified the merged information sets as relating to the same entity. In several embodiments, the at least two identified information sets can draw from information sets that have not been merged by a prior merge process.
The sources of the at least two identified information sets can be (optionally) identified (1220). In a number of embodiments, each identified information set includes a source identifier that identifies the information source from which the information set was gathered. The source identifiers can include unique identifiers assigned by the CI system, and/or addresses from which the information was gathered (e.g., a URL). Source identifiers can be used to identify sources for the identified information sets.
The reliability of sources associated with the at least two identified information sets can be compared (1230). In many embodiments, the comparison involves assessing the sources of the identified information sets using ratings maintained for various information sources. CI systems can maintain ratings for various information sources that rate the sources for qualities including (but not limited to) accuracy, reliability, trustworthiness, and/or reputation. Further ratings for information sources may be used and/or maintained as appropriate to the requirements of specific applications. In numerous embodiments, the ratings are for particular types of attribute values. In several embodiments, CI systems generate ratings for information sources and/or obtain ratings from a ratings source.
Each attribute value in the identified information sets can be scored (1240) for the attribute value's reliability relative to similar attribute values in other information sets. Similar attribute values can be attribute value types such as (but not limited to) name, address, phone number, time, price, and/or dates. As mentioned above, different sources can have different ratings for different types of attribute values. For instance, a directory website can have a very high reliability rating with regard to phone numbers, whereas a mapping application can have a very high accuracy rating with regard to addresses. Thus, phone attribute values in information sets from the directory website can be scored as more reliable relative to phone attribute values in information sets from a lower rated information source.
Each attribute value in the identified information sets can be (optionally) scored (1250) for the attribute value's frequency amongst the identified information sets. Where attribute values of the same type are repeated across information sets (e.g., when a same name is present in multiple information sets), the repeating attribute value can be scored (1250) as more reliable in addition to scoring the attribute values based on source ratings.
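As a minimal sketch of the scoring described for operations 1240 and 1250, the following Python example combines a per-source rating with a frequency bonus and then selects the highest scoring value for each attribute type. The rating table, the default rating, the frequency weight, and all function and field names are assumptions made for illustration and are not taken from any particular embodiment.

```python
from collections import Counter

# Hypothetical per-source ratings by attribute type (0.0 to 1.0); values are illustrative.
SOURCE_RATINGS = {
    "directory_site": {"phone": 0.9, "name": 0.6, "address": 0.7},
    "social_site": {"phone": 0.4, "name": 0.8, "address": 0.5},
}
DEFAULT_RATING = 0.5    # assumed rating for an unrated source or attribute type
FREQUENCY_WEIGHT = 0.1  # assumed bonus per additional information set repeating a value


def score_attribute_values(information_sets):
    """Return {attribute_type: {value: score}} for the identified information sets."""
    # Count how many information sets carry each (type, value) pair (operation 1250).
    counts = Counter(
        (attr, value)
        for info in information_sets
        for attr, value in info["attributes"].items()
    )
    scores = {}
    for info in information_sets:
        ratings = SOURCE_RATINGS.get(info["source"], {})
        for attr, value in info["attributes"].items():
            rating = ratings.get(attr, DEFAULT_RATING)  # source-based score (operation 1240)
            bonus = FREQUENCY_WEIGHT * (counts[(attr, value)] - 1)
            by_value = scores.setdefault(attr, {})
            # Keep the best score seen for a value that appears in several sets.
            by_value[value] = max(by_value.get(value, 0.0), rating + bonus)
    return scores


def highest_scoring(scores):
    """Pick the highest scoring value for each attribute type."""
    return {attr: max(values, key=values.get) for attr, values in scores.items()}
```

With one information set from the hypothetical directory site and one from the hypothetical social site, this sketch selects the phone number from the former and the name from the latter, mirroring the "Jane Doe" example above.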
The highest scoring attribute values from the identified information sets can be stored (1260) as part of an authoritative information set for the given named entity. The highest scoring attribute values can come from any combination of the identified information sets (or all from the same identified information set). Of note, storing an authoritative information set for a given entity is functionally equivalent to generating an authoritative information set for the given entity. Where process 1200 is performed by a production process server, the production process server can store generated authoritative information set(s) in a production database. While process 1200 is discussed in the context of generating a first authoritative information set for a given entity, similar techniques can be used as part of any of a variety of processes to update existing authoritative information sets. For instance, when a CI system gathers new information for a given named entity or when a CI system merges an additional information set for a given named entity, the CI system may update the existing authoritative information set for the given named entity using similar techniques to those described with respect to
Having discussed the generation of authoritative information sets in connection with process 1200, the following discussion will detail examples of information sets being used to generate an authoritative information set. Information sets used to generate an authoritative information set are conceptually illustrated in
Information sets 1310, 1320, and 1330 are gathered from various information sources and/or are previously merged as information sets belonging to “Jon D. Doe”. Information sets 1310, 1320, and 1330 all share a common ID of 1234-55555 assigned to merged information sets for “Jon D. Doe”. Information sets 1310, 1320, and 1330 each include several attribute value pairs. The attribute value pairs include names, addresses, and phone numbers. In addition, a source identifier field and a common ID field are present in each of the information sets. Other information sets may include additional attribute value pairs and fields storing additional information (such as, but not limited to, time gathered, time stored, source ratings, and/or file sizes), or fewer attribute value pairs and fields. For instance, some information sets do not include a Common ID field. In addition, other embodiments of the invention may include additional attribute value pairs for multiple names, multiple addresses, and multiple phone numbers. For instance, the attribute value pairs can include cell phone number, home phone number, and work phone number. As can be appreciated, any of a variety of information types and/or attribute value types can be utilized appropriate to the requirements of specific applications in accordance with embodiments of the invention.
As indicated in
Authoritative information set 1340 includes the high scoring attribute values from information set 1310 and information set 1320. As indicated by the arrows, authoritative information set 1340 includes the name attribute value from information set 1310 and the address and phone number attribute values from information set 1320. Authoritative information set 1340 shares a common ID with information sets 1310, 1320, and 1330 to indicate its relationship with the merged information sets. While a particular example is illustrated in
CI systems in accordance with many embodiments of the invention merge information sets, generate authoritative information sets, and relate various sets of information for different named entities. In addition, CI systems can receive queries from users regarding gathered information. These and other operations can be the basis for generating and scheduling crawls for information. For instance, where a CI system receives a user query that contains a particular attribute value that relates to a previously merged information set, the CI system can generate a crawl for information to seek out additional information related to the attribute value. In addition, a CI system can update scheduled batches of crawls for information, where authoritative information sets have not been updated for certain periods of time.
A process performed by a CI system to generate and schedule batches of crawls for information that take into account received input from other CI system operations in accordance with an embodiment of the invention is illustrated in
Batches of crawls for information can be generated (1420). In several embodiments, the batches are automatically generated as part of a general crawling of all available information sources. The generation of a batch of crawls can also take into account received input from user interfaces, CI operations, and CI functionalities as discussed above. Batches of crawls for information can include instructions to gather information from many different types of information sources. As can be appreciated, any of a variety of information sources can be crawled depending upon the requirements of the specific applications in accordance with embodiments of the invention. In a number of embodiments, existing batches of crawls can be updated and/or re-prioritized in addition to generating batches of crawls.
Priorities can be assigned (1430) to the generated and/or updated batches of crawls. Where a particular crawl was generated in response to a user query, the particular crawl can be given a high priority to reflect the real time nature of the user query, whereas a crawl that is to be performed on a cyclical basis as a background process can be given a low priority. The generated and/or updated batches of crawls can be issued (1440) to crawler processes according to the assigned priorities. The batches of crawls can then be performed to gather information for use in CI operations.
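The prioritization and issuing of crawls (1430, 1440) can be sketched with a priority queue. The example below is only an illustration under stated assumptions: the two priority levels, the class name CrawlScheduler, and its methods are hypothetical and do not describe any specific crawler process.

```python
import heapq
import itertools

# Assumed priority levels; lower numbers are issued first.
HIGH_PRIORITY = 0   # crawl generated in response to a user query
LOW_PRIORITY = 10   # cyclical background crawl


class CrawlScheduler:
    """Hypothetical scheduler that issues crawls in priority order."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker preserves insertion order

    def add_crawl(self, target, from_user_query=False):
        """Queue a crawl, giving user-query crawls the high priority."""
        priority = HIGH_PRIORITY if from_user_query else LOW_PRIORITY
        heapq.heappush(self._heap, (priority, next(self._order), target))

    def issue_next(self):
        """Return the next crawl target to issue, or None when the queue is empty."""
        if not self._heap:
            return None
        _, _, target = heapq.heappop(self._heap)
        return target
```

A crawl added with from_user_query=True is issued ahead of any background crawls already queued, while crawls of equal priority are issued in the order they were added.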
Previously gathered information sets can be (optionally) updated (1450) based on information gathered from the issued batches of crawls. For instance, where an issued crawl for information relating to a user query returns new information with regard to a particular named entity, a CI system can update merged, related, and/or authoritative information sets concerning the particular named entity with the new information. In many embodiments, an update operation (1450) is performed by servers separate from those that perform the crawls. For example, an updating operation (1450) can be performed by an application server, merge process server, production process server, and/or relation process server. Furthermore, the update operation (1450) can be applied to information stored in a feeds database and/or a production database.
While process 1400 is illustrated as a discrete process with a start and a completion, in multiple embodiments the scheduling of batches of crawls, generation of crawls, and/or issuing of crawls, are performed as a continuous process that accepts input from various operations of CI systems and updates gathered information with newly crawled information in a continuous manner. While the operations described as part of process 1400 were presented in the order as they appeared in the embodiments illustrated in
CI systems in accordance with multiple embodiments of the invention can relate merged, authoritative, and/or other gathered information sets for named entities. Relationships can be identified using several different techniques in varying embodiments. For instance, where a CI system in accordance with embodiments of the invention has identified a transaction between entities, the CI system can use this transaction to establish a relationship between the entities. Alternatively, or in addition to using transaction data, the CI system can use content correlations between information sets to identify relationships between the entities associated with the information sets. In addition, geographic correlations between information sets can be identified using geographic location information associated with and/or included in each of the information sets. These varying relationship identification techniques allow CI systems in accordance with embodiments of the invention to identify current and potential customers of businesses from gathered information sets. Identified current and potential customers of businesses can also be used to form customer lists for businesses, assist in targeting of campaigns, and further CI system functionalities that will be described in greater detail below.
A process 1500 to identify relationships between named entities in accordance with an embodiment of the invention is illustrated in
Geographic correlations between information sets can optionally be identified (1520). Geographic correlation identification includes identifying where different information sets for different named entities have similar geocode and/or address attribute values. Geographic correlations between geocode and/or address attribute values across information sets for different entities can be the basis for identifying relationships between entities. For instance, a geographic correlation can occur where a social media post has geographic metadata that is similar to a business address. Geographic correlations can be of particular use in identifying relationships between different types of entities, such as relationships between consumers and businesses, businesses and businesses, and/or consumers and consumers. Information sets for consumers that include geographic location information similar to that of information sets for businesses can be used to identify said consumers as customers of said businesses.
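One way to test whether two geocodes are "similar" is to compare their great-circle distance against a threshold. The sketch below uses the haversine formula for that purpose; the 0.1 km threshold, the field names, and the function names are assumptions for illustration, and other similarity measures can be substituted.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_KM = 0.1  # assumed threshold for treating two geocodes as similar


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geographic_correlations(consumer_sets, business_sets, max_km=MAX_DISTANCE_KM):
    """Return (consumer_id, business_id) pairs whose geocodes lie within max_km of each other."""
    return [
        (c["id"], b["id"])
        for c in consumer_sets
        for b in business_sets
        if haversine_km(c["geocode"], b["geocode"]) <= max_km
    ]
```

For example, a social media post geotagged a few meters from a business address would be paired with that business, while a business several kilometers away would not.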
Transaction relationships between information sets of different entities can optionally be identified (1530). Other processes in accordance with embodiments of the invention, such as the transaction gathering processes described above, may be used in conjunction with process 1500 to gather transaction information as a part of identifying transaction relationships. In addition, transaction relationships can be identified directly via transaction gathering processes or indirectly by inference through content similarities between information sets. The identified transactions can be used in establishing relationships between named entities listed as parties to the identified transactions. As discussed above, many embodiments gather information regarding consumers from (but not limited to) landline phone records, mobile phone records, email messages, web data, loyalty systems, discount programs, point of sale systems, credit card gateways, and/or credit card records from merchants. This gathered consumer information can also be used to identify transactions between consumers and businesses, and thereby identify customers. For instance, a phone record for a consumer can be used to identify a business that was called on the phone record. Specifically, call tracking lines and/or crawling phone record documents provided by a merchant user can yield transaction relationship information. Often, tracking lines and/or crawling phone records can be accessed using a phone provider's information through a website and/or an API. Also, the consumers identified in gathered credit card records can then be identified as customers of merchants from which the credit card records were gathered.
Relationship information describing relationships between information sets and/or named entities can be generated (1540). The relationship information generated (1540) can incorporate information identified using the several techniques (1510, 1520, and/or 1530) of process 1500. Content correlations can be used to generate relationships between information sets that relate to a same entity. Geographic correlations can be used to link information sets from different entities to a common location. Transaction relationships can be used to identify when consumers have become customers of businesses. In addition, some embodiments of the invention can infer that consumers could be potential customers of a business where identified content correlations, geographic correlations, and/or transaction relationships suggest such potential. Various embodiments include thresholds of correlation and similarity for establishing relationships. In many embodiments, information sets are marked as being related using identifiers shared across related information sets.
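The use of thresholds in generating relationship information (1540) can be illustrated as follows. This is a simplified sketch: the single threshold value, the normalized correlation scores, the relationship labels, and the function name are all hypothetical, and an actual embodiment may weigh the inputs differently.

```python
CORRELATION_THRESHOLD = 0.7  # assumed minimum correlation for inferring a potential customer


def generate_relationship(consumer_id, business_id, content_score, geo_score, has_transaction):
    """Return a relationship record, or None when no relationship is established.

    content_score and geo_score are assumed to be correlation scores in [0.0, 1.0].
    """
    if has_transaction:
        # An identified transaction establishes the consumer as a customer.
        kind = "customer"
    elif max(content_score, geo_score) >= CORRELATION_THRESHOLD:
        # Strong content or geographic correlation suggests a potential customer.
        kind = "potential_customer"
    else:
        return None
    return {"consumer": consumer_id, "business": business_id, "type": kind}
```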
Relationship information can optionally be stored (1550). In many embodiments, the generated relationship information is stored in a production database as sub-components of authoritative information sets for various entities. For instance, the stored relationship information may be stored as a part of one or more authoritative information sets and/or merged information sets for which the relationship information describes relationship(s) for named entities. Various embodiments may use different database configurations for storing relationship information, such as storing the relationship information in a feeds database, a production database, a customer database, and/or a merge database. The stored relationship information can include (but is not limited to) landline phone records associated with customers and/or businesses, mobile phone records associated with customers and/or businesses, email messages between customers and/or businesses (can be to, from, cc, and/or in bodies of email messages such as in the signature line of the email messages), web data to, from, or exchanged between customers and/or businesses (such as reviews, checkins, likes, follower status, and/or mentions), information linking customers to businesses from loyalty and/or discount systems and programs, point of sale systems indicating customer relationships, credit card records from credit card gateways, and/or credit card records from merchants. While the operations described as part of process 1500 were presented in the order as they appeared in the embodiments illustrated in
Relationships between existing and/or potential customers and businesses are of particular relevance for CI systems. Embodiments of the invention can use generated relationship information along with other information to identify consumers as current and/or potential customers of businesses. In addition, identified customers can be placed into customer lists associated with businesses. A process 1600 to identify current and/or potential customers of businesses in accordance with an embodiment of the invention is illustrated in
Relationship information can optionally be received (1610). In many embodiments, relationship information is stored in a production database along with or as a part of authoritative information sets. Thus, the relationship information can be received from a production database of a CI system in many embodiments. The received relationship information can include (but is not limited to) telecommunication bills gathered via optical character recognition, APIs, logins, or crawling; call records tracking communications between consumers and businesses and/or merchants; and/or any of the above described examples of relationship information.
Transaction information can optionally be received (1620). In several embodiments, transaction information is stored in a production database along with or as a part of authoritative information sets and/or stored in a feeds database along with or as parts of merged information sets. Thus, the transaction information can be received from a production database and/or a feeds database of a CI system. The received transaction information can include (but is not limited to) transactions between consumers and businesses and/or merchants; credit card transactions; and/or any transaction information gathered using the above described transaction gathering techniques.
Merged and/or authoritative information sets for entities can optionally be received (1630). In some embodiments, merged information sets are stored in a feeds database and authoritative information sets are stored in a production database. Thus, the merged and/or authoritative information sets can be received from a production database and/or a feeds database of a CI system. The received merged and/or authoritative information sets can include (but are not limited to) web forms and email accounts that are parts of merged and/or authoritative information sets for entities; reviews associated with merged and/or authoritative information sets for entities; checkins that are parts of merged and/or authoritative information sets for entities; likes, follows, and/or followers that are parts of merged and/or authoritative information sets for entities; mentions of businesses that are parts of merged and/or authoritative information sets for entities; mobile app operations and/or data that are parts of merged and/or authoritative information sets for entities; and/or any information sets produced by the above described merged and/or authoritative information set generation techniques.
Customers can be automatically identified (1640) utilizing the received relationship information (1610), transaction information (1620), merged information sets (1630), and/or authoritative information sets (1630). While the above discussion of receiving relationship information (1610), transaction information (1620), and merged and/or authoritative information sets (1630) provided several examples for each respective information category, embodiments of the invention can use examples from different categories and/or other information as necessary to implement CI functionalities. Thus, process 1600 can utilize various combinations and sub-combinations of the different information types that can be optionally received as discussed above.
The identified customers can be automatically added (1650) to a customer database. Different embodiments of the invention may utilize different storage techniques involving any variety of storage mechanisms. The identified customers can optionally be added (1660) to customer lists for businesses and/or merchants. In some embodiments, the customer lists are stored in a database of a CI system. For instance, embodiments may store the customer lists in a customer database and/or a customer list database. Customer lists of existing customers can be used by embodiments of the invention to produce typical customer profiles.
Although specific processes are described above with respect to
CI systems in accordance with numerous embodiments of the invention can utilize identified customers to generate advertising targeting data and/or advertising campaigns. Advertising networks typically receive targeting information and display ads and/or creatives based on the received targeting information. Advertising targeting data is data that is provided to an advertising network that enables the advertising network to determine the circumstances under which an advertisement should be displayed. Advertising targeting data can include varying types of data that can be provided to advertising networks. The advertisement displayed by the advertising network can be automatically generated by the network and/or provided as part of the advertising campaign. Advertising networks can be a part of CI systems in accordance with many embodiments of the invention, or the advertising networks can be provided as a service by server systems maintained by third parties. CI systems in accordance with several embodiments of the invention can leverage the power of customer information aggregated within a customer database from a variety of information sources to target ads more effectively using one or more advertising networks. For example, a CI system can utilize information posted by a customer of a business on a first social network in an advertisement targeting users of a second social network having a known relationship to the customer.
In several embodiments, characteristic data describing named entities corresponding to customers of a business maintained within the customer database of a CI system can be utilized to generate demographic targeting information corresponding to a typical customer of the business. In a number of embodiments, characteristic data describing named entities corresponding to customers of a business maintained within the customer database of a CI system can be utilized to identify specific user identities to target via an online social network associated with existing customers, and/or potential customers matching a typical customer profile. In many embodiments, characteristic data describing named entities corresponding to customers of a business maintained within the customer database of a CI system can be utilized to automatically identify posts to online social networks that can be promoted and/or utilized as creatives in advertising campaigns. In certain embodiments, user identities to specifically target using the promoted posts are also identified using the characteristic data describing specific named entities corresponding to customers within the customer database maintained by a CI system.
A process 1700 that can be utilized to generate one or more advertising campaigns in accordance with an embodiment of the invention is illustrated in
A typical customer can optionally be identified (1720). In many embodiments, the typical customer is a “good” customer for the business that would tend to spend more than an average amount of money over time at the business or more than a threshold amount of money. Any of a variety of processes described herein that can be utilized in scoring customers and/or estimating the revenue generated by specific customers can be utilized to generate a typical customer profile as appropriate to the requirements of specific applications in accordance with embodiments of the invention. Identification of a typical customer can optionally be used in profiling (1730) customers to directly target and/or use as a seed to perform look-alike targeting. In addition, potential customers who are similar to the identified typical customer and/or a set of seed customers can be identified (1740).
Advertising targeting data can be generated (1750) based upon customer information associated with the actual and/or potential customers identified during the generation of the advertising campaign. The advertising targeting data generated can include (but is not limited to) demographic targeting data, location targeting data, user targeting data, and/or keyword targeting data. In several embodiments, the advertising targeting data is generated by the CI system using characteristic data maintained in the customer database describing individual actual and/or potential customers such as (but not limited to) characteristic data describing a phone number, an email address, an IP address, name and location, specific devices, and/or any other piece of data that can be utilized by an advertising network to individually target a specific individual. In several embodiments, the advertising targeting data can also target based on general information such as (but not limited to) general profiles, interests, niches, and/or any other generalized targeting methods provided by a given advertising network. As an example of general targeting, advertising targeting data can be directed to persons who have an interest in motorcycling, are 30-50, are male, are married, have no children, and live in Pasadena. The example includes demographic targeting information, interest targeting information, and location targeting information that can all be derived from characteristic data describing named entities corresponding to customers and/or potential customers of a business that is maintained in the customer database. Moreover, advertising targeting data can include generic targeting that identifies individuals according to their uses of relevant websites, apps, and/or media. For instance, generic targeting can target individuals based on numbers of visits to a particular website or uses of a specific app. 
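As a hedged illustration of deriving general targeting data (1750) from characteristic data, the sketch below takes the most common value of each categorical field and the observed age span across a set of customers. The field names, the choice of "most common value" as the aggregation rule, and the function name are assumptions; an advertising network may expect a different format.

```python
from collections import Counter


def general_targeting(customers, fields=("interest", "gender", "city")):
    """Derive general targeting data from customer characteristic data.

    customers is a list of dicts; missing fields are skipped.
    """
    targeting = {}
    for field in fields:
        values = [c[field] for c in customers if field in c]
        if values:
            # Use the most common value among customers as the targeting value.
            targeting[field] = Counter(values).most_common(1)[0][0]
    ages = [c["age"] for c in customers if "age" in c]
    if ages:
        targeting["age_range"] = (min(ages), max(ages))
    return targeting
```

Applied to customers who are mostly male motorcycling enthusiasts aged 30 to 50 living in Pasadena, the sketch yields targeting data resembling the general targeting example given above.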
As can readily be appreciated, the manner in which a named entity described within the customer database can be targeted is only limited by the types of characteristic data aggregated about the named entity from different information sources and the targeting capabilities of specific advertising networks.
The generated advertising targeting data can optionally be segmented (1760) into more narrow categories of advertising targeting data. The segmentation can be accomplished by profiling a database of existing customers to identify points for segmentation. Segmentation of customers to target can occur along demographic lines, such as (but not limited to) age, location, marital status, children status, household income, education levels, home ownership, and/or gender. The advertising targeting data can also be segmented into categories of ads and ad platforms including (but not limited to): targeting to search ads, targeting to display ads, targeting to mobile devices, targeting to mobile ads, targeting to emails, targeting to social networks, targeting to social sharing sites, and/or targeting to phones.
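Segmentation along demographic lines (1760) can be sketched as grouping customers by a tuple of demographic keys. In the example below, the age brackets, the record fields, and the function names are assumed for illustration only.

```python
from collections import defaultdict

AGE_BRACKETS = [(18, 29), (30, 50), (51, 120)]  # assumed segmentation points


def age_bracket(age):
    """Map an age to a label such as '30-50', or 'unknown' if outside all brackets."""
    for low, high in AGE_BRACKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "unknown"


def segment_targets(customers, keys=("age_bracket", "gender")):
    """Group customer ids into segments keyed by the chosen demographic fields."""
    segments = defaultdict(list)
    for customer in customers:
        # Add the derived age bracket alongside the stored fields.
        record = dict(customer, age_bracket=age_bracket(customer["age"]))
        segments[tuple(record[k] for k in keys)].append(customer["id"])
    return dict(segments)
```

Passing a different keys tuple, such as ("city",), segments the same customers by location instead.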
The final advertising targeting data can be output (1770) to an advertising network for distribution and display. The final advertising targeting data can direct advertising networks to display ads that include (but are not limited to) pay per click ads, pay for performance ads, banner ads, mobile ads within apps, and/or ads on mobile ad networks selected according to device characteristics.
In many embodiments, CI systems can generate and present customer profiles for an individual consumer that interacts with a specific named entity such as (but not limited to) a business. A customer profile can contain information including (but not limited to) transaction histories, various spending ratings, and/or demographic details regarding a customer. A customer profile can be generated by identifying a relationship between a specific consumer and a given business. In numerous embodiments, information regarding an individual consumer can be used to generate customer profiles with respect to multiple businesses (e.g., one consumer can have two customer profiles, one profile for a first business and a second profile for a second business).
Customer profile 1800 also includes a customer name indicator 1820 that in this case indicates a name of “Jane Doe”. Customer profile 1800 displays several data points for the customer indicated by the customer name indicator 1820. In several embodiments, the customer name indicator 1820 indicates a name of a customer for which the CI system stores several merged information sets and at least one authoritative information set. For instance, the information presented in customer profile 1800 can be from merged information sets and an authoritative information set for “Jane Doe”.
Customer profile 1800 displays several ratings 1840-1843 for “Jane Doe”. These ratings include levels of education 1840, professional status 1841, social influence 1842, and disposable income 1843. The ratings are derived by the CI system by analyzing merged and authoritative information sets related to “Jane Doe”. In addition, customer profile 1800 shows an activity timeline 1850 for “Jane Doe”. The CI system can generate an activity timeline using transaction histories generated from merged information sets. For instance, a CI system can populate a transaction history for a consumer where: the consumer has interacted with mobile devices at locations corresponding to a business; the consumer has interacted with social media websites; or publicly available credit information reveals that the consumer has made purchases at a location of the business. As shown, activity timeline 1850 includes events where “Jane Doe” spent money, checked in via social media sites, and/or submitted reviews to business review websites. In a number of embodiments, each event in a given activity timeline is drawn from information sets merged based upon a particular consumer. For instance, each review submitted by “Jane Doe” can be an information set gathered from a business review website that is merged to be associated with “Jane Doe”.
Customer profile 1800 also displays a map with geographic data that relates the address for the customer indicated by the customer name indicator 1820 with the address for the business indicated by the business name 1810. As shown, a distance and a road route are displayed connecting the addresses of the customer and the business. In addition, customer profile 1800 displays an activity summary showing total interactions, reviews, last transaction spending, and estimated yearly value for the customer indicated by the customer name indicator 1820. The customer profile 1800 also displays a customer summary 1880 that includes distance, age ranges, and household income ranges for the customer indicated by the customer name indicator 1820.
As mentioned above, CI systems need not expose all of the information the CI systems have gathered with respect to a customer. CI systems of some embodiments can gather more information than the users of CI systems have rights to access. Often, the users of the CI systems are merchants seeking information on customers associated with businesses. Merchant users often do not have rights to access certain otherwise public information gathered by the CI systems of some embodiments. Accordingly, customer summary 1880 reveals only age ranges and household income ranges, rather than specific data with regard to “Jane Doe”. While the above discussed customer profile 1800 was discussed in connection with a consumer “Jane Doe”, embodiments of the invention are not limited to the specific consumer or customer shown in customer profile 1800.
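Revealing ranges rather than specific values can be sketched as bucketing each value into a fixed-width range before display. The bucket widths, the record fields, and the function names below are assumptions chosen for illustration.

```python
def to_range(value, width):
    """Return the fixed-width range containing an integer value, e.g. 37 -> '30-39'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def customer_summary(record, age_width=10, income_width=25000):
    """Build a summary that exposes only ranges, never the exact stored values."""
    return {
        "age_range": to_range(record["age"], age_width),
        "household_income_range": to_range(record["household_income"], income_width),
    }
```

A merchant user viewing the summary would see, for instance, an age range of 30-39 rather than the exact age stored by the CI system.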
The screenshot of customer profile 1800 shown in
In many embodiments, CI systems can produce typical customer profiles that show information regarding a typical customer of a business. The typical customer profiles provide an overview of information regarding customers to allow merchant users to assess their customers in aggregate. The typical customer profiles can contain information including (but not limited to) transaction histories, various spending ratings, and/or demographic details regarding average customers.
A process 1905 that can be utilized to generate one or more advertising campaigns in accordance with an embodiment of the invention is illustrated in
Demographic information for the identified customers can be identified (1925). Identifying information associated with the identified customers can be utilized to query production databases and/or merge databases of a CI system to return characteristic data, authoritative information sets, and/or merged information sets describing the identified customers. Alternatively, any of the processes described above for gathering information for named entities can be utilized in accordance with various embodiments of the invention to identify demographic information of the identified customers.
Transactions for the identified customers can be identified (1925). Identifying information associated with the identified customers can be utilized to query customer, production, and/or merge databases of a CI system to return transaction and/or relationship information describing transactions between the identified customers and the given business. The returned transaction information can include transaction values. In some embodiments, estimated transaction values can be generated for customers who do not have specific transaction values stored in databases of the CI system.
A typical customer profile can be generated (1945) utilizing the identified customers, customer demographic information, and transaction information. The typical customer profile can include various ranges and averages describing customers of the given business, along with a list of customers for the given business. The typical customer profile can optionally be used to generate an interface (1955) showing the typical customer profile.
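The aggregation described above might be sketched as follows. The dictionary keys and the function name are hypothetical, and the estimate used for customers without stored transaction values (the mean of the known per-customer totals) is an assumption made for illustration; the description does not state how estimates are produced.

```python
from statistics import mean


def typical_customer_profile(customers: list[dict]) -> dict:
    """Summarize customers given as dicts with hypothetical keys
    'name', 'age', and 'transaction_values' (possibly empty)."""
    if not customers:
        raise ValueError("at least one customer is required")

    # Total known spend per customer; None marks customers with no stored values.
    totals = {
        c["name"]: (sum(c["transaction_values"]) if c["transaction_values"] else None)
        for c in customers
    }
    known = [t for t in totals.values() if t is not None]

    # Assumption: a missing total is estimated as the mean of the known totals.
    fallback = mean(known) if known else 0.0
    spend = {name: (t if t is not None else fallback) for name, t in totals.items()}

    ages = [c["age"] for c in customers]
    return {
        "customers": sorted(spend),
        "age_range": (min(ages), max(ages)),
        "average_age": mean(ages),
        "average_spend": mean(list(spend.values())),
        "estimated_for": sorted(n for n, t in totals.items() if t is None),
    }
```

The `estimated_for` entry keeps track of which customers contributed an estimated rather than a stored value, so an interface could flag them.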
Customer listing 1910 shows a subset of an automated customer list for a business indicated by business name indicator 1960. The particular subset of the automated customer list for the business indicated by business name indicator 1960 is selected by the customer listing type menu 1920. Customer listing type menu 1920 includes several options for what subset of the automated customer list can be displayed in the customer listing 1910. The several options include (but are not limited to) best customers, most frequent customers, worst customers, and/or all customers. Other embodiments may include additional menu display options as necessary to facilitate the display of information regarding typical customers.
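A listing type menu of this kind could select subsets along the lines of the following sketch. The `Customer` fields and the ranking rules are assumptions: here "best" is taken to mean highest total spend, "worst" lowest total spend, and "most frequent" the most visits, none of which is defined by the description.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical entry in an automated customer list."""
    name: str
    total_spend: float
    visits: int


# Assumed sort keys for each menu option; smaller key sorts first.
RANKINGS = {
    "best": lambda c: -c.total_spend,
    "most_frequent": lambda c: -c.visits,
    "worst": lambda c: c.total_spend,
}


def customer_listing(customers: list[Customer], listing_type: str, limit: int = 10) -> list[Customer]:
    """Return the subset of the customer list chosen by the listing type."""
    if listing_type == "all":
        return list(customers)
    if listing_type not in RANKINGS:
        raise ValueError(f"unknown listing type: {listing_type}")
    return sorted(customers, key=RANKINGS[listing_type])[:limit]
```

Keeping the ranking rules in a single table makes it straightforward to add further menu options without changing the selection function.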
Average customer statistics 1930 includes several statistics for a typical customer of the business indicated by business name indicator 1960. As shown, average customer statistics 1930 includes demographic information on the gender balance, home ownership percentages, educational attainment, annual household income, relationship status, and children of a typical customer. Embodiments of the invention are not limited to the particular statistics listed in average customer statistics 1930. Additional statistics may be presented in other embodiments. Customer analysis page 1900 also displays customer location table 1940 and customer top interest list 1950. Customer location table 1940 indicates major locations where customers are concentrated. Customer top interest list 1950 lists several top interests and likes of customers. As shown, customer analysis page 1900 has demographic view indicator 1980 selected. Upon selection of map view indicator 1970, a different page can be presented. The screenshot of customer analysis page 1900 shown in
Customer heat map 2000 indicates geographic concentrations of customers for the business indicated by business name indicator 2002. In several embodiments, the CI systems use the associations between customer lists and underlying geographic data and/or geographic location information from the merged and/or authoritative information sets for consumers to identify geographic concentrations of customers. The screenshot of customer heat map 2000 shown in
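Geographic concentrations of the kind a heat map displays can be approximated by counting customers per grid cell, as in the sketch below. The grid-binning approach, the cell size, and the function name are assumptions for illustration; the description does not specify how concentrations are computed.

```python
import math
from collections import Counter


def heat_map_cells(locations: list[tuple[float, float]], cell_degrees: float = 0.01) -> Counter:
    """Count customer locations per latitude/longitude grid cell.

    Each (lat, lon) pair is assigned to the cell whose integer indices are
    the floor of the coordinate divided by the cell size in degrees.
    """
    counts: Counter = Counter()
    for lat, lon in locations:
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        counts[cell] += 1
    return counts
```

Cells with the highest counts would correspond to the most intense regions of the rendered map.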
The embodiments illustrated in the screenshots shown in
In several embodiments, CI systems can generate automated campaign messages for use in marketing campaigns to customers identified in the automated customer lists. These automated campaign messages can be targeted toward customers that, for example, have not transacted with a business for a period of time. The automated campaign messages are directed to customers using interfaces provided by the CI systems. An example user interface that includes an automated campaign message generation interface 2100 is shown in
The message type interface 2110 indicates several types of base messages from which automated campaign messages can be generated. The several types of base messages include (but are not limited to) “we miss you” messages, deal messages, special offers messages, reminder messages, and/or new product messages. Once a type of base message is selected from the message type interface 2110, then the automated campaign message generation interface 2100 can automatically generate an editable campaign message that is displayed in main message window 2106. Main message window 2106 shows an automatically generated but user editable campaign message that can be sent to customers through the campaign message generation interface 2100.
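Generating an editable draft from a selected base message type could look like the following sketch. The template wording, the dictionary keys, and the function name are hypothetical and serve only to illustrate the selection step.

```python
from string import Template

# Hypothetical base messages keyed by message type.
BASE_MESSAGES = {
    "we_miss_you": Template("We miss you at $business! Come back soon."),
    "deal": Template("$business has a new deal for you: $detail"),
    "special_offer": Template("A special offer from $business: $detail"),
    "reminder": Template("A reminder from $business: $detail"),
    "new_product": Template("New at $business: $detail"),
}


def draft_campaign_message(message_type: str, business: str, detail: str = "") -> str:
    """Return an editable draft for the selected base message type."""
    if message_type not in BASE_MESSAGES:
        raise ValueError(f"unknown message type: {message_type}")
    # Extra keyword arguments that a template does not use are ignored.
    return BASE_MESSAGES[message_type].substitute(business=business, detail=detail)
```

The returned string is only a starting point; in the interface described above, the merchant user can edit it before it is sent.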
In some embodiments, the editable, automatically generated campaign messages are not directed to specifically identified users. The CI systems of some embodiments limit the avenues by which merchant users of the CI systems can contact customers in order to respect the privacy of customers identified by the CI systems. As shown, the message in main message window 2106 will be directed to the customers indicated by send to interface 2108. Send to interface 2108 indicates the types of customers to which the automated message will be sent. Several options are presented by the send to interface 2108, including (but not limited to) best customers, most frequent customers, worst customers, all customers, infrequent customers, closest customers, and/or most distant customers.
The CI systems of many embodiments provide further channels by which merchant users of the CI systems can reach customers. For instance, the automated campaign messages can be transmitted through the interfaces of the CI systems to various channels including (but not limited to) social media sites, Internet messengers, and/or emails. However, customers often do not wish to be sent messages on channels on which they have not interacted with a business. Accordingly, CI systems in accordance with many embodiments of the invention limit the transmission of automated campaign messages to channels on which customers have interacted with the business, place, thing, and/or other named entity for which a campaign is generated. For instance, CI systems may only send a message over a particular social media website when a customer has interacted with a business on that social media website.
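The channel restriction described above can be sketched as a simple filter. The channel labels, the shape of the interaction data, and the function name are assumptions made for illustration.

```python
def plan_deliveries(
    requested_channels: list[str],
    interactions_by_customer: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Map each customer to the requested channels on which that customer
    has previously interacted with the business.

    Customers with no permitted channel are omitted from the plan.
    """
    plan: dict[str, list[str]] = {}
    for customer_id, interacted in interactions_by_customer.items():
        permitted = [ch for ch in requested_channels if ch in interacted]
        if permitted:
            plan[customer_id] = permitted
    return plan
```

For example, a customer who has only interacted with the business on one social media site would receive the campaign message on that site alone, even if the merchant user also requested email.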
The screenshot of automated campaign message generation interface 2100 shown in
The Internet has enabled vast numbers of websites to contain listings information for businesses. Numerous websites contain incorrect or at least outdated information. In several embodiments, CI systems can identify listings of businesses from gathered information and compare these listings with correct information provided by merchant users or authoritative information sets. The CI systems of a number of embodiments further provide user interfaces through which merchant users can correct the listings for their businesses.
The screenshot of listing correction interface 2300 shown in
As described above in connection with the screenshots shown in
A process 2350 to correct business listing information in accordance with an embodiment of the invention is illustrated in
Listings associated with the business can be identified (2360). Different embodiments of the invention can utilize different techniques for identifying listings associated with a business. In some embodiments, merged, related, and authoritative information sets for various entities can provide the connections between the identity of a business and its various listings across listing sources. For example, the merged information sets for a business entity can include the listings sources associated with the business entity. Listing sources can include (but are not limited to) websites, directories, online review sites, social media sites, and/or search websites.
The identified listings can be assessed (2365) for accuracy. The accuracy can be assessed via direct comparison of the listed information within the listing sources to the received correct listing information. Some embodiments can optionally provide (2370) a summary of the accuracy of the identified listings. An example of a summary of the accuracy of the identified listings is shown in business listing review interface 2200 of
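A direct field-by-field comparison of this kind might be sketched as follows. The field names, the per-source score, and the whitespace and case normalization applied before comparing are all assumptions added for illustration; the description only states that listed information is compared directly against the received correct listing information.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization step)."""
    return " ".join(value.lower().split())


def assess_listing(listing: dict[str, str], correct: dict[str, str]) -> dict[str, bool]:
    """Return, for each correct field, whether the listing matches it."""
    return {
        field: normalize(listing.get(field, "")) == normalize(expected)
        for field, expected in correct.items()
    }


def accuracy_summary(
    listings_by_source: dict[str, dict[str, str]],
    correct: dict[str, str],
) -> dict[str, dict]:
    """Summarize mismatched fields and a match fraction per listing source."""
    summary = {}
    for source, listing in listings_by_source.items():
        fields = assess_listing(listing, correct)
        summary[source] = {
            "mismatched": sorted(f for f, ok in fields.items() if not ok),
            "score": sum(fields.values()) / len(fields),
        }
    return summary
```

The list of mismatched fields per source is the kind of information a summary interface could present so that a merchant user knows which listings need correction.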
The screenshots of business listing review interface 2200 shown in
Internet reviews are often the basis for consumer choices between competing businesses. The management of online reputations has become a major component of online marketing. Accordingly, embodiments of the invention provide interfaces by which business owners can survey online business reviews and also communicate through the channels provided by the sites hosting the online business reviews.
The screenshot of reputation management interface 2500 shown in
The Internet has provided new and powerful tools to enable customers and businesses to communicate. Where the old model of customer feedback involved phone calls or paper messages dropped in a box, the Internet enables direct communication between customers and businesses via electronic platforms. Accordingly, embodiments of the invention provide for a customer feedback platform that aggregates and displays customer feedback from multiple social media websites and/or applications.
The screenshot of customer feedback interface 2600 shown in
CI systems in accordance with various embodiments of the invention rely on server hardware and/or software to be implemented. The various processes described above can be implemented using any of a variety of server system architectures. Specific server systems that can be utilized to implement CI systems in accordance with embodiments of the invention and implement the various processes illustrated above are described below. Specifically,
An architecture of a scheduler process server in accordance with an embodiment of the invention is illustrated in
An architecture of a crawler process server in accordance with an embodiment of the invention is illustrated in
An architecture of a merge process server in accordance with an embodiment of the invention is illustrated in
An architecture of a production process server in accordance with an embodiment of the invention is illustrated in
An architecture of a relation process server in accordance with an embodiment of the invention is illustrated in
An architecture of a web server in accordance with an embodiment of the invention is illustrated in
An architecture of a customer process server in accordance with an embodiment of the invention is illustrated in
An architecture of a targeting process server in accordance with an embodiment of the invention is illustrated in
The various process servers discussed above can be implemented as singular, discrete servers. Alternatively, they can each be implemented as shared and/or discrete servers on any number of physical, virtual, or cloud computing devices. For instance, the merge and production process servers can be implemented as a single cluster of physical machines whereas the relation process server can be implemented as a distinct physical machine. Persons of ordinary skill in the art will recognize that various implementation methods may be used to implement the process servers of embodiments of the invention.
While the above description contains many specific embodiments of the invention, these should not be construed as limitations on the scope of the invention, but rather as an example of one embodiment thereof. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.
The present application claims priority under 35 U.S.C. §120 as a continuation of U.S. patent application Ser. No. 14/586,891 entitled “Systems and Methods for Generating Advertising Targeting Data Using Customer Profiles Generated from Customer Data Aggregated from Multiple Information Sources”, filed Dec. 30, 2014. U.S. patent application Ser. No. 14/586,891 claims priority under 35 U.S.C. §120 as a continuation of U.S. patent application Ser. No. 14/586,505 entitled “Systems and Methods for Gathering, Merging, and Returning Data Describing an Entity Based Upon a Single Piece of Uniquely Identifying Information”, filed Dec. 30, 2014 and also claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 62/090,839 entitled “Customer Relationship Management System with Automatic Customer List Generation and Advertising Targeting” filed Dec. 11, 2014. The disclosures of U.S. patent application Ser. No. 14/586,891, U.S. patent application Ser. No. 14/586,505, and U.S. Provisional Patent Application Ser. No. 62/090,839 are hereby incorporated by reference herein in their entirety.
Number | Date | Country
---|---|---
62090839 | Dec 2014 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14586891 | Dec 2014 | US
Child | 15045174 | | US
Parent | 14586505 | Dec 2014 | US
Child | 14586891 | | US