Retailers may implement user accounts such that all of a user's browsing and purchasing activity may be aggregated and used to facilitate understanding of the user's interests and behavior. Websites may also implement cookies that are stored within the user's browser that enable the user to be identified each time the user visits the website.
These approaches have limitations. Users may access various sites that do not share account information with one another. Users may fail to log in, decline to accept cookies, clear cookies, or browse in incognito mode. These result in missed opportunities to understand the interests and behavior of a user.
The systems and methods disclosed herein provide an improved approach for providing product recommendations to users browsing a website.
In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through use of the accompanying drawings, in which:
The methods disclosed herein may be implemented in a network environment 100 including some or all of the illustrated components. In particular, a server system 102 may execute the methods disclosed herein with respect to browsing activity of one or more user computers 104a, 104b. The computers 104a, 104b may include desktop or laptop computers, tablet computers, smart phones, wearable computers, internet enabled appliances, or any other type of computing device.
The browsing activities of the computers 104a, 104b may include webpage requests submitted by the computers 104a, 104b to a web server executing on the server system 102 or be reported to the server system 102 by a third party server or by a software component executing on the computers 104a, 104b.
The computers 104a, 104b may be coupled to the server system 102 by means of a network 106 including a local area network (LAN), wide area network (WAN), the Internet, or any other number of wired or wireless network connections. The network 106 may include one or more intermediate servers by way of which browsing activities of the computers 104a, 104b are transmitted to the server system 102.
The computers 104a, 104b may execute a browser 108 programmed to retrieve and process data 110, such as by rendering web pages, executing scripts within web pages, and formatting website data according to style sheets (e.g., .css files). The browser 108 may execute scripts or process web forms that cause the browser 108 to transmit data submitted through web pages to a source of a web page or some other server system, such as the server system 102.
Communications from the browser 108 may include one or more items of information 112 about the browser itself, such as a type (SAFARI, EXPLORER, FIREFOX, CHROME, etc.) as well as a version of the browser. The browser information 112 may include information about the device 104a, 104b on which it is executing such as operating system (WINDOWS, MACOS, IOS, LINUX, etc.), operating system version, processor type, screen size, peripheral devices (e.g., additional screens, audio device, camera), etc. Browser information may include a current time, time zone, font information, storage accessibility (size of local storage 116 described below), location information (e.g., longitude and latitude, city, address, etc.), accessibility information, and the like. This information is used according to the methods disclosed below and may be included in browser requests. Other information (e.g., fonts) may be obtained using executable code executing on one or both of the server system 102 or embedded in website data 110.
The browser 108 may execute one or more browser plugins 114 that extend or enhance the functionality of the browser, such as ADOBE ACROBAT READER, ADOBE FLASH PLAYER, JAVA VIRTUAL MACHINE, MICROSOFT SILVERLIGHT, and the like. In some embodiments, the browser information 112 and listing of plugins 114 may be transmitted with requests for web pages or be accessible by scripts executed by the browser, which may then transmit this information to the server system 102 directly or by way of another server system.
The computer 104a, 104b may further include local storage 116 that includes browser-related data such as cookies 118 that are stored by websites visited using the computer 104a, 104b.
The server system 102 stores information gathered from browser requests or received from third party servers in one of a user identifier (UID) record 120 and a browser user identifier (BUID) vector 124. As described below, a UID record 120 stores data received from a browser that is explicitly mapped to a particular user identifier. The most common example is a cookie 118 that was previously stored on the source 104a, 104b of the browser request and is either received with the browser request or accessed by a script or other executable embedded in a website and transmitted to the server system 102.
Browser requests may include metadata that is stored in the UID record 120 when the browser request is explicitly mapped to cookie data 122a or other user identifiers included in the UID record 120. The UID record 120 may also include data from browser requests lacking explicit identification information but mapped to the UID record 120 with sufficient certainty according to the methods disclosed herein.
The browser data may include various types of data that are organized herein into three categories: global data history 122b, device data history 122c, and browser data history 122d.
The global data history 122b stores values from browser requests that are independent of the browser or device from which the request was received, such as time zone, language, a time stamp in the browser request, IP (internet protocol) address, location (if accessible), and the like.
The device data history 122c stores values from browser requests relating to the computer 104a, 104b that generated the browser request such as operating system, operating system version, screen size, available devices, battery state, power source, a listing of installed fonts, and the like.
The browser data history 122d stores values from browser requests relating to the browser from which they were received, such as the browser type (SAFARI, EXPLORER, FIREFOX, CHROME, etc.), browser version, plugins available in the browser, cookies, cookie accessibility, size and accessibility of the local storage 116 for the browser, size and accessibility of session storage, audio configuration data, video configuration data, navigator data, and the like.
The UID record 120 may further include a user history 122e. Browser requests may include requests for web pages (e.g., URLs). User interactions with a website may also be recorded in the user history 122e, e.g. search terms, links clicked, values submitted into fillable forms, etc. These values may be stored in raw form and may additionally or alternatively be processed to estimate user attributes (age, income, gender, education) and interests that are stored in the user history 122e as well.
In response to a browser request that does not include cookie data 122a or other user identifiers, the server system 102 may create a BUID vector 124 that includes some or all of global data 126a, device data 126b, and browser data 126c included in the browser request. The data 126a-126c may include some or all of the values described above as being included in the data histories 122b-122d.
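As a rough illustration, the UID record 120 and the BUID vector 124 might be modeled as shown below. This is a minimal sketch only: the class names, field names, and the `update_from` helper are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class BUIDVector:
    """Data from one browser request that lacks explicit identifiers (124)."""
    global_data: Dict[str, Any] = field(default_factory=dict)   # 126a
    device_data: Dict[str, Any] = field(default_factory=dict)   # 126b
    browser_data: Dict[str, Any] = field(default_factory=dict)  # 126c


@dataclass
class UIDRecord:
    """Accumulated histories for one identified user (120)."""
    uid: str
    cookie_data: List[str] = field(default_factory=list)                 # 122a
    global_history: List[Dict[str, Any]] = field(default_factory=list)   # 122b
    device_history: List[Dict[str, Any]] = field(default_factory=list)   # 122c
    browser_history: List[Dict[str, Any]] = field(default_factory=list)  # 122d
    user_history: List[Dict[str, Any]] = field(default_factory=list)     # 122e

    def update_from(self, vector: BUIDVector) -> None:
        """Append a copy of the vector's data to the matching histories."""
        self.global_history.append(dict(vector.global_data))
        self.device_history.append(dict(vector.device_data))
        self.browser_history.append(dict(vector.browser_data))
```

Keeping each history as a list of per-request snapshots is one possible design; it preserves the order of requests, which later time-dependent comparisons rely on.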
Referring to
The method 200 may include evaluating 202 whether the browser request is in the context of a browser session in which a new UID is created, e.g. a user creates a new account or otherwise provides an indication that a UID record 120 does not currently exist for the user that invoked the browser request. If so, then a new UID record 120 is created and populated with data from the browser request and possibly identification information provided as part of the browser session including the browser request, such as cookie data 122a placed on the computer 104a, 104b or a user name assigned to the user.
If not, the method 200 may include evaluating 206 whether the browser request includes sufficient data for a “front end” data match, i.e. the browser request includes cookie data, a user name, or other explicit identifiers that are uniquely associated with a UID record 120. Step 206 may be executed by a script executed by the browser or on the server system 102. If a front end match is found 206, then some or all of the history data 122a-122e may then be updated 208 according to data included in the browser request and other information received during the browser session initiated by the browser request.
The data that may be used for a front end data match may include a ULID (user link ID), ckid (third party cookie ID), or bkid (back end identifier provided by the server system 102). Note that the ULID may include any identification information that is provided by vendors and clearly identifies a user, such as username, email, user ID, or a hash of an input field that may be used for unique identification. If any of these are present in a browser request, a corresponding UID record 120 may be uniquely associated with the browser request. In some embodiments, local storage of the browser may include identifying information, such as a username or other identifier. Accordingly, a script executing in the browser may obtain this information and return it to the server system 102, thereby enabling a front end data match at step 206.
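The front end data match can be pictured as a lookup keyed on explicit identifiers. The following is a sketch under stated assumptions: the request is assumed to be a flat mapping, the identifier keys are spelled `ulid`, `ckid`, and `bkid`, and the `index` structure and function name are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

# Identifier types named in the text; their spelling as request keys is assumed.
EXPLICIT_ID_FIELDS = ("ulid", "ckid", "bkid")


def front_end_match(request: Mapping[str, str],
                    index: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the UID mapped to the first explicit identifier found, else None."""
    for name in EXPLICIT_ID_FIELDS:
        value = request.get(name)
        if value is not None and (name, value) in index:
            return index[(name, value)]
    # No explicit identifier matched: the caller falls back to a BUID vector.
    return None
```

A return value of `None` corresponds to the case in which a front end match is not possible and a BUID vector 124 is populated instead.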
If a front end match is found 206 not to be possible, the method 200 may include populating 210 a BUID vector 124 with data from the browser request. This BUID vector 124 may then be compared 212 to one or more UID records 120 to identify 214 one or more, typically several, candidate UID records. Of these candidate records, one or more of them may then be evaluated and eliminated 216 as being inconsistent. An example implementation of steps 214-216 is described with respect to
Of those that remain, a probability associated with each candidate record may be maintained the same or adjusted 218 based on consistency with values included in the BUID vector 124. An example of this process is described below with respect to
The method 200 may further include selecting 220 a threshold according to an application of the method 200, i.e. a purpose for which any corresponding UID record 120 will be used. For example, for purposes of selecting an advertisement, an exact match is not required. Step 220 may be an essentially manual step, with the application being known and the corresponding threshold being predetermined for that application.
If the probability threshold for the given application is found 222 to be met by one or more candidate records 120, then one of them may be selected as corresponding to the same user that generated the browser request and one or more actions may be taken, such as selecting 224 content according to the user history 122e of the selected candidate record 120. Where only one candidate record is found 222 to meet the threshold, it may be selected. Where multiple records meet the threshold, the candidate record with the highest probability after step 218 may be selected for use at step 224. Content selected at step 224 may then be transmitted to the source of the browser request in the form of advertisements, search results, relevant articles, other media content, or the like.
If the candidate record 120 is also found 226 to meet a certainty threshold, which may be higher than the threshold of step 222, the data 126a-126c of the BUID vector 124 may be used to update 208 the data histories 122a-122e of the candidate record 120. For example, a certainty threshold may be a predetermined value, such as a value of 95 percent or higher.
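The two-threshold decision described above might be sketched as follows. The per-application threshold values and all names are hypothetical; only the 95 percent certainty figure appears in the text, and there only as an example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-application thresholds, chosen ahead of time by an operator.
APPLICATION_THRESHOLDS = {"advertisement": 0.6, "search_results": 0.7}
# Example certainty threshold from the text (95 percent).
CERTAINTY_THRESHOLD = 0.95


def select_record(probabilities: Dict[str, float],
                  application: str) -> Tuple[Optional[str], bool]:
    """Return (selected record id or None, whether to merge the BUID vector)."""
    threshold = APPLICATION_THRESHOLDS[application]
    qualifying = {rid: p for rid, p in probabilities.items() if p >= threshold}
    if not qualifying:
        return None, False
    # Where several records qualify, the highest probability wins.
    best = max(qualifying, key=qualifying.get)
    return best, qualifying[best] >= CERTAINTY_THRESHOLD
```

The second element of the result separates "good enough to select content" from "certain enough to update the data histories", which is why the certainty threshold may be higher than the application threshold.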
Referring to
The method 300 may include generating 302 one or more hashes of the subject vector. This may include some or all of: generating a hash of the entire subject vector, generating labeled hashes of the values of the subject vector (each hash indicates the field or attribute of the value from which the hash was made), and generating unlabeled hashes of the values of the subject vector (the field or attribute of the value is not retained or considered). The hash function may be a lossy function such that each output of the hash function could represent a range of possible input values. The hash function is also preferably such that the range of possible input values are similar to one another, e.g. a contiguous range of values. MD5 and similar hash functions are also suitable. Other hash functions known in the art may also be used.
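One way to obtain a lossy hash whose colliding inputs form a contiguous range is to quantize a numeric value into a bucket before hashing. The sketch below is an assumption about how this could be done, not the disclosed hash function; `lossy_hash` and `bucket_width` are hypothetical names.

```python
import hashlib


def lossy_hash(value: float, bucket_width: float) -> str:
    """Hash the bucket index of a numeric value.

    Every value in [k * bucket_width, (k + 1) * bucket_width) yields the same
    digest, so one output stands for a contiguous range of inputs.
    """
    bucket = int(value // bucket_width)
    return hashlib.md5(str(bucket).encode("utf-8")).hexdigest()
```

For instance, with a `bucket_width` of 100, screen widths of 1366 and 1399 pixels fall in the same bucket and hash identically, while 1400 falls in the next bucket.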
The method 300 may then include identifying one or more candidate UID records 120 (“candidate records”) based on comparison to the hashes. In particular, one or more hashes of values in each record of a plurality of UID records 120 may be generated, such as in the same manner as for the subject vector at step 302.
Candidate records may be identified as having one or more hashes equal to hashes of the subject vector. Where hashes are labeled, this may include determining that hashes for one or more labels in a candidate record match hashes with the same labels in the subject vector. In some embodiments, matching hashes may be processed according to a function that determines a probability according to the number and possibly labels of the matching hashes. For example, one label may have a higher weight such that matching hashes for that label will increase the probability more than another label.
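A weighted comparison of labeled hashes could look like the following sketch. The labels, the weights, and the normalization to a value between 0 and 1 are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical weights: a matching hash for a heavier label counts for more.
LABEL_WEIGHTS = {"fonts": 3.0, "screen_size": 1.0, "time_zone": 0.5}


def match_probability(subject: Dict[str, str], record: Dict[str, str],
                      weights: Dict[str, float] = LABEL_WEIGHTS) -> float:
    """Weighted fraction of labels whose hashes are equal in both inputs."""
    total = sum(weights.values())
    matched = sum(
        weight for label, weight in weights.items()
        if label in subject and subject[label] == record.get(label)
    )
    return matched / total if total else 0.0
```

Here `subject` and `record` map each label to the hash computed for that label, so only hashes with the same label are compared.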
Those UID records 120 having probabilities above a threshold may be identified 304 as candidate records. Each of the candidate records may be selected 306 and evaluated based on some or all of steps 308, 310, 312. Steps 308, 310, 312 may be performed in the illustrated or in a different order. Those that are found to be inconsistent at steps 308, 310, 312 are eliminated 314 from among the candidate records. Those that are found to be consistent are processed at step 316 wherein the probabilities associated with them may be adjusted according to the method 400 of
Step 308 includes evaluating whether operating system information in the candidate record is inconsistent with operating system information included in the subject vector. Note that a candidate record may be associated with a particular user and may record activities of the user from multiple devices over time. Accordingly, the evaluation of step 308 may include evaluating whether at least one instance of operating system information in the candidate record is consistent. If not, the candidate record is determined to be inconsistent. For example, step 308 may implement some or all of the following logic:
Step 310 includes evaluating whether device information in the candidate record is inconsistent with device information included in the subject vector. For example, step 310 may evaluate whether the candidate record includes reference to a device with identical values for some or all of the following labels: OS name and version, device type and version, availability of audio device(s), availability of camera(s), screen size, average network speed and the like. If not, the method 300 determines that the candidate record is inconsistent.
Step 312 includes evaluating whether browser information in the candidate record is inconsistent with browser information included in the subject vector. Note that a candidate record may be associated with a particular user and may record activities of the user from multiple devices over time. Accordingly, the evaluation of step 312 may include evaluating whether at least one instance of browser information in the candidate record is consistent. If not, the candidate record is determined to be inconsistent. For example, step 312 may implement some or all of the following logic:
Note that the evaluation of the version and type of a browser may be used in an identical manner to evaluate the type and version of other components or modules executed by a browser, such as a specific plugin, webkit, and the like. Accordingly, if backward movement in version number is found from the candidate record to the BUID vector, the candidate record may be eliminated.
Note also that evaluating the version of a browser, plugin, or other component or module may include evaluating hashes of version numbers in order to save space. Accordingly, only differences in version number that are sufficiently large to change the hash value will result in the possibility of detection of a difference according to the method 300.
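The backward-movement test on version numbers may be illustrated as follows. This sketch compares raw dotted version strings rather than hashes and assumes purely numeric components; the function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '115.0.2' into (115, 0, 2) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def moved_backward(record_version: str, subject_version: str) -> bool:
    """True if the subject vector reports an older version than the record."""
    return parse_version(subject_version) < parse_version(record_version)
```

Comparing tuples of integers avoids the lexical pitfall in which the string "9.1" sorts after "10.0". A candidate record for which `moved_backward` returns `True` would be eliminated.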
The evaluations of steps 308, 310, 312 are just examples of criteria that may be used to eliminate a candidate record. Other criteria may be used in addition to, or in place of, the illustrated criteria. For example:
Referring to
The method 400 may include evaluating 402 whether one or more “Accept” parameters in a header of the browser request correspond to those in the candidate record.
For example, step 402 may evaluate whether a language in the subject vector matches a language included in the candidate record. A browser request may include multiple languages. Accordingly, step 402 may include evaluating whether each and every language in the subject vector is found in the candidate record. If not, then the probability of the candidate record is reduced 404. In some embodiments, the amount of the reduction increases with the number of languages in the subject vector that are not found in the candidate record.
Other accept parameters include supported encodings (for encryption, images, audio, video, etc.) listed in the header. If one or more of these other parameters are not found in the candidate record, then the probability of the candidate record is reduced 404.
The method 400 may include evaluating 406 whether at least one plugin in the subject vector matches a plugin included in the candidate record. A browser request may include a list of multiple plugins. Accordingly, step 406 may include evaluating whether each and every plugin in the subject vector is found in the candidate record. If not, then the probability of the candidate record is reduced 408. In some embodiments, the amount of the reduction increases with the number of plugins in the subject vector that are not found in the candidate record. Plugins are received as a list in each browser request. Accordingly, in some embodiments the probability is reduced 408 unless a plugin list in a previous browser request recorded in the candidate record exactly matches the plugin list of the subject vector. The probability may be reduced 408 according to the number of differences between the closest matching plugin list of the candidate record and the plugin list of the subject vector.
The method 400 may include evaluating 410 whether at least one font in the subject vector matches a font included in the candidate record. A browser request may include one or more fonts. Accordingly, step 410 may include evaluating whether each and every font in the subject vector is found in the candidate record. If not, then the probability of the candidate record is reduced 412. In some embodiments, the amount of the reduction increases with the number of fonts in the subject vector that are not found in the candidate record.
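The language, plugin, and font checks share one pattern: count the items in the subject vector that are absent from the candidate record and reduce the probability accordingly. A minimal sketch follows, assuming a fixed linear penalty per missing item; the penalty value and function name are hypothetical.

```python
from typing import Iterable


def reduce_for_missing(probability: float, subject_items: Iterable[str],
                       record_items: Iterable[str],
                       penalty_per_item: float = 0.05) -> float:
    """Lower the probability once for each subject item absent from the record."""
    missing = set(subject_items) - set(record_items)
    return max(0.0, probability - penalty_per_item * len(missing))
```

The same helper could be called once for languages, once for plugins, and once for fonts, possibly with a different `penalty_per_item` for each.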
The method 400 may include evaluating 414 whether a time zone in the subject vector is found in the candidate record. In particular, step 414 may include evaluating a difference in a time zone in the subject vector relative to a last time zone in the candidate record, i.e. a time zone obtained from a last-received browser request that has been used to update the candidate record. The last-received browser request may have a first time in it. The subject vector also has a second time in it that is obtained from the browser request used to generate it. A difference in the last-received time zone of the candidate record may be compared to the time zone of the subject vector. If the difference exceeds a threshold that is a function of a difference between the first time and the second time, the probability of the candidate record is reduced 416. In particular, the threshold may increase with increase in the difference between the first time and the second time. In some embodiments, the larger the change in time zone and the smaller the intervening elapsed time, the greater the reduction 416 in probability.
The method 400 may include evaluating 418 whether battery parameters in the subject vector are consistent with last-received battery parameters found in the candidate record. In particular, step 418 may include evaluating a difference in a battery state in the subject vector relative to a last-received battery state in the candidate record, i.e. a battery state obtained from a last-received browser request that has been used to update the candidate record. The last-received browser request may have a first time in it. The subject vector also has a second time in it that is obtained from the browser request used to generate it. A difference in the last-received battery state of the candidate record may be compared to the battery state of the subject vector. If the difference exceeds a threshold that is a function of a difference between the first time and the second time, the probability of the candidate record is reduced 420. In particular, the threshold may increase with increase in the difference between the first time and the second time. In some embodiments, the larger the change in battery state and the smaller the intervening elapsed time, the greater the reduction 420 in probability. This accounts for the fact that charging and discharging of a battery are not instantaneous and therefore large changes in battery state with small elapsed time are unlikely to occur in the same device.
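The time-dependent threshold may be sketched as below for the battery case; the time zone and local storage checks follow the same shape. The maximum charge rate, the penalty scale, and the linear forms are assumptions chosen for illustration.

```python
def reduce_for_battery(probability: float, last_level: float, new_level: float,
                       elapsed_seconds: float,
                       max_rate_per_hour: float = 0.5,
                       penalty_scale: float = 0.5) -> float:
    """Penalize battery changes larger than the elapsed time plausibly allows.

    Levels are fractions between 0 and 1. The allowed change (the threshold)
    grows with elapsed time; the penalty grows with the excess over it.
    """
    change = abs(new_level - last_level)
    allowed = max_rate_per_hour * elapsed_seconds / 3600.0
    excess = max(0.0, change - allowed)
    return max(0.0, probability - penalty_scale * excess)
```

With these example parameters, a change from 10 percent to 90 percent within one minute reduces the probability substantially, whereas the same change over several hours does not reduce it at all.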
The method 400 may include evaluating 422 whether at least one accessible device listed in the subject vector matches an accessible device included in the candidate record. A browser request may include a list of one or more devices such as an additional screen, pointing device (mouse, trackpad), audio device, camera, or other peripherals that are coupled to the computing device 104a, 104b that issued the browser request. Accordingly, step 422 may include evaluating whether each and every accessible device in the subject vector is found in the candidate record. If not, then the probability of the candidate record is reduced 424. In some embodiments, the amount of the reduction increases with the number of accessible devices in the subject vector that are not found in the candidate record.
The method 400 may include evaluating 426 whether an IP (internet protocol) address or other network routing information (e.g., MAC (media access control) address) included in the subject vector is found in the candidate record. If not, then the probability of the candidate record is reduced 428. In some embodiments, the amount of the reduction increases with the difference between a closest matching IP address in the candidate record and the IP address in the subject vector, accounting for the fact that IP addresses in the same domain or sub domain may still correspond to the same device.
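The "difference" between IP addresses is not defined in the text. One plausible measure, assumed here, is the number of leading bits two IPv4 addresses share, so that addresses on the same subnet count as close. The function names and the maximum penalty are hypothetical.

```python
import ipaddress
from typing import Iterable


def shared_prefix_bits(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def reduce_for_ip(probability: float, subject_ip: str,
                  record_ips: Iterable[str], max_penalty: float = 0.3) -> float:
    """Scale the penalty by how far the closest recorded address is."""
    best = max((shared_prefix_bits(subject_ip, ip) for ip in record_ips),
               default=0)
    return max(0.0, probability - max_penalty * (32 - best) / 32)
```

An exact match shares all 32 bits and incurs no penalty; an address that differs only in its last octet shares at least 24 bits and incurs a small one.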
The method 400 may include evaluating 430 whether an amount of local storage in the subject vector is consistent with the candidate record. Local storage refers to tracking data (cookies, etc.), browser history, and other information stored by the browser over time. Browser requests may list the amount of local storage. Accordingly, step 430 may include evaluating a difference in an amount of local storage in the subject vector relative to an amount of local storage in a last-received browser request that has been used to update the candidate record. The last-received browser request may have a first time in it. The subject vector also has a second time in it that is obtained from the browser request used to generate it. A difference in the last-received amount of local storage in the candidate record may be compared to the amount of local storage in the subject vector. If the difference exceeds a threshold that is a function of a difference between the first time and the second time, the probability of the candidate record is reduced 432. In particular, the threshold may increase with increase in the difference between the first time and the second time. In some embodiments, the larger the change in the amount of local storage and the smaller the intervening elapsed time, the greater the reduction 432 in probability.
The method 400 may include evaluating 434 whether one or more user attributes included in the subject vector are found in the candidate record. User attributes may include a name, company name, address, phone number, or the like. User attributes may include age, gender, income, or other demographic attributes. User attributes may further include interest or behavioral information such as user interest in certain colors, sizes, categories, sale or discounted items, new arrivals, rate of clicks per session, views per session, scrolling habits, whether the user operates a browser in incognito mode, and the like. For example, where the browser request is invoked by a user submitting a form, the browser request may include one or more user attributes. If each and every user attribute in the subject vector is either absent from or identical to user attributes in the candidate record, then the user attributes may be found 434 to match. If not, then the probability of the candidate record may be reduced 436. For example, the probability may be reduced according to the number of inconsistent attributes. Some attributes, if inconsistent, may result in a greater reduction 436 in the probability than others as determined by an operator to account for the relative importance of attributes. In another example, user activities such as search terms submitted, repetition of search terms, categories of products selected for viewing or purchasing, price range of products viewed or purchased, time frame of browsing activities (day of the week, time of day, etc.), domains of interest, and the like may also be user attributes that may be compared 434 between the BUID vector and the candidate record.
The method 400 may include evaluating 438 whether a window size (i.e., browser window size) in the subject vector is found in the candidate record. If the window size matches a window size in the candidate record, they may be found 438 to match. If not, then the probability of the candidate record may be reduced 440. For example, the probability may be reduced according to an amount of the difference between the window size of the subject vector and the closest window size in the candidate record, such as based on a sum or weighted sum of differences in width and height.
The method 400 may include evaluating 442 whether a location in the subject vector is consistent with the candidate record. Location data may be included in metadata of a browser request, derived from an IP address of the browser request, or provided by the user in a data submission, such as a request for information about the user's current location. Step 442 may include evaluating a difference in the location in the subject vector relative to a location for a last-received browser request that has been used to update the candidate record. The last-received browser request may have a first time in it. The subject vector also has a second time in it that is obtained from the browser request used to generate it. A difference in the last-received location in the candidate record may be compared to the location in the subject vector. If the difference exceeds a threshold that is a function of a difference between the first time and the second time, the probability of the candidate record is reduced 444. In particular, the threshold may increase with increase in the difference between the first time and the second time. In some embodiments, the larger the change between the locations of the subject vector and the candidate record, the greater the reduction 444 in probability.
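The weighted sum of width and height differences might be computed as in the sketch below. The weights, the per-pixel penalty, and the choice to leave the probability unchanged when the record holds no window sizes are assumptions.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def window_distance(a: Size, b: Size,
                    w_width: float = 1.0, w_height: float = 1.0) -> float:
    """Weighted sum of the absolute differences in width and height."""
    return w_width * abs(a[0] - b[0]) + w_height * abs(a[1] - b[1])


def reduce_for_window(probability: float, subject: Size,
                      record_sizes: Iterable[Size],
                      penalty_per_pixel: float = 0.0005) -> float:
    """Reduce the probability by the distance to the closest recorded size."""
    closest = min((window_distance(subject, s) for s in record_sizes),
                  default=None)
    if closest is None:
        return probability  # assumption: nothing to compare against
    return max(0.0, probability - penalty_per_pixel * closest)
```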
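For the location check, the threshold that grows with elapsed time can be read as a maximum plausible travel distance. The sketch below uses the haversine great-circle distance; the assumed maximum speed, the penalty rate, and the function names are hypothetical.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points on Earth."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def reduce_for_location(probability: float, last: LatLon, new: LatLon,
                        elapsed_hours: float, max_speed_kmh: float = 900.0,
                        penalty_per_1000_km: float = 0.2) -> float:
    """Penalize movement beyond what the elapsed time plausibly allows."""
    distance = haversine_km(last[0], last[1], new[0], new[1])
    allowed = max_speed_kmh * elapsed_hours  # threshold grows with elapsed time
    excess = max(0.0, distance - allowed)
    return max(0.0, probability - penalty_per_1000_km * excess / 1000.0)
```

Under these example parameters, a request from London ten hours after one from New York is not penalized, while the same pair thirty minutes apart is.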
The method 400 illustrates a sample of values in the subject vector that may be considered to determine the probability of a candidate record corresponding to the same user. Other values may also be evaluated in a similar manner.
Note also that the factors evaluated with respect to the method 400 and the corresponding reductions in probability may be performed in the context of a machine learning model. In particular, a machine learning model may be trained to adjust the probability for a given candidate record for a given subject vector. Training data may include candidate records and subject vectors that are known to be related or not related. The machine learning model may then be trained to distinguish between these two cases. The probability of candidate records as determined or adjusted by the machine learning algorithm may then be compared to a predetermined threshold and those below the threshold may be eliminated. Of those that remain, a highest probability case may be selected for purposes of generating content. If one candidate record meets a certainty threshold, the subject vector may be merged with the candidate record as described above. In a similar manner, the elimination of candidate records according to the method 300 may be performed using a trained machine learning model operating on parameters of the BUID vector and the candidate records.
The method 500 of
Generating the hash values in step 508 and other hash-generating steps of the method 500 may include generating and storing hashes without data labels indicating the type of data (name, credit card, address, phone number, etc.) from which the hash is derived. As for step 302 of the method 300, the hash values may be generated according to a lossy function such that each output of the hash function could represent a range of possible input values. The hash function is also preferably such that the range of possible input values are similar to one another, e.g. a contiguous range of values. Examples of suitable hash functions include MD5 and similar hash functions or any other hash function known in the art. The hash value may be 32, 64, or 128 bits. To ensure that the original data is not recoverable, a 64 bit or smaller size is preferable. To protect privacy, the submitted data values may be converted to hash values on the computing device 104a, 104b on which they were received, such as by a software component embedded in a website, plugin, or other component executing within the browser on the computing device 104a, 104b. In this manner, data values are not acquired in their original form. Hash values may further be encrypted during transmission and storage to protect privacy.
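Unlabeled, truncated hashes of submitted values could be produced as in the following sketch. It is written in Python for illustration only; in the described embodiment this conversion would run within the browser. The normalization step (trimming and lowercasing) and the function names are assumptions.

```python
import hashlib
from typing import Iterable, Set


def truncated_hash(value: str, bits: int = 64) -> str:
    """MD5 digest of a normalized value, truncated to the given bit length."""
    digest = hashlib.md5(value.strip().lower().encode("utf-8")).hexdigest()
    return digest[: bits // 4]  # four bits per hexadecimal character


def hash_submitted_values(values: Iterable[str], bits: int = 64) -> Set[str]:
    """Unlabeled hashes: the result keeps no field names and no raw values."""
    return {truncated_hash(v, bits) for v in values if v}
```

Returning a plain set means the server cannot tell which hash came from a name field and which from a phone number field, consistent with storing hashes without data labels.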
If insufficient information is found 502 to have been provided to associate a browsing session with a particular user, the method 500 may still include evaluating 510 whether any data is submitted during the session. If not, metadata included in browser requests may still be used to attempt 512 to match a BUID vector 124 for a browser request with a UID record 120 according to the methods of
If data values are submitted, then hashes of these values are added 514 to the BUID vector 124 in the same manner as for step 508 and step 512 may also be performed to attempt to match the BUID vector 124 to a UID record 120.
It may occur in some instances that the BUID vector 124 is matched to a UID record 120 with sufficient certainty according to the methods of
The method 600 may include eliminating 604 one or more candidate records that are inconsistent with the selected record. This may include evaluating some or all of the criteria described above with respect to the method 300 of
The method 600 may further include adjusting 606 probabilities for one or more candidate records that remain after the elimination step 604. This may include evaluating some or all of the parameters evaluated according to the method 400. As for step 604, parameters that are not device specific may be evaluated such as some or all of language, time zone, IP address, user attributes, location, and time overlap of browser sessions. The result of step 606 may be probabilities associated with candidate records.
The method 600 may include evaluating 608 intersections of hash values in the selected record with the candidate records and adjusting the probabilities associated with the candidate records accordingly. In particular, candidate records that match a hash value or group of hash values in the selected record may be identified. For each hash value that matches between the selected record and a candidate record, the probability for that candidate record may be increased. The degree of adjustment may increase with the infrequency of occurrence of the hash value. For example, where a matching hash has a large number of occurrences among the candidate records, the amount of the increase in probability may be smaller than where the number of occurrences of the matching hash is smaller. A hash of a user's email, for example, may have few occurrences and therefore be highly predictive, whereas a hash of a user's first name may have many occurrences and therefore be less predictive.
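One possible form of the adjustment of step 608 is sketched below. The function name `adjust_for_hash_matches`, the weight of one divided by the number of occurrences, and the `gain` factor of 0.5 are all assumptions chosen for illustration; the disclosure requires only that rarer matching hashes produce larger increases.

```python
from collections import Counter


def adjust_for_hash_matches(selected_hashes, candidates, base_probs, gain=0.5):
    """Raise candidate probabilities for hash values shared with the selected record.

    candidates: dict mapping candidate id -> set of hash values
    base_probs: dict mapping candidate id -> probability in [0, 1]
    """
    # Number of candidate records in which each hash value occurs.
    occurrences = Counter(h for hashes in candidates.values() for h in set(hashes))
    adjusted = {}
    for cid, hashes in candidates.items():
        p = base_probs[cid]
        for h in set(selected_hashes) & set(hashes):
            # Rare hashes (few occurrences) receive a weight close to 1.
            weight = 1.0 / occurrences[h]
            # Move the probability toward 1 by a rarity-scaled fraction.
            p += (1.0 - p) * gain * weight
        adjusted[cid] = p
    return adjusted
```

Because each match scales the remaining distance to 1, the result stays within [0, 1] and does not depend on the order in which matches are processed.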
If the probability of a candidate record following steps 606-608 is found 610 to meet a threshold certainty, then the content of that candidate record and the selected record may be combined 612, such as by merging the content of one record with the other. For example, where one of the selected record and matching record is a UID record 120 and the other is a BUID vector 124, the data of the BUID vector 124 may be added to the UID record 120. Where both the selected and matching records are UID records 120, then the data of the newer UID record 120 (last created) may be added to the older UID record 120. Where both are BUID vectors 124, the data of the newer BUID vector 124 may be added to the older.
Adding data from one record to another may include augmenting the global data 126a, device data 126b, browser data 126c, and possibly user history, of one record with corresponding data from the other record. Adding data from one record to another may preserve association of the data from one record, i.e. its source as from a different record may be stored. In other embodiments, this is not the case.
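The combining of step 612 might be sketched as follows. The `Record` class, its fields, and the default threshold of 0.9 are hypothetical and introduced only for illustration; each data item is stored with the identifier of the record it came from so that its source is preserved.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    kind: str        # "UID" for a UID record, "BUID" for a BUID vector
    created: float   # creation time, e.g. seconds since the epoch
    data: list = field(default_factory=list)  # (value, source_record_id) pairs


def combine(selected, candidate, probability, threshold=0.9):
    """Merge two records when the match probability meets the threshold.

    Returns the record that received the data, or None if no merge occurred.
    """
    if probability < threshold:
        return None
    if selected.kind != candidate.kind:
        # A BUID vector is folded into the UID record.
        target = selected if selected.kind == "UID" else candidate
    else:
        # Same kind: the newer record is folded into the older one.
        target = selected if selected.created <= candidate.created else candidate
    source = candidate if target is selected else selected
    target.data.extend(source.data)  # source ids travel with each item
    return target
```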
Note that in some instances a single unique value may be found in only one of the other records. However, in some instances, the condition of step 608 may only be found to be met if two, three, or some other threshold number of hash values, as a combination, are unique to the selected record and the matching record. This is the case inasmuch as hash values correspond to a range of input values and a match does not necessarily indicate that the underlying input values were identical.
Note also that discrete steps 606-608 are described as being performed to determine the probabilities of candidate records with respect to the selected record. In other embodiments, the content of a candidate record and the selected record may be evaluated according to a machine learning algorithm that evaluates some or all of the parameters of the records to determine a probability that the candidate record and the selected record correspond to the same user. In a like manner, the elimination step 604 may be performed using a trained machine learning model processing some or all of the same parameters of the selected record and candidate record.
In some embodiments, steps 608-610 may also be used for identification of correspondence between a BUID vector and a candidate record according to the method 200. In particular, adjusting 218 the probability of a candidate record may include executing both the method 400 and evaluating hash value intersections as described above with respect to steps 608-610 in order to determine the probability for a particular candidate record.
Referring to
A user profile value is a value assigned to a user according to the methods disclosed herein and provides a characterization of a facet of the user's shopping behavior, personality, or other attribute of the user. For example, a user profile value might be characterized as a “High Spender” value that increases for users that buy high margin products without being incentivized by discounts. Another user profile value might be a “Loyal Customer” value that increases for users that purchase products frequently from a merchant. Another user profile value might be a “Hesitant Buyer” value that increases for users that purchase products only after waiting for a period or evaluating many alternatives. Another user profile value might be a “Price Sensitive” value that increases with sensitivity of a user to price increases, i.e. increases for customers that are less likely to purchase a product if the price is high or are more likely to respond to discounts or other promotions.
Note that these labels are human generated and the behaviors they represent are difficult to characterize. However, the methods described below enable measurable activities of a user to be related to profile values corresponding to a type of behavior. These profile values may then be used to select more effective promotions, advertising contacts, and product recommendations for the user.
The system 700 may include a clustering module 702 that takes as an input contents of a database 704 characterizing user behavior. For example, the database 704 may store UID records 120 including some or all of the data included in the UID record 120 as described above. The UID records 120 as used according to the illustrated system 700 may store other data describing a user and a user's behavior acquired using any other approach known in the art.
The clustering module 702 assigns users to clusters 706 according to similarity of parameters in the UID records 120. The clusters 706 may then be assigned scores by a scoring module 708. The scoring module 708 scores each cluster according to parameters in the UID records 120 of the cluster, such as parameters that indicate a behavior that is to be characterized by a particular user profile value. For example, clustering may be performed using a first portion of the parameters of the UID records 120 and scoring may be performed using a second portion of the parameters of the UID records 120. The parameters of the first portion may be different from the parameters of the second portion. In some instances, there may be parameters in the first portion that are also included in the second portion.
For a particular cluster, a score for the cluster may be calculated as a function of values for the second portion of the parameters for the UID records 120 assigned to the cluster. For example, an individual score may be calculated according to an aggregation of values for the second portion of the parameters for a single UID record 120. The individual scores for UID records 120 of a cluster may then be aggregated, e.g. summed, averaged, etc., to obtain a score for the cluster. In an alternative approach, a parameter score for each parameter of the second portion may be calculated as a function (e.g., sum, average, etc.) of values for that parameter for the UID records 120 assigned to a cluster. The parameter scores for a cluster may then be aggregated (summed, averaged, weighted and summed, weighted and averaged, or the like) to obtain a score for the cluster.
The clusters as scored according to the scoring module 708 may then be input to a mapping module 710. The mapping module 710 inputs the scores to a mapping function that outputs a profile value. The profile value of a cluster may then be assigned to each UID record 120 of the cluster. Examples of functions that could be used are described below with respect to
The profile value for a UID record 120 may then be stored, such as in the UID record 120.
User activity may be accumulated in the form of event logs that are generated when the user interacts with one or more merchant websites or performs other actions that can be associated with the UID record 120 of the user. For example, for each page view of the user, an event log may be created that indicates some or all of an identity of the page viewed, when the user entered the page, when the page started loading, when the page finished loading, when the page was closed, interactions with the page (clicking, hovering time, scrolling, input to fields, search terms, etc.), periods of inactivity (e.g., user away from the computer), content of the page (product or category the page represents, product recommendations included on the page, brand represented by the page, etc.). Other data for an event may include information included in a browser request for the page view, such as some or all of the data used for browser fingerprinting or cross-device identification as discussed above. Other data such as start and end time of a page view, an elapsed time from a previous page view of a page of a merchant, or other timing data may also be calculated and stored in the event log.
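An event log entry of the type described above might be represented as sketched below. The class name `PageViewEvent`, its field names, and the helper `elapsed_since_previous` are hypothetical; only a subset of the data listed above is modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageViewEvent:
    uid: str                        # identifier of the associated UID record
    page_url: str                   # identity of the page viewed
    entered_at: float               # when the user entered the page
    load_started_at: float          # when the page started loading
    load_finished_at: float         # when the page finished loading
    closed_at: Optional[float] = None            # None while the page is open
    interactions: list = field(default_factory=list)  # clicks, scrolls, etc.
    product_id: Optional[str] = None             # product the page represents

    def view_duration(self) -> Optional[float]:
        """Time between entering and closing the page, if it was closed."""
        if self.closed_at is None:
            return None
        return self.closed_at - self.entered_at


def elapsed_since_previous(events):
    """For events in chronological order, time since the previous page view."""
    ordered = sorted(events, key=lambda e: e.entered_at)
    gaps = [None]  # the first page view has no predecessor
    for prev, cur in zip(ordered, ordered[1:]):
        gaps.append(cur.entered_at - prev.entered_at)
    return gaps
```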
These event logs provide very helpful insights about consumer interests and behavior. For example, one may determine how many products a user looked at (e.g., different brands for comparison) before purchasing a product. This is a helpful signal for determining the type of product a user is interested in and for indicating time spent performing comparisons before purchases, thereby indicating a careful and possibly price sensitive consumer.
Events may indicate reading of reviews, comparison of prices, interest in the newest, cheapest, highest reviewed, refurbished, or latest version of a product. These behaviors may then be used to select products for recommendation and for determining the timing of promotions to when a user is ready to buy based on past behavior. Events may indicate loyalty to a brand or to the merchant. Events may further indicate price sensitivity or a lack of price sensitivity of a user.
The user profile values as calculated according to the method 800 may therefore be used to characterize facets of customer behavior based on these events that would otherwise be difficult to characterize or quantify.
The user activity may additionally or alternatively include some or all of the following parameters:
The UID records 120 may then be clustered 804 using the received user activity. In particular, a machine learning clustering method may be used. For example, clustering may be performed using K-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation-maximization (EM) clustering using Gaussian Mixture Models (GMM), agglomerative hierarchical clustering, or any other clustering algorithm known in the art.
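As a minimal sketch of the clustering step 804, K-means clustering can be written in a few lines. The function `kmeans` below is an illustrative, dependency-free implementation and not a required one; initializing the centroids from the first k points is an assumption made so that the result is deterministic. In practice a library implementation would typically be used.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster equal-length numeric tuples; returns (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids
```

Each point here would hold the values of the first portion of the parameters for one UID record 120, e.g. (number of sessions, number of checkouts).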
In some embodiments, clustering 804 may be performed using a subset of available parameters, i.e. the first portion of the parameters noted above with respect to
In some embodiments, for one or both of the first portion of the available parameters and the second portion of the available parameters, the amount of user history that is used may be limited. For example, only values for user history within a time window (e.g. a week, month, etc.) preceding performance of the method 800 are considered in some embodiments. In other instances, only user history for a particular session is used.
Likewise, for one or both of the first portion of the available parameters and the second portion of the available parameters both a value for a parameter (e.g. a particular type of event), number of times an event occurred (e.g. a counter) and a time stamp at which the event occurred may be considered during the clustering step. For example, the first portion may include a number of times a user visited a URL, a number of visits to a URL in a given time period, or other metric of a number and timing of visits to a URL. In another example, the number of times a user viewed the product page of a product before purchasing, an elapsed time from first view until purchase, number of different devices the product page was viewed on, or other actions with respect to a product may be considered as the first portion during the clustering step. In yet another example, interest in products may be sorted based on time elapsed between opening a product page for a product and the time the page was closed. For example, which product page was opened first and which remained open the longest.
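A count of visits to each URL within a time window, one of the metrics mentioned above, might be computed as in the following sketch. The function name `url_visit_counts` and the representation of events as (URL, time stamp) pairs are assumptions for illustration.

```python
from collections import Counter


def url_visit_counts(events, now, window_seconds):
    """Count visits per URL within the window ending at `now`.

    events: iterable of (url, timestamp) pairs, timestamps in seconds.
    """
    start = now - window_seconds
    return Counter(url for url, ts in events if start <= ts <= now)
```

The resulting counts could serve as values in the first portion of the parameters used during clustering.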
The method 800 may then include scoring 806 the clusters. As noted above, for each cluster, the second parameters of the UID records 120 assigned to the each cluster may be evaluated to assign a score to the cluster. For example, for each UID record 120 assigned to the each cluster, an individual score may be calculated as the value of a single second parameter, a sum of values for multiple second parameters, a weighted sum of values for multiple second parameters, or some other function of values for one or more second parameters. The individual scores may then be aggregated by averaging, summing, weighting and averaging, weighting and summing, or some other function of the individual scores. In an alternative approach, for each second parameter, a parameter score for the each second parameter may be calculated as a function (e.g., sum, average, etc.) of values for the each second parameter for the UID records 120 assigned to the each cluster. The parameter scores for the each cluster may then be aggregated (summed, averaged, weighted and summed, weighted and averaged, or the like) to obtain a score for the each cluster.
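The two scoring approaches of step 806 can be sketched as follows, assuming each UID record is represented as a dictionary of second parameter values and that a weighted sum and an average are the chosen functions. The function names and the weights are hypothetical.

```python
from statistics import mean


def individual_score(record, weights):
    """Weighted sum of second parameter values for one UID record."""
    return sum(w * record[name] for name, w in weights.items())


def cluster_score_by_record(records, weights):
    """First approach: score each record, then average the individual scores."""
    return mean(individual_score(r, weights) for r in records)


def cluster_score_by_parameter(records, weights):
    """Alternative approach: average each parameter over the cluster, then
    combine the per-parameter scores with a weighted sum."""
    return sum(w * mean(r[name] for r in records) for name, w in weights.items())
```

With a weighted sum and an average the two approaches give the same cluster score, since both operations are linear. They differ when a nonlinear function is used at either stage.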
The method 800 may then include mapping 808 clusters to a profile value according to a function. For example, the cluster score may simply be input to a mapping function that outputs a profile value corresponding to that cluster score. In another approach, the cluster scores may be normalized based on the lowest and highest cluster scores, such as using the ARCTANGENT function. The normalized scores may then be input to the mapping function. Alternatively, the normalizing function may be the mapping function. In another approach, the cluster scores for the clusters defined at step 804 are ranked, such as smallest to largest or largest to smallest. The rank of a cluster is then input to a mapping function that outputs a profile value for that rank. In some embodiments, the output of the mapping function is a value, e.g. percentage, between 0 and 100, with 0.
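Two of the mapping approaches of step 808 are sketched below. The particular arctangent formula, which centers the scores on the midpoint of the lowest and highest cluster scores and scales by half their range, is an assumption, as are the function names and the choice of a 0 to 100 output range for both functions.

```python
import math


def arctan_profile_values(scores):
    """Map cluster scores to values in (0, 100) using the arctangent.

    scores: dict mapping cluster id -> cluster score.
    """
    lo, hi = min(scores.values()), max(scores.values())
    mid = (lo + hi) / 2.0
    half_range = (hi - lo) / 2.0 or 1.0  # avoid dividing by zero
    return {
        c: 100.0 * (0.5 + math.atan((s - mid) / half_range) / math.pi)
        for c, s in scores.items()
    }


def rank_profile_values(scores):
    """Map the rank of each cluster (smallest score first) to 0..100."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 100.0}
    return {c: 100.0 * i / (n - 1) for i, c in enumerate(ordered)}
```

With this formula the lowest and highest cluster scores map to 25 and 75, respectively; a different scale factor would spread the outputs more widely.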
The method 800 may include adding 810 the subject profile value for a UID record 120 to the UID record 120. For example, the profile value calculated at step 808 for a cluster may be added to the UID records 120 assigned to that cluster.
The method 800 may be performed for activity of users with respect to a specific merchant or with respect to all activities of users with respect to any number of merchants. Where the subject profile value is calculated without limiting the activity evaluated to a particular merchant, the subject profile values may be normalized for a particular merchant in order to enable the merchant to relate the subject profile values to the merchant's own customers. For example, data and scores may be normalized for a merchant based on the number of pages or products, number of categories, minimum and maximum product prices, average and median prices, total number of sales, total revenue, average discount rate, and many other parameters like these that produce a unique curve for normalization for each vendor.
As shown in
As shown in
Any number of profile values may be defined by an operator, each having a set of second parameters used to define them and may further include a scoring function that calculates a score based on the second parameter for use at step 806. Examples of profile values include:
Of course, other profile values may be defined by an operator as desired. In particular, a set of second parameters defining another profile value may be selected based on an expectation that that set of second parameters will be relevant to characterizing a facet of user behavior.
For example, for the “loyal customer” profile value, the second parameters may include number of browsing sessions with a merchant in a first time window preceding a time of evaluation (e.g., month), number of checkouts within a second time window preceding a time of evaluation (second time window may be same or different from first time window), a delta T value (time to the last one month period in which a session occurred), or other parameters that increase with frequency of interactions and purchases. Other parameters may include a number of checkouts (e.g., per unit time) as compared to an average user of the website. Another parameter may include money spent (e.g., per unit time) as compared with a user average. In some embodiments, parameters may also include a time elapsed between a purchase and a return purchase (“return time”) as compared to the average return time for other users of the website (a shorter return time indicating a more loyal customer).
In another example, to compute the “hesitant buyer” profile value, the first portion or second portion of available parameters used for clustering and/or assigning a score to a cluster may include number of product pages viewed in a time frame (e.g. three-week period), number of unique product pages viewed, number of products added to a cart, and a number of checkouts and/or number of products purchased. Another parameter may include a number of page views of a product page prior to adding the product from that product page to a cart. Another parameter may include a number of products purchased as compared to a number of products added to a cart (e.g., for a given unit of time such as a month or some other time period). Another parameter may include browser session duration (aggregate of multiple sessions or one session) until checkout happens. Another parameter may include a number of browser sessions preceding purchase. Some or all of these parameters may be used and may be normalized based on average values for these parameters as derived for some or all users of a merchant's web site.
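The normalization against site-wide averages and the purchased-to-carted comparison described above might be computed as in the following sketch. The function names and the dictionary keys in the usage are hypothetical, and dividing by the average is only one way of normalizing.

```python
def normalized_parameters(user_stats, site_averages):
    """Divide each user parameter by the site-wide average for that parameter.

    Parameters with a missing or zero average are omitted from the result.
    """
    normalized = {}
    for name, value in user_stats.items():
        avg = site_averages.get(name)
        if avg:
            normalized[name] = value / avg
    return normalized


def cart_to_purchase_ratio(products_purchased, products_added_to_cart):
    """Products purchased as a fraction of products added to a cart."""
    if products_added_to_cart == 0:
        return None  # undefined when nothing was added to a cart
    return products_purchased / products_added_to_cart
```

For instance, a user with 30 product page views on a site that averages 10 would have a normalized value of 3.0 for that parameter.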
Any number of the profile values of a customer may be combined (summed, weighted and summed, averaged, etc.) to obtain an overall score for a customer, e.g. a “shopping score” indicating a general likelihood of a customer to converge toward purchase of a product. For example, the “shopping score” may be calculated according to the method 800 using the user profile values as calculated according to the method 800 as one or both of the first portion of available parameters or the second portion of available parameters.
Once user profile values are known for a UID record 120 of a user, actions may be taken with respect to the user based on the profile values, such as generating promotions, recommending products, timing of emails or other interactions, or the like. For example, for a profile value indicating interest in a particular product, the users with the highest profile values (e.g., the top N profile values) may receive promotions for that product. Alternatively, a product may be selected for a promotion to be sent to a user as being one of the products with the top N highest product-specific profile values for that user. Scores for categories of products and promotions for the categories of products may also be assigned in a similar manner. When a user purchases a product, the profile value for that product may be reset to zero based on the assumption that the customer is unlikely to purchase another unit of that product soon.
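Selection of the top N profile values and the reset upon purchase might look like the following sketch. The function names `top_n` and `record_purchase` and the dictionary layout are assumptions for illustration.

```python
import heapq


def top_n(profile_values, n):
    """Return the n keys with the highest profile values, highest first.

    profile_values maps either user id -> value (for one product) or
    product id -> value (for one user).
    """
    return heapq.nlargest(n, profile_values, key=profile_values.get)


def record_purchase(product_values, product_id):
    """Reset a user's product-specific profile value after a purchase."""
    product_values[product_id] = 0.0
```

The same helper serves both directions: choosing which users receive a promotion for a product, and choosing which products to promote to a given user.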
User profile values as determined according to the methods used herein may be used for training a machine learning model. In particular, these user profile values provide more detailed information describing a user and may therefore be more relevant to certain machine learning algorithms.
Computing device 1000 may be used to perform various procedures, such as those discussed herein. Computing device 1000 can function as a server, a client, or any other computing entity. Computing device 1000 can perform various monitoring functions as discussed herein, and can execute one or more application programs, such as the application programs described herein. Computing device 1000 can be any of a wide variety of computing devices, such as a desktop computer, a notebook computer, a server computer, a handheld computer, tablet computer and the like.
Computing device 1000 includes one or more processor(s) 1002, one or more memory device(s) 1004, one or more interface(s) 1006, one or more mass storage device(s) 1008, one or more Input/Output (I/O) device(s) 1010, and a display device 1030 all of which are coupled to a bus 1012. Processor(s) 1002 include one or more processors or controllers that execute instructions stored in memory device(s) 1004 and/or mass storage device(s) 1008. Processor(s) 1002 may also include various types of computer-readable media, such as cache memory.
Memory device(s) 1004 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 1014) and/or nonvolatile memory (e.g., read-only memory (ROM) 1016). Memory device(s) 1004 may also include rewritable ROM, such as Flash memory.
Mass storage device(s) 1008 include various computer readable media, such as magnetic tapes, magnetic disks, optical disks, solid-state memory (e.g., Flash memory), and so forth. As shown in
I/O device(s) 1010 include various devices that allow data and/or other information to be input to or retrieved from computing device 1000. Example I/O device(s) 1010 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
Display device 1030 includes any type of device capable of displaying information to one or more users of computing device 1000. Examples of display device 1030 include a monitor, display terminal, video projection device, and the like.
Interface(s) 1006 include various interfaces that allow computing device 1000 to interact with other systems, devices, or computing environments. Example interface(s) 1006 include any number of different network interfaces 1020, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet. Other interface(s) include user interface 1018 and peripheral device interface 1022. The interface(s) 1006 may also include one or more user interface elements 1018. The interface(s) 1006 may also include one or more peripheral interfaces such as interfaces for printers, pointing devices (mice, track pad, etc.), keyboards, and the like.
Bus 1012 allows processor(s) 1002, memory device(s) 1004, interface(s) 1006, mass storage device(s) 1008, and I/O device(s) 1010 to communicate with one another, as well as other devices or components coupled to bus 1012. Bus 1012 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.
For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 1000, and are executed by processor(s) 1002. Alternatively, the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.
In the above disclosure, reference has been made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific implementations in which the disclosure may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the present disclosure. References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed herein. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
An implementation of the devices, systems, and methods disclosed herein may communicate over a computer network. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links, which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the described features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, an in-dash vehicle computer, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
Further, where appropriate, functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the description and claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name, but not function.
It should be noted that the sensor embodiments discussed above may comprise computer hardware, software, firmware, or any combination thereof to perform at least a portion of their functions. For example, a sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code. These example devices are provided herein for purposes of illustration, and are not intended to be limiting. Embodiments of the present disclosure may be implemented in further types of devices, as would be known to persons skilled in the relevant art(s). At least some embodiments of the disclosure have been directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a device to operate as described herein.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++, or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a computer system as a stand-alone software package, on a stand-alone hardware unit, partly on a remote computer spaced some distance from the computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions or code. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a non-transitory computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate implementations may be used in any combination desired to form additional hybrid implementations of the disclosure.